Kafka API
Instead of sending documents through the HTTP Ingester, it is also possible to place messages with prepared content directly onto the Kafka topic configured for extract requests. In this case, the URI and content must be prepared in advance.
The URI should be unique for the original document. It is recommended to use a SHA-256 hash of the document content itself as the local name of the URI (the part after the namespace). This ensures that the same document always has the same URI, which prevents duplicate documents in the Elasticsearch or OpenSearch index if the same document is ingested more than once.
Message example:
{
  "uri": "https://example.org/ns#473287f8298dba7163a897908958f7c0eae733e25d2e027992ea2edc9bed2fa8",
  "filename": "example.pdf",
  "content": "VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4g"
}
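The message above could be prepared along the following lines. This is a minimal sketch, not the product's own tooling: the function name `build_extract_message` is hypothetical, the namespace is copied from the example URI, and Base64 encoding of `content` is an assumption inferred from the example value (which decodes to plain text).

```python
import base64
import hashlib
import json

# Hypothetical namespace, copied from the example URI above.
NAMESPACE = "https://example.org/ns#"


def build_extract_message(filename: str, data: bytes, namespace: str = NAMESPACE) -> bytes:
    """Build the JSON message value for one document (hypothetical helper)."""
    # SHA-256 of the raw document bytes, used as the local name of the URI,
    # so identical content always yields an identical URI.
    digest = hashlib.sha256(data).hexdigest()
    message = {
        "uri": namespace + digest,
        "filename": filename,
        # Assumption: content is Base64-encoded, as the example value suggests.
        "content": base64.b64encode(data).decode("ascii"),
    }
    return json.dumps(message).encode("utf-8")
```

Because the URI depends only on the document bytes, calling the function twice with the same content gives the same URI even if the filename differs.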
The following Kafka headers are also expected. Only Security-Label is required:
Header name | Description | Required |
---|---|---|
Exec-Path | The name of the service or application that created the message. | No |
Distribution-Id | The distribution ID for the documents being uploaded. | No |
Policy-Information | The EDH or IDH policy information for the documents being uploaded. | No |
Request-ID | A unique ID for the request. | No |
Security-Label | The default security label that applies to the documents being uploaded. | Yes |
Content-Type | The MIME type of the document. | No |
Kafka header example:
{
  "Owner": "Platform Team",
  "Exec-Path": "smart-cache-documents-http-ingester",
  "Distribution-Id": "13bce3bf-7edb-4efb-a54f-574327458dd7",
  "Policy-Information": "{\"EDH\":{\"classification\":\"O\",\"permittedNats\":[\"GBR\"],\"permittedOrgs\":[\"Telicent\"],\"orGroups\":[],\"andGroups\":[]}}",
  "Request-ID": "33002c05-a04e-4915-9bf8-9fdf6e9096e6",
  "Security-Label": "test=true",
  "Content-Type": "application/pdf"
}
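Kafka record headers are name/value pairs whose values are bytes, so the JSON above is only a readable rendering of them. The sketch below shows one way the headers might be assembled and attached to a message. The helper names, the UTF-8 encoding of header values, and the topic name in the usage note are assumptions. The producer is assumed to expose a kafka-python style `send(topic, value=..., headers=[(str, bytes), ...])` method.

```python
from typing import Dict, List, Optional, Tuple


def build_headers(
    security_label: str, optional: Optional[Dict[str, str]] = None
) -> List[Tuple[str, bytes]]:
    """Return headers as (name, bytes) pairs (hypothetical helper)."""
    # Security-Label is the only header marked as required in the table.
    if not security_label:
        raise ValueError("Security-Label is required")
    headers = [("Security-Label", security_label.encode("utf-8"))]
    # Optional headers such as Exec-Path, Request-ID or Content-Type.
    # Assumption: values are sent as UTF-8 encoded strings.
    for name, value in (optional or {}).items():
        headers.append((name, value.encode("utf-8")))
    return headers


def send_extract_request(producer, topic: str, value: bytes, headers) -> None:
    """Send one message through a kafka-python style producer (assumed interface)."""
    producer.send(topic, value=value, headers=headers)
```

With kafka-python, for example, `producer` could be a `KafkaProducer` instance and `topic` the name of the topic configured for extract requests in your deployment.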