Kafka API

Instead of sending documents through the HTTP Ingester, it is also possible to place messages with prepared content directly onto the Kafka topic configured for extract requests. In this case, the URI and content must be prepared in advance.

The URI should be unique to the original document. It is recommended to use a SHA-256 hash of the document content itself as the local name. This ensures the same document always has the same URI, and it prevents duplicate entries in the Elasticsearch or OpenSearch index if the same document is ingested more than once.
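
As a minimal sketch of this recommendation, the Python snippet below derives a URI from the SHA-256 hash of the document bytes. The namespace is borrowed from the example message on this page, and the function name `document_uri` is hypothetical; neither is prescribed by the platform.

```python
import hashlib

# Namespace taken from the example message on this page; replace with your own.
NAMESPACE = "https://example.org/ns#"


def document_uri(content: bytes, namespace: str = NAMESPACE) -> str:
    """Build a URI whose local name is the hex SHA-256 digest of the content."""
    local_name = hashlib.sha256(content).hexdigest()  # 64 lowercase hex characters
    return namespace + local_name


# Identical bytes always yield the identical URI, so re-ingesting a document
# maps to the same URI rather than a new one.
first = document_uri(b"example document bytes")
second = document_uri(b"example document bytes")
assert first == second
```

Hashing the raw bytes, rather than the filename or an upload timestamp, is what makes the URI stable across repeated ingests of the same document.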

Message example:

{
  "uri": "https://example.org/ns#473287f8298dba7163a897908958f7c0eae733e25d2e027992ea2edc9bed2fa8",
  "filename": "example.pdf",
  "content": "VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4g"
}
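
The `content` value in the example above decodes as Base64 to the text "The quick brown fox jumps over the lazy dog. ", which suggests the document bytes are carried Base64-encoded. Under that assumption, a message value could be assembled roughly as follows. The function name `build_message` is hypothetical, and the UTF-8 JSON serialization is an assumption rather than a documented requirement.

```python
import base64
import json


def build_message(uri: str, filename: str, content: bytes) -> bytes:
    """Serialize an extract request as UTF-8 JSON (assumed wire format)."""
    payload = {
        "uri": uri,
        "filename": filename,
        # Assumption: document bytes are Base64-encoded, as in the example above.
        "content": base64.b64encode(content).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


message = build_message(
    "https://example.org/ns#473287f8298dba7163a897908958f7c0eae733e25d2e027992ea2edc9bed2fa8",
    "example.pdf",
    b"The quick brown fox jumps over the lazy dog. ",
)
```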

The following Kafka headers are also supported. Only Security-Label is required:

| Header name | Description | Required |
| --- | --- | --- |
| Exec-Path | The name of the service or application that created the message. | No |
| Distribution-Id | The distribution ID for the documents being uploaded. | No |
| Policy-Information | The EDH or IDH policy information for the documents being uploaded. | No |
| Request-ID | A unique ID for the request. | No |
| Security-Label | The default security label that applies to the documents being uploaded. | Yes |
| Content-Type | The MIME type of the document. | No |

Kafka header example:

{
  "Owner": "Platform Team",
  "Exec-Path": "smart-cache-documents-http-ingester",
  "Distribution-Id": "13bce3bf-7edb-4efb-a54f-574327458dd7",
  "Policy-Information": "{\"EDH\":{\"classification\":\"O\",\"permittedNats\":[\"GBR\"],\"permittedOrgs\":[\"Telicent\"],\"orGroups\":[],\"andGroups\":[]}}",
  "Request-ID": "33002c05-a04e-4915-9bf8-9fdf6e9096e6",
  "Security-Label": "test=true",
  "Content-Type": "application/pdf"
}
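
Kafka itself carries headers as key/value pairs with byte-array values, not as JSON, so the listing above is best read as a description of the pairs to set. The sketch below converts such a mapping into the list of `(str, bytes)` tuples that common Python Kafka clients accept, and rejects a request that lacks the required Security-Label header. The helper name `build_headers`, the UTF-8 encoding of values, the broker address, and the topic name in the commented usage are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Per the header table, only Security-Label is required.
REQUIRED_HEADERS = {"Security-Label"}


def build_headers(values: Dict[str, str]) -> List[Tuple[str, bytes]]:
    """Convert a name-to-string mapping into Kafka-style (key, bytes) pairs."""
    missing = REQUIRED_HEADERS - values.keys()
    if missing:
        raise ValueError("missing required headers: " + ", ".join(sorted(missing)))
    # Assumption: header values are UTF-8 encoded text.
    return [(name, value.encode("utf-8")) for name, value in values.items()]


headers = build_headers({
    "Exec-Path": "smart-cache-documents-http-ingester",
    "Request-ID": "33002c05-a04e-4915-9bf8-9fdf6e9096e6",
    "Security-Label": "test=true",
    "Content-Type": "application/pdf",
})

# Hypothetical usage with the kafka-python client (requires a running broker;
# the address and topic name below are placeholders, not documented values):
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   producer.send("extract-requests", value=message_bytes, headers=headers)
#   producer.flush()
```

Checking for the required header before producing surfaces a missing security label on the client side, rather than leaving it to be discovered downstream.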

[EARLY DRAFT RELEASE] Copyright 2020-2025 Telicent Limited. All rights reserved