Data Security Labelling
A security label defines the set of attributes that a user must have to access data, and is an essential part of CORE’s Attribute-based Access Control (ABAC). Security labels can be generated from a policy object known as an Information Data Header (IDH).
More information on security labels can be found in the Getting Started section.
To make generating security labels easier, we have developed a Python library called telicent-label-builder.
telicent-label-builder
Installation
pip install telicent-label-builder
Usage
telicent-label-builder can be used to help build security label expressions, which are added to a Kafka message using the Security-Label header.
We can generate such a label by first creating an IDH policy using the IDHModel class. The arguments of the IDHModel class and its nested elements are described below:
IDH Model parameters
Parameter | Description | Type |
---|---|---|
apiVersion | The version of the policy format being used as a string. Currently the only version in use is v1alpha | str |
uuid | Universally Unique Identifier string for this policy | str |
creationDate | An ISO8601 timestamp representing when the policy was created | str |
containsPii | A boolean flag to indicate if the associated data contains Personally Identifiable Information (PII) | bool |
dataSource | Optional string description of data origin | str or None |
ownership | Metadata which defines the owner of the data, see Ownership Model table | OwnershipModel |
access | The access policy defining who can view the data, see Access Model table | AccessModel |
Ownership Model parameters
Parameter | Description | Type |
---|---|---|
originatingOrg | Name of organisation data has originated from. | str |
user | ID of user that owns the data (optional). | str or None |
Access Model parameters
Parameter | Description | Type |
---|---|---|
classification | Classification level of the data. | Literal["O", "OS", "S", "TS"] |
allowedNats | Nationalities permitted to access the data; a user must hold at least one of them. Must be a list of ISO 3166-1 alpha-3 country codes. | List[str] |
allowedOrgs | Organisations permitted to access the data; a user must be assigned to at least one of them. | List[str] |
groups | Groups required to access the data, combined with AND; a user must belong to every group listed here. | List[str] |
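To make the semantics of the access fields concrete, the sketch below checks a user's attributes against an access policy in plain Python. It is an illustration only: in practice the evaluation is performed by CORE, not by this code. The SketchUser class, the can_access function and the assumed classification ordering (O, OS, S, TS from lowest to highest) are hypothetical and are not part of telicent-label-builder.

```python
from dataclasses import dataclass, field

# Assumed ordering from lowest to highest (an assumption made for this sketch).
CLASSIFICATION_ORDER = ["O", "OS", "S", "TS"]


@dataclass
class SketchUser:
    """Hypothetical user attributes; not a telicent-label-builder class."""
    clearance: str
    nationality: str
    organisations: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)


def can_access(user: SketchUser, access: dict) -> bool:
    """Return True only if the user satisfies every restriction in `access`."""
    # Clearance must be at the data's classification level or higher.
    cleared = (CLASSIFICATION_ORDER.index(user.clearance)
               >= CLASSIFICATION_ORDER.index(access["classification"]))
    # At least one allowed nationality and one allowed organisation must match.
    nat_ok = user.nationality in access["allowedNats"]
    org_ok = bool(user.organisations & set(access["allowedOrgs"]))
    # Every listed group is required (AND semantics).
    groups_ok = set(access["groups"]) <= user.groups
    return cleared and nat_ok and org_ok and groups_ok


access = {
    "classification": "S",
    "allowedNats": ["GBR", "USA"],
    "allowedOrgs": ["Telicent"],
    "groups": ["dataset:example_dataset"],
}
analyst = SketchUser("TS", "GBR", {"Telicent"}, {"dataset:example_dataset"})
visitor = SketchUser("OS", "GBR", {"Telicent"}, {"dataset:example_dataset"})
print(can_access(analyst, access))  # True
print(can_access(visitor, access))  # False
```

The visitor is refused only because their assumed clearance (OS) is below the data's classification (S); all other attributes match.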
Example usage
Below, we demonstrate how you can use telicent-label-builder to first create an IDH policy. In this example, the policy imposes the following access restrictions:
- The user must be permitted to access Secret (S) material or higher, AND
- must be of either British (GBR) or American (USA) nationality, AND
- must be assigned to the Telicent organisation, AND
- must be assigned to the group dataset:example_dataset
import uuid
from datetime import UTC, datetime

from telicent_labels import IDHModel

# Describe the policy as a dictionary matching the IDH Model parameters
idh_as_dict = {
    "apiVersion": "v1alpha",
    "uuid": str(uuid.uuid4()),
    "creationDate": datetime.now(UTC).isoformat(),
    "containsPii": False,
    "dataSource": "example_dataset",
    "access": {
        "classification": "S",
        "allowedOrgs": ["Telicent"],
        "allowedNats": ["GBR", "USA"],
        "groups": ["dataset:example_dataset"],
    },
    "ownership": {"originatingOrg": "Telicent"},
}
# Build the policy object from the dictionary
idh_policy = IDHModel(**idh_as_dict)
We can then serialize this policy into a security label string using the .build_security_labels() method, which is available on IDHModel objects, like so:
security_label = idh_policy.build_security_labels()
This gives us a security label that looks like this:
(classification=S&(permitted_organisations=Telicent)&(permitted_nationalities=GBR|permitted_nationalities=USA)&dataset:example_dataset:and)
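To show how the parts of this string relate to the access fields, the sketch below rebuilds the same expression in plain Python. It is inferred from the single example output above and is not the library's implementation; the function name sketch_label is hypothetical, and how telicent-label-builder handles other cases (for example empty lists) is not covered here.

```python
def sketch_label(access: dict) -> str:
    """Rebuild the example label format from an access dictionary (illustrative only)."""
    # Alternatives within a field are joined with OR (|).
    orgs = "|".join(f"permitted_organisations={org}" for org in access["allowedOrgs"])
    nats = "|".join(f"permitted_nationalities={nat}" for nat in access["allowedNats"])
    parts = [
        f"classification={access['classification']}",
        f"({orgs})",
        f"({nats})",
    ]
    # Each group is required, marked with an ":and" suffix as in the example output.
    parts.extend(f"{group}:and" for group in access["groups"])
    # The separate restrictions are joined with AND (&).
    return "(" + "&".join(parts) + ")"


example_access = {
    "classification": "S",
    "allowedOrgs": ["Telicent"],
    "allowedNats": ["GBR", "USA"],
    "groups": ["dataset:example_dataset"],
}
print(sketch_label(example_access))
```

Reading the string this way makes the structure clear: restrictions are combined with &, while the permitted values within the nationality and organisation restrictions are combined with |.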
We then use the security label string as the value for the Security-Label key within the headers of Kafka messages we push to CORE. Below is a code snippet showing what that might look like when used in an adapter.
def create_record(data):
    return Record(
        RecordUtils.to_headers(
            {
                "Content-Type": "text/csv",
                "Data-Source": "example_dataset",
                "Data-Producer": "example_producer",
                "Security-Label": security_label,
            }
        ),
        None,
        data,
    )
Complex Security Labelling
Security labelling becomes more complex when different parts of a data record require different security labels. The usual approach is to divide the record into two or more messages, split according to their security labelling needs. This is straightforward for traditional data such as JSON or CSV, but can be more complex for knowledge graphs/RDF. For example, in a knowledge graph an entity or event may have different security access requirements from the time and place at which the event occurred. This need can crop up when multiple entities or events reference a common part, such as a time or place, but each of them carries a different security label.
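For traditional data, the split can be sketched as follows. This is a minimal illustration, assuming a flat JSON-like record and a mapping from field names to labels; the function split_by_label, the field names and the placeholder label names are all hypothetical. The key field is copied into every part so that the resulting messages can still be correlated.

```python
def split_by_label(record: dict, field_labels: dict, default_label: str, key_field: str) -> dict:
    """Group a record's fields by security label, keeping the key field in every part."""
    parts = {}
    for name, value in record.items():
        if name == key_field:
            continue
        # Fields without an explicit label fall back to the default label.
        label = field_labels.get(name, default_label)
        parts.setdefault(label, {key_field: record[key_field]})[name] = value
    return parts


record = {"id": "evt-001", "description": "Site visit", "location": "Building 7"}
# "label_A" and "label_B" stand in for real security label strings.
parts = split_by_label(record, {"location": "label_B"}, "label_A", "id")
print(parts["label_A"])  # {'id': 'evt-001', 'description': 'Site visit'}
print(parts["label_B"])  # {'id': 'evt-001', 'location': 'Building 7'}
```

Each entry in parts would then be sent as its own message, with the corresponding label in its Security-Label header.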
For knowledge graphs, the solution is to use multiple graphs in your Knowledge Mapper code. This section assumes prior knowledge of Creating Knowledge.
A graph for each label
Both telicent-ies-tool and rdflib allow the instantiation of multiple graphs, which can be used as containers for portions of knowledge that require different security labelling. Below is an example of how you can instantiate two graphs in order to label them differently, first using telicent-ies-tool and then using rdflib.
import ies_tool.ies_tool as ies
from rdflib import Graph
# instantiate two separate graphs using telicent-ies-tool
graph_with_label_A = ies.IESTool()
graph_with_label_B = ies.IESTool()
# alternatively, instantiate two separate graphs using rdflib
graph_with_label_A = Graph()
graph_with_label_B = Graph()
From then on, whether you are using telicent-ies-tool or rdflib, you selectively add your triples to the appropriate graph by calling the methods available on those graphs. It's worth noting that with telicent-ies-tool, if you are using base classes like ies.Person, ies.Organisation etc., you will need to specify the graph you intend the triples to be added to using the tool argument. See the examples below:
# add person to graph_with_label_A
person = ies.Person(
    tool=graph_with_label_A,
    uri=f'{data_ns}person_001',
    given_name="Anne",
    surname="Smith"
)
person.add_related_object("http://ies.data.gov.uk/ontology/ies4#hasAccessTo", account_1)
# add organisation to graph_with_label_B
org = ies.Organisation(
    tool=graph_with_label_B,
    name=works_for,
    uri=f'{data_ns}org_001',
)
...
Also, once an object has been instantiated with a given tool/graph, any triples created by methods called on that object are added to that same graph. For example, above, we have used the .add_related_object() method on Person, which adds the associated triple to the graph_with_label_A graph.
You can then return these graphs to your mapper, ready for pushing to CORE. For example, here we return our two graphs as a tuple:
# return both graphs, serialized as Turtle
return (
    graph_with_label_A.graph.serialize(format="turtle"),
    graph_with_label_B.graph.serialize(format="turtle")
)
Pushing n-number of graphs to CORE
In the mapper, a separate message is needed for each graph-label pair. telicent-lib can handle being given a list of messages to send to CORE, so you can do something like this in your mapper:
graph_with_label_A, graph_with_label_B = map_func(data)
messages = []
# create and append a message for the graph A data using security label A
messages.append(
    Record(
        RecordUtils.to_headers(
            headers={"Security-Label": security_label_A},
            existing_headers=previous_headers
        ),
        record.key,
        graph_with_label_A,
        None
    )
)
# create and append a message for the graph B data using security label B
messages.append(
    Record(
        RecordUtils.to_headers(
            headers={"Security-Label": security_label_B},
            existing_headers=previous_headers
        ),
        record.key,
        graph_with_label_B,
        None
    )
)
return messages
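The same pattern extends to any number of graphs by looping over label-graph pairs. The sketch below shows the idea using only the standard library: SketchRecord is a stand-in namedtuple, not telicent-lib's Record, and build_messages is a hypothetical helper. In a real mapper you would construct Record objects with RecordUtils.to_headers as shown above.

```python
from collections import namedtuple

# Stand-in for telicent-lib's Record, used only to keep this sketch self-contained.
SketchRecord = namedtuple("SketchRecord", ["headers", "key", "value"])


def build_messages(key, labelled_graphs, base_headers=None):
    """Create one message per (security_label, serialized_graph) pair."""
    messages = []
    for security_label, graph in labelled_graphs:
        # Copy the shared headers so each message gets its own Security-Label.
        headers = dict(base_headers or {})
        headers["Security-Label"] = security_label
        messages.append(SketchRecord(headers, key, graph))
    return messages


# Placeholder labels and payloads standing in for real label strings and Turtle data.
pairs = [("label_A", "<turtle for graph A>"), ("label_B", "<turtle for graph B>")]
messages = build_messages("record-key", pairs, {"Content-Type": "text/turtle"})
print(len(messages))  # 2
print(messages[1].headers["Security-Label"])  # label_B
```

Copying the base headers for each message avoids one message's label overwriting another's when the headers are shared.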