Link Distributions of Data to a Dataset
A Dataset within the Telicent CATALOG describes the general properties of a set of data. The physical data corresponding to that Dataset, as ingested into the Telicent CORE platform, is its ‘distribution’ of data.
In the previous version of the Telicent CATALOG the line between dataset and its distribution of data was blurred, with properties for both mixed together. All of these properties, for dataset and distribution of data, were brought into the CATALOG with the distribution of data.
The updated version of the Telicent CATALOG is now the focal point for managing the records for Datasets and their Distributions of Data. Both of these elements have distinct identities which need tying together in the Telicent CATALOG.
There are two different scenarios where datasets and their distributions can be tied together:
- A Dataset is added to the Telicent CATALOG prior to data being received
- Data is received prior to a Dataset being added to the Telicent CATALOG
Scenario 1: A Dataset is added to the Telicent CATALOG prior to data being received
This is the recommended way to manage Datasets in the Telicent CATALOG and their distributions of data.
- First, a Dataset is added to the Telicent CATALOG and the relevant detail is added, as described in the add new resource section, above.
- If a distribution of data for the dataset is available and ready to be ingested then the Distribution ID can be generated, or added, during this stage of adding a Dataset.
However, if the distribution of data will be received at some later point then the Distribution ID need not be generated, or added, when the Dataset is added to the Telicent CATALOG. Instead, the relevant Dataset can be edited and a Distribution ID generated, or added, at that point.
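The two routes in Scenario 1 can be sketched as a small model. This is an illustration only, not the CATALOG's implementation: `DatasetRecord` and `generate_distribution_id` are hypothetical names, and the use of a UUID as the ID format is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


def generate_distribution_id() -> str:
    """Hypothetical stand-in for the CATALOG's 'generate' action.

    A UUID is assumed here; the real ID format may differ.
    """
    return str(uuid.uuid4())


@dataclass
class DatasetRecord:
    """Hypothetical, simplified stand-in for a Dataset record."""
    title: str
    distribution_id: Optional[str] = None

    def is_linked(self) -> bool:
        # A Dataset is linked once it carries a Distribution ID.
        return self.distribution_id is not None


# Data is ready now: the ID is generated while the Dataset is being added.
ready = DatasetRecord(
    title="Example dataset A",
    distribution_id=generate_distribution_id(),
)

# Data arrives later: add the Dataset first, then edit it to set the ID.
pending = DatasetRecord(title="Example dataset B")
pending_linked_before_edit = pending.is_linked()
pending.distribution_id = generate_distribution_id()
```

In both routes the Dataset record exists before any data is ingested; only the moment at which the Distribution ID is set differs.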
Scenario 2: Data is received prior to a Dataset being added to the Telicent CATALOG
Distributions of data can be ingested into the Telicent CORE platform prior to a dataset record being created in the Telicent CATALOG. Purely from an information governance point of view this is not recommended. It is better to have a good record of the dataset prior to distributions of data being ingested. In some cases, though, this may not be practical.
- First, a unique distribution ID must be obtained. The easiest way to do this is via the Telicent CATALOG, as described above, but in this scenario that is unlikely to be possible.
- The distribution ID is added to the distribution of data when it is ingested. NB: At this point the distribution of data will be invisible in the Telicent CATALOG because it hasn’t been linked to a Dataset.
- To link the distribution of data to a Dataset, add a Dataset to the Telicent CATALOG as per Scenario 1, above.
- BUT rather than generating a Distribution ID, populate the Distribution ID field with the distribution ID used when the distribution of data was ingested.
In both scenarios the key requirement is that the Distribution ID is added to the distribution of data as a header when it is ingested.
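As a rough illustration of the header requirement and of the Scenario 2 linking step, the sketch below models ingest headers and CATALOG visibility. The header name `Distribution-Id`, the helper functions, and the UUID format are all assumptions made for the example; they are not taken from the Telicent CORE or CATALOG interfaces.

```python
import uuid
from typing import Dict, Iterable, List, Set

# Hypothetical header name; the real name used by the platform may differ.
DISTRIBUTION_ID_HEADER = "Distribution-Id"


def build_ingest_headers(distribution_id: str) -> Dict[str, str]:
    """Return the headers to attach to a distribution of data at ingest."""
    if not distribution_id:
        raise ValueError("a Distribution ID is required at ingest")
    return {DISTRIBUTION_ID_HEADER: distribution_id}


def visible_in_catalog(
    ingested: Iterable[Dict[str, str]], linked_ids: Set[str]
) -> List[str]:
    """List the ingested Distribution IDs that some Dataset refers to."""
    return [
        headers[DISTRIBUTION_ID_HEADER]
        for headers in ingested
        if headers.get(DISTRIBUTION_ID_HEADER) in linked_ids
    ]


# Scenario 2: the ID is obtained outside the CATALOG (a UUID is assumed).
distribution_id = str(uuid.uuid4())
ingested = [build_ingest_headers(distribution_id)]

# No Dataset refers to the ID yet, so the distribution is not visible.
before_linking = visible_in_catalog(ingested, linked_ids=set())

# A Dataset is added with its Distribution ID field set to the same ID.
after_linking = visible_in_catalog(ingested, linked_ids={distribution_id})
```

The model reflects the point made in the steps: the link depends entirely on the Dataset's Distribution ID field matching the ID carried in the header at ingest.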