Unstructured Discovery

Unstructured Discovery uncovers and classifies personal data living in non-conventional data models, for comprehensive governance.

Unstructured Discovery identifies and maps personal data within unstructured data stores. Unstructured data stores are those that don't have a predefined schema or order, such as file systems. This feature operates through a two-step process, whereby we first scan and discover the files in your unstructured data store, and then scan and classify the content of those files.

You can use Unstructured Discovery to discover datapoints within those unstructured data stores.

To classify this personal data, you first need to connect your data stores. There are two ways to do this:

  1. From Unstructured Discovery, click “Add Data Silo” and select the desired Data Silo.

  2. Navigate to the specific vendor from Infrastructure > Integrations and follow the connection instructions. You may be prompted to connect with OAuth or another authorization protocol.

Once connected, click on the Unstructured Discovery tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.

After turning Datapoint Schema Discovery on for a specific data store, you can adjust how often Transcend runs this plugin to scan for datapoints. Navigate to the specific data store from Infrastructure > Integrations, and then change the frequency inputs and start time under Unstructured Discovery.

After turning it on, you can see the status of the scan and the last time it ran. You can also manually trigger a scan by clicking the “Run Now” button.

Note: Volumes scanned here count toward your usage credits. To check your current scan volume or remaining allocation, or to adjust your plan, contact your Transcend account manager.

From here, you can navigate to the Unstructured Discovery tab to see the files that have been discovered and classified.

In the classification phase, Unstructured Discovery classifies the data inside each file with data categories (the same data categories found in your Data Inventory). The system can identify a wide range of personal data, providing a detailed overview of the personal data within the data store.
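The two phases described above (file discovery, then content classification) can be pictured with a small local sketch. This is not Transcend's implementation: the function names, the `PATTERNS` table, and the `EMAIL` and `PHONE` labels are hypothetical stand-ins, and real classification uses the data categories from your Data Inventory rather than two regular expressions.

```python
import re
from pathlib import Path

# Hypothetical labels and patterns, for illustration only.
# They are NOT the categories or detection logic Transcend uses.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def discover_files(root):
    """Phase 1: enumerate every file under the given root directory."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def classify_text(text):
    """Phase 2: return the set of labels whose pattern appears in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def scan(root):
    """Run both phases and map each file path to the labels found in it."""
    results = {}
    for path in discover_files(root):
        # Undecodable bytes are skipped so that binary files do not raise.
        text = path.read_text(encoding="utf-8", errors="ignore")
        results[str(path)] = classify_text(text)
    return results
```

Calling `scan("some/directory")` returns a dictionary keyed by file path, which loosely mirrors the per-file view shown in the Unstructured Discovery tab.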

  1. What is the potential cost of this scan?

    • We enumerate the entire filesystem you choose to scan. The cost largely depends on how expensive it is for you to access the filesystem, plus Sombra hosting costs if you are using self-hosted Sombra.
  2. What file types are supported?

    • PDFs and text-based files like CSVs, TSVs, TXTs, and JSON.
  3. Is it scanning all data?

    • It scans all files, but only up to the first 50MB of any given file.
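
To get a rough feel for scan volume before enabling the plugins, the answers above can be turned into a local estimate. This is only a sketch under stated assumptions: the suffix set is a guess at how the listed file types map to extensions, "50MB" is interpreted as 50 × 1024 × 1024 bytes, and `estimate_scan` is a hypothetical helper, not part of any Transcend tool. The result does not necessarily match how usage credits are actually metered.

```python
from pathlib import Path

# Assumption: extensions matching the file types named in the FAQ.
SUPPORTED_SUFFIXES = {".pdf", ".csv", ".tsv", ".txt", ".json"}

# Assumption: "50MB" interpreted as 50 * 1024 * 1024 bytes.
MAX_BYTES_PER_FILE = 50 * 1024 * 1024


def scanned_bytes(size):
    """Bytes of a single file that fall within the per-file cap."""
    return min(size, MAX_BYTES_PER_FILE)


def estimate_scan(root):
    """Return (file_count, total_bytes) for supported files under root."""
    files = 0
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            files += 1
            total += scanned_bytes(path.stat().st_size)
    return files, total
```

For example, `estimate_scan("/mnt/shared")` (a placeholder path) walks the whole tree, so it also gives a sense of how costly a full enumeration of that filesystem is.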