Unstructured Discovery

Unstructured Discovery uncovers and classifies personal data living in non-conventional data models, for comprehensive governance.

Unstructured Discovery identifies and maps personal data within unstructured data stores. Unstructured data stores are those that don't have a predefined schema or order, such as file systems. This feature operates through a two-step process, whereby we first scan and discover the files in your unstructured data store, and then scan and classify the content of those files.

In the discovery phase, we scan through an unstructured data store and identify the files stored inside. This process uses a combination of sampling and pattern recognition techniques to flag potential data of interest.

In the classification phase, Unstructured Discovery classifies the data inside each file with data categories (the same data categories found in your Data Inventory). The system can identify a wide range of personal data, providing a detailed overview of the personal data within the data store.
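The two phases can be pictured with a minimal Python sketch. It is an illustration only, not the product's implementation: the function names (`discover_files`, `classify_text`, `scan`), the category labels, and the regular expressions are all hypothetical, and simple regex matching stands in for whatever sampling and pattern recognition the real scanner uses.

```python
import re
from pathlib import Path

# Hypothetical category labels and patterns, chosen only for illustration.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def discover_files(root):
    """Phase 1 (discovery): list every regular file under root."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def classify_text(text):
    """Phase 2 (classification): return labels whose pattern matches text."""
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(text)
    )


def scan(root):
    """Run discovery, then classify the content of each discovered file."""
    results = {}
    for path in discover_files(root):
        # Undecodable bytes are replaced so that one bad file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[str(path)] = classify_text(text)
    return results
```

Calling `scan("/path/to/store")` in this sketch returns a dictionary that maps each file path to the list of category labels found in that file.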

  1. What is the potential cost of this scan?

    • We enumerate the entire filesystem you choose to scan. The cost depends largely on how expensive it is for you to access that filesystem, plus Sombra hosting costs if you are using self-hosted Sombra.
  2. What file types are supported?

    • To start, we support PDFs and text-based files such as CSV, TSV, TXT, and JSON. We plan to rapidly expand support to a wider variety of unstructured file types.
  3. Is it scanning all data?

    • It scans all files, but only up to the first 50MB of any given file. We plan to make this configurable, so that you can scan more or less of each file (including the entire file).
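
The file-type and size limits described in the answers above could be sketched as follows. This is a hedged illustration, not the scanner's actual code: `SUPPORTED_EXTENSIONS`, `MAX_SCAN_BYTES`, `is_supported`, and `read_scan_window` are hypothetical names, matching by file extension is an assumption, and so is reading "50MB" as 50 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: supported types are recognized by file extension.
SUPPORTED_EXTENSIONS = {".pdf", ".csv", ".tsv", ".txt", ".json"}

# Assumption: "50MB" is interpreted as 50 * 1024 * 1024 bytes.
MAX_SCAN_BYTES = 50 * 1024 * 1024


def is_supported(path):
    """Return True if the file extension is one of the supported types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def read_scan_window(path, limit=MAX_SCAN_BYTES):
    """Read at most `limit` bytes from the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(limit)
```

Exposing `limit` as a parameter mirrors the planned configurability: a smaller value scans less of each file, and passing `-1` makes `read` return the entire file.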