Structured Discovery

Structured Discovery continuously detects and classifies all objects and properties inside a given database.

Now that you have used Silo Discovery to discover data systems with personal data, you can use Structured Discovery to discover datapoints within those systems.

To classify this personal data, you first need to connect your data stores. There are three ways to do this: quick-adding compatible data silos, adding data silos one by one, and connecting SaaS tools.

Navigate to Structured Discovery in the left-side menu. To add a data silo for datapoint scanning, click “Add Data Silo”.

You’ll now see a filtered list of data silos that are compatible with Structured Discovery. Add as many as you need by hovering over the data silos and selecting “Quick Add”. This will take you to the Integrations view under the platform’s Infrastructure section. Find and select the data silo you just added.

From the “Connection” tab in the data silo, click “Connect” and follow the connection instructions (see database integration documentation), such as entering your server, database, and login information.

Once connected, click the Structured Discovery tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.

From here, click back to Structured Discovery to see the results of this data silo scan.

Alternatively, you can add and configure data silos one by one. Select your desired data silo, scroll down, and click the “Add” button. Then click “View Database” to open the view for this specific integration. From here, follow the connection instructions and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins, as before.

Follow a similar process to connect SaaS tools, like Salesforce. From Structured Discovery, click “Add Database” and select the desired SaaS vendor.

As with your database connection, navigate to the specific vendor from Infrastructure > Integrations and follow connection instructions. You may be prompted to connect with OAuth or another authorization protocol.

Once connected, click the Structured Discovery tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.

After turning Datapoint Schema Discovery on for a specific data store, you can adjust how often Transcend runs this plugin to scan for datapoints. Navigate to the specific data store from Infrastructure > Integrations, and then change the frequency inputs and start time under Structured Discovery.
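As a rough illustration of how a start time and a frequency together determine scan dates, here is a minimal sketch. The function name `next_scan_time` and its parameters are hypothetical and are not part of Transcend's product or API. The sketch assumes scans repeat at a fixed interval counted from the configured start time.

```python
from datetime import datetime, timedelta, timezone


def next_scan_time(start: datetime, every: timedelta, now: datetime) -> datetime:
    """Return the first scheduled scan at or after `now`.

    Assumption (not from the product docs): scans repeat at a fixed
    interval `every`, counted from the configured `start` time.
    """
    if every <= timedelta(0):
        raise ValueError("frequency must be positive")
    if now <= start:
        return start
    elapsed = now - start
    # Ceiling division: whole intervals needed to reach or pass `now`.
    intervals = -(-elapsed // every)
    return start + intervals * every


# Example: weekly scans starting on 1 January 2024 (UTC).
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
upcoming = next_scan_time(start, timedelta(days=7), now)
```

Under these assumptions, a shorter interval means more scans per month, which is worth keeping in mind given that scanned volumes count toward usage credits.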

Note: Volumes scanned here count toward your usage credits. To find your current scan volume or remaining allocation, or to adjust your plan, check with your Transcend account manager.

From here, clicking on “View Datapoints” will take you back into Structured Discovery to a filtered view of the specific datapoints discovered from this data silo.

We will continue to scan your data based on the frequency set in Infrastructure > Integrations. You can see the status of a current scan, scheduled scan, or future scan date in the Structured Discovery view.

This view shows the count of all objects of each type (types differ per integration) found in the most recent scan that Transcend ran on the data silo. It is intended as a progress indicator for the scan.

The counts you see here may differ from the number of objects visible to you in the "Browse" view for the silo. Possible reasons include a change in the permissions granted to Transcend for the data silo, or changes made to the data silo's schema on your end.

You can click into the integration to see more details on the progress of the scan, as well as operational metrics around datapoints found, confirmed data categories, and progress on tables needing confirmation.

You can define custom regexes to help with classification. To do so, navigate to the "Inventory" tab in the "Data Inventory" view, where you can add, edit, and delete custom regexes for each data category.

The custom regexes you define here will be used to help with classification in Structured Discovery. If a column in your data silo matches a custom regex, it will be classified as the data category you have defined in the "Data Inventory" view.

The results will appear with the labeled method of classification as "Regex Matching" in the "Datapoints" view in Structured Discovery.
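The idea behind regex-based classification can be sketched as follows. This is an illustration only, not Transcend's implementation: the category names, the patterns, and the function `classify_column` are hypothetical, and the sketch assumes the regex is tested against the column name (the text above does not specify whether names or values are matched).

```python
import re

# Hypothetical custom regexes, keyed by data category (illustrative only).
CUSTOM_REGEXES = {
    "CONTACT": [r"e[-_]?mail", r"phone"],
    "FINANCIAL": [r"credit[-_]?card", r"iban"],
}


def classify_column(column_name, custom_regexes=CUSTOM_REGEXES):
    """Return the first category whose regex matches the column name.

    Assumption: matching is case-insensitive and applied to the column
    name; returns None when no custom regex matches.
    """
    for category, patterns in custom_regexes.items():
        for pattern in patterns:
            if re.search(pattern, column_name, flags=re.IGNORECASE):
                return {"category": category, "method": "Regex Matching"}
    return None


email_result = classify_column("user_email")
card_result = classify_column("Credit_Card_Number")
no_result = classify_column("created_at")
```

In this sketch the first matching category wins, so the order of entries matters. How overlapping patterns are resolved in the product is not covered here.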

Transcend leverages machine learning techniques to quickly determine exactly where each and every personal datapoint lives within your individual data silo. With Structured Discovery, Transcend eliminates the need to derive queries for an internal database and maintain them through inevitable database schema changes.

We do this by prompting you to answer a series of simple questions related to the database's content. This trains our classifier and allows us to quickly learn your database schema and reliably detect the tables and rows that contain personal data.

Note: If you would like to try the newest classifier leveraging a Large Language Model (LLM), please check with your Transcend account manager.

From Structured Discovery, navigate to a specific data silo, then click the “Train” tab. Answer the prompted question, “Is the NAME datapoint a CATEGORY|SUBCATEGORY?”, in one of three ways:

  • Confirm the suggested category by clicking the “Confirm Category” button or pressing c on the keyboard.
  • Select a different category from the dropdown, and then confirm it.
  • Skip the category by clicking the “Skip Category” button or pressing s on the keyboard.

As you answer these questions, Transcend will improve our classifier for the various data categories in your data silos.

You can also confirm classifications in bulk for datapoints that share the name of the one presented. Click the “View all instances of 'name'” button to open a new tab listing every datapoint with that name, then click the “Bulk Confirm” button to confirm the category for all of them.
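Conceptually, bulk confirmation applies one category to every datapoint that shares a name. A minimal sketch of that idea is below. The `Datapoint` class, its fields, and `bulk_confirm` are hypothetical names used for illustration and do not reflect Transcend's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datapoint:
    path: str                      # e.g. "schema.table.column" (illustrative)
    name: str
    category: Optional[str] = None
    confirmed: bool = False


def bulk_confirm(datapoints: List[Datapoint], name: str, category: str) -> int:
    """Confirm `category` on every datapoint named `name`; return the count."""
    matched = [dp for dp in datapoints if dp.name == name]
    for dp in matched:
        dp.category = category
        dp.confirmed = True
    return len(matched)


datapoints = [
    Datapoint(path="public.users.email", name="email"),
    Datapoint(path="public.orders.email", name="email"),
    Datapoint(path="public.users.created_at", name="created_at"),
]
updated = bulk_confirm(datapoints, "email", "CONTACT")
```

Datapoints with other names are left untouched, which mirrors how the bulk action is scoped to a single name.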

The breadcrumbs you saw earlier can be traced through the “Browse” tab for each individual data silo. Here you can select the main folder and subfolders all the way down to a specific table.

The “Datapoints” tab lists out all datapoints in this data silo alongside their respective Data Category. This includes completed categories, those still in progress under the “Train” tab, and Unspecified categories. You can add notes and more information by clicking in the Description field and editing directly in line.

If you are still in the process of training Transcend on this data silo, you will see potential classification categories for each column alongside our confidence score for each category. We also include sample data below each column for you to reference.

Hover over each column to directly delete, add and edit categories. This will bypass the need for you to train Transcend on that specific table column.

Select “Filter” in the top right to filter datapoints by Data Category, Purpose of Processing or classification status.
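The filter combines its criteria, so only datapoints matching every selected value are shown. The sketch below illustrates that behavior under the assumption that criteria are combined with AND. The `DatapointRow` class, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatapointRow:
    name: str
    data_category: str
    purpose: str
    status: str  # illustrative values: "confirmed", "in_training", "unspecified"


def filter_datapoints(
    rows: List[DatapointRow],
    data_category: Optional[str] = None,
    purpose: Optional[str] = None,
    status: Optional[str] = None,
) -> List[DatapointRow]:
    """Keep rows matching every criterion that is set (None means 'any')."""
    def keep(row: DatapointRow) -> bool:
        return (
            (data_category is None or row.data_category == data_category)
            and (purpose is None or row.purpose == purpose)
            and (status is None or row.status == status)
        )
    return [row for row in rows if keep(row)]


rows = [
    DatapointRow("email", "CONTACT", "Marketing", "confirmed"),
    DatapointRow("phone", "CONTACT", "Support", "in_training"),
    DatapointRow("iban", "FINANCIAL", "Billing", "confirmed"),
]
confirmed_contacts = filter_datapoints(rows, data_category="CONTACT", status="confirmed")
```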