Structured Discovery
Structured Discovery continuously detects and classifies all objects and properties inside a given database.
Now that you have used Silo Discovery to discover data systems with personal data, you can use Structured Discovery to discover datapoints within those systems.
To classify this personal data, you first need to connect your data stores. There are three ways to do this:
Navigate to Structured Discovery in the left-hand menu. To add a data silo for datapoint scanning, click “Add Data Silo”.
You’ll now see a filtered list of data silos that are compatible with Structured Discovery. Add as many as you need by hovering over each data silo and selecting “Quick Add”. This takes you to the Integrations view under the platform’s Infrastructure section, where you can find and select the data silo you just added.
Once connected, click the “Configuration” tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.
From here, navigate back to Structured Discovery to see the results of this data silo’s scan.
Alternatively, you can add and configure data silos one by one. Select your desired data silo, scroll down, and click the “Add” button. Then click “View Database” to open the view for this specific integration. From there, follow the connection instructions and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins, as before.
Follow a similar process to connect SaaS tools such as Salesforce. From Structured Discovery, click “Add Database” and select the desired SaaS vendor.
As with your database connection, navigate to the specific vendor under Infrastructure > Integrations and follow the connection instructions. You may be prompted to connect with OAuth or another authorization protocol.
Once connected, click the “Configuration” tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.
After turning on Datapoint Schema Discovery for a specific data store, you can adjust how often Transcend runs this plugin to scan for datapoints. Navigate to the data store under Infrastructure > Integrations and, on the Configuration tab, change the frequency and start time for the plugins.
Note: data volumes scanned here count toward your usage credits. To check your current scan volume or remaining allocation, or to adjust your plan, contact your Transcend account manager.
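To make the frequency and start-time settings concrete, here is a minimal Python sketch of how a fixed schedule anchored at a start time yields upcoming scan times. It is a hypothetical model, not Transcend’s scheduler: the function `next_scan_times` and the assumption that scans repeat at a fixed interval counted from the start time are ours, for illustration only.

```python
from datetime import datetime, timedelta


def next_scan_times(start: datetime, every: timedelta, now: datetime, count: int = 3) -> list[datetime]:
    """Hypothetical model: upcoming scans on a fixed interval anchored at `start`."""
    if every <= timedelta(0):
        raise ValueError("scan frequency must be positive")
    if now <= start:
        # The schedule has not started yet, so the first scan is the start time itself.
        first = start
    else:
        # Ceiling division: number of whole intervals needed to reach or pass `now`.
        intervals = -(-(now - start) // every)
        first = start + intervals * every
    return [first + i * every for i in range(count)]


# Weekly scans starting Monday 1 Jan 2024 at 02:00, checked on 10 Jan at noon.
print(next_scan_times(datetime(2024, 1, 1, 2, 0), timedelta(days=7), datetime(2024, 1, 10, 12, 0)))
```

With a weekly frequency, the next scan after 10 January falls on 15 January at 02:00, keeping the original time of day from the start setting.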
From here, clicking “View Datapoints” takes you back to Structured Discovery, filtered to the datapoints discovered in this data silo.
Transcend will continue to scan your data at the frequency set in Infrastructure > Integrations. The Structured Discovery view shows the status of the current scan, any scheduled scan, and the next scan date.
This view shows the count of all objects of each type (the types differ per integration) found in the most recent scan Transcend ran on the data silo. It is intended as a progress indicator for the scan Transcend is running.
The counts shown here may differ from the number of objects you can see in the data silo’s “Browse” view. This can happen for several reasons, for example, the permissions granted to Transcend for the data silo have changed, or the data silo’s schema has changed on your end.
You can click into the integration to see more details on the progress of the scan, as well as operational metrics around datapoints found, confirmed data categories, and progress on tables needing confirmation.
Transcend uses machine learning to quickly determine where each personal datapoint lives within your individual data silos. With Structured Discovery, you no longer need to write queries against an internal database and maintain them through inevitable schema changes.
We do this by prompting you to answer a series of simple questions related to the database's content. This trains our classifier and allows us to quickly learn your database schema and reliably detect the tables and rows that contain personal data.
From Structured Discovery, navigate to a specific data silo, then click the “Train” tab. Answer the prompt “Do these columns match the SUBCATEGORY|CATEGORY category?” for each column by:
- Confirming individual columns with a check
- Marking them inaccurate with an “X”
- Selecting “?” if you are unsure
Once you have assessed each column, click the “Confirm Selections” button above. If you aren’t sure, click “Skip Category” to move on to the next training prompt.
As you answer these questions, Transcend improves its classifier for the data categories in your data silos.
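The three possible answers can be thought of as labels for the classifier. The sketch below is a hypothetical model, not Transcend’s implementation: the `Answer` enum and `to_training_labels` function are illustrative names, and it assumes a check becomes a positive example, an “X” becomes a negative example, and “?” is simply left out.

```python
from enum import Enum


class Answer(Enum):
    MATCH = "check"   # column confirmed as matching the category
    NO_MATCH = "x"    # column marked as inaccurate
    UNSURE = "?"      # reviewer is unsure, so no label is recorded


def to_training_labels(answers: dict[str, Answer]) -> dict[str, bool]:
    """Hypothetical: keep only confident answers (True = matches, False = does not)."""
    return {
        column: answer is Answer.MATCH
        for column, answer in answers.items()
        if answer is not Answer.UNSURE
    }


labels = to_training_labels({
    "account.name": Answer.MATCH,
    "account.id": Answer.NO_MATCH,
    "account.notes": Answer.UNSURE,
})
print(labels)
```

Dropping unsure answers, rather than treating them as negatives, avoids teaching the model that a column does not belong to a category when the reviewer simply did not know.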
Hover to the right of each column to see the reverse arrow and the breadcrumbs showing where that column sits in your data schema. For instance, the “Name” column in this example comes from Transcend’s Redshift instance > the transcend_bi folder > the account table.
You can trace these breadcrumbs through each data silo’s “Browse” tab, selecting the main folder and then its subfolders all the way down to a specific table.
The “Datapoints” tab lists all datapoints in this data silo alongside their Data Category. This includes completed categories, those still in progress under the “Train” tab, and Unspecified categories. You can add notes and other information by clicking the Description field and editing it inline.
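A breadcrumb trail describes a path from the data silo through its folders down to a table and column. As a hypothetical illustration (the `ColumnPath` class is our own, not a Transcend type), that hierarchy can be modeled like this, using the “Name” column example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPath:
    """Hypothetical location of a column: data silo > folders > table > column."""
    silo: str
    folders: tuple[str, ...]  # main folder first, then any subfolders
    table: str
    column: str

    def breadcrumbs(self) -> str:
        # Join every level with " > ", mirroring the breadcrumb trail in the UI.
        return " > ".join((self.silo, *self.folders, self.table, self.column))


name_column = ColumnPath("Redshift", ("transcend_bi",), "account", "Name")
print(name_column.breadcrumbs())  # Redshift > transcend_bi > account > Name
```

Keeping folders as a variable-length tuple reflects that the number of subfolder levels between a silo and a table differs from one integration to another.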
If you are still in the process of training Transcend on this data silo, you will see potential classification categories for each column alongside our confidence score for each category. We also include sample data below each column for you to reference.
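One way to picture the candidate categories is as a score per category, shown highest first. The sketch below is hypothetical: `top_candidates`, the `min_confidence` cutoff, the `top_k` limit, and the example scores are all assumptions for illustration, with category labels written in the SUBCATEGORY|CATEGORY form used by the training prompt.

```python
def top_candidates(
    scores: dict[str, float], min_confidence: float = 0.1, top_k: int = 3
) -> list[tuple[str, float]]:
    """Hypothetical: up to top_k (category, score) pairs at or above min_confidence, highest first."""
    kept = [(category, score) for category, score in scores.items() if score >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Made-up confidence scores for a single column.
example_scores = {"Email|Contact": 0.91, "User ID|Identifier": 0.42, "Phone|Contact": 0.07}
print(top_candidates(example_scores))
```

Here the low-scoring “Phone|Contact” candidate falls below the assumed cutoff and is hidden, leaving two suggestions for the reviewer.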
Hover over each column to delete, add, or edit categories directly. This bypasses the need to train Transcend on that specific column.
Select “Filter” in the top right to filter datapoints by Data Category, Purpose of Processing or classification status.
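The filter combines up to three criteria. The sketch below is a hypothetical model of that behavior: the `Datapoint` fields, the example status and purpose values, and the assumption that filters combine with AND (an unset filter matches everything) are ours, not a description of Transcend’s API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datapoint:
    column: str
    category: str  # Data Category
    purpose: str   # Purpose of Processing
    status: str    # hypothetical values, e.g. "confirmed", "training", "unspecified"


def filter_datapoints(
    points: list[Datapoint],
    category: Optional[str] = None,
    purpose: Optional[str] = None,
    status: Optional[str] = None,
) -> list[Datapoint]:
    """Keep datapoints matching every filter that is set; None means 'any value'."""
    return [
        p for p in points
        if (category is None or p.category == category)
        and (purpose is None or p.purpose == purpose)
        and (status is None or p.status == status)
    ]


points = [
    Datapoint("account.email", "Contact", "Marketing", "confirmed"),
    Datapoint("account.phone", "Contact", "Essential", "training"),
    Datapoint("account.id", "Identifier", "Essential", "confirmed"),
]
print([p.column for p in filter_datapoints(points, category="Contact", status="confirmed")])
```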