Content Classification

Now that you have used Data Silo Discovery to discover data systems with personal data, you can use Content Classification to discover datapoints within those systems.

To classify this personal data, first connect your data stores. There are three ways to do this: add compatible data silos in bulk, add and configure them one at a time, or connect SaaS tools.

Navigate to Data Mapping in the left-side menu and click Content Classification. To add a data silo for datapoint scanning, click “Add Data Silo”.

You’ll now see a filtered list of data silos that are compatible with content classification. Add as many as you need by hovering over the data silos and selecting “Quick Add”. This will take you to the Integrations view under the platform’s Infrastructure section. Find and select the data silo you just added.

From the “Connection” tab in the data silo, click “Connect” and follow the connection instructions, such as entering your server, database, and login information.

Once connected, click the “Configuration” tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.

From here, click back to “Data Mapping > Content Classification” to see the results of this data silo scan.

Alternatively, you can add and configure data silos one by one. Select your desired data silo, scroll down, and click the “Add” button. Then click “View Database” to open the view for this specific integration. From here, follow the connection instructions and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins, as before.

Follow a similar process to connect SaaS tools, like Salesforce. From Data Mapping > Content Classification, click “Add Database” and select the desired SaaS vendor.

As with your database connection, navigate to the specific vendor from Infrastructure > Integrations and follow the connection instructions. You may be prompted to connect with OAuth or another authorization protocol.

Once connected, click the “Configuration” tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.

After turning on datapoint discovery for a specific data store, you can adjust how often Transcend runs this plugin to scan for datapoints. Navigate to the specific data store from Infrastructure > Integrations, then change the plugin’s frequency and start time inputs under the “Configuration” tab.
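To reason about how a frequency and a start time combine into a scan schedule, a minimal sketch may help. It is illustrative only and not Transcend’s scheduler: the function name `next_scan_times` and its parameters are assumptions, and it simply assumes scans recur at a fixed interval counted from the start time.

```python
from datetime import datetime, timedelta


def next_scan_times(start_time, frequency, now, count=3):
    """Return the next `count` scan times strictly after `now`.

    Hypothetical helper: assumes scans recur every `frequency`,
    counted from `start_time`.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    if now < start_time:
        # The schedule has not started yet, so the first scan is the start time.
        first = start_time
    else:
        # Whole intervals elapsed since the start time; the next scan is one later.
        elapsed = (now - start_time) // frequency
        first = start_time + (elapsed + 1) * frequency
    return [first + i * frequency for i in range(count)]


# Example: a weekly scan that starts on 1 January 2024 at 02:00.
upcoming = next_scan_times(
    start_time=datetime(2024, 1, 1, 2, 0),
    frequency=timedelta(days=7),
    now=datetime(2024, 1, 10, 12, 0),
)
```

A shorter frequency yields more scans over the same period, which matters when scan volume counts toward usage.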

Note: Volumes scanned here count toward usage credits. To check your current scan volume or remaining allocation, or to adjust your plan, contact your Transcend account manager.

From here, clicking on “View Datapoints” will take you back into Data Mapping > Content Classification to a filtered view of the specific datapoints discovered from this data silo.

Transcend leverages machine learning techniques to quickly determine exactly where each and every personal datapoint lives within your individual data silo. With Content Classification, Transcend eliminates the need to derive queries for an internal database and maintain them through inevitable database schema changes.

We do this by prompting you to answer a series of simple questions related to the database's content. This trains our classifier and allows us to quickly learn your database schema and reliably detect the tables and rows that contain personal data.

From Data Mapping > Content Classification, navigate to a specific data silo, then click the “Train” tab. Answer the prompt, “Do these columns match the SUBCATEGORY|CATEGORY category?” by either:

  • Confirming individual columns with a check
  • Marking them inaccurate with an “X”
  • Selecting “?” if you are unsure

Once you have assessed each column, click the “Confirm Selections” button above. If you aren’t sure, click “Skip Category” to move on to the next training prompt.

As you answer these questions, Transcend improves its classifier for the various data categories in your data silos.
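One way to picture this feedback loop is as a set of labeled examples. The sketch below is an assumption for illustration, not a description of how Transcend stores or uses your answers: the names `Answer`, `Review`, and `to_training_labels` are hypothetical. It maps the three possible answers (check, “X”, “?”) to match or no-match labels and leaves out the unsure ones.

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    CONFIRM = "check"  # the column matches the category
    REJECT = "x"       # the column does not match
    UNSURE = "?"       # the reviewer could not tell


@dataclass(frozen=True)
class Review:
    column: str
    category: str
    answer: Answer


def to_training_labels(reviews):
    """Convert answers into (column, category, is_match) labels.

    Unsure answers carry no signal in this sketch, so they are dropped.
    """
    labels = []
    for review in reviews:
        if review.answer is Answer.UNSURE:
            continue
        labels.append(
            (review.column, review.category, review.answer is Answer.CONFIRM)
        )
    return labels


# Hypothetical column and category names, for illustration only.
reviews = [
    Review("full_name", "NAME", Answer.CONFIRM),
    Review("created_at", "NAME", Answer.REJECT),
    Review("notes", "NAME", Answer.UNSURE),
]
labels = to_training_labels(reviews)
```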

Hover to the right of each column to see the reverse arrow and the breadcrumbs to that section of your data schema. For instance, a “Name” column might trace back to Transcend’s Redshift instance > the transcend_bi folder > the account table.

The breadcrumbs you saw earlier can be traced through the “Browse” tab for each individual data silo. Here you can select the main folder and subfolders all the way down to a specific table.
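The breadcrumb trail can be thought of as a path through a nested schema. The following sketch assumes a plain nested dictionary as the schema representation; that structure and the helper names `breadcrumb` and `browse` are illustrative assumptions, not part of the product. It reuses the Redshift > transcend_bi > account > Name example from the text.

```python
def breadcrumb(path, separator=" > "):
    """Join the parts of a schema path into a breadcrumb string."""
    return separator.join(path)


def browse(tree, path):
    """Walk a nested schema dictionary along `path` and return what is there."""
    node = tree
    for part in path:
        node = node[part]  # raises KeyError if the folder or table is missing
    return node


# Assumed layout: silo -> folder -> table -> list of column names.
schema = {"Redshift": {"transcend_bi": {"account": ["Name"]}}}

table_path = ["Redshift", "transcend_bi", "account"]
columns = browse(schema, table_path)
trail = breadcrumb(table_path + ["Name"])
```

Selecting a folder and then a table in the “Browse” tab corresponds to extending the path one part at a time.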

The “Datapoints” tab lists out all datapoints in this data silo alongside their respective Data Category. This includes completed categories, those still in progress under the “Train” tab, and Unspecified categories. You can add notes and more information by clicking in the Description field and editing directly in line.

If you are still in the process of training Transcend on this data silo, you will see potential classification categories for each column alongside our confidence score for each category. We also include sample data below each column for you to reference.
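As a rough illustration of reading these scores, the sketch below ranks candidate categories for one column by confidence and flags the low-confidence ones for manual review. The function name `rank_candidates`, the 0.8 threshold, and the sample scores are all assumptions; the source does not state how Transcend computes or thresholds confidence.

```python
def rank_candidates(candidates, review_threshold=0.8):
    """Sort candidate categories by confidence, highest first.

    Returns (category, score, needs_review) tuples, where needs_review
    is True when the score falls below the assumed threshold.
    """
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [(category, score, score < review_threshold) for category, score in ranked]


# Hypothetical scores for a single column.
ranking = rank_candidates({"EMAIL": 0.35, "NAME": 0.92})
```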

Hover over each column to directly delete, add and edit categories. This will bypass the need for you to train Transcend on that specific table column.

Select “Filter” in the top right to filter datapoints by Data Category, Purpose of Processing or classification status.
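Conceptually, the filter narrows the datapoint list to entries that match every criterion you set. A minimal sketch, assuming each datapoint carries a category, a purpose, and a status; the `Datapoint` fields, the `filter_datapoints` helper, and the sample values are hypothetical and do not reflect Transcend’s data model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datapoint:
    name: str
    category: str
    purpose: str
    status: str


def filter_datapoints(datapoints, category=None, purpose=None, status=None):
    """Keep datapoints that match every criterion that was provided.

    A criterion left as None is ignored.
    """
    wanted = {"category": category, "purpose": purpose, "status": status}
    return [
        dp
        for dp in datapoints
        if all(
            value is None or getattr(dp, field) == value
            for field, value in wanted.items()
        )
    ]


# Hypothetical sample data.
datapoints = [
    Datapoint("account.name", "NAME", "Essential", "confirmed"),
    Datapoint("account.email", "EMAIL", "Marketing", "in_training"),
    Datapoint("account.notes", "Unspecified", "Essential", "unclassified"),
]
essential = filter_datapoints(datapoints, purpose="Essential")
confirmed_names = filter_datapoints(datapoints, category="NAME", status="confirmed")
```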