Transcend can integrate directly with your databases. This integration is built on a framework that allows Transcend to quickly integrate with any SQL-flavored database, such as MySQL, PostgreSQL, Snowflake, Redshift, Aurora, and BigQuery.
By connecting your database, Transcend can:
- Scan your database to identify datapoints that contain personal information
- Programmatically classify the data category and storage purpose of datapoints
- Generate SQL statements that query or delete personal data
- Define and execute DSRs directly against your database
There are a handful of security considerations to keep in mind before connecting your database to Transcend.
The best security practice for production databases is to avoid granting direct access to the database. For this reason, it is common to put a web application or a gateway as a layer between the database and anything that queries it. To do this with Transcend, we recommend self-hosting a security module we call Sombra. Sombra is a stateless gateway that can be deployed at the edge of your network. Instead of Transcend querying your database directly, Transcend queries Sombra, which in turn queries your database.
If you are not self-hosting Sombra, you can still connect a Database integration; however, you will need to expose your database directly to the internet. See the following section on restricting database access by IP address.
Using network settings, inbound access to Sombra can be restricted to Transcend's IP ranges, and you then grant the Sombra gateway network-level access to query your database. This allows you to keep your database within a private subnet and avoid direct access.
Transcend end-to-end encrypts database credentials and all sensitive information used to connect to your databases. This ensures that our core backend servers never see your plaintext keys—you (the admin) and Sombra (the system connecting to your DB) are the only two parties that can access the credentials, and you can self-host Sombra.
Sombra manages access keys to your systems and has a KMS (key management system) built in. Optionally, you can delegate key management to another KMS (like AWS KMS) to manage keys on hardware security modules.
Your use cases and the Transcend products you use will inform which permissions to grant to the database user in your database connection.
If you are inventorying your data with our Data Inventory product, you will need to grant the user access to read the schema at a minimum. Granting access to read the data will allow Transcend to sample the tables, which will enhance Structured Discovery.
If you are fulfilling data access requests with our DSR Automation product, you will need to grant read access to the tables you want to query. If you are fulfilling Data Erasure Requests or opt outs, the Transcend user will need write access to your tables.
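To make this concrete, here is a minimal sketch of the corresponding grants, assuming PostgreSQL, a `public` schema, and a dedicated `transcend_user` (all names here are placeholders; adapt them to your environment):

```sql
-- Schema read only: enough for Data Inventory schema scanning
GRANT USAGE ON SCHEMA public TO transcend_user;

-- Data read: enables table sampling and data access requests
GRANT SELECT ON ALL TABLES IN SCHEMA public TO transcend_user;

-- Data write: required for erasure requests and opt outs
GRANT UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO transcend_user;
```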
Transcend can execute pre-approved SQL queries, inserting a user's identifiers as parameters. You (the admin) can approve these SQL queries when signed into Transcend with a third-party SSO provider. These statements are cryptographically signed by you and verified by Sombra. This prevents Transcend from querying the database with arbitrary queries or tampering with the statements you've approved. Only queries approved in the Manage Datapoints view can be executed.
It is generally recommended to run deterministic SQL queries that map to a unique user via a field such as email, phone number, or user ID.
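For example (an illustrative sketch; the `users` and `orders` tables and their columns are hypothetical):

```sql
-- Deterministic lookups: each query resolves to one user via a unique field
SELECT * FROM users WHERE email = ?;
DELETE FROM orders WHERE user_id = (SELECT id FROM users WHERE email = ?);
```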
You can connect your database by using the Admin Dashboard or our Terraform Provider.
Navigate to Structured Discovery on the left side menu. To add a data silo for datapoint scanning, click “Add Data Silo”.
You’ll now see a filtered list of data silos that are compatible with Structured Discovery. Add as many as you need by hovering over the data silos and selecting “Quick Add”. This will take you to the Integrations view under the platform’s Infrastructure section. Find and select the data silo you just added.
From the “Connection” tab in the data silo, click “Connect” and follow the connection instructions. You'll need the server name, database name, username, password, and port to create an ODBC connection string in the format shown below.
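For example, a connection string for a hypothetical PostgreSQL database might look like the following (the exact parameter names can vary by database; all values here are placeholders):

```
Server=db.internal.example.com;Port=5432;Database=customers;Uid=transcend_user;Pwd=<password>;
```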
Please do not include the "Driver" parameter in the string. Note that by connecting your database to Transcend, users in Transcend may run queries with the permissions granted to the database user.
Once connected, click on the “Configuration” tab and turn on the “Datapoint schema discovery” and “Datapoint classification” plugins.
From here, click back to “Structured Discovery” to see the results of this data silo scan.
Alternatively, you can add and configure data silos one by one. Select your desired data silo, scroll down and click the “Add” button. Then, click “View Database” to open up the view for this specific integration. From here, follow the connection instructions and turn on the "Datapoint schema discovery" and “Datapoint classification" plugins, as before.
Using the Terraform provider will ensure that database credentials are uploaded to and encrypted by Sombra. Transcend will store encrypted values of these keys but will not have access to your internal KMS (key management service) to decrypt these values.
If you are connecting to Snowflake, you'll need to provision the Transcend user with the appropriate permissions. At minimum, Transcend needs to be assigned a `default_warehouse` as well as a role that has permission to access that warehouse.
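As a rough sketch of those grants (the warehouse, role, and user names below are placeholders; adapt them to your environment):

```sql
-- Placeholder names: TRANSCEND_WH, TRANSCEND_ROLE, TRANSCEND_USER
CREATE ROLE IF NOT EXISTS TRANSCEND_ROLE;

-- Let the role use the warehouse that will execute Transcend's queries
GRANT USAGE ON WAREHOUSE TRANSCEND_WH TO ROLE TRANSCEND_ROLE;

-- Assign the role to the Transcend user and make it the default
GRANT ROLE TRANSCEND_ROLE TO USER TRANSCEND_USER;
ALTER USER TRANSCEND_USER SET DEFAULT_WAREHOUSE = TRANSCEND_WH;
ALTER USER TRANSCEND_USER SET DEFAULT_ROLE = TRANSCEND_ROLE;
```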
There is also an option to configure a Database Enricher, which lets you enrich a DSR by specifying a SQL query that runs before the SQL queries for the request.
- Select the “+” button beside the “Preflight & Identity Enrichment” section, which will open up a sidebar.
- Select the “Database Enricher” preflight check and enter all the necessary information, which includes:
  - “Database Data Silo”: the Database integration to run the query against.
  - “Title”: the title of the preflight check.
  - “Description”: a description of the preflight check.
  - “Input Identifier”: the identifier that should trigger the preflight check. If a request does not include this identifier, the preflight check will not be triggered.
  - “Output Identifier”: the identifier that will be populated from the result of the SQL query.
  - “SQL Query”: the SQL query to run.
    - This field only appears after all the prior fields are filled in.
    - The query must be a `SELECT` statement, e.g. `SELECT name FROM database WHERE email=?`; the `?` will be replaced with the “Input Identifier” value.
  - “Field to Identifier Mapping”: maps the result of the SQL query to the output identifier.
    - This field only appears after a valid SQL query is entered.
    - The mapping is from column names in the query result to the output identifier; each output identifier can be mapped to many column names.
    - If a column is produced by a function, add an `AS` clause so the result can be mapped. Only the word after `AS` is used for mapping: for `COUNT(name) AS total`, only `total` is used.
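Putting this together, a hypothetical enricher query that resolves a requester's email into an internal user ID (table and column names are illustrative) could look like:

```sql
-- ? is replaced with the Input Identifier (e.g. the requester's email);
-- the alias userId is the column mapped to the Output Identifier.
SELECT id AS userId FROM users WHERE email = ?;
```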
You can use the Database integration to programmatically identify personal information in your Database and pull it into Transcend as datapoints. Those datapoints are assigned a data category and processing purpose to help you classify internal data. Additionally, you can assign custom tags to your datapoints to allow for further classification, tracking and reporting. In this way, Transcend helps you discover, classify and label data in a Database automatically to ensure your Data Inventory is current.
The Datapoint Schema Discovery plugin in the Database integration allows you to programmatically scan your database to identify the pieces of data in your DB and pull them into Transcend as datapoints and sub-datapoints. Once the datapoints are in Transcend, they can be classified, labeled, and configured for DSRs.
The plugin works by scanning the database information schemas to detect all tables and columns.
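Conceptually, the metadata gathered resembles the result of a query like the one below (a simplified sketch; the exact introspection queries vary by database engine):

```sql
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_schema, table_name, ordinal_position;
```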
To enable the datapoint schema discovery plugin, navigate to the Configuration tab within the Database data silo and toggle the plugin on. From there, you'll be able to set the frequency at which the plugin will run to discover new datapoints and sub-datapoints as they are added to the database. Note: We recommend scheduling the plugin to run at times when the load on the database is lightest.
Transcend's Structured Discovery tool automatically classifies the data discovered in your database. By leveraging machine learning techniques, we can categorize and recommend the data category for each piece of data discovered. With Structured Discovery, Transcend helps you keep your Data Inventory up to date through inevitable database schema changes. Check out our full Structured Discovery documentation for more information about how it works.
With the Transcend Database integration, you can fulfill DSRs directly against a Database by running SQL queries.
Under “Infrastructure / Integrations”, select the database of interest. On the “Manage Datapoints” page, you will find the tables in the database represented as datapoints. Datapoints can be discovered automatically by enabling the Datapoint Schema Discovery plugin for the integration or by manually creating datapoints in this interface.
The property settings column in the “Manage Datapoints” interface lets you view the columns of each datapoint table and assign each column a category of personal data. Assigning categories to columns enables Transcend to generate access and erasure SQL statements for you automatically.
In this view, the “Access Request Visibility” toggle determines whether an access request runs a `SELECT *` statement or a `SELECT <field>` statement, and the “Erasure Properties to Redact” toggle determines whether the erasure request is a full-row `DELETE` statement or an `UPDATE` that redacts only the selected fields.
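For illustration, given a hypothetical `users` table keyed by `user_id`, the autogenerated statements might look like:

```sql
-- Access request: all columns visible vs. only selected columns
SELECT * FROM users WHERE user_id = ?;
SELECT email, name FROM users WHERE user_id = ?;

-- Erasure request: full-row delete vs. redaction of selected columns
DELETE FROM users WHERE user_id = ?;
UPDATE users SET name = NULL WHERE user_id = ?;
```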
Categories can also be set in the “Structured Discovery / Datapoints” interface by editing the “Data Categories” column. If a datapoint is declared as a personal identifier such as “User ID”, “Contact Email”, or “Contact Phone”, the same autogeneration of SQL statements can be applied for various DSR types.
You also have the option of manually setting SQL queries for each table in the “Prepared Statement” column text fields. This can be used to facilitate more complex queries, e.g. joining two tables for an enrichment, as shown below.
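For example, a prepared statement that joins two hypothetical tables to pull a user's orders by email might look like:

```sql
SELECT o.order_id, o.shipping_address
FROM orders AS o
JOIN users AS u ON u.id = o.user_id
WHERE u.email = ?;
```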
The workflow involves:
- Using the `tr-pull` command to pull existing SQL statements down from your Transcend instance into a local configuration file:

```sh
yarn tr-pull --auth=$TRANSCEND_API_KEY --dataSiloIds=965b0eb2-b87d-464c-a849-571e31384001 --resources=dataSilos
```
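The command above assumes the Transcend CLI package (`@transcend-io/cli`, which provides `tr-pull`) is installed in your project, for example:

```sh
# Add the Transcend CLI as a dev dependency
yarn add --dev @transcend-io/cli
```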
To use this workflow, your Transcend API key requires the following scopes:
- Manage Data Inventory
- Manage Data Subject Request Settings
- Manage API Keys
After a test request has been made, select your database in the “Infrastructure / Integrations” view and open “Processed Requests” to see a log of requests that interacted with your database. Selecting a request allows you to drill down on details pertaining to that request. Requests can also be restarted from this view.