Database Integration

Transcend can integrate directly with your databases. This integration is built on top of the ODBC framework, which allows Transcend to quickly connect to any SQL-flavored database, such as MySQL, PostgreSQL, Snowflake, Redshift, Aurora, and many more.

Note: We also support Google BigQuery, but it has its own separate integration. The doc can be found here.

By connecting your database, Transcend can:

  • Scan your database to identify datapoints that contain personal information
  • Programmatically classify the data category and storage purpose of datapoints
  • Generate SQL statements that query or delete personal data
  • Define and execute DSRs directly against your database

Before connecting your database to Transcend, there are a handful of security considerations you should review.

The best security practice for production databases is to avoid granting direct access to the database. For this reason, a web application or a bastion host is often placed as a layer between the database and anything that queries it. To achieve this with Transcend, we recommend self-hosting a security module we call Sombra. Sombra is a stateless gateway that can be deployed at the edge of your VPC. Instead of Transcend querying your database directly, Transcend queries Sombra, which in turn queries your database.

If you are not self-hosting Sombra, you can still connect a database integration; however, you will need to expose your database directly to the internet. See the following section on restricting database access by IP address.

Using network settings, Sombra can be restricted to the IP ranges of Transcend's backend, and you then grant the Sombra gateway network-level access to query your database. This allows you to keep your database within a private subnet and avoid direct access.

Transcend end-to-end encrypts database credentials and all sensitive information used to connect to your databases. This ensures that our core backend servers never see your plaintext keys—you (the admin) and Sombra (the system connecting to your DB) are the only two parties that can access the credentials, and you can self-host Sombra.

Sombra manages access keys to your systems and has a KMS (key management service) built in. Optionally, you can delegate key management to another KMS (such as AWS KMS) to manage keys on hardware security modules.

Your use cases and the Transcend products you use will inform which permissions to grant to the database user that Transcend connects with.

If you are inventorying your data with our Data Inventory product, you will need to grant the user access to read the schema at a minimum. Granting access to read the data will allow Transcend to sample the tables, which will enhance Structured Discovery.

If you are fulfilling data access requests with our DSR Automation product, you will need to grant read access to the tables you want to query. If you are fulfilling erasure requests or opt-outs, the Transcend user will need write access to your tables.
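As a rough sketch, granting these permissions in a PostgreSQL database might look like the following (the transcend_user role, database, and schema names are hypothetical, and grant syntax varies by database):

-- Minimum for schema discovery: connect and read schema metadata
GRANT CONNECT ON DATABASE mydb TO transcend_user;
GRANT USAGE ON SCHEMA public TO transcend_user;
-- Enables table sampling and access requests: read table data
GRANT SELECT ON ALL TABLES IN SCHEMA public TO transcend_user;
-- Required only for erasure requests and opt-outs: write access
GRANT UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO transcend_user;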

Transcend can execute pre-approved SQL queries, inserting a user's identifiers as parameters. You (the admin) can approve these SQL queries if you can log in to Transcend with a third-party SSO provider. These statements are cryptographically signed by you and verified by Sombra. This prevents Transcend from being able to query the database with arbitrary queries, or tamper with the statements you've approved. Only queries approved in the Manage Datapoints view can be executed.

It is generally recommended to run deterministic SQL queries that map to a unique user by a field such as email, phone number or a user ID.
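For example, a deterministic, pre-approved access query might look like the following, where the ? placeholder is filled in with the data subject's identifier at execution time (the users table and its columns are hypothetical):

SELECT email, full_name, shipping_address
FROM users
WHERE user_id = ?;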

You can connect your database by using the Admin Dashboard or our Terraform Provider.

Navigate to Structured Discovery in the left-side menu. To add a data silo for datapoint scanning, click “Add Data Silo”.

You’ll now see a filtered list of data silos that are compatible with Structured Discovery. Add as many as you need by hovering over the data silos and selecting “Quick Add”. This will take you to the Integrations view under the platform’s Infrastructure section. Find and select the data silo you just added.

From the “Connection” tab in the data silo, click “Connect” and follow the connection instructions. You'll need the server name, database name, username, password and port to create an ODBC Connection string in the format of Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword;Port=myPort.

Please do not include the "Driver" parameter in the string. Note that by connecting your database to Transcend, users in Transcend may run queries with the permissions granted to the database user.

Once connected, click on the "Structured Discovery" tab and turn on the "Datapoint schema discovery" and "Datapoint classification" plugins.

From here, click "Browse Data Silo Schema" to see the results of this scan.

Alternatively, you can add and configure data silos one by one. Select your desired data silo, scroll down, and click the “Add” button. Then, click “View Database” to open the view for this specific integration. From here, follow the connection instructions and turn on the "Datapoint schema discovery" and "Datapoint classification" plugins, as before.

Database integrations can also be declaratively created and updated using Transcend's Terraform Provider. See the docs here and check out our blog post here.

Using the Terraform provider will ensure that database credentials are uploaded to and encrypted by Sombra. Transcend will store encrypted values of these keys but will not have access to your internal KMS (key management service) to decrypt these values.

If you are connecting to Snowflake, you'll need to provision the Transcend user with the appropriate permissions. At minimum, Transcend needs to be assigned a default_warehouse as well as a role that has permission to access that warehouse.
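As a sketch, provisioning such a user in Snowflake might look like the following (the user, role, and warehouse names are hypothetical):

-- Create a role and give it access to the warehouse
CREATE ROLE IF NOT EXISTS transcend_role;
GRANT USAGE ON WAREHOUSE transcend_warehouse TO ROLE transcend_role;
-- Create the Transcend user with a default warehouse and role
CREATE USER IF NOT EXISTS transcend_user
  PASSWORD = '<secure-password>'
  DEFAULT_WAREHOUSE = transcend_warehouse
  DEFAULT_ROLE = transcend_role;
GRANT ROLE transcend_role TO USER transcend_user;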

There is also an option to configure a Database Enricher, which lets you enrich a DSR by specifying a SQL query that runs before the SQL queries for the request.

  1. Go to the Identifiers section underneath DSR Automation.
  2. Select the + button beside the Preflight & Identity Enrichment section, which will open a sidebar.
  3. Select the Database Enricher preflight check and enter all the necessary information, which includes:
  • Database Data Silo — The database integration to run the query against.
  • Title — Title of the preflight check.
  • Description — Description of the preflight check.
  • Input Identifier — The initial identifier that should trigger the preflight check. If the request does not include this identifier, the preflight check will not be triggered.
  • Output Identifier — The output identifier that will be mapped from the result of the SQL query.
  • SQL Query — The SQL query to run.
    • This field only appears after filling in all the prior fields.
    • The query must be a SELECT statement, e.g. SELECT name FROM users WHERE email=? (see the full example after this list).
    • The ? will be replaced with the Input Identifier value, e.g. user@acme.com.
  • Field to Identifier Mapping — Maps the result of the SQL query to the output identifier.
    • This field only appears after entering a valid SELECT SQL query.
    • The mapping is from column names in the SQL query to the output identifier.
    • Each output identifier can be mapped to many column names.
    • If a column is the result of a function, make sure to add an AS clause so the function's result can be mapped.
    • For any AS clause, only the word after AS will be used for mapping, e.g. for COUNT(name) AS total, only total will be used.
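To illustrate, an enricher query that looks up a user ID by email and aliases an aggregate column might look like this (the users and orders tables are hypothetical):

SELECT users.id AS user_id, COUNT(orders.order_id) AS total_orders
FROM users
LEFT JOIN orders ON orders.customer_id = users.id
WHERE users.email = ?
GROUP BY users.id;

Here, user_id and total_orders are the column names available for mapping to output identifiers.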

For reference, a complete example connection string for a local PostgreSQL database with SSL disabled looks like:

Server=localhost;Database=example;Uid=admin;Pwd=testing;Port=5432;sslmode=disable;

You can use the Database integration to programmatically identify personal information in your database and pull it into Transcend as datapoints. Those datapoints are assigned a data category and processing purpose to help you classify internal data. Additionally, you can assign custom tags to your datapoints for further classification, tracking, and reporting. In this way, Transcend helps you discover, classify, and label data in a database automatically, ensuring your Data Inventory stays current.

The Datapoint Schema Discovery plugin in the Database integration allows you to programmatically scan your database to identify the pieces of data in your DB and pull them into Transcend as datapoints and sub-datapoints. Once the datapoints are in Transcend, they can be classified, labeled, and configured for DSRs.

The plugin works by scanning the database information schemas to detect all tables and columns.
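Conceptually, this is similar to running a metadata query such as the following (shown in PostgreSQL style; the exact information schema layout varies by database):

SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_schema, table_name, ordinal_position;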

To enable the Datapoint Schema Discovery plugin, navigate to the Structured Discovery tab within the Database integration and toggle the plugin on. From there, you'll be able to set the frequency at which the plugin will run to discover new datapoints and sub-datapoints as they are added to the database. Note: We recommend scheduling the plugin to run at times when the load on the database is lightest.

Transcend's Structured Discovery tool automatically classifies the data discovered in your database. By leveraging machine learning techniques, we can categorize and recommend the data category for each piece of data discovered. With Structured Discovery, Transcend helps you keep your Data Inventory up to date through inevitable database schema changes. Check out our full Structured Discovery guide for more information about how it works.

With the Transcend Database integration, you can fulfill DSRs directly against a database by running SQL queries.

In Infrastructure / Integrations, select the database of interest. On the Manage Datapoints page, you will find the tables in the database as datapoints. Datapoints can be discovered automatically by enabling the Datapoint Schema Discovery plugin for the integration or by manually creating datapoints in this interface.

The property settings column in the Manage Datapoints interface will allow you to view the columns of each datapoint table and assign each column a category of personal data. This process of assigning categories to columns will enable Transcend to generate access and erasure SQL statements for you automatically.

In this view, the Access Request Visibility toggle determines whether an access request runs a SELECT * statement or a SELECT <field> statement limited to the visible columns.

The Erasure Properties to Redact toggle determines whether the erasure request runs a DELETE statement or an UPDATE statement that redacts the selected properties.
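As a sketch, the statements generated from these settings might look like the following (the orders table and its columns are hypothetical):

-- Access request: all fields visible vs. only selected fields
SELECT * FROM orders WHERE email = ?;
SELECT order_id, order_total FROM orders WHERE email = ?;
-- Erasure request: delete the row vs. redact selected properties
DELETE FROM orders WHERE email = ?;
UPDATE orders SET shipping_address = NULL WHERE email = ?;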

Categories can also be set in the Structured Discovery / Datapoints interface by editing the Data Categories column. If a datapoint is declared as Personal Identifier User ID, Contact Email, or Contact Phone, then the same autogeneration of SQL statements can be applied for various DSR types.

You also have the option of manually setting SQL queries for each table in the Prepared Statement column text fields. This can be used to facilitate more complex queries, e.g. joining two tables for an enrichment.
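For instance, a manually prepared access statement joining two hypothetical tables might look like:

SELECT orders.order_id, orders.order_total, addresses.street
FROM orders
JOIN addresses ON addresses.user_id = orders.user_id
WHERE orders.user_id = (SELECT id FROM users WHERE email = ?);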

SQL statements for facilitating DSR Automation can also be configured in code using Transcend's CLI.

The workflow involves:

  1. Use the tr-pull command to pull existing SQL statements down from your Transcend instance into a transcend.yml file:
yarn tr-pull --auth=$TRANSCEND_API_KEY --dataSiloIds=965b0eb2-b87d-464c-a849-571e31384001 --resources=dataSilos
  2. Edit the SQL commands in transcend.yml. See this example.
  3. Commit and push your changes using the tr-push command.

To use this workflow, the required scopes for your Transcend API Key are:

  • Manage Data Inventory
  • Manage Data Subject Request Settings
  • Manage API Keys

Once you have a workflow for updating and publishing SQL statements using the CLI, you may want to automate this process using CI. Here is an example for how to accomplish this using a GitHub Action.

Instructions for manually submitting DSRs can be found in this documentation.

After a test request has been made, select your database in the Infrastructure / Integrations view and view Processed Requests to see a log of requests that interacted with your database. Selecting a request will allow you to drill down into its details. Requests can also be restarted from this view.