AWS Integration

Transcend can continuously discover and classify your data stores hosted on AWS, as well as the data inside each store. This guide will help you connect your Amazon Web Services account to continuously discover data stores hosted on AWS. Then we will show you how to enable content classification for each data store to discover and classify the data inside. Finally, we will show you how to configure privacy request automation for each data store.

Transcend’s AWS integration automates the process of identifying data stores across your AWS cloud infrastructure, a process we call data silo discovery. By programmatically scanning an AWS account for data storage services, we remove the manual work needed to identify databases, data warehouses, and object storage systems across AWS accounts. From there, the data itself can be surfaced and classified for each system.

The integration is built to surface commonly used AWS data storage services, including S3, DynamoDB, and RDS databases. The sections below describe how each service is identified and surfaced.

The AWS integration uses the ListBuckets method to identify whether any S3 buckets exist. This serves to validate whether your company uses S3 for object storage.

If the integration discovers any buckets, an S3 data silo will be recommended for Data Inventory. Because ListBuckets is a global call that returns buckets from every region, the integration scans for buckets independently of any specific region, and we will only recommend a single S3 data silo per AWS account.
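As a point of reference, the bucket check is roughly equivalent to the sketch below, which assumes the AWS SDK for JavaScript v3 and a role with the s3:ListAllMyBuckets permission described later in this guide; the code is illustrative rather than Transcend's implementation:

import { S3Client, ListBucketsCommand } from "@aws-sdk/client-s3";

// ListBuckets is a global call, so a single request covers every region.
// (Assumes an ES module with top-level await.)
const s3 = new S3Client({});
const { Buckets = [] } = await s3.send(new ListBucketsCommand({}));
console.log(`Found ${Buckets.length} S3 buckets`);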

The AWS integration uses the ListTables method to identify DynamoDB tables in the AWS account. Because DynamoDB tables are region-specific, the integration scans every available AWS region for tables.

Once a DynamoDB table is encountered, Transcend will recommend the DynamoDB data silo in your Data Inventory. A silo will be recommended for each region that contains a DynamoDB table.

For example, consider the diagram below: there are three DynamoDB tables in the AWS account across two regions. The AWS integration's silo discovery plugin will recommend a DynamoDB data silo for each region that contains a table.
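For those curious how a region-by-region scan of this kind looks in practice, here is a minimal sketch using the AWS SDK for JavaScript v3; the region list is illustrative, and a real scan would enumerate every region enabled for the account:

import { DynamoDBClient, ListTablesCommand } from "@aws-sdk/client-dynamodb";

// Illustrative subset of regions; DynamoDB tables are region-specific.
const regions = ["us-east-1", "us-west-2", "eu-west-1"];

for (const region of regions) {
  const client = new DynamoDBClient({ region });
  const { TableNames = [] } = await client.send(new ListTablesCommand({}));
  if (TableNames.length > 0) {
    // A DynamoDB data silo would be recommended for this region.
    console.log(`${region}: ${TableNames.join(", ")}`);
  }
}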

The integration scans AWS for all instances of RDS (Amazon Relational Database Service) to surface every database hosted in the account, using the DescribeDBInstances method. Like DynamoDB, RDS is region-specific: each RDS instance lives in a particular AWS region, so the integration scans every region to ensure complete discovery.

For each database instance found, a database data silo will be recommended for Data Inventory. RDS supports several database engines, including MySQL, PostgreSQL, SQL Server, Oracle, Amazon Aurora, and MariaDB, and each recommended database data silo will correspond to the detected database engine. Let’s look at the example depicted below: there are three instances of RDS across two regions. In this case, the silo discovery plugin will recommend a database silo for each RDS instance encountered, regardless of which region it’s hosted in.
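As with DynamoDB, the RDS scan can be approximated with a short sketch (AWS SDK for JavaScript v3, illustrative region list); each returned instance reports its engine, which is what the recommended database silo corresponds to:

import { RDSClient, DescribeDBInstancesCommand } from "@aws-sdk/client-rds";

// Illustrative subset of regions; RDS instances are region-specific.
const regions = ["us-east-1", "eu-west-1"];

for (const region of regions) {
  const rds = new RDSClient({ region });
  const { DBInstances = [] } = await rds.send(new DescribeDBInstancesCommand({}));
  for (const db of DBInstances) {
    // Each instance maps to a recommended database silo for its engine.
    console.log(`${region}: ${db.DBInstanceIdentifier} (${db.Engine})`);
  }
}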

Once all of the AWS data stores & systems are identified and added to Data Inventory, the next step is understanding what data is stored in each and the purpose of that data.

In addition to the global AWS silo discovery integration, Transcend has separate integrations with each AWS service that gets discovered. Once the relevant data silos have been discovered, each one can be enabled for datapoint schema discovery & content classification.

The S3 integration supports a datapoint schema discovery plugin that programmatically creates datapoints in Transcend representing pieces of data in S3. When enabled, the plugin scans the AWS account to identify all S3 buckets using the ListBuckets method. Each bucket found is recommended as a datapoint, and new datapoints are surfaced as new buckets are created. Each discovered datapoint is classified by the type of data it represents, helping customers identify which datapoints may contain personal information and which don’t.

Additionally, if you have S3 buckets holding Parquet files, you can use the S3 Parquet Integration to index the schemas of those Parquet files as if you were indexing a SQL or NoSQL database. When you configure and enable one S3 Parquet integration in Transcend, it will scan all buckets the credentials you entered have access to, and it counts as one integration.

The DynamoDB integration can be enabled for datapoint schema discovery as well. The integration scans for all tables using the ListTables method and recommends each table as a datapoint; it then retrieves each table's attributes using DescribeTable and surfaces them as sub-datapoints. The discovered sub-datapoints are classified through Content Classification. Remember that if the DynamoDB silo was discovered through the AWS integration scan, the silo is scoped to a single AWS region, not a single DynamoDB table.
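A rough sketch of that table-and-attribute scan, again assuming the AWS SDK for JavaScript v3 and an illustrative region, looks like this:

import {
  DynamoDBClient,
  ListTablesCommand,
  DescribeTableCommand,
} from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({ region: "us-east-1" }); // illustrative region

const { TableNames = [] } = await client.send(new ListTablesCommand({}));
for (const TableName of TableNames) {
  // Each table becomes a recommended datapoint; its attribute definitions
  // (the attributes used in the key schema and indexes) become sub-datapoints.
  const { Table } = await client.send(new DescribeTableCommand({ TableName }));
  const attributes = Table?.AttributeDefinitions?.map((a) => a.AttributeName) ?? [];
  console.log(`${TableName}: ${attributes.join(", ")}`);
}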

Content Classification for the database integrations works similarly to the DynamoDB integration. Database data silos discovered & added during an AWS integration scan can be enabled for datapoint schema discovery and classification. Each table in the database will correspond to a recommended datapoint, and the columns on each table will correspond to sub-datapoints for the database silo. Discovered sub-datapoints are classified through Content Classification to help customers prioritize datapoints that may contain personal information.

More information about the database integration can be found here.

The integration is scoped to a single AWS account. If you have multiple AWS accounts to connect, please add a data silo for each account and follow the steps below.

Authenticating the AWS integration requires a new IAM role to be created in the AWS account being connected. One of the benefits of using an IAM role to integrate AWS is the ability to grant only the specific permissions Transcend needs and to define a custom trust policy. To create a new IAM role, log in to the AWS console and navigate to Roles → Create Role. More information about IAM roles can be found in the AWS documentation.

  1. Create a new role in the AWS IAM console.

    • In the IAM console, navigate to Roles → Create Role.
  2. Select the role type and account.

    • In Select trusted entity, choose AWS account as the role type and select This account, or enter the Account ID for the AWS account to be connected to Transcend.
  3. Add Transcend’s AWS Account ID to the role’s trust policy.

    • Next, select Custom trust policy and enter Transcend’s AWS Account ID (829095311197). An example trust policy is shown after these steps.
  4. Add the External ID to the trust policy.

    • Select Require external ID.

    • Transcend auto-generates an external ID to be shared between Transcend and the customer. Copy/paste the External ID provided in the Transcend AWS data silo.

    • Including an External ID in the trust policy adds an additional level of security to the integration. It ensures that even with the correct IAM role, Transcend cannot access AWS resources without the external ID.

    • Click Next.

  5. Define the permissions for the role.

    • Select Create Policy, then open the JSON tab.

    • Transcend has created a JSON policy with the permissions needed for the integration. Copy and paste this policy from Transcend's connection form under AWS IAM Role, or manually add the following permissions: dynamodb:ListTables, rds:DescribeDBInstances, s3:ListAllMyBuckets. A sample permissions policy is shown after these steps.

      • If your organization would like to use content classification for a DynamoDB database, please also include the additional permission dynamodb:DescribeTable.
      • If your organization would like to fulfill DynamoDB privacy requests through custom PartiQL queries, please also include the permissions that correspond to the PartiQL queries you use. The typical mapping from action type to required permissions is:
        • Access requests: dynamodb:PartiQLSelect
        • Erasure requests: dynamodb:PartiQLDelete
        • All other requests: dynamodb:PartiQLUpdate, and/or dynamodb:PartiQLInsert
    • Click Next.

  6. Add tags (Optional).

    • Adding tags is not required for the integration, but it won’t interfere with it if tagging roles is part of your internal business process.
    • Click Next.
  7. Review and name the policy.

    • In Review policy, give the policy a distinct name (ex: AWSDataSiloDiscoveryPlugin or TranscendAWSIntegration).
    • Copy this name to Transcend's connection form input named AWS IAM Role.
    • Click Create Policy.
  8. Enter your Account ID into Transcend's connection form input named AWS Account ID.

  9. Connect the integration.
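For reference, a completed custom trust policy for steps 3 and 4 generally looks like the sketch below. The external ID value is a placeholder; use the one generated in the Transcend AWS data silo.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::829095311197:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<external-id-from-transcend>" }
      }
    }
  ]
}

A permissions policy covering step 5, including the optional DynamoDB permissions, might look like the following; the authoritative version is the policy provided in Transcend's connection form under AWS IAM Role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "rds:DescribeDBInstances",
        "dynamodb:ListTables",
        "dynamodb:DescribeTable",
        "dynamodb:PartiQLSelect",
        "dynamodb:PartiQLDelete",
        "dynamodb:PartiQLUpdate",
        "dynamodb:PartiQLInsert"
      ],
      "Resource": "*"
    }
  ]
}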

Once the integration is connected, enable the silo discovery plugin to start scanning AWS.

Once the scan is complete, select View Data Inventory to review and approve the discovered AWS resources.

The AWS resources discovered by the plugin are available for review by selecting X Resources Found. From there, review each discovered AWS resource to decide if it should be approved.

Each discovered resource contains additional metadata from AWS. This includes information like the name, resourceID, and region of the resource, which can be helpful in understanding which discovered resource corresponds to a specific configured data store.

To approve a recommended data silo, select Add to Data Inventory.

Once a resource has been approved, it’s added to Data Inventory. Data silos that are discovered & added through the AWS integration will inherit the IAM role and account ID used to authenticate the AWS integration. This allows Transcend to connect data silos for additional discovered resources like S3 automatically.

Once a discovered data silo has been approved and added to Data Inventory, it can be configured to further scan the individual resource to identify and classify the information stored within. To enable content classification for a resource, navigate to the Configuration tab of the desired data silo and enable the Datapoint Schema Discovery plugin.

The plugin works by scanning a resource or system to identify the datapoints within it and classify them. The example below shows a scan of a PostgreSQL database discovered by the AWS plugin. In this case, the plugin scans the tables in the database and recommends a datapoint for each table, with sub-datapoints for the columns in each table.

With the Transcend DynamoDB integration, you can fulfill privacy requests directly against a DynamoDB database by running custom PartiQL queries for the desired data actions on each datapoint.

The first step to setting up privacy requests against a DynamoDB database is creating the datapoints in the data silo that should be queried. We typically recommend creating a datapoint for each table in the database that stores personal data (or any tables you want to action privacy requests against). For example, let's say there is a table called Conversations that contains all the messages sent back and forth from a customer. You could create a datapoint for Conversations in the data silo and enable the specific data actions needed. If you're using Transcend Data Mapping, you can enable the Datapoint Schema Discovery plugin to create the datapoints for you automatically.

For each data action enabled for a datapoint in the DynamoDB data silo, you can define a PartiQL query that executes a database operation. Using the previous Conversations example, let's say you want the datapoint to support access requests. With the “access” data action enabled, you can define a specific query that finds a user's Conversations records in the database.

For example, assuming the Conversations table has an email attribute, a custom query could be

SELECT * from "Conversations" where email = ?

When fulfilling a privacy request for a given user, Transcend will replace all ? characters in the query with the actual identifier of the user.
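Similarly, if the erasure data action is enabled on the datapoint, a deletion query could be defined. Note that DynamoDB PartiQL DELETE statements must identify items by their full primary key, so the example below assumes email is the Conversations table's partition key; otherwise the key attributes would be used instead:

DELETE FROM "Conversations" WHERE email = ?

For context, a parameterized PartiQL statement of this shape is executed against DynamoDB roughly as in the sketch below (AWS SDK for JavaScript v3, illustrative values); Transcend performs the equivalent substitution with the identifier from the privacy request:

import { DynamoDBClient, ExecuteStatementCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({ region: "us-east-1" }); // illustrative region

// The ? placeholder is bound from the Parameters list.
await client.send(
  new ExecuteStatementCommand({
    Statement: 'SELECT * FROM "Conversations" WHERE email = ?',
    Parameters: [{ S: "user@example.com" }],
  })
);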

Important: as explained in the Connecting the integration section above, you must add the necessary permissions to the role you created, depending on which PartiQL operations you are executing.