Scanning for AWS services with Silo Discovery

Overview

Transcend’s AWS integration automates the identification of data stores across your AWS cloud infrastructure, a process we call Silo Discovery. By programmatically scanning an AWS account for data storage services, Silo Discovery removes the manual work of finding databases, data warehouses, and object storage systems across AWS accounts. From there, the data within each system can be surfaced and classified.

The integration is built to surface commonly used AWS data storage services, including S3, DynamoDB, and RDS databases. The sections below describe how each service is identified and surfaced.

S3

The AWS integration uses the ListBuckets method to determine whether any S3 buckets exist, which confirms whether your company uses S3 for object storage.

The AWS integration can scan all regions, or you can limit it to one or more specified regions. If the integration discovers any buckets, it recommends an S3 data silo for Data Inventory for each region in which an S3 bucket is found.
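The S3 step can be sketched as follows, assuming a boto3-style client. The sketch lists buckets with ListBuckets, then resolves each bucket's region with GetBucketLocation so that buckets can be grouped into one recommendation per region. The GetBucketLocation lookup, the function names, and the `regions` filter parameter are assumptions for illustration, not a description of Transcend's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def bucket_region(location_constraint: Optional[str]) -> str:
    """Normalize a GetBucketLocation LocationConstraint value to a region name."""
    # S3 returns a null LocationConstraint for buckets in us-east-1,
    # and some older eu-west-1 buckets report the legacy alias "EU".
    if not location_constraint:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def recommend_s3_silos(s3_client, regions: Optional[Iterable[str]] = None) -> Dict[str, List[str]]:
    """Group bucket names by region; each key is a region that would get one S3 silo.

    s3_client is assumed to behave like boto3.client("s3"). If `regions` is
    given (hypothetical parameter), buckets outside those regions are ignored.
    """
    wanted = set(regions) if regions is not None else None
    by_region: Dict[str, List[str]] = defaultdict(list)
    # Single ListBuckets call, kept unpaginated for brevity.
    for bucket in s3_client.list_buckets().get("Buckets", []):
        name = bucket["Name"]
        location = s3_client.get_bucket_location(Bucket=name).get("LocationConstraint")
        region = bucket_region(location)
        if wanted is None or region in wanted:
            by_region[region].append(name)
    return dict(by_region)
```

Resolving regions this way costs one extra request per bucket; a production scanner might cache or parallelize those lookups.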

In addition, once a recommended S3 data silo is added to Data Inventory, it immediately runs its own S3 Silo Discovery, which looks for Parquet or JSON files within the buckets in the specified region. For each file type found, an S3 Parquet or S3 JSONL data silo is also recommended for Data Inventory.
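As a rough illustration of this follow-up discovery, the sketch below pages through each bucket's objects with ListObjectsV2 and flags Parquet and JSON files by extension. Extension matching, the hypothetical `EXTENSION_TO_SILO` table, and the function names are assumptions; the real S3 Silo Discovery may inspect files differently.

```python
from typing import Iterable, Set

# Hypothetical mapping from file extension to the silo type recommended for it.
EXTENSION_TO_SILO = {
    ".parquet": "S3 Parquet",
    ".json": "S3 JSONL",
    ".jsonl": "S3 JSONL",
}


def iter_object_keys(s3_client, bucket: str) -> Iterable[str]:
    """Yield every object key in a bucket, following ListObjectsV2 continuation tokens."""
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a bucket (or page) has no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def recommend_file_silos(s3_client, buckets: Iterable[str]) -> Set[str]:
    """Return the set of file-based silo types to recommend for these buckets."""
    all_types = set(EXTENSION_TO_SILO.values())
    found: Set[str] = set()
    for bucket in buckets:
        for key in iter_object_keys(s3_client, bucket):
            lowered = key.lower()
            for ext, silo in EXTENSION_TO_SILO.items():
                if lowered.endswith(ext):
                    found.add(silo)
            # Stop early once every silo type has been seen.
            if found == all_types:
                return found
    return found
```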

DynamoDB

The AWS integration uses the ListTables method to identify DynamoDB tables in the AWS account. Because DynamoDB tables are region-specific, the integration scans every available AWS region for tables.
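Because ListTables only returns the tables in the region the client is configured for, a full scan needs one client per region and must follow the `LastEvaluatedTableName` cursor when results span several pages. The sketch below assumes a `make_client(region)` factory such as `lambda r: boto3.client("dynamodb", region_name=r)`; the region list itself could come from, for example, EC2 DescribeRegions. The function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def list_tables_in_region(ddb_client) -> List[str]:
    """Collect every table name in one region, following ListTables pagination."""
    names: List[str] = []
    kwargs = {}
    while True:
        page = ddb_client.list_tables(**kwargs)
        names.extend(page.get("TableNames", []))
        last = page.get("LastEvaluatedTableName")
        if not last:
            return names
        # Resume the listing after the last table returned on this page.
        kwargs["ExclusiveStartTableName"] = last


def scan_dynamodb(make_client: Callable[[str], object], regions: Iterable[str]) -> Dict[str, List[str]]:
    """Return {region: [table names]} for every region scanned."""
    return {region: list_tables_in_region(make_client(region)) for region in regions}
```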

When a DynamoDB table is found, Transcend recommends a DynamoDB data silo in your Data Inventory. One silo is recommended for each region that contains at least one DynamoDB table.

For example, in the diagram below, the AWS account contains three DynamoDB tables across two regions. The AWS integration's Silo Discovery plugin recommends one DynamoDB data silo for each of those two regions.

Diagram depicting the DynamoDB Integration scan.
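The one-silo-per-region rule can be expressed as a small pure function over the scan results. The sketch below reproduces the diagram's scenario of three tables across two regions; the region and table names are made up, and the function name is hypothetical.

```python
from typing import Dict, List


def recommend_dynamodb_silos(tables_by_region: Dict[str, List[str]]) -> List[str]:
    """Return the regions that should each get one DynamoDB data silo recommendation."""
    # A region qualifies as soon as it contains at least one table;
    # the number of tables in the region does not change the count.
    return sorted(region for region, tables in tables_by_region.items() if tables)


# Three tables spread across two regions (names invented for illustration).
example = {
    "us-east-1": ["users", "orders"],
    "eu-west-1": ["sessions"],
    "ap-south-1": [],
}
print(recommend_dynamodb_silos(example))  # ['eu-west-1', 'us-east-1']
```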

RDS

The integration scans AWS for Amazon Relational Database Service (RDS) instances to surface every database hosted in the account, using the DescribeDBInstances method to list each database instance. Like DynamoDB, RDS is region-specific: each RDS instance belongs to a single AWS region. The integration therefore scans every region to ensure complete discovery.
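A similar per-region loop applies to RDS. DescribeDBInstances returns a `Marker` when more results remain, so the sketch below follows it until the listing is complete. As before, `make_client(region)` is an assumed factory (for instance, wrapping `boto3.client("rds", region_name=region)`), and the function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def describe_all_instances(rds_client) -> List[dict]:
    """Return every DB instance in one region, following the DescribeDBInstances Marker."""
    instances: List[dict] = []
    kwargs = {}
    while True:
        page = rds_client.describe_db_instances(**kwargs)
        instances.extend(page.get("DBInstances", []))
        marker = page.get("Marker")
        if not marker:
            return instances
        kwargs["Marker"] = marker


def scan_rds(make_client: Callable[[str], object], regions: Iterable[str]) -> Dict[str, List[dict]]:
    """Return {region: [DBInstance dicts]}; RDS is regional, so every region is queried."""
    return {region: describe_all_instances(make_client(region)) for region in regions}
```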

For each database instance found, a database data silo is recommended for Data Inventory. RDS supports several database engines, including MySQL, PostgreSQL, SQL Server, Oracle, Amazon Aurora, and MariaDB, and each recommended database data silo corresponds to the detected engine. In the example depicted below, there are three RDS instances across two regions. Here, the Silo Discovery plugin recommends a database silo for each of the three instances, regardless of the region it is hosted in.

Diagram showing the RDS Integration scan process.
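Unlike DynamoDB, RDS recommendations are made per instance rather than per region. The sketch below maps each RDS `Engine` value to an engine family by prefix (engine names carry edition suffixes such as `oracle-ee` or `sqlserver-ex`) and emits one recommendation per instance. The family labels and function names are assumptions; the example mirrors the diagram's three instances across two regions, with invented identifiers.

```python
from typing import Dict, List, Tuple

# Hypothetical family labels, matched by prefix because RDS engine names
# include edition or compatibility suffixes (e.g. "oracle-ee", "aurora-mysql").
ENGINE_PREFIXES: List[Tuple[str, str]] = [
    ("aurora", "Amazon Aurora"),
    ("mysql", "MySQL"),
    ("postgres", "PostgreSQL"),
    ("mariadb", "MariaDB"),
    ("sqlserver", "SQL Server"),
    ("oracle", "Oracle"),
]


def engine_family(engine: str) -> str:
    """Map an RDS Engine value (e.g. 'sqlserver-ex') to a database family label."""
    for prefix, label in ENGINE_PREFIXES:
        if engine.startswith(prefix):
            return label
    return "Unknown"


def recommend_rds_silos(instances_by_region: Dict[str, List[dict]]) -> List[Tuple[str, str, str]]:
    """Return one (region, instance id, engine family) recommendation per DB instance."""
    return [
        (region, inst["DBInstanceIdentifier"], engine_family(inst["Engine"]))
        for region, instances in instances_by_region.items()
        for inst in instances
    ]


# Three instances across two regions yield three recommendations.
example = {
    "us-east-1": [
        {"DBInstanceIdentifier": "app-db", "Engine": "postgres"},
        {"DBInstanceIdentifier": "crm-db", "Engine": "aurora-mysql"},
    ],
    "eu-west-1": [{"DBInstanceIdentifier": "billing-db", "Engine": "sqlserver-se"}],
}
for rec in recommend_rds_silos(example):
    print(rec)
# ('us-east-1', 'app-db', 'PostgreSQL')
# ('us-east-1', 'crm-db', 'Amazon Aurora')
# ('eu-west-1', 'billing-db', 'SQL Server')
```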

Next Steps

Once the data stores have been discovered and added to Data Inventory, the next step is understanding what data each one holds and the purpose of that data. To learn more about this process, see our article on Scanning AWS data stores with Structured Discovery.
