Amazon S3 Plugins Configuration

Please ensure you have connected an Amazon S3 integration before continuing with this guide.

Once the integration is connected, enable the Datapoint schema discovery plugin to start scanning AWS. You can also schedule when to run the next scan.

Enable S3 Parquet schema discovery.

Once the scan is complete, select Browse Data Silo Schema to review and approve the discovered data points.

View discovered data points.

This is not available for Amazon S3, since the integration specifically handles unstructured data. After enabling Datapoint schema discovery, you can also enable Datapoint classification. When this plugin runs, it reads samples of data from the discovered data points, then suggests data categories that you can tag them with.
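To give a rough idea of what "read samples, then suggest categories" can look like, here is a minimal sketch. It is not the plugin's actual logic: the category names, the regular expressions, the `suggest_categories` function, and the 50% threshold are all assumptions made for illustration.

```python
import re

# Hypothetical category names and patterns, for illustration only.
CATEGORY_PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE": re.compile(r"^\+?\d[\d\s().-]{6,}$"),
}


def suggest_categories(samples, threshold=0.5):
    """Suggest categories whose pattern matches more than `threshold` of the samples."""
    # Ignore missing values and normalize the rest to stripped strings.
    values = [str(v).strip() for v in samples if v is not None]
    if not values:
        return []
    suggestions = []
    for category, pattern in CATEGORY_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) > threshold:
            suggestions.append(category)
    return suggestions


# Sampled values for two hypothetical data points (columns).
sampled = {
    "contact": ["a@example.com", "b@example.org", None],
    "mobile": ["+1 555 010 0199", "555-010-0123"],
}
for data_point, values in sampled.items():
    print(data_point, suggest_categories(values))
```

The suggestions are only candidates. As in the product flow described above, a person still decides which categories to tag each data point with.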

Enable S3 Parquet Datapoint Classification.

In addition to Structured Discovery, you can run Data Silo Discovery in the Amazon S3 integration. It checks whether the buckets contain any Parquet or JSON files to determine whether an additional S3 Parquet or S3 JSONL integration is recommended.
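The check described above can be pictured with a small sketch that maps the object keys found in a bucket to recommended integrations. This is an illustration under stated assumptions, not the product's implementation: the extension mapping, the `recommend_integrations` function, and the sample keys are hypothetical, and detection here relies on file extensions only.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the integration it suggests.
EXTENSION_TO_INTEGRATION = {
    ".parquet": "S3 Parquet",
    ".jsonl": "S3 JSONL",
    ".json": "S3 JSONL",
}


def recommend_integrations(object_keys):
    """Return the sorted integration names suggested by a bucket's object keys."""
    recommended = set()
    for key in object_keys:
        # S3 keys use "/" as a separator, so treat them as POSIX-style paths.
        suffix = PurePosixPath(key).suffix.lower()
        integration = EXTENSION_TO_INTEGRATION.get(suffix)
        if integration:
            recommended.add(integration)
    return sorted(recommended)


# Example keys, as they might come back from listing a bucket.
keys = ["logs/2024/events.parquet", "exports/users.jsonl", "images/logo.png"]
print(recommend_integrations(keys))
```

In practice the keys would come from listing the bucket's objects. The sketch takes a plain list so it runs without AWS credentials.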

As with Structured Discovery, connect the integration first, then enable Silo Discovery to start scanning for the additional S3 integrations.

Enable S3 Silo discovery.

To view the results, click Triage Discovered Silos, which displays the recommended integrations. From there, you can either add the Data Silo or reject it.

S3 Silo Discovery Result.