Amazon S3 Plugins Configuration

Please ensure you have connected an Amazon S3 integration before continuing with this guide.

Once the integration is connected, enable the Datapoint schema discovery plugin to start scanning AWS. You can also schedule when to run the next scan.

Enable S3 Parquet schema discovery.

Once the scan is complete, select Browse Data Silo Schema to review and approve the discovered data points.

View discovered data points.

Unstructured data in S3 can be classified using our unstructured content classification system. Check out our full Unstructured Discovery guide for more information about how it works.

For structured data, particularly JSONL or Parquet, you can also enable Datapoint classification. When this plugin runs, it reads samples of data from the discovered data points and suggests data categories that you can tag them with. Check out our full Structured Discovery guide for more information about how it works.

Enable S3 Parquet Datapoint Classification.
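
To make the sample-then-suggest idea concrete, here is a minimal sketch in Python. It is not the plugin's implementation: the function name, the category labels (EMAIL, PHONE), the regular expressions, and the sample size are all assumptions chosen for illustration. It reads a bounded sample of JSONL records and guesses a category for each string field.

```python
import json
import re
from itertools import islice

# Hypothetical category patterns; a real classifier would be far richer.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def suggest_categories(jsonl_lines, sample_size=100):
    """Guess one category per field from the first `sample_size` lines."""
    hits = {}  # field name -> {category: number of matching sampled values}
    for line in islice(jsonl_lines, sample_size):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # only flat JSON objects are handled in this sketch
        for field, value in record.items():
            if not isinstance(value, str):
                continue
            for category, pattern in PATTERNS.items():
                if pattern.match(value):
                    counts = hits.setdefault(field, {})
                    counts[category] = counts.get(category, 0) + 1
    # Suggest the most frequently matched category for each field.
    return {field: max(counts, key=counts.get) for field, counts in hits.items()}


sample = [
    '{"contact": "a@example.com", "age": 3}',
    '{"contact": "b@example.com", "tel": "+1 555-123-4567"}',
]
suggestions = suggest_categories(sample)
```

Sampling keeps the cost bounded on large files, and the output is only a suggestion: a reviewer still decides which categories to apply.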

In addition to Structured Discovery, you can run a Data Silo Discovery in the Amazon S3 integration. It checks whether your buckets contain any Parquet or JSON files to determine whether an additional S3 Parquet or S3 JSONL integration is recommended.
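
The check described above can be pictured as a scan over object keys. The following Python sketch is illustrative only: the function name and the extension-to-integration mapping are assumptions, not the product's actual logic, and the keys are passed in as a plain list rather than fetched from AWS.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to the integration it would suggest.
EXTENSION_TO_INTEGRATION = {
    ".parquet": "S3 Parquet",
    ".jsonl": "S3 JSONL",
    ".json": "S3 JSONL",
}


def recommend_integrations(object_keys):
    """Return the set of integrations suggested by the given S3 object keys."""
    recommended = set()
    for key in object_keys:
        # S3 keys use "/" separators, so parse them as POSIX-style paths.
        suffix = PurePosixPath(key).suffix.lower()
        integration = EXTENSION_TO_INTEGRATION.get(suffix)
        if integration is not None:
            recommended.add(integration)
    return recommended


keys = [
    "exports/2024/users.snappy.parquet",
    "logs/events.JSONL",
    "images/logo.png",
]
result = recommend_integrations(keys)
```

In practice the keys would come from listing the bucket's contents. Matching on extension alone is a simplification, since a file's name does not guarantee its format.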

As with Structured Discovery, connect the integration first, then enable Silo Discovery to start scanning for the additional S3 integrations.

Enable S3 Silo discovery.

To review the results, click Triage Discovered Silos, which displays the recommended integrations. From there, you can either add each Data Silo or reject it.

S3 Silo Discovery Result.