Google Cloud Platform and BigQuery Integration
Transcend's Google Cloud Platform (GCP) integration scans GCP projects to identify data storage systems that may contain personal data. Transcend also supports a DSR integration for Google's BigQuery database, alongside scanning your database to discover and classify data. This guide provides an overview of how the integrations work, as well as detailed setup instructions.
Transcend's GCP integration automates the process of identifying data stores across your Google Cloud infrastructure, including BigQuery, Cloud SQL, Cloud Storage, and more.
For each service discovered, the integration recommends a data silo representing that service. More than one data silo may be recommended for the same service if it is used in multiple projects. For example, if two different projects each use BigQuery, the integration will recommend two BigQuery data silos. In this way, a silo is recommended for each distinct data store.
The integration is authenticated with a service account created in a dedicated GCP project. Using a service account to connect the integration is the more secure option, as it allows sensitive permissions to be assigned without granting a human user the same permissions. Additionally, a service account doesn't count as a user seat in the Google organization. Continue to the next section for additional details about authentication and setting up the integration.
Transcend uses a client credentials method to connect to your organization's Google Cloud Platform projects. There are a few steps involved in generating credentials specific to your Google organization.
If a project was previously created for another Transcend Google integration, there's no need to create another project. Feel free to use the existing project.
Create a service account. Transcend recommends creating a dedicated service account to connect this integration, even if one has already been configured for another Transcend integration. Creating a service account with limited scope for each integration reduces the risk of over-privileged accounts.
Navigate to the "IAM & Admin" tab for the desired project, select "Service Accounts", and then select Create Service Account. Give the service account a name you'll remember, for example, "transcend-integration".
- Make note of the email address associated with this service account — you'll need it to grant access to the GCP projects
Generate a private key. A public/private key pair for this account is needed for the Transcend connection form. You can create the key by:
- Visiting the "Key" tab in the service account's settings page and selecting Add Key. Make sure to select JSON as the key type.
- This will download a key file to your computer. You will need the JSON key file when connecting the integration; Transcend only supports key files generated in the JSON format.
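Before uploading the file, it can help to sanity-check it. Below is a minimal sketch in Python, assuming the standard fields Google includes in JSON service-account keys; the helper name and file path are illustrative, not part of any Transcend or Google API:

```python
# Sketch: sanity-checking a downloaded service-account key file before
# uploading it to the Transcend connection form. The field names match
# the JSON key format Google generates.
import json

def check_key_file(path: str) -> dict:
    with open(path) as f:
        key = json.load(f)
    # Google's JSON keys carry these fields, among others.
    for field in ("type", "project_id", "private_key", "client_email"):
        if field not in key:
            raise ValueError(f"missing field: {field}")
    if key["type"] != "service_account":
        raise ValueError("not a service-account key")
    return key
```

The `client_email` field is the same address noted earlier for granting project access.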
Grant permissions. Give the newly created service account access to the GCP projects you want scanned or, if using Transcend's BigQuery integration, the BigQuery project you want scanned and classified.
- For GCP roles: create a custom role with the resourcemanager.projects.get, servicemanagement.services.bind, and serviceusage.services.list permissions.
- For BigQuery roles: assign the appropriate predefined roles described below.

To grant access:
- For each project, navigate to the IAM section and select + GRANT ACCESS to add a user for the project.
- Enter the email address of the service account and assign it the appropriate roles for GCP or BigQuery.
- Save the permissions and repeat for each additional project desired.
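The grants these console steps produce can be pictured as data. Below is a minimal sketch of the JSON structures GCP uses for custom role definitions and project IAM policy bindings; the role title and service-account email are placeholders:

```python
# Sketch: a custom role for GCP project scanning, carrying the three
# permissions listed above. The fields mirror a GCP role definition.
scanning_role = {
    "title": "Transcend Scanning",  # hypothetical title
    "includedPermissions": [
        "resourcemanager.projects.get",
        "servicemanagement.services.bind",
        "serviceusage.services.list",
    ],
}

# Sketch: the IAM policy binding created by "+ GRANT ACCESS". GCP
# identifies service accounts with the "serviceAccount:" member prefix.
service_account_email = "transcend-integration@my-project.iam.gserviceaccount.com"
binding = {
    "role": "roles/bigquery.dataViewer",  # or the custom role, for GCP scanning
    "members": [f"serviceAccount:{service_account_email}"],
}
```

This is only a mental model of what the console does on your behalf; the console flow above is all that's required.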
BigQuery has many predefined roles that can be used to fit your needs, such as:
BigQuery Admin — Grants full admin access
BigQuery Data Viewer — Grants read-only access
BigQuery Data Editor — Grants read and write access
BigQuery Data Owner — Grants full access to the data
BigQuery Job User — Grants access to run query jobs
- Navigate to the Roles section, which can be found under IAM & Admin
- Edit all the necessary information (title, description, etc.)
- Add the permissions that you want the account to have. At minimum, these are the permissions required:
BigQuery Job User — To create query jobs with the BigQuery API
BigQuery Data Viewer — To read BigQuery datasets and tables, enabling schema discovery, classification, and access-based privacy requests
Note: Privacy requests that require modifying data will require the BigQuery Data Editor role instead of the BigQuery Data Viewer role, to allow both read and write access.
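The role choice described in the note can be summarized in a small helper. This is a sketch only; the function name is illustrative, while the role IDs are GCP's identifiers for the predefined BigQuery roles:

```python
# Sketch: choosing the minimal predefined BigQuery roles for the service
# account, based on whether privacy requests need to modify data.
def minimal_bigquery_roles(requests_modify_data: bool) -> list:
    roles = ["roles/bigquery.jobUser"]  # always needed to run query jobs
    if requests_modify_data:
        roles.append("roles/bigquery.dataEditor")  # read + write access
    else:
        roles.append("roles/bigquery.dataViewer")  # read-only access
    return roles
```

Keeping to the read-only pair unless modification requests are in scope follows the least-privilege approach recommended earlier.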
To complete authentication for the integration, navigate back to the Transcend dashboard and enter the following fields in the integration connection form:
- Service Account's JSON Key File
- Google Cloud Project ID
- Enter the project ID that contains your BigQuery database
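If you're unsure which project ID to enter, it can be read from the same downloaded key file, since Google embeds it there. A small sketch, with a placeholder path and an illustrative helper name:

```python
# Sketch: reading the "Google Cloud Project ID" form field out of the
# downloaded JSON key file, which carries it in its "project_id" field.
import json

def project_id_from_key(path: str) -> str:
    with open(path) as f:
        return json.load(f)["project_id"]
```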
Once the integration is authenticated, navigate to the Configuration tab and enable the data silo discovery plugin to programmatically discover the GCP resources used across projects in your organization's account. The plugin is specifically looking for data storage systems like databases, data warehouses and object/file storage systems.
Once the scan is complete, select View Data Inventory to review and approve the discovered GCP resources.
The discovered resources are available for review by selecting X Resources Found. From there, review each service to decide if it should be approved as a data silo. Resources can be configured for content classification and privacy requests after they have been approved.
Once a discovered data silo has been approved and added to Data Inventory, it can be configured to further scan the individual resources to identify and classify information stored within. This is particularly valuable for databases and data storage systems, where Content Classification can programmatically discover datapoints, provide classification recommendations, and identify personal data. To enable content classification for a resource, simply navigate to the Configuration tab of the desired data silo and enable the Datapoint Schema Discovery plugin.