API to Write to Data Inventory

This guide runs through the API calls and developer tooling that can be used to push data into your Data Inventory from other sources.

Some useful resources to review before this guide:

  • Check out this article to understand some potential motivations for using this API.
  • Check out this article to understand the tabs and concepts in the Data Inventory like "Data Silos", "Vendors", "Data Categories", "Processing Purposes", "Identifiers", "Data Subjects" and more.
  • Check out this tab to learn more about Developer Tools.

In general, there are 2 strategies for pushing data into the Data Inventory:

  1. Transcend CLI - This command line interface allows you to generate an API key, pull configuration down from Transcend, modify that file locally, then push data back into Transcend. This avoids the need for you to implement pagination or understand the Transcend API shapes to push and pull data. This is the simplest way to push data.
  2. Transcend GraphQL API - Everything that you can do in the Transcend Admin Dashboard can also be done using this API. This allows for building highly specific queries and deeper integrations to push data into Transcend. This is best for internal dashboards or higher-volume read & write activity.

Using the Transcend CLI is a great way to push data into (and pull it out of) your Transcend Data Inventory without needing to write code.

To start:

  1. Install the CLI using npm or yarn.
  2. Generate an API key in the Transcend Dashboard. You will need to assign scopes to the API key. These scopes will vary depending on which resource types you intend to push and pull.
  3. Run the CLI command in your terminal to export your Data Inventory to a yml file, as sketched below.
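
A minimal sketch of these steps, assuming you install the CLI globally (the --resources list here is just an illustration; request only the resource types you need):

npm install --global @transcend-io/cli

tr-pull --auth=$TRANSCEND_API_KEY --resources=dataSilos,dataFlows --file=./transcend.yml
# ...review or edit transcend.yml locally...
tr-push --auth=$TRANSCEND_API_KEY --file=./transcend.yml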

You can see all examples of usage here as well as some example transcend.yml files here.

Some common examples:

If you have multiple Transcend instances (e.g. a Production instance and a Sandbox instance), you can use the CLI to sync configuration between the two accounts. You will need to generate an API key for the source account to pull from (TRANSCEND_API_KEY_SOURCE) and an API key for the destination account (TRANSCEND_API_KEY_DESTINATION):

tr-pull --auth=$TRANSCEND_API_KEY_SOURCE --resources=dataSilos,vendors,identifiers --file=./transcend.yml
tr-push --auth=$TRANSCEND_API_KEY_DESTINATION --file=./transcend.yml

Whatever resources are defined in the transcend.yml file are pushed up by the tr-push command. If necessary, you can modify the transcend.yml file manually or programmatically in between pulling and pushing the configuration across instances. The CLI is designed to generally only upsert data based on unique, human-readable identifiers, which makes it great for syncing data between accounts in a safe manner.

Some resource types have extra arguments that can be explored here. Some examples include:

Auto-classify the Data Flow service if not specified:

tr-pull --auth=$TRANSCEND_API_KEY_SOURCE --resources=dataFlows
tr-push --auth=$TRANSCEND_API_KEY_DESTINATION --resources=dataFlows --classifyService=true

Custom Fields (also known as attributes in our APIs) have an extra flag to delete any attribute values not specified, rather than defaulting to only adding new values to the existing custom field:

tr-pull --auth=$TRANSCEND_API_KEY_SOURCE --resources=attributes
tr-push --auth=$TRANSCEND_API_KEY_DESTINATION --resources=attributes --deleteExtraAttributeValues=true

In some situations, you may wish to define parts of your Transcend configuration in code. You can view a variety of examples in the CLI GitHub repository.

One common example of this would be if you wanted to define some Data Silo metadata, along with its respective DSR Automation configuration, in the codebase:

# yaml-language-server: $schema=https://raw.githubusercontent.com/transcend-io/cli/main/transcend-yml-schema-v4.json

# Manage at: https://app.transcend.io/infrastructure/api-keys
# See https://docs.transcend.io/docs/authentication
# Define API keys that may be shared across data silos
# in the data map. When creating new data silos through the yaml
# cli, it is possible to specify which API key should be associated
# with the newly created data silo.
api-keys:
  - title: Webhook Key
  - title: Analytics Key

# Manage at: https://app.transcend.io/privacy-requests/email-templates
# See https://docs.transcend.io/docs/privacy-requests/configuring-requests/email-templates
# Define email templates here.
templates:
  - title: Your Data Report is Ready
  - title: We Love Data Rights And You Do Too

# Manage at: https://app.transcend.io/privacy-requests/identifiers
# See https://docs.transcend.io/docs/identity-enrichment
# Define enricher or pre-flight check webhooks that will be executed
# prior to privacy request workflows. Some examples may include:
#   - identity enrichment: look up additional identifiers for that user.
#                          i.e. map an email address to a user ID
#   - fraud check: auto-cancel requests if the user is flagged for fraudulent behavior
#   - customer check: auto-cancel request for some custom business criteria
enrichers:
  - title: Basic Identity Enrichment
    description: Enrich an email address to the userId and phone number
    url: https://example.acme.com/transcend-enrichment-webhook
    input-identifier: email
    output-identifiers:
      - userId
      - phone
      - myUniqueIdentifier
  - title: Fraud Check
    description: Ensure the email address is not marked as fraudulent
    url: https://example.acme.com/transcend-fraud-check
    input-identifier: email
    output-identifiers:
      - email
    # Only call the webhook on certain action types
    privacy-actions:
      - ERASURE
  # Note: description is an optional field
  - title: Analytics Enrichment
    url: https://analytics.acme.com/transcend-enrichment-webhook
    input-identifier: userId
    output-identifiers:
      - advertisingId

# Manage at: https://app.transcend.io/privacy-requests/connected-services
# See https://docs.transcend.io/docs/the-data-map#data-silos
# Define the data silos in your data map. A data silo can be a database,
# or a web service that may use a collection of different data stores under the hood.
data-silos:
  # Note: title is the only required top-level field for a data silo
  - title: Redshift Data Warehouse
    description: The mega-warehouse that contains a copy of all SQL-backed databases
    # The webhook URL to notify for data privacy requests
    url: https://example.acme.com/transcend-webhook
    headers:
      - name: test
        value: value
        isSecret: true
      - name: dummy
        value: dummy
        isSecret: false
    integrationName: server
    # The title of the API key that will be used to respond to privacy requests
    api-key-title: Webhook Key
    # Specify which data subjects may have personally-identifiable-information (PII) within this system
    # This field can be omitted, and the default assumption will be that the system may potentially
    # contain PII for any potential data subject type.
    data-subjects:
      - customer
      - employee
      - newsletter-subscriber
      - b2b-contact
    # When this data silo implements a privacy request, these are the identifiers
    # that should be looked up within this system.
    identity-keys:
      - email
      - userId
    # When a data erasure request is being performed, this data silo should not be deleted from
    # until all of the following data silos were deleted first. This list can contain other internal
    # systems defined in this file, as well as any of the SaaS tools connected in your Transcend instance.
    deletion-dependencies:
      - Identity Service
    # The email addresses of the employees within your company that are the go-to individuals
    # for managing this data silo
    owners:
      - alice@transcend.io
    teams:
      - Developer
      - Request Management
    # Datapoints are the different types of data models that exist within your data silo.
    # If the data silo is a database, these would be your tables.
    # Note: These are currently called "datapoints" in the Transcend UI and documentation.
    # See: https://docs.transcend.io/docs/the-data-map#datapoints
    datapoints:
      - title: Webhook Notification
        key: _global
        # The types of privacy actions that this webhook can implement
        # See "RequestActionObjectResolver": https://github.com/transcend-io/privacy-types/blob/main/src/actions.ts
        #   AUTOMATED_DECISION_MAKING_OPT_OUT: Opt out of automated decision making
        #   CONTACT_OPT_OUT: Opt out of all communication
        #   SALE_OPT_OUT: Opt-out of the sale of personal data
        #   TRACKING_OPT_OUT: Opt out of tracking
        #   ACCESS: Data Download request
        #   ERASURE: Erase the profile from the system
        #   ACCOUNT_DELETION: Run an account deletion instead of a fully compliant deletion
        #   RECTIFICATION: Make an update to an inaccurate record
        #   RESTRICTION: Restrict processing
        privacy-actions:
          - ACCESS
          - ERASURE
          - SALE_OPT_OUT
      - title: User Model
        description: The centralized user model
        # AKA table name
        key: users
        owners:
          - test@transcend.io
        teams:
          - Platform
          - Customer Experience
        # The types of privacy actions that this datapoint can implement
        # See "InternalDataSiloObjectResolver": https://github.com/transcend-io/privacy-types/blob/main/src/actions.ts
        #   ACCESS: Data Download request
        privacy-actions:
          - ACCESS
        # Provide field-level metadata for this datapoint.
        fields:
          - key: firstName
            title: First Name
            description: The first name of the user, inputted during onboarding
            # The category of personal data for this datapoint
            # See: https://github.com/transcend-io/privacy-types/blob/main/src/objects.ts
            #   FINANCIAL: Financial information
            #   HEALTH: Health information
            #   CONTACT: Contact information
            #   LOCATION: Geo-location information
            #   DEMOGRAPHIC: Demographic Information
            #   ID: Identifiers that uniquely identify a person
            #   ONLINE_ACTIVITY: The user's online activities on the first party website/app or other websites/apps
            #   USER_PROFILE: The user’s profile on the first-party website/app and its contents
            #   SOCIAL_MEDIA: User profile and data from a social media website/app or other third party service
            #   CONNECTION: Connection information for the current browsing session, e.g. device IDs, MAC addresses, IP addresses, etc.
            #   TRACKING: Cookies and tracking elements
            #   DEVICE: Computer or device information
            #   SURVEY: Any data that is collected through surveys
            #   OTHER: A specific type of information not covered by the above categories
            #   UNSPECIFIED: The type of information is not explicitly stated or unclear
            categories:
              - category: USER_PROFILE
                name: Name
            # What is the purpose of processing for this datapoint/table?
            # See: https://github.com/transcend-io/privacy-types/blob/main/src/objects.ts
            #   ESSENTIAL: Provide a service that the user explicitly requests and that is part of the product's basic service or functionality
            #   ADDITIONAL_FUNCTIONALITY: Provide a service that the user explicitly requests but that is not a necessary part of the product's basic service
            #   ADVERTISING: To show ads that are either targeted to the specific user or not targeted
            #   MARKETING: To contact the user to offer products, services, or other promotions
            #   ANALYTICS: For understanding the product’s audience, improving the product, inform company strategy, or general research
            #   PERSONALIZATION: For providing user with a personalized experience
            #   OPERATION_SECURITY: For product operation and security, enforcement of terms of service, fraud prevention, protecting users and property, etc.
            #   LEGAL: For compliance with legal obligations
            #   TRANSFER: For data that was transferred as part of a change in circumstance (e.g. a merger or acquisition)
            #   SALE: For selling the data to third parties
            #   HR: For personnel training, recruitment, payroll, management, etc.
            #   OTHER: Other specific purpose not covered above
            #   UNSPECIFIED: The purpose is not explicitly stated or is unclear
            purposes:
              - purpose: PERSONALIZATION
                name: Other
          - key: email
            title: Email
            description: The email address of the user
            categories:
              - category: CONTACT
                name: Email
              - category: USER_PROFILE
                name: Name
            purposes:
              - purpose: ESSENTIAL
                name: Login
              - purpose: OPERATION_SECURITY
                name: Other
      - title: Demo Request
        key: demos
        description: A demo request by someone that does not have a formal account
        privacy-actions:
          - ACCESS
        fields:
          - key: companyName
            title: Company Name
            description: The name of the company of the person requesting the demo
            categories:
              - category: CONTACT
                name: Other
            purposes:
              - purpose: ESSENTIAL
                name: Demo
          - key: email
            title: Email
            description: The email address to use for coordinating the demo
            categories:
              - category: CONTACT
                name: Email
            purposes:
              - purpose: ESSENTIAL
                name: Demo
      - title: Password
        key: passwords
        description: A password for a user
        fields:
          - key: hash
            title: Password Hash
            description: Hash of the password
            categories:
              - category: OTHER
                name: Password
            purposes:
              - purpose: OPERATION_SECURITY
                name: Other
          - key: email
            title: Email
            description: The email address to use for coordinating the demo
            categories:
              - category: CONTACT
                name: Email
            purposes:
              - purpose: OPERATION_SECURITY
                name: Other

  # Define web services in addition to, or instead of, databases.
  # The examples below define a subset of the fields.
  # The remainder of the fields will be managed through the Transcend admin dashboard
  # at the link: https://app.transcend.io/privacy-requests/connected-services
  - title: Identity Service
    description: Micro-service that stores user metadata
    integrationName: server
    # Share the API key between services
    api-key-title: Webhook Key
    teams:
      - Developer
    datapoints:
      - title: Webhook Notification
        key: _global
        privacy-actions:
          - ACCESS
          - CONTACT_OPT_OUT
      - title: Auth User
        description: The centralized authentication object for a user
        key: auth
        privacy-actions:
          - ACCESS
  - title: Analytics Service
    integrationName: server
    data-subjects:
      - customer
    # The title of the API key that will be used to respond to privacy requests
    api-key-title: Analytics Key
    # Specify this flag if the data silo is under development and should not be included
    # in production privacy request workflows. Will still sync metadata to app.transcend.io.
    disabled: true

Since the CLI performs upserts, you only need to define the configuration you want to control from your codebase in this file; any fields you prefer to manage in the Admin Dashboard can simply be left out.
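
For example, a transcend.yml that only pins the owners of a single data silo, leaving every other field to be managed in the Admin Dashboard, could be as small as this sketch (title is the only required field):

data-silos:
  - title: Redshift Data Warehouse
    owners:
      - alice@transcend.io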

You can then set up a CI job that syncs this file into your Transcend instance after changes go through code review and merge to your base development branch.

Below is an example of how to set this up using a GitHub action:

name: Transcend Data Map Syncing
# See https://app.transcend.io/privacy-requests/connected-services

on:
  push:
    branches:
      - 'main'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v2
        with:
          node-version: '16'

      - name: Install Transcend cli
        run: npm i -D @transcend-io/cli

      # If you have a script that generates your transcend.yml file from
      # an ORM or infrastructure configuration, add that step here
      # Leave this step commented out if you want to manage your transcend.yml manually
      # - name: Generate transcend.yml
      #   run: ./scripts/generate_transcend_yml.py

      - name: Push Transcend config
        run: npx tr-push --auth=${{ secrets.TRANSCEND_API_KEY }}

If you want to sync the same configuration into multiple Transcend instances, you may find the need to make minor modifications to your configuration between environments. The most notable difference is often the domain your webhook URLs are hosted on, which may vary between staging and production.

The tr-push command takes a --variables parameter: a CSV of key:value pairs.

tr-push --auth=$TRANSCEND_API_KEY --variables=domain:acme.com,stage:staging

This command would insert the variables into your YAML file wherever you use the <<parameters.param_name>> syntax.

api-keys:
  - title: Webhook Key
enrichers:
  - title: Basic Identity Enrichment
    description: Enrich an email address to the userId and phone number
    # The data silo webhook URL is the same in each environment,
    # except for the base domain in the webhook URL.
    url: https://example.<<parameters.domain>>/transcend-enrichment-webhook
    input-identifier: email
    output-identifiers:
    - userId
    - phone
    - myUniqueIdentifier
  - title: Fraud Check
    description: Ensure the email address is not marked as fraudulent
    url: https://example.<<parameters.domain>>/transcend-fraud-check
    input-identifier: email
    output-identifiers:
      - email
    privacy-actions:
      - ERASURE
data-silos:
  - title: Redshift Data Warehouse
    integrationName: server
    description: The mega-warehouse that contains a copy of all SQL-backed databases - <<parameters.stage>>
    url: https://example.<<parameters.domain>>/transcend-webhook
    api-key-title: Webhook Key

Oftentimes, your development team may define database schemas in code using application-specific tools called ORMs. Common JavaScript examples include Sequelize and Prisma; Python examples include SQLAlchemy and Alembic. You may even have your own general abstraction layer over your database definitions, such that you can read metadata about your database from your application code.

Database ORM Extraction

To achieve this, you will likely need to write a custom script that can extract the necessary information from your codebase. Below are some examples in TypeScript; you can likely copy these scripts into ChatGPT and provide further prompting to create a script that is custom to your use case and in the language of your choice.

A simple example of this in JavaScript may look like:

import { readFileSync, writeFileSync } from 'fs';
import yaml from 'js-yaml';
import { join } from 'path';

import type { TranscendInput } from '@transcend-io/cli';

import { MAIN_ROOT } from '@main/node-utils';

import { MODELS } from './models';

/**
 * Read in the contents of a yaml file and validate that the shape
 * of the yaml file matches the codec API
 *
 * This function can also be imported from `@transcend-io/cli` with extra runtime type validation.
 *
 * @param filePath - Path to yaml file
 * @returns The contents of the yaml file, type-checked
 */
function readTranscendYaml(filePath: string): TranscendInput {
  // Read in contents
  const fileContents = readFileSync(filePath, 'utf-8');

  // Cast to the expected shape (the @transcend-io/cli helper adds runtime validation)
  return yaml.load(fileContents) as TranscendInput;
}

/**
 * Write a Transcend configuration to disk
 *
 * This function can also be imported from `@transcend-io/cli` with extra runtime type validation.
 *
 * @param filePath - Path to yaml file
 * @param input - The input to write out
 */
function writeTranscendYaml(filePath: string, input: TranscendInput): void {
  writeFileSync(filePath, yaml.dump(input));
}

/**
 * This function generates our `transcend.yml` that is then synced into the Transcend Data Inventory
 *
 * @see https://app.transcend.io/data-map/data-inventory/data-silos
 *
 * The function extracts metadata about our database tables and columns, reads in the transcend.yml file
 * then writes the updated metadata back to the transcend.yml.
 *
 * This file is then pushed up using the `tr-push` command
 * @see https://github.com/transcend-io/cli?tab=readme-ov-file#tr-push
 */
async function generateTranscendYml() {
  const transcendYml = join(MAIN_ROOT, 'transcend.yml');

  // Read in the existing transcend.yml
  const transcendYmlDefinition = readTranscendYaml(transcendYml);
  const dataSilos = transcendYmlDefinition['data-silos'] || [];
  const titles = dataSilos.map(({ title }) => title);
  const postgresDbIndex = titles.indexOf('Postgres RDS Database');
  if (postgresDbIndex < 0) {
    throw new Error(`Failed to find database definitions in transcend.yml`);
  }
  const postgres = dataSilos[postgresDbIndex];
  postgres.disabled = true;

  postgres.datapoints = Object.entries(MODELS).map(([modelName, model]) => ({
    title: modelName,
    key: model.key,
    description: model.comment,
    fields: Object.entries(model.schema).map(([fieldName, field]) => ({
      key: fieldName,
      title: field.displayName,
    })),
  }));
  console.log(`Found ${postgres.datapoints.length} datapoints in Postgres`);

  // Write out transcend.yml file
  writeTranscendYaml(transcendYml, transcendYmlDefinition);
}

generateTranscendYml();

A more complex example may involve regular expression parsing of the contents of a file, using external libraries like doctrine to parse documentation formats, or mapping multiple databases into your transcend.yml file.

An example of this includes:

import doctrine from 'doctrine';
import type { Schema } from 'dynamoose/dist/Schema';
import { existsSync, readFileSync, writeFileSync } from 'fs';
import glob from 'glob';
import yaml from 'js-yaml';
import difference from 'lodash/difference';
import { join } from 'path';

import type { TranscendInput } from '@transcend-io/cli';

import { DYNAMO_MODEL_DEFINITIONS } from './models';
import { MIGRATION_MODEL_MAP } from '@main/migration-helpers';
import { MAIN_ROOT, readFile } from '@main/node-utils';
import { SEEDERS_MODEL_MAP } from '@main/seeders';

const POSTGRES_DATABASE_MODELS = {
  ...SEEDERS_MODEL_MAP,
  ...MIGRATION_MODEL_MAP,
};

const GET_CLASS_COMMENT = /(\/\*\*[\s\S]*\*\/)\nexport class/;

/**
 * Index of database metadata
 */
type IndexedDatabaseModels = {
  [modelName in string]: {
    /** Name of package where model lives */
    packageName: string;
    /** Path to definition file */
    definitionFilePath: string;
    /** Path to model definition */
    modelFilePath: string;
    /** Database table description */
    description: string;
    /** Dynamo model definition */
    dynamoModel?: unknown;
    /** Postgres model definition */
    postgresModel?: unknown;
  };
};

/**
 * Read in the contents of a yaml file and validate that the shape
 * of the yaml file matches the codec API
 *
 * This function can also be imported from `@transcend-io/cli` with extra runtime type validation.
 *
 * @param filePath - Path to yaml file
 * @returns The contents of the yaml file, type-checked
 */
function readTranscendYaml(filePath: string): TranscendInput {
  // Read in contents
  const fileContents = readFileSync(filePath, 'utf-8');

  // Cast to the expected shape (the @transcend-io/cli helper adds runtime validation)
  return yaml.load(fileContents) as TranscendInput;
}

/**
 * Write a Transcend configuration to disk
 *
 * This function can also be imported from `@transcend-io/cli` with extra runtime type validation.
 *
 * @param filePath - Path to yaml file
 * @param input - The input to write out
 */
function writeTranscendYaml(filePath: string, input: TranscendInput): void {
  writeFileSync(filePath, yaml.dump(input));
}

/**
 * Index all database models found in the codebase
 * Assumes that each database definition is defined in a file
 * named definition.ts and file named <ModelName>.ts
 * Postgres models using wildebeest should also have a Base.ts file
 *
 * @returns The map of model name to metadata about that model
 */
function indexDatabaseModels(): IndexedDatabaseModels {
  // Find all database model definitions
  const definitionFiles = glob.sync(
    `${MAIN_ROOT}/backend-support/**/definition.ts`,
  );
  return definitionFiles.reduce((acc, fileName) => {
    const splitFile = fileName.split('/');
    const srcIndex = splitFile.indexOf('src');
    if (srcIndex < 1) {
      throw new Error(`Could not identify package for model: ${fileName}`);
    }
    const packageName = splitFile[srcIndex - 1];
    const modelName = splitFile[splitFile.length - 2];

    // Grab model definitions
    const postgresModel =
      POSTGRES_DATABASE_MODELS[
        modelName as keyof typeof POSTGRES_DATABASE_MODELS
      ];
    const dynamoModel =
      DYNAMO_MODEL_DEFINITIONS[
        modelName as keyof typeof DYNAMO_MODEL_DEFINITIONS
      ];
    if (postgresModel && dynamoModel) {
      throw new Error(
        `Found a postgres and dynamodb model with the same name -- please make globally unique: ${modelName}`,
      );
    }

    const modelDefinition = fileName.replace(
      '/definition.ts',
      `/${modelName}.ts`,
    );
    let description = '';
    if (!existsSync(modelDefinition)) {
      console.error(`Expected a model to be defined at: ${modelDefinition}`);
    } else {
      const fileContents = readFile(modelDefinition);
      const comment = GET_CLASS_COMMENT.exec(fileContents);
      if (!comment) {
        throw new Error(
          `Failed to extract class comment in file: ${modelDefinition}`,
        );
      }
      const ast = doctrine.parse(
        [`/**${comment[1].split('/**').pop()}`].join('\n'),
        {
          unwrap: true,
        },
      );
      description = ast.description;
    }

    if (postgresModel && !postgresModel.definition) {
      const detectedGql =
        (postgresModel as any).fields && (postgresModel as any).comment;
      throw new Error(
        `Model ${modelName} missing definition file, ${
          detectedGql
            ? 'but detected GQL schema props.'
            : 'did you import the correct file?'
        } Ensure you are importing the Wildebeest model into model maps and not the GQL types${
          detectedGql ? '' : ' or dynamoDb models'
        }`,
      );
    }

    return Object.assign(acc, {
      [modelName]: {
        packageName,
        definitionFilePath: fileName,
        modelFilePath: modelDefinition,
        dynamoModel,
        description,
        postgresModel,
      },
    });
  }, {} as IndexedDatabaseModels);
}

/**
 * This function generates our `transcend.yml` that is then synced into the Transcend Data Inventory
 *
 * @see https://app.transcend.io/data-map/data-inventory/data-silos
 *
 * The function extracts metadata about our database tables and columns, reads in the transcend.yml file
 * then writes the updated metadata back to the transcend.yml.
 *
 * This file is then pushed up using the `tr-push` command
 * @see https://github.com/transcend-io/cli?tab=readme-ov-file#tr-push
 */
async function generateTranscendYml() {
  const transcendYml = join(MAIN_ROOT, 'transcend.yml');

  // Mapping of modelName -> metadata about that model
  const modelNameToBaseLocation = indexDatabaseModels();

  // Ensure we found definition.ts files for each model
  const allModelNames = [
    ...Object.keys(DYNAMO_MODEL_DEFINITIONS),
    ...Object.keys(POSTGRES_DATABASE_MODELS),
  ];
  const missingModels = difference(
    allModelNames,
    Object.keys(modelNameToBaseLocation),
  );
  const extraModels = difference(
    Object.keys(modelNameToBaseLocation),
    allModelNames,
  );
  if (extraModels.length > 0 || missingModels.length > 0) {
    console.log({
      missingModels,
      extraModels,
    });
    throw new Error(
      `Database models are not all accounted for in the generate_mixins file!`,
    );
  }

  // Read in the existing transcend.yml
  const transcendYmlDefinition = readTranscendYaml(transcendYml);
  const dataSilos = transcendYmlDefinition['data-silos'] || [];
  const titles = dataSilos.map(({ title }) => title);
  const postgresDbIndex = titles.indexOf('Postgres RDS Database');
  const dynamoDbIndex = titles.indexOf('Dynamo DB Tables');
  if (postgresDbIndex < 0 || dynamoDbIndex < 0) {
    throw new Error(`Failed to find database definitions in transcend.yml`);
  }
  const postgres = dataSilos[postgresDbIndex];
  postgres.disabled = true;
  const dynamo = dataSilos[dynamoDbIndex];
  dynamo.disabled = true;

  postgres.datapoints = Object.keys(POSTGRES_DATABASE_MODELS).map(
    (modelName) => {
      try {
        const { packageName, description, postgresModel } =
          modelNameToBaseLocation[modelName];

        if (!postgresModel) {
          throw new Error(`Could not find postgres model: ${modelName}`);
        }
        const schema = Object.keys({
          ...(postgresModel.definition.attributes || {}),
          ...Object.entries(
            (postgresModel.definition.associations || {}).belongsTo || {},
          ).reduce(
            (acc, [key, value]) =>
              Object.assign(acc, {
                [typeof value?.foreignKey === 'string'
                  ? value.foreignKey
                  : value?.foreignKey?.name || `${key}Id`]: true,
              }),
            {},
          ),
        });
        return {
          title: modelName,
          key: `${packageName}.${modelName}`,
          description,
          fields:
            schema.length > 0
              ? schema.map((attribute) => ({
                  key: attribute,
                  title: attribute,
                }))
              : undefined,
        };
      } catch (e) {
        console.error(
          `Error occurred while generating mixins for ${modelName}`,
        );
        throw e;
      }
    },
  );
  console.log(`Found ${postgres.datapoints.length} datapoints in Postgres`);

  dynamo.datapoints = Object.keys(DYNAMO_MODEL_DEFINITIONS).map((modelName) => {
    const { packageName, description, dynamoModel } =
      modelNameToBaseLocation[modelName];

    if (!dynamoModel) {
      throw new Error(`Could not find dynamo model: ${modelName}`);
    }

    const schema = Object.keys((dynamoModel as Schema).schemaObject);
    const fields =
      schema.length > 0
        ? schema.map((fieldName) => ({
            key: fieldName,
            title: fieldName,
          }))
        : undefined;

    return {
      title: modelName,
      key: `${packageName}.${modelName}`,
      description,
      fields,
    };
  });
  console.log(`Found ${dynamo.datapoints.length} datapoints in Dynamo`);

  // Write out transcend.yml file
  writeTranscendYaml(transcendYml, transcendYmlDefinition);
}

generateTranscendYml();

Once you've written your script, you can pull this all together with a CI job that runs whenever code is merged into the base branch:

name: Transcend Data Map Syncing
# See https://app.transcend.io/privacy-requests/connected-services

on:
  push:
    branches:
      - 'main'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v2
        with:
          node-version: '16'

      - name: Install Transcend cli
        run: npm i -D @transcend-io/cli

      # Generate transcend.yml
      - name: Generate transcend.yml
        run: ./scripts/generate_transcend_yml.py

      - name: Push Transcend config
        run: npx tr-push --auth=${{ secrets.TRANSCEND_API_KEY }}

Another common example you may encounter is the desire to programmatically tag data in your Data Inventory. You may want to do this by:

  1. Writing some code that transforms your transcend.yml file using the desired auto-tagging rules
  2. Defining a configuration file (e.g. JSON or YML) that encodes your rules, with a custom script that combines your configuration with your transcend.yml configuration.

Step 1) Pull down your transcend.yml for the data silos that you want to manage.

yarn tr-pull --auth=$TRANSCEND_API_KEY --integrationNames=salesforce,googleBigQuery

This produces a transcend.yml like this:

data-silos:
  - title: Postgres RDS Database
    integrationName: server
    description: Main database that powers the backend monolith
    datapoints:
      - title: Team
        key: access-control-models.team
        description: A team of users sharing the same scopes
        fields:
          - key: description
            description: The description of the team
            title: description
          - key: name
            description: The name of the team
            title: name
      - title: User
        key: access-control-models.user
        description: A user to the admin dashboard
        fields:
          - key: createdAt
            description: null
            purposes: []
            access-request-visibility-enabled: true
            erasure-request-redaction-enabled: true
            categories: []
          - key: email
            description: null
            purposes: []
            access-request-visibility-enabled: false
            erasure-request-redaction-enabled: true
            categories:
              - name: Email
                category: CONTACT
          - key: isAdmin
            title: isAdmin
          - key: isInviteSent
            title: isInviteSent
          - key: email
            title: email
            description: The email of the user
          - key: name
            title: name
          - key: onboarded
            title: onboarded

Step 2) Define a separate configuration file containing your rules

# rules for automatically tagging data points
# with specific data categories if the datapoint name
# follows a certain format
- datapoint-data-categories:
   - column_name: email
     category_name: email
     category: CONTACT

# rules for automatically tagging data subject
# whenever a datapoint is tagged with a specific data category
- datapoint-data-subjects:
   - category: CONTACT
     category_name: Government Email
     data_subject: Government User

Step 3) Write the custom script to update your transcend.yml configuration according to these rules. Here is a brief prompt asking ChatGPT to provide such a script.
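
Below is a minimal sketch of what such a script could look like in TypeScript. It assumes the rules shown above live in a file named tagging-rules.yml alongside your transcend.yml, and it only applies the datapoint-data-categories rules; the file names and structure are illustrative, so adapt them to your own setup.

import { readFileSync, writeFileSync } from 'fs';
import yaml from 'js-yaml';

// Hypothetical file locations -- adjust to match your repository layout
const TRANSCEND_YML = './transcend.yml';
const RULES_YML = './tagging-rules.yml';

/** Shape of a datapoint-data-categories rule from the rules file above */
interface CategoryRule {
  column_name: string;
  category_name: string;
  category: string;
}

function applyTaggingRules(): void {
  const transcend = yaml.load(readFileSync(TRANSCEND_YML, 'utf-8')) as any;
  const rules = yaml.load(readFileSync(RULES_YML, 'utf-8')) as any[];

  // Collect all datapoint-data-categories rules from the rules file
  const categoryRules: CategoryRule[] = rules.flatMap(
    (rule) => rule['datapoint-data-categories'] || [],
  );

  for (const silo of transcend['data-silos'] || []) {
    for (const datapoint of silo.datapoints || []) {
      for (const field of datapoint.fields || []) {
        const matches = categoryRules.filter((rule) => rule.column_name === field.key);
        if (matches.length === 0) continue;
        field.categories = field.categories || [];
        for (const match of matches) {
          // Only add the category if it is not already tagged on the field
          const alreadyTagged = field.categories.some(
            (existing: any) =>
              existing.category === match.category && existing.name === match.category_name,
          );
          if (!alreadyTagged) {
            field.categories.push({ name: match.category_name, category: match.category });
          }
        }
      }
    }
  }

  // Write the updated configuration back out, ready for tr-push
  writeFileSync(TRANSCEND_YML, yaml.dump(transcend));
}

applyTaggingRules();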

Step 4) You could then set up this script to run automatically on a schedule:

name: Transcend Data Inventory Auto Tagging
# See https://app.transcend.io/privacy-requests/connected-services

on:
  # Run daily
  schedule:
    - cron: '0 16 * * *'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v2
        with:
          node-version: '16'

      - name: Install Transcend cli
        run: npm i -D @transcend-io/cli

      - name: Pull Transcend config
        run: npx tr-pull --auth=${{ secrets.TRANSCEND_API_KEY }}

      - name: Generate transcend.yml
        run: ./scripts/generate_transcend_yml.py

      - name: Push Transcend config
        run: npx tr-push --auth=${{ secrets.TRANSCEND_API_KEY }}

The Transcend GraphQL API is the most flexible way to push data into Transcend. This is the same API that Transcend's Admin Dashboard uses, so any query or filter you can make in the Transcend Dashboard is also available in this API.

To start:

  1. Generate an API key in the Transcend Dashboard. You will need to assign scopes to the API key. These scopes will vary depending on which resource types you intend to push.
  2. Call the GraphQL API, ideally through a GraphQL client in the language of your choice, to push data into your Data Inventory.

First, go into the Transcend Admin Dashboard to create an API key.

The scopes that you will grant to the API key will depend on the routes you plan to use. You can visualize the relationship between Scopes and GraphQL Routes on the Scopes tab of the Admin Dashboard.

View Scopes and GraphQL routes relationship

GraphQL APIs have a lot of great open source development tooling built around them. You can call the GraphQL API with any standard request tooling; however, you may find it nicer to use a GraphQL client package in the language of your choice. For example, see graphql-request for JavaScript.

You would instantiate your GraphQL client, providing the API key as a bearer token:

import { GraphQLClient } from 'graphql-request';

const transcendUrl = 'https://api.transcend.io'; // for EU hosting
// const transcendUrl = 'https://api.us.transcend.io'; // for US hosting

const client = new GraphQLClient(`${transcendUrl}/graphql`, {
  headers: {
    Authorization: `Bearer ${process.env.TRANSCEND_API_KEY}`,
  },
});

Once you've created your GraphQL client, the next step is to construct the GraphQL mutation that pushes the data you want to write. The best way to do this is to use the GraphQL Playground to construct and test your payload. You will first want to make sure that you are logged in to the Transcend Admin Dashboard and, if you have multiple Transcend accounts, that you have switched into the correct account.

You can then navigate to either the Default EU Hosted GraphQL Playground or the US Hosted GraphQL Playground.

On the right hand side of the screen, you can open up the "Docs" tab to view all of the GraphQL routes available, as well as all of the input/output document types.

View GraphQL Playground

You should use the GraphQL Playground to get the most up-to-date set of parameters for each Data Inventory route. GraphQL gives you the ability to specify exactly which fields you want to pull; by requesting only the fields you need, you will get faster response times.

To understand the useful API calls for pulling data and IDs from the Transcend Data Inventory, check out this article to see some example GraphQL calls for reading data from the Data Inventory.

Some good starting points for GraphQL mutations include:

mutation DataInventoryGraphQLExampleUpdateDataSilos {
  updateDataSilos(
    input: {
      dataSilos: [
        { id: "2b9396b0-7061-4aed-9c70-44fd8cb50d2f", description: "My Bucket" }
      ]
    }
  ) {
    clientMutationId
    dataSilos {
      id
      title
      description
    }
  }
}

This may produce an output like

{
  "data": {
    "updateDataSilos": {
      "clientMutationId": null,
      "dataSilos": [
        {
          "id": "2b9396b0-7061-4aed-9c70-44fd8cb50d2f",
          "title": "Amazon.com",
          "description": "My Bucket"
        }
      ]
    }
  }
}

mutation DataInventoryGraphQLExampleUpdateVendors {
  updateVendors(
    input: {
      vendors: [
        {
          id: "fbe81d3d-e2f5-4d28-bf4b-11884328c724"
          description: "My new description"
        }
      ]
    }
  ) {
    clientMutationId
    vendors {
      id
      title
      description
    }
  }
}

This may produce an output like

{
  "data": {
    "updateVendors": {
      "clientMutationId": null,
      "vendors": [
        {
          "id": "fbe81d3d-e2f5-4d28-bf4b-11884328c724",
          "title": "15Five, Inc.",
          "description": "My new description"
        }
      ]
    }
  }
}

mutation DataInventoryGraphQLExampleUpdateDataCategories {
  updateDataSubCategories(
    input: {
      dataSubCategories: [
        {
          id: "e0945fda-6c34-4766-80ef-b454ad77b728"
          teamNames: ["Developers"]
        }
      ]
    }
  ) {
    clientMutationId
    dataSubCategories {
      id
      name
      description
    }
  }
}

This may produce an output like:

{
  "data": {
    "updateDataSubCategories": {
      "clientMutationId": null,
      "dataSubCategories": [
        {
          "id": "e0945fda-6c34-4766-80ef-b454ad77b728",
          "name": "",
          "description": "Generic health information"
        }
      ]
    }
  }
}

mutation DataInventoryGraphQLExampleUpdateProcessingPurposes {
  updateProcessingPurposeSubCategories(
    input: {
      processingPurposeSubCategories: [
        { id: "4586c6ee-9142-4d4b-9851-aaafbd56ba9d", description: "test" }
      ]
    }
  ) {
    clientMutationId
    processingPurposeSubCategories {
      id
      name
      description
    }
  }
}

This may produce an output like:

{
  "data": {
    "updateProcessingPurposeSubCategories": {
      "clientMutationId": null,
      "processingPurposeSubCategories": [
        {
          "id": "4586c6ee-9142-4d4b-9851-aaafbd56ba9d",
          "name": "",
          "description": "test"
        }
      ]
    }
  }
}

mutation DataInventoryGraphQLExampleUpdateBusinessEntities {
  updateBusinessEntities(
    input: {
      businessEntities: [
        { id: "88b067a8-cc02-483e-a44b-8caeafb41c5e", description: "test" }
      ]
    }
  ) {
    clientMutationId
    businessEntities {
      id
      title
      description
    }
  }
}

This may produce an output like:

{
  "data": {
    "updateBusinessEntities": {
      "clientMutationId": null,
      "businessEntities": [
        {
          "id": "88b067a8-cc02-483e-a44b-8caeafb41c5e",
          "title": "Acme",
          "description": "test"
        }
      ]
    }
  }
}

mutation DataInventoryGraphQLExampleUpdateIdentifier {
  updateIdentifier(
    input: { id: "717284bc-1b17-4543-bc32-0d64aafe12d6", placeholder: "test" }
  ) {
    clientMutationId
    identifier {
      id
      placeholder
    }
  }
}

This may return a result like:

{
  "data": {
    "updateIdentifier": {
      "clientMutationId": null,
      "identifier": {
        "id": "717284bc-1b17-4543-bc32-0d64aafe12d6",
        "placeholder": "test"
      }
    }
  }
}

mutation DataInventoryGraphQLExampleUpdateDataSubject {
  updateSubject(
    input: { id: "2b882d1e-430a-4b37-a6e8-067e18b101d6", title: "New Customer" }
  ) {
    clientMutationId
    subject {
      id
      title {
        defaultMessage
      }
    }
  }
}

This may return a result like:

{
  "data": {
    "updateSubject": {
      "clientMutationId": null,
      "subject": {
        "id": "2b882d1e-430a-4b37-a6e8-067e18b101d6",
        "title": {
          "defaultMessage": "New Customer"
        }
      }
    }
  }
}

Once you have your GraphQL query constructed, you will want to define it as a constant in your code. For example:

import { gql } from 'graphql-request';

const UPDATE_DATA_SILOS = gql`
  mutation DataInventoryGraphQLExampleUpdateDataSilos(
    $input: UpdateDataSilosInput!
  ) {
    updateDataSilos(input: $input) {
      clientMutationId
      dataSilos {
        id
        title
        description
      }
    }
  }
`;

You will then want to use the GraphQL client you instantiated above to make this request:

const paged = await client.request(UPDATE_DATA_SILOS, {
  input: {
    dataSilos: [
      {
        id: '2b9396b0-7061-4aed-9c70-44fd8cb50d2f',
        description: 'My Bucket',
      },
    ],
  },
});
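
Assuming the request succeeds, the resolved value mirrors the selection set of the mutation, so you can read the updated records straight off the response:

// `paged` mirrors the fields selected in UPDATE_DATA_SILOS
console.log(paged.updateDataSilos.dataSilos);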