Google Firebase Plugins Configuration

Use the Google Cloud Firestore integration with Transcend’s Structured Discovery to automatically identify personal information in your Firestore project and pull it into Transcend as datapoints. Those datapoints are assigned a data category and processing purpose to help you classify internal data. See the full Structured Discovery guide.

The Datapoint Schema Discovery plugin for Cloud Firestore scans your project to identify data and pull it into Transcend as objects and properties. Once discovered, objects and properties can be classified, labeled, and used across Transcend.

  • Traversal:
    • Enumerates databases in the project and lists root collections.
    • Pages through the documents in each collection (default page size 200) up to a per-collection cap, flattens document fields into canonical paths, and recursively traverses subcollections.
  • Canonical path conventions (consistent across integrations):
    • Nested objects/maps: use :: between path segments
    • Arrays: append [] to the field name
  • Subcollections are represented as nested objects in Transcend and discovered recursively.
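
The canonical path conventions above can be illustrated with a minimal sketch. It assumes a document is already loaded as plain Python dicts and lists; the function name canonical_paths and the sample document are hypothetical and are not part of the Transcend plugin.

```python
from typing import Any, Iterator


def canonical_paths(value: Any, prefix: str = "") -> Iterator[str]:
    """Yield one canonical path per leaf field of a Firestore-like document."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Nested objects/maps: join path segments with "::"
            path = f"{prefix}::{key}" if prefix else key
            yield from canonical_paths(child, path)
    elif isinstance(value, list):
        # Arrays: append "[]" to the field name, then describe the elements
        array_path = f"{prefix}[]"
        leaves = [p for item in value for p in canonical_paths(item, array_path)]
        # dict.fromkeys removes duplicate paths while keeping first-seen order;
        # an empty array still reports its own path
        yield from (dict.fromkeys(leaves) if leaves else [array_path])
    else:
        # Scalar leaf: the accumulated prefix is the canonical path
        yield prefix


# Hypothetical sample document
doc = {
    "email": "a@example.com",
    "profile": {"location": {"lat": 1.0, "lng": 2.0}},
    "tags": ["x", "y"],
    "posts": [{"title": "hello"}, {"title": "again"}],
}
paths = list(canonical_paths(doc))
# ['email', 'profile::location::lat', 'profile::location::lng', 'tags[]', 'posts[]::title']
```

Repeated array elements collapse into a single path, so a collection with many posts still yields one posts[]::title property.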
Firebase Datapoint Schema Discovery

To enable the Datapoint Schema Discovery plugin, navigate to the Structured Discovery tab within the Cloud Firestore data system and toggle the plugin on. From there, set the schedule for when the plugin runs to discover new objects and properties as they are added. We recommend scheduling during off-peak hours for large datasets.

Scan configuration for the Firebase integration
| Option | Controls | Default | Valid values | When to change | Performance impact |
| --- | --- | --- | --- | --- | --- |
| databaseRegexFilter | Limit scanned databases by matching the database ID (e.g., "(default)") | Not set (no filtering) | JavaScript/ECMAScript regex string | Narrow scope in large or multi‑env projects | Fewer database enumerations; less traversal |
| databaseRegexFilterOut | Exclude databases that match the regex | Not set (false) | Boolean | Exclude test/dev, etc., with a broad regex | Faster scans; fewer API calls |
| collectionRegexFilter | Limit scanned collection paths (root and subcollections); regex matches path (e.g., /users, /users/{doc}/orders) | Not set (no filtering) | JavaScript/ECMAScript regex string | Focus on critical namespaces or avoid archival/analytics | Fewer listings and document pages |
| collectionRegexFilterOut | Exclude collection paths that match the regex | Not set (false) | Boolean | Exclude logs/events/cache without enumerating includes | Reduces traversal and sampling |
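
As a rough illustration of how an include regex and its matching FilterOut flag might combine, here is a minimal sketch. It assumes the boolean simply inverts the match and that the regex is searched anywhere in the name rather than matched in full; neither detail is confirmed by this page. The helper apply_regex_filter and the collection paths are hypothetical. Python's re syntax is close to, but not identical to, ECMAScript, so test real patterns in a JavaScript regex engine.

```python
import re
from typing import Iterable, List, Optional


def apply_regex_filter(
    names: Iterable[str], pattern: Optional[str], filter_out: bool = False
) -> List[str]:
    """Keep names that match the pattern, or drop them when filter_out is True."""
    if not pattern:
        # "Not set" means no filtering at all
        return list(names)
    compiled = re.compile(pattern)
    # Assumption: a name is kept when (matches) differs from (filter_out)
    return [n for n in names if bool(compiled.search(n)) != filter_out]


# Hypothetical collection paths
collections = ["/users", "/users/{doc}/orders", "/logs", "/events_archive"]

included = apply_regex_filter(collections, r"^/users")
excluded = apply_regex_filter(collections, r"logs|archive", filter_out=True)
unfiltered = apply_regex_filter(collections, None)
```

In this example both calls keep only the /users paths: the first by naming what to include, the second by naming what to exclude.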

Content Classification uses samples of Firestore data to suggest data categories and processing purposes for discovered fields.

The sampling strategy determines how documents are selected for classification within a collection.

| Strategy | Description |
| --- | --- |
| default | Random sampling across documents. |
| newest | Sort by a specified field descending (e.g., createdAt). |
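
The difference between the two strategies can be sketched on an in-memory list of documents. This is only an illustration of the selection logic, not how the plugin queries Firestore; select_sample, its parameters, and the sample data are hypothetical, and skipping documents that lack the sort field is an assumption.

```python
import random
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]


def select_sample(
    docs: List[Document],
    strategy: str = "default",
    sample_size: int = 25,
    newest_sort_field: Optional[str] = None,
    seed: Optional[int] = None,
) -> List[Document]:
    """Pick up to sample_size documents using the named strategy."""
    if strategy == "default":
        # Random sampling across documents, without replacement
        rng = random.Random(seed)
        return rng.sample(docs, min(sample_size, len(docs)))
    if strategy == "newest":
        if not newest_sort_field:
            raise ValueError("the newest strategy needs a sort field")
        # Assumption: documents missing the sort field are skipped
        sortable = [d for d in docs if newest_sort_field in d]
        sortable.sort(key=lambda d: d[newest_sort_field], reverse=True)
        return sortable[:sample_size]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical documents with an integer timestamp in createdAt
docs = [{"id": i, "createdAt": 1_700_000_000 + i} for i in range(100)]

random_sample = select_sample(docs, sample_size=25, seed=7)
newest_sample = select_sample(
    docs, "newest", sample_size=3, newest_sort_field="createdAt"
)
```

Random sampling gives broader coverage of older records, while newest favors recently written documents that reflect the current schema.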

| Option | Controls | Default | Valid values | When to change | Performance impact |
| --- | --- | --- | --- | --- | --- |
| sampleSize | Documents sampled per collection for classification | 25 | Positive integer | Increase for priority collections that need more coverage; decrease to cut time/cost | Higher → more coverage and cost; lower → faster runs |
| pageSize | Documents per API call when listing during sampling | 500 (clamped 1–1000) | 1–1000 | Increase to reduce round‑trips; decrease if timeouts/payload/quota | Higher → fewer requests, larger responses; lower → steadier load |
| maxPages | Maximum pages fetched per collection | Not set (optional cap) | Positive integer | Set to bound runtime/quota during large scans | Lower caps reduce requests; may limit coverage |
| maxDocsTraversed | Cap on total documents traversed across sampling | Typically up to ~25,000 | Positive integer | Use to cap cost/time on large datasets | Directly bounds total work; too low may under‑sample |
| nonNullSampling | Sample only non‑null values for the target field | Disabled (unless enabled) | Boolean | Enable to avoid empty/null values skewing results | Fewer samples; higher signal per request |
| flattenMaxDepth | Max nesting depth to flatten during classification | 5 | Positive integer | Increase to include deeper leaf fields; decrease to curb cardinality | Higher depth increases derived properties/time |
| flattenMaxPathLength | Max characters for a canonical flattened path | 100 | Positive integer | Decrease to curb extremely long keys; increase if legitimate fields are dropped | Shorter limits reduce memory/output |
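
To make the interaction between the paging options concrete, here is a minimal sketch that models the documented defaults and the 1–1000 clamp on pageSize. The class ClassificationOptions and its methods are hypothetical, and the way maxPages and maxDocsTraversed combine in listing_budget is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationOptions:
    sample_size: int = 25                # sampleSize
    page_size: int = 500                 # pageSize
    max_pages: Optional[int] = None      # maxPages (optional cap)
    max_docs_traversed: int = 25_000     # maxDocsTraversed (approximate default)
    flatten_max_depth: int = 5           # flattenMaxDepth
    flatten_max_path_length: int = 100   # flattenMaxPathLength

    def effective_page_size(self) -> int:
        """Clamp pageSize into the documented 1-1000 range."""
        return max(1, min(1000, self.page_size))

    def listing_budget(self) -> int:
        """Upper bound on documents one collection could list (assumed interaction)."""
        budget = self.max_docs_traversed
        if self.max_pages is not None:
            budget = min(budget, self.max_pages * self.effective_page_size())
        return budget


defaults = ClassificationOptions()
oversized = ClassificationOptions(page_size=5000)
capped = ClassificationOptions(page_size=200, max_pages=4)
```

Under these assumptions, setting maxPages to 4 with a pageSize of 200 bounds listing at 800 documents per collection, well below the overall traversal cap.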
  • Canonical path rules (aligned with the MongoDB integration): nested objects use :: (e.g., profile::location::lat); arrays append [] (e.g., tags[], posts[]::title).
  • Note: When using the “newest” strategy, set newestSortField to a sortable timestamp (e.g., createdAt).
  • Start with defaults, then:
    • Use include/exclude filters first to bound scope (databases/collections).
    • If runs are slow or quota‑heavy, lower maxDocumentsPerCollection and pageSize; consider caps on maxCollectionsPerDatabase and maxSubCollectionsPerDocument.
    • For very nested schemas, set flattenMaxDepth (e.g., 3–5) to avoid an explosion of derived properties; optionally cap flattenMaxPathLength.
    • For Content Classification, the defaults (sampleSize=25, pageSize=500 with 1–1000 clamp, flattenMaxDepth=5) balance coverage and performance; increase selectively for priority collections.
  • Permission denied (403/PERMISSION_DENIED) or unauthenticated (401/UNAUTHENTICATED): Ensure the service account has roles/datastore.viewer and the projectId is correct; verify Firestore is enabled for the project.
  • Invalid key JSON: Paste the full service account key JSON and preserve newlines.
  • Quotas (429) or service unavailability (503): The integration automatically backs off and retries. Reduce page sizes, lower sample sizes, or schedule during off-peak hours.
  • Newest strategy: Ensure your newestSortField exists on documents.
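
The back-off-and-retry behavior mentioned above for quota (429) and unavailability (503) errors is handled by the integration itself. For readers writing their own Firestore tooling, a common pattern is exponential backoff with jitter, sketched below. ApiError, call_with_backoff, and the delay values are hypothetical and do not describe Transcend's actual retry policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 503}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on 429/503 with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUS or is_last:
                raise
            # Double the delay ceiling on each attempt, then pick a random wait below it
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")


# Usage: a fake call that is rate limited twice, then succeeds
calls = {"count": 0}
waits = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ApiError(429)
    return "ok"


result = call_with_backoff(flaky, sleep=waits.append)
```

Passing the sleep function as a parameter keeps the example testable without real waiting; errors such as 403 are raised immediately because retrying cannot fix a permissions problem.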

Note: DSR Automation is not yet supported for Cloud Firestore.