Guide to Triaging & Classifying Cookies and Data Flows

The first step to implementing a consent manager on your site is to understand what data is being sent from your site and where it’s going. Transcend does this by collecting data flows, cookies, network requests, etc. from your site and regulating them using assigned tracking purposes. This guide discusses how to review and classify data flows discovered through telemetry to ensure all data flows have tracking purposes and are approved for regulation. For more information about the data flows and cookies in Transcend, check out our overview on Data Flows.

If you’ve added Transcend’s airgap.js script to your website, telemetry data is automatically collected from outgoing network requests and used to populate the Data Flow and Cookie triage views in the Transcend Consent Management dashboard. These data flows and cookies are discovered by the airgap.js script when they are encountered on your site by a website user. The more activity you get across all the pages of your site, the more telemetry data will be available for Triage. In this way, you can automatically collect all potential sources of data sharing and tracking that happen on your site.

In order to respect a website user’s consent preferences, it’s important to know the consent tracking purposes for each data flow/cookie on your site. For example, if a user on your website elects to “opt-out of the sale of personal data”, Transcend Consent Management needs to know which data flows are collecting data for sale in order to effectively regulate those network requests. This is accomplished by classifying each data flow and cookie with tracking purposes in the Transcend Cookie & Data Flow Dashboards. The next sections go over how data flows/cookies are classified and how to approve them for regulation by the consent manager.

Cookies and Data Flows that appear in the Triage tabs in the Admin Dashboard are those that have been discovered by the airgap.js script, but still require review before they are regulated by the consent manager on your site. These data flows are not yet included in your “live” airgap.js bundles, meaning that they are not regulated by a user’s consent choices just yet (outside of your Unknown Request Policy). At this point, the information has been gathered, but is not actionable until reviewed and approved.
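
As a conceptual illustration of why classification and approval matter, the sketch below models the decision for a single outgoing request. It is not Transcend's implementation: the DataFlow type, the is_request_allowed function, the purpose label "sale_of_info", and the rule that every assigned purpose must be consented to are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping


@dataclass(frozen=True)
class DataFlow:
    """Hypothetical record for one classified data flow."""
    host: str
    purposes: FrozenSet[str]  # tracking purposes assigned during triage
    approved: bool            # False while the flow is still in Triage


def is_request_allowed(
    host: str,
    flows: Mapping[str, DataFlow],
    consented: FrozenSet[str],
    allow_unknown: bool,
) -> bool:
    """Decide whether a request to `host` may proceed (conceptual model only)."""
    flow = flows.get(host)
    # Undiscovered or not-yet-approved flows are not regulated by purpose;
    # they fall back to the Unknown Request Policy.
    if flow is None or not flow.approved:
        return allow_unknown
    # Assumption: an approved flow is allowed only if the user has
    # consented to every purpose assigned to it.
    return flow.purposes <= consented


flows = {
    "ads.example.com": DataFlow("ads.example.com", frozenset({"sale_of_info"}), True),
}
# User has opted out of sale: the approved flow is blocked.
print(is_request_allowed("ads.example.com", flows, frozenset(), allow_unknown=True))
# A host that was never triaged follows the Unknown Request Policy.
print(is_request_allowed("new.example.net", flows, frozenset(), allow_unknown=True))
```

In this model, a flow left in Triage behaves exactly like an unknown request, which is why approval is the step that makes a classification take effect.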

Ultimately, it’s each company’s responsibility to know what personal information is collected on their sites, when that info is shared and how it’s used. However, researching and classifying the data and purpose for each data flow/cookie on your site can be a time-consuming process. To make this process more efficient, Transcend will automatically classify data flows and cookies for common trackers. As an example, let’s look at Google Analytics: we commonly see customers with a Google Analytics script added to their site to collect analytics/tracking data on their users. We can auto-classify the consent tracking purpose for each data flow resulting from the GA script to provide granular tracking purpose recommendations. We do this for many common trackers and are continually expanding our auto-classification capabilities to simplify this effort.

Auto-classified data flows and cookies are visible in the Triage view, with the associated service and our recommended tracking purpose.

Triaging classified data flows is easy: simply confirm the service and the tracking purpose before approving the data flow. We highly recommend testing your site with these classifications thoroughly before approving data flows for regulation, but we’ll touch on this in more detail in a later section.

Some cookies and data flows discovered through telemetry will not have an associated service or tracking purpose. This often occurs for proprietary data flows, like cookies developed and added to your site by your Web team. Because Transcend cannot know about custom data flows, they may be listed in the Triage view without tracking purpose recommendations. This can also occur for external data flows that Transcend has not encountered previously.

To correctly classify these cookies or data flows, you can do some research to figure out where each one is coming from and what its tracking purpose should be. Your engineering team should be able to provide you with a list of in-house cookies and data flows, with a description of what data is collected and how it’s used. For third-party trackers, here are some steps you can take to research what data is being processed:

  1. Look up the Cookie/data sharing policy for the script or domain the data flow is coming from. Companies will often publish a list of cookies that get set by their SDKs when added to a website.
  2. Search for the data flow or cookie on Google. There is often data available already, especially when using additional search terms like “compliance” and “GDPR”.
    • Some tools and databases are available for researching cookies, like CookieDatabase.org
    • better.fyi/trackers is another tool to use to find a company name for a data flow. Pro Tip: be sure to search without the subdomain. (ex: try lincdn.com instead of snap.lincdn.com)
    • Try to find the tool's Content Security Policy (ex: search for “${companyName} CSP”). CSPs are a great resource for understanding what information a system shares and how.
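
The "search without the subdomain" tip above can be sketched as a small helper. This is a naive approach that simply keeps the last two labels of a hostname; the function name is hypothetical, and the result will be wrong for hosts under multi-part suffixes such as co.uk, where a Public Suffix List lookup would be needed instead.

```python
def registrable_domain_guess(host: str) -> str:
    """Naively drop subdomains by keeping the last two labels of a hostname.

    Assumption: the domain sits under a single-label suffix such as .com.
    """
    labels = host.strip().lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


print(registrable_domain_guess("snap.lincdn.com"))  # lincdn.com
```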

There will be cases where multiple versions of the same data flow/cookie are present on your site. It’s not uncommon to see some cookies being set hundreds of times. For example, the Google Analytics script sets a cookie, _ga{{UUID}}, to track a user's page views and clicks with a unique ID. This cookie gets set with a unique ID for every distinct user on the site (ex: _ga128958374384). Because we expect the tracking purpose and consent options to be the same for every instance of this cookie, it doesn’t make sense to manually assign a tracking purpose to each unique occurrence and approve each new one as it appears.

Instead, you can create a New Cookie with a Regular Expression rule to proactively match every occurrence of that cookie. To do so:

  1. Select the button to add a new cookie/data flow
  2. Enable the “regular expression” toggle
  3. Enter the regular expression to match cookies on. Regex101 is a helpful tool to test whether a regular expression will match the desired cookie name
  4. Add the Tracking Purposes and Service to the Cookie/Data Flow
  5. Select the “Add Cookie” button to save the new Rule
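
Before saving a rule, it can help to check the expression against names that should and should not match. The sketch below does this locally, using a pattern built only from the _ga128958374384 example in this guide. The pattern and helper name are illustrative assumptions: real cookie names on your site may follow a different shape, and the dashboard's regex flavor may differ from Python's, so confirm the final expression there or in Regex101.

```python
import re

# Hypothetical rule: "_ga" followed by one or more ASCII digits, anchored at
# both ends so that names merely containing "_ga" do not match.
GA_COOKIE_RULE = re.compile(r"^_ga[0-9]+$")


def matches_rule(cookie_name: str) -> bool:
    """Return True if the cookie name is covered by the rule above."""
    return GA_COOKIE_RULE.search(cookie_name) is not None


for name in ["_ga128958374384", "_ga", "my_ga123", "_gat"]:
    print(name, matches_rule(name))
```

Only the first name matches here. Anchoring with ^ and $ is a deliberate choice: an unanchored rule would also cover unrelated cookies that happen to contain the same characters, and they would inherit the rule's tracking purposes.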

Transcend can recommend cookie and data flow regexes for common trackers and systems in the Triage view. We recommend checking the recommended regexes first by filtering the Triage view by Type > Regex before creating new ones - you may save time if one has already been created for you!

There are times when data flows are present on your site for scripts and trackers that are not loaded directly by your site, meaning that they aren’t placed intentionally by your organization. This is often the result of browser extensions or malware present on an end user’s device or browser. For example, if an end user accessing your site has a browser extension running, that browser extension may inject a data flow into the site to accomplish its purpose.

Consent manager telemetry sometimes picks up these data flows, so they may appear in the Triage view of the Admin Dashboard. Businesses are only responsible for regulating consent preferences for the data flows/cookies they add, not those injected by external software, so there is no need for these data flows to be regulated by your consent manager. You can mark these data flows as “junk” in Transcend to remove them from your Triage view. Data flows that are marked as junk will not be added to your airgap.js bundle.

Once your data flows and cookies are classified with a tracking purpose and any desired regexes are added to collapse recurring cookies, you can approve them to be regulated by your consent manager. Approved data flows and cookies are added to your airgap.js bundle to be regulated on your site, and are available under the "Approved" tab.

To approve a data flow or cookie and add it to your airgap.js bundle for regulation, you can use the "Approve" button to individually approve the data flow in line, or use the bulk selection feature to approve many at once.

Researching and approving all your cookies and data flows may take time. It’s helpful to prioritize approving data flows for high impact and common trackers that align with your legal strategy first. Here are a few recommended steps on how to prioritize cookie & data flow approval:

1. Understand which consent purposes you need to support. Different privacy regimes (CCPA, GDPR, etc.) give consumers different rights to their data. Before approving any data flows, we recommend knowing which consent regions you have users in, and which consent purposes should be supported. This way, if you come across cookies/flows with consent purposes that aren't required for your use case, you don't have to spend time approving them if your Unknown Request Policy is set to allow unknown requests. For more information, check out our Data Rights Laws guide and California Do Not Sell guide.

2. Approve commonly known trackers first.

  • Filter by “Regexes” to view Transcend recommended regular expression rules for common scripts like Google Ads, Google Analytics, Segment, etc.
  • Confirm the tracking purposes and approve these to get high impact coverage and reduce the number of flows in Triage.
  • Review and approve remaining recommended classifications, starting with the first page. Data flows in the Triage view are ordered by number of occurrences, so ones with the highest signal will be present first in the list.

3. Approve internal cookies and data flows.

  • Add tracking purposes and label with “Internal Service”.
  • Approve these flows and cookies to be regulated by the consent manager.

4. Create additional regular expression rules for recurring data flows.

  • Quickly scan through the remaining Triage list to identify patterns
  • Create regular expression rules to collapse down repeating entries

5. Research, classify and approve remaining data flows without classifications.

  • Use the “Mark as Junk” feature to denote junk data flows added by browser extensions and remove them from your Triage view.
  • Reach out to Transcend if you have any questions.
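
The pattern scan in step 4 can be approximated offline if you have a list of cookie or data flow names copied from Triage. The sketch below is one possible approach and not a Transcend feature: the helper names and the "<N>" placeholder are assumptions for illustration. It replaces every run of digits with a placeholder, counts how many names collapse to the same shape, and lists the most frequent shapes first as candidates for regular expression rules.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def pattern_key(name: str) -> str:
    """Collapse each run of ASCII digits into a placeholder."""
    return re.sub(r"[0-9]+", "<N>", name)


def recurring_patterns(names: Iterable[str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Return (shape, occurrences) pairs seen at least min_count times,
    ordered from most to least frequent."""
    counts = Counter(pattern_key(n) for n in names)
    return [(key, count) for key, count in counts.most_common() if count >= min_count]


names = ["_ga128958374384", "_ga99", "session", "_ga7", "cart_1", "cart_2"]
print(recurring_patterns(names))
# [('_ga<N>', 3), ('cart_<N>', 2)]
```

Each shape in the output still needs a human decision: confirm that all names behind it share a service and tracking purpose before turning it into a single rule.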