Architecture
This document covers the technical details of Sombra's architecture, such as how it achieves high availability and how it communicates with the backends we host.
An executive summary of Sombra can be found on the Introduction page.
Sombra uses a reverse tunneling architecture. Tunneling lets Transcend Cloud securely open a connection to Sombra in your private network or Virtual Private Cloud (VPC), without exposing it to the internet.
As an alternative to the Reverse Tunnel method, Sombra also supports the Direct Connection method, where our backend sends traffic directly to Sombra. Since Sombra is on your network, this method requires additional setup, detailed on our Networking Configuration Page.
The Reverse Tunnel method is almost always the right choice:
- It is simpler to set up: No need to create firewall ingress rules, TLS configuration, or load balancers.
- It is more secure in general: Reverse tunneling does not require exposing ports to the internet, and is less prone to security misconfigurations.
The Reverse Tunnel method is the default in our recommended setup guide. We only recommend the Direct Connection method if you have company policies which require it, such as established deployment patterns.
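The direction of connection setup is the key idea behind reverse tunneling, and a toy sketch can make it concrete. The example below is not Sombra's tunnel protocol: the `relay` and `agent` functions, the loopback address, and the newline-delimited messages are all assumptions made for illustration. It only shows that the private side dials out, and the cloud side then sends requests down that already-open connection, so the private side never listens on a public port.

```python
import socket
import threading

def relay(listener: socket.socket, request: bytes, replies: list) -> None:
    """Cloud side (hypothetical): wait for the agent to dial in, then send a request to it."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rb") as reader:
        conn.sendall(request + b"\n")
        replies.append(reader.readline().strip())

def agent(relay_address) -> None:
    """Private-network side (hypothetical): open an OUTBOUND connection and serve one request."""
    with socket.create_connection(relay_address, timeout=5) as conn:
        with conn.makefile("rb") as reader:
            request = reader.readline().strip()
            conn.sendall(b"handled:" + request + b"\n")

# The only listening socket belongs to the relay; the agent never accepts connections.
listener = socket.socket()
listener.settimeout(5)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

replies = []
relay_thread = threading.Thread(target=relay, args=(listener, b"ping", replies))
relay_thread.start()
agent(listener.getsockname())
relay_thread.join()
listener.close()
```

In a Direct Connection setup the roles would be reversed: the private side would be the one listening, which is why that method needs ingress rules, TLS configuration, and load balancers.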
At its core, Sombra is an Amazon Linux 2023-based Docker image.
For details on gaining access to the Sombra Docker image, see our quickstart guide.
If you are using our Structured Discovery or Unstructured Discovery products, you may optionally host a Transcend-created LLM that can classify the contents of your databases, filesystems, buckets, SaaS tools (like Salesforce or OneDrive), and more.
Similar to the core Sombra image, the LLM Classifier is a Docker image. The primary difference is that in order to run efficiently, the LLM Classifier requires GPU access.
The only networking requirement for the LLM Classifier is that Sombra is able to make HTTP requests to it. The LLM Classifier service is typically hosted within your internal network, behind your firewalls. If you are using our recommended setup, this connection is already set up.
Sombra is designed to scale horizontally. Each Sombra node is a worker that processes jobs. It does not store any persistent state, and it is fault tolerant, making it especially simple to auto-scale.
You can also vertically scale Sombra onto larger instance sizes, which similarly gives it more capacity to process tasks.
There are many ways to achieve horizontal scaling, vertical scaling, or a combination of the two. These techniques are outlined in depth in our Deployment Guides.
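As a rough illustration of why stateless workers are easy to scale horizontally, the sketch below runs the same set of jobs on pools of different sizes. Everything in it is hypothetical: the `worker` and `run_pool` names, the use of threads to stand in for Sombra nodes, and the trivial "job" of squaring a number. The point is only that, because no worker keeps state between jobs, the number of workers changes throughput and not results.

```python
import queue
import threading

def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """A stateless worker: each job is handled using only the job's own input."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel telling this worker to shut down
            break
        results.put(job * job)

def run_pool(job_inputs, node_count: int) -> list:
    """Process all jobs on `node_count` workers and return the sorted results."""
    job_inputs = list(job_inputs)
    jobs, results = queue.Queue(), queue.Queue()
    nodes = [threading.Thread(target=worker, args=(jobs, results))
             for _ in range(node_count)]
    for node in nodes:
        node.start()
    for job in job_inputs:
        jobs.put(job)
    for _ in nodes:
        jobs.put(None)  # one sentinel per worker
    for node in nodes:
        node.join()
    return sorted(results.get() for _ in job_inputs)

one_node = run_pool(range(10), node_count=1)
four_nodes = run_pool(range(10), node_count=4)
```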
Sombra performs a Diffie-Hellman Key Exchange with its frontend clients (admin users in the Admin Dashboard, and end-users in the Privacy Center) to generate a shared cryptographic key. That shared key is then used to send end-to-end encrypted messages between Sombra and the users. Much like a Signal or iMessage chat, the messages passing through Transcend's servers are opaque to Transcend. The picture below shows a high-level version of the flow, in which Sombra and a client establish a shared key in such a way that our backend cannot generate the same secret.
To be clear, this E2EE architecture is in addition to the ordinary encryption of data in transit and at rest that you'd expect from any software vendor. Thus, E2EE messages are also transmitted with TLS, and any E2EE messages in storage are also separately encrypted at rest. Sombra does not replace those security best practices. It accomplishes the separate goal of keeping your corporate data out of Transcend.
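The key exchange described above can be sketched with textbook Diffie-Hellman. This is a minimal illustration, not Sombra's implementation: the group parameters `P` and `G`, the function names, and the SHA-256 key derivation step are assumptions chosen to keep the example short, and the 127-bit prime is far too small for real use.

```python
import hashlib
import secrets

# Toy parameters (assumption): 2**127 - 1 is prime, but much too small for production.
# Real deployments use standardized groups or elliptic curves.
P = 2**127 - 1
G = 3

def keypair() -> tuple[int, int]:
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)

def shared_key(own_private: int, peer_public: int) -> bytes:
    """Derive a 32-byte key from the shared secret peer_public^own_private mod P."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

# Each side generates its own key pair; only the public halves are exchanged.
sombra_private, sombra_public = keypair()
client_private, client_public = keypair()

# Both sides arrive at the same key without ever sending it over the wire.
sombra_key = shared_key(sombra_private, client_public)
client_key = shared_key(client_private, sombra_public)
```

An intermediary that relays only `sombra_public` and `client_public` holds neither private value, which is why it cannot compute the same key from the exchanged messages alone.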
Sombra hosts an HTTP API for your services to make requests to Transcend. This API can be thought of as the "Transcend API" since the Sombra API is in fact an API gateway to the Transcend Cloud.
This API encrypts customer data before it enters the Transcend Cloud. For example, if you are programmatically uploading files in response to a data subject access request, that file is end-to-end encrypted. It is only decipherable by Sombra and the end-user.
For other API use cases and more information, see "Sombra Customer Ingress".
Just as Sombra is the encryption module on the backend, Penumbra is the decryption module on the frontend. Sombra and Penumbra form the two "ends" in the "end-to-end encryption" architecture.
Our web interfaces are largely powered by encrypted data. Since Transcend's backend servers only have encrypted copies of your data, we cannot serve unencrypted data to a user interface. Instead, we serve encrypted data which can be decrypted on a client device by Penumbra. Specifically, Penumbra is a decryption technology which operates on a background thread in the browser's runtime.
In cases where a user needs to view unencrypted data (and they have permission to do so), the user can decrypt data on their device using a decryption key. To fetch this decryption key, the user must be authenticated and have the right privileges. First, the user verifies their identity through a seamless web-based authentication flow (such as account login). Penumbra then forms a secure channel with Sombra to pass the user's authentication information, and Sombra attempts to verify the user. If Sombra successfully verifies the user (and if they have permission to retrieve the requested decryption key), Sombra responds to Penumbra with the decryption key, and Penumbra uses it to decrypt the data.
All of this happens seamlessly through our web interfaces. To a user, there is no visual difference between a Transcend interface using encrypted data and a typical web interface; it is as if the data were served normally.
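The two checks in the key retrieval flow above (is the user authenticated, and are they permitted to retrieve this particular key) can be sketched as follows. The `KeyService` class, its fields, and the token and key identifiers are hypothetical stand-ins invented for this example; they do not describe Sombra's actual API or data model.

```python
from dataclasses import dataclass

@dataclass
class KeyService:
    """Hypothetical stand-in for the Sombra side of the key retrieval flow."""
    sessions: dict[str, str]     # session token -> user id
    grants: dict[str, set[str]]  # user id -> key ids the user may retrieve
    keys: dict[str, bytes]       # key id -> decryption key

    def release_key(self, token: str, key_id: str) -> bytes:
        # Step 1: verify the user's identity.
        user = self.sessions.get(token)
        if user is None:
            raise PermissionError("user could not be verified")
        # Step 2: check that this user may retrieve the requested key.
        if key_id not in self.grants.get(user, set()):
            raise PermissionError("user may not retrieve this key")
        # Only now is the decryption key returned to the client.
        return self.keys[key_id]

service = KeyService(
    sessions={"token-abc": "alice"},
    grants={"alice": {"export-1"}},
    keys={"export-1": b"key-for-export-1", "export-2": b"key-for-export-2"},
)
granted_key = service.release_key("token-abc", "export-1")
```

In this sketch, a request with an unknown token or for a key outside the user's grants raises `PermissionError` before any key material is touched.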
Transcend's Admin Dashboard (used by you and your team) and the Privacy Center (used by your end-users) both have Penumbra under the hood. Note: depending on the Transcend products you use, the Privacy Center may not be applicable to your Transcend implementation—the Privacy Center is part of the Transcend DSR Automation product.
- In the Admin Dashboard, admins can be given permission to decrypt data. For example, an admin can decrypt samples of real data in Transcend Structured Discovery, or the content of a data export associated with a data subject access request in Transcend DSR Automation.
- In the Privacy Center, end-users requesting access to their data have, of course, permission to decrypt their own data export.
To make all of this possible, Transcend Engineering built and open-sourced Penumbra, the first client-side decryption streaming technology. Like Sombra, Penumbra streams all content, which means data never has to fully buffer into memory. Since Transcend's E2EE stack purely streams data, hardware memory is not a constraint, and payloads of any size can be transferred with end-to-end encryption.
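To illustrate why streaming removes the memory constraint, the sketch below decrypts data one chunk at a time with a generator, so only the current chunk is ever held in memory. The cipher here is a deliberately simple toy (a SHA-256 counter keystream XORed with the data, with no authentication) chosen so the example runs on the standard library alone. It is not the algorithm Penumbra or Sombra uses, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator

BLOCK_SIZE = 32  # SHA-256 digest length in bytes

def keystream_block(key: bytes, index: int) -> bytes:
    """Toy keystream: block i is SHA-256(key || i). Illustration only."""
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()

def stream_xor(key: bytes, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Encrypt or decrypt a stream chunk by chunk (XOR is its own inverse)."""
    position = 0  # absolute byte offset in the stream
    block = b""
    for chunk in chunks:
        out = bytearray()
        for byte in chunk:
            if position % BLOCK_SIZE == 0:
                block = keystream_block(key, position // BLOCK_SIZE)
            out.append(byte ^ block[position % BLOCK_SIZE])
            position += 1
        yield bytes(out)  # hand the chunk downstream before reading the next one

key = b"shared-key-from-the-key-exchange"
plaintext = bytes(range(256)) * 3

# Encrypt in 100-byte chunks, then decrypt in 7-byte chunks: chunk sizes need not match.
ciphertext = b"".join(
    stream_xor(key, (plaintext[i:i + 100] for i in range(0, len(plaintext), 100)))
)
decrypted = b"".join(
    stream_xor(key, (ciphertext[i:i + 7] for i in range(0, len(ciphertext), 7)))
)
```

The `b"".join(...)` calls exist only so the result can be compared in one piece; a real consumer would pass each yielded chunk straight to the display or to disk.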
Once Penumbra has begun decrypting data, it can stream the unencrypted output for display in a web interface (e.g., preview data in the Admin Dashboard), or download the data to disk. Since many exports include several files, Transcend Engineering also built and open-sourced Conflux, the first client-side zip-streaming technology, which takes many file streams as input, and outputs one .zip file stream.
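The idea of combining many file streams into one .zip stream can be sketched with Python's standard `zipfile` module. This is an analogy under stated assumptions, not Conflux's API: Conflux runs in the browser, while the `zip_streams` function, the file names, and the in-memory sink below are hypothetical and exist only to show the shape of the technique.

```python
import io
import zipfile
from typing import BinaryIO, Dict, Iterable

def zip_streams(files: Dict[str, Iterable[bytes]], sink: BinaryIO) -> None:
    """Write each named stream of chunks into a single zip archive on `sink`."""
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in files.items():
            # Each entry is written chunk by chunk, so no file is fully buffered here.
            with archive.open(name, mode="w") as entry:
                for chunk in chunks:
                    entry.write(chunk)

# An in-memory sink keeps the example self-contained; a real sink would be a
# file or network stream.
sink = io.BytesIO()
zip_streams(
    {
        "profile.json": iter([b'{"name": ', b'"Ada"}']),
        "notes.txt": iter([b"first line\n", b"second line\n"]),
    },
    sink,
)
archive_bytes = sink.getvalue()
```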
- If you're self-hosting Sombra, follow this guide to deploy Sombra in minutes.
- If Transcend is managing Sombra for you, there is no configuration required from you.
- If you're curious about how Transcend offers a seamless web experience powered by encrypted data, check out Penumbra and Conflux on GitHub.