LLM Classifier
The LLM (Large Language Model) Classifier is an application designed to improve data classification accuracy using advanced natural language processing techniques. It is a separate, optional service that is attached to Sombra. Customers self-hosting Sombra can run the LLM Classifier in the same private network as Sombra.
Our deployment guides provide instructions on how to deploy the LLM Classifier using different configuration setups. If you are using our recommended Helm Chart, deploying the LLM Classifier simply requires adding the following to your values.yaml file:

```yaml
llm-classifier:
  enabled: true
```
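After adding that snippet, apply it with a standard Helm upgrade. The release name, chart reference, and namespace below are placeholders; substitute the values from your own deployment:

```sh
# Re-render and apply the chart with the LLM Classifier enabled.
helm upgrade sombra transcend/sombra \
  --namespace transcend \
  --values values.yaml
```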
The LLM Classifier container runs a gunicorn server that listens for requests and performs LLM classification on the inputs provided in the request body. Because the LLM performs far more efficiently on Graphics Processing Units (GPUs), the container must run on a node with a supported NVIDIA GPU (see GPU Requirements below).
The LLM Classifier container by default listens on port 6081. This can be changed using the LLM_SERVER_PORT environment variable.
To enable HTTPS connections to the LLM Classifier server, you can mount the SSL certificate and key file to the container and set the path to these files using the environment variables LLM_CERT_PATH and LLM_KEY_PATH respectively.
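For testing, you can generate a self-signed certificate and key with openssl (production deployments should use certificates issued by your own CA; the CN below is illustrative):

```sh
# Create a self-signed cert/key pair valid for one year.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout llm-classifier.key \
  -out llm-classifier.cert \
  -days 365 \
  -subj "/CN=llm-classifier"
```

Mount the resulting files into the container and point LLM_CERT_PATH and LLM_KEY_PATH at them.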
You can pull our image from Transcend's private Docker registry using basic authentication.
First, please contact us and request permission to pull the llm-classifier image. We will then add your Transcend account to our permissions list.
Once we have added you to our allow list, you can log in to our private registry:
```sh
docker login docker.transcend.io
```

You will be prompted to enter the basic auth credentials. The username will always be "Transcend" (this is case-sensitive), and the password can be any API key for your organization within the Admin Dashboard (note: a scope is not required for the API key).
Once you've logged in, you may pull images by running:
```sh
docker pull docker.transcend.io/llm-classifier:<version_tag>
```

GPU Requirements

The LLM Classifier requires a node with an NVIDIA GPU that has sufficient VRAM to load the classification models. The service uses 4-bit quantization to reduce memory requirements.
| Requirement | Value |
|---|---|
| GPU vendor | NVIDIA |
| Minimum VRAM | 24 GB |
| CUDA support | Required (CUDA 12.x) |
| GPU count per node | 1 |
Tested instance types by cloud provider:

| Cloud provider | Instance type | GPU | GPU VRAM | Instance RAM | vCPU |
|---|---|---|---|---|---|
| AWS | g5.2xlarge (recommended) | 1x NVIDIA A10G | 24 GB | 32 GB | 8 |
| AWS | g5.xlarge | 1x NVIDIA A10G | 24 GB | 16 GB | 4 |
| GCP | g2-standard-8 or a2-highgpu-1g | 1x NVIDIA L4 or A100 | 24–40 GB | 32 GB | 8 |
| Azure | Standard_NC8as_T4_v3 or equivalent | 1x NVIDIA T4 or A10 | 16–24 GB | 56 GB | 8 |
Note: The g5.2xlarge on AWS is our recommended and most thoroughly tested instance type. The NVIDIA A10G GPU (24 GB VRAM) provides the best price-to-performance ratio for this workload. If using another cloud provider or GPU model, ensure it has at least 24 GB of VRAM.
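Before committing to a cluster rollout, you can sanity-check the image directly on a GPU instance. This assumes Docker with the NVIDIA Container Toolkit installed; the version tag is a placeholder:

```sh
# Expose the GPU to the container and publish the default port (6081).
docker run --rm --gpus all \
  -p 6081:6081 \
  docker.transcend.io/llm-classifier:<version_tag>
```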
Each LLM Classifier pod requires:
| Resource | Value |
|---|---|
| GPU | 1x nvidia.com/gpu |
| Memory | 15 GB |
| Replicas (production) | 2 (minimum recommended) |
The Kubernetes cluster must support nvidia.com/gpu as a schedulable resource. Ensure the NVIDIA device plugin is installed in your cluster.
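The device plugin can be installed with a single manifest from NVIDIA. The version tag below is illustrative; check the k8s-device-plugin releases for the current one:

```sh
# Deploy the NVIDIA device plugin DaemonSet so nodes advertise nvidia.com/gpu.
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml

# Confirm the GPU is now a schedulable resource on your GPU node(s).
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```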
Recommended autoscaling settings:

| Setting | Value |
|---|---|
| Minimum replicas | 1 (dev/staging), 2 (production) |
| Maximum replicas | 2–4 |
| Scale-out trigger (GPU utilization) | ~60% average |
| Scale-out trigger (response time) | ~20 seconds average |
The Horizontal Pod Autoscaler (HPA) monitors GPU utilization and response time. When either metric exceeds its target threshold, additional pods are scheduled. Throughput scales linearly with the number of pods: adding replicas proportionally increases classification capacity.
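As a sketch, an HPA for this deployment might look like the following, assuming GPU utilization is exposed as a per-pod custom metric (for example via the DCGM exporter plus the Prometheus Adapter; the HPA cannot read GPU utilization natively). The metric and deployment names are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-classifier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-classifier          # illustrative deployment name
  minReplicas: 2                  # production minimum from the table above
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: gpu_utilization   # custom metric; requires a metrics adapter
        target:
          type: AverageValue
          averageValue: "60"      # scale out above ~60% average GPU utilization
```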
Estimated infrastructure costs on AWS:

| Configuration | Instance type | Pricing model | Estimated monthly cost |
|---|---|---|---|
| Single node | g5.2xlarge | On-demand | ~$880/month |
| Single node | g5.2xlarge | 1-year reserved | ~$560/month |
| Single node | g5.2xlarge | 3-year reserved | ~$355/month |
| Production (2 nodes) | 2x g5.2xlarge | 1-year reserved | ~$1,120/month |
Pricing is approximate and based on AWS US East (N. Virginia) region. Check AWS EC2 pricing and reserved instance pricing for current rates. Other cloud providers will have comparable pricing for equivalent GPU instances.
If you need more throughput (more classifications per hour), you can add more LLM Classifier instances to scale capacity linearly.
The following values.yaml excerpt adds an accompanying LLM Classifier to your Sombra deployment. The LLM Classifier requires an NVIDIA GPU to run, so make sure your cluster supports nvidia.com/gpu as a schedulable resource.
```yaml
envs:
  # ... other env vars
  - name: LLM_CLASSIFIER_URL
    value: http://<release-name>-llm-classifier.transcend.svc:6081
llm-classifier:
  enabled: true
```

Or with TLS termination at Sombra and the LLM Classifier server:
```yaml
envs:
  # ... other env vars
  - name: LLM_CLASSIFIER_URL
    value: https://<release-name>-llm-classifier.transcend.svc:6081
envs_as_secret:
  # ... other env vars
  - name: SOMBRA_TLS_CERT
    value: <SOMBRA_TLS_CERT>
  - name: SOMBRA_TLS_KEY
    value: <SOMBRA_TLS_KEY>
  # An optional passphrase associated with your TLS private key. If you set a
  # passphrase when you created your key and certificate, you must provide it here.
  - name: SOMBRA_TLS_KEY_PASSPHRASE
    value: <SOMBRA_TLS_KEY_PASSPHRASE>
llm-classifier:
  enabled: true
  tls:
    enabled: true
    # saved as secret
    cert: |-
      -----BEGIN CERTIFICATE-----
      <base64>
      -----END CERTIFICATE-----
    # saved as secret
    key: |-
      -----BEGIN PRIVATE KEY-----
      <base64>
      -----END PRIVATE KEY-----
  # volume containing the cert and key
  volumes:
    - name: llm-classifier-ssl
      secret:
        secretName: llm-classifier-secrets
  # mount the directory containing the cert and key into the pod
  volumeMounts:
    - mountPath: '/etc/llm-classifier/ssl'
      name: llm-classifier-ssl
      readOnly: true
  # set the location of the cert and key in the environment
  envs:
    - name: LLM_CERT_PATH
      value: '/etc/llm-classifier/ssl/llm-classifier.cert'
    - name: LLM_KEY_PATH
      value: '/etc/llm-classifier/ssl/llm-classifier.key'
```

| Variable | Default | Description |
|---|---|---|
| `LLM_SERVER_PORT` | 6081 | Port the classifier listens on |
| `LLM_SERVER_CONCURRENCY` | 1 | Number of gunicorn workers (should match GPU count) |
| `LLM_SERVER_WORKER_CONNECTIONS` | 1000 | Max simultaneous connections per worker |
| `LLM_SERVER_TIMEOUT` | 120 | Request timeout in seconds |
| `LLM_SERVER_BACKLOG` | 500 | Max queued connections |
| `LLM_CERT_PATH` | — | Path to TLS certificate (enables HTTPS) |
| `LLM_KEY_PATH` | — | Path to TLS private key (enables HTTPS) |
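These are plain environment variables, so they can be overridden wherever you set the container environment. For example, with docker run (the port override is illustrative; the timeout shown is the documented default):

```sh
# Run the classifier on port 7000 instead of the default 6081.
docker run --rm --gpus all \
  -p 7000:7000 \
  -e LLM_SERVER_PORT=7000 \
  -e LLM_SERVER_TIMEOUT=120 \
  docker.transcend.io/llm-classifier:<version_tag>
```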