LLM Classifier on Kubernetes
The LLM (Large Language Model) Classifier is a computationally expensive application designed to analyze and classify text data using advanced natural language processing techniques. As a Docker container image, the LLM Classifier can be easily deployed into a Kubernetes (k8s) environment. This guide provides a minimal example of deploying the LLM Classifier on Kubernetes alongside your Sombra instance. For a comprehensive reference on Sombra deployment, refer to the Sombra on Kubernetes guide.
Note: LLM classification is computationally expensive, so the LLM Classifier pod must run on a node with an NVIDIA GPU.
You can pull our image from Transcend's private Docker registry using basic authentication.
First, please contact us and request permission to pull the llm-classifier image. We will then add your Transcend account to our permissions list.
Once we have added you to our allow list, you can log in to our private registry:
docker login docker.transcend.io
You will be prompted to enter the basic auth credentials. The username is always "Transcend" (case-sensitive), and the password can be any API key for your organization from the Admin Dashboard (no scopes are required on the key).
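In automated environments such as CI, you can avoid the interactive prompt by piping the API key to docker login. This is a sketch using Docker's standard --password-stdin flag; the TRANSCEND_API_KEY variable name is an assumption for illustration:

```shell
# TRANSCEND_API_KEY is a hypothetical variable holding an Admin Dashboard API key
echo "$TRANSCEND_API_KEY" | docker login docker.transcend.io --username Transcend --password-stdin
```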
Once you've logged in, you may pull images by running:
docker pull docker.transcend.io/llm-classifier:<version_tag>
You can deploy the LLM Classifier alongside Sombra, either in the same cluster or in a separate one, provided your network allows Sombra to reach the Classifier.
This is a sample config to deploy the LLM Classifier in a Kubernetes cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: transcend
---
apiVersion: v1
kind: Service
metadata:
  name: llm-classifier-ingress
  namespace: transcend
spec:
  selector:
    app: llm-classifier-app
  ports:
    - protocol: TCP
      port: 6081
      targetPort: 6081
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-classifier-app
  namespace: transcend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-classifier-app
  template:
    metadata:
      labels:
        app: llm-classifier-app
    spec:
      containers:
        - name: llm-classifier-container
          image: llm-classifier:<version_tag>
          ports:
            - name: http
              containerPort: 6081
              protocol: TCP
          env:
            - name: LLM_SERVER_PORT
              value: '6081'
            - name: LLM_SERVER_CONCURRENCY
              value: '2'
            - name: LLM_SERVER_TIMEOUT
              value: '120'
          resources:
            limits:
              memory: 8Gi
              nvidia.com/gpu: '1'
          livenessProbe:
            httpGet:
              path: /health/ping
              port: 6081
              scheme: HTTP
            timeoutSeconds: 30
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
          startupProbe:
            httpGet:
              path: /health/ping
              port: 6081
              scheme: HTTP
            timeoutSeconds: 30
            periodSeconds: 20
            successThreshold: 1
            failureThreshold: 10
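Because the pod requests an nvidia.com/gpu resource, it will only schedule on nodes advertising that resource via the NVIDIA device plugin. Depending on how your cluster labels and taints its GPU nodes, you may also need to pin the pod to GPU hardware explicitly. The fragment below is a sketch only: the nodeSelector label and the toleration are assumptions that vary by cloud provider and device plugin setup, so verify them against your own nodes before use.

```yaml
# Illustrative additions to the Deployment's pod spec (spec.template.spec).
# The label and taint values below are assumptions; check your cluster's GPU node config.
nodeSelector:
  nvidia.com/gpu.present: 'true'   # example label used by some GPU node setups
tolerations:
  - key: nvidia.com/gpu            # common taint on dedicated GPU node pools
    operator: Exists
    effect: NoSchedule
```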
Update the LLM_CLASSIFIER_URL environment variable in your Sombra deployment to point to the llm-classifier-ingress service created above:
LLM_CLASSIFIER_URL=http://<llm-classifier-service-cluster-ip>:<llm_service_port>
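For example, you can look up the Service's cluster IP with kubectl and substitute it in. The sketch below uses a made-up placeholder IP so the command is self-contained; the port matches the Service definition above:

```shell
# Placeholder cluster IP; in a real cluster, fetch it with:
#   kubectl get service llm-classifier-ingress -n transcend -o jsonpath='{.spec.clusterIP}'
CLUSTER_IP="10.96.12.34"
LLM_SERVICE_PORT="6081"   # matches the Service port in the manifest above
echo "LLM_CLASSIFIER_URL=http://${CLUSTER_IP}:${LLM_SERVICE_PORT}"
```

If Sombra runs in the same cluster, you can also use the Service's DNS name (llm-classifier-ingress.transcend.svc.cluster.local) instead of a raw cluster IP, which survives Service re-creation.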