LLM Classifier on Kubernetes

The LLM (Large Language Model) Classifier is a computationally expensive application that analyzes and classifies text data using advanced natural language processing techniques. Because it is distributed as a Docker container image, the LLM Classifier can be deployed easily into a Kubernetes (k8s) environment. This guide provides a minimal example of deploying the LLM Classifier on Kubernetes alongside your Sombra instance. For a comprehensive reference on Sombra deployment, see the Sombra on Kubernetes guide.

Note: LLM classification is a computationally expensive operation, and the LLM Classifier pod must run on a node with an NVIDIA GPU.
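
Scheduling against the nvidia.com/gpu resource (used in the Deployment below) requires the NVIDIA device plugin to be installed in your cluster. As a quick sanity check, you can confirm that your nodes advertise allocatable GPUs; the command below is one way to do so:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"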

You can pull our image from Transcend's private Docker registry using basic authentication.

First, please contact us and request permission to pull the llm-classifier image. We will then add your Transcend account to our permissions list.

Once we have added you to our allow list, you can log in to our private registry:

docker login docker.transcend.io

You will be prompted to enter the basic auth credentials. The username is always "Transcend" (case-sensitive), and the password can be any API key for your organization from the Admin Dashboard (note: no scope is required for the API key).
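
If you are logging in from a non-interactive environment such as a CI pipeline, you can pipe the API key to docker login instead. In this sketch, TRANSCEND_API_KEY is a placeholder environment variable holding your API key:

echo "$TRANSCEND_API_KEY" | docker login docker.transcend.io --username Transcend --password-stdin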

Once you've logged in, you may pull images by running:

docker pull docker.transcend.io/llm-classifier:<version_tag>
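
If you are unsure which version tags are available, the standard Docker Registry HTTP API v2 exposes a tag-list endpoint; whether Transcend's registry enables this endpoint is an assumption, but if it does, you can query it with the same basic auth credentials:

curl -u "Transcend:<your_api_key>" https://docker.transcend.io/v2/llm-classifier/tags/list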

You can deploy the LLM Classifier alongside Sombra, either in the same cluster or in a separate cluster, as long as your network is configured to allow Sombra to reach the classifier.

The following is a sample config for deploying the LLM Classifier in a Kubernetes cluster.

apiVersion: v1
kind: Namespace
metadata:
  name: transcend
---
apiVersion: v1
kind: Service
metadata:
  name: llm-classifier-ingress
  namespace: transcend
spec:
  selector:
    app: llm-classifier-app
  ports:
    - protocol: TCP
      port: 6081
      targetPort: 6081
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-classifier-app
  namespace: transcend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-classifier-app
  template:
    metadata:
      labels:
        app: llm-classifier-app
    spec:
      containers:
        - name: llm-classifier-container
          image: docker.transcend.io/llm-classifier:<version_tag>
          ports:
            - name: http
              containerPort: 6081
              protocol: TCP
          env:
            - name: LLM_SERVER_PORT
              value: '6081'
            - name: LLM_SERVER_CONCURRENCY
              value: '2'
            - name: LLM_SERVER_TIMEOUT
              value: '120'
          resources:
            limits:
              memory: 8Gi
              nvidia.com/gpu: '1'
          livenessProbe:
            httpGet:
              path: /health/ping
              port: 6081
              scheme: HTTP
            timeoutSeconds: 30
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
          startupProbe:
            httpGet:
              path: /health/ping
              port: 6081
              scheme: HTTP
            timeoutSeconds: 30
            periodSeconds: 20
            successThreshold: 1
            failureThreshold: 10
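
Because the image lives in Transcend's private registry, the cluster also needs pull credentials. A minimal sketch (the secret name transcend-registry is our own choice) is to create a docker-registry secret in the transcend namespace:

kubectl create secret docker-registry transcend-registry \
  --namespace transcend \
  --docker-server=docker.transcend.io \
  --docker-username=Transcend \
  --docker-password=<your_api_key>

Then reference the secret from the Deployment's pod template spec:

    spec:
      imagePullSecrets:
        - name: transcend-registry
      containers:
        # ... as above

Once the manifest is complete, apply it with kubectl apply -f <manifest_file>.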

Update the LLM_CLASSIFIER_URL environment variable in your Sombra deployment to point to the llm-classifier-ingress Service created above:

LLM_CLASSIFIER_URL=http://<llm-classifier-service-cluster-ip>:<llm_service_port>
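
Since the Service above is of type ClusterIP, you can also reference it by its cluster-internal DNS name rather than its IP, which stays stable if the Service is ever re-created. Assuming the default cluster.local cluster domain, that would be:

LLM_CLASSIFIER_URL=http://llm-classifier-ingress.transcend.svc.cluster.local:6081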