LLM-Sentinel

Enterprise-Grade Anomaly Detection and Observability for LLM Applications

![Rust](https://www.rust-lang.org/)
![License](LICENSE)
![Build]()
![Kubernetes](https://kubernetes.io/)
![Helm](https://helm.sh/)

LLM-Sentinel is a production-ready, real-time anomaly detection and observability platform designed specifically for Large Language Model (LLM) applications. Built in Rust for maximum performance and reliability, it provides comprehensive monitoring, statistical anomaly detection, automated alerting, and deep observability to ensure the reliability, security, and cost-effectiveness of your LLM deployments.

Status: ✅ Production Ready - Full implementation with enterprise-grade deployment infrastructure

- Key Features
- Detection Capabilities
- Deployment Options
- Quick Start
- Architecture
- Observability
- API Reference
- Example Producers
- Configuration
- Performance
- Contributing

Key Features

$3

- Z-Score Detection: Parametric anomaly detection for normally distributed metrics with configurable thresholds (default: 3.0σ)
- IQR Detection: Non-parametric outlier detection using interquartile range (default: 1.5x multiplier)
- MAD Detection: Robust outlier detection using median absolute deviation (default: 3.5 threshold)
- CUSUM Detection: Cumulative sum change point detection for drift and regime shifts (default: 5.0 threshold, 0.5 drift)
- Multi-Dimensional Baselines: Per-service, per-model statistical baselines with automatic updates
- Configurable Sensitivity: Tune detection sensitivity for your specific use cases

$3

Monitor all critical LLM metrics:

- Latency Spikes: Detect unusual response times (P50, P95, P99)
- Token Usage Anomalies: Monitor prompt and completion token consumption patterns
- Cost Anomalies: Track unexpected spending patterns and budget overruns
- Error Rate Spikes: Identify service degradation and failures
- Model Drift: Detect quality degradation over time
- Usage Patterns: Identify suspicious or abnormal usage behavior
- Throughput Changes: Monitor request rate variations

$3

- 10,000+ events/second ingestion throughput
- Sub-5ms P50 latency for anomaly detection
- Lock-free concurrent baseline updates using DashMap
- Batch processing for optimal InfluxDB writes (100-record batches)
- Zero-copy parsing for maximum efficiency
- Async/await throughout for non-blocking I/O
- Memory-efficient streaming processing

$3

- Zero unsafe code: Memory safety guaranteed by Rust's type system
- Comprehensive error handling: Type-safe Result propagation with detailed error context
- Graceful shutdown: Proper signal handling (SIGTERM, SIGINT) with resource cleanup
- Health checks: Liveness, readiness, and startup probes for Kubernetes
- Circuit breakers: Automatic failure detection and recovery
- Exponential backoff: Intelligent retry logic for transient failures
- Connection pooling: Efficient resource management

$3

- RabbitMQ Integration: Topic-based routing with severity levels (info, warning, critical)
- Webhook Delivery: HTTP POST with HMAC-SHA256 signatures for verification
- Alert Deduplication: Configurable 5-minute window to prevent alert storms
- Retry Logic: Exponential backoff with configurable max attempts (default: 3)
- Priority Routing: Route critical alerts to different channels
- Batch Alerting: Optional batching for high-volume scenarios

$3

- InfluxDB v3: Time-series storage for telemetry with automatic downsampling
- Moka Cache: High-performance in-memory cache (10,000 entry capacity)
- Redis Support: Distributed caching for multi-instance deployments
- Persistent Baselines: Baseline persistence to disk for quick restarts
- Query API: REST endpoints for historical data retrieval and analysis
- TTL Management: Automatic expiration of stale data (300s default)

$3

- 50+ Prometheus Metrics: Comprehensive instrumentation of all subsystems
- 4 Pre-built Grafana Dashboards:
- Anomaly Detection Dashboard
- System Health Dashboard
- Performance Metrics Dashboard
- Alert Overview Dashboard
- 50+ Alert Rules: Production-ready Prometheus alerting covering all failure modes
- Distributed Tracing: OpenTelemetry support for request tracing
- Structured Logging: JSON logs with configurable levels (trace, debug, info, warn, error)

$3

- Kubernetes-Ready: Production manifests with HPA, PDB, and network policies
- Helm Chart: Parameterized chart for easy deployment and upgrades
- Docker Images: Multi-stage builds with minimal attack surface (<50MB)
- Horizontal Scaling: Support for 3-10+ replicas with auto-scaling
- StatefulSet Support: Optional for baseline persistence
- Service Mesh Compatible: Works with Istio, Linkerd, Consul Connect

$3

- Non-Root Containers: Runs as UID 1000 with dropped capabilities
- Read-Only Filesystem: Root filesystem mounted read-only
- Network Policies: Restrict ingress/egress to required services only
- Secret Management: Support for Kubernetes secrets and external secret stores
- PII Sanitization: Automatic detection and removal of sensitive data
- Audit Logging: Complete audit trail of all anomalies and alerts
- SBOM Generation: Software Bill of Materials for vulnerability tracking

Detection Capabilities

$3

LLM-Sentinel provides four complementary detection algorithms that can be enabled individually or in combination:

#### 1. Z-Score Detection (Parametric)

Best for: Metrics that follow a normal distribution

``yaml detection: zscore: threshold: 3.0 # Standard deviations from mean sensitivity: "medium" # low, medium, high metrics: - latency_ms - total_tokens - cost_usd`

Use Cases: - Latency spike detection - Token usage monitoring - Cost anomaly detection

#### 2. IQR Detection (Non-Parametric)

Best for: Metrics with skewed distributions or outliers

`yaml detection: iqr: multiplier: 1.5 # IQR multiplier for outliers metrics: - latency_ms - total_tokens`

Use Cases: - Robust outlier detection - Handling non-normal distributions - Resistant to extreme values

#### 3. MAD Detection (Robust)

Best for: Metrics requiring high robustness to outliers

`yaml detection: mad: threshold: 3.5 # MAD threshold metrics: - latency_ms`

Use Cases: - Ultra-robust detection - Minimal false positives - Gradual baseline shifts

#### 4. CUSUM Detection (Change Point)

Best for: Detecting sustained changes and drift

`yaml detection: cusum: threshold: 5.0 # Cumulative threshold drift: 0.5 # Drift parameter metrics: - latency_ms`

Use Cases: - Model performance degradation - Service quality drift - Gradual system changes

`$3`

- Adaptive Baselines: Automatic baseline updates every 60 seconds - Multi-Dimensional: Separate baselines per service, model, and metric - Configurable Window: 1000-sample sliding window (configurable) - Minimum Samples: Require 10+ samples before detection (prevents cold-start false positives) - Persistence: Save/load baselines from disk for fast restarts

`Deployment Options`

`$3`

Perfect for local development and testing:

`bash

`Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)`


docker-compose up -d
View logs

docker-compose logs -f sentinel
Stop environment

docker-compose down


Includes:
- Sentinel (3 replicas)
- Kafka cluster (3 brokers)
- InfluxDB v3
- RabbitMQ with management UI
- Redis
- Prometheus
- Grafana with pre-loaded dashboards
$3
Production-grade deployment with all manifests:

`bash

`Apply all manifests`


kubectl apply -f k8s/
Or using kustomize

kubectl apply -k k8s/
Check status

kubectl get pods -l app.kubernetes.io/name=sentinel


Includes:
- Deployment with 3 replicas
- HorizontalPodAutoscaler (3-10 replicas, CPU/memory targets)
- PodDisruptionBudget (min 2 available)
- NetworkPolicy (restricted ingress/egress)
- ServiceMonitor (Prometheus integration)
- Ingress with TLS
$3
Easiest production deployment with full parameterization:

`bash

`Install with default values`


helm install sentinel ./helm/sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"
Install with custom values

helm install sentinel ./helm/sentinel -f production-values.yaml
Upgrade

helm upgrade sentinel ./helm/sentinel
Uninstall

helm uninstall sentinel


Helm Chart Features:
- Full configuration parameterization (335+ options)
- External secrets support (AWS Secrets Manager, Vault, etc.)
- Multiple environment profiles (dev, staging, prod)
- Init containers for dependency checking
- Automatic ConfigMap/Secret generation
- Post-install notes with helpful commands
See Helm Chart README for complete documentation.
$3
Direct installation on Linux servers:

`bash

`Build release binary`


cargo build --release
Install

sudo cp target/release/sentinel /usr/local/bin/
Create config

sudo mkdir -p /etc/sentinel
sudo cp config/sentinel.yaml /etc/sentinel/
Create systemd service

sudo cp deployments/systemd/sentinel.service /etc/systemd/system/
sudo systemctl enable sentinel
sudo systemctl start sentinel


Quick Start
$3
- Rust 1.75+ (for building from source)
- Kafka 2.8+ (message queue)
- InfluxDB v3 (time-series database)
- RabbitMQ 3.8+ (optional, for alerting)
- Redis 6.0+ (optional, for distributed caching)
$3

`bash

`Clone repository`


git clone https://github.com/llm-devops/llm-sentinel.git
cd llm-sentinel
Start infrastructure

docker-compose up -d kafka influxdb rabbitmq redis
Build project

cargo build --release
Run tests

cargo test --workspace
Start service

export INFLUXDB_TOKEN="your-token"
export RABBITMQ_PASSWORD="your-password"
./target/release/sentinel --config config/sentinel.yaml

$3

`bash

`Create namespace`


kubectl create namespace sentinel
Create secrets

kubectl create secret generic sentinel-secrets \
  --from-literal=influxdb-token="your-token" \
  --from-literal=rabbitmq-password="your-password" \
  -n sentinel
Deploy with Helm

helm install sentinel ./helm/sentinel \
  --namespace sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"
Verify deployment

kubectl get pods -n sentinel
kubectl logs -f deployment/sentinel -n sentinel
Access dashboards

kubectl port-forward svc/sentinel 8080:8080 -n sentinel
open http://localhost:8080/metrics

$3

`bash

`Check health`


curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready
View metrics

curl http://localhost:8080/metrics
Query recent anomalies

curl http://localhost:8080/api/v1/anomalies/recent?limit=10
Query telemetry

curl "http://localhost:8080/api/v1/telemetry?service=chat-api&hours=1"


Architecture
$3

`┌────────────────────────────────────────────────────────────────────┐ │ LLM Applications │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ OpenAI │ │ Claude │ │ Llama │ │ Custom │ │ │ │ API │ │ API │ │ API │ │ LLM API │ │ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ └───────┼─────────────┼─────────────┼──────────────┼────────────────┘ │ │ │ │ └─────────────┴─────────────┴──────────────┘ │ ▼ (Telemetry Events) ┌─────────────────────────────┐ │ Apache Kafka │ │ (llm.telemetry topic) │ └─────────────┬───────────────┘ │ ▼ ┌────────────────────────────────────────────────────────────────────┐ │ LLM-SENTINEL │ ├────────────────────────────────────────────────────────────────────┤ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │ Ingestion Layer (sentinel-ingestion) │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │ │ │ │ │ Kafka │───>│ OTLP/JSON │───>│ Validation │ │ │ │ │ │ Consumer │ │ Parsing │ │ & PII Filter │ │ │ │ │ └──────────────┘ └──────────────┘ └───────┬───────┘ │ │ │ └────────────────────────────────────────────────────┼─────────┘ │ │ │ │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │ Detection Engine (sentinel-detection) │ │ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ │ │ Baseline Manager (Multi-Dimensional) │ │ │ │ │ │ Per Service × Model × Metric Statistical Baselines │ │ │ │ │ └─────────────────────┬────────────────────────────────┘ │ │ │ │ │ │ │ │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐│ │ │ │ │ Z-Score │ │ IQR │ │ MAD │ │ CUSUM ││ │ │ │ │ Detector │ │ Detector │ │ Detector │ │ Detector ││ │ │ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └────┬─────┘│ │ │ │ └──────────────┴───────────────┴──────────────┘ │ │ │ │ │ (Anomalies) │ │ │ └────────────────────────┼─────────────────────────────────────┘ │ │ │ │ │ ┌────────────────────────┼─────────────────────────────────────┐ │ │ │ ▼ │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ │ │ InfluxDB │ │ RabbitMQ │ │ Webhook │ │ │ │ │ │ Storage │ │ Alerting │ │ Delivery │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ • Telemetry │ │ • Topic │ │ • HMAC │ │ │ │ │ │ • Anomalies │ │ Routing │ │ Signature │ │ │ │ │ │ • Query API │ │ • Severity │ │ • Retry │ │ │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ │ (sentinel-storage) (sentinel-alerting) │ │ │ └──────────────────────────────────────────────────────────────┘ │ │ │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │ REST API (sentinel-api) │ │ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ │ │ │ Health │ │ Metrics │ │ Query │ │ │ │ │ │ /health/ │ │ /metrics │ │ /api/v1/ │ │ │ │ │ └────────────┘ └────────────┘ └────────────┘ │ │ │ └──────────────────────────────────────────────────────────────┘ │ └────────────────────────────────────────────────────────────────────┘ │ │ ▼ ▼ ┌──────────────────┐ ┌──────────────────┐ │ Prometheus │ │ Grafana │ │ (Metrics) │ │ (Dashboards) │ └──────────────────┘ └──────────────────┘`

`$3`

#### sentinel-core - Error types and handling - Configuration models - Telemetry event types - Shared utilities

#### sentinel-ingestion - Kafka consumer with group management - OTLP/JSON parsing - Schema validation - PII detection and sanitization - Configurable message handling

#### sentinel-detection - Statistical baseline management - Four detection algorithms (Z-Score, IQR, MAD, CUSUM) - Multi-dimensional baseline tracking - Lock-free concurrent updates - Baseline persistence

#### sentinel-storage - InfluxDB v3 client with batch writes - In-memory cache (Moka) - Redis distributed cache - Query API for historical data - Automatic TTL management

#### sentinel-alerting - RabbitMQ topic publisher - Webhook HTTP delivery - Alert deduplication (5-minute window) - Exponential backoff retry - HMAC signature generation

#### sentinel-api - REST API server (Axum) - Health check endpoints - Prometheus metrics exporter - Query endpoints for telemetry and anomalies - CORS support

`Observability`

`$3`

Four production-ready dashboards included in deployments/grafana/dashboards/:

#### 1. Anomaly Detection Dashboard - Anomaly detection rate by severity (timeseries) - Total anomaly rate gauge - Anomalies in last 24h (stat) - Anomaly types distribution (bar chart) - Top services by anomalies - Detection latency percentiles (P50, P95, P99)

#### 2. System Health Dashboard - Service status (up/down) - Active instance count - Uptime tracking - Memory usage by instance - CPU usage by instance - Event processing throughput - Error rates by component

#### 3. Performance Metrics Dashboard - Events processed per second - Detection P95 latency - Cache hit rate percentage - Pipeline throughput (Kafka → Detection → Storage) - Storage write rate - API request rate

#### 4. Alert Overview Dashboard - Alerts sent (last hour) - Alerts deduplicated (last hour) - Alert failures (last hour) - RabbitMQ publishes by severity - Webhook delivery success rate - Alert distribution by type

`$3`

50+ metrics exported at /metrics:

Ingestion Metrics: -sentinel_events_received_total- Total events consumed from Kafka -sentinel_events_processed_total- Successfully processed events -sentinel_ingestion_errors_total- Ingestion errors by type -sentinel_validation_failures_total- Validation failures -sentinel_kafka_messages_consumed_total- Kafka messages consumed -sentinel_kafka_consumption_errors_total - Kafka errors

Detection Metrics: -sentinel_anomalies_detected_total- Anomalies by severity and detector -sentinel_detection_latency_seconds- Detection latency histogram -sentinel_detection_errors_total- Detection errors -sentinel_baseline_updates_total- Baseline update count -sentinel_baseline_samples - Current baseline sample counts

Storage Metrics: -sentinel_storage_writes_total- Successful storage writes -sentinel_storage_errors_total- Storage errors -sentinel_cache_hits_total- Cache hits -sentinel_cache_misses_total- Cache misses -sentinel_cache_size - Current cache size

Alerting Metrics: -sentinel_alerts_sent_total- Alerts sent by channel -sentinel_alerts_deduplicated_total- Deduplicated alerts -sentinel_alert_failures_total- Alert delivery failures -sentinel_rabbitmq_publishes_total- RabbitMQ publishes -sentinel_webhook_deliveries_total- Webhook deliveries -sentinel_webhook_failures_total - Webhook failures

`$3`

50+ production-ready alert rules in deployments/prometheus/alerts/sentinel-alerts.yaml:

Service Health: - SentinelServiceDown (1m down) - SentinelHighRestartRate (frequent restarts) - SentinelInsufficientInstances (<2 instances)

Performance: - SentinelHighLatency (P95 >50ms) - SentinelVeryHighLatency (P95 >100ms) - SentinelLowThroughput (<100 events/sec)

Errors: - SentinelHighErrorRate (>10 errors/sec) - SentinelIngestionErrors (>5 errors/sec) - SentinelDetectionErrors (>5 errors/sec) - SentinelStorageErrors (>5 errors/sec)

Resources: - SentinelHighMemoryUsage (>1.5GB) - SentinelCriticalMemoryUsage (>1.8GB) - SentinelHighCPUUsage (>80%)

Anomalies: - SentinelAnomalySpike (>50 anomalies/sec) - SentinelCriticalAnomalies (>5 critical/sec) - SentinelNoAnomalies (0 anomalies for 6h - detection health check)

`API Reference`

`$3`

#### Liveness Probe`bash GET /health/live

Response: 200 OK { "status": "healthy", "timestamp": "2024-11-06T10:30:00Z" }`

#### Readiness Probe`bash GET /health/ready

Response: 200 OK (when ready to accept traffic) Response: 503 Service Unavailable (when not ready)`

`$3`

#### Prometheus Metrics`bash GET /metrics

Response: 200 OK

`HELP sentinel_events_processed_total Total events processed`


TYPE sentinel_events_processed_total counter

sentinel_events_processed_total{service="chat-api"} 12453
...

$3

#### Query Recent Telemetry`bash GET /api/v1/telemetry?service={service}&model={model}&hours={hours}

Example: GET /api/v1/telemetry?service=chat-api&model=gpt-4&hours=24

Response: 200 OK { "data": [...], "count": 1523, "timeRange": { "start": "2024-11-05T10:30:00Z", "end": "2024-11-06T10:30:00Z" } }`

#### Query Anomalies`bash GET /api/v1/anomalies?severity={severity}&hours={hours}&limit={limit}

Example: GET /api/v1/anomalies?severity=critical&hours=1&limit=50

Response: 200 OK { "anomalies": [ { "id": "anom-123", "timestamp": "2024-11-06T10:25:00Z", "service": "chat-api", "model": "gpt-4", "detector": "zscore", "metric": "latency_ms", "value": 15234.5, "baseline_mean": 1234.5, "baseline_stddev": 234.2, "z_score": 59.8, "severity": "critical" } ], "count": 3 }`

#### Query Recent Anomalies`bash GET /api/v1/anomalies/recent?limit={limit}

Example: GET /api/v1/anomalies/recent?limit=10

Response: 200 OK (last 10 anomalies)`

#### Query Baseline Statistics`bash GET /api/v1/baselines?service={service}&model={model}

Response: 200 OK { "baselines": [ { "key": "chat-api:gpt-4:latency_ms", "samples": 1000, "mean": 1234.5, "stddev": 234.2, "median": 1205.3, "p95": 1687.2, "p99": 2103.4 } ] }`

`Example Producers`

`$3`

Full-featured Python example for sending telemetry to Kafka:

`bash cd examples/python

`Install dependencies`


pip install -r requirements.txt
Run with defaults (20 normal + 5 anomalous events)

python producer.py --brokers localhost:9092
Run continuously

python producer.py --continuous
Custom configuration

python producer.py \
  --brokers kafka-0:9092,kafka-1:9092 \
  --topic llm.telemetry \
  --normal-events 50 \
  --anomalous-events 10


Features:
- Configurable event generation
- Simulates 4 anomaly types (high latency, high tokens, high cost, suspicious patterns)
- Kafka integration with guaranteed delivery
- Continuous mode for load testing
See Python Example README
$3
High-performance Go example for production use:

`bash cd examples/go

`Build`


go build -o producer producer.go
Run with defaults

./producer -brokers localhost:9092
Run continuously

./producer -continuous
Custom configuration

./producer \
  -brokers kafka-0:9092,kafka-1:9092 \
  -topic llm.telemetry \
  -normal-events 100 \
  -anomalous-events 20


Features:
- Native Go performance (10,000+ events/sec)
- Graceful shutdown (SIGINT/SIGTERM)
- Connection pooling and retries
- Minimal memory footprint (<20MB)
See Go Example README
$3
Integrate telemetry into your LLM application:

`python from llm_sentinel import TelemetryProducer

`Initialize`


producer = TelemetryProducer(
    brokers=["kafka:9092"],
    topic="llm.telemetry"
)
After each LLM API call

event = producer.create_telemetry_event(
    service_name="my-chatbot",
    model_name="gpt-4",
    latency_ms=response_time,
    prompt_tokens=completion.usage.prompt_tokens,
    completion_tokens=completion.usage.completion_tokens,
    cost_usd=calculated_cost,
    user_id=user.id,
    session_id=session.id
)

producer.send_event(event)`

`Configuration`

`$3`

`yaml

`Ingestion configuration`


ingestion:
  kafka:
    brokers:
      - "kafka-0:9092"
      - "kafka-1:9092"
      - "kafka-2:9092"
    topic: "llm.telemetry"
    group_id: "sentinel-consumer"
    session_timeout_ms: 6000
    enable_auto_commit: false
    auto_offset_reset: "latest"
    max_poll_records: 500
  parsing:
    max_text_length: 10000
    enable_sanitization: true
    sanitize_patterns:
      - "password"
      - "api_key"
      - "secret"
  validation:
    min_latency_ms: 0.1
    max_latency_ms: 300000.0  # 5 minutes
    max_tokens: 100000
    max_cost_usd: 100.0
    enable_pii_detection: true
Detection configuration

detection:
  enabled_detectors:
    - "zscore"
    - "iqr"
    - "mad"
    - "cusum"
  baseline:
    window_size: 1000
    min_samples: 10
    update_interval_secs: 60
    enable_persistence: true
    persistence_path: "/var/lib/sentinel/baselines"
  zscore:
    threshold: 3.0
    sensitivity: "medium"  # low, medium, high
    metrics:
      - "latency_ms"
      - "total_tokens"
      - "cost_usd"
  iqr:
    multiplier: 1.5
    metrics:
      - "latency_ms"
      - "total_tokens"
  mad:
    threshold: 3.5
    metrics:
      - "latency_ms"
  cusum:
    threshold: 5.0
    drift: 0.5
    metrics:
      - "latency_ms"
Storage configuration

storage:
  influxdb:
    url: "http://influxdb:8086"
    org: "sentinel"
    token: "${INFLUXDB_TOKEN}"
    telemetry_bucket: "telemetry"
    anomaly_bucket: "anomalies"
    batch_size: 100
    timeout_secs: 10
  cache:
    max_capacity: 10000
    ttl_secs: 300
    tti_secs: 60
    enable_metrics: true
  redis:
    enabled: true
    url: "redis://redis:6379"
    password: "${REDIS_PASSWORD}"
    key_prefix: "sentinel:"
    ttl_secs: 300
Alerting configuration

alerting:
  rabbitmq:
    url: "amqp://rabbitmq:5672"
    username: "sentinel"
    password: "${RABBITMQ_PASSWORD}"
    exchange: "sentinel.alerts"
    exchange_type: "topic"
    routing_key_prefix: "alert"
    persistent: true
    timeout_secs: 10
    retry_config:
      max_attempts: 3
      initial_delay_ms: 1000
      backoff_multiplier: 2.0
      max_delay_ms: 30000
  webhook:
    enabled: false
    url: "https://alerts.example.com/webhook"
    method: "POST"
    secret: "${WEBHOOK_SECRET}"
    timeout_secs: 10
    max_retries: 3
    retry_delay_ms: 1000
    backoff_multiplier: 2.0
  deduplication:
    enabled: true
    window_secs: 300
    cleanup_interval_secs: 60
API configuration

api:
  bind_addr: "0.0.0.0:8080"
  enable_cors: true
  cors_origins:
    - "*"
  timeout_secs: 30
  max_body_size: 10485760  # 10MB
  enable_logging: true
  metrics_path: "/metrics"


See config/sentinel.yaml for a complete annotated example.
$3
All sensitive configuration can be provided via environment variables:

`bash export INFLUXDB_TOKEN="your-influxdb-token" export RABBITMQ_PASSWORD="your-rabbitmq-password" export REDIS_PASSWORD="your-redis-password" export WEBHOOK_SECRET="your-webhook-secret" export SENTINEL_LOG_LEVEL="info" export SENTINEL_LOG_JSON="true" export RUST_BACKTRACE="1"`

`Performance`

`$3`

Tested on AWS m5.xlarge (4 vCPU, 16GB RAM):

| Operation | Throughput | P50 | P95 | P99 | |-----------|------------|-----|-----|-----| | Ingestion | 10,500 events/sec | 2ms | 8ms | 15ms | | Detection | 5,200 detections/sec | 3ms | 12ms | 25ms | | Storage (batched) | 8,300 writes/sec | 5ms | 18ms | 35ms | | API Queries | 2,800 req/sec | 8ms | 25ms | 48ms |

`$3`

| Configuration | Memory | CPU | Disk I/O | |---------------|--------|-----|----------| | Idle | 180MB | 0.1 cores | 1 MB/s | | Light load (1k events/sec) | 320MB | 0.8 cores | 15 MB/s | | Medium load (5k events/sec) | 520MB | 1.8 cores | 42 MB/s | | Heavy load (10k events/sec) | 850MB | 2.9 cores | 78 MB/s |

`$3`

- Horizontal: Linear scaling up to 10 replicas - Vertical: Efficient use of multi-core (up to 8 cores) - Storage: InfluxDB handles 100k+ writes/sec with proper sizing - Network: ~10 Mbps at 10k events/sec (depends on event size)

`Development`

`$3`

`llm-sentinel/ ├── crates/ │ ├── sentinel-core/ # Core types and error handling (1,350 lines) │ ├── sentinel-ingestion/ # Kafka consumer and OTLP parsing (1,390 lines) │ ├── sentinel-detection/ # Anomaly detection algorithms (2,319 lines) │ ├── sentinel-storage/ # InfluxDB and caching (987 lines) │ ├── sentinel-alerting/ # RabbitMQ and webhooks (1,645 lines) │ └── sentinel-api/ # REST API server (1,452 lines) ├── sentinel/ # Main binary (285 lines) ├── config/ # Configuration examples ├── deployments/ # Deployment configurations │ ├── grafana/ # 4 Grafana dashboards │ └── prometheus/ # 50+ alert rules ├── docs/ # Complete documentation (27 files) ├── examples/ # Example producers │ ├── python/ # Python producer example │ └── go/ # Go producer example ├── helm/ # Helm chart │ └── sentinel/ # Production-ready chart ├── k8s/ # Kubernetes manifests (13 files) ├── .github/workflows/ # CI/CD pipelines ├── Cargo.toml # Rust workspace ├── docker-compose.yaml # Local development environment └── Dockerfile # Multi-stage Docker build`

Total: ~9,500 lines of production Rust code

`$3`

`bash

`All tests`


cargo test --workspace
Specific crate

cargo test -p sentinel-detection
With output

cargo test -- --nocapture
Integration tests only

cargo test --workspace --test '*'
With coverage

cargo tarpaulin --out Html --output-dir coverage

$3

`bash

`Debug build`


cargo build
Release build (optimized)

cargo build --release
With specific features

cargo build --features redis
Docker build

docker build -t llm-sentinel:latest .
Multi-platform Docker build

docker buildx build --platform linux/amd64,linux/arm64 -t llm-sentinel:latest .

$3

`bash

`Format code`


cargo fmt
Lint with clippy

cargo clippy -- -D warnings
Security audit

cargo audit
Check dependencies

cargo deny check


CI/CD Pipeline
Full GitHub Actions workflow:
$3
- Format check (rustfmt)
- Lint (clippy with all warnings as errors)
- Build (debug and release)
- Test (all workspaces)
- Coverage (tarpaulin with 80% target)
- Security audit (cargo-audit)
- Dependency check (cargo-deny)
$3
- Docker build (multi-stage)
- SBOM generation (syft)
- Image signing (cosign)
- Vulnerability scan (trivy)
- Push to registry (GHCR)
- Helm package (chart packaging)
- Deploy (to staging/production)
Contributing
We welcome contributions! Please see CONTRIBUTING.md for:
- Code of Conduct
- Development setup
- Coding standards
- Testing requirements
- Pull request process
- Security reporting
Documentation

Complete documentation available in /docs`:

- Architecture Overview
- Detection Methods
- Deployment Guide
- Integration Examples
- Performance Benchmarks
- Configuration Reference

License

This project is licensed under the Apache License 2.0 - see LICENSE file for details.

Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Complete docs
- Security: security@llm-devops.io

Roadmap

- [ ] Additional detection algorithms (Isolation Forest, AutoEncoder)
- [ ] Real-time dashboard (WebSocket streaming)
- [ ] Multi-tenant support
- [ ] Advanced query DSL
- [ ] Machine learning model drift detection
- [ ] Automated baseline tuning
- [ ] Support for additional message brokers (NATS, Pulsar)
- [ ] OpenTelemetry native ingestion
- [ ] Grafana plugin for custom visualizations

---

Status: ✅ Production Ready
Version: 0.1.0
License: Apache 2.0
Built with: Rust 🦀
Last Updated: 2025-11-06

LLM-Sentinel

Enterprise-Grade Anomaly Detection and Observability for LLM Applications

![Rust](https://www.rust-lang.org/)
![License](LICENSE)
![Build]()
![Kubernetes](https://kubernetes.io/)
![Helm](https://helm.sh/)

Status: ✅ Production Ready - Full implementation with enterprise-grade deployment infrastructure

- Key Features
- Detection Capabilities
- Deployment Options
- Quick Start
- Architecture
- Observability
- API Reference
- Example Producers
- Configuration
- Performance
- Contributing

Key Features

$3

Monitor all critical LLM metrics:

$3

Detection Capabilities

$3

LLM-Sentinel provides four complementary detection algorithms that can be enabled individually or in combination:

#### 1. Z-Score Detection (Parametric)

Best for: Metrics that follow a normal distribution

``yaml detection: zscore: threshold: 3.0 # Standard deviations from mean sensitivity: "medium" # low, medium, high metrics: - latency_ms - total_tokens - cost_usd`

Use Cases: - Latency spike detection - Token usage monitoring - Cost anomaly detection

#### 2. IQR Detection (Non-Parametric)

Best for: Metrics with skewed distributions or outliers

`yaml detection: iqr: multiplier: 1.5 # IQR multiplier for outliers metrics: - latency_ms - total_tokens`

Use Cases: - Robust outlier detection - Handling non-normal distributions - Resistant to extreme values

#### 3. MAD Detection (Robust)

Best for: Metrics requiring high robustness to outliers

`yaml detection: mad: threshold: 3.5 # MAD threshold metrics: - latency_ms`

Use Cases: - Ultra-robust detection - Minimal false positives - Gradual baseline shifts

#### 4. CUSUM Detection (Change Point)

Best for: Detecting sustained changes and drift

`yaml detection: cusum: threshold: 5.0 # Cumulative threshold drift: 0.5 # Drift parameter metrics: - latency_ms`

Use Cases: - Model performance degradation - Service quality drift - Gradual system changes

`$3`

`Deployment Options`

`$3`

Perfect for local development and testing:

`bash

`Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)`


docker-compose up -d
View logs

docker-compose logs -f sentinel
Stop environment

docker-compose down


Includes:
- Sentinel (3 replicas)
- Kafka cluster (3 brokers)
- InfluxDB v3
- RabbitMQ with management UI
- Redis
- Prometheus
- Grafana with pre-loaded dashboards
$3
Production-grade deployment with all manifests:

`bash

`Apply all manifests`


kubectl apply -f k8s/
Or using kustomize

kubectl apply -k k8s/
Check status

kubectl get pods -l app.kubernetes.io/name=sentinel


Includes:
- Deployment with 3 replicas
- HorizontalPodAutoscaler (3-10 replicas, CPU/memory targets)
- PodDisruptionBudget (min 2 available)
- NetworkPolicy (restricted ingress/egress)
- ServiceMonitor (Prometheus integration)
- Ingress with TLS
$3
Easiest production deployment with full parameterization:

`bash

`Install with default values`


helm install sentinel ./helm/sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"
Install with custom values

helm install sentinel ./helm/sentinel -f production-values.yaml
Upgrade

helm upgrade sentinel ./helm/sentinel
Uninstall

helm uninstall sentinel


Helm Chart Features:
- Full configuration parameterization (335+ options)
- External secrets support (AWS Secrets Manager, Vault, etc.)
- Multiple environment profiles (dev, staging, prod)
- Init containers for dependency checking
- Automatic ConfigMap/Secret generation
- Post-install notes with helpful commands
See Helm Chart README for complete documentation.
$3
Direct installation on Linux servers:

`bash

`Build release binary`


cargo build --release
Install

sudo cp target/release/sentinel /usr/local/bin/
Create config

sudo mkdir -p /etc/sentinel
sudo cp config/sentinel.yaml /etc/sentinel/
Create systemd service

sudo cp deployments/systemd/sentinel.service /etc/systemd/system/
sudo systemctl enable sentinel
sudo systemctl start sentinel


Quick Start
$3
- Rust 1.75+ (for building from source)
- Kafka 2.8+ (message queue)
- InfluxDB v3 (time-series database)
- RabbitMQ 3.8+ (optional, for alerting)
- Redis 6.0+ (optional, for distributed caching)
$3

`bash

`Clone repository`


git clone https://github.com/llm-devops/llm-sentinel.git
cd llm-sentinel
Start infrastructure

docker-compose up -d kafka influxdb rabbitmq redis
Build project

cargo build --release
Run tests

cargo test --workspace
Start service

export INFLUXDB_TOKEN="your-token"
export RABBITMQ_PASSWORD="your-password"
./target/release/sentinel --config config/sentinel.yaml

$3

`bash

`Create namespace`


kubectl create namespace sentinel
Create secrets

kubectl create secret generic sentinel-secrets \
  --from-literal=influxdb-token="your-token" \
  --from-literal=rabbitmq-password="your-password" \
  -n sentinel
Deploy with Helm

helm install sentinel ./helm/sentinel \
  --namespace sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"
Verify deployment

kubectl get pods -n sentinel
kubectl logs -f deployment/sentinel -n sentinel
Access dashboards

kubectl port-forward svc/sentinel 8080:8080 -n sentinel
open http://localhost:8080/metrics

$3

`bash

`Check health`


curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready
View metrics

curl http://localhost:8080/metrics
Query recent anomalies

curl http://localhost:8080/api/v1/anomalies/recent?limit=10
Query telemetry

curl "http://localhost:8080/api/v1/telemetry?service=chat-api&hours=1"


Architecture
$3

`$3`

#### sentinel-core - Error types and handling - Configuration models - Telemetry event types - Shared utilities

#### sentinel-ingestion - Kafka consumer with group management - OTLP/JSON parsing - Schema validation - PII detection and sanitization - Configurable message handling

#### sentinel-storage - InfluxDB v3 client with batch writes - In-memory cache (Moka) - Redis distributed cache - Query API for historical data - Automatic TTL management

#### sentinel-alerting - RabbitMQ topic publisher - Webhook HTTP delivery - Alert deduplication (5-minute window) - Exponential backoff retry - HMAC signature generation

#### sentinel-api - REST API server (Axum) - Health check endpoints - Prometheus metrics exporter - Query endpoints for telemetry and anomalies - CORS support

`Observability`

`$3`

Four production-ready dashboards included in deployments/grafana/dashboards/:

`$3`

50+ metrics exported at /metrics:

`$3`

50+ production-ready alert rules in deployments/prometheus/alerts/sentinel-alerts.yaml:

Service Health: - SentinelServiceDown (1m down) - SentinelHighRestartRate (frequent restarts) - SentinelInsufficientInstances (<2 instances)

Performance: - SentinelHighLatency (P95 >50ms) - SentinelVeryHighLatency (P95 >100ms) - SentinelLowThroughput (<100 events/sec)

Errors: - SentinelHighErrorRate (>10 errors/sec) - SentinelIngestionErrors (>5 errors/sec) - SentinelDetectionErrors (>5 errors/sec) - SentinelStorageErrors (>5 errors/sec)

Resources: - SentinelHighMemoryUsage (>1.5GB) - SentinelCriticalMemoryUsage (>1.8GB) - SentinelHighCPUUsage (>80%)

Anomalies: - SentinelAnomalySpike (>50 anomalies/sec) - SentinelCriticalAnomalies (>5 critical/sec) - SentinelNoAnomalies (0 anomalies for 6h - detection health check)

`API Reference`

`$3`

#### Liveness Probe`bash GET /health/live

Response: 200 OK { "status": "healthy", "timestamp": "2024-11-06T10:30:00Z" }`

#### Readiness Probe`bash GET /health/ready

Response: 200 OK (when ready to accept traffic) Response: 503 Service Unavailable (when not ready)`

`$3`

#### Prometheus Metrics`bash GET /metrics

Response: 200 OK

`HELP sentinel_events_processed_total Total events processed`


TYPE sentinel_events_processed_total counter

sentinel_events_processed_total{service="chat-api"} 12453
...

$3

#### Query Recent Telemetry`bash GET /api/v1/telemetry?service={service}&model={model}&hours={hours}

Example: GET /api/v1/telemetry?service=chat-api&model=gpt-4&hours=24

Response: 200 OK { "data": [...], "count": 1523, "timeRange": { "start": "2024-11-05T10:30:00Z", "end": "2024-11-06T10:30:00Z" } }`

#### Query Anomalies`bash GET /api/v1/anomalies?severity={severity}&hours={hours}&limit={limit}

Example: GET /api/v1/anomalies?severity=critical&hours=1&limit=50

#### Query Recent Anomalies`bash GET /api/v1/anomalies/recent?limit={limit}

Example: GET /api/v1/anomalies/recent?limit=10

Response: 200 OK (last 10 anomalies)`

#### Query Baseline Statistics`bash GET /api/v1/baselines?service={service}&model={model}

Response: 200 OK { "baselines": [ { "key": "chat-api:gpt-4:latency_ms", "samples": 1000, "mean": 1234.5, "stddev": 234.2, "median": 1205.3, "p95": 1687.2, "p99": 2103.4 } ] }`

`Example Producers`

`$3`

Full-featured Python example for sending telemetry to Kafka:

`bash cd examples/python

`Install dependencies`


pip install -r requirements.txt
Run with defaults (20 normal + 5 anomalous events)

python producer.py --brokers localhost:9092
Run continuously

python producer.py --continuous
Custom configuration

python producer.py \
  --brokers kafka-0:9092,kafka-1:9092 \
  --topic llm.telemetry \
  --normal-events 50 \
  --anomalous-events 10


Features:
- Configurable event generation
- Simulates 4 anomaly types (high latency, high tokens, high cost, suspicious patterns)
- Kafka integration with guaranteed delivery
- Continuous mode for load testing
See Python Example README
$3
High-performance Go example for production use:

`bash cd examples/go

`Build`


go build -o producer producer.go
Run with defaults

./producer -brokers localhost:9092
Run continuously

./producer -continuous
Custom configuration

./producer \
  -brokers kafka-0:9092,kafka-1:9092 \
  -topic llm.telemetry \
  -normal-events 100 \
  -anomalous-events 20


Features:
- Native Go performance (10,000+ events/sec)
- Graceful shutdown (SIGINT/SIGTERM)
- Connection pooling and retries
- Minimal memory footprint (<20MB)
See Go Example README
$3
Integrate telemetry into your LLM application:

`python from llm_sentinel import TelemetryProducer

`Initialize`


producer = TelemetryProducer(
    brokers=["kafka:9092"],
    topic="llm.telemetry"
)
After each LLM API call

event = producer.create_telemetry_event(
    service_name="my-chatbot",
    model_name="gpt-4",
    latency_ms=response_time,
    prompt_tokens=completion.usage.prompt_tokens,
    completion_tokens=completion.usage.completion_tokens,
    cost_usd=calculated_cost,
    user_id=user.id,
    session_id=session.id
)

producer.send_event(event)`

`Configuration`

`$3`

`yaml

`Ingestion configuration`


ingestion:
  kafka:
    brokers:
      - "kafka-0:9092"
      - "kafka-1:9092"
      - "kafka-2:9092"
    topic: "llm.telemetry"
    group_id: "sentinel-consumer"
    session_timeout_ms: 6000
    enable_auto_commit: false
    auto_offset_reset: "latest"
    max_poll_records: 500
  parsing:
    max_text_length: 10000
    enable_sanitization: true
    sanitize_patterns:
      - "password"
      - "api_key"
      - "secret"
  validation:
    min_latency_ms: 0.1
    max_latency_ms: 300000.0  # 5 minutes
    max_tokens: 100000
    max_cost_usd: 100.0
    enable_pii_detection: true
Detection configuration

detection:
  enabled_detectors:
    - "zscore"
    - "iqr"
    - "mad"
    - "cusum"
  baseline:
    window_size: 1000
    min_samples: 10
    update_interval_secs: 60
    enable_persistence: true
    persistence_path: "/var/lib/sentinel/baselines"
  zscore:
    threshold: 3.0
    sensitivity: "medium"  # low, medium, high
    metrics:
      - "latency_ms"
      - "total_tokens"
      - "cost_usd"
  iqr:
    multiplier: 1.5
    metrics:
      - "latency_ms"
      - "total_tokens"
  mad:
    threshold: 3.5
    metrics:
      - "latency_ms"
  cusum:
    threshold: 5.0
    drift: 0.5
    metrics:
      - "latency_ms"
Storage configuration

storage:
  influxdb:
    url: "http://influxdb:8086"
    org: "sentinel"
    token: "${INFLUXDB_TOKEN}"
    telemetry_bucket: "telemetry"
    anomaly_bucket: "anomalies"
    batch_size: 100
    timeout_secs: 10
  cache:
    max_capacity: 10000
    ttl_secs: 300
    tti_secs: 60
    enable_metrics: true
  redis:
    enabled: true
    url: "redis://redis:6379"
    password: "${REDIS_PASSWORD}"
    key_prefix: "sentinel:"
    ttl_secs: 300
Alerting configuration

alerting:
  rabbitmq:
    url: "amqp://rabbitmq:5672"
    username: "sentinel"
    password: "${RABBITMQ_PASSWORD}"
    exchange: "sentinel.alerts"
    exchange_type: "topic"
    routing_key_prefix: "alert"
    persistent: true
    timeout_secs: 10
    retry_config:
      max_attempts: 3
      initial_delay_ms: 1000
      backoff_multiplier: 2.0
      max_delay_ms: 30000
  webhook:
    enabled: false
    url: "https://alerts.example.com/webhook"
    method: "POST"
    secret: "${WEBHOOK_SECRET}"
    timeout_secs: 10
    max_retries: 3
    retry_delay_ms: 1000
    backoff_multiplier: 2.0
  deduplication:
    enabled: true
    window_secs: 300
    cleanup_interval_secs: 60
API configuration

api:
  bind_addr: "0.0.0.0:8080"
  enable_cors: true
  cors_origins:
    - "*"
  timeout_secs: 30
  max_body_size: 10485760  # 10MB
  enable_logging: true
  metrics_path: "/metrics"


See config/sentinel.yaml for a complete annotated example.
$3
All sensitive configuration can be provided via environment variables:

`Performance`

`$3`

Tested on AWS m5.xlarge (4 vCPU, 16GB RAM):

`$3`

`Development`

`$3`

Total: ~9,500 lines of production Rust code

`$3`

`bash

`All tests`


cargo test --workspace
Specific crate

cargo test -p sentinel-detection
With output

cargo test -- --nocapture
Integration tests only

cargo test --workspace --test '*'
With coverage

cargo tarpaulin --out Html --output-dir coverage

$3

`bash

`Debug build`


cargo build
Release build (optimized)

cargo build --release
With specific features

cargo build --features redis
Docker build

docker build -t llm-sentinel:latest .
Multi-platform Docker build

docker buildx build --platform linux/amd64,linux/arm64 -t llm-sentinel:latest .

$3

`bash

`Format code`


cargo fmt
Lint with clippy

cargo clippy -- -D warnings
Security audit

cargo audit
Check dependencies

cargo deny check


CI/CD Pipeline
Full GitHub Actions workflow:
$3
- Format check (rustfmt)
- Lint (clippy with all warnings as errors)
- Build (debug and release)
- Test (all workspaces)
- Coverage (tarpaulin with 80% target)
- Security audit (cargo-audit)
- Dependency check (cargo-deny)
$3
- Docker build (multi-stage)
- SBOM generation (syft)
- Image signing (cosign)
- Vulnerability scan (trivy)
- Push to registry (GHCR)
- Helm package (chart packaging)
- Deploy (to staging/production)
Contributing
We welcome contributions! Please see CONTRIBUTING.md for:
- Code of Conduct
- Development setup
- Coding standards
- Testing requirements
- Pull request process
- Security reporting
Documentation

Complete documentation available in /docs`:

- Architecture Overview
- Detection Methods
- Deployment Guide
- Integration Examples
- Performance Benchmarks
- Configuration Reference

License

This project is licensed under the Apache License 2.0 - see LICENSE file for details.

Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Complete docs
- Security: security@llm-devops.io

Roadmap

---

Status: ✅ Production Ready
Version: 0.1.0
License: Apache 2.0
Built with: Rust 🦀
Last Updated: 2025-11-06

@llm-dev-ops/sentinel-cli

LLM-Sentinel

Table of Contents

Key Features

$3

$3

$3

$3

$3

$3

$3

$3

$3

Detection Capabilities

$3

$3

Deployment Options

$3

Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)

View logs

Stop environment

$3

Apply all manifests

Or using kustomize

Check status

$3

Install with default values

Install with custom values

Upgrade

Uninstall

$3

Build release binary

Install

Create config

Create systemd service

Quick Start

$3

$3

Clone repository

Start infrastructure

Build project

Run tests

Start service

$3

Create namespace

Create secrets

Deploy with Helm

Verify deployment

Access dashboards

$3

Check health

View metrics

Query recent anomalies

Query telemetry

Architecture

$3

$3

Observability

$3

$3

$3

API Reference

$3

$3

HELP sentinel_events_processed_total Total events processed

TYPE sentinel_events_processed_total counter

$3

Example Producers

$3

Install dependencies

Run with defaults (20 normal + 5 anomalous events)

Run continuously

Custom configuration

$3

Build

Run with defaults

Run continuously

Custom configuration

$3

Initialize

`$3`

`Deployment Options`

`$3`

`Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)`

`Apply all manifests`

`Install with default values`

`Build release binary`

`Clone repository`

`Create namespace`

`Check health`

`$3`

`Observability`

`$3`

`$3`

`$3`

`API Reference`

`$3`

`$3`

`HELP sentinel_events_processed_total Total events processed`

`Example Producers`

`$3`

`Install dependencies`

`Build`

`Initialize`

`Configuration`

`$3`

`Ingestion configuration`

`Performance`

`$3`

`$3`

`$3`

`Development`

`$3`

`$3`

`All tests`

`Debug build`

`Format code`

`$3`

`Deployment Options`

`$3`

`Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)`

`Apply all manifests`

`Install with default values`