LLM Observability with Langfuse
With AI features in multiple portfolio apps, I needed visibility into LLM usage. Today I deployed Langfuse - an open-source LLM observability platform.
Why Langfuse?
Running local LLMs and calling external APIs (OpenAI, Anthropic) without observability is flying blind:
- What prompts are being sent?
- What's the latency distribution?
- How much is each feature costing?
- Are there errors or rate limits?
- What's the token usage per user?
Langfuse answers all of these with tracing, analytics, and prompt management.
Architecture
Langfuse consists of:
- Web UI - Dashboard for traces and analytics
- API - Ingestion endpoint for trace data
- ClickHouse - Analytics database for fast aggregations
- PostgreSQL - Metadata and user management
- S3-compatible storage - For large payloads (MinIO in my case)
Deployment
I created a dedicated namespace and started with the sensitive configuration, pulled in via External Secrets:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: langfuse
---
# Using External Secrets for sensitive config
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: langfuse-secrets
  namespace: langfuse
spec:
  secretStoreRef:
    name: doppler-secret-store
    kind: ClusterSecretStore
  target:
    name: langfuse-secrets
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: LANGFUSE_DATABASE_URL
    - secretKey: NEXTAUTH_SECRET
      remoteRef:
        key: LANGFUSE_NEXTAUTH_SECRET
    - secretKey: SALT
      remoteRef:
        key: LANGFUSE_SALT
    - secretKey: ENCRYPTION_KEY
      remoteRef:
        key: LANGFUSE_ENCRYPTION_KEY
```
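The ExternalSecret above assumes a cluster-wide Doppler store already exists. For context, a minimal sketch of what that ClusterSecretStore might look like with the External Secrets Operator's Doppler provider; the token Secret's name, key, and namespace here are assumptions, not copied from my actual config:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: doppler-secret-store
spec:
  provider:
    doppler:
      auth:
        secretRef:
          dopplerToken:
            # Assumed Secret holding a Doppler service token
            name: doppler-token
            key: dopplerToken
            namespace: external-secrets
```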
The main deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langfuse-web
  namespace: langfuse
spec:
  replicas: 1
  selector:
    matchLabels:
      app: langfuse-web
  template:
    metadata:
      labels:
        app: langfuse-web
    spec:
      containers:
        - name: langfuse
          image: langfuse/langfuse:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: langfuse-secrets
                  key: DATABASE_URL
            - name: NEXTAUTH_URL
              value: "https://langfuse.el-jefe.me"
            - name: CLICKHOUSE_URL
              value: "http://langfuse-clickhouse:8123"
```
ClickHouse for Analytics
ClickHouse handles the analytics workload:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: langfuse-clickhouse
  namespace: langfuse
spec:
  serviceName: langfuse-clickhouse
  replicas: 1
  selector:
    matchLabels:
      app: langfuse-clickhouse
  template:
    metadata:
      labels:
        app: langfuse-clickhouse
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:24.1
          ports:
            - containerPort: 8123   # HTTP interface
            - containerPort: 9000   # native protocol
          volumeMounts:
            - name: clickhouse-data
              mountPath: /var/lib/clickhouse
  volumeClaimTemplates:
    - metadata:
        name: clickhouse-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```
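The StatefulSet's serviceName and the CLICKHOUSE_URL in the web deployment both point at a langfuse-clickhouse Service that isn't shown above. A headless Service along these lines (a sketch, assuming the labels from the StatefulSet) closes that gap:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: langfuse-clickhouse
  namespace: langfuse
spec:
  clusterIP: None   # headless, as expected by the StatefulSet's serviceName
  selector:
    app: langfuse-clickhouse
  ports:
    - name: http
      port: 8123
    - name: native
      port: 9000
```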
Instrumenting Applications
I added Langfuse tracing to my AI-powered apps. For the Python backend:
```python
import os

import ollama
from langfuse import Langfuse

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://langfuse.el-jefe.me",
)


def generate_response(user_message: str) -> str:
    # One trace per request, with a generation span for the LLM call
    trace = langfuse.trace(name="generate-response")
    generation = trace.generation(
        name="llama-3.2-response",
        model="llama3.2:3b",
        input=user_message,
    )
    response = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user", "content": user_message}],
    )
    generation.end(
        output=response["message"]["content"],
        usage={
            "input": response["prompt_eval_count"],
            "output": response["eval_count"],
        },
    )
    return response["message"]["content"]
```
For JavaScript/Node.js:
```javascript
import OpenAI from 'openai';
import { Langfuse } from 'langfuse';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  baseUrl: 'https://langfuse.el-jefe.me',
});

async function chat(message) {
  const trace = langfuse.trace({ name: 'chat-request' });
  const generation = trace.generation({
    name: 'openai-completion',
    model: 'gpt-4o-mini',
    input: message,
  });
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: message }],
  });
  generation.end({
    output: response.choices[0].message.content,
    usage: {
      input: response.usage.prompt_tokens,
      output: response.usage.completion_tokens,
    },
  });
  await langfuse.flushAsync();
  return response;
}
```
What I Can See Now
The Langfuse dashboard shows:
Traces
Every LLM interaction with full context:
- Input prompt
- Output completion
- Latency breakdown
- Token counts
- Model used
- User ID (for per-user analytics)
Metrics
- P50/P95/P99 latency by model
- Token usage over time
- Cost tracking (for paid APIs)
- Error rates and types
Prompt Management
I can version prompts in Langfuse and A/B test different versions without code changes.
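To show what that looks like from application code, here is a rough sketch of fetching a managed prompt with the Python SDK; the prompt name and template variables are made up for illustration:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # keys/host from LANGFUSE_* environment variables

# "support-reply" is a hypothetical prompt name, not one of my actual prompts
prompt = langfuse.get_prompt("support-reply", label="production")

# Fill in the prompt's template variables before sending it to the model
compiled = prompt.compile(customer_name="Ada", tone="friendly")
```

Promoting a different version to the production label in the Langfuse UI changes what the app serves, which is what makes A/B testing possible without code changes.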
Integration with Dashboard
I added a Langfuse link to my portfolio dashboard's monitoring menu:
```javascript
const monitoringLinks = [
  { name: 'Grafana', url: 'https://grafana.el-jefe.me' },
  { name: 'Prometheus', url: 'https://prometheus.el-jefe.me' },
  { name: 'Langfuse', url: 'https://langfuse.el-jefe.me' }, // New!
];
```
Sample Insights
After a week of data:
| Metric | Value |
|---|---|
| Total traces | 1,247 |
| Avg latency (Ollama local) | 2.3s |
| Avg latency (GPT-4o-mini) | 0.8s |
| Total tokens (local) | 892K |
| Total tokens (OpenAI) | 124K |
| Est. OpenAI cost | $0.18 |
The local Ollama inference is slower but free. For user-facing features where speed matters, I fall back to GPT-4o-mini.
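The routing itself is simple. A sketch of the idea (the function and flag names are mine, not from the actual apps): latency-sensitive, user-facing paths go to GPT-4o-mini, everything else stays on local Ollama.

```python
import ollama
from openai import OpenAI

openai_client = OpenAI()  # OPENAI_API_KEY from the environment


def complete(prompt: str, latency_sensitive: bool = False) -> str:
    """Route user-facing calls to GPT-4o-mini, background work to local Ollama."""
    if latency_sensitive:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    response = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]
```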
Lessons Learned
- Instrument early - Adding tracing retroactively is tedious
- Include user context - Per-user analytics help identify abuse (see the sketch after this list)
- Set up cost alerts - Easy to accidentally burn through API credits
- Use async flush - Don't block requests waiting for Langfuse
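For the user-context and flush points above, the trace is the right place to handle both. A minimal Python sketch, where the trace name, user ID, and metadata values are placeholders:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # keys/host from LANGFUSE_* environment variables

# Tag the trace with the caller so the dashboard can break usage down per user
trace = langfuse.trace(
    name="chat-request",           # placeholder name
    user_id="user-1234",           # whatever identifier your auth layer provides
    metadata={"feature": "chat"},  # optional extra context
)

# Events are queued and sent by a background thread, so the request path
# isn't blocked; call flush() explicitly only at shutdown or in short-lived jobs.
langfuse.flush()
```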
Documenting the evolution of my homelab infrastructure.
