Braintrust
★ 4.5
Enterprise LLM eval platform — logging, evals, and prompt iteration with strong offline scoring.
Weights & Biases
The MLOps platform for tracking, visualizing, and optimizing ML experiments and model training.
Langfuse
Open-source LLM engineering platform — trace, evaluate, and debug your AI application in production.
Helicone
LLM observability proxy — one line of code to monitor costs, latency, and quality across all AI calls.
Arize AI
ML and LLM observability — model monitoring, drift detection, and agent tracing at enterprise scale.
LangSmith
The observability platform from LangChain — tracing, eval, and prompt management for LLM apps.
Humanloop
Prompt management and eval platform for enterprise LLM applications — collaboration between engineers and subject-matter experts.
PromptLayer
Prompt registry and observability — manage, version, and monitor prompts across LLM providers.
AgentOps
Observability and monitoring for AI agents — trace runs, measure costs, and debug multi-agent systems.