LLM evaluation with W&B
Weights & Biases · evals, llm apps, observability, mlops
AI directory search
Use this when you know the topic you need: Claude Code, MCP, evals, RAG, agents, product, coding, prompting, foundations, or model internals.
33 matches for "Evals"
Watch first when you want a fast feel for the topic before opening courses, docs, or profiles.
Weights & Biases · evals, llm apps, observability, mlops
Arize AI · evals, observability, tracing, rag debugging
Promptfoo · evals, prompt testing, red teaming, security
Hamel Husain and Shreya Shankar · evals, product, llm reliability
Hamel's AI evals guides · Intermediate to advanced
Very practical material on evaluating LLM apps before they disappoint users.
Skills
Evals, RAG, LLM product quality
AI Evals for Engineers and PMs · Intermediate
Useful if you need to judge whether an AI feature is actually improving.
Skills
Evals, LLM reliability, Product quality
AI Hero · Beginner to advanced
Practical developer-focused AI education across LLM fundamentals, AI SDK app development, MCP, Claude Code workflows, agent-ready codebases, evals, TDD, handoffs, and reusable skills such as /teach, /grill-me, /to-prd, /to-issues, /tdd, /triage, and /handoff.
Skills
AI coding, Claude Skills, Agentic workflows, AI SDK, MCP, LLM fundamentals, Personalized learning
Hamza Farooq on Maven · Beginner to intermediate
Useful for PMs who need to design, evaluate, and ship reliable AI systems beyond impressive demos.
Skills
Agentic AI, AI product strategy, Evals, Production AI
W&B Courses · Intermediate
Good for builders who need to measure, debug, and improve LLM apps rather than just demo them.
Topics
LLM apps, Evals, Experiment tracking, MLOps
OpenAI model docs and Cookbook · Beginner to advanced
Official model and implementation material for learning the tradeoffs among the current GPT-5.5, GPT-5.5 Pro, and GPT-5.4 models, plus Codex workflows, agent evals, MCP and connector patterns, retrieval, model optimization, and structured outputs.
Topics
GPT models, Reasoning models, Model selection, Agents, RAG, Structured outputs, MCP, Evals
Useful for debugging and evaluating LLM applications once you move beyond prototypes.
Topics
Observability, Evals, Tracing, RAG debugging
Langfuse Docs · Intermediate
Good operational material for tracing, scoring, and improving production LLM apps.
Topics
Observability, Prompt management, Evals, Tracing
Vellum Guides · Beginner to intermediate
Useful for product and ops teams that need practical LLM product concepts without getting lost in research.
Topics
Prompt management, Evals, Workflow design
Humanloop Blog and Docs · Intermediate
Useful for teams building repeatable AI product processes around prompts, datasets, and evaluations.
Topics
Prompt management, Evals, LLM workflows
Promptfoo Docs · Intermediate
Very practical for regression testing prompts, model changes, and LLM outputs.
Topics
Prompt testing, Evals, Red teaming
Maven AI courses · Beginner to advanced
Useful discovery surface for live courses taught by practitioners across AI product, work, and engineering.
Topics
AI product, AI leadership, AI workflows, Evals
AI product teams
Learn first
Good matches
Open next
Workshop · Matt Pocock · Intermediate
You want a structured AI SDK v6 course that covers model choice, text and object generation, UI streams, agents, persistence, context engineering, evals, and advanced app patterns.
ai sdk, llm apps, agents, streaming, evals
Free tutorial · Matt Pocock · Beginner to intermediate
You want a guided path through core AI concepts, model selection, the AI engineering mindset, evals, and techniques for improving LLM-powered apps.
ai engineering, model selection, evals, llm apps
Guide · Hamel Husain · Intermediate
Your AI app needs quality checks before users see it.
evals, quality, llm apps
Short course · DeepLearning.AI · Intermediate
You need to test, trace, and improve agent workflows instead of judging only single LLM responses.
agent evals, evals, agents, reliability, tracing
Short course · DeepLearning.AI · Intermediate
You already know basic RAG and need better retrieval, evaluation, and production-quality patterns.
rag, evals, retrieval, llm apps, ai engineering
Guide · OpenAI · Intermediate
You need API-level guidance for testing outputs, comparing models, and catching regressions during upgrades.
openai, evals, quality, regression testing, reliability
Guide · OpenAI · Intermediate
You need the current OpenAI path for tracing, grading, and regression-testing agent workflows instead of only single-prompt evals.
openai, agents, evals, traces, graders
Guide · OpenAI · Intermediate
You need a practical optimization loop across prompt changes, evals, and fine-tuning rather than guessing which knob to turn next.
openai, prompting, evals, fine-tuning, optimization
Free course · Weights & Biases · Intermediate
You need to debug and measure LLM app quality.
evals, llm apps, observability
Open source tool and docs · Arize AI · Intermediate
You need to trace, inspect, and evaluate LLM app behavior.
evals, observability, tracing
Open source docs · Promptfoo · Intermediate
You need regression tests for prompts, models, and LLM outputs.
evals, prompt testing, red teaming
Cohort course · Hamel Husain and Shreya Shankar · Intermediate
You are shipping AI features and need a serious evaluation workflow.
evals, product, llm reliability
Beehiiv post · Sumanth P · Intermediate
You want a concise technical briefing on why code, traces, tests, and harnesses matter for real agent systems.
beehiiv, agents, ai engineering, evals, tracing
Guides · Hamel Husain · Intermediate to advanced
Use this when you want Hamel Husain's material for evals and related AI skills.
Evals, RAG, LLM product quality
Course · Shreya Shankar · Intermediate
Use this when you want Shreya Shankar's material for evals and related AI skills.
Evals, LLM reliability, Product quality
Maven cohort course · Agentic AI for Product Managers · Beginner to intermediate
Use this when you want material from the Agentic AI for Product Managers course on agentic AI and related AI skills.
Agentic AI, AI product strategy, Evals, Production AI