AI directory search

Search across educators, skills, and resources.

Use this when you know the topic you need: Claude Code, MCP, evals, RAG, agents, product, coding, prompting, foundations, or model internals.

33 matches for "Evals"

GPT-5.5, Claude Code, Gemini Deep Research, Grok 4.3, MCP, context engineering, evals, RAG, OpenRouter, coding agents

Video matches

Watch first when you want a fast feel for the topic before opening courses, docs, or profiles.

LLM evaluation with W&B

Weights & Biases · evals, llm apps, observability, mlops

AI evals with Phoenix

Arize AI · evals, observability, tracing, rag debugging

Promptfoo red teaming

Promptfoo · evals, prompt testing, red teaming, security

AI Evals for Engineers & PMs

Hamel Husain and Shreya Shankar · evals, product, llm reliability

Educators

Hamel Husain

Hamel's AI evals guides · Intermediate to advanced

Very practical material on evaluating LLM apps before they disappoint users.

Skills

Evals, RAG, LLM product quality

Shreya Shankar

AI Evals for Engineers and PMs · Intermediate

Useful if you need to judge whether an AI feature is actually improving.

Skills

Evals, LLM reliability, Product quality

Practical developer-focused AI education across LLM fundamentals, AI SDK app development, MCP, Claude Code workflows, agent-ready codebases, evals, TDD, handoffs, and reusable skills such as /teach, /grill-me, /to-prd, /to-issues, /tdd, /triage, and /handoff.

Skills

AI coding, Claude Skills, Agentic workflows, AI SDK, MCP, LLM fundamentals, Personalized learning

Hamza Farooq

Agentic AI for Product Managers on Maven · Beginner to intermediate

Useful for PMs who need to design, evaluate, and ship reliable AI systems beyond impressive demos.

Skills

Agentic AI, AI product strategy, Evals, Production AI

Providers and platforms

Weights & Biases

W&B Courses · Intermediate

Good for builders who need to measure, debug, and improve LLM apps rather than just demo them.

Topics

LLM apps, Evals, Experiment tracking, MLOps

OpenAI

OpenAI model docs and Cookbook · Beginner to advanced

Official model and implementation material for learning current GPT-5.5, GPT-5.5 Pro, and GPT-5.4 tradeoffs, Codex workflows, agent evals, MCP and connector patterns, retrieval, model optimization, and structured outputs.

Topics

GPT models, Reasoning models, Model selection, Agents, RAG, Structured outputs, MCP, Evals

Arize AI

Phoenix · Intermediate

Useful for debugging and evaluating LLM applications once you move beyond prototypes.

Topics

Observability, Evals, Tracing, RAG debugging

Langfuse

Langfuse Docs · Intermediate

Good operational material for tracing, scoring, and improving production LLM apps.

Topics

Observability, Prompt management, Evals, Tracing

Vellum

Vellum Guides · Beginner to intermediate

Useful for product and ops teams that need practical LLM product concepts without getting lost in research.

Topics

Prompt management, Evals, Workflow design

Humanloop

Humanloop Blog and Docs · Intermediate

Useful for teams building repeatable AI product processes around prompts, datasets, and evaluations.

Topics

Prompt management, Evals, LLM workflows

Promptfoo

Promptfoo Docs · Intermediate

Very practical for regression testing prompts, model changes, and LLM outputs.

Topics

Prompt testing, Evals, Red teaming

Maven

Maven AI courses · Beginner to advanced

Useful discovery surface for live courses taught by practitioners across AI product, work, and engineering.

Topics

AI product, AI leadership, AI workflows, Evals

Learning paths

Evals and reliability

AI product teams

Learn first

Test sets
Human review
Regression checks
Quality metrics

Good matches

Hamel Husain

Shreya Shankar

Chip Huyen

Matt Pocock

Open next

Resources

AI SDK v6 Crash Course

Workshop · Matt Pocock · Intermediate

You want a structured AI SDK v6 course that covers model choice, text and object generation, UI streams, agents, persistence, context engineering, evals, and advanced app patterns.

ai sdk, llm apps, agents, streaming, evals

The AI Engineer Roadmap

Free tutorial · Matt Pocock · Beginner to intermediate

You want a guided path through core AI concepts, model selection, the AI engineering mindset, evals, and techniques for improving LLM-powered apps.

ai engineering, model selection, evals, llm apps

LLM Evals

Guide · Hamel Husain · Intermediate

Your AI app needs quality checks before users see it.

evals, quality, llm apps

Evaluating AI Agents

Short course · DeepLearning.AI · Intermediate

You need to test, trace, and improve agent workflows instead of judging only single LLM responses.

agent evals, evals, agents, reliability, tracing

Building and Evaluating Advanced RAG Applications

Short course · DeepLearning.AI · Intermediate

You already know basic RAG and need better retrieval, evaluation, and production-quality patterns.

rag, evals, retrieval, llm apps, ai engineering

OpenAI Working with evals

Guide · OpenAI · Intermediate

You need API-level guidance for testing outputs, comparing models, and catching regressions during upgrades.

openai, evals, quality, regression testing, reliability

OpenAI Evaluate agent workflows

Guide · OpenAI · Intermediate

You need the current OpenAI path for tracing, grading, and regression-testing agent workflows instead of only single-prompt evals.

openai, agents, evals, traces, graders

OpenAI model optimization

Guide · OpenAI · Intermediate

You need a practical optimization loop across prompt changes, evals, and fine-tuning rather than guessing which knob to turn next.

openai, prompting, evals, fine-tuning, optimization

W&B LLM Evaluation Course

Free course · Weights & Biases · Intermediate

You need to debug and measure LLM app quality.

evals, llm apps, observability

Phoenix by Arize

Open source tool and docs · Arize AI · Intermediate

You need to trace, inspect, and evaluate LLM app behavior.

evals, observability, tracing

Promptfoo Intro

Open source docs · Promptfoo · Intermediate

You need regression tests for prompts, models, and LLM outputs.

evals, prompt testing, red teaming

AI Evals for Engineers & PMs

Cohort course · Hamel Husain and Shreya Shankar · Intermediate

You are shipping AI features and need a serious evaluation workflow.

evals, product, llm reliability

AI Engineering: Code Is the New Agent Harness

Beehiiv post · Sumanth P · Intermediate

You want a concise technical briefing on why code, traces, tests, and harnesses matter for real agent systems.

beehiiv, agents, ai engineering, evals, tracing

Hamel's AI evals guides

Guides · Hamel Husain · Intermediate to advanced

Use this when you want Hamel Husain's material for evals and related AI skills.

Evals, RAG, LLM product quality

AI Evals for Engineers and PMs

Course · Shreya Shankar · Intermediate

Use this when you want Shreya Shankar's material for evals and related AI skills.

Evals, LLM reliability, Product quality

Agentic AI for Product Managers

Maven cohort course · Hamza Farooq · Beginner to intermediate

Use this when you want Hamza Farooq's material for agentic AI and related AI skills.

Agentic AI, AI product strategy, Evals, Production AI