AI directory search

Search across educators, skills, and resources.

Use this when you know the topic you need: Claude Code, MCP, evals, RAG, agents, product, coding, prompting, foundations, or model internals.

33 matches for "Evals"

Video matches

Watch first when you want a fast feel for the topic before opening courses, docs, or profiles.

LLM evaluation with W&B

Weights & Biases · evals, llm apps, observability, mlops

AI evals with Phoenix

Arize AI · evals, observability, tracing, rag debugging

Promptfoo red teaming

Promptfoo · evals, prompt testing, red teaming, security

AI Evals for Engineers & PMs

Hamel Husain and Shreya Shankar · evals, product, llm reliability

Educators

Hamel Husain

Hamel's AI evals guides · Intermediate to advanced

Very practical material on evaluating LLM apps before they disappoint users.

Skills

Evals, RAG, LLM product quality

Matt Pocock

AI Hero · Beginner to advanced

Practical developer-focused AI education across LLM fundamentals, AI SDK app development, MCP, Claude Code workflows, agent-ready codebases, evals, TDD, handoffs, and reusable skills such as /teach, /grill-me, /to-prd, /to-issues, /tdd, /triage, and /handoff.

Skills

AI coding, Claude Skills, Agentic workflows, AI SDK, MCP, LLM fundamentals, Personalized learning

Providers and platforms

Weights & Biases

W&B Courses · Intermediate

Good for builders who need to measure, debug, and improve LLM apps rather than just demo them.

Topics

LLM apps, Evals, Experiment tracking, MLOps

OpenAI

OpenAI model docs and Cookbook · Beginner to advanced

Official model and implementation material for learning current GPT-5.5, GPT-5.5 Pro, and GPT-5.4 tradeoffs, Codex workflows, agent evals, MCP and connector patterns, retrieval, model optimization, and structured outputs.

Topics

GPT models, Reasoning models, Model selection, Agents, RAG, Structured outputs, MCP, Evals

Arize AI

Phoenix · Intermediate

Useful for debugging and evaluating LLM applications once you move beyond prototypes.

Topics

Observability, Evals, Tracing, RAG debugging

Langfuse

Langfuse Docs · Intermediate

Good operational material for tracing, scoring, and improving production LLM apps.

Topics

Observability, Prompt management, Evals, Tracing

Vellum

Vellum Guides · Beginner to intermediate

Useful for product and ops teams that need practical LLM product concepts without getting lost in research.

Topics

Prompt management, Evals, Workflow design

Useful for teams building repeatable AI product processes around prompts, datasets, and evaluations.

Topics

Prompt management, Evals, LLM workflows

Promptfoo

Promptfoo Docs · Intermediate

Very practical for regression testing prompts, model changes, and LLM outputs.

Topics

Prompt testing, Evals, Red teaming

Maven AI courses

Maven AI courses · Beginner to advanced

Useful discovery surface for live courses taught by practitioners across AI product, work, and engineering.

Topics

AI product, AI leadership, AI workflows, Evals

Resources

AI SDK v6 Crash Course

Workshop · Matt Pocock · Intermediate

You want a structured AI SDK v6 course that covers model choice, text and object generation, UI streams, agents, persistence, context engineering, evals, and advanced app patterns.

ai sdk, llm apps, agents, streaming, evals

The AI Engineer Roadmap

Free tutorial · Matt Pocock · Beginner to intermediate

You want a guided path through core AI concepts, model selection, the AI engineering mindset, evals, and techniques for improving LLM-powered apps.

ai engineering, model selection, evals, llm apps

LLM Evals

Guide · Hamel Husain · Intermediate

Your AI app needs quality checks before users see it.

evals, quality, llm apps

Evaluating AI Agents

Short course · DeepLearning.AI · Intermediate

You need to test, trace, and improve agent workflows instead of judging only single LLM responses.

agent evals, evals, agents, reliability, tracing

Building and Evaluating Advanced RAG Applications

Short course · DeepLearning.AI · Intermediate

You already know basic RAG and need better retrieval, evaluation, and production-quality patterns.

rag, evals, retrieval, llm apps, ai engineering

OpenAI Working with evals

Guide · OpenAI · Intermediate

You need API-level guidance for testing outputs, comparing models, and catching regressions during upgrades.

openai, evals, quality, regression testing, reliability

OpenAI Evaluate agent workflows

Guide · OpenAI · Intermediate

You need the current OpenAI path for tracing, grading, and regression-testing agent workflows instead of only single-prompt evals.

openai, agents, evals, traces, graders

OpenAI model optimization

Guide · OpenAI · Intermediate

You need a practical optimization loop across prompt changes, evals, and fine-tuning rather than guessing which knob to turn next.

openai, prompting, evals, fine-tuning, optimization

W&B LLM Evaluation Course

Free course · Weights & Biases · Intermediate

You need to debug and measure LLM app quality.

evals, llm apps, observability

Phoenix by Arize

Open source tool and docs · Arize AI · Intermediate

You need to trace, inspect, and evaluate LLM app behavior.

evals, observability, tracing

Promptfoo Intro

Open source docs · Promptfoo · Intermediate

You need regression tests for prompts, models, and LLM outputs.

evals, prompt testing, red teaming

AI Evals for Engineers & PMs

Cohort course · Hamel Husain and Shreya Shankar · Intermediate

You are shipping AI features and need a serious evaluation workflow.

evals, product, llm reliability

AI Engineering: Code Is the New Agent Harness

Beehiiv post · Sumanth P · Intermediate

You want a concise technical briefing on why code, traces, tests, and harnesses matter for real agent systems.

beehiiv, agents, ai engineering, evals, tracing

Hamel's AI evals guides

Guides · Hamel Husain · Intermediate to advanced

Use this when you want Hamel Husain's material for evals and related AI skills.

Evals, RAG, LLM product quality

AI Evals for Engineers and PMs

Course · Shreya Shankar · Intermediate

Use this when you want Shreya Shankar's material for evals and related AI skills.

Evals, LLM reliability, Product quality

Hamza Farooq on Maven

Maven cohort course · Agentic AI for Product Managers · Beginner to intermediate

Use this when you want Hamza Farooq's Agentic AI for Product Managers course for agentic AI and related AI skills.

Agentic AI, AI product strategy, Evals, Production AI