Guide
LLM Evals
Intermediate
Your AI app needs quality checks before users see it.
AI educator
Hamel's AI evals guides
Very practical material on evaluating LLM apps before they disappoint users.
Start with: Read the evals guide and build a small test set for your own app.
Guides
Intermediate to advanced
Use this when you want Hamel Husain's material for evals and related AI skills.
Builders shipping LLM systems should start here when they need material on evals, RAG, and LLM product quality. It best fits learners who prefer guides and workshops.
Read the evals guide and build a small test set for your own app. After that, open one related resource below and write down the exact workflow, concept, or implementation pattern you want to apply.
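As a rough illustration of what "a small test set for your own app" can look like, here is a minimal sketch in Python. It is not taken from Hamel Husain's guides. The names `EvalCase`, `fake_app`, `TEST_SET`, `run_evals`, and `pass_rate` are hypothetical, the prompts and canned answers are made-up placeholders, and `fake_app` stands in for whatever function calls your real LLM app.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One test case: an input prompt plus a check on the app's output."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def fake_app(prompt: str) -> str:
    """Hypothetical stand-in for your LLM app; replace with a real call."""
    canned = {
        "What is your refund window?": "Refunds are accepted within 30 days.",
        "Do you ship to Canada?": "Sorry, I am not sure.",
    }
    return canned.get(prompt, "")


# A deliberately tiny test set; the cases are illustrative assumptions.
TEST_SET: List[EvalCase] = [
    EvalCase(
        name="refund_window",
        prompt="What is your refund window?",
        check=lambda out: "30 days" in out,
    ),
    EvalCase(
        name="ships_to_canada",
        prompt="Do you ship to Canada?",
        check=lambda out: "yes" in out.lower(),
    ),
]


def run_evals(app: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case through the app and record pass/fail by case name."""
    return {case.name: case.check(app(case.prompt)) for case in cases}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty result set)."""
    return sum(results.values()) / len(results) if results else 0.0


results = run_evals(fake_app, TEST_SET)
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
print(f"pass rate: {pass_rate(results):.0%}")
```

Simple string checks like these only catch obvious failures, but they give you a baseline to rerun whenever you change a prompt or model. Swapping `fake_app` for your real app function is the only change needed to point the same cases at production code.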
Very practical material on evaluating LLM apps before they disappoint users. Use this profile when you are comparing educators by topic, level, format, and practical usefulness rather than browsing random AI content.
Compare the skill coverage, the starting recommendation, the educator's own resources, and any videos when available. If you need evals, search the directory for that skill and shortlist three profiles before committing to a course, book, or playlist.
| Resource | Kind | Level | Use when |
|---|---|---|---|
| OpenAI Cookbook (OpenAI) | GitHub repo | Beginner to advanced | You need implementation examples rather than theory. |
| Prompt Engineering Guide (DAIR.AI) | Guide | Beginner to advanced | You want examples of prompting techniques and patterns. |
| AI SDK v6 Crash Course (Matt Pocock) | Workshop | Intermediate | You want a structured AI SDK v6 course that covers model choice, text and object generation, UI streams, agents, persistence, context engineering, evals, and advanced app patterns. |
| The AI Engineer Roadmap (Matt Pocock) | Free tutorial | Beginner to intermediate | You want a guided path through core AI concepts, model selection, the AI engineering mindset, evals, and techniques for improving LLM-powered apps. |
| Evaluating AI Agents (DeepLearning.AI) | Short course | Intermediate | You need to test, trace, and improve agent workflows instead of judging only single LLM responses. |
| Building and Evaluating Advanced RAG Applications (DeepLearning.AI) | Short course | Intermediate | You already know basic RAG and need better retrieval, evaluation, and production-quality patterns. |
| LangChain for LLM Application Development (DeepLearning.AI) | Short course | Beginner to intermediate | You want a fast introduction to building LLM applications with chains, retrieval, and tools. |
| OpenAI Working with evals (OpenAI) | Guide | Intermediate | You need API-level guidance for testing outputs, comparing models, and catching regressions during upgrades. |
| OpenAI Evaluate agent workflows (OpenAI) | Guide | Intermediate | You need the current OpenAI path for tracing, grading, and regression-testing agent workflows instead of only single-prompt evals. |
| OpenAI model optimization (OpenAI) | Guide | Intermediate | You need a practical optimization loop across prompt changes, evals, and fine-tuning rather than guessing which knob to turn next. |
| OpenAI Retrieval guide (OpenAI) | Guide | Intermediate | You need the official path for file search, retrieval, and grounded answers before designing a RAG stack. |
| Cohere models overview (Cohere) | Model docs | Beginner to advanced | You need to choose between current Cohere Command, embedding, and rerank models for grounded enterprise search. |