LLM evaluation with W&B
Weights & Biases · evals, llm apps, observability, mlops
AI directory search
Use this when you know the topic you need: Claude Code, MCP, evals, RAG, agents, product, coding, prompting, foundations, or model internals.
13 matches for "Evaluation"
The Data Exchange · Intermediate
Good practitioner interviews across data, ML, and AI engineering.
Skills
Data systems, ML engineering, AI trends
Data Independent AI tutorials · Beginner to intermediate
Practical walkthroughs for retrieval, LLM application patterns, and common developer questions.
Skills
RAG, LLM apps, Prompting, Evaluation
W&B Courses · Intermediate
Good for builders who need to measure, debug, and improve LLM apps rather than just demo them.
Topics
LLM apps, Evals, Experiment tracking, MLOps
Humanloop Blog and Docs · Intermediate
Useful for teams building repeatable AI product processes around prompts, datasets, and evaluations.
Topics
Prompt management, Evals, LLM workflows
Stanford CS229 Machine Learning · Intermediate
A strong foundation for people who need the math and modeling basics that underlie applied AI.
Topics
ML foundations, Supervised learning, Unsupervised learning, Model evaluation
OpenRouter docs · Beginner to intermediate
Useful for learning model comparison, routing, fallback behavior, and API-compatible experimentation across proprietary and open model families.
Topics
Model routing, Model comparison, Auto Router, GPT models, Claude models, Gemini, Llama, Mistral, DeepSeek, Qwen, API examples, Evaluation
Short course · DeepLearning.AI · Intermediate
You already know basic RAG and need better retrieval, evaluation, and production-quality patterns.
rag, evals, retrieval, llm apps, ai engineering
Free course · Hugging Face · Intermediate
You want an up-to-date, structured course on instruction tuning, fine-tuning, and evaluation for compact open models.
fine-tuning, post-training, open models, evaluation, smollm
Free course · Weights & Biases · Intermediate
You need to debug and measure LLM app quality.
evals, llm apps, observability
Cohort course · Hamel Husain and Shreya Shankar · Intermediate
You are shipping AI features and need a serious evaluation workflow.
evals, product, llm reliability
YouTube tutorials · Greg Kamradt · Beginner to intermediate
Use this when you want Greg Kamradt's material on RAG and related AI skills.
RAG, LLM apps, Prompting, Evaluation