LLMs & AI Agents¶
Stack notes for building production LLM agents, RAG, and evaluation pipelines.
LLM providers¶
OpenAI — GPT-class models; tool / function calling, structured outputs (JSON Schema), streaming, vision
Anthropic Claude — long-context reasoning, prompt caching, tool use, vision
Hugging Face / open-weights — local inference, sentence-transformers for embeddings
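The tool / function-calling surface is broadly similar across providers: you hand the model a JSON-Schema tool definition, it emits a structured call, and your code dispatches it locally. A minimal provider-agnostic sketch (field names vary slightly per API; `get_weather` is a hypothetical tool for illustration):

```python
import json

# A JSON-Schema tool definition in the shape OpenAI and Anthropic accept
# for tool / function calling (exact field names differ slightly per API).
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Route a model-emitted tool call to a local Python function."""
    registry = {"get_weather": lambda city: f"Sunny in {city}, 21°C"}
    args = json.loads(arguments)  # models return arguments as a JSON string
    return registry[name](**args)

# Simulate the model requesting a tool call:
print(dispatch_tool_call("get_weather", '{"city": "Oslo"}'))
```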
Agent frameworks¶
LangChain — broad ecosystem, integrations
LangGraph — graph / state-machine orchestration for production multi-agent systems
PydanticAI — type-safe agents with Pydantic schemas
CrewAI — quick multi-agent role setups (researcher / coder / reviewer)
OpenHands — open-source coding agent platform (Claude Code alternative)
Plain-Python tool-using agents — when frameworks get in the way
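A plain-Python agent is just a loop: call the model, execute any tool it requests, feed the observation back, stop when it answers. A minimal sketch with a stubbed model standing in for a real LLM call:

```python
from typing import Callable

def agent_loop(model: Callable, tools: dict[str, Callable], task: str,
               max_steps: int = 5) -> str:
    """Minimal tool-using agent loop: the model either requests a tool
    call or returns a final answer; tool results are appended as messages."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)
        if reply.get("tool"):  # model requested a tool call
            out = tools[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": str(out)})
        else:
            return reply["content"]
    return "max steps exceeded"

# Stub model: first requests a calculation, then answers with the result.
def stub_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"tool": None, "content": f"The answer is {messages[-1]['content']}"}

answer = agent_loop(stub_model, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(answer)  # The answer is 5
```

Frameworks add persistence, streaming, and retries on top, but this loop is the core that stays debuggable when they get in the way.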
Patterns¶
ReAct, planner-executor, sub-agents, parallel sub-agents, reviewer / critic
Orchestrator + sub-agents for multi-step workflows
MCP (Model Context Protocol) servers and clients
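The orchestrator + sub-agents pattern above can be sketched with a thread pool fanning a task out to specialist agents and collecting their results for a synthesis step. The sub-agents here are stub lambdas; in practice each would wrap an LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task: str, sub_agents: dict) -> dict:
    """Orchestrator + parallel sub-agents: run each specialist agent
    concurrently on the task, then return their results keyed by role."""
    with ThreadPoolExecutor(max_workers=len(sub_agents)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in sub_agents.items()}
        return {name: f.result() for name, f in futures.items()}

results = orchestrate(
    "add retry logic to the HTTP client",
    {
        "researcher": lambda t: f"notes on: {t}",
        "coder": lambda t: f"patch for: {t}",
        "reviewer": lambda t: f"review of: {t}",
    },
)
print(results["reviewer"])
```

A reviewer / critic variant feeds the coder's output to the reviewer sequentially instead of running both in parallel.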
RAG¶
Vector stores: FAISS, pgvector, Weaviate, Qdrant
Embeddings: OpenAI, sentence-transformers
Retrieval: hybrid search (BM25 + dense), reranking, chunking, query rewriting, citation grounding
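One common way to combine BM25 and dense rankings in hybrid search is reciprocal rank fusion, which needs only the two ranked ID lists, no score normalisation. A minimal sketch (toy document IDs):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked document-ID lists (e.g. BM25 and dense retrieval):
    score(d) = sum over lists of 1 / (k + rank_in_list)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc2"]   # sparse / keyword ranking
dense_ranking = ["doc1", "doc2", "doc3"]  # embedding ranking
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# → ['doc1', 'doc3', 'doc2']
```

The fused list then typically goes to a cross-encoder reranker before being stuffed into the prompt with citations.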
Evaluation¶
Golden datasets, hallucination detection, prompt regression
LLM-as-judge with calibration
Eval in CI (treat agent behaviour as a regression surface)
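Treating agent behaviour as a regression surface can be as simple as a golden-dataset gate that fails the build when accuracy drops. A sketch with a stubbed agent and exact-match scoring (swap in an LLM-as-judge scorer for open-ended outputs):

```python
def regression_gate(predict, goldens: list[dict], threshold: float = 0.9) -> float:
    """Golden-dataset eval as a CI gate: score fixed cases and fail the
    build (via assert) if accuracy falls below the threshold."""
    correct = sum(predict(case["input"]) == case["expected"] for case in goldens)
    accuracy = correct / len(goldens)
    assert accuracy >= threshold, f"regression: accuracy {accuracy:.2f} < {threshold}"
    return accuracy

GOLDENS = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
stub_agent = {"2+2": "4", "capital of France": "Paris"}.get  # stand-in for the real agent
print(regression_gate(stub_agent, GOLDENS))  # 1.0
```

Pinning model versions and temperature in these runs keeps the gate from flaking on provider-side changes.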
Ops¶
Cost / latency budgets, retries, fallbacks
Observability — traces, token spend, eval drift
Guardrails, jailbreak resistance, PII redaction
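Retries and fallbacks compose naturally: try each provider in order with exponential backoff, moving to the next after repeated failures. A sketch with plain callables standing in for SDK clients:

```python
import time

def call_with_fallback(providers: list, prompt: str,
                       retries: int = 2, backoff: float = 0.1):
    """Retries + fallbacks: attempt each provider with exponential
    backoff before escalating to the next one in the list."""
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except Exception:
                time.sleep(backoff * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all providers exhausted")

def flaky_primary(prompt):    # demo: primary always times out
    raise TimeoutError("upstream timeout")

def fallback_provider(prompt):
    return f"ok: {prompt}"

print(call_with_fallback([flaky_primary, fallback_provider], "hello"))  # ok: hello
```

A real version would also track token spend per call against the cost budget and emit a trace span per attempt for observability.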
Tools & IDEs¶
Claude Code — agent-driven CLI for codegen, refactors, multi-agent dev
Cursor — AI-first editor
Zed — modern editor with AI integrations