LLMs & AI Agents

Stack notes for building production LLM agents, RAG, and evaluation pipelines.

LLM providers

  • OpenAI — GPT-class models; tool / function calling, structured outputs (JSON Schema), streaming, vision

  • Anthropic Claude — long-context reasoning, prompt caching, tool use, vision

  • Hugging Face / open-weights — local inference, sentence-transformers for embeddings
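Structured outputs and tool calling both hinge on handing the provider a JSON Schema. A minimal sketch of a tool definition in the OpenAI-style `tools` shape — the `get_weather` tool is a made-up example, and the local validator is just a cheap sanity check before sending the schema over the wire:

```python
# A tool definition in the JSON-Schema shape OpenAI-style tool/function
# calling expects; passed via the `tools` parameter of a chat request.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

def validate_tool(tool: dict) -> bool:
    """Local sanity check: required fields present, `required` keys exist."""
    fn = tool.get("function", {})
    params = fn.get("parameters", {})
    return (
        tool.get("type") == "function"
        and bool(fn.get("name"))
        and params.get("type") == "object"
        and set(params.get("required", [])) <= set(params.get("properties", {}))
    )
```

Anthropic's tool-use API takes the same JSON Schema under slightly different field names, so the schema itself is portable.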

Agent frameworks

  • LangChain — broad ecosystem, integrations

  • LangGraph — graph / state-machine orchestration for production multi-agent systems

  • PydanticAI — type-safe agents with Pydantic schemas

  • CrewAI — quick multi-agent role setups (researcher / coder / reviewer)

  • OpenHands — open-source coding agent platform (Claude Code alternative)

  • Plain-Python tool-using agents — when frameworks get in the way
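The plain-Python option is smaller than it sounds: an agent is a loop that asks the model for either a tool call or a final answer. A sketch with a deterministic stub standing in for the model — the message shapes and tool names are illustrative assumptions, and a real deployment would swap `stub_model` for a provider call:

```python
# Tool registry: name -> callable taking the model-supplied arguments.
TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def stub_model(messages):
    # Deterministic stand-in: first turn requests the tool, second answers.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {tool_results[-1]['content']}"}

def run_agent(task: str, model=stub_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])   # execute the tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded step budget")
```

The `max_steps` cap is the important production detail: it bounds cost when the model loops on a tool.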

Patterns

  • ReAct, planner-executor, sub-agents, parallel sub-agents, reviewer / critic

  • Orchestrator + sub-agents for multi-step workflows

  • MCP (Model Context Protocol) servers and clients
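The orchestrator + parallel sub-agents pattern reduces to: plan, fan out, gate on a critic. A sketch with all three roles stubbed as plain functions — in practice each wraps an LLM call with its own system prompt, and the plan/role names here are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def planner(task: str) -> list[str]:
    return ["research", "draft"]            # stub plan: fixed two-step decomposition

SUB_AGENTS = {                              # stub specialists keyed by role
    "research": lambda task: f"notes on {task}",
    "draft": lambda task: f"draft for {task}",
}

def reviewer(outputs: list[str]) -> bool:
    return all(bool(o) for o in outputs)    # stub critic: non-empty outputs pass

def orchestrate(task: str) -> list[str]:
    steps = planner(task)
    with ThreadPoolExecutor() as pool:      # fan sub-agents out in parallel
        outputs = list(pool.map(lambda s: SUB_AGENTS[s](task), steps))
    if not reviewer(outputs):
        raise RuntimeError("reviewer rejected outputs")
    return outputs                          # same order as the plan
```

`pool.map` preserves plan order, which matters when later steps consume earlier outputs; for truly dependent steps, run them sequentially instead.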

RAG

  • Vector stores: FAISS, pgvector, Weaviate, Qdrant

  • Embeddings: OpenAI, sentence-transformers

  • Retrieval: hybrid search (BM25 + dense), reranking, chunking, query rewriting, citation grounding
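Hybrid search usually merges the sparse and dense result lists with Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) across rankings. A sketch assuming the two rankings come from a BM25 index and a vector store — here they are hard-coded toy lists of doc ids:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum of 1/(k + rank_of_d); higher wins."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # toy sparse (keyword) ranking
dense_hits = ["doc1", "doc3", "doc9"]   # toy dense (embedding) ranking
fused = rrf([bm25_hits, dense_hits])    # docs in both lists float to the top
```

The `k` constant (60 is the value from the original RRF paper) damps the influence of top ranks so one ranker can't dominate; a reranker can then rescore the fused top-N.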

Evaluation

  • Golden datasets, hallucination detection, prompt regression

  • LLM-as-judge with calibration

  • Eval in CI (treat agent behaviour as a regression surface)
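Treating agent behaviour as a regression surface means a golden-dataset suite that fails the build below a pass-rate threshold. A sketch where both the agent and the judge are stubs — in practice the agent is the system under test and the judge is an LLM-as-judge call, calibrated against a handful of human-labelled examples:

```python
GOLDEN = [                                   # golden dataset (toy)
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2", "expected": "4"},
]

def agent(prompt: str) -> str:               # stub system under test
    return {"capital of France?": "Paris", "2 + 2": "4"}[prompt]

def judge(output: str, expected: str) -> bool:
    return output.strip() == expected        # stub judge: exact match

def run_suite(threshold: float = 1.0) -> float:
    passed = sum(judge(agent(c["input"]), c["expected"]) for c in GOLDEN)
    score = passed / len(GOLDEN)
    assert score >= threshold, f"regression: pass rate {score:.0%}"
    return score
```

A threshold below 1.0 is usually the right call once the judge itself is noisy; the calibration step is what tells you how much slack to allow.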

Ops

  • Cost / latency budgets, retries, fallbacks

  • Observability — traces, token spend, eval drift

  • Guardrails, jailbreak resistance, PII redaction
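Retries and fallbacks compose as two nested loops: retry each model with backoff, then fall through the model chain. A sketch where `call_model` stands in for a provider SDK call — the model names, the error type, and the budget numbers are illustrative assumptions:

```python
import time

def with_retries_and_fallback(prompt, call, models=("primary", "fallback"),
                              max_attempts=2, backoff=0.5):
    last_err = None
    for model in models:                        # fallback chain, in priority order
        for attempt in range(max_attempts):     # retries per model
            try:
                return call(model, prompt)
            except RuntimeError as err:         # treat as transient failure
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_err                              # budget exhausted

def flaky_provider(model, prompt):              # stub: primary always rate-limited
    if model == "primary":
        raise RuntimeError("rate limited")
    return f"{model}: answer to {prompt!r}"
```

In production the same wrapper is where the cost/latency budget lives: record token spend per attempt and raise once the cumulative budget is blown rather than exhausting every retry.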

Tools & IDEs

  • Claude Code — agent-driven CLI for codegen, refactors, multi-agent dev

  • Cursor — AI-first editor

  • Zed — modern editor with AI integrations