LLMs & AI Agents

Stack notes for building production LLM agents, RAG, and evaluation pipelines.

LLM providers

  • OpenAI — GPT-class models; tool / function calling, structured outputs (JSON Schema), streaming, vision

  • Anthropic Claude — long-context reasoning, prompt caching, tool use, vision

  • Hugging Face / open-weights — local inference, sentence-transformers for embeddings
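Structured outputs and tool calling both hinge on handing the provider a JSON Schema. A minimal sketch of a tool definition in the OpenAI-style `tools` shape — the `get_weather` tool is a made-up example, and the local validator is just a cheap sanity check before sending the schema over the wire:

```python
# A tool definition in the JSON-Schema shape OpenAI-style tool/function
# calling expects; passed via the `tools` parameter of a chat request.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

def validate_tool(tool: dict) -> bool:
    """Local sanity check: required fields present, `required` keys exist."""
    fn = tool.get("function", {})
    params = fn.get("parameters", {})
    return (
        tool.get("type") == "function"
        and bool(fn.get("name"))
        and params.get("type") == "object"
        and set(params.get("required", [])) <= set(params.get("properties", {}))
    )
```

Anthropic's tool-use API takes the same JSON Schema under slightly different field names, so the schema itself is portable.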

Agent frameworks

  • LangChain — broad ecosystem, integrations

  • LangGraph — graph / state-machine orchestration for production multi-agent systems

  • PydanticAI — type-safe agents with Pydantic schemas

  • CrewAI — quick multi-agent role setups (researcher / coder / reviewer)

  • OpenHands — open-source coding agent platform (Claude Code alternative)

  • Plain-Python tool-using agents — when frameworks get in the way
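The plain-Python option is smaller than it sounds: an agent is a loop that asks the model for either a tool call or a final answer. A sketch with a deterministic stub standing in for the model — the message shapes and tool names are illustrative assumptions, and a real deployment would swap `stub_model` for a provider call:

```python
# Tool registry: name -> callable taking the model-supplied arguments.
TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def stub_model(messages):
    # Deterministic stand-in: first turn requests the tool, second answers.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {tool_results[-1]['content']}"}

def run_agent(task: str, model=stub_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])   # execute the tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded step budget")
```

The `max_steps` cap is the important production detail: it bounds cost when the model loops on a tool.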

Patterns

  • ReAct, planner-executor, sub-agents, parallel sub-agents, reviewer / critic

  • Orchestrator + sub-agents for multi-step workflows

  • MCP (Model Context Protocol) servers and clients
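The orchestrator + parallel sub-agents pattern reduces to: plan, fan out, gate on a critic. A sketch with all three roles stubbed as plain functions — in practice each wraps an LLM call with its own system prompt, and the plan/role names here are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def planner(task: str) -> list[str]:
    return ["research", "draft"]            # stub plan: fixed two-step decomposition

SUB_AGENTS = {                              # stub specialists keyed by role
    "research": lambda task: f"notes on {task}",
    "draft": lambda task: f"draft for {task}",
}

def reviewer(outputs: list[str]) -> bool:
    return all(bool(o) for o in outputs)    # stub critic: non-empty outputs pass

def orchestrate(task: str) -> list[str]:
    steps = planner(task)
    with ThreadPoolExecutor() as pool:      # fan sub-agents out in parallel
        outputs = list(pool.map(lambda s: SUB_AGENTS[s](task), steps))
    if not reviewer(outputs):
        raise RuntimeError("reviewer rejected outputs")
    return outputs                          # same order as the plan
```

`pool.map` preserves plan order, which matters when later steps consume earlier outputs; for truly dependent steps, run them sequentially instead.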

RAG

  • Vector stores: FAISS, pgvector, Weaviate, Qdrant

  • Embeddings: OpenAI, sentence-transformers

  • Retrieval: hybrid search (BM25 + dense), reranking, chunking, query rewriting, citation grounding
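Hybrid search usually merges the sparse and dense result lists with Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) across rankings. A sketch assuming the two rankings come from a BM25 index and a vector store — here they are hard-coded toy lists of doc ids:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum of 1/(k + rank_of_d); higher wins."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # toy sparse (keyword) ranking
dense_hits = ["doc1", "doc3", "doc9"]   # toy dense (embedding) ranking
fused = rrf([bm25_hits, dense_hits])    # docs in both lists float to the top
```

The `k` constant (60 is the value from the original RRF paper) damps the influence of top ranks so one ranker can't dominate; a reranker can then rescore the fused top-N.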

Evaluation

  • Golden datasets, hallucination detection, prompt regression

  • LLM-as-judge with calibration

  • Eval in CI (treat agent behaviour as a regression surface)
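Treating agent behaviour as a regression surface means a golden-dataset suite that fails the build below a pass-rate threshold. A sketch where both the agent and the judge are stubs — in practice the agent is the system under test and the judge is an LLM-as-judge call, calibrated against a handful of human-labelled examples:

```python
GOLDEN = [                                   # golden dataset (toy)
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2", "expected": "4"},
]

def agent(prompt: str) -> str:               # stub system under test
    return {"capital of France?": "Paris", "2 + 2": "4"}[prompt]

def judge(output: str, expected: str) -> bool:
    return output.strip() == expected        # stub judge: exact match

def run_suite(threshold: float = 1.0) -> float:
    passed = sum(judge(agent(c["input"]), c["expected"]) for c in GOLDEN)
    score = passed / len(GOLDEN)
    assert score >= threshold, f"regression: pass rate {score:.0%}"
    return score
```

A threshold below 1.0 is usually the right call once the judge itself is noisy; the calibration step is what tells you how much slack to allow.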

Ops

  • Cost / latency budgets, retries, fallbacks

  • Observability — traces, token spend, eval drift

  • Guardrails, jailbreak resistance, PII redaction
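Retries and fallbacks compose as two nested loops: retry each model with backoff, then fall through the model chain. A sketch where `call_model` stands in for a provider SDK call — the model names, the error type, and the budget numbers are illustrative assumptions:

```python
import time

def with_retries_and_fallback(prompt, call, models=("primary", "fallback"),
                              max_attempts=2, backoff=0.5):
    last_err = None
    for model in models:                        # fallback chain, in priority order
        for attempt in range(max_attempts):     # retries per model
            try:
                return call(model, prompt)
            except RuntimeError as err:         # treat as transient failure
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_err                              # budget exhausted

def flaky_provider(model, prompt):              # stub: primary always rate-limited
    if model == "primary":
        raise RuntimeError("rate limited")
    return f"{model}: answer to {prompt!r}"
```

In production the same wrapper is where the cost/latency budget lives: record token spend per attempt and raise once the cumulative budget is blown rather than exhausting every retry.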

Tools & IDEs

  • Claude Code — agent-driven CLI for codegen, refactors, multi-agent dev

  • Cursor — AI-first editor

  • Zed — modern editor with AI integrations