AI & LLM Agents

I build production-grade LLM agents and the evaluation pipelines that keep them honest. Agent-driven development is my default workflow: humans set direction, agents handle code, tests, review, and merge.

What I offer

  • Production LLM agents — customer-facing chat, RAG over your catalogue / docs / data, tool-using agents, planner-executor patterns

  • MCP (Model Context Protocol) servers and clients — make your platform first-class to AI assistants (Claude, ChatGPT desktop, agent IDEs)

  • RAG pipelines — FAISS, pgvector, Weaviate; hybrid search (BM25 + dense), reranking, chunking, citation grounding

  • LLM / RAG evaluation — golden datasets, hallucination detection, prompt regression, LLM-as-judge, eval in CI

  • Guardrails & safety — jailbreak resistance, PII redaction, content filtering, deterministic mode for regulated paths

  • Agent-driven development consulting — coach your team on the Concept → Plan → Spec → Code → Feature discipline, multi-agent workflows, and AI-augmented dev pipelines
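The hybrid search mentioned above (BM25 + dense) is usually combined by fusing the two ranked lists rather than mixing raw scores; reciprocal rank fusion is one common way to do that. A minimal sketch in plain Python, with toy ranked lists standing in for real BM25 and embedding retrievers (the doc ids are hypothetical):

```python
# Reciprocal rank fusion (RRF): merge ranked lists from a keyword (BM25)
# retriever and a dense (embedding) retriever without having to
# calibrate their incompatible raw scores against each other.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked doc-id lists; k=60 is a common default."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); ties broken by sum.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy results standing in for real retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused[0])  # doc_b: ranked highly by both retrievers
```

A document that appears near the top of both lists outranks one that tops only a single list, which is the point of the fusion step before any reranker runs.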

Stack

  • LLMs: OpenAI, Claude (Anthropic), Hugging Face / open-weights

  • Agent frameworks: LangChain, LangGraph, PydanticAI, CrewAI, OpenHands, plain-Python tool-using agents

  • RAG: FAISS, pgvector, Weaviate, sentence-transformers, hybrid retrieval

  • Eval & Ops: golden sets, LLM-as-judge, cost / latency budgets, observability (traces, token spend)

  • AI-assisted dev: Claude Code, Cursor, Zed; multi-agent setups for planner / developer / tester / reviewer / ops roles
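The eval-in-CI idea above can be sketched as a tiny golden-set regression harness; `fake_model`, the cases, and the 90% threshold are hypothetical stand-ins for a real LLM client, a versioned dataset, and a team-chosen budget:

```python
# Minimal golden-set regression harness of the kind run in CI:
# each golden case states what a correct answer must contain, and the
# suite fails the build if the pass rate drops below a threshold.

GOLDEN = [  # hypothetical cases; real sets live in version control
    {"question": "What is the return window?", "must_contain": "30 days"},
    {"question": "Do you ship to the EU?", "must_contain": "yes"},
]

def fake_model(question: str) -> str:
    # Stand-in for a real LLM call; deterministic for the sketch.
    answers = {
        "What is the return window?": "Returns are accepted within 30 days.",
        "Do you ship to the EU?": "Yes, we ship to all EU countries.",
    }
    return answers.get(question, "")

def run_eval(model, golden, threshold: float = 0.9) -> tuple[float, bool]:
    passed = sum(
        case["must_contain"].lower() in model(case["question"]).lower()
        for case in golden
    )
    rate = passed / len(golden)
    return rate, rate >= threshold

rate, ok = run_eval(fake_model, GOLDEN)
print(f"pass rate {rate:.0%}, gate {'green' if ok else 'red'}")
```

Substring checks are the cheapest gate; the same harness shape extends to LLM-as-judge scoring by swapping the `must_contain` check for a judge call.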

If this matches what you need, see my profile (Vladislav Vorobev) and feel free to reach out.