AI & LLM Agents

I build production-grade LLM agents and the evaluation pipelines that keep them honest. Agent-driven development is my default workflow: humans set direction, agents handle code, tests, review, and merge.

What I offer

  • Production LLM agents — customer-facing chat, RAG over your catalogue / docs / data, tool-using agents, planner-executor patterns

  • MCP (Model Context Protocol) servers and clients — make your platform first-class to AI assistants (Claude, ChatGPT desktop, agent IDEs)

  • RAG pipelines — FAISS, pgvector, Weaviate; hybrid search (BM25 + dense), reranking, chunking, citation grounding

  • LLM / RAG evaluation — golden datasets, hallucination detection, prompt regression, LLM-as-judge, eval in CI

  • Guardrails & safety — jailbreak resistance, PII redaction, content filtering, deterministic mode for regulated paths

  • Agent-driven development consulting — coach your team on the Concept → Plan → Spec → Code → Feature discipline, multi-agent workflows, and AI-augmented dev pipelines
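The hybrid search mentioned above (BM25 + dense) is usually combined by fusing the two ranked lists rather than mixing raw scores; reciprocal rank fusion is one common way to do that. A minimal sketch in plain Python, with toy ranked lists standing in for real BM25 and embedding retrievers (the doc ids are hypothetical):

```python
# Reciprocal rank fusion (RRF): merge ranked lists from a keyword (BM25)
# retriever and a dense (embedding) retriever without having to
# calibrate their incompatible raw scores against each other.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked doc-id lists; k=60 is a common default."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); ties broken by sum.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy results standing in for real retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused[0])  # doc_b: ranked highly by both retrievers
```

A document that appears near the top of both lists outranks one that tops only a single list, which is the point of the fusion step before any reranker runs.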

Stack

  • LLMs: OpenAI, Claude (Anthropic), Hugging Face / open-weights

  • Agent frameworks: LangChain, LangGraph, PydanticAI, CrewAI, OpenHands, plain-Python tool-using agents

  • RAG: FAISS, pgvector, Weaviate, sentence-transformers, hybrid retrieval

  • Eval & Ops: golden sets, LLM-as-judge, cost / latency budgets, observability (traces, token spend)

  • AI-assisted dev: Claude Code, Cursor, Zed; multi-agent setups for planner / developer / tester / reviewer / ops roles
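The eval-in-CI idea above can be sketched as a tiny golden-set regression harness; `fake_model`, the cases, and the 90% threshold are hypothetical stand-ins for a real LLM client, a versioned dataset, and a team-chosen budget:

```python
# Minimal golden-set regression harness of the kind run in CI:
# each golden case states what a correct answer must contain, and the
# suite fails the build if the pass rate drops below a threshold.

GOLDEN = [  # hypothetical cases; real sets live in version control
    {"question": "What is the return window?", "must_contain": "30 days"},
    {"question": "Do you ship to the EU?", "must_contain": "yes"},
]

def fake_model(question: str) -> str:
    # Stand-in for a real LLM call; deterministic for the sketch.
    answers = {
        "What is the return window?": "Returns are accepted within 30 days.",
        "Do you ship to the EU?": "Yes, we ship to all EU countries.",
    }
    return answers.get(question, "")

def run_eval(model, golden, threshold: float = 0.9) -> tuple[float, bool]:
    passed = sum(
        case["must_contain"].lower() in model(case["question"]).lower()
        for case in golden
    )
    rate = passed / len(golden)
    return rate, rate >= threshold

rate, ok = run_eval(fake_model, GOLDEN)
print(f"pass rate {rate:.0%}, gate {'green' if ok else 'red'}")
```

Substring checks are the cheapest gate; the same harness shape extends to LLM-as-judge scoring by swapping the `must_contain` check for a judge call.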

If this matches what you need, see my profile (Vladislav Vorobev) and feel free to reach out.