Role Summary

Hands-on engineer building production AI and ML systems. Spans LLM application development, retrieval-augmented generation pipelines, classical ML model training and deployment, and the integration plumbing that connects models to enterprise systems of record.

Ships small, evaluable increments rather than ambitious demos that never reach production. Treats every prompt and tool definition as configuration that needs versioning and review. Maintains strong opinions on the boundary between application logic and model behavior, and on what should never be left to the model to decide.

Skills

  • Python production engineering
  • LLM API integration (OpenAI, Anthropic, Bedrock, Vertex AI, Azure OpenAI)
  • Open-source model deployment (Llama, Mistral, Qwen, etc.)
  • Prompt engineering with versioning and evaluation discipline
  • Retrieval-augmented generation architecture
  • Vector databases (Pinecone, Weaviate, pgvector on Postgres)
  • Embedding pipeline construction and freshness management
  • Agent frameworks (LangGraph, custom orchestration)
  • Tool-calling design and privilege-boundary controls
  • Classical ML lifecycle (scikit-learn, XGBoost, PyTorch, TensorFlow)
  • Feature engineering and selection
  • Model serving frameworks (FastAPI, Triton, BentoML, Ray Serve)
  • Batch inference pipeline construction
  • Evaluation harnesses (offline benchmarks, online A/B testing)
  • LLMOps observability (Langfuse, Helicone, Arize, custom)
  • Content safety and output filtering
  • PII detection and redaction
  • CI/CD for AI artifacts (prompts, tools, model versions)
  • Cost monitoring and per-request economics analysis
  • Integration with enterprise systems of record

Capabilities & Focus Areas

  • LLM application development against OpenAI, Anthropic, Bedrock, Vertex AI, and open-source models
  • Retrieval-augmented generation across vector stores
  • Classical ML lifecycle from training to deployment
  • Agent and tool-calling architectures with explicit privilege boundaries
  • Model serving and batch-inference pipeline construction
  • Evaluation harnesses tied to CI/CD
  • Integration of model outputs with enterprise systems of record

Typical Engagement Patterns

  • Four- to twelve-week production AI use-case builds, from proof of value through go-live
  • Embedded ML engineering augmentation for client AI teams
  • RAG platform implementation engagements (eight to sixteen weeks)
  • Recovery engagements for stalled or under-performing production AI systems
  • Targeted feature-build engagements on existing AI applications

Outcomes Delivered

  • Production AI applications with documented evaluation results before launch
  • Retrieval pipelines that stay reliable as source content drifts
  • Tool-calling agents with explicit authorization boundaries, not implicit ones
  • Model serving infrastructure that meets the same SLOs as conventional services
  • Engineering teams that can ship the next AI use case without outside consulting support

Need this role for an engagement?

Brief us on the scope and timeline and we'll match a senior practitioner.