MLOps Engineer

Role Summary

Builds the production substrate for AI systems: model gateways, prompt and tool registries, evaluation harnesses, observability stacks, and the cost-control instrumentation that keeps generative-AI economics viable at portfolio scale.

Treats AI infrastructure as platform engineering with a probabilistic execution layer, not as a separate discipline. Holds AI systems to the same SLO and on-call discipline as any other production service. Pushes back on ad hoc, per-team model integrations in favor of shared platform primitives that scale with the portfolio.

Skills

Model gateway design (LiteLLM, Amazon Bedrock, Google Vertex AI, custom gateways)
Provider abstraction and multi-model routing
Prompt registry design with versioning and rollback
Tool registry design and shared tool libraries
Evaluation harness construction (offline + online)
CI/CD for AI artifacts (prompts, tools, model configs)
Token-level cost telemetry and per-request economics
LLM observability platforms (Langfuse, Helicone, Arize, custom OpenTelemetry-based stacks)
Drift detection on inputs and outputs of production AI systems
Latency and throughput SLO design for AI services
Feedback loop construction (thumbs-up/down ratings, structured feedback, downstream business outcomes)
Caching strategies for LLM responses
Rate limiting at multiple granularities (user, session, tool, model)
Cost attribution and chargeback / showback for AI workloads
Commitment-tier optimization for hyperscaler AI services
Incident-response process for AI-specific failure modes
Container orchestration (Kubernetes, ECS) for model-serving workloads
Infrastructure-as-code for AI platforms

Capabilities & Focus Areas

Model gateway architecture with routing, retries, fallback, and cost attribution
Prompt and tool registry design with versioning and environment promotion
Evaluation harnesses integrated into CI/CD for AI artifacts
LLM observability stacks (token-level cost, latency, drift, abuse)
Feedback collection pipelines closing the loop between users and model owners
SLO design for non-deterministic systems
FinOps for generative-AI workloads

Typical Engagement Patterns

Twelve- to twenty-four-week AI platform builds for clients scaling from pilots to a portfolio of use cases
Embedded MLOps augmentation for client AI platform teams
Cost-discipline engagements when generative-AI bills exceed planned budgets
Reliability and observability engagements for clients with production AI incidents
Migration engagements consolidating per-team AI integrations onto a shared platform

Outcomes Delivered

AI platforms where new use cases launch in days, not quarters
Per-team and per-use-case cost attribution that finance can reconcile
Production AI systems with the same observability rigor as conventional services
Prompt and tool changes promoted through CI/CD rather than copy-paste
On-call rotations sized for actual AI workload patterns, not worst-case panic

Need this role for an engagement?

Brief us on the scope and timeline and we'll match a senior practitioner.
