AI & Intelligent Automation
MLOps Engineer
Role Summary
Builds the production substrate for AI systems: model gateways, prompt and tool registries, evaluation harnesses, observability stacks, and the cost-control instrumentation that keeps generative-AI economics viable at portfolio scale.
Treats AI infrastructure as platform engineering with a probabilistic execution layer, not as a separate discipline. Insists on the same SLO and on-call discipline for AI systems as for any other production service. Pushes back on per-team ad-hoc model integration in favor of shared platform primitives that scale with the portfolio.
Skills
- Model gateway design (LiteLLM, Bedrock, Vertex, custom)
- Provider abstraction and multi-model routing
- Prompt registry design with versioning and rollback
- Tool registry design and shared tool libraries
- Evaluation harness construction (offline + online)
- CI/CD for AI artifacts (prompts, tools, model configs)
- Token-level cost telemetry and per-request economics
- LLM observability platforms (Langfuse, Helicone, Arize, OpenTelemetry-based custom)
- Drift detection on inputs and outputs of production AI systems
- Latency and throughput SLO design for AI services
- Feedback loop construction (thumbs-up/down ratings, structured feedback, downstream business outcomes)
- Caching strategies for LLM responses
- Rate limiting at multiple granularities (user, session, tool, model)
- Cost attribution and chargeback / showback for AI workloads
- Commitment-tier optimization for hyperscaler AI services
- Incident-response process for AI-specific failure modes
- Container orchestration (Kubernetes, ECS) for model-serving workloads
- Infrastructure-as-code for AI platforms
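
To make "prompt registry design with versioning and rollback" concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: the class name `PromptRegistry`, its methods, and the environment names are all hypothetical, and a real registry would sit on durable storage with access control and audit logging.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PromptRegistry:
    """Hypothetical registry: append-only versions plus per-environment pointers."""
    # versions[name] is append-only; list index + 1 is the version number.
    versions: Dict[str, List[str]] = field(default_factory=dict)
    # active[(env, name)] is the version currently served in that environment.
    active: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # history[(env, name)] holds previously active versions, newest last.
    history: Dict[Tuple[str, str], List[int]] = field(default_factory=dict)

    def register(self, name: str, template: str) -> int:
        """Store a new immutable version and return its version number."""
        self.versions.setdefault(name, []).append(template)
        return len(self.versions[name])

    def promote(self, name: str, version: int, env: str) -> None:
        """Point an environment at an existing version, remembering the old one."""
        if not 1 <= version <= len(self.versions.get(name, [])):
            raise ValueError(f"unknown version {version} for prompt {name!r}")
        key = (env, name)
        if key in self.active:
            self.history.setdefault(key, []).append(self.active[key])
        self.active[key] = version

    def rollback(self, name: str, env: str) -> int:
        """Restore the previously active version in an environment."""
        key = (env, name)
        previous = self.history.get(key, [])
        if not previous:
            raise LookupError(f"no earlier version of {name!r} in {env!r}")
        self.active[key] = previous.pop()
        return self.active[key]

    def get(self, name: str, env: str) -> str:
        """Return the template currently active in an environment."""
        return self.versions[name][self.active[(env, name)] - 1]
```

The design choice worth noting is that versions are never edited in place: promotion and rollback only move a pointer, so a rollback is a cheap, auditable change rather than a redeploy.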
Capabilities & Focus Areas
- Model gateway architecture with routing, retries, fallback, and cost attribution
- Prompt and tool registry design with versioning and environment promotion
- Evaluation harnesses integrated into CI/CD for AI artifacts
- LLM observability stacks (token-level cost, latency, drift, abuse)
- Feedback collection pipelines closing the loop between users and model owners
- SLO design for non-deterministic systems
- FinOps for generative-AI workloads
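
As an illustration of the first capability above (routing, retries, fallback, and cost attribution in a model gateway), the following Python sketch shows one way the pieces can fit together. Everything here is an assumption made for the example: `Provider`, `Gateway`, `spend_by_team`, and the per-1K-token prices are hypothetical, and the provider callables stand in for real SDK clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Provider:
    """Hypothetical provider wrapper; `call` stands in for a real SDK client."""
    name: str
    # Returns (response_text, input_tokens, output_tokens).
    call: Callable[[str], Tuple[str, int, int]]
    usd_per_1k_input: float   # illustrative price, not a real rate card
    usd_per_1k_output: float  # illustrative price, not a real rate card


@dataclass
class Gateway:
    """Tries providers in priority order and attributes spend to the calling team."""
    providers: List[Provider]
    max_attempts: int = 2
    spend_by_team: Dict[str, float] = field(default_factory=dict)

    def complete(self, prompt: str, team: str) -> Tuple[str, str]:
        """Return (response_text, provider_name) from the first provider that succeeds."""
        last_error: Optional[Exception] = None
        for provider in self.providers:
            for _ in range(self.max_attempts):
                try:
                    text, tokens_in, tokens_out = provider.call(prompt)
                except Exception as exc:  # retry, then fall through to next provider
                    last_error = exc
                    continue
                cost = (tokens_in / 1000) * provider.usd_per_1k_input \
                    + (tokens_out / 1000) * provider.usd_per_1k_output
                # Cost attribution: accumulate spend under the caller's team tag.
                self.spend_by_team[team] = self.spend_by_team.get(team, 0.0) + cost
                return text, provider.name
        raise RuntimeError("all providers failed") from last_error
```

A production gateway would add backoff between attempts, per-error-type retry policy, and export of the cost figures to a telemetry system; the sketch omits these to keep the control flow visible.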
Typical Engagement Patterns
- Twelve- to twenty-four-week AI platform builds for clients scaling from pilots to a portfolio
- Embedded MLOps augmentation for client AI platform teams
- Cost-discipline engagements when generative-AI bills exceed planned budgets
- Reliability and observability engagements for clients with production AI incidents
- Migration engagements consolidating per-team AI integrations onto a shared platform
Outcomes Delivered
- AI platforms where new use cases launch in days, not quarters
- Per-team and per-use-case cost attribution that finance can reconcile
- Production AI systems with the same observability rigor as conventional services
- Prompt and tool changes promoted through CI/CD rather than copy-paste
- On-call rotations sized for actual AI workload patterns, not worst-case panic

