Role Summary

Builds the production substrate for AI systems: model gateways, prompt and tool registries, evaluation harnesses, observability stacks, and the cost-control instrumentation that keeps generative-AI economics viable at portfolio scale.

Treats AI infrastructure as platform engineering with a probabilistic execution layer, not as a separate discipline. Insists on the same SLO and on-call discipline for AI systems as for any other production service. Pushes back on per-team ad-hoc model integration in favor of shared platform primitives that scale with the portfolio.

Skills

  • Model gateway design (LiteLLM, Bedrock, Vertex, custom)
  • Provider abstraction and multi-model routing
  • Prompt registry design with versioning and rollback
  • Tool registry design and shared tool libraries
  • Evaluation harness construction (offline + online)
  • CI/CD for AI artifacts (prompts, tools, model configs)
  • Token-level cost telemetry and per-request economics
  • LLM observability platforms (Langfuse, Helicone, Arize, custom OpenTelemetry-based stacks)
  • Drift detection on inputs and outputs of production AI systems
  • Latency and throughput SLO design for AI services
  • Feedback loop construction (thumbs-up/down ratings, structured feedback, downstream business outcomes)
  • Caching strategies for LLM responses
  • Rate limiting at multiple granularities (user, session, tool, model)
  • Cost attribution and chargeback / showback for AI workloads
  • Commitment-tier optimization for hyperscaler AI services
  • Incident-response process for AI-specific failure modes
  • Container orchestration (Kubernetes, ECS) for model-serving workloads
  • Infrastructure-as-code for AI platforms
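
As an illustration of the "rate limiting at multiple granularities" item above, here is a minimal sketch in which a request is admitted only if every scope it touches (user, session, tool, model) still has budget. The class names `TokenBucket` and `MultiLimiter`, the limits shown, and the injectable clock are all assumptions made for this example, not a description of any particular gateway product.

```python
import time
from typing import Callable


class TokenBucket:
    """One bucket per (scope, key), e.g. ("user", "alice")."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated = now

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now


class MultiLimiter:
    """Admit a request only if every granularity it touches has budget."""

    def __init__(self, limits: dict[str, tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits  # scope -> (capacity, refill_per_sec); hypothetical values
        self.clock = clock    # injectable so the limiter can be tested deterministically
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, keys: dict[str, str]) -> bool:
        now = self.clock()
        touched = []
        for scope, key in keys.items():
            capacity, rate = self.limits[scope]
            bucket = self.buckets.setdefault(
                (scope, key), TokenBucket(capacity, rate, now))
            bucket.refill(now)
            touched.append(bucket)
        if any(b.tokens < 1 for b in touched):
            return False  # reject without charging any bucket
        for b in touched:
            b.tokens -= 1
        return True
```

A call such as `limiter.allow({"user": "alice", "model": "m1"})` checks both buckets first and charges them only when both have budget, so a request rejected at one scope does not consume quota at another. A production version would typically keep the buckets in a shared store rather than in process memory.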

Capabilities & Focus Areas

  • Model gateway architecture with routing, retries, fallback, and cost attribution
  • Prompt and tool registry design with versioning and environment promotion
  • Evaluation harnesses integrated into CI/CD for AI artifacts
  • LLM observability stacks (token-level cost, latency, drift, abuse)
  • Feedback collection pipelines closing the loop between users and model owners
  • SLO design for non-deterministic systems
  • FinOps for generative-AI workloads
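
To make the first capability above concrete, the following is a rough sketch of a gateway that tries providers in preference order, retries each a bounded number of times, falls back to the next on failure, and attributes the resulting cost to the calling team. `Provider`, `Gateway`, the flat per-1k-token price, and the `call` signature are hypothetical names and simplifications chosen for the example; real provider SDKs and pricing models differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    usd_per_1k_tokens: float  # hypothetical flat price, for illustration only
    call: Callable[[str], tuple[str, int]]  # returns (text, tokens_used)


class Gateway:
    def __init__(self, providers: list[Provider], max_retries: int = 1):
        self.providers = providers      # ordered by routing preference
        self.max_retries = max_retries  # extra attempts per provider before fallback
        self.cost_by_team: dict[str, float] = defaultdict(float)

    def complete(self, team: str, prompt: str) -> tuple[str, str]:
        """Return (provider_name, text) from the first provider that succeeds."""
        last_error = None
        for provider in self.providers:
            for _ in range(self.max_retries + 1):
                try:
                    text, tokens = provider.call(prompt)
                except Exception as exc:  # a real gateway would catch narrower errors
                    last_error = exc
                    continue
                # Attribute the spend to the caller so it can be charged or shown back.
                self.cost_by_team[team] += tokens / 1000 * provider.usd_per_1k_tokens
                return provider.name, text
        raise RuntimeError("all providers failed") from last_error
```

In this sketch only successful calls are costed. Whether failed or partially streamed calls should also be attributed depends on how the provider bills them, which is a design decision the gateway owner has to make explicitly.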

Typical Engagement Patterns

  • Twelve- to twenty-four-week AI platform builds for clients scaling from pilots to a portfolio of use cases
  • Embedded MLOps augmentation for client AI platform teams
  • Cost-discipline engagements when generative-AI bills exceed planned budgets
  • Reliability and observability engagements for clients with production AI incidents
  • Migration engagements consolidating per-team AI integrations onto a shared platform

Outcomes Delivered

  • AI platforms where new use cases launch in days, not quarters
  • Per-team and per-use-case cost attribution that finance can reconcile
  • Production AI systems with the same observability rigor as conventional services
  • Prompt and tool changes promoted through CI/CD rather than copy-paste
  • On-call rotations sized for actual AI workload patterns, not worst-case panic
