AI engineering velocity without the recruiter chaos.
Vetted US-based LLM application engineers, agentic workflow engineers, AI infrastructure, AI-native full-stack, and vector / ML engineers — full-time or fractional. Eval-driven, cost-aware, shipping production AI features today. Pay only when you hire.
Full-time or fractional · $10,000 flat per hire · A real recruiter reviews every candidate
Prioritized for who you are and where you're going.
LLM Application Engineer
RAG over real data, tool use, structured outputs, evals. Owns the product surface — what the user actually sees an LLM do. Ships with Anthropic / OpenAI SDKs, Vercel AI SDK, structured outputs, and Cursor or Claude Code in the loop.
Agentic Workflow Engineer
Multi-step agents that actually finish the work. MCP servers, LangGraph or Inngest / Mastra-style orchestration, safe tool exposure, sandboxed execution, human-in-the-loop checkpoints. Knows where agents help and where they hurt.
AI Infrastructure & Inference
vLLM / SGLang / TGI for self-hosted serving, Modal / Replicate / Together / Fireworks for hosted. GPU cost posture, batching, streaming, observability for LLM workloads (Langfuse, Helicone, Braintrust traces).
AI-Native Full-stack
TypeScript everywhere, Next.js / Hono / Bun, Postgres + pgvector, Drizzle / Prisma. Streaming UIs, generative UI, tool-calling baked into the product flow. Ships fast with AI in the loop.
Data + ML Engineer
Embeddings pipelines, vector DBs (pgvector / LanceDB / Qdrant / Pinecone), hybrid retrieval (BM25 + dense + rerank), fine-tuning (LoRA / QLoRA / DPO), eval datasets that catch regressions before users do.
Staff / Founding AI Engineer
The senior who owns AI across the product. Sets eval discipline, picks the model / inference stack, decides when to fine-tune vs. prompt, makes cost vs. quality calls. Often the first AI hire after the founder.
Not every role needs an LLM. We still place the classic tracks from the same bench, with the same human review and the same per-hire model.
- Front-end / Web: React/Next, Vue, Svelte. Production SPAs, marketing sites, design-system implementation, performance.
- Back-end / API: Node/TS, Python, Go, Java. Domain modeling, REST/GraphQL APIs, queue workers, integration depth.
- Full-stack: End-to-end feature delivery across the stack. Pragmatic, ships fast.
- Mobile: iOS/Android, React Native, Flutter. Native feel where it matters.
- DevOps / Cloud: AWS/GCP/Azure, Terraform, CI/CD, observability, incident readiness.
- QA / Automation: Playwright, Cypress, load and integration testing. Reduces rollback rate.
Beyond the resume.
Culture screening, AI-assisted skills assessment, expert human review. A real recruiter signs off before any candidate reaches you.
Eval discipline
Offline + online evals, golden sets, regression detection, A/B against frontier models. If they can't answer "how would you know if your prompt got worse?", they're a no.
Prompt + context engineering
Tool design, structured outputs, RAG context construction, prompt versioning. Treats prompts like code — diffed, reviewed, tested.
LLM cost + latency tradeoffs
Model selection by task, caching, batching, streaming, cost ceilings. Knows the price-per-million-tokens of every major model.
Production agent design
When agents help vs. hurt. Tool exposure boundaries, retry / refusal handling, human-in-the-loop checkpoints, failure modes in production.
Retrieval + memory
Chunking, reranking, hybrid retrieval, structured retrieval, conversation memory. Real RAG, not a vector-DB MVP.
AI-tool fluency
Cursor, Windsurf, Claude Code, Zed AI daily. Ships with AI in the loop and reviews LLM-generated code with judgment — not "I tried ChatGPT once".
Outcomes we drive
- AI features that actually ship — and don't regress when the model updates
- Eval pipelines that catch quality drops before users do
- Production agents that finish the work, with humans in the loop where it matters
- Predictable AI cost and latency under real traffic
- RAG and memory that work past the demo
Sample profiles
Representative of the bench — share your requirements for live profiles.
RAG over 10M docs in prod · Anthropic + OpenAI SDKs · Vercel AI SDK · Braintrust evals · prior Cursor/Linear-style team · ships weekly.
vLLM + SGLang serving · GPU cost optimization · Modal/Replicate deployments · Langfuse + Helicone observability · SLO discipline.
MCP server design · LangGraph multi-step · sandboxed tool execution · production agent failure modes · prior YC AI-tools team.
Now callable via /hire in Slack.
Same vetted US bench. New way to brief a role. Type /hire in Slack — or any MCP-connected tool — and our sourcing agents plus human recruiters take it from there. Same $10,000 flat-per-hire model.
Add the AI engineer. Ship the model.
Common questions
- Do you place classic backend / frontend engineers, not just AI?
- Yes. AI-native tracks are our lead practice, but classic engineering roles (Front-end, Back-end, Full-stack, Mobile, DevOps, QA) are on the same bench with the same human review and per-hire model.
- How fast can I get a vetted shortlist?
- Typically days, not weeks. Time-to-shortlist depends on role specificity and bench availability for the discipline.
- What does it cost to hire a US engineer through Aperturio?
- $10,000 flat per direct hire, the same fee whether the role is a $120K junior or a $250K staff engineer. No retainer, no seats. Replacement guarantee included.