Ozgur Guler · A public record
Production AI, from first principles.
I build, write about, and ship production AI systems — agent harnesses, long-running workflows, inference architecture, EvalOps, and the infrastructure underneath.
A curated record: books in progress, essays, code, build notes, talks, and selected startup work. No hype, no metrics theatre — just the trail.
- Now: Drafting AI Inference Engineering and AI Agents in Production.
- Building: Durable agent harnesses on Azure AI Foundry · MCP boundaries · EvalOps.
- Open to: Talks, sober technical consulting, and selected startup work.
Selected work
A curated index.
- № 01 · AI Inference Engineering. Serving stacks, latency, throughput, KV-cache pressure, GPU economics, and AI factory architecture. Book, in progress.
- № 02 · AI Agents in Production. Memory, durable execution, tool boundaries, evals, replay, governance, and enterprise deployment. Book, in progress.
- № 03 · foundry-demo. Hosted agents, tracing, governance, grounding, MCP tooling, and A2A flows, packaged as a hands-on workshop. Code, Azure AI Foundry workshop.
- № 04 · agent-framework-ozg. Runnable samples for agents, workflows, memory, reasoning, and Azure AI Foundry integration. Code, workshop fork.
- № 05 · Production Agent Workflows: Orchestration and Observability. Typed workflow graphs, fan-out/fan-in execution, checkpointing, human approval gates, and telemetry. Essay.
- № 06 · Local PII Pre-Filter with Presidio and Qwen 2.5. A local pre-prompt guardrail using Presidio and a small CPU model to reduce PII exposure before LLM calls. Essay, guardrails.
Practice
Two tracks, one discipline.
Production AI is the meeting point of model behaviour, infrastructure economics, and operator trust. I work where those three meet.
Agents
- Memory, state, and forgetting policies
- Durable execution and long-running workflows
- Tool boundaries, MCP, and approval gates
- Evals, replay, and run-trace observability
- Governance and enterprise deployment
Inference
- Serving topology and model routing
- Latency, throughput, and batching
- KV-cache pressure and scheduling
- GPU economics and AI-factory architecture
- Benchmarking, reliability, observability
Books
Book drafts.
Recent
Writing and build notes.
Essays
- Building Agent Harnesses with Microsoft Agent-Framework Durable Extensions. How durable runtime contracts bound long-running agent behavior with state, approvals, checkpointing, and recovery.
- March '26 AI Funding: Infrastructure Up, App Layer Down. Market notes on AI capital concentrating around infrastructure, frontier labs, physical AI, and harder technical moats.
- 10 RAG Shifts Redefining Production AI in 2026. A systems view of RAG moving toward composable retrieval, late interaction, graph reasoning, freshness, and orchestration.
- Context Engineering with Microsoft Agent Framework's Context Provider API. Context engineering as a bounded prompt-assembly control surface for short-term memory, persistent memory, retrieval, and policy.
Build log
- Agent workflow orchestration needs typed state. Converted agent orchestration notes into a reusable production pattern: typed outputs, checkpoints, replay, and telemetry.
- Local PII filtering before model invocation. Packaged a local Presidio plus small-model guardrail pattern for pre-prompt privacy control.
- Data requirements for agentic AI talks. Mapped agentic AI, assisted coding, and governance shifts to concrete data estate requirements.
- Copilot Studio review after a break. Revisited Copilot Studio capabilities through the lens of practical agent deployment and governance.
Startup work
Public-safe notes.
Selected involvement; deep technical work, public framing.
Enlighty.ai
AI-native consumer intelligence platform turning fragmented data into trusted insight.
Eachlabs.ai
AI workflow and model platform for app builders, with curated image, video, voice, and text models.
Ozgur Guler
AI systems builder.