
Forward Deployed Engineering for building AI Agents

Forward Deployed Engineering Playbook

AI agents deliver an outcome by coordinating across many tools, whereas traditional product categories are single-tool boxes. Think of a healthcare intake agent. It captures a referral, reads EHR notes and orders, checks eligibility and prior authorization with payers, assembles required documentation, schedules the patient, gets consents, sends reminders, and updates both the EHR and billing, end to end. The value is a resolved intake: “patient scheduled with auth in place, docs filed, denial risk reduced.” The agent does the whole job end-to-end, delivering an outcome; users don't live in the UI trying to push workflows forward. It is almost like a self-driving car that does the whole journey end-to-end (AI agents) vs. a navigation system that only guides you (traditional software).

Therefore AI agents don't map onto an incumbent product category. To grow the business, you must do product discovery from inside the enterprise with domain experts, combined with rapid prototyping by deployed engineers.

An FDE program isn’t just a sales tactic; it’s how you (1) discover the real product, (2) prove ROI fast, and (3) convert that into bigger, outcome-priced contracts and reusable “agent core” IP.

This is a two-role operating system:

  • Echo (embedded analysts / “heretics”): insiders who know the domain and want to change it.
  • Delta (deployed engineers): prototype under pain and time pressure; throwaway code is acceptable.

Other defining characteristics of an FDE program engagement:

  • Contract value is continuously pushed up by the FDE team uncovering more use cases and value.
  • Prototyping is used heavily to inspire or reveal "real user desire".

Ultimately FDE is about doing things that don't scale and turning them into scalable methods and assets that you improve as you go, e.g. applying the same assets and solutions at other customers.

The FDE talk from Bob McGrew really resonated with me, as it aligns well with my earlier "Frontier Labs" initiative at the Microsoft Digital Natives org, which was mostly about working with customer product and engineering managers to inspire them with quick but functional prototypes that demonstrate potential and value with a product-led approach.

  • Ozgur Istanbul, September 11th 2025

Building Agents on Azure

Evolution of Builder Tools

Agents are simply your instructions + context + tools.
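
To make that equation concrete, here is a minimal sketch of an agent loop in Python. `call_llm`, the shape of its reply, and the toy `get_weather` tool are hypothetical placeholders, not a specific Azure SDK:

```python
# Minimal agent loop: instructions + context + tools.
# `call_llm` is a stand-in for your model endpoint (e.g. Azure OpenAI).

def get_weather(city: str) -> str:
    """Toy tool; a real one would call an API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}
INSTRUCTIONS = "You are a helpful assistant. Use tools when needed."

def run_agent(user_message: str, call_llm) -> str:
    # The context starts as instructions + user message and grows with tool outputs.
    context = [{"role": "system", "content": INSTRUCTIONS},
               {"role": "user", "content": user_message}]
    while True:
        reply = call_llm(context, tools=list(TOOLS))  # model answers or requests a tool
        if reply.get("tool"):
            result = TOOLS[reply["tool"]](**reply["args"])
            context.append({"role": "tool", "content": result})  # tool output feeds the context
        else:
            return reply["content"]  # final answer
```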

  • Design: Agent Design, AI UX, Product Design
  • Architecture: Memory, Context Engineering, Tool Use
  • Production: EvalOps, AI Safety / Security

Context Engineering

Context Engineering - What is it and why does it matter?

Context is everything fed to the LLM during inference. This includes:

  • The instructions, a.k.a. the prompt
  • Retrieved facts (e.g. RAG results, SQL query results, KG context)
  • Conversation history (e.g. chat history)
  • Tool outputs (e.g. API responses)

Context Engineering is simply how to get the most out of today's models by optimising the context, which includes all of the above. In practice it is everything that makes your agents better.

Where prompt engineering crafts the instruction text, context engineering architects the whole information flow: data, tools, memory, ranking, and budgets. On top of RAG, context engineering adds query planning, re-ranking, compression, memory, and policy loops. So it includes both RAG and prompt engineering, and strives to build a system that optimises the context to generate the best LLM outcome.
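
As a rough sketch of that pipeline (plan, retrieve, re-rank, compress, assemble under a token budget); `retrieve`, `rerank`, and `summarize` are hypothetical stand-ins for your own components:

```python
# Context-engineering pipeline sketch: plan -> retrieve -> re-rank -> compress -> assemble.

TOKEN_BUDGET = 4000

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: ~4 characters per token

def build_context(question: str, retrieve, rerank, summarize) -> str:
    sub_queries = [question]  # query planning (trivial here; could decompose the question)
    candidates = [doc for q in sub_queries for doc in retrieve(q)]
    ranked = rerank(question, candidates)  # most relevant docs first
    parts, used = [], 0
    for doc in ranked:
        if used + rough_tokens(doc) > TOKEN_BUDGET:
            doc = summarize(doc)  # compress instead of dropping outright
        if used + rough_tokens(doc) > TOKEN_BUDGET:
            break  # even the summary doesn't fit; stop here
        used += rough_tokens(doc)
        parts.append(doc)
    return "\n\n".join(parts)
```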

Context Engineering - Core Tasks:

  1. Context retrieval / generation (RAG)

Prompt-based generation as well as external data integration, using advanced techniques like agentic RAG and graph data integration.
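
A minimal sketch of the agentic-RAG idea, where the model can request follow-up retrievals before answering; `search` and `call_llm` (and its reply shape) are hypothetical helpers:

```python
# Agentic RAG sketch: the model issues follow-up search queries until it has
# enough evidence to answer, instead of a single retrieve-then-generate pass.

def agentic_rag(question: str, search, call_llm, max_rounds: int = 3) -> str:
    docs = search(question)  # initial retrieval
    for _ in range(max_rounds):
        reply = call_llm(question=question, docs=docs)
        if reply.get("search_query"):     # model asks for more evidence
            docs += search(reply["search_query"])
        else:
            return reply["answer"]        # grounded final answer
    # Out of rounds: answer with whatever evidence we gathered.
    return call_llm(question=question, docs=docs, force_answer=True)["answer"]
```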

  2. Context processing

(Skip this part if you are not building an LLM but using existing LLMs from frontier labs).

This is mostly about bringing structured, semi-structured, and unstructured data together:

  • Marrying structured, semi-structured, and unstructured data: JSON, video, and columnar data should all be candidates for the context. Vector search and smart WHERE clauses with Postgres and pgvector are all you really need; you don't need fancy vector DBs like Qdrant or Pinecone (see the sketch after this list).

  • Unstructured data: Videos should be properly summarized so they don't flood the context window. Context windows are expanding, but costs are still per token, so summarizing is still a very smart thing to do even though it causes some loss of detail. Depending on your AI's needs, you might have to keep the full text in context (e.g. legal documents).

  • Semi-structured data: JSON data is notoriously messy and includes keys and values you don't always need. Being smart about the keys and values the AI actually needs helps avoid hallucinations and memory issues.

  • Structured data: You can't inject every single row of data into the context window. Proper aggregate tables with a WHERE clause can do wonders for getting the right data in front of the AI without overwhelming it.
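
To illustrate the Postgres + pgvector point from the first bullet: a structured WHERE clause narrows candidates before vector similarity ranks them. The `docs` table, its columns, and the tenant/recency filters are assumptions for illustration:

```python
# Hybrid retrieval with Postgres + pgvector (assumed schema:
# docs(content text, tenant_id text, updated_at timestamptz, embedding vector)).
import psycopg2

def top_k_docs(conn, query_embedding: list[float], tenant_id: str, k: int = 5):
    vec = str(query_embedding)  # pgvector accepts the '[0.1, 0.2, ...]' text format
    sql = """
        SELECT content
        FROM docs
        WHERE tenant_id = %s                    -- structured filter first
          AND updated_at > now() - interval '90 days'
        ORDER BY embedding <=> %s::vector       -- then cosine-distance ranking
        LIMIT %s;
    """
    with conn.cursor() as cur:
        cur.execute(sql, (tenant_id, vec, k))
        return [row[0] for row in cur.fetchall()]
```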

Balancing relevance and completeness is the key to proper context engineering, and it's 100% the future of data engineering.

Another important task is choosing or training a model that will accommodate the length of the context, since by definition LLMs have a fixed context length. This can be done using state-space models, or models using sparse attention or positional-encoding tricks to handle long sequences.

  3. Context Management

Memory hierarchies plus compression and optimization strategies.
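
As one example of such a strategy, a minimal sketch that keeps recent turns verbatim (short-term memory) and folds older turns into a running summary (long-term memory); `summarize` stands in for an LLM-backed compressor:

```python
# Memory-hierarchy sketch: recent turns stay verbatim, older turns are
# compressed into a running summary to keep the context within budget.

RECENT_TURNS = 6  # how much short-term memory to keep verbatim

def compact_history(turns: list[str], summary: str, summarize):
    if len(turns) <= RECENT_TURNS:
        return summary, turns  # nothing to compress yet
    older, recent = turns[:-RECENT_TURNS], turns[-RECENT_TURNS:]
    # Fold the older turns into the long-term summary.
    summary = summarize(summary + "\n" + "\n".join(older))
    return summary, recent

def build_prompt(summary: str, recent: list[str], user_msg: str) -> str:
    return (f"Conversation summary:\n{summary}\n\n"
            "Recent turns:\n" + "\n".join(recent) +
            f"\n\nUser: {user_msg}")
```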

This is probably not a complete list of what context engineering for #LLMs entails... Content I have seen, including the infamous survey paper, mostly refers to how each of these tasks can be optimised individually. We will need a tool like DSPy with extended capabilities addressing these tasks end-to-end...

  • Prompt Engineering
  • Relational structural context
  • Graph context - KG / GNN integration
  • Dynamic context assembly, reranking & orchestration.
  • Multimodal context integration
  • Reasoning depth control
  • Long-multimodal context compression
  • Memory integration
  • Self-reflection
  • Long-context handling with SSMs, sparse attention, positional encoding tricks
  • Function calling / Agentic tool use
  • Guardrails - AI Safety / Grounding services

Azure AI Search
