
September’s Mega Rounds

Originally posted in GenAI Builders Newsletter.

After the slowest month since 2017, AI VC activity rebounded in September, recording the highest number of mega-deals and restoring momentum, mostly in late-stage rounds.

(We analysed 428 VC AI investments >$500k that happened in September.)

September’s mega-rounds. September saw an unprecedented cluster of mega-rounds flowing into model makers (Anthropic, Mistral, Cohere, Perplexity) and AI platforms (Cerebras Systems, Groq, Cognition, Nscale, Baseten, Modular, …). Early stage is tilting toward vertical agents focused on document & automation use cases in the top GenAI verticals: LegalTech, HRTech, Healthcare Ops, FinTech, …

Heterogeneous AI infrastructure fabrics. Modular ($250M, Series C)’s inference stack lets you manage clusters of heterogeneous hardware (CPUs and GPUs from multiple vendors) over a single API, whereas Upscale AI ($100M, Seed) is building the unified networking fabric underneath such AI clusters. Cerebras Systems & Groq keep carving out niches in vertical & sovereign AI clouds, serving large LLMs with deterministic latency.

Europe’s sovereign AI infrastructure. Europe is building in its own way, powered by renewables, with Nscale ($1.1B, Series B) in the hyperscaler-for-AI space and DataCrunch ($65M, Series A) offering a developer-first, self-serve AI GPU cloud. ASML became Mistral AI’s largest shareholder (≈11%) with a strategic board seat, a European sovereignty play across chips, compute & models.

Agents are concentrating at the earlier stages. Mimica ($26.2M, Series B) takes a distinctive task-mining approach, watching how individuals complete work on their desktops, clustering organisation‑wide behaviour and suggesting automation opportunities. Druid AI ($31M, Series C) is another horizontal agent player that plugs into legacy/RPA stacks (e.g., UiPath) and spans CX/EX.

AI Science Factory PPPR rounds. Ambitious startups such as Lila Sciences ($235M, Series A, drug discovery), Periodic Labs ($300M, Seed, advanced materials) and Cusp AI ($100M, Series A, advanced materials) lead the pack, using generative models to propose candidates for physical experiments and shrink the solution space for frontier scientific problems in molecular design, an increasingly active implementation pattern for GenAI. VCs are funnelling large sums into closed‑loop ‘AI science factories’ at the pre‑product, pre‑revenue (PPPR) stage.

A similar startup is Hiverge ($5M, Seed), which uses LLMs to write and improve optimisation algorithms; it was founded by DeepMind alumni behind AlphaFold and AlphaTensor.

NVIDIA is going all‑in. 22 deals in September alone, seeding every layer of the stack to both create and capture demand.

Figure 1 — Top investments, September 2025

Investments skewed toward later stages in September 2025

The month’s funding skewed heavily toward later‑stage rounds.

GenAI Heatmap of Markets: Adoption × Companies × Investment

GenAI was most applicable in horizontal enablers (AI/ML platforms, Sales/CX, Productivity, DevTools) with LegalTech leading the regulated verticals, while FinTech/HealthTech sat mid-pack and capital-heavy or integration-burdened sectors (Semis, Biotech, Supply/Prop/Insure) lagged.

AI/ML investments sub‑verticals

Investments by vertical

September AI/ML funding overwhelmingly concentrated in Foundation Models/LLMs (late-stage/PE heavy), with MLOps and agentic orchestration getting modest share, while RAG/LLMOps and the rest (multimodal, safety/evals, AutoML, synthetic data, edge) barely registered.

Funding by model (only when disclosed — total dataset: 44 startups out of 414)

Among disclosed rounds, funding concentrated heavily in OpenAI/GPT and Meta/Llama (late-stage dominated) with Anthropic a distant third, while Cohere, Alibaba/Qwen, Google/Gemini, and xAI/Grok saw minimal disclosed capital — signaling continued consolidation around a few U.S. model leaders with a long thin tail.

AI Agent Plays

Agent funding clustered around orchestration platforms and coding/support agents — the “control plane + developer productivity” bets — while voice saw niche traction and ops/back-office and sales/marketing agents lagged, signaling investor preference for platform leverage over narrow GTM bots.

Conclusion

AI capital has been concentrating in early winners and infrastructure moats, shifting to later rounds since the second half of 2025. Pre‑seed/seed is still active but highly selective, with bigger early checks, large PPPR bets and slower Series‑A graduation. For early-stage agent startups the bar is higher: solid traction is now a prerequisite for funding.

👉 If this was useful, follow my LinkedIn newsletter — AI Builder Patterns — for weekly, production-grade agent patterns.

When GenAI Reasoning Helps Beyond Classification & Predictive Models

Modern teams often default to “on‑prem LLM with full PII context.” That approach maximizes raw detail but increases privacy exposure, friction, and cost—while rarely improving real decision quality. A more effective pattern is to compress user and business context into privacy‑preserving representations, then apply a reasoning model that can integrate signals, optimize under constraints, and explain outcomes.

Why compressed context wins

  • Cross‑domain context integration
    Fuse transactional, behavioral, conversational, and content signals into compact features: cohort/segment IDs, derived intents, stability metrics, trust scores, eligibility flags. These abstractions travel safely and are easier to audit than raw PII.

  • Constraint‑driven optimization
    Real systems are multi‑objective: user utility, cost, risk, compliance, SLAs. Reasoning models can weigh trade‑offs explicitly (e.g., preference satisfaction vs. risk budget) and produce actions that satisfy hard constraints while maximizing soft ones.

  • Temporal reasoning
    Sequences matter. Compressed features (event cadence, recency, dwell change, churn risk, lifecycle stage) let the model reason about what‑if outcomes and next best actions without exposing raw timelines or identifiers.

  • Conversational explanation & empathy
    Explanations reference segments, policies, and constraints rather than PII fields. Users hear why an action happened in plain language; auditors see policy‑aligned traces.

Compressed context vs PII‑heavy on‑prem LLMs

  • Privacy & governance: Move from raw fields to derived traits; minimize data-in-motion and storage of sensitive attributes.
  • Generalization: Reason over stable abstractions (segments, intents) instead of idiosyncratic identifiers; reduces overfitting to personal detail.
  • Latency & cost: Smaller prompts and fewer redaction passes; easier caching because features are normalized.
  • Robustness: Feature drift is measurable; policies encode constraints explicitly; outputs are reproducible.

What to compress

  • Segments and cohorts (behavioral, value‑based, risk, lifecycle)
  • Derived preferences (format, channel, tone), intents, and tasks
  • Policy/eligibility flags and risk tiers
  • Constraint budgets (cost, latency, compliance, fairness thresholds)
  • Temporal features (recency, frequency, stability, trend, seasonality)
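
As a concrete illustration, a compressed context might be typed like this with Pydantic; every field name below is a hypothetical example, not a fixed schema:

from pydantic import BaseModel

# Hypothetical compressed-context payload: derived traits only, no raw PII.
class CompressedContext(BaseModel):
    segment_id: str                      # e.g. "value_tier_2/lifecycle_active"
    derived_intents: list[str]           # e.g. ["explore_fx", "recurring_transfer"]
    risk_tier: str                       # policy-mapped tier, not a raw score
    eligibility_flags: dict[str, bool]   # which actions policy allows
    recency_days: int                    # days since last relevant event
    trend: str                           # "rising" | "stable" | "declining"
    constraint_budget: dict[str, float]  # e.g. {"cost_usd": 0.02, "latency_ms": 800}

ctx = CompressedContext(
    segment_id="value_tier_2/lifecycle_active",
    derived_intents=["explore_fx"],
    risk_tier="low",
    eligibility_flags={"fx_widget": True},
    recency_days=3,
    trend="rising",
    constraint_budget={"cost_usd": 0.02, "latency_ms": 800},
)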

When to use each approach

  • Use compressed‑context reasoning when you need cross‑domain fusion, policy compliance, explainability, and scale under strict privacy constraints.
  • Use PII‑rich on‑prem LLMs only when raw identifiers materially change outcomes and you have strong legal basis, storage controls, and audit coverage.

Implementation sketch

  1. Build a privacy‑preserving feature layer: segmentation, trait extraction, eligibility rules, constraint budgets.
  2. Define explicit policies and hard/soft constraints; expose them to the model as structured inputs.
  3. Use a reasoning‑capable model (small local or hosted) to generate actions and natural‑language rationales.
  4. Log features + constraints + decisions (no raw PII) for replay, audits, and A/Bs.
  5. Monitor drift in features and policies; retrain segmentation and trait extractors on schedule.
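
A minimal Python sketch of steps 2–4, under stated assumptions: the OpenAI-style client is illustrative (any hosted or small local chat model works), and the payload shapes are hypothetical:

import json
from openai import OpenAI  # illustrative client; any chat-capable model works

client = OpenAI()

def decide(features: dict, constraints: dict) -> dict:
    # Steps 2-3: expose features and hard/soft constraints as structured input
    # and ask a reasoning-capable model for an action plus a rationale that
    # references segments and policies, never raw identifiers.
    prompt = (
        "Choose the next best action. Satisfy the hard constraints, maximize "
        "the soft ones, and explain via segments and policies only.\n"
        f"features={json.dumps(features)}\nconstraints={json.dumps(constraints)}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small local model also fits here
        messages=[{"role": "user", "content": prompt}],
    )
    decision = {
        "rationale": resp.choices[0].message.content,
        "features": features,
        "constraints": constraints,
    }
    print(json.dumps(decision))  # step 4: log features + decision, no raw PII
    return decision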

TL;DR: Compress context into safe, auditable features and let the model reason under constraints. You gain privacy, robustness, and better decisions—without hauling raw PII into every prompt.

Production Agent Workflows: Orchestration & Observability

Why workflow orchestration matters for agents

Most agent demos are linear prompt chains with fragile state management. Production systems need typed workflow graphs with explicit coordination, checkpoint/resume, human-in-the-loop gates, and comprehensive telemetry. Without these primitives, agent systems fail silently, can't be debugged, and don't scale beyond toy examples.

Workflow orchestration UI showing graph visualization, trace timeline, and telemetry streams

Core workflow capabilities

Typed workflow graph
Define agents, edges, fan-out/fan-in points, sub-workflows, and decision gates as a declarative graph. Every node has typed inputs/outputs (Pydantic schemas). The runtime enforces contracts and rejects invalid transitions.
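
A minimal sketch of such a contract, assuming Pydantic v2; the schemas and runner below are illustrative, not the demo's actual API:

from pydantic import BaseModel

# Illustrative node contract; schemas are hypothetical, not the demo's own.
class KycInput(BaseModel):
    applicant_id: str
    documents: list[str]

class KycOutput(BaseModel):
    verified: bool
    flags: list[str] = []

def run_node(handler, input_schema, output_schema, payload: dict):
    data = input_schema.model_validate(payload)   # reject invalid inputs early
    result = handler(data)
    return output_schema.model_validate(result)   # enforce the output contract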

Fan-out/fan-in execution
Launch parallel agent tasks (KYC, fraud, income verification) and synchronize results. The runtime manages concurrent execution, collects outputs, and handles partial failures with configurable policies (fail-fast vs. best-effort).
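
A sketch of the fan-out/fan-in idea with plain asyncio; the policy handling here is simplified for illustration (a real runtime may cancel in-flight tasks on fail-fast):

import asyncio

# Illustrative fan-out/fan-in: run checks concurrently, then apply a policy.
async def fan_out(tasks: dict, policy: str = "best-effort") -> dict:
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    out = dict(zip(tasks.keys(), results))
    failures = [name for name, r in out.items() if isinstance(r, Exception)]
    if failures and policy == "fail-fast":
        raise RuntimeError(f"fan-out failed: {failures}")
    return out  # best-effort: partial results survive, failures stay recorded

# usage: await fan_out({"kyc": kyc(), "fraud": fraud(), "income": income()})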

Checkpointing & resume
Persist workflow state at named checkpoints. Resume from any checkpoint on failure, replay, or A/B testing scenarios. Checkpoints are versioned and include full shared state + conversation memory snapshots.

Human-in-the-loop (HITL) gates
Explicit pause/resume points for human decisions. The workflow emits a structured request, waits for approval/rejection/adjustment, then resumes with the response injected into shared state. HITL gates are first-class nodes with telemetry coverage.
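
One possible shape for an async HITL gate, sketched with asyncio futures; the HitlGate name and the print stand-in for the SSE emitter are assumptions:

import asyncio

# Sketch: the workflow emits a structured request, then awaits a future that
# the API layer resolves when the human responds.
class HitlGate:
    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    async def request(self, request_id: str, payload: dict) -> dict:
        fut = asyncio.get_running_loop().create_future()
        self._pending[request_id] = fut
        print("waiting:", request_id, payload)  # stand-in for the SSE emit
        return await fut  # blocks this workflow branch until resolved

    def resolve(self, request_id: str, response: dict) -> None:
        # called by e.g. POST /runs/{run_id}/hitl with the human's decision
        self._pending.pop(request_id).set_result(response)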

Shared state & memory
Agents communicate via a shared state dictionary and thread-scoped conversation memory. State updates are atomic and observable. Memory is queryable for replay and debugging.


Telemetry as a first-class concern

OpenTelemetry end-to-end
Every workflow run, agent invocation, tool call, fan-out boundary, checkpoint, and HITL gate emits structured spans with trace and span IDs. FastAPI auto-instrumentation captures HTTP boundaries; custom spans cover workflow internals.

Progress tracking
Weighted progress calculation across workflow stages. Each phase contributes a normalized weight to overall progress (e.g., doc intake 6%, parallel fan-out 32%, risk ensemble 22%). Real-time SSE streams expose granular progress + trace IDs to front-ends.
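
The weighted calculation might look like this sketch (weights taken from the example above; names are illustrative):

# Each stage owns a normalized weight; overall progress is the sum of
# completed weights plus the active stage's fractional contribution.
STAGE_WEIGHTS = {"doc_intake": 0.06, "parallel_fanout": 0.32, "risk_ensemble": 0.22}

def overall_progress(done: list[str], active: str, step_progress: float) -> float:
    completed = sum(STAGE_WEIGHTS[s] for s in done)
    return completed + STAGE_WEIGHTS.get(active, 0.0) * step_progress

# e.g. doc intake done, risk ensemble a third complete:
# overall_progress(["doc_intake"], "risk_ensemble", 0.33) ≈ 0.13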

Structured event streaming
Server-sent events (SSE) deliver typed JSON payloads for every workflow event: stage transitions, agent outputs, checkpoints, HITL requests, errors. Each event includes:

  • Timestamp (ISO 8601)
  • Event type (progress, waiting, checkpoint.saved, hitl.resolved)
  • Phase identifier
  • Status (running, waiting, done, error)
  • Overall progress + step progress ratios
  • Trace ID + span ID for correlation with Jaeger

Jaeger integration
Docker Compose stack with OTEL Collector + Jaeger. Deep-link from run results to full trace timeline. Trace visualization shows agent orchestration, tool latencies, checkpoint saves, HITL wait durations, and error propagation.

GenAI span attributes
Custom attributes for agent spans: genai.agent, genai.model, genai.tokens.prompt, genai.tokens.completion, genai.latency_ms, genai.cache_hit. Ready for Azure Monitor integration and cost analysis.
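
Attaching these attributes with the OpenTelemetry Python API could look like the following sketch (attribute values are sample data):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Sketch: attach GenAI attributes to an agent span; names follow the
# convention above, adjust to your own semantic conventions as needed.
with tracer.start_as_current_span("agent.invoke") as span:
    span.set_attribute("genai.agent", "risk_scorer")
    span.set_attribute("genai.model", "gpt-4o")
    span.set_attribute("genai.tokens.prompt", 812)
    span.set_attribute("genai.tokens.completion", 422)
    span.set_attribute("genai.latency_ms", 1430)
    span.set_attribute("genai.cache_hit", False)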


Workflow visualization

GraphViz export
Generate SVG workflow diagrams with color-coded node types:

  • Agents (boxes)
  • Decision points (diamonds)
  • Fan-out/fan-in (octagons)
  • HITL gates (hexagons, highlighted)
  • Sub-workflows (double octagons)
  • Start/end (ovals)

Edges show dependencies. The diagram is queryable via API (GET /workflow/graph.svg) for documentation or real-time UI rendering.

Live trace timeline
Jaeger UI shows the full execution timeline—parallel agent fan-out, synchronization barriers, checkpoint saves, HITL pauses. Spans are nested to reflect orchestration hierarchy (workflow → stage → agent → tool).


Implementation highlights

RunContext abstraction
A context object threads through the entire workflow. It provides:

  • emit(event_type, message, payload) → SSE stream
  • save_checkpoint(id, label, data) → Persistent checkpoint
  • request_hitl(payload) → Async HITL gate (blocks until response)
  • snapshot() → State + memory + checkpoints for debugging

ProgressTracker
Stage-aware progress tracking with weighted contribution. Tracks step progress within a stage and computes overall progress by summing weighted phase contributions. Auto-publishes to SSE stream with trace/span IDs.

Deterministic agent shims
Development-time shims for Microsoft Agent Framework (pre-release). Agents return deterministic typed outputs while keeping integration points ready. Swap to ChatClientAgent when the SDK is released—contracts stay identical.

File-based tooling
Tools read from local synthetic files (applications, identity docs, bureau reports, fraud signals) to simulate external APIs. Every tool call is instrumented with custom OTEL spans for latency + error tracking.


Production patterns

Event-driven API design
  • POST /runs → Start workflow, returns run_id immediately
  • GET /runs/{run_id}/events → SSE stream (long-lived connection)
  • POST /runs/{run_id}/hitl → Fulfill HITL request
  • POST /runs/{run_id}/resume → Resume from checkpoint
  • GET /runs/{run_id}/state → Snapshot shared state + memory
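
A minimal FastAPI sketch of the first two endpoints, assuming an in-memory queue per run (a real system would use a broker and launch the workflow as a background task):

import asyncio, json, uuid
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
QUEUES: dict[str, asyncio.Queue] = {}  # assumption: in-memory, single process

@app.post("/runs")
async def start_run() -> dict:
    run_id = str(uuid.uuid4())
    QUEUES[run_id] = asyncio.Queue()
    # the workflow itself would be started in the background here
    return {"run_id": run_id}

@app.get("/runs/{run_id}/events")
async def events(run_id: str) -> StreamingResponse:
    async def stream():
        while True:  # long-lived SSE connection
            event = await QUEUES[run_id].get()
            yield f"data: {json.dumps(event)}\n\n"  # SSE framing
    return StreamingResponse(stream(), media_type="text/event-stream")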

Graceful degradation
Workflow tolerates missing OTEL collector (falls back to console export), missing Graphviz (returns fallback SVG), and missing checkpoints (replays from start).

Policy-first orchestration
Fan-out policies (fail-fast vs. best-effort), HITL timeout policies, and checkpoint retention policies are configurable via environment. Policies are enforced at runtime, not post-hoc.


Observability stack

Local development

docker-compose up  # Backend + OTEL Collector + Jaeger

Jaeger access
http://localhost:16686 → Select service credit-desk-lite → View traces

Trace correlation
Every SSE event includes trace_id and span_id. Front-ends can deep-link to Jaeger for root-cause analysis. Example event payload:

{
  "ts": "2025-10-11T08:23:14.123Z",
  "type": "progress",
  "phase": "risk_ensemble",
  "status": "running",
  "message": "Risk PD model complete",
  "progress": 0.54,
  "step_progress": 0.33,
  "trace_id": "a1b2c3d4e5f6...",
  "span_id": "f9e8d7c6...",
  "meta": {"model": "gpt-4o", "tokens": 1234}
}

Azure Monitor ready
Point OTLP_ENDPOINT to Azure Monitor OTLP ingestion. GenAI attributes map to Azure Monitor custom metrics for cost/latency dashboards.


Why this approach works

Contracts over chaos
Typed schemas for every edge. Runtime validates inputs/outputs and rejects invalid transitions early. Errors are attributed to specific agents/tools with full trace context.

Replay & debugging
Checkpoint any run, replay from any stage. Combine with trace timeline to isolate failures. Shared state snapshots enable diffing between checkpoints.

Observable by default
Every action emits telemetry. No manual instrumentation required—the runtime auto-instruments workflow primitives. Add custom spans for domain-specific logic.

Human-in-the-loop as first-class
HITL gates are explicit workflow nodes, not hacks. They emit waiting events, track pause duration, and inject human responses into shared state with full attribution.

Front-end ready
SSE + trace IDs enable rich UIs—live progress bars, trace timelines, state diffing, workflow graph rendering. No polling required.


Engineering checklist

  • Define workflow graph with typed nodes (Pydantic schemas for inputs/outputs)
  • Implement RunContext with emit/checkpoint/HITL primitives
  • Add fan-out/fan-in executors with configurable policies
  • Instrument all workflow boundaries with OTEL spans (workflow → stage → agent → tool)
  • Stand up OTEL Collector + Jaeger (local Docker Compose or Azure Monitor)
  • Build SSE streaming API with trace/span IDs in every event
  • Implement weighted progress tracker with per-stage contributions
  • Add GraphViz export for workflow visualization
  • Write smoke tests that exercise normal/borderline/failure paths
  • Ship canaries; measure latency/error rates; promote by evidence

Demo stack

Credit Desk Lite showcases all primitives in a multi-agent credit underwriting workflow:

  • Typed workflow graph with 8 stages, 6 agents, fan-out/fan-in, sub-workflows
  • Full OTEL instrumentation (FastAPI + custom workflow spans)
  • SSE streaming with granular progress + trace correlation
  • HITL gates for borderline decisions
  • Checkpoint/resume for replays
  • GraphViz export for workflow viz
  • Deterministic synthetic data (runs offline, no external APIs)

Tech stack: FastAPI, Pydantic, OpenTelemetry, Jaeger, GraphViz, SQLite checkpoints, SSE streaming.

Repo: agent-framework-backend-custom01/credit-desk-lite

North star: Production agent systems are workflow orchestrators with typed contracts, comprehensive telemetry, and human-in-the-loop gates—not prompt chains with logging.

True Multi‑Agency

Why "true" multi‑agency matters

Most so‑called multi‑agent systems are linear toolchains with new names. True multi‑agency requires concurrent actors negotiating over a shared objective with explicit state, communication protocols, and guardrails.

Core ingredients

  • Roles & capabilities: Planners, critics, executors, tools. Each agent exposes a typed capability set, not free‑form prompts.
  • Shared state (blackboard/graph): Task graph + facts + artifacts. Updates are atomic and observable.
  • Coordination policy: Who can act when, and on what. Turn‑taking, parallelism, and arbitration rules are explicit.
  • Environments: Sandboxed I/O for tools, data, and effects; reproducible sims for learning/test.
  • Evaluation hooks: Trace, critique, and score plans, actions, and outcomes continuously.

Coordination patterns that work

  • Supervisor → Workers: Planner decomposes; workers execute; critic verifies; loop until done.
  • Peer consensus (debate → resolution): Multiple planners produce plans; a judge selects/merges.
  • Market/auction: Tasks bid out to specialized agents; cost/utility drives assignment.
  • Hierarchical control: High‑level goals → subgoals → executable steps with feedback at each level.
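
A minimal sketch of the supervisor → workers → critic loop from the first pattern above; all callables and result shapes are illustrative:

# Planner decomposes; workers execute; critic verifies; loop until done.
def supervise(goal, planner, workers, critic, max_iters=5):
    plan = planner(goal)  # decompose the goal into typed tasks
    for _ in range(max_iters):
        results = {task.name: workers[task.kind](task) for task in plan.tasks}
        verdict = critic(goal, results)  # score against the shared objective
        if verdict.accepted:
            return results
        plan = planner(goal, feedback=verdict.feedback)  # re-plan with critique
    raise RuntimeError("no acceptable result within iteration budget")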

Communication & memory

  • Messages are structured: intent, inputs, preconditions, effects, confidence.
  • Memory is layered: short‑term (episode), long‑term (project), external (vector/graph indices).
  • State transitions are auditable: every change is attributed to an agent + rationale.
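
Such a message could be typed as follows (the field set mirrors the first bullet above; the Pydantic model itself is an assumption):

from pydantic import BaseModel, Field

# Illustrative structured message between agents.
class AgentMessage(BaseModel):
    sender: str
    intent: str                     # what the sender wants to achieve
    inputs: dict                    # typed payload for the receiving agent
    preconditions: list[str] = []   # facts that must hold before acting
    effects: list[str] = []         # expected state changes if accepted
    confidence: float = Field(ge=0.0, le=1.0)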

Safety & governance

  • Policy first: allowlists, redaction, rate limits, authority boundaries per role.
  • Counterfactual checks: simulate high‑risk actions in a shadow env; require approval.
  • Human‑in‑the‑loop gates: elevation for sensitive scopes; rollbacks are first‑class.

Engineering checklist

  • Define the agent roles and their typed capabilities (interfaces + schemas).
  • Stand up a shared blackboard/graph with optimistic concurrency + versioned snapshots.
  • Implement a coordinator that schedules turns, arbitrates conflicts, and enforces policy.
  • Add a critic/evaluator with golden tasks and outcome metrics (precision/latency/cost/SLA).
  • Build a replayable environment (sim + fixtures) and wire tracing for every message/action.
  • Ship canaries; measure deltas; promote policies by evidence, not vibes.

North star: multiple specialized agents cooperating over a shared state to deliver a measurable outcome—reliably, safely, and faster than a single generalist.

Generative UI: From Static Screens to Adaptive Systems

Generative UI — why now

Frictionless UX drives usage. Generative UI reduces friction by assembling interfaces at runtime.

Adaptive interfaces that respond to context and user signals

Generative UI: interfaces whose structure and behavior are generated on‑the‑fly by models, not hard‑coded. Principles:

  • Dynamic assembly: Models + analytics compose components in real time per user, device, and goal.
  • Prompt → UI spec: Intent becomes a typed JSON/declarative spec rendered by a client SDK (e.g., React).
  • Outcome‑oriented personalization: Designers set goals and constraints; the system adapts using user signals (preferences, behavior, environment).

Types: static (fill parameters), declarative (assemble from a registry), and fully generated (raw HTML/CSS). The declarative approach best balances flexibility and reliability. Research flags trust, cognitive-load, and fairness risks, so add constraints and a11y guardrails. Personalization can also improve readability (e.g., font/spacing per Readability Matters).

Brief: shift from interface-first to outcome-first. Define capabilities, allowlists, and must/should/never rules per individual. Personas and journeys become dynamic; invest in research, testing, and evaluation. We design outcomes and parameters—the system renders the right interface for the moment.

Contracts before intelligence

  • UI schema (DSL): typed JSON/YAML describing pages, layouts, components, bindings. Treat as the API between generators and renderers.
  • Design system primitives: tokens, layout primitives, and a stable component library with clear props and accessibility guarantees.
  • Capability map: what the app can do (search, create, export). Compose only from capabilities; never invent them.
  • Policy & safety: allowlists, prop constraints, data access scopes, redaction, and rate limits. Reject or sanitize invalid schemas.
  • Observability: structured logs of inputs, chosen variants, user events, and outcomes to drive evaluation.
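
For instance, validating a generated spec against an allowlist might look like this sketch using the jsonschema package (the component names and schema fragment are hypothetical):

import jsonschema

# Hypothetical UI DSL instance plus a fragment of its JSON Schema; the
# renderer only accepts allowlisted component types with validated props.
UI_SCHEMA = {
    "type": "object",
    "required": ["component", "props"],
    "properties": {
        "component": {"enum": ["Card", "QuickActions", "FxWidget"]},  # allowlist
        "props": {"type": "object"},
        "children": {"type": "array"},
    },
}

spec = {
    "component": "QuickActions",
    "props": {"actions": ["transfer", "fx", "bill_pay"], "layout": "compact"},
}

jsonschema.validate(spec, UI_SCHEMA)  # raises on breach: reject or sanitize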

Proven patterns

  • Server-driven UI (schema-first): Backend returns a UI schema; client renders. Deterministic and debuggable.
  • Slot filling: Model fills copy, labels, hints, or validation messages within an approved layout.
  • Mixed-initiative flows: Assistant proposes; user approves/edits/rejects—no silent changes.

Context signals for adaptation

Front-end-only mini-CDP signals the UI can use:

  • Behavioral events: page/screen views, clicks on nav items, dwell time, scroll depth, search terms, feature usage (e.g., "FX opened", "BillPay started").
  • Derived traits: "likes FX", "frequent transfers", "explores offers", "prefers dark mode", "prefers TR locale".
  • Recency/frequency: last 5 visited menu paths, top 3 actions, last seen balances section.
  • Explicit preferences (if the user opts in): favorite quick actions, compact vs. comfy layout.

Quick-win demos (client-side only)

All achievable purely client-side with consent + first-party storage and no user identity:

  1. Remembered navigation – If the user browsed deep into Payments → Utilities, next visit shows "Pay Bill" first and collapses rarely used categories.
  2. Actionable insights – Promote "Transfer" and "FX" tiles if used frequently; demote others. Recently used beneficiaries appear inline (stored locally as hashes/aliases, not PII).
  3. Contextual nudges – If the user lingers on "Services," surface a "Set up Auto‑Save" card next visit. If they ignore a banner 3 times, suppress it for 30 days (local streak counter).
  4. Reading mode preference – Toggle compact vs. comfy density based on past toggles + dwell time; remember dark mode.
  5. Search intelligence – If they searched "exchange rates" twice in a week, pre‑expand the FX widget on load.
  6. Micro‑journeys without identity – User taps "Pay Bill," backs out; show a "Continue Bill Pay?" entry point next visit (timer‑gated, local only).
  7. Language/locale nudges – If locale is TR and consistently used, keep it sticky and prioritize TR‑first copygen banners.
  8. Quick‑action reordering – Automatically reorder top 4 quick actions based on frequency + recency.

Mirrors the behavioral part of Insider (events → segments → experiences), but not the cross‑channel/CDP pieces (email, push, journeys), which need a backend.

Guardrails and UX quality

Quality guardrails keep generation safe and consistent.

  • Determinism boundaries: Models may select from allowlisted components and props—never raw code or untyped HTML.
  • A11y by default: Components must remain accessible regardless of who (human/model) chooses them; enforce roles, labels, focus order.
  • Latency budgets: Cache schemas, stream renderable chunks, precompute common variants; degrade gracefully when models are slow/offline.
  • Consistency & theming: Only generate within tokenized design primitives; treat tokens as hard constraints, not suggestions.
  • Data hygiene: Validate bindings, throttle queries, and sanitize outputs; never let models emit executable code or unsafe URLs.

Engineering checklist

  • Define the DSL: Types, versioning, validation (JSON Schema + runtime checks).
  • Build the renderer: Deterministic schema → component mapping; exhaustive prop validation and safe defaults.
  • Write policy: Allowlists, prop ranges, PII controls, auth scopes; reject on breach with actionable errors.
  • Offline-first: Cached templates and non-model fallbacks; never block critical paths on generation.
  • Evaluation harness: Golden tasks, screenshot diffs, a11y tests, latency/error SLOs, canary rollouts.
  • Telemetry & feedback: Capture edits/aborts, success metrics, and model rationales to improve selection over time.

Local PII Pre-Filter with Microsoft Presidio + Qwen 2.5

Place a small, local PII pre-filter in front of any LLM by combining deterministic, explainable detection from Microsoft Presidio (GitHub, docs) with a tiny CPU SLM such as Qwen 2.5 to catch fuzzy or implicit PII—packaged in containers to run on laptops or servers with no GPU. This pre-ingest/pre-prompt guard blocks, redacts, or annotates content before the LLM sees it, emitting auditable spans, types, and confidences to reduce exfiltration risk and deliver compliance-by-construction. It fits as a pre-ingest, pre-prompt, or guardrail-loop step, forwarding only clean text to downstream copilots/LLMs. Live demo: pii-checker.com.
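
A minimal sketch of the deterministic Presidio stage, assuming the presidio-analyzer and presidio-anonymizer packages; the SLM fuzzy-PII pass is left as a placeholder hook:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Deterministic Presidio pass (requires a spaCy model, e.g. en_core_web_lg);
# a small local SLM such as Qwen 2.5 would run afterwards to catch fuzzy or
# implicit PII -- shown here only as a placeholder comment.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def prefilter(text: str) -> str:
    findings = analyzer.analyze(text=text, language="en")
    for f in findings:  # auditable spans, types, confidences
        print(f.entity_type, f.start, f.end, round(f.score, 2))
    redacted = anonymizer.anonymize(text=text, analyzer_results=findings).text
    # placeholder: slm_fuzzy_pii_pass(redacted) for implicit identifiers
    return redacted

print(prefilter("Contact John Smith at john@example.com"))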

AI Paradigm Shifts and their Data Requirements

Below is from a presentation I delivered at the hub, mapping the AI paradigm shifts' data requirements. The data narrative is mostly about unification and organization.

What I am really happy about is Microsoft embracing graphs in the GenAI context within Fabric, enabling better AI products.

Slide: What these shifts demand from Data

Agentic AI

  • Unify your Data
  • Organize your Data
  • Build data flywheels
  • Agent Memory

AI Assisted Coding

  • Unified SDK
  • Semantic Layer

AI Security & Governance

  • Unified Security Posture
  • Unified Policy & Risk
  • Lineage & Provenance

Microsoft Fabric unifies your data, and graph support organizes it, turning data into knowledge.

Copilot Studio

Copilot Studio is the SaaS agent builder from Microsoft. It is a no-code (and optionally low-code) tool that, in essence, lets you integrate your data with LLMs and create agents from natural-language input without writing code.

Key Features

  • MCP & A2A Integration
  • Bing Grounding
  • Message moderation - Automated Prompt / Generation filtering
  • Query Optimization