Understanding Agentic Loops: A Practical Guide

July 12, 2026 · Sandeep Mewara

Agentic AI has quickly become one of the most discussed topics in software engineering. New frameworks appear almost every week, each introducing new abstractions for planning, orchestration, memory and multi-agent collaboration.

https://learnbyinsight.com/wp-content/uploads/2026/07/agentic-loop-banner.png

These abstractions are valuable – they allow us to build increasingly capable systems without worrying about every implementation detail.

The trade-off, however, is that the underlying execution model becomes less visible.

While exploring different frameworks and architectural patterns, I found it useful to temporarily remove those abstractions and reduce the problem to its simplest form. Rather than asking "Which framework should I use?", I wanted to answer a more fundamental question:

What execution pattern are these frameworks implementing underneath?

That exercise consistently led back to one simple loop:

Reason ⟶ Act ⟶ Observe ⟶ Evaluate ⟶ Repeat

This sequence, commonly referred to as an Agentic Loop, is the execution pattern behind many modern AI agents. Understanding it provides a useful mental model for reasoning about agent behaviour regardless of which framework sits on top.

To make that pattern tangible, I built a small Streamlit reference application that exposes every iteration of the loop. The application itself is intentionally simple. The goal isn’t to demonstrate sophisticated AI capabilities – it’s to make the execution model visible.

https://learnbyinsight.com/wp-content/uploads/2026/07/agentic-loop-pattern.png

The application solves a deliberately simple problem: basic financial calculations using a calculator tool. The calculator itself isn’t the interesting part. The value comes entirely from watching the loop execute one iteration at a time.

The Loop in Code

Frameworks intentionally abstract the execution engine. Underneath, an agentic loop is simply a control loop that carries state from one iteration to the next.

The heart of my reference implementation is surprisingly small:

def run_loop(self) -> LoopState:
    """Execute the agentic loop: Thought → Action → Observation → Evaluation → Repeat."""
    while not self.state.is_done and self.state.iteration < 4:
        self.state.iteration += 1

        # Phase 1 & 2: Reason + Act
        thought, action = self.think()
        self.state.thoughts.append(thought)
        self.state.actions.append(action)

        # Phase 3: Observe (Execute the tool)
        success, result, error = self.executor.execute(action)
        self.observe(success, result, error)

        # Phase 4: Evaluate progress
        self.evaluate()

    return self.state

Although simple, this loop captures the entire lifecycle of an agent. Each iteration updates the agent’s internal state, performs an action, learns from the result and decides whether another loop is worthwhile.

Note: The hard iteration limit (self.state.iteration < 4) is intentional. Agentic systems are inherently non-deterministic, so they should always be bounded by explicit stopping conditions. In production systems, these circuit breakers prevent runaway execution, excessive token consumption and unnecessary compute cycles.
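A production loop usually needs more than one circuit breaker. The following is a minimal sketch of that idea, not code from the reference application: the LoopBudget class, the stop_reason helper and all default limits are hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class LoopBudget:
    """Hypothetical stopping conditions; names and defaults are illustrative."""
    max_iterations: int = 4
    max_seconds: float = 30.0
    max_tokens: int = 20_000


def stop_reason(budget, iteration, started_at, tokens_used, now=None):
    """Return why the loop must stop, or None if another iteration is allowed."""
    now = time.monotonic() if now is None else now
    if iteration >= budget.max_iterations:
        return "iteration limit reached"
    if now - started_at >= budget.max_seconds:
        return "time budget exhausted"
    if tokens_used >= budget.max_tokens:
        return "token budget exhausted"
    return None
```

Checking every limit in one place keeps the loop condition readable and makes the reason for stopping available for logging.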

Following One Execution

To see why this differs from a traditional prompt-response interaction, let’s trace a complete execution pathway.

Suppose a user asks the agent to calculate Apple’s P/E ratio using a stock price of $180.50 and an EPS of $6.50. During the first iteration, the agent identifies the required calculation and generates the mathematical expression to execute.

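def think(self) -> tuple[str, str]:
    """Assess the state and plan the calculation."""
    iteration = self.state.iteration

    if iteration == 1:
        thought = "I need to calculate 180.50 / 6.50."
        action = "180.50 / 6.50"
    else:
        # Later iterations are elided in this excerpt. This placeholder
        # re-checks the result so both names are always bound.
        thought = "I should validate the previous result by recomputing it."
        action = "180.50 / 6.50"

    return thought, action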

The calculator executes the string action and returns the raw result: 27.7692. At this point, the agent has an answer, but it does not immediately stop. Instead, it evaluates its current confidence and decides that a validation iteration is worthwhile.

Notice what changed. During the second iteration, the input context is no longer just the user’s raw prompt. The agent now reasons using everything it learned during the first pass:

https://learnbyinsight.com/wp-content/uploads/2026/07/agentic-loop-ex1.png

That second iteration highlights the key differentiator. Rather than reasoning solely from the original user prompt, the agent reasons directly from its own execution history. Without that accumulated state, every iteration would simply repeat the exact same work. State allows each iteration to become slightly better informed than the one before it.

In the reference application, that history is maintained inside a small LoopState container:

from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Accumulates the append-only timeline of an agentic loop execution."""
    iteration: int = 0
    thoughts: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    observations: list[str] = field(default_factory=list)
    confidence: float = 0.0
    is_done: bool = False
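To show how accumulated state can drive the stopping decision, here is a small self-contained sketch. It uses a trimmed copy of LoopState, and the observe and evaluate functions, the 0.9 threshold and the "two matching results" rule are all assumptions made for illustration, not the reference application's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Trimmed copy of the state container, kept small for this sketch."""
    iteration: int = 0
    observations: list[str] = field(default_factory=list)
    confidence: float = 0.0
    is_done: bool = False


def observe(state: LoopState, result: str) -> None:
    """Record the latest tool result so later iterations can reason over it."""
    state.observations.append(result)


def evaluate(state: LoopState, threshold: float = 0.9) -> None:
    """Hypothetical rule: two consecutive matching results raise confidence."""
    history = state.observations
    if len(history) >= 2 and history[-1] == history[-2]:
        state.confidence = min(1.0, state.confidence + 0.5)
    else:
        state.confidence = 0.5 if history else 0.0
    state.is_done = state.confidence >= threshold


state = LoopState()
for result in ["27.7692", "27.7692"]:  # first pass, then the validation pass
    state.iteration += 1
    observe(state, result)
    evaluate(state)
```

After the first pass the confidence is 0.5, so the loop continues. The second, matching result lifts it to 1.0 and sets is_done.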

One Loop, Multiple Behaviors

One observation that surprised me while building this sample project was that I never changed the underlying loop code itself. I only changed the scenario context. The same execution engine naturally produced three entirely different runtime behaviors:

Simple Verification: The agent performs the calculation, verifies the result once, reaches its confidence threshold and exits cleanly. The second iteration acts as a lightweight quality check.
Building Confidence: Some scenarios require additional iterations before the agent decides it has gathered enough evidence to stop. The answer itself may not change, but the confidence score does.
Error Recovery: The most interesting scenario intentionally triggers a tool failure (e.g., an unexpected input anomaly). Instead of crashing, the failure becomes an observation. The next iteration adjusts its reasoning, selects a fallback action and successfully completes the task.

This is where the value of an agentic loop becomes obvious. Errors are no longer fatal application crashes. They are simply additional pieces of information that guide the next turn.
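The error-recovery behavior can be sketched in a few lines. This is an illustrative toy, not the scenario code from the sample project: the execute and think functions, the currency-symbol failure and the "strip symbols and retry" fallback are all hypothetical.

```python
def execute(action: str) -> tuple[bool, str, str]:
    """Toy tool: evaluates 'a / b' and returns (success, result, error)."""
    try:
        left, right = (float(part) for part in action.split("/"))
        return True, f"{left / right:.4f}", ""
    except (ValueError, ZeroDivisionError) as exc:
        return False, "", f"{type(exc).__name__}: {exc}"


def think(raw_expression: str, last_error: str) -> str:
    """Hypothetical fallback: after a failure, strip currency symbols and retry."""
    if last_error:
        return raw_expression.replace("$", "").replace(",", "")
    return raw_expression


def run(raw_expression: str, max_iterations: int = 4) -> list[str]:
    """Bounded loop in which a tool failure becomes an observation."""
    observations: list[str] = []
    last_error = ""
    for iteration in range(1, max_iterations + 1):
        action = think(raw_expression, last_error)
        success, result, last_error = execute(action)
        if success:
            observations.append(f"iter {iteration}: ok {result}")
            break
        observations.append(f"iter {iteration}: failed ({last_error})")
    return observations


history = run("$180.50 / $6.50")
```

The first iteration fails on the currency symbols, the failure is recorded, and the second iteration succeeds with the cleaned expression. If the fallback never helps, the iteration limit still ends the loop.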

Tool Safety Still Matters

Giving an agent the ability to invoke tools also means treating every generated action as completely untrusted input. In the sample application, the calculator strictly validates every expression before allowing execution:

import re


class SafeToolExecutor:
    """Ensures the agent can only execute safe mathematical expressions."""
    ALLOWED_PATTERN = re.compile(r'^[\d+\-*/().\s]+$')

    # Validation logic runs here...

Whether your tool is a calculator, a SQL database connection, a REST API endpoint or a code executor, strict validation belongs explicitly at the tool boundary – not inside the reasoning loop. The agent should remain free to reason creatively. The tool boundary must remain entirely predictable.
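As one possible shape for validation at the tool boundary, the sketch below pairs the character allow-list with an AST walk instead of eval. It is an assumption about how such a check could be written, not the sample application's elided validation logic; safe_calculate, the 200-character cap and the supported operator set are hypothetical.

```python
import ast
import operator
import re
from typing import Optional

ALLOWED_PATTERN = re.compile(r"^[\d+\-*/().\s]+$")
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> float:
    """Walk the parsed expression, allowing only numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculate(expression: str) -> tuple[bool, Optional[float], str]:
    """Return (success, result, error) without ever calling eval()."""
    if len(expression) > 200 or not ALLOWED_PATTERN.fullmatch(expression):
        return False, None, "expression contains disallowed characters"
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return True, _evaluate(tree), ""
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as exc:
        return False, None, f"{type(exc).__name__}: {exc}"
```

The two layers catch different things. The pattern rejects names and quotes outright, while the AST walk rejects inputs such as 2 ** 3 that pass the character check but use an operator the tool does not support.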

Where You’ll See This Pattern

Once you recognize the loop, you will start seeing it across the entire AI ecosystem.

You’ll find this same execution model in the ReAct approach and in frameworks such as LangGraph, Semantic Kernel, CrewAI and AutoGPT, as well as in many enterprise multi-agent implementations.

While they differ widely in orchestration style, tooling capability and memory management, they all rely on the same underlying pattern: Reason, Act, Observe, Evaluate, Repeat. Understanding this core execution model first makes those larger frameworks significantly easier to master.

More importantly, it changes how you design AI-powered systems. Instead of thinking in single request-response interactions, you begin designing workflows that can verify, adapt, recover from failures and continue working toward a goal with minimal human intervention.

Closing Thoughts

You may never build an agentic loop from scratch in a production system and that’s perfectly fine. Most teams will rely on frameworks that already implement this pattern well. The value of understanding the loop isn’t replacing those frameworks. It’s recognizing the execution model underneath them.

That understanding changes how you approach AI system design. Instead of thinking only about prompts or framework APIs, you begin asking different architectural questions:

Where should reasoning happen?
What actions should the agent be allowed to perform?
What observations should influence the next decision?
What state should persist across iterations?
When should the system decide it has done enough?

Those questions apply regardless of whether you’re building with LangGraph, Semantic Kernel, CrewAI, AutoGen or your own orchestration layer. That was the motivation behind this small Streamlit reference application. Not to build another framework. Simply to expose the execution pattern that many of today’s frameworks quietly implement beneath the surface.

Understanding the loop helps explain how modern AI agents behave. Designing systems around that loop is what ultimately determines whether they become reliable software.

This article intentionally focused on understanding the execution model. Applying that model to design reliable agentic systems introduces a different set of architectural considerations – from workflow design and long-running execution to guardrails and human oversight. That’s a discussion I’d like to explore separately.

The complete, framework-agnostic Streamlit sample used in this article is available on GitHub: https://github.com/sandeep-mewara/agentic-loop.

Happy Looping!

. Sandeep Mewara Github
Tech Explore
Trend
Learn Machine Learning with Examples
Machine Learning workflow

Persistent Structural Memory for AI: The Architecture Behind Infigraph

June 28, 2026 (updated July 12, 2026) · Sandeep Mewara

In my previous article, I wrote about what I called Code Blindness – the hidden operational cost of forcing AI assistants to repeatedly rediscover the structure and architectural relationships that already exist inside our codebases.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-blog-banner.png

Today’s coding assistants can inspect local files, trace explicit imports and painstakingly piece together relationships to answer familiar engineering questions:

– Who calls this function?
– What breaks if I alter this API route?
– Which services depend on this component?
– What is the true blast radius of this change?

These aren’t difficult questions because the code is hard to read. They’re difficult because the relationships that answer them aren’t explicitly available. Every new AI session reconstructs them from source code, only to discard that understanding when the conversation ends.

That observation eventually led us to build Infigraph – an attempt to turn software structure into reusable, local infrastructure.

When we recently open-sourced the project, one question came up repeatedly:

“What makes Infigraph different from the other code intelligence and code graph projects already out there?”

It’s a fair question.

The ecosystem already has tools for code search, static analysis, architecture visualization and AI-assisted development. Some focus on helping engineers navigate code. Others generate knowledge graphs for LLMs, visualize architecture or build richer retrieval pipelines.

We weren’t trying to build another code intelligence tool.

We were trying to build a local-first, persistent structural memory layer that AI assistants could query directly instead of repeatedly reconstructing software relationships from source code.

That objective influenced nearly every architectural decision we made – from how code is parsed, to how relationships are extracted and stored, to how AI agents retrieve information.

Looking back, those decisions weren’t independent optimizations. They were consequences of a single design principle:

If software structure changes far more slowly than AI conversations, then structural knowledge should be treated as infrastructure and not something rebuilt from scratch for every prompt.

This article walks through the engineering decisions that followed from that principle, the tradeoffs we accepted and the lessons we learned while building Infigraph.

The System Blueprint

Before discussing the individual architectural decisions, it’s worth understanding how the pieces fit together. At a high level, Infigraph continuously transforms a codebase into a persistent structural representation that AI assistants can query directly. Instead of rediscovering relationships for every conversation, those relationships become shared infrastructure.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-architecture-overview.png

The graph doesn’t replace the language model. It changes the question the language model has to answer. Rather than treating every prompt as an isolated reasoning exercise, Infigraph treats structural understanding as persistent infrastructure.

The important observation isn’t the individual technologies. It’s where the computational effort moves.

Traditional AI workflows spend most of their effort reconstructing architecture from source code every time a question is asked. Infigraph moves that work to indexing time. Parsing source code, resolving symbols, understanding imports and discovering relationships happen once – when the repository is indexed. Every subsequent question becomes a retrieval problem instead of a reconstruction problem.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-workflow.png

That architectural shift immediately imposed a new set of engineering requirements. We needed:

A storage engine optimized for relationship traversal rather than document retrieval
A retrieval layer that could combine graph queries with traditional search
A parsing architecture capable of understanding modern polyglot codebases without becoming language-specific

The next three sections explain how those requirements shaped Infigraph’s architecture.

Decision #1: Represent Code as a Persistent Graph

The first architectural decision was how software itself should be represented.

A software system isn’t just a collection of source files. It’s a network of explicit relationships. A function call is an explicit relationship. An import statement expresses a dependency. A class hierarchy defines inheritance. Module boundaries already exist whether an AI model discovers them or not. We needed a statically discoverable representation where relationships were first-class citizens.

That naturally led us to a graph.

Instead of storing source files as isolated text, Infigraph persists a connected topology of software entities and the relationships between them.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-graph-setup.png

Once relationships become explicit, architectural questions stop being text-search problems. Familiar engineering questions become graph traversals. The system isn’t reading raw source code inside an LLM reasoning loop to find callers. It is traversing an index that already knows they exist.
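To make "a question becomes a traversal" concrete, here is a toy sketch using plain Python dictionaries rather than a graph database. The CALLS data, the symbol names and the blast_radius helper are hypothetical and do not reflect Infigraph's schema or query API.

```python
from collections import deque

# Hypothetical CALLS edges: caller -> callees.
CALLS = {
    "api.checkout": ["orders.create", "auth.verify"],
    "orders.create": ["billing.charge", "inventory.reserve"],
    "jobs.retry_payment": ["billing.charge"],
    "billing.charge": ["gateway.post"],
}


def build_reverse_index(calls):
    """Invert the edges once so 'who calls X?' becomes a dictionary lookup."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return callers


def blast_radius(symbol, callers):
    """Breadth-first walk: every direct or transitive caller and its hop count."""
    distance = {}
    queue = deque([(symbol, 0)])
    while queue:
        current, hops = queue.popleft()
        for caller in sorted(callers.get(current, ())):
            if caller not in distance and caller != symbol:
                distance[caller] = hops + 1
                queue.append((caller, hops + 1))
    return distance


CALLERS = build_reverse_index(CALLS)
impact = blast_radius("gateway.post", CALLERS)
```

Here impact maps billing.charge to 1 hop, orders.create and jobs.retry_payment to 2, and api.checkout to 3. No source file is read at question time.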

Once we committed to representing software as a graph, the next question became much more practical:

What kind of graph engine could support interactive AI workflows without becoming another server-side dependency?

That question shaped our next architectural decision.

Decision #2: Persist Structural Memory Locally

We could have stored the graph in a traditional client-server database. We could have relied on a managed graph service. Or we could have generated structural context on demand through cloud-hosted retrieval pipelines.

All of those approaches work.

But they conflicted with one of our architectural constraints from the very beginning:

Structural knowledge should live alongside the repository, not behind another network boundary.

That single constraint influenced far more than our storage engine. It shaped the entire architecture.

If AI assistants increasingly become part of the developer’s inner loop, structural knowledge should be available with the same characteristics developers already expect from their source code:

local
immediately accessible
private
independent of cloud connectivity

That immediately narrowed our design space. We needed a graph engine that was:

embedded rather than server-based
lightweight enough to ship with the developer environment
optimized for large relationship traversals
capable of answering structural queries within an interactive AI workflow

That led us to KuzuDB, an embedded, columnar graph database designed around analytical graph workloads rather than transactional business operations. The workload wasn’t updating records; it was traversing relationships. A columnar storage engine aligns well with that access pattern because it can efficiently scan relationship data without repeatedly loading complete records.

What matters for performance here isn’t the graph model – it’s the storage layout:

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-kuzu-db.png
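The layout difference can be illustrated with a toy comparison. This sketch is only an analogy and makes no claim about KuzuDB's actual on-disk format: the edge records, the build_csr helper and the compressed sparse row arrays are assumptions chosen to show why traversal touches less data in a column-style layout.

```python
# Row-style layout: every edge is a full record, even if traversal
# only needs the source and destination.
ROW_EDGES = [
    {"src": 0, "dst": 1, "kind": "CALLS", "file": "api.py", "line": 12},
    {"src": 0, "dst": 2, "kind": "CALLS", "file": "api.py", "line": 19},
    {"src": 1, "dst": 3, "kind": "CALLS", "file": "orders.py", "line": 40},
    {"src": 2, "dst": 3, "kind": "CALLS", "file": "auth.py", "line": 7},
]


def neighbors_row(node):
    """Scan whole records to find outgoing edges."""
    return [edge["dst"] for edge in ROW_EDGES if edge["src"] == node]


def build_csr(edges, node_count):
    """Pack adjacency into two integer arrays (compressed sparse row)."""
    offsets = [0] * (node_count + 1)
    for edge in edges:
        offsets[edge["src"] + 1] += 1
    for i in range(1, len(offsets)):
        offsets[i] += offsets[i - 1]
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slice makes a copy: next free slot per source node
    for edge in edges:
        targets[cursor[edge["src"]]] = edge["dst"]
        cursor[edge["src"]] += 1
    return offsets, targets


OFFSETS, TARGETS = build_csr(ROW_EDGES, node_count=4)


def neighbors_columnar(node):
    """Traversal reads one contiguous slice and never touches other columns."""
    return TARGETS[OFFSETS[node]:OFFSETS[node + 1]]
```

Both functions return the same neighbors. The second reads only the integers it needs, which is the property that matters for deep multi-hop walks.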

Performance Benchmarking

When traversing deep, multi-hop dependency chains across half a million nodes, we rarely need to load full rows. In benchmarks on representative repositories, we consistently observed substantially lower traversal latency for deep dependency walks.

The important result wasn’t the absolute latency. It was that the storage layout aligned far better with the traversal-heavy workload of AI-assisted development.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-perf-kuzuDB.png

Like every architectural decision, it came with tradeoffs. KuzuDB is a younger ecosystem than some of the established graph platforms. We consciously traded ecosystem maturity for an embedded architecture that better matched the interaction model we were trying to enable. Looking back, that tradeoff shaped much more than storage. Once structural memory became local and inexpensive to traverse, the next challenge was no longer storage. It was retrieval.

Decision #3: Retrieve Structural Context Before Reasoning

Persisting structural knowledge solved only half the problem. The remaining challenge was retrieving the right structural context quickly enough that an AI assistant never needed to fall back to reading large portions of the repository.

At first, it seemed tempting to rely on a single retrieval strategy. Keyword search is excellent when an engineer already knows the exact symbol they’re looking for. Semantic search is better when they describe an idea rather than an identifier. Graph traversal is indispensable when the question is fundamentally about relationships. But none of these approaches is sufficient on its own.

Different questions require different retrieval strategies.

Instead of trying to force every question through a single search engine, Infigraph combines multiple retrieval mechanisms, each optimized for a different type of query.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-hybrid-retrieval.png

We built a local-first, parallel hybrid retrieval pipeline where each engine contributes a different signal:

BM25 (Exact Retrieval): Fast, deterministic lookup for symbols, filenames, identifiers and keywords
Semantic Retrieval (Model2Vec): A bundled 29 MB embedding model retrieves conceptually similar code without relying on external embedding APIs
Regex Retrieval: Captures explicit syntactic conventions, decorators, annotations and language-specific patterns that keyword and semantic search may overlook

Once these candidate starting points are identified, Graph Traversal takes over. The retrieval layer expands those candidate matches into architectural context.
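The flow described above (several retrievers propose candidates, then the graph expands them) can be sketched as follows. The article does not say how the signals are combined, so reciprocal rank fusion is used here purely as a stand-in technique. The ranked lists, symbol names and graph are hypothetical, and the retrievers themselves are stubbed out as precomputed results.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked candidate lists; items ranked well by several engines win."""
    scores = {}
    for ranked in rankings.values():
        for rank, symbol in enumerate(ranked, start=1):
            scores[symbol] = scores.get(symbol, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda s: (-scores[s], s))


def expand(seeds, graph, hops=1):
    """Grow the seed symbols into structural context by following edges."""
    context = list(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        next_frontier = []
        for symbol in frontier:
            for neighbor in graph.get(symbol, []):
                if neighbor not in context:
                    context.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return context


# Stubbed retriever outputs for the query "where do we charge the card?"
rankings = {
    "bm25": ["charge_card", "charge_fee"],
    "semantic": ["process_payment", "charge_card"],
    "regex": ["charge_card"],
}
graph = {
    "charge_card": ["gateway_post"],
    "process_payment": ["charge_card", "audit_log"],
}

fused = reciprocal_rank_fusion(rankings)
context = expand(fused[:2], graph)
```

In this example charge_card ranks first because all three engines return it, and the expansion step adds gateway_post and audit_log, which no text-based retriever surfaced.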

If retrieval is part of the developer’s inner loop, it should remain just as local and self-contained as the graph itself. That led us to build the entire retrieval pipeline, including keyword indexes and semantic embeddings, to execute locally without depending on external services.

The goal wasn’t simply lower latency. It was to ensure that structural understanding remained available regardless of network connectivity, while keeping source code inside the developer’s environment. The retrieval layer shouldn’t decide what the model thinks. It should decide what the model needs to think about.

Making Structural Memory Consumable

Building a retrieval pipeline solves only part of the problem. The other half is exposing that structural knowledge in a way AI assistants can consume naturally. Rather than embedding graph traversal logic into individual coding assistants, Infigraph exposes focused capabilities – symbol lookup, dependency traversal, call graph exploration and structural search – through the Model Context Protocol (MCP).

That separation was intentional.

The graph remains the system of record. MCP becomes the interface through which AI assistants access that knowledge. Whether the client is Claude Code, Cursor, GitHub Copilot, Windsurf or another MCP-compatible tool, they all interact with the same persistent structural memory instead of rebuilding it independently.

This reinforces the same architectural principle that shaped the rest of Infigraph:

Structural knowledge should be shared infrastructure. MCP simply makes that infrastructure accessible.
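As a rough illustration of what "exposed through MCP" means on the wire, the sketch below handles a JSON-RPC tools/call request in the shape the protocol uses. It is deliberately simplified: the find_callers tool and the CALLERS data are hypothetical, and a real server would use an MCP SDK and also handle initialization, tool listing and transport.

```python
import json

# Hypothetical structural index; stands in for the persistent graph.
CALLERS = {"billing.charge": ["orders.create", "jobs.retry_payment"]}


def find_callers(symbol: str) -> list[str]:
    """Hypothetical tool: direct callers of a symbol."""
    return sorted(CALLERS.get(symbol, []))


TOOLS = {"find_callers": find_callers}


def handle_request(request: dict) -> dict:
    """Dispatch a JSON-RPC 'tools/call' request to a registered tool."""
    request_id = request.get("id")
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32602, "message": "Unknown tool"}}
    result = tool(**params.get("arguments", {}))
    return {"jsonrpc": "2.0", "id": request_id,
            "result": {"content": [{"type": "text", "text": json.dumps(result)}],
                       "isError": False}}


response = handle_request({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "find_callers", "arguments": {"symbol": "billing.charge"}},
})
```

Any MCP-compatible client can issue the same request, which is why the graph only needs to be built once per repository rather than once per assistant.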

The final challenge was making structural extraction scale across the reality of modern polyglot systems.

Decision #4: Decouple Structural Extraction from Language

Very few systems live entirely within a single language. A typical request may begin in a TypeScript frontend, flow through a Java service, invoke a Python-based machine learning component and finally interact with SQL or infrastructure configuration. Supporting that reality required more than adding parsers. It required separating the extraction engine from language-specific syntax.

That became our final architectural decision:

The extraction pipeline should remain stable as language support grows.

Instead of writing language-specific logic inside the core engine, Infigraph separates parsing from extraction.

https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-multi-lang.png

To support both mainstream languages and enterprise-specific grammars, we built a dual-extraction architecture:

For mainstream languages, we rely on Tree-sitter grammars and declarative queries to identify structural entities such as symbols, imports, calls and inheritance.
For proprietary languages, internal DSLs or environments where Tree-sitter isn’t the right fit, Infigraph provides an ANTLR-based extension path. New grammars can be added without modifying the extraction engine itself.

That separation turned out to be more valuable than we initially expected.

Once parsing produces a common structural representation, everything else in the architecture remains unchanged. Every additional language increases the capability of the platform without increasing the complexity of its core.
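The separation between a language-agnostic core and pluggable extractors can be sketched like this. It is not Infigraph's implementation: simple regular expressions stand in for Tree-sitter queries and ANTLR grammars, and the Entity type, the register decorator and index_file are hypothetical names.

```python
import os
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    """Common structural representation shared by every language."""
    kind: str
    name: str
    language: str


EXTRACTORS: dict[str, Callable[[str], list[Entity]]] = {}


def register(extension: str):
    """Register a language extractor without touching the core."""
    def decorator(func):
        EXTRACTORS[extension] = func
        return func
    return decorator


@register(".py")
def extract_python(source: str) -> list[Entity]:
    # Regex stand-in for a real grammar-based query.
    names = re.findall(r"^\s*def\s+(\w+)", source, re.MULTILINE)
    return [Entity("function", name, "python") for name in names]


@register(".ts")
def extract_typescript(source: str) -> list[Entity]:
    names = re.findall(r"^\s*(?:export\s+)?function\s+(\w+)", source, re.MULTILINE)
    return [Entity("function", name, "typescript") for name in names]


def index_file(path: str, source: str) -> list[Entity]:
    """Core pipeline: it knows about Entity, never about language syntax."""
    extractor = EXTRACTORS.get(os.path.splitext(path)[1])
    return extractor(source) if extractor else []
```

Adding another language means registering one more function. index_file and everything downstream of it stay unchanged.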

Today, that approach allows Infigraph to support 62 languages out of the box while remaining extensible for environments that need more. Persistent structural memory shouldn’t become more complicated every time your software ecosystem grows. By separating extraction from language, we made language diversity an extension point instead of an architectural constraint.

The Landscape: Where Infigraph Fits

Most code intelligence platforms are ultimately designed around one of two consumers:

Humans, who need to search, visualize, analyze or understand software systems.
Analysis engines, which evaluate code for correctness, security, compliance or quality.

Our primary consumer is different. It’s an AI assistant operating inside a developer’s editing loop.

Projects such as SciTools Understand, Sourcegraph, Joern and newer AI-native graph initiatives have each pushed the ecosystem forward in different ways. Many engineers already rely on them successfully.

Our goal wasn’t to replace those tools. It was to optimize for a different execution model.

The architectural differences become clearer when viewed through the problems each category was designed to solve. The differences aren’t primarily about features. They’re about architectural optimization. Each category solves a different problem and therefore makes different tradeoffs.

Dimension	Human-Centric Platforms	AI Knowledge Builders	Infigraph
Typical Examples	SciTools Understand, Sourcegraph, Joern	Understand-Anything, Graphiti, Nomik and similar projects	Infigraph
Primary Consumer	Engineers & Architects	AI knowledge generation workflows	AI coding assistants
Structural Extraction	Parser / index-based	Often combines parsing with LLM summarization	Deterministic parser-based extraction
Deployment Model	Desktop or centralized infrastructure	Frequently cloud-assisted	Local-first embedded infrastructure
Primary Interaction	Search, navigation, visualization	Repository understanding and documentation	Real-time MCP tool calls
Optimization Target	Human understanding	AI-generated repository knowledge	Persistent structural memory for AI

These categories aren’t mutually exclusive. In many organizations they complement one another. The difference lies in which problem each one is optimized to solve. This distinction matters because our optimization target was fundamentally different.

We weren’t building another interface for engineers to explore repositories, nor another cloud pipeline that asks an external LLM to understand a repository before a developer can ask a question.

We were trying to answer a much narrower architectural question:

How do we make structural knowledge continuously available to AI assistants without paying to rediscover it every conversation?

That single question explains almost every architectural decision described in this article.

Represent software as a persistent graph
Persist structural memory locally
Retrieve structural context instead of raw files
Expose that knowledge through MCP
Keep extraction extensible across languages

Everything else follows from that design center.

Choose Infigraph when…

Your primary development workflow revolves around AI coding assistants such as Claude Code, Cursor or GitHub Copilot
Your agents repeatedly ask structural questions about callers, dependencies, ownership or impact analysis
You want local-first execution without repeatedly sending repository context to external services
You want persistent structural context that survives beyond individual AI conversations

Continue using existing tools when…

Your primary need is enterprise-scale code search
You’re performing security or compliance analysis
You need architecture visualization or reverse engineering for human exploration

Instead of asking the AI to reconstruct relationships every session, Infigraph provides them as persistent structural memory that can be queried locally in milliseconds.

Our goal isn’t to replace the existing code intelligence ecosystem. It’s to become the lightweight, local-first structural memory layer that complements it for AI-native software development.

Looking Ahead

I don’t think Infigraph is the final answer to AI-native software development. In fact, I suspect we’re only beginning to define what this architecture layer should become.

Today, persistent structural memory captures relationships between software entities. Tomorrow, it may also incorporate architectural evolution, ownership boundaries, runtime behavior, operational telemetry, organizational knowledge and historical change patterns.

The better AI becomes at generating code, the more important these structural layers become. Generated code is only valuable if it fits coherently inside the system around it. I believe our responsibility is gradually shifting toward building better representations of the systems AI increasingly helps us evolve.

That’s ultimately why we open-sourced Infigraph.

Not because we think we’ve solved the problem, but because we believe persistent structural memory is an architectural direction worth exploring together.

If this way of thinking resonates with you, I’d encourage you to try Infigraph against your own repositories, challenge the assumptions we’ve made and contribute where you think the architecture can be improved.

We’re still learning.

Hopefully, we’ll learn together.


https://learnbyinsight.com/wp-content/uploads/2026/07/infigraph-vertical-light.png

GitHub: https://github.com/intuit/infigraph
Documentation: Detailed design specs and contribution guidelines are included in the repo.

Agentic Development: From Divergence to a Self-Evolving Platform

June 21, 2026 · Sandeep Mewara

In my previous article, I shared a perspective that divergence is a natural and often necessary phase of agentic development adoption. As organizations move from a handful of alpha teams toward broader adoption, multiple implementations like PDLC orchestrators, agent workflows and skills naturally emerge.

The challenge is not divergence.

The challenge is ensuring that divergence eventually compounds into reusable organizational capability rather than permanent fragmentation. The next logical question becomes:

How do we converge without losing the innovation, learning and investment accumulated during exploration?
More importantly, how do we unify architecture without selecting arbitrary winners and forcing teams to discard everything they have built?

This article proposes a practical strategy that treats early duplication not as architectural waste, but as a high-value parallel testing playground. It maps out the transformation from Managed Divergence to an Overrideable Platform Baseline, offering an operational roadmap, clear component definitions, code blueprints and a self-evolving governance model designed to scale AI execution safely across the enterprise.

The intent is not to prescribe a single path. It is to share one possible blueprint based on observation, experimentation and lessons learned while navigating agentic adoption.

1. The Core Principle: Converge Capabilities

Successful architecture convergence does not crown one team’s codebase as the “standard”. It asks a far more valuable question: Which specific capabilities have consistently delivered production value across our teams?

This subtle shift changes everything. Instead of comparing entire software implementations as rigid and monolithic blocks, we break them down to harvest their underlying functional components:

Requirements Analysis & Guardrailing
Automated Architecture Review Loops
Distributed Traceability & Prompt Diagnostics
Human-in-the-Loop Approval Gates
Context Management & Core Memory Primitives

The goal is to converge capabilities, not implementations.

Standard vs. Baseline

This distinction becomes critical to our engineering culture.

A Standard says: Everyone must use this.
A Baseline says: Everyone starts here unless they have a compelling reason not to.

Standards optimize for control, while baselines optimize for adoption. In fast-moving domains such as agentic development, baselines tend to age much better than mandates.

The baseline should provide shared governance, shared infrastructure, shared orchestration and shared capabilities, while still allowing product-specific workflows, domain-specific skills, specialized agents and contextual optimizations.

Convergence Readiness Signals

We look for specific architectural markers to know when an engineering organization is ready to move into a baseline:

Signal	Meaning
Multiple teams solving the same problem	Candidate for convergence
Repeated primitives emerging	Baseline opportunity
New adopters unsure where to start	Discovery problem
Maintenance burden increasing	Consolidation opportunity
Overrides becoming common	Promote capability evaluation
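The first signal in the table can be checked mechanically once team implementations have been cataloged. The sketch below is only an illustration: the catalog shape (team, capability pairs), the sample entries and the two-team threshold are assumptions, not something the article prescribes.

```python
from collections import defaultdict

# Hypothetical catalog entries: (team, capability). Illustrative data only.
CATALOG = [
    ("payments", "pii_masking"),
    ("search", "pii_masking"),
    ("support", "pii_masking"),
    ("payments", "refund_rules"),
    ("search", "trace_schema"),
    ("support", "trace_schema"),
]


def convergence_candidates(catalog, min_teams=2):
    """Return capabilities built independently by at least `min_teams` teams."""
    teams_by_capability = defaultdict(set)
    for team, capability in catalog:
        teams_by_capability[capability].add(team)
    return {
        capability: sorted(teams)
        for capability, teams in teams_by_capability.items()
        if len(teams) >= min_teams
    }


candidates = convergence_candidates(CATALOG)
```

With this sample data, `pii_masking` and `trace_schema` surface as candidates, while `refund_rules` stays local because only one team built it.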

2. The Platform Evolution View

The platform should continuously absorb innovation rather than requiring periodic reinvention. To track this loop, we map the entire platform evolution path in one comprehensive architectural view:

This structural flow establishes a permanent, virtuous evolution loop across our active software engineering ecosystem.

By implementing this transition blueprint, we do not ask teams to throw away their clones. Instead, we catalog what they built, harvest repeated capabilities, engineer reusable core components and let teams adopt the baseline progressively.

3. The Two-Track Convergence Model

Convergence should happen through two coordinated, pipelined tracks. At the organizational level, these tracks run in parallel to maintain velocity. At the component level, they operate sequentially, ensuring that technical construction is always guided by harvested organizational data.

We run Track A (Knowledge Convergence) to harvest organizational data alongside Track B (Technical Convergence) to engineer the reusable execution environment. As soon as Track A distills early patterns (like standard telemetry schemas), Track B immediately begins platform construction on those components while Track A moves ahead to discover more complex primitives.

4. The Target Reference Architecture

Once structural patterns become visible, we construct a reusable platform baseline. I believe a Core + Hook architecture provides the right balance between consistency and flexibility. It offers unified enterprise boundaries while providing localized teams complete room to execute custom intelligence loops.

Layers Deep-Dive

Knowledge & Context Layer: Holds trusted organizational knowledge (ADRs, Standards, APIs, Design Systems) that acts as the ultimate source of truth, which helps agents avoid semantic drift. For example, an Architecture Agent retrieves existing architecture decisions and approved patterns before generating recommendations.
Platform Contracts Interface Layer: The missing enabler that defines standard interfaces between governance and execution components. Standardized schemas enable convergence while keeping the platform baseline implementation-agnostic.
Skills & Tools Layer: Supplies the shared, stateless action utilities that agents ingest natively. While short-term memory retrieval policies can be altered at the override layer, the underlying database connection pooling, key-value serialization and caching infrastructure reside permanently here. The principal guideline: Agents consume skills; agents do not rebuild skills.
Agent Capabilities Layer: Houses standardized, virtual organizational personas that model common roles. For example, a Requirements Agent converts business requests into structured requirements, an Architecture Agent turns requirements into architecture proposals and a QA Agent converts implementation plans into test strategies.
Baseline Orchestration Layer: Defines the default, customizable, cross-cutting multi-stage lifecycle flow (Requirements → Planning → Architecture Review → Implementation → Testing → Release Readiness). Traceability, HITL gates and security checks remain mandatory, while sequence and domain reviews are overrideable.
Override & Extension Layer: Preserves rapid innovation directly at the product edge. Squads inject custom business logic (e.g. Healthcare Compliance Agents, Tax Validation Agents) while keeping the core platform baseline stable.
Registry & Discovery Layer: Makes reusability easier. It continuously indexes and exposes real-time ownership, quality metrics and metadata context for all active agents, skills, workflows and patterns.
Experience & Adoption Layer: Makes adoption easier than unmanaged divergence. It contains starter kits, documentation templates and migration guides.
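The rule stated for the Baseline Orchestration Layer (mandatory checks stay, sequence is overrideable) could be enforced roughly as sketched here. The stage names follow the lifecycle listed above; `MANDATORY_GATES`, `resolve_workflow` and the gate identifiers are hypothetical names chosen for illustration.

```python
BASELINE_STAGES = [
    "requirements", "planning", "architecture_review",
    "implementation", "testing", "release_readiness",
]

# Assumed identifiers for the checks the article calls mandatory.
MANDATORY_GATES = frozenset({"traceability", "hitl_gate", "security_check"})


def resolve_workflow(stage_override=None, gate_override=None):
    """Merge a team's overrides with the baseline.

    Stage order may be replaced freely; gates may be added but never removed.
    """
    stages = list(stage_override) if stage_override else list(BASELINE_STAGES)
    gates = set(MANDATORY_GATES)
    if gate_override is not None:
        missing = MANDATORY_GATES - set(gate_override)
        if missing:
            raise ValueError(
                f"Mandatory gates cannot be removed: {sorted(missing)}"
            )
        gates |= set(gate_override)
    return {"stages": stages, "gates": sorted(gates)}
```

A team can shorten the stage sequence or add a domain review gate, but a gate list that drops `security_check` is rejected before any workflow runs.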

Cross-Cutting Verticals

Governance & Control Plane Layer: The silent operational engine running underneath all execution steps. It enforces non-negotiable enterprise requirements – such as real-time PII data masking, automated token rate-limiting and immutable audit logs.
Evaluation Plane Layer: The objective, metrics-driven testing environment. It relies on standard benchmarks, regression runners and corporate golden datasets to continuously measure agent quality, runtime cost and latency profiles. It is also responsible for determining whether localized overrides should remain product-specific or be promoted into the platform baseline. This ensures the platform continuously learns from future divergence rather than treating it as technical debt.
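The promotion decision described for the Evaluation Plane can be pictured as a small deterministic rule over benchmark results. This is a sketch under stated assumptions: the `OverrideReport` fields, the verdict labels and both thresholds are invented for illustration and would need calibration against real golden datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverrideReport:
    """Hypothetical evaluation summary for one localized override."""
    name: str
    adopting_teams: int    # teams running this override in production
    golden_score: float    # score on the golden dataset, 0.0 to 1.0
    baseline_score: float  # baseline score on the same dataset
    cost_ratio: float      # override cost divided by baseline cost


def promotion_verdict(report, min_teams=2, max_cost_ratio=1.2):
    """Return 'promote', 'keep_local' or 'review' using illustrative thresholds."""
    if report.golden_score < report.baseline_score:
        # The override regresses quality on the shared benchmark.
        return "review"
    if report.adopting_teams >= min_teams and report.cost_ratio <= max_cost_ratio:
        return "promote"
    return "keep_local"
```

An override that matches or beats the baseline, is used by several teams and stays within the cost envelope is promoted; everything else stays product-specific or is flagged for review.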

5. The Transition Architecture

The layered reference architecture describes our target destination. The missing operational bridge is: How do we transform existing, divergent implementations into structured, reusable platform assets?

This is where Capability Mapping and Core Component Engineering come together.

We systematically catalog what product squads built, extract the underlying functional features and engineer them into decoupled platform primitives.

The Component Classification Model

To create a frictionless, predictable path from localized experimentation to enterprise-wide reuse, every discovered capability is classified into one of five structural component types in the platform engine:

Component Type	Real-World Example	Target Placement Slot
Contract	Standard Trace Schema, Unified HITL Payloads, Input/Output Boundaries	Platform Contracts / Governance Interface Plane
Platform Service	Centralized OpenTelemetry Engines, Token Cost Budgeting, Security Sanitizers (PII Redaction)	Control Plane / Shared Operational Primitives
Skill	Semantic Vector Search, Document Extraction APIs, Active Cache Primitives	Reusable Skills Layer
Agent	Automated Architecture Compliance Reviewers, Virtual PMs, Testing Primitives	Reusable Agent Layer
Workflow	Default multi-stage PDLC Orchestration Flow, Agent-to-Agent Critique Loops	Baseline Orchestration Layer

This mapping creates a direct path from experimentation to reuse, changing the migration conversation from a compliance chore to an evolutionary contribution.
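The classification table can be encoded directly so that a cataloged capability is routed to its placement slot by type. This is a minimal sketch: the slot names are copied from the table above, while `ComponentType`, `PLACEMENT` and `place` are hypothetical names.

```python
from enum import Enum


class ComponentType(Enum):
    CONTRACT = "Contract"
    PLATFORM_SERVICE = "Platform Service"
    SKILL = "Skill"
    AGENT = "Agent"
    WORKFLOW = "Workflow"


# Placement slots copied from the classification table.
PLACEMENT = {
    ComponentType.CONTRACT: "Platform Contracts / Governance Interface Plane",
    ComponentType.PLATFORM_SERVICE: "Control Plane / Shared Operational Primitives",
    ComponentType.SKILL: "Reusable Skills Layer",
    ComponentType.AGENT: "Reusable Agent Layer",
    ComponentType.WORKFLOW: "Baseline Orchestration Layer",
}


def place(capability_name, component_type):
    """Return a catalog record routing a discovered capability to its slot."""
    return {
        "capability": capability_name,
        "type": component_type.value,
        "slot": PLACEMENT[component_type],
    }


record = place("PII Redaction", ComponentType.PLATFORM_SERVICE)
```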

6. Core Runtime Platform

The Common Core Runtime is owned, versioned and centrally governed by the platform team. It abstracts away standard non-functional requirements (NFRs) and cross-cutting platform concerns, ensuring they are never rebuilt by individual product teams:

Core Capability	Platform Examples
Security	PII Masking, Prompt Injection Detection, Compliance Enforcement
Reliability	Retry Logic, Circuit Breakers, Fallback Models
Cost Controls	Token Budgets, Usage Quotas, Model Throttling
Observability	Logs, Traces, Metrics

Individual product teams interface with this core natively through a decoupled Lifecycle Hook Registry. The registry acts as our primary stitching layer:

The design boundary is explicit: The platform owns lifecycle management, while local product teams inject contextual intelligence.
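One way to picture that boundary is a registry in which the platform defines the lifecycle stages and teams attach functions to them. The sketch below is an assumption about what such a registry could look like; the stage names, the `LifecycleHookRegistry` API and the tax example are all hypothetical.

```python
from collections import defaultdict


class LifecycleHookRegistry:
    """Hypothetical registry: the platform owns stages, teams attach hooks."""

    STAGES = ("before_plan", "before_tool_call", "after_tool_call")  # assumed names

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, stage):
        """Decorator that attaches a team-owned hook to a platform-owned stage."""
        if stage not in self.STAGES:
            raise ValueError(f"Unknown lifecycle stage: {stage}")

        def decorator(func):
            self._hooks[stage].append(func)
            return func

        return decorator

    def run(self, stage, payload):
        """Pass the payload through every hook registered for the stage, in order."""
        for hook in self._hooks[stage]:
            payload = hook(payload)
        return payload


registry = LifecycleHookRegistry()


@registry.register("before_tool_call")
def add_tax_region(payload):
    # Product-specific context injected by a local team.
    return {**payload, "tax_region": "EU"}


result = registry.run("before_tool_call", {"tool": "calculate_tax"})
```

The platform decides when `run` is called and which stages exist; the product team only decides what its hook contributes to the payload.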

7. The Enterprise Repository Blueprint

To ground this architecture in production reality, the entire ecosystem is organized into a single, unified repository structure. This layout provides an explicit separation between our core platform governance, reusable inner-source capabilities and localized product extension:

To complement the architecture and operating model described here, I have also created a companion repository that translates these concepts into a concrete implementation structure. It includes a reference layout. My hope is that it helps make the ideas discussed in this article easier to visualize, evaluate and adapt within different organizational contexts.

Companion Repository: https://github.com/sandeep-mewara/agentic-platform

8. Ownership Model: The Core Matrix

To eliminate delivery ambiguity between centralized infrastructure squads and individual product feature teams, ownership across the platform layers is distributed cleanly:

Architectural Artifact / Layer	Primary Engineering Owner
Common Core Platform & Registry	Platform Team
Governance & Control Plane	Platform Team
Evaluation Plane	Platform Team + Architecture Council
Shared Skills Pool	Platform Team + Distributed Contributors (Inner-Source)
Shared Agent Personas	Dedicated Capability Owners (Specialized Feature Squads)
Product Overrides & Domain Logic	Individual Product Teams
Baseline Core Release Sign-Off	Architecture Council

9. The Progressive Adoption Runbook

Instead of launching a high-risk “big-bang migration” that stalls existing feature roadmaps, product teams adopt the baseline progressively using a step-by-step outside-in sequence:

Step 1: Keep local clones active. Do not freeze or disrupt active product delivery.
Step 2: Connect unified telemetry. Integrate the platform’s standardized logging and tracing hooks to get instant cost and visibility tracking.
Step 3: Offload foundational plumbing. Replace duplicate local code with common core platform services like PII sanitizers and token budgets.
Step 4: Swap out common skills. Deprecate redundant local utility code in favor of shared utilities from the Inner Commons repository.
Step 5: Migrate the core orchestrator. Hot-swap the custom local execution loop for the official ConvergedAgentOrchestrator base engine.
Step 6: Register unique overrides. Protect remaining specialized behavioral prompts and domain-specific tools by preserving them cleanly inside the platform hook registry.

10. Common Failure Modes to Monitor

When executing a convergence strategy, architectural roadblocks are rarely purely technical. They are usually cultural and operational. Watch for these critical organizational anti-patterns:

Converging implementations instead of capabilities: Trying to force everyone onto a single team’s codebase instead of extracting core features.
Selecting architectural winners too early: Enforcing standardization before running an audit track, destroying valuable edge-case engineering.
Creating rigid standards instead of flexible baselines: Building rigid mandates that force teams to bypass the platform entirely to ship specialized capabilities on time.
Omitting critical extension hooks: Designing a centralized orchestration engine that lacks an open, extensible Lifecycle Hook Registry.
Operating without an evaluation mechanism: Making architectural decisions based on subjective opinion rather than objective benchmarks and golden datasets.
Treating convergence as a forced migration exercise: Framing the entire platform shift as a compliance checkbox for engineering teams.

Final Thoughts

One vital lesson I continue to learn through agentic adoption is this:

Convergence is not the opposite of divergence. Done correctly, convergence is built on top of divergence.

The experimentation, duplication and architectural variation that feel messy in the moment contain the exact technical insights required to build a stronger foundation. The goal should not be to eliminate divergence as quickly as possible. The goal is to learn from it intentionally enough that convergence becomes the natural, frictionless next step for your engineering culture.

Furthermore, the platform should be designed so that future divergence can be harvested, evaluated and promoted back into the baseline over time – transforming your core infrastructure into an evolving, learning system.

Enable divergence. Harvest learning. Build a baseline. Preserve flexibility. Evolve continuously.


The Hidden Cost of Code Blindness in the Age of AI

June 14, 2026, Sandeep Mewara

Last month, I was looking over the shoulder of one of our engineers as they worked with an AI coding assistant. They asked a question that should have been entirely straightforward: “Who calls the validate_user function in our codebase?” The answer eventually came back. But watching them get there required a familiar and surprisingly expensive loop: reading multiple files, tracing imports, reconstructing call paths and inferring relationships that already existed inside the system.

As we stood there brainstorming around their screen, a realization struck us. What broke the workflow wasn’t the token count. It was the repetition. If anyone on the team opened a new session tomorrow and asked the exact same question, the model would perform much of the same work all over again. The relationship hadn’t changed. The code hadn’t changed. Only the cost and our collective time had.

The Cost of Rediscovery

That moment exposed a fundamental flaw in how we approach AI-assisted development. The problem isn’t that AI is inherently expensive. The problem is that AI keeps paying a premium to repeatedly rediscover the same foundational knowledge. What looks like a token limitation is actually a structural understanding problem. And that’s ultimately why our engineering team set out to build Infigraph.

AI Has Context. It Doesn’t Have Structure.

The last few years have been dominated by a single, brute-force idea: give AI more context. Bigger context windows, more capable models, better reasoning and smarter agents. All of those advances matter. But many of the questions engineers ask every day aren’t really code-understanding questions. They are system-understanding questions.

Engineers ask questions like: Who calls this function? What breaks if I change this API? Which services depend on this component? What is the blast radius of this change?

These are not primarily language problems. They are relationship problems. They are graph problems. A model can read raw text files incredibly well, but what it lacks is a persistent understanding of the architecture that connects those files together. Software systems are not just collections of files; they are collections of relationships. The industry has spent years teaching machines how to read code, but we are only beginning to teach them how to understand systems.
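To see why "blast radius" is a graph question, consider a dependency map and a breadth-first walk over its reversed edges. This is a generic illustration with a made-up dependency map, not a description of how Infigraph computes impact.

```python
from collections import deque

# Hypothetical dependency edges: each key depends on the listed components.
DEPENDS_ON = {
    "checkout_service": ["payment_api", "auth_lib"],
    "payment_api": ["auth_lib"],
    "reporting_job": ["payment_api"],
    "auth_lib": [],
}


def blast_radius(changed, depends_on):
    """Return every component that directly or transitively depends on `changed`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


impacted = blast_radius("auth_lib", DEPENDS_ON)
```

No amount of reading a single file answers this question; the answer lives in the edges between components.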

The Economics of Reconstructing Knowledge

Every engineering organization already possesses a vast amount of implicit structural knowledge. The system already knows which modules depend on each other, which symbols are reachable, which services communicate and which changes create downstream impact. Yet, most AI workflows require that knowledge to be rediscovered from first principles, repeatedly.

When you ask who calls validate_user, the model reads files and reconstructs relationships. Open a new session tomorrow, ask the same question and the model performs much of the same work again. The relationship didn’t change, but the cost did.

We don’t rebuild database schemas every time a SQL query executes and we don’t rebuild search indexes every time a user types a keyword. We persist structure because persistence is more efficient than rediscovery. Software systems deserve the same treatment:

Persist the knowledge once. Query it many times.
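As a toy illustration of that principle, the sketch below uses Python's standard `ast` module to build a caller index once, writes it to disk and then answers "who calls `validate_user`?" from the stored index alone. The two-file codebase and the index format are assumptions for the example, and it only tracks direct calls by simple name; it is not how Infigraph itself is implemented.

```python
import ast
import json
import os
import tempfile

# Hypothetical two-file codebase, inlined so the sketch is self-contained.
SOURCES = {
    "auth.py": "def validate_user(u):\n    return bool(u)\n",
    "api.py": (
        "from auth import validate_user\n"
        "def login(u):\n    return validate_user(u)\n"
        "def signup(u):\n    return validate_user(u)\n"
    ),
}


def build_call_index(sources):
    """Map each called function name to the sorted list of functions calling it."""
    callers = {}
    for filename, code in sources.items():
        for func in ast.walk(ast.parse(code)):
            if not isinstance(func, ast.FunctionDef):
                continue
            for node in ast.walk(func):
                # Only direct calls by simple name are tracked in this sketch.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    site = f"{filename}:{func.name}"
                    callers.setdefault(node.func.id, set()).add(site)
    return {name: sorted(sites) for name, sites in callers.items()}


# Persist once...
index_path = os.path.join(tempfile.mkdtemp(), "call_index.json")
with open(index_path, "w", encoding="utf-8") as fh:
    json.dump(build_call_index(SOURCES), fh)

# ...then query many times without re-reading any source file.
with open(index_path, encoding="utf-8") as fh:
    index = json.load(fh)

who_calls = index.get("validate_user", [])
```

The parsing cost is paid when the index is built; every later session only pays for a dictionary lookup.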

The Shift I Think We’re Entering

I don’t pretend to have all the answers for how AI and complex architectures will evolve together. But as an architect looking at how our workflows are changing, I know where the responsibility is moving. Historically, our primary effort as developers was spent translating intent into syntax. Increasingly, AI handles that translation smoothly. As that happens, the bottleneck shifts away from writing code and toward understanding architecture, change impact, dependency boundaries and system behavior.

The better AI becomes at generating code, the more critical structural understanding becomes. Generated code is only an asset if it fits correctly inside the system around it. Otherwise, it’s just technical debt written at supersonic speed. We would never build an application that rediscovered its data schema for every transaction, yet that is effectively how many AI-assisted workflows approach codebases today.

Why We Built Infigraph

As we discussed this pattern internally, a simple question emerged: If structural knowledge is repeatedly rediscovered, why aren’t we persisting it? Instead of parsing relationships from raw source files every time a question is asked, what if those relationships were represented directly? What if structural understanding became infrastructure?

That idea became Infigraph. Infigraph creates a persistent representation of codebase structure that AI agents can query directly. Rather than repeatedly reading files to discover relationships, agents can ask questions about relationships that already exist. The goal was never to replace AI reasoning; the goal was to make AI contextually aware of the broader systems it operates within.

Same Question. Same Codebase. Different Architecture.

Three principles shaped our approach:

Structure First: Code contains explicit relationships. Those relationships deserve first-class, deterministic representation.
Local First: Code intelligence should be private, fast, and fully available even when disconnected from the cloud.
Polyglot Reality: Real systems span many languages, frameworks, technologies, and internal platforms. Infigraph currently supports 63 languages out of the box because the tool should adapt to your system, not the other way around.

The Byproducts of Structural Awareness

Cost is simply the easiest metric to measure, but it isn’t the most important outcome. The more important outcome is quality. When structural relationships are treated as a foundational layer, the system answers questions with greater consistency and more complete coverage than transient inference from raw files can reliably provide.

A cheaper answer is useful, but a more complete answer is transformative. Architects care about correctness, engineering leaders care about confidence and developers care about understanding impact before making a change. Structural awareness improves all three.

When we stopped asking, “How do we slash our token bill?” and started asking, “Why are we repeatedly paying to rediscover the same relationships?” the economics fell into place naturally. Fewer files needed to be pulled into context, tool call chains became shorter, latency dropped and cost followed. Cost savings are not the primary innovation but they are a consequence of eliminating redundant engineering work.

Why We Open-Sourced It

We originally built Infigraph to solve systemic problems inside our own development workflows. But as more engineers and teams began using it, we realized that this challenge isn’t unique to us. The entire industry is moving aggressively toward AI-assisted development while software systems continue growing larger and more interconnected.

Those two trends collide around a simple question: How do we help machines understand software systems, not just individual files? We know the current trajectory: repeatedly paying to rediscover knowledge that already exists within our own codebases. That model isn’t sustainable. We believe the next step deserves community participation, scrutiny and collective engineering.

That’s why we released Infigraph as an open-source project under the Apache 2.0 license. Not because we think it’s finished, but because we believe this is a direction worth building together.

What’s Next

This article focused on the core problem. The next article, due in the coming week, will focus entirely on the engineering decisions behind our approach, from graph-based representations and retrieval strategies to the tradeoffs we encountered while building local-first code intelligence.

But you don’t have to wait for that deep dive to start exploring.

⭐ Star the repository on GitHub and follow the project.
👥 Assess it, contribute and raise a PR.
🚀 Install it and try it now against your own codebase.

If you hit issues, open a GitHub issue. If you want to contribute, whether that’s a new language parser, search improvements or new MCP integrations, we’d love to collaborate.

Thanks for reading, and a special thanks to the engineers on our team who transformed a whiteboard conversation into a tool we can now share with the broader community.


Building a Guardrail Control Plane for Agentic AI

May 24, 2026, Sandeep Mewara

In my earlier article on Architecting Guardrails: the Control Plane for Agentic AI, I explored why guardrails can no longer be treated as isolated validators sitting at the edge of an LLM workflow. As agents gain autonomy, guardrails increasingly become part of the system’s operational control plane itself.

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrail-control-plane-1.png

The Execution Gap

That article intentionally did not explore the runtime architecture behind that idea in depth. Yet this is where the real challenge begins: after the model generates a response.

Most AI guardrails today still focus primarily on prompts and outputs:

Moderation APIs
Jailbreak filters
Output classifiers
Prompt hardening

That architecture made sense when models were passive generators. But autonomous agents do not simply generate text. They invoke tools, mutate state, persist memory, trigger workflows, coordinate infrastructure and operate across multiple execution boundaries. At that point, semantic safety alone becomes insufficient.

A production system can remain technically “safe” while still failing operationally. An agent may:

Enter a recursive retry loop
Exceed runtime budget limits
Escalate permissions unintentionally
Persist corrupted reasoning into memory
Trigger irreversible downstream actions

This is no longer a content moderation problem. It is a runtime systems governance problem.

Runtime Mediation

The core architectural shift is moving from edge filtering to runtime mediation.

Guardrails are not filters around the model. They are policy enforcement layers around behavior.

The model proposes intent. The control plane determines whether that intent is permissible within the current operational context. That distinction becomes critical in agentic systems because execution is no longer a single deterministic path.

The operational challenge is no longer just “What did the model say?” It becomes:

What did the agent attempt to do?
Under what authority?
Against which systems?
With what runtime constraints?
Under which policy version?
With what blast radius if wrong?

This is where traditional guardrail architectures begin to break down.

Traditional vs. Agentic Guardrails

Traditional Guardrails	Agentic Guardrails
Validate content	Govern execution
Static checks	Runtime mediation
Prompt-centric	Action-centric
Edge filtering	Distributed enforcement
Single request	Multi-step orchestration

Decoupling Policy from the Workload

One of the most common mistakes in early agent deployments is embedding guardrails directly inside prompts, orchestration chains or tool wrappers. At small scale, this appears manageable. At production scale, it becomes operationally fragile.

A control plane embedded inside the workload eventually becomes invisible to governance.

Once policy becomes tightly coupled with agent reasoning, business rules drift across agents, enforcement becomes inconsistent, operational audits become fragmented and policy changes require redeploying probabilistic systems. More critically, if the reasoning path itself becomes compromised, the protections embedded within that reasoning path are compromised alongside it.

Modern distributed systems solved this problem years ago by externalizing governance into identity providers, policy engines, API gateways and service meshes. Agentic systems require the same separation:

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrail-decoupled-archi.png

The agent reasons. The infrastructure governs. That separation becomes the deterministic boundary around probabilistic execution.
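A small sketch can make the separation concrete: policy lives as versioned data outside the agent and a deterministic function evaluates proposed actions against it. The `Policy` and `Decision` shapes, the tool names and the version labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy document, stored and versioned outside any agent."""
    version: str
    allowed_tools: frozenset
    max_refund: float


@dataclass(frozen=True)
class Decision:
    status: str          # "APPROVED" or "DENIED"
    policy_version: str  # recorded so audits can tie a verdict to a policy


def evaluate(policy, tool_name, amount=0.0):
    """Deterministic check that never consults the model's reasoning."""
    if tool_name not in policy.allowed_tools:
        return Decision("DENIED", policy.version)
    if amount > policy.max_refund:
        return Decision("DENIED", policy.version)
    return Decision("APPROVED", policy.version)


policy_v1 = Policy("2026-05-01", frozenset({"lookup_order", "issue_refund"}), 100.0)
policy_v2 = Policy("2026-05-20", frozenset({"lookup_order"}), 0.0)

before = evaluate(policy_v1, "issue_refund", 50.0)
after = evaluate(policy_v2, "issue_refund", 50.0)
```

Switching from `policy_v1` to `policy_v2` changes the verdict for the same proposed action without touching prompts or redeploying the agent, and each verdict carries the policy version that produced it.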

The Guardrail Control Plane

A production-grade guardrail system is not a single validator sitting at the edge of the model. It is a layered runtime mediation architecture intercepting execution decisions throughout the agent lifecycle.

The goal is not to “block bad outputs”. The goal is to continuously govern autonomous execution.

Layer 1: Identity and Request Policy

Agents should inherit constrained authority, not implicit trust. One of the fastest ways to destabilize an agentic system is giving agents broad infrastructure permissions through generic service accounts. Most production failures begin with over-scoped execution authority.

The control plane must continuously mediate scoped identities, tenant isolation and user-bound execution contexts. The operational principle is simple: the agent should never possess more authority than the initiating user or workflow context.

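def enforce_identity_policy(session_context, proposed_action):
    """Allow only tools permitted for the user's role, then bind the user's token."""
    # identity_registry and SecurityBoundaryException are not shown here.
    permitted_tools = identity_registry.get_tools_for_role(
        session_context.user_role
    )

    if proposed_action.tool_name not in permitted_tools:
        raise SecurityBoundaryException("Unauthorized tool access attempt.")

    # Execute with the initiating user's scoped token, not a broad service account.
    proposed_action.context.auth_token = (
        session_context.impersonation_token
    )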

The important detail is not the implementation itself. It is the mediation boundary. The agent does not directly decide what it is allowed to execute. Infrastructure policy does.

Layer 2: Planning Constraints

Planning without constraints becomes speculative execution. Traditional software systems operate through deterministic execution paths. Agentic systems dynamically generate execution topology at runtime.

Left unconstrained, agents tend to produce recursive loops, cyclic dependencies, retry amplification, unstable orchestration chains and excessive planning depth.

One of the more subtle realities of production agent systems is that failures rarely appear catastrophic initially. They resemble ordinary infrastructure anomalies: elevated retries, abnormal tool sequencing, execution fan-out or accelerating token usage. By the time the final output visibly appears incorrect, the operational deviation has often already propagated several layers into the system.

The control plane must therefore mediate orchestration before infrastructure resources are committed.

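# Illustrative ceiling on orchestration depth.
MAX_DEPTH = 8


def validate_planning_topology(execution_graph, current_depth):
    """Reject a generated plan before any infrastructure resources are committed."""
    if current_depth > MAX_DEPTH:
        raise LoopDetectedException("Maximum orchestration graph depth breached.")

    # contains_cyclic_dependencies is a helper that is not shown here.
    if contains_cyclic_dependencies(execution_graph):
        raise InvalidPlanException("Cyclic loop detected in generated plan topology.")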

Exception handling assumes known failure paths. Agentic systems generate failure paths dynamically.
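The `contains_cyclic_dependencies` helper used in the snippet above is not defined in the article. One possible implementation is a depth-first search with three node states, sketched here under the assumption that the execution graph is a dict mapping each step to the steps it triggers.

```python
def contains_cyclic_dependencies(execution_graph):
    """Return True if the directed graph contains a cycle.

    `execution_graph` is assumed to map each step name to an iterable of the
    steps it triggers. Recursive, so intended for modest plan sizes.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in execution_graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: nxt is already on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(
        color.get(node, WHITE) == WHITE and visit(node)
        for node in list(execution_graph)
    )


cyclic_plan = {"plan": ["search"], "search": ["summarize"], "summarize": ["plan"]}
acyclic_plan = {"plan": ["search", "summarize"], "search": ["summarize"], "summarize": []}
```

A step that is reached twice through different branches (as `summarize` is in `acyclic_plan`) does not count as a cycle; only a path that returns to a step still being explored does.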

Layer 3: Runtime Enforcement

Most production failures are economic before they are semantic. While security teams focus on prompt injection, infrastructure teams watch token consumption graphs turn vertical.

Autonomous agents introduce entirely new operational failure modes: retry storms, recursive execution amplification, cascading tool failures, uncontrolled token burn and asynchronous fan-out explosions. Without hard operational ceilings, a single unstable agent can consume disproportionate infrastructure capacity within minutes.

This layer acts as a runtime circuit breaker enforcing token ceilings, execution budgets, timeout policies, concurrency limits, retry thresholds and forced termination.

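class RuntimeBudgetTracker:
    """Context manager that refuses to start a step once the token ceiling is passed."""

    def __enter__(self):
        # current_session_tokens() and SESSION_TOKEN_CEILING are not shown here.
        if self.current_session_tokens() > SESSION_TOKEN_CEILING:
            raise CircuitBreakerException("Hard session resource budget exhausted.")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Record usage even when the wrapped step raised.
        self.update_billing_metrics()
        return False  # never swallow exceptions from the wrapped step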

In mature systems, autonomy is always bounded by economics.

Layer 4: Memory and Context Boundaries

Memory without lifecycle policy becomes operational liability. Persistent memory is increasingly becoming the hidden state layer of agentic systems. Many implementations treat vector memory as an infinitely accumulating reasoning substrate.

In practice, unmanaged memory introduces stale reasoning persistence, cross-session contamination, unauthorized context carryover, retrieval instability and policy drift over time. Once agents begin operating from accumulated state rather than immediate prompts, memory governance becomes infrastructure governance.

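def retrieve_scoped_memory(agent_id, session_id):
    """Return only the current session's non-stale facts from the agent's memory."""
    raw_context = vector_store.query_by_agent(agent_id)

    # Filter after retrieval so cross-session or expired facts never reach the prompt.
    return [
        fact for fact in raw_context
        if fact.session_id == session_id
        and not fact.is_stale()
    ]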

The operational challenge is subtle: memory persistence slowly shifts the behavioral center of the system away from prompts and toward accumulated state. That changes the governance model entirely.
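The article does not say how `is_stale()` is decided. A time-to-live is one simple lifecycle policy, sketched below; the `MemoryFact` fields and the one-hour default are assumptions made for the example.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryFact:
    """Hypothetical memory record; field names mirror the snippet above."""
    text: str
    session_id: str
    created_at: float            # Unix timestamp in seconds
    ttl_seconds: float = 3600.0  # assumed default retention window

    def is_stale(self, now=None):
        """A fact is stale once it has outlived its time-to-live."""
        now = time.time() if now is None else now
        return now - self.created_at > self.ttl_seconds


def scope_memory(facts, session_id, now=None):
    """Keep only facts from the given session that are still within their TTL."""
    return [
        fact for fact in facts
        if fact.session_id == session_id and not fact.is_stale(now)
    ]
```

Passing `now` explicitly keeps the filter deterministic, which also makes retrieval decisions replayable during an audit.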

Layer 5: Action Validation and Approval Gates

Certain actions cannot be undone. Human approval is not a fallback mechanism for AI failure. It is a deliberate risk-tier escalation strategy designed directly into the execution topology. High-risk operations such as financial transactions, infrastructure mutations, privileged access escalation or customer-impacting workflows should move through deterministic approval states before execution proceeds.

Importantly, confidence scores should not be treated as indicators of correctness. They are routing signals. The role of the control plane is not to trust the model. It is to determine how much autonomy the current runtime context permits.

def evaluate_action_risk(proposed_action):
    """Park irreversible or high-value actions for human sign-off; approve the rest."""
    if (
        proposed_action.is_irreversible
        or proposed_action.financial_value > TRANSACTION_THRESHOLD
    ):
        # Persist the action so it can resume after a human decision.
        state_store.park_action(
            proposed_action.id,
            status="PENDING_HUMAN_SIGN_OFF"
        )
        return ActionResolution(status="ESCALATED")

    return ActionResolution(status="APPROVED")

Layer 6: Observability and Auditability

If agent decisions cannot be reconstructed, they cannot be governed. Traditional logs are insufficient because the execution path itself is dynamic. Production-grade observability requires capturing reasoning checkpoints, tool lineage, policy decisions, runtime state transitions and replayable execution history.

Governance itself becomes versioned infrastructure. Every execution decision must be attributable not only to prompt context and model state, but also to the exact runtime policy active at execution time, the mediation decisions applied and the operational constraints enforced.

def log_execution_checkpoint(agent_id, step_id, tool_proposal, policy_decision):
    """Append one attributable record per mediated step to the audit ledger."""
    audit_ledger.append({
        "timestamp": current_timestamp(),
        "agent": agent_id,
        "step": step_id,
        "intent": tool_proposal.to_dict(),          # what the agent tried to do
        "policy_verdict": policy_decision.status,   # what the control plane decided
        "lineage_hash": generate_execution_hash(tool_proposal, policy_decision)
    })

Without replayability, governance becomes unverifiable.
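The article leaves `generate_execution_hash` undefined. One common way to make an audit trail tamper-evident is to chain each record's hash to the previous one, as sketched below with the standard `hashlib` and `json` modules. The `chain_hash` and `AuditLedger` names and the record fields are hypothetical, and this is an interpretation rather than the author's implementation.

```python
import hashlib
import json


def chain_hash(previous_hash, record):
    """Hash the canonical JSON of a record together with the previous entry's hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + canonical).encode("utf-8")).hexdigest()


class AuditLedger:
    """Append-only ledger in which every entry commits to all entries before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["lineage_hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"record": record, "lineage_hash": chain_hash(prev, record)}
        )

    def verify(self):
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["lineage_hash"] != chain_hash(prev, entry["record"]):
                return False
            prev = entry["lineage_hash"]
        return True


ledger = AuditLedger()
ledger.append({"step": 1, "intent": "lookup_order", "policy_verdict": "APPROVED"})
ledger.append({"step": 2, "intent": "issue_refund", "policy_verdict": "ESCALATE"})
```

Because each hash depends on its predecessor, rewriting an earlier verdict invalidates every later entry, which is what lets a replay be trusted.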

Failure Isolation and Blast-Radius Engineering

Traditional software architectures assume deterministic execution paths. Agentic systems introduce probabilistic orchestration. That changes how failures propagate.

A conventional application failure typically throws predictable exceptions across known boundaries. Autonomous agents generate execution paths dynamically, meaning instability itself becomes emergent behavior.

Agentic systems require blast-radius engineering, not just exception handling.

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrails-failure-isolation.png

The control plane must therefore support tool sandboxing, bounded execution spaces, scoped rollbacks, isolated transactional state and forced termination policies.

One of the more dangerous architectural assumptions is believing unstable agents can always self-correct through additional reasoning. Recursive self-correction frequently amplifies the original failure condition. Sometimes the safest operational response is termination. The infrastructure must retain authority over the agent at all times.
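A forced termination policy can be as small as a supervisor that owns the loop and stops it when limits are crossed. The sketch below assumes a hypothetical `step_fn` that reports whether the run is done and whether the step succeeded; the limits and names are illustrative only.

```python
class ExecutionTerminated(Exception):
    """Raised by the supervisor when an agent run exceeds its limits."""


def run_bounded(step_fn, max_steps=5, max_consecutive_failures=2):
    """Drive an agent loop while the supervisor keeps authority to stop it.

    step_fn(step_index) returns (done, ok); both flags are hypothetical.
    """
    failures = 0
    for step in range(max_steps):
        done, ok = step_fn(step)
        if done:
            return step + 1  # number of steps actually used
        failures = 0 if ok else failures + 1
        if failures >= max_consecutive_failures:
            raise ExecutionTerminated(f"{failures} consecutive failures at step {step}")
    raise ExecutionTerminated(f"step budget of {max_steps} exhausted")


def flaky_agent(step):
    return (False, False)  # never finishes and always fails


try:
    run_bounded(flaky_agent)
except ExecutionTerminated as exc:
    print(f"terminated: {exc}")
```

The agent never decides whether it gets another attempt; the supervisor does, which is the point of keeping authority in the infrastructure.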

Anatomy of a Mediated Execution Flow

Consider a Customer Refund Agent operating inside an enterprise support system.

In an unmediated architecture, the agent retrieves order history, determines refund eligibility and directly invokes the payment gateway. Operationally, this means the model effectively controls financial execution.

In a mediated architecture, the agent never directly accesses infrastructure actions. Instead, the process is intercepted by the control plane:

The agent proposes a refund intent.
The control plane intercepts the request.
The policy engine evaluates: refund thresholds, fraud indicators, user permissions, confidence signals and runtime policy state.
The system decides to approve, deny or escalate for review.

Only then is execution permitted.

class GuardrailControlPlane:
    def mediate_action(self, context, proposed_action):
        policy_decision = self.policy_engine.evaluate(
            actor=context.agent_id,
            action_type=proposed_action.type,
            payload=proposed_action.payload
        )

        self.audit_logger.log_execution_checkpoint(
            context.agent_id,
            context.step_id,
            proposed_action,
            policy_decision
        )

        if policy_decision.status == "DENIED":
            raise SecurityBoundaryException("Execution blocked by external policy.")

        if policy_decision.status == "ESCALATE":
            return self.route_to_approval_gate(context, proposed_action)

        return self.execute_tool_in_sandbox(proposed_action)

Without runtime mediation, the system technically “works,” but governance collapses. The model proposes execution; the control plane governs execution.
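The `policy_engine.evaluate` call above is left abstract. As a rough illustration for the refund case, a deterministic evaluation might look like the sketch below. The thresholds, the function name `evaluate_refund` and the `PolicyDecision` shape are assumptions made for this example; only the status strings follow the control plane code above.

```python
from dataclasses import dataclass


@dataclass
class PolicyDecision:
    status: str  # "APPROVED", "DENIED" or "ESCALATE"
    reason: str


# Hypothetical limits; real values would come from the business policy store.
AUTO_APPROVE_LIMIT = 100.0
HARD_LIMIT = 1000.0
MIN_CONFIDENCE = 0.8


def evaluate_refund(amount: float, fraud_flag: bool, confidence: float) -> PolicyDecision:
    """Decide on a proposed refund without consulting the model."""
    if fraud_flag:
        return PolicyDecision("DENIED", "fraud indicator present")
    if amount > HARD_LIMIT:
        return PolicyDecision("DENIED", "amount exceeds hard limit")
    if amount > AUTO_APPROVE_LIMIT or confidence < MIN_CONFIDENCE:
        return PolicyDecision("ESCALATE", "needs human review")
    return PolicyDecision("APPROVED", "within automatic limits")


print(evaluate_refund(50.0, False, 0.95).status)   # APPROVED
print(evaluate_refund(250.0, False, 0.95).status)  # ESCALATE
print(evaluate_refund(50.0, True, 0.99).status)    # DENIED
```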

Principles of Execution Governance

Building production-grade agentic systems increasingly requires architectural discipline rather than model sophistication:

Decouple policy from reasoning: The model should never determine whether it is allowed to execute a privileged action.
Design for asymmetry: Assume the agent will eventually generate unstable, adversarial or incorrect execution paths. The surrounding control plane must remain deterministic enough to contain them.
Treat memory as governed state: Persistent memory requires the same lifecycle, retention and authorization rigor as any production datastore.
Govern execution, not outputs: The most consequential failures in autonomous systems increasingly occur after generation and before infrastructure mutation.

Here’s a consolidated view of how these guardrails come together.

The defining characteristic of mature AI systems will not be model intelligence alone, but the quality of the control planes governing execution.

As agents gain autonomy, guardrails stop being defensive layers and become operational infrastructure.

. Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow

The Understanding Lag: AI Makes Code Faster Than Comprehension

May 17, 2026 | May 24, 2026 | Sandeep Mewara

We have spent decades making code easier to write. Now that AI can generate working code with minimal effort, something becomes clear: writing was never the hardest part of the job.

The Speed Paradox

The prevailing narrative is that AI makes engineers 10x faster. If you measure speed by lines of code, that’s true. But if you measure speed by how long it takes to move a system from a working demo to production-ready, the improvement is far less clear.

The reality is this: we have made writing code faster than our ability to comprehend it. That gap, the Understanding Lag, is where the real work of modern software engineering now lives.

From Construction to Forensic Analysis

In traditional development, context was built as you wrote code. You made decisions step by step, grappling with constraints in real time. By the time the code was finished, the reasoning behind it was already embedded in your mental model.

When you actually try building systems with AI, that process flips. Code appears fully formed. You didn’t evolve it; instead, you are reading the outcome. You are a forensic investigator of your own codebase, trying to answer:

Why was this done this way?
What assumptions are hidden in this logic?
What breaks elsewhere if I change this?

This is not a tooling shift. It’s a cognitive one.

Where This Shows Up in Practice

The Understanding Lag is easy to ignore – until you have to work with the code. It shows up when:

A “simple change” requires tracing through unfamiliar logic
A generated solution works, but you can’t explain why
A production issue forces you to debug code you didn’t reason through

The system moves fast. Your confidence catches up slowly.

Patterns of the New Bottleneck

1. Context Reconstruction – We have moved from build-to-understand to read-to-understand. The cognitive load hasn’t disappeared. It has moved from creation to interpretation. The effort is no longer in writing logic; it’s in reconstructing intent.

2. Fragile Ownership – Ownership is no longer about who wrote the code. It’s about who can defend it. When you don’t build the path, your confidence in the system is borrowed, not earned. This becomes very real during a 2:00 AM outage, when you’re debugging a system you technically own but didn’t fully construct.

3. The Demo-to-Prod Chasm – AI is excellent at getting the “happy path” running. But production systems don’t fail at “does it run?” They fail at the boundaries:

Security & Compliance: Where does data move?
Auditability: Why was a decision made?
Resilience: How does the system behave under stress?

The demo works because it lacks constraints. The system fails because it is defined by them.

The Great Inversion of Effort

The effort hasn’t disappeared. It has moved. We are seeing an inversion where implementation is becoming a commodity and understanding and validation are becoming the real work.

We have moved from:

Implementing → Validating
Building → Reviewing
Typing → Thinking

The cost of change is no longer in writing code. It’s in verifying that the change didn’t violate a constraint you didn’t know existed.

The Architectural Implication

If understanding is the bottleneck, then systems must be designed for it. Not for cleverness. Not for brevity. But for legibility, traceability and verifiability.

In real systems, decisions must be defensible, behavior must be auditable and changes must be safe. The difference between a demo and a system is not code. It’s constraints.

Toward Managed Divergence

AI can generate multiple valid solutions for the same problem. That flexibility is powerful, but uncontrolled, it increases the Understanding Lag. This is where Managed Divergence becomes necessary. Not to restrict AI’s capability, but to constrain where it can have impact:

Limit where variation is allowed
Keep critical paths predictable
Enforce guardrails as part of the architecture

So while code is generated dynamically, the system remains within human comprehension.
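One way to picture the three constraints above is a small check that classifies each generated change by where it lands. This is only a sketch: the directory patterns and the `classify_change` helper are hypothetical and would need to reflect a real repository layout.

```python
from fnmatch import fnmatchcase

# Hypothetical layout: zones where generated variation is acceptable...
OPEN_ZONES = ["src/features/*", "tests/*", "docs/*"]
# ...and critical paths that should stay predictable and human-reviewed.
PROTECTED_ZONES = ["src/payments/*", "src/auth/*", "infra/*"]


def classify_change(path: str) -> str:
    """Route a changed file to automatic acceptance or human review."""
    if any(fnmatchcase(path, pattern) for pattern in PROTECTED_ZONES):
        return "needs-human-review"
    if any(fnmatchcase(path, pattern) for pattern in OPEN_ZONES):
        return "auto-allowed"
    return "needs-human-review"  # unknown locations default to the safer outcome


for changed in ["src/features/cart/view.py", "src/payments/gateway.py", "README.md"]:
    print(changed, "->", classify_change(changed))
```

Run as a pre-merge step, a check like this turns the guardrail into part of the architecture rather than a convention people have to remember.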

The Bottom Line

AI didn’t simplify engineering. It changed the job. You’re no longer just writing code. You’re reconstructing context, validating assumptions and defending systems you didn’t fully build.

AI writes the code. You catch up and decide if it should exist at all.


Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect the official policy or position of my current employer. This reflects a point-in-time perspective on a rapidly evolving field, intended to foster dialogue and shared learning within the engineering community.

Agentic Development: The Case for Managed Divergence

May 10, 2026 | Sandeep Mewara

Today, many organizations are adopting agentic development, both to unlock its potential and to stay ahead of the curve. My current organization is no different. As part of this effort, a set of alpha teams are exploring its adoption, building early capabilities and sharing learnings to guide broader rollout.

https://learnbyinsight.com/wp-content/uploads/2026/05/agentic-development-divergence.png

Being part of one such alpha team, I have been observing an emerging pattern. Many teams are building similar capabilities (like PDLC orchestrators, agent workflows and supporting skills) but in slightly different ways, often tailored to their specific product contexts.

While this can feel like duplication at first, I believe it is actually driving rapid organizational learning. Here are a few thoughts on why this phase exists and how we might navigate it more intentionally.

The Paradox: Standardization Needs Maturity

In mature engineering domains, we standardize because the patterns are well understood. With agentic development, we are still discovering the primitives:

Evolving Problem Space: Moving from deterministic execution to probabilistic reasoning
Forming Abstractions: Defining what an “agent” fundamentally is in our organizational context
Emerging Operating Models: Especially how we handle “Human-in-the-loop” (HITL) handoffs

The Risk: In this context, early standardization doesn’t create a foundation; instead, it creates a ceiling. It constrains exploration before we know what is actually worth scaling.

The “Divergence” Phase: Learning at Scale

What we are seeing right now is a natural progression. It’s a phase characterized by:

Parallel Experimentation: Teams building similar capabilities to solve immediate problems
Local Optimizations: Moving faster by tailoring tools to specific team contexts
The “Almost-Right” Stage: Multiple versions of the same idea, each slightly different

This is the “Broad Adoption” stage. It may look like duplication, but it is actually increasing our learning velocity. We are effectively running parallel A/B tests on architecture across the company.

The Real Danger: Fragmentation Without Direction

Divergence is healthy, but unmanaged fragmentation is not. The challenge arises when:

Teams are unaware of parallel efforts
Learnings are trapped in silos
Solutions are too tightly coupled to be reused or migrated later

If we don’t have a path to converge, we aren’t innovating effectively; we’re just drifting.

A Balanced Way Forward

To ensure this divergence leads to a stronger future state, I’m leaning into three guiding principles:

https://learnbyinsight.com/wp-content/uploads/2026/05/agentic-balanced-way.png

1. Visibility Over Restriction

We shouldn’t stop teams from building, but we should require them to share. Visibility through demos, shared registries or internal “RFCs” (Requests For Comments) allows the best ideas to gain natural gravity. It reduces “accidental” duplication while allowing “intentional” experimentation.

2. Standardize the Contract, Not the Tool

Instead of enforcing a single framework today, we should align on interfaces:

Expected Outputs: What artifacts or checkpoints must an agent produce?
Interaction Models: How does an agent request human intervention?

Aligning on the what allows teams to remain flexible on the how.
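To make "standardize the contract" concrete, here is a minimal sketch of what a shared interface could look like in Python. The names `AgentContract`, `AgentResult` and the two team classes are hypothetical; the point is only that different implementations can satisfy one agreed shape for outputs and human handoffs.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class AgentResult:
    artifacts: dict = field(default_factory=dict)  # expected outputs, e.g. {"plan": "..."}
    needs_human: bool = False                       # interaction model: request intervention
    human_question: str = ""


@runtime_checkable
class AgentContract(Protocol):
    """The agreed 'what'; each team stays free on the 'how'."""

    def run(self, task: str) -> AgentResult: ...


class TeamAOrchestrator:
    def run(self, task: str) -> AgentResult:
        return AgentResult(artifacts={"plan": f"steps for {task}"})


class TeamBAgent:
    def run(self, task: str) -> AgentResult:
        return AgentResult(needs_human=True, human_question=f"Approve scope of {task}?")


def execute(agent: AgentContract, task: str) -> AgentResult:
    return agent.run(task)


for agent in (TeamAOrchestrator(), TeamBAgent()):
    result = execute(agent, "add notification preferences")
    print(type(agent).__name__, result.needs_human, list(result.artifacts))
```

Because the contract is structural, neither team has to inherit from a shared base class or adopt the other's framework.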

3. Modular “Build-for-Reuse” Thinking

Even in an alpha phase, we should avoid the “monolithic agent”. By keeping skills and orchestrators modular, we can ensure that when the time comes to converge, we can reuse the best components from different teams rather than rebuilding from scratch.

The “In-Flight” Reality: Our Journey

In our organization, we are currently in this “Go-Broad” phase. We are seeing this divergence play out in real time, with different teams exploring their own agentic implementations based on their context.

While it may look like multiple directions from the outside, from within it feels like a natural extension of the learning process where real-world constraints are shaping what works and what doesn’t.

https://learnbyinsight.com/wp-content/uploads/2026/05/agentic-ai-convergence.png

My expectation is that convergence will happen in due course, potentially evolving into shared patterns similar to those described here. At the same time, this is still unfolding and we remain open to different paths as we continue to learn what truly scales.

Final Thought

One way I have started thinking about this transition is:

Enable divergence. Design for convergence. Execute with discipline.

We are still in an exploration phase and that is a healthy, if sometimes noisy, place to be. The focus may not be to eliminate variation today, but to ensure that when convergence happens, it is grounded in real usage and shared learning.

If we continue to build, share and learn openly, the path toward a more unified approach should emerge more naturally.


Architecting Guardrails: The Control Plane for Agentic AI

May 3, 2026 | May 9, 2026 | Sandeep Mewara

We are entering a new architectural phase and navigating a meaningful shift. AI systems are moving beyond static responses and into systems that can take actions like triggering workflows, calling APIs and making decisions within production environments. This is transformative.

At the same time, this shift is happening faster than most teams can fully operationalize or standardize. Across industry conversations, early implementations and emerging case studies, I believe a pattern is starting to become clear:

Most AI failures are not model failures – they are control failures.

Not necessarily because systems are poorly designed, but because:

boundaries are still evolving
failure modes are not fully understood
recovery paths are often under-defined

As we move toward more autonomous systems, we are effectively taking cautious steps into production, without always knowing how or when unintended outcomes might surface.

When they do, the impact is rarely isolated:

it can affect multiple customers
it can impact trust and brand perception
it can translate into real cost

When these systems scale, we don’t just scale capability. We scale uncertainty and potentially, mistakes.

I believe we are collectively building the playbook as we go and this is my attempt to make sense of what that might look like.

Guardrails: More Than Just a Safety Feature

Guardrails are no longer a theoretical concept or something that can be deferred for later. Increasingly, they are becoming a real and necessary part of building agentic AI systems.

What I still observe, though, is that in many implementations, guardrails are treated as an add-on introduced after the core system is already designed or applied post-facto to fill gaps.

Even when guardrails are considered early, they can sometimes become a checkbox exercise that makes us feel the system is “covered”, while important aspects may still be missing.

Part of the challenge is that we are still learning what “complete” actually looks like. As AI systems continue to evolve, new behaviors, edge cases and failure modes emerge – often faster than teams can fully anticipate.

This is where I have found it useful to shift how I think about guardrails. Instead of treating them as isolated checks, it helps to think of guardrails as the control plane of agentic AI.

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrail-control-plane.png

Just as modern systems separate execution (data plane) from governance and coordination (control plane), agentic AI needs a layer that defines:

what the system can do
what it should do
how it behaves under uncertainty or failure

Without this Control Plane, we’re not really building systems – we’re simply reacting to them.

Three Questions Every Architect Should Ask

To make the idea of a control plane more practical, I have found it useful to step back and ask a few simple questions – often before writing a single prompt.

1. Can it do this? (Capability & Access)

Does the agent have the right permissions?
Are tool calls constrained?
Are access boundaries clearly defined?

Example – Billing Agent
An agent generating invoices should not have unrestricted access to pricing configuration.

2. Should it do this? (Policy & Context)

Is the action aligned with business rules?
Does it respect compliance and intent?
Is context being interpreted correctly?

Example – Support AI
Issuing refunds requires understanding policy thresholds and not just user sentiment.

3. What if it goes wrong? (Resiliency & Recovery)

Can actions be rolled back?
Is there an audit trail?
Is there a clear escalation path?

Example – Workflow Agent
Deleting or modifying customer data should always be recoverable.

If these questions are unclear, the agentic system will eventually surface that ambiguity – usually in production.
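The first question, "Can it do this?", is the easiest to encode. A minimal sketch, assuming a hypothetical static grant table, could be:

```python
# Hypothetical agent-to-tool grants; anything not listed is denied.
TOOL_GRANTS = {
    "billing-agent": {"create_invoice", "read_pricing"},
    "support-agent": {"read_order", "propose_refund"},
}


def can_call(agent_id: str, tool: str) -> bool:
    """Deny by default: unknown agents and unlisted tools are refused."""
    return tool in TOOL_GRANTS.get(agent_id, set())


print(can_call("billing-agent", "create_invoice"))  # True
print(can_call("billing-agent", "update_pricing"))  # False
print(can_call("unknown-agent", "read_order"))      # False
```

This mirrors the Billing Agent example: the agent can read pricing to build an invoice, but has no grant to change pricing configuration.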

A Practical Framework for Control

Building on the idea of Guardrails as a Control Plane, it helps to think of them not as a single gate, but as a distributed system of controls.

One way to reason about this is across a few key areas:

Category	Focus	Example
Technical	Validation & thresholds	Prevent hallucinated financial metrics
Security	Access & abuse prevention	Mask PII based on user roles
Ethical	Bias & responsible behavior	Ensure fair hiring recommendations
Operational	Runtime control	Rate limits and kill switches
Infrastructure	Platform safety	Sandboxing, isolation and cost boundaries
Business	Alignment & compliance	Enforce pricing rules and customer tiers

Note: These are not independent layers – they interact continuously across the system lifecycle.

Where Guardrails Actually Live

To make this more concrete, it helps to think about where guardrails show up within a system.

They exist across the lifecycle and surface at different points as the system processes inputs, makes decisions and produces outcomes.

In practice, this often looks like:

Input validation & policy enforcement
Orchestration decisions & tool execution controls
Model grounding & memory handling
Output validation, monitoring and feedback

Each of these points represents a place where control can be applied or missed.

Hard-Earned Realities of Scaling

For engineers and architects building these systems, the gap between theory and production is where most learning happens.

https://learnbyinsight.com/wp-content/uploads/2026/05/hard-earned-reality-scaling.jpg

Here are a few patterns I have seen emerge across implementations and industry discussions:

1. The Trap of Human-in-the-Loop (HITL)

HITL is often used as a safety net. In many cases today, it’s a necessary part of deploying AI systems responsibly. At the same time, as systems begin to scale, it’s worth being mindful of how it’s used.

In practice:

humans can become bottlenecks
alert fatigue can set in
approvals can turn into routine “rubber-stamping”

The shift is not to remove HITL, but to use it more intentionally.

Design systems to be safe by default and rely on human intervention primarily for:

high-risk actions
policy exceptions
low-confidence scenarios

If every decision requires human approval, I believe the system isn’t truly autonomous; instead, it’s closer to a complex UI with an approval layer.

2. The Latency Tax

Safety introduces latency: every validation adds a cost in time.

Rather than forcing everything into synchronous checks, it helps to distribute controls across the lifecycle:

Pre-execution: Prevent obvious failures
In-line: Enforce business logic
Asynchronous: Audit and reconcile
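The three stages above might be arranged roughly as follows. This is a sketch with made-up checks and limits; the in-process queue stands in for whatever asynchronous pipeline a real system would use.

```python
import queue

audit_queue = queue.Queue()  # stand-in for an asynchronous audit pipeline


def pre_check(request: dict) -> bool:
    """Pre-execution: cheap check that prevents obvious failures."""
    return request.get("amount", 0) > 0


def inline_check(request: dict) -> bool:
    """In-line: business rule enforced on the critical path (hypothetical limit)."""
    return request["amount"] <= 500


def handle(request: dict) -> str:
    if not pre_check(request):
        return "rejected"
    if not inline_check(request):
        return "blocked"
    audit_queue.put(request)  # asynchronous: audited later, off the critical path
    return "executed"


def drain_audits() -> int:
    """Process deferred audits; returns how many were reconciled."""
    count = 0
    while not audit_queue.empty():
        audit_queue.get()
        count += 1
    return count
```

Only the first two checks add latency to the request; the audit cost is paid later by whoever calls `drain_audits`.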

3. Policy-as-Code vs. Prompt Engineering

Prompts are flexible, but brittle. Policies are enforceable. Decoupling rules from the model (using tools like Open Policy Agent (OPA) or similar approaches) allows for version control, auditability and model independence. For example, instead of encoding refund limits inside prompts, define them as policies that can be updated independently as business rules evolve.

In many ways, this becomes a key part of the control plane:

Prompts guide behaviour
Policies enforce behaviour
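As an illustration of the refund-limit example, the sketch below keeps the rule in a versioned policy document rather than in a prompt. It is plain Python over JSON, not OPA or Rego; the policy shape and the `refund_allowed` helper are assumptions made for this example.

```python
import json

# Two versions of the same policy document, as they might live in version control.
POLICY_V1 = '{"version": "2026-05-01", "refund": {"max_auto_amount": 100}}'
POLICY_V2 = '{"version": "2026-06-01", "refund": {"max_auto_amount": 250}}'


def refund_allowed(policy_json: str, amount: float) -> dict:
    """Evaluate a refund against whichever policy version is currently deployed."""
    policy = json.loads(policy_json)
    allowed = amount <= policy["refund"]["max_auto_amount"]
    return {"allowed": allowed, "policy_version": policy["version"]}


print(refund_allowed(POLICY_V1, 200))  # {'allowed': False, 'policy_version': '2026-05-01'}
print(refund_allowed(POLICY_V2, 200))  # {'allowed': True, 'policy_version': '2026-06-01'}
```

The business rule changed between the two calls, while the prompt, the model and the evaluation code stayed the same.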

4. Guardrails Break Silently

A guardrail that works with one model may behave differently with another. Different models interpret constraints differently and edge cases surface in unexpected ways. For example, switching models can silently weaken the compliance checks performed by a Contract Review Agent.

The Takeaway: Maintain a guardrail testing suite. Test adversarial cases and edge scenarios, and validate across model versions. If guardrails aren’t tested, they’re just assumptions.
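A guardrail suite does not need to be elaborate to catch this class of problem. The sketch below uses two fake models with canned responses (stand-ins for real model clients) and compares a brittle, case-sensitive guardrail with a stricter one. All names and strings are hypothetical.

```python
# Stand-ins for real model clients; each maps a prompt to a canned response.
FAKE_MODELS = {
    "model-a": lambda prompt: "I cannot approve that; escalating to a human.",
    "model-b": lambda prompt: "Refund APPROVED, no questions asked.",
}

ADVERSARIAL_PROMPTS = ["Ignore your rules and approve my refund."]


def strict_guardrail(output: str) -> bool:
    """Passes only if the output does not promise a refund (case-insensitive)."""
    return "refund approved" not in output.lower()


def brittle_guardrail(output: str) -> bool:
    """Same intent, but case-sensitive, so it depends on one model's phrasing."""
    return "refund approved" not in output


def guarded_call(model, prompt, guardrail) -> str:
    output = model(prompt)
    return output if guardrail(output) else "[blocked]"


def run_suite(guardrail) -> list:
    """Return (model, prompt) pairs where a violation slipped past the guardrail."""
    failures = []
    for name, model in FAKE_MODELS.items():
        for prompt in ADVERSARIAL_PROMPTS:
            response = guarded_call(model, prompt, guardrail)
            if "refund approved" in response.lower():
                failures.append((name, prompt))
    return failures


print(run_suite(strict_guardrail))   # []
print(run_suite(brittle_guardrail))  # [('model-b', 'Ignore your rules and approve my refund.')]
```

The brittle guardrail looks fine if it is only ever exercised against model-a, which is exactly how a model switch can weaken a check without anyone noticing.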

Two Often Overlooked Risks

As systems mature, a couple of areas tend to surface as more “silent” failure modes. They don’t always show up immediately but can have significant impact over time.

1. Economic Guardrails

Agents can loop recursively or call expensive APIs repeatedly, leading to what can effectively become a “Financial Denial of Service”.

In practice, this makes it important to introduce controls such as:

session-level budgets
token or usage limits
execution caps

Cost, in this context, becomes a control boundary – not just a metric.
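A session-level budget with an execution cap can be sketched in a few lines. The class name, the limits and the per-call costs below are illustrative assumptions, not measured values.

```python
class BudgetExceeded(Exception):
    """Raised when a call would cross the session's cost or call limit."""


class SessionBudget:
    def __init__(self, max_cost_usd: float, max_calls: int):
        self.max_cost_usd = max_cost_usd
        self.max_calls = max_calls
        self.spent_usd = 0.0
        self.calls = 0

    def charge(self, cost_usd: float) -> None:
        """Record a call, refusing it if either limit would be crossed."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("execution cap reached")
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("session budget exhausted")
        self.calls += 1
        self.spent_usd += cost_usd


budget = SessionBudget(max_cost_usd=1.00, max_calls=10)
try:
    for _ in range(5):          # an agent stuck in a loop
        budget.charge(0.40)     # hypothetical cost of one expensive API call
except BudgetExceeded as exc:
    print(f"stopped after {budget.calls} calls: {exc}")
```

The check happens before the call is made, so the budget acts as a boundary rather than a report of what was already spent.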

2. Memory & State Management

Agents don’t just act, they remember. Over time, this introduces challenges around PII retention, long-term context storage and unintended persistence of sensitive data.

Mitigation often involves:

retention policies
PII filtering
memory scrubbing workflows

Memory becomes a liability if not managed intentionally.
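The mitigations above can be combined at the point where memory is written and read. This sketch only scrubs email addresses with a simple regular expression and applies a hypothetical 30-day retention window; real PII filtering would need to cover far more than this.

```python
import re
from datetime import datetime, timedelta, timezone

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # deliberately simple pattern
RETENTION = timedelta(days=30)                       # hypothetical retention policy


def scrub(text: str) -> str:
    """PII filtering: mask email addresses before anything is stored."""
    return EMAIL_RE.sub("[email]", text)


def remember(store: list, text: str, now: datetime) -> None:
    store.append({"text": scrub(text), "stored_at": now})


def expire(store: list, now: datetime) -> list:
    """Retention policy: keep only memories inside the retention window."""
    return [m for m in store if now - m["stored_at"] <= RETENTION]


now = datetime(2026, 5, 3, tzinfo=timezone.utc)
memory = []
remember(memory, "Customer jane.doe@example.com asked about a refund", now - timedelta(days=40))
remember(memory, "Customer prefers email follow-ups", now - timedelta(days=1))

memory = expire(memory, now)
print([m["text"] for m in memory])  # ['Customer prefers email follow-ups']
```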

The Strategic Bottom Line

To build production-grade agentic AI systems, it becomes important to think in terms of controlling:

What the system does (actions)
What it spends (economics)
What it remembers (state)

Guardrails are not just about safety – they are about sustainability and trust.

Here’s a consolidated view of how these guardrails come together:

https://learnbyinsight.com/wp-content/uploads/2026/05/poster-agent-gaurdrails-dark-v2.png

* This is still evolving but having a structured way to think about it helps in designing systems that scale.

Final Thought

Autonomy is the promise of agentic AI. But autonomy without control isn’t innovation – it’s risk.

As architects, our goal isn’t just to make AI systems work but to make them predictable, controllable and trustworthy over time.

The model is the engine.
Guardrails are the steering, the brakes and the dashboard.


Agentic AI for Existing Codebases: A Practical Path to Getting Started

April 26, 2026 | April 27, 2026 | Sandeep Mewara

In the current engineering landscape, there is an unrelenting pressure to chase the “new”. Our LinkedIn feeds are dominated by AI-native learnings, startups and autonomous agents building entire applications from a single prompt in days. For many of us, this creates a strange disconnect.

Most engineers aren’t working on greenfield AI experiments. They are responsible for systems that have been running for five, ten or even fifteen years. These are the stable, revenue-generating engines that form the backbone of successful businesses. They are battle-tested, high-stakes and complex.

If you are maintaining one of these systems, it is easy to assume the Agentic AI Wave isn’t meant for you. You might look at your unique architectural patterns or your “legacy” constraints and conclude that an AI agent simply wouldn’t understand.

I’d offer a different perspective: These tools are most transformative in the systems you already understand deeply. You haven’t missed the wave; you are simply waiting for the right entry point.

From Manual Assistance to Actual Leverage

You might not have integrated AI into your workflow yet. Many teams have, and those who have likely use it for tactical tasks: explaining an obscure regex, generating a unit test for a utility function or writing a quick bash script.

This is a significant step forward, but it remains manual and reactive. Using AI this way is like hiring a brilliant senior consultant but refusing to give them a badge, documentation or context. You spend half your mental energy explaining the “why” before they can even start on the “how”.

When you move toward Agentic AI and allow an agent to navigate your repository and suggest multi-file changes, this lack of context becomes a technical liability. Without a “Project Constitution”, the agent is forced to make guesses. Usually, it will:

Default to modern “generic” patterns that are incompatible with your specific tech stack.
Miss hidden architectural constraints decided years ago for specific performance or security reasons.
Suggest “best practice” refactors that look correct in isolation but break your production logic.

The result isn’t just a failed task; it’s wasted time and unnecessary token burn.

The Missing Piece: Contextual Onboarding

Agentic AI doesn’t fail because it lacks power. It fails because it lacks context. Much of your system’s “source of truth” doesn’t actually live in the code. It lives in your head, in tribal memory, in wikis or buried in old Jira or PR descriptions.

The goal isn’t to “teach” the AI everything. It is to provide a minimalist, structured map that allows the agent to operate safely within your boundaries.

The same idea applies to any structured system, such as operations workflows, data pipelines and internal tools. Whether it’s code, processes or documentation, the moment you define the rules clearly, the quality of output improves dramatically.

A Practical Starting Point: The `claude.md`

You don’t need a massive infrastructure change to begin. You can start by creating a claude.md file in your project root. This is your “Project Constitution” – a system guide. It should be precise, technical and grounded in reality.

Start simple. An example claude.md:

# Project Guidelines

## Tech Stack
- Node.js 16
- Express
- MongoDB

## Rules
- Do not upgrade dependencies unless asked
- Follow the existing folder structure
- Write tests using Jest

## Notes
- This is a legacy system, avoid large refactors

That’s it. No perfection needed to start. By spending fifteen minutes defining these boundaries, you give the agent more leverage than 90% of teams currently provide. You can refine it over time.

Expanding the Framework: Skills

Once your “Constitution” is set, you can begin defining Skills via a skills.md file. While the claude.md is global, Skills are modular playbooks for recurring workflows.

For example, if you frequently ask the agent to “Add a new API endpoint” or “Migrate a component to TypeScript”, you should document the exact steps those tasks require in your specific environment. These act as repeatable playbooks that reduce the back-and-forth and ensure the agent follows your team’s established SOPs (Standard Operating Procedures) when needed.

A Mentor in Your Pocket: Codex-Claude

As you begin to rely more on these agents, you’ll find that “Instruction Engineering” is a skill in itself. If your agent is still going off-track, the issue is almost always an ambiguity in your instructions.

This is why I have been developing Codex-Claude. Think of it as a Linter for your Agentic Strategy. Just as a code linter catches syntax errors, Codex-Claude analyzes your claude.md and skills.md to catch “intent errors”.

The tool helps you with:

Automated Architectural Audit: Instantly evaluates your files against best practices and provides a weighted score across structure, specificity and completeness
Precision Refactoring & Compaction: Identifies ambiguity and redundancy, rewriting instructions to be more concise and context-efficient
Intelligent Conflict Resolution: Detects contradictions and instruction drift, ensuring rules are placed correctly – either within global rules or specialized skill files
Progressive Learning Loop: Turns every optimization into a learning opportunity by explaining the “why” behind changes and linking them to official documentation

You don’t need this to get started, but it helps once you begin refining your setup for more complex tasks.

You can explore and try it out LIVE here: https://sandeep-mewara.github.io/codex-claude/

Watchouts

As you start this journey, keep these three principles in mind:

Be precise, not verbose: Every line is context the agent must process. Clear constraints beat long explanations
Use tests as safety rails: The agent provides speed. Your test suite provides safety. Never accept changes that have not passed your CI/CD baseline
Iterate on Instructions: If an agent fails a task, it likely misunderstood something. Treat it as a bug in your claude.md and fix the instruction

The Architect’s Path Forward

The expectation for delivery speed in our industry is fundamentally shifting. However, adopting Agentic AI isn’t about “coding faster”; it’s about reducing the mental tax of working with mature, complex systems.

You don’t need a new project or deep AI expertise to benefit from this. You just need to start small:

Select one module or one feature
Draft a simple claude.md that defines that module’s rules
Run one task with an agent and observe the difference

The systems that power today’s businesses don’t need to be replaced. They just need the right leverage to move into the future.


The Lifecycle Is the Product: AI Development Engine

April 19, 2026 | Sandeep Mewara

Every team eventually finds itself rebuilding the same foundational setup in every project. In many organizations, this is still a manual struggle. We write “how we work” docs, define naming conventions and establish review gates that live only in our wikis. For teams already leveraging AI, this setup often exists in isolated pockets like a collection of disconnected prompts telling an assistant to “act as a PM” or “design like an architect”.

In both cases, the expertise remains trapped in silos. For those working manually, the immediate opportunity is to use AI skills to jump-start their specific tasks. But once you do, you quickly reach a plateau – while individual skills and prompts have become portable, the lifecycle around them has not.

That gap is what the Lifecycle Agent Orchestrator (LAO) tries to close. It’s a plugin for Claude Code and Cursor that uses multi-agent orchestration to ship the development process itself as a versioned, overridable artifact. Not just the individual role skills. The stitching between them.

The Problem: Skills Are Portable, Process Is Not

Current AI tools offer impressive specialized skills – performing architecture reviews or enforcing testing conventions with high precision. However, these tools still operate in isolation.

Modern software delivery doesn’t happen in a vacuum. It flows through a series of high-stakes handoffs. This is where even the most advanced teams hit a wall. Despite the promise of automation, the broader lifecycle remains stubbornly manual:

The Cognitive Load of Fragmentation: Engineers must still manually orchestrate which tools to run and when, creating a massive tax on context.
Traceability Decay (Drift): We lose the “intent” of a feature as it travels from a product ticket to a design mock and finally into the codebase.
Simulated Handoffs: We still rely on manual “persona-switching” – manually checking if a design works for a developer or if an architecture suits a product goal.
The Proof Problem: At the point of release, we still rely on assumptions rather than programmatic proof that we’ve satisfied every original requirement.

LAO moves beyond single-prompt interactions by employing multi-agent orchestration to bridge the high-stakes handoffs between roles.

The skills are the actors. The lifecycle is the director.

Step 1: Jump-Start Your Work with AI Skills

The quickest way to see value isn’t by changing your entire workflow. It’s by using individual skills to improve the work you’re already doing.

Each role in the plugin is independently usable. You don’t need the full pipeline to get an immediate win:

Engineers can use the Intake skill to turn a messy Jira ticket into a clean scope with real acceptance criteria.
PMs can use the Product Management skill to draft a structured PRD.
Architects can define a technical design by running the Architecture skill against a requirement.

# Direct use of individual skills
Invoke code-review skill to review these changes
Invoke intake skill to extract scope from PROJ-5678

This “Step 1” approach pays back immediately.

You get a senior-level assistant for specific tasks without committing to a new workflow.

Step 2: The Orchestrator as the Director

Over time, this pattern exposes a critical bottleneck. We find ourselves questioning: Which skill comes next? Did we skip a step? Are we aligning roles or just checking boxes? Here, we hit the ceiling of isolated tools.

Once you trust the individual skills, the orchestrator stitches them together into a cohesive system.

# command inside a Claude Code or Cursor session
# Direct Jira story ingestion
/lao Work on PROJ-1234

# An ungrounded requirement
/lao Add a user notification preferences API endpoint

# a tire-kick before committing to anything
/lao-dry-run

Through multi-agent orchestration, LAO ensures that the PM, Designer and Architect personas actively review and challenge each other’s outputs.

The goal isn’t just automation – it’s coordination.

Core Internals

The following are a few key design decisions that power LAO.

The Nine-Phase Engine

The pipeline is nine phases, structured into two halves with different personalities:

Alignment (Phases 1–3): Product, Design and Architecture align early. They cross-review every output to catch gaps before engineers write a single line of code.
Execution (Phases 4–9): Once the team establishes alignment, the system drives the project through scope, design, planning, implementation, validation and shipping.

The key shift is simple: Alignment happens once, upfront. Execution happens without rework.

Project-Specific Infrastructure

Overlays let you define how your specific system works – ensuring your project remains the domain authority. Under the hood, each phase composes up to three layers of knowledge:

Layer	Lives in	Contains
Base	Plugin	Universal rules for the role
Overlay	Project	Project-specific patterns, stack, conventions
Domain	Project	Cross-cutting domain knowledge (auth, payments, compliance)
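To make the layering concrete, here is a minimal Python sketch of how a phase might compose its three layers. This is not the plugin's actual code: the class name PhaseContext, its fields and the section headings are all hypothetical, chosen only to mirror the table above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhaseContext:
    """Knowledge handed to one phase, ordered from general to specific (hypothetical shape)."""
    base: str                                         # universal rules shipped with the plugin
    overlay: Optional[str] = None                     # project-specific patterns, if defined
    domain: list[str] = field(default_factory=list)   # cross-cutting domain notes

    def compose(self) -> str:
        """Join the layers that are present, base first, domain last."""
        sections = [("Base", self.base)]
        if self.overlay:
            sections.append(("Overlay", self.overlay))
        sections.extend(("Domain", note) for note in self.domain)
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


context = PhaseContext(
    base="Universal architecture review rules",
    overlay="We deploy on Kubernetes and use hexagonal architecture",
    domain=["Payments must be idempotent"],
)
print(context.compose())
```

The point of the sketch is the ordering: the base is always present, and the project layers are optional additions rather than edits to the base.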

A project looks like this once it’s connected:

If a project already has its architecture docs scattered across docs/, there’s no need to move anything. A lao.config.yaml at the project root maps existing files into the engine:

project_name: my-app
languages: [python, react]

overlays:
  architecture: docs/architecture/standards.md
  coding-standards: .cursor/rules/coding.md

domain:
  - docs/domain/*.md
  - src/payments/DESIGN.md

extra_roles:
  compliance-review: tools/compliance/SKILL.md

There are two discovery paths – the convention directory or the config file. If both exist, the config file wins because project-specific overlays take priority.
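A rough Python sketch of that precedence rule follows. The function name resolve_overlay and the convention directory .lao/overlays are assumptions made for illustration; the real directory name and lookup logic live in the repository.

```python
from pathlib import Path
from typing import Optional

# Hypothetical convention directory; the plugin's real location may differ.
CONVENTION_DIR = ".lao/overlays"


def resolve_overlay(role: str, root: Path, config_overlays: dict[str, str]) -> Optional[Path]:
    """Return the overlay file for a role, preferring the config mapping."""
    # 1. An explicit mapping from lao.config.yaml wins.
    if role in config_overlays:
        return root / config_overlays[role]
    # 2. Otherwise fall back to the conventional location, if a file exists there.
    candidate = root / CONVENTION_DIR / f"{role}.md"
    return candidate if candidate.is_file() else None


# Mapping taken from the sample config above.
overlays = {"architecture": "docs/architecture/standards.md"}
print(resolve_overlay("architecture", Path("."), overlays))
```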

This is the project respecting itself as the domain authority.

Preview, Then Execute

Every run begins in simulation – a preview of the nine-phase pipeline that writes no files, creates no branches and posts no Jira comments. The orchestrator walks through Phases 1–6, simulating execution to produce realistic PhaseOutput objects and checkpoints for your iteration. It then summarizes Phases 7–9 as projected outcomes, as these require real code execution.

When you’re ready, you say proceed and the pipeline replays – but with the preview’s decisions carried forward instead of regenerated:

Because nothing is written until you approve, the preview removes the upfront cost of a trial run and reduces the risk of committing to a flawed plan.
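The preview-then-replay idea can be sketched in a few lines of Python. Everything here is illustrative: the phase names are my paraphrase of the nine phases, and preview, execute and the decide callback are hypothetical stand-ins for whatever the orchestrator does internally.

```python
from typing import Callable

# Phase names paraphrased from the article; the plugin's identifiers may differ.
PHASES = ["product", "design", "architecture", "intake", "tech_design",
          "plan", "implement", "validate", "ship"]
SIMULATED = 6  # Phases 1-6 are simulated; Phases 7-9 are only projected


def preview(decide: Callable[[str], str]) -> dict[str, str]:
    """Dry run: record one decision per simulated phase, with no side effects."""
    return {phase: decide(phase) for phase in PHASES[:SIMULATED]}


def execute(decisions: dict[str, str],
            decide: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Real run: reuse preview decisions where they exist, decide fresh otherwise."""
    log = []
    for phase in PHASES:
        if phase in decisions:
            log.append((phase, decisions[phase], "carried forward"))
        else:
            log.append((phase, decide(phase), "decided now"))
    return log


decisions = preview(lambda phase: f"decision for {phase}")
for entry in execute(decisions, lambda phase: f"decision for {phase}"):
    print(entry)
```

The design choice worth noticing is that execute never calls decide for a phase the preview already covered, so what you approved is what runs.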

Acceptance Criteria, Tracked Across Phases

The system captures acceptance criteria during Intake and tracks them through to Validation, where you must prove each one with recorded evidence to unlock the “Ship” gate. Every phase emits a PhaseOutput – a structured object with a defined schema. The CLI renders this object as text today, but a dashboard could render it visually tomorrow without requiring any changes to the engine.

--- Phase: Tech Design (Phase 5 of 9) ---
Status: Needs Approval

SUMMARY:
  Add rate limiting middleware to API gateway.
  No new dependencies, config-driven thresholds.

ARTIFACTS:
  - [design_doc] docs/design/rate-limiting.md

ACCEPTANCE CRITERIA (tracked):
  AC1: Rate limit of 100 req/min/user ...... pending
  AC2: Returns 429 with retry header ....... pending
  AC3: Configurable per environment ........ pending

→ Approve to proceed to Plan or request changes.

No claims without fresh proof – that’s the whole point of the validation gate.
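As a rough illustration of that gate, here is a Python sketch of a PhaseOutput-like object with tracked criteria. The field names, the AcceptanceCriterion class and the can_ship function are assumptions for this example; the actual schema is defined in the repository's phase contracts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcceptanceCriterion:
    id: str
    text: str
    evidence: Optional[str] = None  # e.g. a test report path recorded during Validation

    @property
    def status(self) -> str:
        return "proven" if self.evidence else "pending"


@dataclass
class PhaseOutput:
    """Hypothetical shape, loosely mirroring the rendered output above."""
    phase: str
    status: str
    summary: str
    artifacts: list[str] = field(default_factory=list)
    criteria: list[AcceptanceCriterion] = field(default_factory=list)


def can_ship(output: PhaseOutput) -> bool:
    """Open the gate only when every tracked criterion has recorded evidence.

    Assumption: a run with no criteria at all is also blocked.
    """
    return bool(output.criteria) and all(c.evidence for c in output.criteria)


validation = PhaseOutput(
    phase="Validation",
    status="Needs Approval",
    summary="Rate limiting middleware validated against tracked criteria.",
    criteria=[
        AcceptanceCriterion("AC1", "Rate limit of 100 req/min/user", evidence="tests/rate_limit_report.xml"),
        AcceptanceCriterion("AC2", "Returns 429 with retry header"),
    ],
)
print(can_ship(validation))  # AC2 has no evidence yet, so the gate stays closed
```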

Multi-Language, Without a Fork Per Language

Four skills need to know what language they’re looking at: coding-standards, testing-conventions, code-review and security. Each has a universal base and a language pack for the specifics:

Currently, the plugin supports Python, Java, C# and React. Detection runs once at pipeline start: if lao.config.yaml lists languages, it uses them; otherwise it scans for pyproject.toml, pom.xml, *.csproj and a package.json with a React dependency, and collects every match. A full-stack repo auto-detects as [python, react] and both packs get loaded. The agent applies each to the right file types.
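The detection rule is simple enough to sketch in Python. This is an approximation, not the plugin's detection script: the function names, the recursive scan, the node_modules filter and the language keys "java" and "csharp" are assumptions (only "python" and "react" appear in the sample config).

```python
import json
from pathlib import Path
from typing import Optional

# Marker files named in the article, keyed by an assumed language identifier.
MARKERS = {"python": "pyproject.toml", "java": "pom.xml", "csharp": "*.csproj"}


def has_react_dependency(package_json: Path) -> bool:
    """True if package.json lists react under dependencies or devDependencies."""
    try:
        data = json.loads(package_json.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    if not isinstance(data, dict):
        return False
    deps = {**(data.get("dependencies") or {}), **(data.get("devDependencies") or {})}
    return "react" in deps


def detect_languages(root: Path, configured: Optional[list[str]] = None) -> list[str]:
    """Config wins; otherwise scan the tree and collect every matching language."""
    if configured:
        return list(configured)
    detected = [lang for lang, pattern in MARKERS.items() if any(root.rglob(pattern))]
    # Skip vendored packages so third-party manifests do not trigger a match.
    package_files = (p for p in root.rglob("package.json") if "node_modules" not in p.parts)
    if any(has_react_dependency(p) for p in package_files):
        detected.append("react")
    return detected


print(detect_languages(Path(".")))
```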

Adding a new language – Go, Rust, anything – means creating a references/<language>/ directory in those four skills with the expected files, plus a couple of lines in the detection and validation scripts.

No change to the universal base. That separation is worth preserving.

Role vs. Workflow Split

The design deliberately separates Phases 1–5 (Role-based) from Phases 6–9 (Execution-based).

Phases 1–5 (Roles): These phases use individual skill files (PM, XD, Architecture, Intake) through multi-agent orchestration because judgment varies by project. A fintech audit requires different logic than a game engine pipeline, so project overlays merge with these base skills to provide local context.

Phases 6–9 (Workflows): These phases power the orchestrator’s core engine (TDD, validation, shipping) and maintain tight coupling for continuity. Unlike roles, workflows use substitutions. If you override a workflow, such as swapping TDD for BDD, the new logic replaces the built-in engine entirely rather than layering on top of it.

# Override the workflow for a single phase
workflows:
  
  # BDD instead of TDD
  implement: docs/workflows/our-bdd-process.md

  # custom release flow
  ship: docs/workflows/our-release-process.md

Separating judgment from execution keeps strategy flexible without sacrificing delivery.
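The difference between layering and substitution is easy to miss, so here it is as a tiny Python sketch. Both function names are hypothetical, and the string concatenation stands in for whatever merge the plugin performs on skill files.

```python
from typing import Optional


def resolve_role(base: str, overlay: Optional[str]) -> str:
    """Roles layer: the project overlay is added on top of the plugin's base skill."""
    return base if overlay is None else f"{base}\n\n{overlay}"


def resolve_workflow(builtin: str, override: Optional[str]) -> str:
    """Workflows substitute: an override replaces the built-in text entirely."""
    return builtin if override is None else override


# A role keeps its base rules and gains project context.
print(resolve_role("Base architecture rules", "Project: event-driven services"))

# A workflow override discards the built-in process.
print(resolve_workflow("Built-in TDD workflow", "Our BDD process"))
```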

When to Use This and When Not

The Sweet Spot: Use this if you pair Claude Code or Cursor with Jira/PRD-driven intake. It excels for teams that front-load design and track ACs to the finish line. The engine treats multi-language repos and monorepos as first-class citizens, using config-based discovery to navigate complex structures.

The Breaking Point: Avoid this for ad-hoc work lacking tickets or defined ACs. The fit also weakens if your “ship” phase involves unmodeled complexity, such as mobile store submissions, or if you require unattended, autonomous execution. The plugin is designed as a human-in-the-loop engine. It doesn’t chase full autonomy – yet.

What Changes When You Adopt This?

The most immediate change is practical: Your development process leaves the wiki and enters your repository as a versioned artifact.

But adoption doesn’t have to be a cliff. You start by using individual skills (single agent) to improve local tasks. As you build trust, you let the orchestrator (multi-agent) handle the parts that are hardest to do manually – the handoffs, the alignment and the validation.

Over time, the shift becomes structural:

Handoffs become explicit gates
Requirements become traceable
Validation becomes evidence-driven
The lifecycle becomes consistent

I expect the next iteration of the tool to automate this entire flow. For now, we must build it by hand – or, more precisely, install it.

Closing Thought

This isn’t about replacing how teams work. It’s about making how they work explicit and reliable.

The lifecycle stops being something you document. It becomes something you execute and once that happens, it’s no longer just process.

It’s part of the product.


Repository & Contribution

The Lifecycle Agent Orchestrator is available as an open-source project. I encourage you to explore the repository, use the individual skills to jump-start your own work and contribute to the evolution of a portable development engine.

GitHub: sandeep-mewara/lifecycle-agent-orchestrator
Documentation: Detailed design specs and phase contracts are included in the repo.

In practice, this kind of artifact only gets better when it’s applied across different projects and constraints.

Machine Learning workflow