Building a Guardrail Control Plane for Agentic AI

In my earlier article on Architecting Guardrails: the Control Plane for Agentic AI, I explored why guardrails can no longer be treated as isolated validators sitting at the edge of an LLM workflow. As agents gain autonomy, guardrails increasingly become part of the system’s operational control plane itself.

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrail-control-plane-1.png

The Execution Gap

What that article intentionally did not explore in depth was the runtime architecture behind that idea because the real challenge begins after the model generates a response.

Most AI guardrails today still focus primarily on prompts and outputs:

  • Moderation APIs
  • Jailbreak filters
  • Output classifiers
  • Prompt hardening

That architecture made sense when models were passive generators. But autonomous agents do not simply generate text. They invoke tools, mutate state, persist memory, trigger workflows, coordinate infrastructure and operate across multiple execution boundaries. At that point, semantic safety alone becomes insufficient.

A production system can remain technically “safe” while still failing operationally:

  • An agent enters a recursive retry loop
  • Exceeds runtime budget limits
  • Escalates permissions unintentionally
  • Persists corrupted reasoning into memory
  • Triggers irreversible downstream actions

This is no longer a content moderation problem. It is a runtime systems governance problem.

Runtime Mediation

The core architectural shift is moving from edge filtering to runtime mediation.

Guardrails are not filters around the model. They are policy enforcement layers around behavior.

The model proposes intent. The control plane determines whether that intent is permissible within the current operational context. That distinction becomes critical in agentic systems because execution is no longer a single deterministic path.

The operational challenge is no longer just “What did the model say?” It becomes:

  • What did the agent attempt to do?
  • Under what authority?
  • Against which systems?
  • With what runtime constraints?
  • Under which policy version?
  • With what blast radius if wrong?

This is where traditional guardrail architectures begin to break down.

Traditional vs. Agentic Guardrails

Traditional GuardrailsAgentic Guardrails
Validate contentGovern execution
Static checksRuntime mediation
Prompt-centricAction-centric
Edge filteringDistributed enforcement
Single requestMulti-step orchestration

Decoupling Policy from the Workload

One of the most common mistakes in early agent deployments is embedding guardrails directly inside prompts, orchestration chains or tool wrappers. At small scale, this appears manageable. At production scale, it becomes operationally fragile.

A control plane embedded inside the workload eventually becomes invisible to governance.

Once policy becomes tightly coupled with agent reasoning, business rules drift across agents, enforcement becomes inconsistent, operational audits become fragmented and policy changes require redeploying probabilistic systems. More critically, if the reasoning path itself becomes compromised, the protections embedded within that reasoning path are compromised alongside it.

Modern distributed systems solved this problem years ago by externalizing governance into identity providers, policy engines, API gateways and service meshes. Agentic systems require the same separation:

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrail-decoupled-archi.png


The agent reasons. The infrastructure governs. That separation becomes the deterministic boundary around probabilistic execution.

The Guardrail Control Plane

A production-grade guardrail system is not a single validator sitting at the edge of the model. It is a layered runtime mediation architecture intercepting execution decisions throughout the agent lifecycle.

The goal is not to “block bad outputs”. The goal is to continuously govern autonomous execution.

Layer 1: Identity and Request Policy

Agents should inherit constrained authority, not implicit trust. One of the fastest ways to destabilize an agentic system is giving agents broad infrastructure permissions through generic service accounts. Most production failures begin with over-scoped execution authority.

The control plane must continuously mediate scoped identities, tenant isolation and user-bound execution contexts. The operational principle is simple: the agent should never possess more authority than the initiating user or workflow context.

def enforce_identity_policy(session_context, proposed_action):
    permitted_tools = identity_registry.get_tools_for_role(
        session_context.user_role
    )

    if proposed_action.tool_name not in permitted_tools:
        raise SecurityBoundaryException("Unauthorized tool access attempt.")

    proposed_action.context.auth_token = (
        session_context.impersonation_token
    )

The important detail is not the implementation itself. It is the mediation boundary. The agent does not directly decide what it is allowed to execute. Infrastructure policy does.

Layer 2: Planning Constraints

Planning without constraints becomes speculative execution. Traditional software systems operate through deterministic execution paths. Agentic systems dynamically generate execution topology at runtime.

Left unconstrained, agents tend to produce recursive loops, cyclic dependencies, retry amplification, unstable orchestration chains and excessive planning depth.

One of the more subtle realities of production agent systems is that failures rarely appear catastrophic initially. They resemble ordinary infrastructure anomalies: elevated retries, abnormal tool sequencing, execution fan-out or accelerating token usage. By the time the final output visibly appears incorrect, the operational deviation has often already propagated several layers into the system.

The control plane must therefore mediate orchestration before infrastructure resources are committed.

def validate_planning_topology(execution_graph, current_depth):
    MAX_DEPTH = 8

    if current_depth > MAX_DEPTH:
        raise LoopDetectedException("Maximum orchestration graph depth breached.")

    if contains_cyclic_dependencies(execution_graph):
        raise InvalidPlanException("Cyclic loop detected in generated plan topology.")

Exception handling assumes known failure paths. Agentic systems generate failure paths dynamically.

Layer 3: Runtime Enforcement

Most production failures are economic before they are semantic. While security teams focus on prompt injection, infrastructure teams watch token consumption graphs turn vertical.

Autonomous agents introduce entirely new operational failure modes: retry storms, recursive execution amplification, cascading tool failures, uncontrolled token burn and asynchronous fan-out explosions. Without hard operational ceilings, a single unstable agent can consume disproportionate infrastructure capacity within minutes.

This layer acts as a runtime circuit breaker enforcing token ceilings, execution budgets, timeout policies, concurrency limits, retry thresholds and forced termination.

class RuntimeBudgetTracker:
    def __enter__(self):
        if self.current_session_tokens() > SESSION_TOKEN_CEILING:
            raise CircuitBreakerException("Hard session resource budget exhausted.")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.update_billing_metrics()

In mature systems, autonomy is always bounded by economics.

Layer 4: Memory and Context Boundaries

Memory without lifecycle policy becomes operational liability. Persistent memory is increasingly becoming the hidden state layer of agentic systems. Many implementations treat vector memory as an infinitely accumulating reasoning substrate.

In practice, unmanaged memory introduces stale reasoning persistence, cross-session contamination, unauthorized context carryover, retrieval instability and policy drift over time. Once agents begin operating from accumulated state rather than immediate prompts, memory governance becomes infrastructure governance.

def retrieve_scoped_memory(agent_id, session_id):
    raw_context = vector_store.query_by_agent(agent_id)

    return [
        fact for fact in raw_context
        if fact.session_id == session_id
        and not fact.is_stale()
    ]

The operational challenge is subtle: memory persistence slowly shifts the behavioral center of the system away from prompts and toward accumulated state. That changes the governance model entirely.

Layer 5: Action Validation and Approval Gates

Certain actions cannot be undone. Human approval is not a fallback mechanism for AI failure. It is a deliberate risk-tier escalation strategy designed directly into the execution topology. High-risk operations such as financial transactions, infrastructure mutations, privileged access escalation or customer-impacting workflows should move through deterministic approval states before execution proceeds.

Importantly, confidence scores should not be treated as indicators of correctness. They are routing signals. The role of the control plane is not to trust the model. It is to determine how much autonomy the current runtime context permits.

def evaluate_action_risk(proposed_action):
    if (
        proposed_action.is_irreversible
        or proposed_action.financial_value > TRANSACTION_THRESHOLD
    ):
        state_store.park_action(
            proposed_action.id,
            status="PENDING_HUMAN_SIGN_OFF"
        )
        return ActionResolution(status="ESCALATED")

    return ActionResolution(status="APPROVED")

Layer 6: Observability and Auditability

If agent decisions cannot be reconstructed, they cannot be governed. Traditional logs are insufficient because the execution path itself is dynamic. Production-grade observability requires capturing reasoning checkpoints, tool lineage, policy decisions, runtime state transitions and replayable execution history.

Governance itself becomes versioned infrastructure. Every execution decision must be attributable not only to prompt context and model state, but also to the exact runtime policy active at execution time, the mediation decisions applied and the operational constraints enforced.

def log_execution_checkpoint(agent_id, step_id, tool_proposal, policy_decision):
    audit_ledger.append({
        "timestamp": current_timestamp(),
        "agent": agent_id,
        "step": step_id,
        "intent": tool_proposal.to_dict(),
        "policy_verdict": policy_decision.status,
        "lineage_hash": generate_execution_hash(tool_proposal, policy_decision)
    })

Without replayability, governance becomes unverifiable.

Failure Isolation and Blast-Radius Engineering

Traditional software architectures assume deterministic execution paths. Agentic systems introduce probabilistic orchestration. That changes how failures propagate.

A conventional application failure typically throws predictable exceptions across known boundaries. Autonomous agents generate execution paths dynamically, meaning instability itself becomes emergent behavior.

Agentic systems require blast-radius engineering, not just exception handling.

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrails-failure-isolation.png


The control plane must therefore support tool sandboxing, bounded execution spaces, scoped rollbacks, isolated transactional state and forced termination policies.

One of the more dangerous architectural assumptions is believing unstable agents can always self-correct through additional reasoning. Recursive self-correction frequently amplifies the original failure condition. Sometimes the safest operational response is termination. The infrastructure must retain authority over the agent at all times.

Anatomy of a Mediated Execution Flow

Consider a Customer Refund Agent operating inside an enterprise support system.

In an unmediated architecture, the agent retrieves order history, determines refund eligibility and directly invokes the payment gateway. Operationally, this means the model effectively controls financial execution.

In a mediated architecture, the agent never directly accesses infrastructure actions. Instead, the process is intercepted by the control plane:

  • The agent proposes a refund intent.
  • The control plane intercepts the request.
  • The policy engine evaluates: refund thresholds, fraud indicators, user permissions, confidence signals and runtime policy state.
  • The system decides to approve, deny or escalate for review.

Only then is execution permitted.

class GuardrailControlPlane:
    def mediate_action(self, context, proposed_action):
        policy_decision = self.policy_engine.evaluate(
            actor=context.agent_id,
            action_type=proposed_action.type,
            payload=proposed_action.payload
        )

        self.audit_logger.log_execution_checkpoint(
            context.agent_id,
            context.step_id,
            proposed_action,
            policy_decision
        )

        if policy_decision.status == "DENIED":
            raise SecurityBoundaryException("Execution blocked by external policy.")

        if policy_decision.status == "ESCALATE":
            return self.route_to_approval_gate(context, proposed_action)

        return self.execute_tool_in_sandbox(proposed_action)

Without runtime mediation, the system technically “works,” but governance collapses. The model proposes execution; the control plane governs execution.

Principles of Execution Governance

Building production-grade agentic systems increasingly requires architectural discipline rather than model sophistication:

  • Decouple policy from reasoning: The model should never determine whether it is allowed to execute a privileged action.
  • Design for asymmetry: Assume the agent will eventually generate unstable, adversarial or incorrect execution paths. The surrounding control plane must remain deterministic enough to contain them.
  • Treat memory as governed state: Persistent memory requires the same lifecycle, retention and authorization rigor as any production datastore.
  • Govern execution, not outputs: The most consequential failures in autonomous systems increasingly occur after generation and before infrastructure mutation.

Here’s a consolidated view of how these guardrails come together.


The defining characteristic of mature AI systems will not be model intelligence alone, but the quality of the control planes governing execution.

As agents gain autonomy, guardrails stop being defensive layers and become operational infrastructure.

. Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Architects’ Evolution in the Age of Autonomous AI
Agentic AI for Beginners: My Journey into Building with Claude
Architecting Guardrails: The Control Plane for Agentic AI
Agentic AI for Existing Codebases: A Practical Path to Getting Started


The Understanding Lag: AI Makes Code Faster Than Comprehension

We have spent decades making code easier to write. Now that AI can generate working code with minimal effort, something becomes clear: writing was never the hardest part of the job.

The Speed Paradox

The prevailing narrative is that AI makes engineers 10x faster. If you measure speed by lines of code, that’s true. But if you measure speed by how long it takes to move a system from a working demo to production-ready, the improvement is far less clear.

The reality is this: we have made writing code faster than our ability to comprehend it. That gap – the Understanding Lag, is where the real work of modern software engineering now lives.

From Construction to Forensic Analysis

In traditional development, context was built as you wrote code. You made decisions step by step, grappling with constraints in real time. By the time the code was finished, the reasoning behind it was already embedded in your mental model.

When you actually try building systems with AI, that process flips. Code appears fully formed. You didn’t evolve it instead you are reading the outcome. You are a forensic investigator of your own codebase, trying to answer:

  • Why was this done this way?
  • What assumptions are hidden in this logic?
  • What breaks elsewhere if I change this?

This is not a tooling shift. It’s a cognitive one.

Where This Shows Up in Practice

The Understanding Lag is easy to ignore – until you have to work with the code. It shows up when:

  • A “simple change” requires tracing through unfamiliar logic
  • A generated solution works, but you can’t explain why
  • A production issue forces you to debug code you didn’t reason through

The system moves fast. Your confidence catches up slowly.

Patterns of the New Bottleneck

1. Context Reconstruction – We have moved from build-to-understand to read-to-understand. The cognitive load hasn’t disappeared. It has moved from creation to interpretation. The effort is no longer in writing logic but it’s in reconstructing intent.

2. Fragile Ownership – Ownership is no longer about who wrote the code. It’s about who can defend it. When you don’t build the path, your confidence in the system is borrowed, not earned. This becomes very real during a 2:00 AM outage, when you’re debugging a system you technically own but didn’t fully construct.

3. The Demo-to-Prod Chasm – AI is excellent at getting the “happy path” running. But production systems don’t fail at “does it run?” They fail at the boundaries:

  • Security & Compliance: Where does data move?
  • Auditability: Why was a decision made?
  • Resilience: How does the system behave under stress?

The demo works because it lacks constraints. The system fails because it is defined by them.

The Great Inversion of Effort

The effort hasn’t disappeared. It has moved. We are seeing an inversion where implementation is becoming a commodity and understanding and validation are becoming the real work.


We have moved from:

  • Implementing → Validating
  • Building → Reviewing
  • Typing → Thinking

The cost of change is no longer in writing code. It’s in verifying that the change didn’t violate a constraint you didn’t know existed.

The Architectural Implication

If understanding is the bottleneck, then systems must be designed for it. Not for cleverness. Not for brevity. But for legibilitytraceability and verifiability.

In real systems, decisions must be defensible, behavior must be auditable and changes must be safe. The difference between a demo and a system is not code. It’s constraints.

Toward Managed Divergence

AI can generate multiple valid solutions for the same problem. That flexibility is powerful, but uncontrolled, it increases the Understanding Lag. This is where Managed Divergence becomes necessary. Not to restrict AI’s capability, but to constrain where it can have impact:

  • Limit where variation is allowed
  • Keep critical paths predictable
  • Enforce guardrails as part of the architecture

So while code is generated dynamically, the system remains within human comprehension.

The Bottom Line

AI didn’t simplify engineering. It changed the job. You’re no longer just writing code. You’re reconstructing context, validating assumptions and defending systems you didn’t fully build.

AI writes the code. You catch up and decide if it should exist at all.

. Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Architects’ Evolution in the Age of Autonomous AI
Agentic AI for Beginners: My Journey into Building with Claude
Architecting Guardrails: The Control Plane for Agentic AI
Agentic AI for Existing Codebases: A Practical Path to Getting Started


Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect the official policy or position of my current employer. This reflects a point-in-time perspective on a rapidly evolving field, intended to foster dialogue and shared learning within the engineering community.

Agentic Development: The Case for Managed Divergence

Today, many organizations are adopting agentic development, both to unlock its potential and to stay ahead of the curve. My current organization is no different. As part of this effort, a set of alpha teams are exploring its adoption, building early capabilities and sharing learnings to guide broader rollout.

https://learnbyinsight.com/wp-content/uploads/2026/05/agentic-development-divergence.png


Being part of one such alpha team, I have been observing an emerging pattern. Many teams are building similar capabilities (like PDLC orchestrators, agent workflows and supporting skills) but in slightly different ways, often tailored to their specific product contexts.

While this can feel like duplication at first, I believe it is actually driving rapid organizational learning. Sharing a few thoughts on why this phase exists and how we might navigate it more intentionally.

The Paradox: Standardization Needs Maturity

In mature engineering domains, we standardize because the patterns are well understood. With agentic development, we are still discovering the primitives:

  • Evolving Problem Space: Moving from deterministic execution to probabilistic reasoning
  • Forming Abstractions: Defining what an “agent” fundamentally is in our organizational context
  • Emerging Operating Models: Especially how we handle “Human-in-the-loop” (HITL) handoffs

The Risk: In this context, early standardization doesn’t create a foundation instead it creates a ceiling. It constrains exploration before we know what is actually worth scaling.

The “Divergence” Phase: Learning at Scale

What we are seeing right now is a natural progression. It’s a phase characterized by:

  1. Parallel Experimentation: Teams building similar capabilities to solve immediate problems
  2. Local Optimizations: Moving faster by tailoring tools to specific team contexts
  3. The “Almost-Right” Stage: Multiple versions of the same idea, each slightly different

This is the “Broad Adoption” stage. It may look like duplication, but it is actually increasing our learning velocity. We are effectively running parallel A/B tests on architecture across the company.

The Real Danger: Fragmentation Without Direction

Divergence is healthy, but unmanaged fragmentation is not. The challenge arises when:

  • Teams are unaware of parallel efforts
  • Learnings are trapped in silos
  • Solutions are too tightly coupled to be reused or migrated later

If we don’t have a path to converge, we aren’t innovating as effectively, we’re just drifting.

A Balanced Way Forward

To ensure this divergence leads to a stronger future state, I’m leaning into three guiding principles:

https://learnbyinsight.com/wp-content/uploads/2026/05/agentic-balanced-way.png

1. Visibility Over Restriction

We shouldn’t stop teams from building, but we should require them to share. Visibility through demos, shared registries or internal “RFCs” (Requests For Comments) allows the best ideas to gain natural gravity. It reduces “accidental” duplication while allowing “intentional” experimentation.

2. Standardize the Contract, Not the Tool

Instead of enforcing a single framework today, we should align on interfaces:

  • Expected Outputs: What artifacts or checkpoints must an agent produce?
  • Interaction Models: How does an agent request human intervention?

Aligning on the what allows teams to remain flexible on the how.

3. Modular “Build-for-Reuse” Thinking

Even in an alpha phase, we should avoid the “monolithic agent”. By keeping skills and orchestrators modular, we can ensure that when the time comes to converge, we can reuse the best components from different teams rather than rebuilding from scratch.

The “In-Flight” Reality: Our Journey

In our organization, we are currently in this “Go-Broad” phase. We are seeing this divergence play out in real time, with different teams exploring their own agentic implementations based on their context.

While it may look like multiple directions from the outside, from within it feels like a natural extension of the learning process where real-world constraints are shaping what works and what doesn’t.

https://learnbyinsight.com/wp-content/uploads/2026/05/agentic-ai-convergence.png


My expectation is that convergence will happen in due course, potentially evolving into shared patterns similar to those described here. At the same time, this is still unfolding and we remain open to different paths as we continue to learn what truly scales.

Final Thought

One way I have started thinking about this transition is:

Enable divergence. Design for convergence. Execute with discipline.

We are still in an exploration phase and that is a healthy, if sometimes noisy place to be. The focus may not be to eliminate variation today, but to ensure that when convergence happens, it is grounded in real usage and shared learning.

If we continue to build, share and learn openly, the path toward a more unified approach should emerge more naturally.

. Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Architects’ Evolution in the Age of Autonomous AI
Agentic AI for Beginners: My Journey into Building with Claude
Architecting Guardrails: The Control Plane for Agentic AI
Agentic AI for Existing Codebases: A Practical Path to Getting Started


Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect the official policy or position of my current employer. This reflects a point-in-time perspective on a rapidly evolving field, intended to foster dialogue and shared learning within the engineering community.

Architecting Guardrails: The Control Plane for Agentic AI

We are entering a new architectural phase and navigating a meaningful shift. AI systems are moving beyond static responses and into systems that can take actions like triggering workflows, calling APIs and making decisions within production environments. This is transformative.

architect-gaurdrail-agentic-ai-lbi


At the same time, this shift is happening faster than most teams can fully operationalize or standardize. Across industry conversations, early implementations and emerging case studies, I believe a pattern is starting to become clear:

Most AI failures are not model failures – they are control failures.

Not necessarily because systems are poorly designed, but because:

  • boundaries are still evolving
  • failure modes are not fully understood
  • recovery paths are often under-defined

As we move toward more autonomous systems, we are effectively taking cautious steps into production – without always knowing how and when things might surface as unintended outcomes.

When they do, the impact is rarely isolated:

  • it can affect multiple customers
  • it can impact trust and brand perception
  • it can translate into real cost

When these systems scale, we don’t just scale capability. We scale uncertainty and potentially, mistakes.

I believe we are collectively building the playbook as we go and this is my attempt to make sense of what that might look like.

Guardrails: More Than Just a Safety Feature

Guardrails are no longer a theoretical concept or something that can be deferred for later. Increasingly, they are becoming a real and necessary part of building agentic AI systems.

What I still observe, though, is that in many implementations, guardrails are treated as an add-on introduced after the core system is already designed or applied post-facto to fill gaps.

Even when guardrails are considered early, they can sometimes become a checkbox exercise that makes us feel the system is “covered”, while important aspects may still be missing.

Part of the challenge is that we are still learning what “complete” actually looks like. As AI systems continue to evolve, new behaviors, edge cases and failure modes emerge – often faster than teams can fully anticipate.

This is where I have found it useful to shift how I think about guardrails. Instead of treating them as isolated checks, it helps to think of guardrails as the control plane of agentic AI.

https://learnbyinsight.com/wp-content/uploads/2026/05/gaurdrail-control-plane.png


Just as modern systems separate execution (data plane) from governance and coordination (control plane), agentic AI needs a layer that defines:

  • what the system can do
  • what it should do
  • how it behaves under uncertainty or failure

Without this Control Plane, we’re not really building systems – we’re simply reacting to them.

Three Questions Every Architect Should Ask

To make the idea of a control plane more practical, I have found it useful to step back and ask a few simple questions – often before writing a single prompt.

1. Can it do this? (Capability & Access)

  • Does the agent have the right permissions?
  • Are tool calls constrained?
  • Are access boundaries clearly defined?

Example – Billing Agent
An agent generating invoices should not have unrestricted access to pricing configuration.

2. Should it do this? (Policy & Context)

  • Is the action aligned with business rules?
  • Does it respect compliance and intent?
  • Is context being interpreted correctly?

Example – Support AI
Issuing refunds requires understanding policy thresholds and not just user sentiment.

3. What if it goes wrong? (Resiliency & Recovery)

  • Can actions be rolled back?
  • Is there an audit trail?
  • Is there a clear escalation path?

Example – Workflow Agent
Deleting or modifying customer data should always be recoverable.

If these questions are unclear, the agentic system will eventually surface that ambiguity – usually in production.

A Practical Framework for Control

Building on the idea of Guardrails as a Control Plane, it helps to think of them not as a single gate, but as a distributed system of controls.

One way to reason about this is across a few key areas:

CategoryFocusExample
TechnicalValidation & thresholdsPrevent hallucinated financial metrics
SecurityAccess & abuse preventionMask PII based on user roles
EthicalBias & responsible behaviorEnsure fair hiring recommendations
OperationalRuntime controlRate limits and kill switches
InfrastructurePlatform safetySandboxing, isolation and cost boundaries
BusinessAlignment & complianceEnforce pricing rules and customer tiers

Note: These are not independent layers – they interact continuously across the system lifecycle.

Where Guardrails Actually Live

To make this more concrete, it helps to think about where guardrails show up within a system.

They exist across the lifecycle and surface at different points as the system processes inputs, makes decisions and produces outcomes.

In practice, this often looks like:

  • Input validation & policy enforcement
  • Orchestration decisions & tool execution controls
  • Model grounding & memory handling
  • Output validation, monitoring and feedback

Each of these points represents a place where control can be applied or missed.

Hard-Earned Realities of Scaling

For engineers and architects building these systems, the gap between theory and production is where most learning happens.

https://learnbyinsight.com/wp-content/uploads/2026/05/hard-earned-reality-scaling.jpg


Here are a few patterns I have seen emerge across implementations and industry discussions:

1. The Trap of Human-in-the-Loop (HITL)

HITL is often used as a safety net. In many cases today, it’s a necessary part of deploying AI systems responsibly. At the same time, as systems begin to scale, it’s worth being mindful of how it’s used.

In practice:

  • humans can become bottlenecks
  • alert fatigue can set in
  • approvals can turn into routine “rubber-stamping”

The shift is not to remove HITL, but to use it more intentionally.

Design systems to be safe by default and rely on human intervention primarily for:

  • high-risk actions
  • policy exceptions
  • low-confidence scenarios

If every decision requires human approval, I believe the system isn’t truly autonomous instead it’s closer to a complex UI with an approval layer.

2. The Latency Tax

Safety introduces latency where every validation adds a cost in time.

Rather than forcing everything into synchronous checks, it helps to distribute controls across the lifecycle:

  • Pre-execution: Prevent obvious failures
  • In-line: Enforce business logic
  • Asynchronous: Audit and reconcile

3. Policy-as-Code vs. Prompt Engineering

Prompts are flexible, but brittle. Policies are enforceable. Decoupling rules from the model (using tools like Open Policy Agent (OPA) or similar approaches) allows for version control, auditability and model independence. For example, instead of encoding refund limits inside prompts, define them as policies that can be updated independently as business rules evolve.

In many ways, this becomes a key part of the control plane:

  • Prompts guide behaviour
  • Policies enforce behaviour

4. Guardrails Break Silently

A guardrail that works with one model may behave differently with another. Different models interpret constraints differently and edge cases surface in unexpected ways. For example, switching models can silently weaken compliance checks by a Contract Review Agent.

The Takeaway: Maintain a guardrail testing suite. Test adversarial cases, edge scenarios and validate across model versions. If guardrails aren’t tested, they’re just assumptions.

Two Often Overlooked Risks

As systems mature, a couple of areas tend to surface as more “silent” failure modes. They don’t always show up immediately but can have significant impact over time.

1. Economic Guardrails

Agents can loop recursively or call expensive APIs repeatedly, leading to what can effectively become a “Financial Denial of Service”.

In practice, this makes it important to introduce controls such as:

  • session-level budgets
  • token or usage limits
  • execution caps

Cost, in this context, becomes a control boundary – not just a metric.

2. Memory & State Management

Agents don’t just act, they remember. Over time, this introduces challenges around PII retention, long-term context storage and unintended persistence of sensitive data.

Mitigation often involves:

  • retention policies
  • PII filtering
  • memory scrubbing workflows

Memory becomes a liability if not managed intentionally.

The Strategic Bottom Line

To build production-grade agentic AI systems, it becomes important to think in terms of controlling:

  • What the system does (actions)
  • What it spends (economics)
  • What it remembers (state)

Guardrails are not just about safety – they are about sustainability and trust.

Here’s a consolidated view of how these guardrails come together:

https://learnbyinsight.com/wp-content/uploads/2026/05/poster-agent-gaurdrails-dark-v2.png

* This is still evolving but having a structured way to think about it helps in designing systems that scale.

Final Thought

Autonomy is the promise of agentic AI. But autonomy without control isn’t innovation – it’s risk.

As architects, our goal isn’t just to make AI systems work but to make them predictable, controllable and trustworthy over time.

The model is the engine.
Guardrails are the steering, the brakes and the dashboard.

.Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Architects’ Evolution in the Age of Autonomous AI
Agentic AI for Beginners: My Journey into Building with Claude
Agentic AI for Existing Codebases: A Practical Path to Getting Started


Agentic AI for Existing Codebases: A Practical Path to Getting Started

In the current engineering landscape, there is an unrelenting pressure to chase the “new”. Our LinkedIn feeds are dominated by AI-native learnings, startups and autonomous agents building entire applications from a single prompt in days. For many of us, this creates a strange disconnect.

guide-agentic-ai-existing-codebase


Most engineers aren’t working on greenfield AI experiments. They are responsible for systems that have been running for five, ten or even fifteen years. These are the stable, revenue-generating engines that form the backbone of successful businesses. They are battle-tested, high-stakes and complex.

If you are maintaining one of these systems, it is easy to assume the Agentic AI Wave isn’t meant for you. You might look at your unique architectural patterns or your “legacy” constraints and conclude that an AI agent simply wouldn’t understand.

I’d offer a different perspective: These tools are most transformative in the systems you already understand deeply. You haven’t missed the wave instead you are simply waiting for the right entry point.

From Manual Assistance to Actual Leverage

You might not have integrated AI into your workflow yet. Many teams have already begun doing so and those who have started likely use it for tactical tasks: explaining an obscure regex, generating a unit test for a utility function or writing a quick bash script.

This is a significant step forward, but it remains manual and reactive. Using AI this way is like hiring a brilliant senior consultant but refusing to give them a badge, documentation or context. You spend half your mental energy explaining the “why” before they can even start on the “how”.

When you attempt to move toward Agentic AI – you allow an agent to navigate your repository and suggest multi-file changes. This lack of context becomes a technical liability. Without a “Project Constitution”, the agent is forced to make guesses. Usually, it will:

  • Default to modern “generic” patterns that are incompatible with your specific tech stack.
  • Miss hidden architectural constraints decided years ago for specific performance or security reasons.
  • Suggest “best practice” refactors that look correct in isolation but break your production logic.

The result isn’t just a failed task but it’s wasted time and unnecessary token burn.

The Missing Piece: Contextual Onboarding

Agentic AI doesn’t fail because it lacks power. It fails because it lacks context. Much of your system’s “source of truth” doesn’t actually live in the code. It lives in your head, in tribal memory, in wikis or buried in old Jira or PR descriptions.

The goal isn’t to “teach” the AI everything. It is to provide a minimalist, structured map that allows the agent to operate safely within your boundaries.

The same idea applies to any work with structured systems of any kind like operations workflows, data pipelines, internal tools, etc. Whether it’s code, processes or documentation, the moment you define the rules clearly, the quality of output improves dramatically.

A Practical Starting Point: The claude.md

You don’t need a massive infrastructure change to begin. You can start by creating a claude.md file in your project root. This is your “Project Constitution” – a system guide. It should be precise, technical and grounded in reality.

Start simple, example claude.md:

# Project Guidelines

## Tech Stack
- Node.js 16
- Express
- MongoDB

## Rules
- Do not upgrade dependencies unless asked
- Follow the existing folder structure
- Write tests using Jest

## Notes
- This is a legacy system, avoid large refactors

That’s it. No perfection needed to start. By spending fifteen minutes defining these boundaries, you give the agent more leverage than 90% of teams currently provide. You can refine it over time.

Expanding the Framework: Skills

Once your “Constitution” is set, you can begin defining Skills via a skills.md file. While the claude.md is global, Skills are modular playbooks for recurring workflows.

For example, if you frequently ask the agent to “Add a new API endpoint” or “Migrate a component to TypeScript”, you should document the exact steps those tasks require in your specific environment. These acts as a repeatable playbooks that reduces the back-and-forth and ensures the agent follows your team’s established SOPs (Standard Operating Procedures) when needed.

A Mentor in Your Pocket: Codex-Claude

As you begin to rely more on these agents, you’ll find that “Instruction Engineering” is a skill in itself. If your agent is still going off-track, the issue is almost always an ambiguity in your instructions.

This is why I have been developing Codex-Claude. Think of it as a Linter for your Agentic Strategy. Just as a code linter catches syntax errors, Codex-Claude analyzes your claude.md and skills.md to catch “intent errors”.

The tool helps you with:

  • Automated Architectural Audit: Instantly evaluates your files against best practices and provides a weighted score across structure, specificity and completeness
  • Precision Refactoring & Compaction: Identifies ambiguity and redundancy, rewriting instructions to be more concise and context-efficient
  • Intelligent Conflict Resolution: Detects contradictions and instruction drift, ensuring rules are placed correctly – either within global rules or specialized skill files
  • Progressive Learning Loop: Turns every optimization into a learning opportunity by explaining the “why” behind changes by linking changes to official documentation

You don’t need this to get started, but it helps once you begin refining your setup for more complex tasks.

codex-claude-analysis-result


You can explore and try it out LIVE here: https://sandeep-mewara.github.io/codex-claude/

Watchouts

As you start this journey, keep these three principles in mind:

  • Be precise, not verbose: Every line is context the agent must process. Clear constraints beat long explanations
  • Use tests as safety rails: The agent provides speed. Your test suite provides safety. Never accept changes that have not passed your CI/CD baseline
  • Iterate on Instructions: If an agent fails a task, it likely misunderstood something. Treat it as a bug in your claude.md and fix the instruction

The Architect’s Path Forward

The expectation for delivery speed in our industry is fundamentally shifting. However, adopting Agentic AI isn’t about “coding faster” but it’s about reducing the mental tax of working with mature, complex systems.

You don’t need a new project or deep AI expertise to benefit from this. You just need to start small:

  1. Select one module or one feature
  2. Draft a simple claude.md that defines that module’s rules
  3. Run one task with an agent and observe the difference

The systems that power today’s businesses don’t need to be replaced. They just need the right leverage to move into the future.

. Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Architects’ Evolution in the Age of Autonomous AI
Agentic AI for Beginners: My Journey into Building with Claude
The Great Inversion: Why AI is Moving from Cloud to Desktop


The Lifecycle Is the Product: AI Development Engine

Every team eventually finds itself rebuilding the same foundational setup in every project. In many organizations, this is still a manual struggle. We write “how we work” docs, define naming conventions and establish review gates that live only in our wikis. For teams already leveraging AI, this setup often exists in isolated pockets like a collection of disconnected prompts telling an assistant to “act as a PM” or “design like an architect”.

agentic-ai-development-engine


In both cases, the expertise remains trapped in silos. For those working manually, the immediate opportunity is to use AI skills to jump-start their specific tasks. But once you do, you quickly reach a plateau – while individual skills and prompts have become portable, the lifecycle around them has not.

That gap is what the Lifecycle Agent Orchestrator (LAO) tries to close. It’s a plugin for Claude Code and Cursor that uses multi-agent orchestration to ship the development process itself as a versioned, overridable artifact. Not just the individual role skills. The stitching between them.

The Problem: Skills Are Portable, Process Is Not

Current AI tools offer impressive specialized skills – performing architecture reviews or enforcing testing conventions with high precision. However, these tools still operate in isolation.

Modern software delivery doesn’t happen in a vacuum. It flows through a series of high-stakes handoffs. This is where even the most advanced teams hit a wall. Despite the promise of automation, the broader lifecycle remains stubbornly manual:

  • The Cognitive Load of Fragmentation: Engineers must still manually orchestrate which tools to run and when, creating a massive tax on context.
  • Traceability Decay (Drift): We lose the “intent” of a feature as it travels from a product ticket to a design mock and finally into the codebase.
  • Simulated Handoffs: We still rely on manual “persona-switching” – manually checking if a design works for a developer or if an architecture suits a product goal.
  • The Proof Problem: At the point of release, we still rely on assumptions rather than programmatic proof that we’ve satisfied every original requirement.

LAO moves beyond single-prompt interactions by employing multi-agent orchestration to bridge the high-stakes handoffs between roles.

The skills are the actors. The lifecycle is the director.

Step 1: Jump-Start Your Work with AI Skills

The quickest way to see value isn’t by changing your entire workflow. It’s by using individual skills to improve the work you’re already doing.

Each role in the plugin is independently usable. You don’t need the full pipeline to get an immediate win:

  • Engineers can use the Intake skill to turn a messy Jira ticket into a clean scope with real acceptance criteria.
  • PMs can use the Product Management skill to draft a structured PRD.
  • Architects can define a technical design running the Architecture skill against a requirement.
# Direct use of individual skills
Invoke code-review skill to review these changes
Invoke intake skill to extract scope from PROJ-5678

This “Step 1” approach pays back immediately.

You get a senior-level assistant for specific tasks without committing to a new workflow.

Step 2: The Orchestrator as the Director

Over time, this pattern exposes a critical bottleneck. We find ourselves questioning: Which skill comes next? Did we skip a step? Are we aligning roles or just checking boxes? Here, we hit the ceiling of isolated tools.

Once you trust the individual skills, the orchestrator stitches them together into a cohesive system.

# command inside a Claude Code or Cursor session
# Direct Jira story ingestion
/lao Work on PROJ-1234

# An ungrounded requirement
/lao Add a user notification preferences API endpoint

# a tire-kick before committing to anything
/lao-dry-run

Through multi-agent orchestration, LAO ensures that the PM, Designer and Architect personas actively review and challenge each other’s outputs.

The goal isn’t just automation – it’s coordination.

Core Internals

The following are a few key design decisions that power the LAO.

The Nine-Phase Engine

The pipeline is nine phases, structured into two halves with different personalities:

agentic-life-cycle-nine-phases

.

  1. Alignment (Phases 1–3): Product, Design and Architecture align early. They cross-review every output to catch gaps before engineers write a single line of code.
  2. Execution (Phases 4–9): Once the team establishes alignment, the system drives the project through scope, design, planning, implementation, validation and shipping.

The key shift is simple: Alignment happens once, upfront. Execution happens without rework.

Project-Specific Infrastructure

Overlays let you define how your specific system works – ensuring your project remains the domain authority. Under the hood, each phase composes up to three layers of knowledge:

LayerLives inContains
BasePluginUniversal rules for the role
OverlayProjectProject-specific patterns, stack, conventions
DomainProjectCross-cutting domain knowledge (auth, payments, compliance)

A project looks like this once it’s connected:

lao-overlay-structure


If a project already has its architecture docs scattered across docs/, there’s no need to move anything. A lao.config.yaml at the project root maps existing files into the engine:

project_name: my-app
languages: [python, react]

overlays:
  architecture: docs/architecture/standards.md
  coding-standards: .cursor/rules/coding.md

domain:
  - docs/domain/*.md
  - src/payments/DESIGN.md

extra_roles:
  compliance-review: tools/compliance/SKILL.md

There are two discovery paths – the convention directory or the config file. If both exist, the config file wins because project-specific overlays take priority.

This is the project respecting itself as the domain authority.

Preview, Then Execute

Every run begins in simulation – a preview of the nine-phase pipeline that writes no files, creates no branches and posts no Jira comments. The orchestrator walks through Phases 1–6, simulating execution to produce realistic PhaseOutput objects and checkpoints for your iteration. It then summarizes Phases 7–9 as projected outcomes, as these require real code execution.

When you’re ready, you say proceed and the pipeline replays – but with the preview’s decisions carried forward instead of regenerated:

agentic-lc-skills-workflow


The system eliminates both upfront cost and the risk of committing to a flawed plan.

Acceptance Criteria, Tracked Across Phases

The system captures acceptance criteria during Intake and tracks them through to Validation, where you must prove each one with recorded evidence to unlock the “Ship” gate. The CLI renders this data as text today, but a dashboard could render it visually tomorrow without requiring any changes to the engine. Every phase emits a PhaseOutput – a structured object with a defined schema.

--- Phase: Tech Design (Phase 5 of 9) ---
Status: Needs Approval

SUMMARY:
  Add rate limiting middleware to API gateway.
  No new dependencies, config-driven thresholds.

ARTIFACTS:
  - [design_doc] docs/design/rate-limiting.md

ACCEPTANCE CRITERIA (tracked):
  AC1: Rate limit of 100 req/min/user ...... pending
  AC2: Returns 429 with retry header ....... pending
  AC3: Configurable per environment ........ pending

→ Approve to proceed to Plan or request changes.

No claims without fresh proof – that’s the whole point of the validation gate.

Multi-Language, Without a Fork Per Language

Four skills need to know what language they’re looking at: coding-standardstesting-conventionscode-review and security. Each has a universal base and a language pack for the specifics:

lao-multi-lang-support


Currently, plugin supports Python, Java, C# and React. Detection runs once at pipeline start: if lao.config.yaml lists languages, use them, otherwise scan for pyproject.tomlpom.xml*.csprojpackage.json with a React dep and collect every match. A full-stack repo auto-detects as [python, react] and both packs get loaded. The agent applies each to the right file types.

Adding a new language – Go, Rust, anything – means creating a references/<language>/directory in those four skills with the expected files, plus a couple of lines in detection and validation scripts.

No change to the universal base. That separation is worth preserving.

Role vs. Workflow Split

The design deliberately separates Phases 1–5 (Role-based) from Phases 6–9 (Execution-based).

Phases 1–5 (Roles): These phases use individual skill files (PM, XD, Architecture, Intake) through multi-agent orchestration because judgment varies by project. A fintech audit requires different logic than a game engine pipeline, so project overlays merge with these base skills to provide local context.

Phases 6–9 (Workflows): These phases power the orchestrator’s core engine (TDD, validation, shipping) and maintain tight coupling for continuity. Unlike roles, workflows use substitutions. If you override a workflow, such as swapping TDD for BDD, the new logic replaces the built-in engine entirely rather than layering on top of it.

# Override the workflow for a single phase
workflows:
  
  # BDD instead of TDD
  implement: docs/workflows/our-bdd-process.md

  # custom release flow
  ship: docs/workflows/our-release-process.md    

Separate judgment from execution to protect flexible strategy without sacrificing delivery.

When to Use This and When Not

The Sweet Spot: Use this if you pair Claude Code or Cursor with Jira/PRD-driven intake. It excels for teams that front-load design and track ACs to the finish line. The engine treats multi-language and monorepos as first-class citizens, using config-based discovery to navigate complex structures.

The Breaking Point: Avoid this for ad-hoc work lacking tickets or defined ACs. The fit weakens if your “ship” phase involves unmodeled complexity – like mobile store submissions or if you require unattended, autonomous execution. The plugin is designed as a human-in-the-loop engine. It doesn’t chase full autonomy – yet.

What Changes When You Adopt This?

The most immediate change is practical: Your development process leaves the wiki and enters your repository as a versioned artifact.

But adoption doesn’t have to be a cliff. You start by using individual skills (single agent) to improve local tasks. As you build trust, you let the orchestrator (multi-agent) handle the parts that are hardest to do manually – the handoffs, the alignment and the validation.

Over time, the shift becomes structural:

  • Handoffs become explicit gates
  • Requirements become traceable
  • Validation becomes evidence-driven
  • The lifecycle becomes consistent

I expect the next iteration of the tool to automate this entire flow. For now, we must build it by hand – or, more precisely, install it.

Closing Thought

This isn’t about replacing how teams work. It’s about making how they work explicit and reliable.

The lifecycle stops being something you document. It becomes something you execute and once that happens, it’s no longer just process. 

It’s part of the product.

. Sandeep Mewara Github
Tech Explore
Trend


Repository & Contribution

The Lifecycle Multi-Agent Orchestrator is available as an open-source project. I encourage you to explore the repository, use the individual skills to jump-start your own work and contribute to the evolution of portable development engine.

In practice, this kind of artifact only gets better when it’s applied across different projects and constraints.


. Architects’ Evolution in the Age of Autonomous AI
Machine Learning workflow
Agentic AI for Beginners: My Journey into Building with Claude

Architects’ Evolution in the Age of Autonomous AI

Lately, I’ve been watching the “3X World” move from a concept to a daily reality. In a recent project, AI allowed me to iterate through architectural options and tech stacks in days, exploring directions that would have been far too time-consuming to even consider a few years ago.

architect-ai-age-evolution


It’s a meaningful leap in productivity, but it also highlights a subtle gap. While the machine can optimize for the present with incredible speed, it doesn’t inherently account for longer-term consequences. It can give us a strong version of “today”, but it’s still on us to ensure we’re building for “tomorrow”.

That shift is what stands out to me. As the “grind” of production begins to fade, a more critical responsibility seems to be taking its place – what I’d describe as system-level judgment. Our role is moving from primarily designing and implementing components to being accountable for the integrity of the overall system.

Below are my thoughts on how the Architect’s role is evolving in this new era of autonomous AI and agentic automated stacks.

1. The 2026 Tipping Point: Breaking the “Model Collapse”

I believe we hit a documented wall in early 2026. Data shows that nearly 50% of the world’s software code is now AI-generated (Netcorp, 2026). This has triggered what researchers call “Model Collapse” – a degenerative loop where AI begins learning from its own average, synthetic outputs rather than high-quality human intent (IBM, 2026).

From my perspective, our role is no longer to just “produce” content. If we blindly follow AI, we aren’t just being efficient but also contributing to a loop of mediocrity. I see our new job as being the “Circuit Breaker” – the human who injects original, context-rich intelligence that the machine simply cannot generate on its own.

2. The New Blueprint: Governing the AI-First Stack

I believe the “Blueprint” has fundamentally changed. We are no longer just looking at isolated code repositories but are designing Layered Enterprise Systems. A typical architecture today is a sophisticated application layer that combines:

  • Orchestration & Agents: Coordinating complex workflows.
  • Knowledge Retrieval (RAG): Connecting models to vector databases and document stores.
  • Guardrails & Observability: Enforcing policy and monitoring system health.
architect-blueprint-new


When I look at this stack, I don’t just see a technical diagram. I see a new mandate for the Architect. We must be the ones to define the governance of these layers. Without our oversight, the “Orchestration” lacks logic and the “Knowledge Retrieval” becomes a graveyard of synthetic data.

3. The Divergent Advantage: Why the “Winner” is Augmented

In the past, we were limited by “Time-to-Sketch”. Today, I believe the “Winner” is the Architect who uses AI as an Iteration Engine to manage risk and explore scale.

  • Exploration at Scale: We can now test multiple different structural tech-stacks in less than a week. I don’t see this replacing our creativity, instead I see it liberating it. We can finally ask “What if?” without the fear of wasting a week of production time.


  • The Justified “Rule-Break”: I think about this like a leader looking at a team’s calendar. An AI might see a one-hour team lunch as a 15% drop in productivity and suggest shortening it. But a human leader knows that those lunch discussions help connect lead developers with others and sometimes they even end up solving the most pressing issues through informal conversation. The AI optimizes for output but I believe our value lies in optimizing for the environment that creates the output.

    ai-data-to-architect-intent

    Thus, while AI can handle 70% of the “grind”, it inevitably hits a ceiling where logic meets human reality. Further, in my experience, a junior engineer using AI can only optimize for Correctness, but only an architect can optimize for Meaning.

4. The Technical Translator and Context Provider

I’ve always felt that architecture is a bridge between logic and emotion. While a business leader owns the “Why” of the profit, I see the Architect as the Technical Translator.

architect-meaning-ai


AI can generate a “perfect” plan, but it cannot explain the trade-offs to a concerned stakeholder or negotiate the “Unspoken Brief” – the fears and desires of a community that never make it into a data prompt. Architects are the “Context Provider” who provides the connective tissue that links today’s prompt to a 2031 expansion, ensuring the system doesn’t just work, but scales.

5. The Guardrail Mandate: Catching the 1% Hallucination

I’ve come to see AI as a “Probability Machine”, not a “Judgment Machine”. It designs for the 99% most likely scenarios, often missing the 1% edge-cases that could lead to disaster.

  • The “Technically Legal” Trap: I think of it like a tax professional I spoke with recently. An AI can optimize a return to save a client $10,000 using a cold, logical loophole. It’s “correct” data. But the human professional says, “If we do this, we’ll trigger a three-year audit that will cost $50,000 in fees.” The AI saw a win but the professional saw a systemic risk.


  • The Technical Debt Trap: AI “dumps” 200 lines of code in seconds, creating a Reviewer’s Paradox. Under pressure to match machine speed, I’ve seen engineers fall into “Blind Acceptance“, assuming professional-looking code is logically sound. In 2026, I believe this is our greatest risk and is the leading cause of “AI Technical Debt” (Sonar, 2026).


  • Severity-Driven Review: We don’t audit every line. In our workflow, we focus our “scar tissue”, our experience on the High-Risk Nodes like accuracy, security, resiliancy and scalability.

6. Professional Integrity: The Non-Transferable Seal

The global consensus in 2026 is firm: You cannot sue an algorithm. Under the EU Product Liability Directive, liability follows control. If you deploy an AI system, you bear the responsibility for its “hallucinations”.

architect-ai-approval-seal


While a company may carry the financial responsibility, I still feel that the professional integrity largely rests with the individuals. When I approve a project, it feels less like a formality and more like a personal assurance that the solution, whether shaped by AI or otherwise, is robust. Ultimately, our professional reputation plays an important role in bridging the gap between a digital design and a product that is reliable, secure and compliant (NCARB, 2026).

Summary: My View on the Evolutionary Roadmap

DimensionJunior / AI
(Producer)
Technical Architect
(Gatekeeper)
FocusTask Execution: “How do I design this?”System Integrity: “Why are we doing this?”
GoalOptimization: The most efficient path.Curation: The most meaningful path.
System ViewComponent-level focus.Full-Stack Governance.
Risk RoleIdentifying Known Errors.PredictingUnknown Consequences.
Key ValueSpeed and Accuracy.Judgment and Liability.
AuthorityOperates the Tools.Signs the Professional Guarantee.

Final Thoughts: The Promotion of the Profession

In my view, Architects aren’t being replaced. I believe, we are being elevated to a higher level of responsibility. What I think of as a “3X World” – where AI significantly accelerates execution and reduces the grind of building, but seems to amplify the weight of our decisions.

architect-ai-intelligence-gatekeeper


I see us moving from being System Implementers to being Intelligence Gatekeepers. I’m not afraid of the machine’s speed – I’m afraid of the moment we stop asking “Why?”. In a world of infinite, automated options, I believe the person who can choose correctly is the only one who truly matters.

“The AI provides the options, the Architect creates meaning, make decisions and define guardrails”.

. Sandeep Mewara Github
Mastering the SKILL.md File in Agentic AI: A Complete Guide
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Agentic AI for Beginners: My Journey into Building with Claude
The Great Inversion: Why AI is Moving from Cloud to Desktop

Mastering the SKILL.md File in Agentic AI: A Complete Guide

In modern Agentic AI architectures, the primary engineering challenge is no longer generating language, but bridging the gap between conversational intent and reliable, repeatable and unambiguous execution. To achieve this, we must treat agent capabilities not as conversational shortcuts, but as well-defined engineering assets.

skill-md-agentic-ai.png


This requires a standardized contract for capability execution. That’s where SKILL.md comes in. A formal, machine-parsable definition file that acts as a Standard Interoperability Definition (SID) contract for systematic task execution within an agentic framework.

In this blog, I’ll dive deep into SKILL.md and share how it serves as a single source of truth for both conceptual planning (roles) and procedural execution (workflows) that power an automated, engineering-grade SDLC.

The Architectural Blueprint: The SKILL.md

SKILL.md is structured as an engineering specification, designed for zero-ambiguity parsing by an LLM like Claude. It defines the contract for interoperability, forcing teams to move from conversational requests to precise capability definitions.

Anatomy of an Engineering Contract

The specification consists of five required metadata fields that are immutable and machine-parsable:

  • Name: An immutable, unique, system-wide identifier for the capability (e.g., internal-token-manager-v1exec-raise-github-pr-v1, or sdlc-pm-v1). This is the system’s handle for the skill.


  • Description: Critically, this is not a summary. It is the definitive Trigger Event Definition. It must be written from the perspective of an event, user query or internal signal that activates this capability, allowing the framework to perform accurate skill matching. Example: “Triggers automatically after a successful code analysis scan…”


  • Commands: A list of executable operations or prompts defined by the contract. For procedural skills, these map to API endpoints or internal function calls. For conceptual skills, these map to defined prompt sequences. Example: get-linter-report(timestamp) or refresh-token(service_id).


  • Constraints: A critical safety and resource management section. It defines the limits, rules and error conditions of the contract. Example: “Internal authentication tokens must expire after 1 hour.”


  • Examples: These are not suggestions but are the gold standard of Expected Behavior. They define the intended output for specific input scenarios, providing the LLM with a definitive blueprint for successful execution and reducing non-deterministic output.
# Code Snippet 1: Sample Procedural SKILL.md (Raise GitHub PR)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)

name: exec-raise-github-pr-v1
description: Triggers automatically after a successful 'exec-linter-code-analyzer-v1' scan or upon user request to systematically raise a new pull request on GitHub for reviewed code.
commands:
  - create-pr(repository_url, head_branch, base_branch, title, body)
constraints:
  - Must use a valid GitHub API token with 'repo' scope.
  - Head branch must differ from the base branch.
---

### Expected Behavior (Examples)

When this skill is matched against a standard JavaScript repository:
  - Input: create-pr("https://github.com/org/repo.git", "feat/new-api", "main", "Feat: Add API v2", "This PR introduces...")
  - Execution: Loads 'scripts/create_pr.py'.
  - Output: New PR URL.

Directory Structure & Progressive Disclosure

The SKILL.md is packaged within a defined directory structure, ensuring all supporting assets are decoupled and version-controlled alongside the specification.

skill-folder-structure.jpg

.Sandeep Mewara Github

  • 📄 SKILL.md (The only required asset, containing the definitions and contract).
  • 📁 scripts/ (Optional: Decoupled logic – Python, Bash, Node.js, etc. The implementation details of the contract).
  • 📁 references/ (Optional: Docs, checklists, design patterns or standards the skill must adhere to).
  • 📁 assets/ (Optional: Templates or sample data).

This decoupled architecture enables the Progressive Disclosure Pattern, which is critical for system efficiency and managing token constraints. A high-performance agentic system should not load every asset for every skill simultaneously. Progressive disclosure ensures assets are loaded only when necessary.

skill-md-activation-flow.jpg


Agents don’t load everything at once. They discover and expand context only when needed.

Architecting the Automated SDLC

The standardization offered by SKILL.md allows us to architect and separate the dynamic pillars of an automated SDLC, managing all capabilities via this single specification. In a professional lifecycle, conceptual setup (Defining Roles) always precedes procedural execution (Executing Workflows).

Conceptual Role-Based Skills: Defining the Contract for a Persona (Planning & Setup)

To initiate any SDLC phase (e.g., Requirements), we must first define the conceptual frameworks, knowledge bases and systematic planning workflows of specific roles that help organise content by domain (behaviour-driven). We apply the identical SKILL.md standard to define a persona’s “mindset”.

  • WHAT: SKILL.md definitions for Product Manager Persona or Lead Developer Persona.


  • APPLICATION: During the “Requirements” and “Design” phases of the SDLC.


  • ARCHITECTURAL FLOW: During planning, you activate the Product Manager Persona (Code Snippet 2). Claude adopts this mindset and leverages knowledge references (e.g., Agile standards) and the command contract (draft-prd(user_stories)) to provide focused, high-quality requirements.
Code Snippet 2: Sample Conceptual SKILL.md (Product Manager)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)

name: sdlc-pm-v1
description: Triggers during project initiation to define the persona, responsibilities, knowledge base and systematic planning workflows of a senior Product Manager.
commands:
  - draft-prd(user_stories, acceptance_criteria)
  - run-feature-prioritization(prd_document)
constraints:
  - Must reference files in the optional 'references/' directory (e.g., 'references/agile-standards.md') for all Agile terminology.
---

### Expected Behavior (Examples)

When this skill is matched to a new project request:
  - Input: draft-prd(user_stories, acceptance_criteria)
  - Execution: Loads 'references/agile-standards.md' to define terminology.
  - Output: A structured PRD document based on the internal persona.

External Workflow Execution Skills: Defining the Contract for the Workflow to ‘Do’

Once the groundwork is established and the build begins, the agent’s focus shifts to user-triggered workflows (e.g., after a commit). These skills are guides that help perform specific, measurable steps in the automated pipeline, providing the user with domain-specific results (task-driven).

  • WHAT:SKILL.md definitions for exec-linter-code-analyzerexec-raise-github-pr, or jira-ticket-update.


  • APPLICATION: During the “Build,” “Test” and “Deploy” phases of the SDLC, typically automated by CI/CD events.


  • ARCHITECTURAL FLOW: After a successful code implementation event, the framework activates the exec-linter-code-analyzer-v1 (Code Snippet 3). Claude reads the inputs and expected behavior. The framework executes the decoupled logic (scripts/) to systematically create the pull request, ensuring a reliable result (the PR URL) is provided back to the user’s workflow or CI/CD pipeline.
Code Snippet 3: Sample Procedural SKILL.md (Code Analyzer Workflow)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)
name: exec-linter-code-analyzer-v1
description: Triggers automatically after a code commit event to execute a static analysis and linter scan on the modified files in a specific repository, providing a systematic JSON report.
commands:
  - run-analysis(repository_url, branch)
constraints:
  - Must use a valid GitHub API token with 'repo' scope.
---

### Expected Behavior (Examples)
When this skill is matched following a code commit:
  - Input: run-analysis("https://github.com/org/repo.git", "main")
  - Execution: Loads 'scripts/run_analysis.py'.
  - Output: Linter report JSON.

Internal Agent Operational Skills: Defining the Contract for the Software to ‘Be’

To ensure system stability, the agent software itself requires precise, standardized contracts for core operational tasks (like authentication, state, error handling, api-call, etc). These skills are operational and invisible to the SDLC workflow itself. They focus on the agent’s internal robustness and platform integrity.

  • WHAT: SKILL.md definitions for internal-token-manager or agent-state-historian.


  • APPLICATION: Triggered automatically by the agent’s orchestration layer during defined lifecycle events (e.g., establishing a session state, refreshing an expired 401 token).


  • ARCHITECTURAL FLOW: When any skill requires access to a restricted API, it activates the internal-token-manager (Code Snippet 4). Claude reads the command contract (refresh-token(service_id)). The framework executes the decoupled logic (scripts/) to refresh the secure token, ensuring the agent software can authenticate without creating brittle, direct credential dependencies in the domain-level skills. This internal complexity is hidden from the user but critical for security and robustness.
Code Snippet 4: Sample Procedural SKILL.md (Token Manager)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)
name: internal-token-manager-v1
description: An internal operational skill that triggers throughout a workflow when the agent detects it requires a secure token to authenticate against an external service (e.g., GitHub, Slack, Splunk).
commands:
  - refresh-token(service_id)
constraints:
  - Must use a valid agent credential secret (e.g., 'agent_platform_secret').
  - Tokens must expire after 1 hour.
---

### Expected Behavior (Examples)

When this skill is matched when a GitHub operation requires auth:
  - Input: refresh-token("github_api")
  - Execution: Loads 'scripts/refresh_token.py'.
  - Output: New OAuth token JSON.

The Boundary of Autonomy and the Expertise Gap

While standardizing capabilities via SKILL.md is essential, I believe it is critical for architects to also define where SKILL.md is not the right tool. My own perspective, based on recent project implementation, is that a common architectural failure is expecting SKILL.md to easily encode true Domain Expertise and Heuristic Judgment.

Offloading Heuristics vs. Offloading Wisdom

A well-defined SKILL.md is designed to be precise, measurable and standardized. It excels at offloading common known items, standard checklists and systematic patterns into reliable workflows (as seen in our Code Snippets 3 & 4). In my recent project, this precision made the skills function as excellent fixed checklists, significantly reducing operational ambiguity.

This same precision, however, means it can appear only as a checklist. A procedural skill like exec-linter-code-analyzer can identify a syntax error based on a rule, but I found it often lacked the domain wisdom to understand the conceptual design decision that led to that error.

Assisting Expertise, Not Replacing It

Based on the experience so far, I believe that you cannot easily encode a senior engineer’s years of nuanced design thinking into a SKILL.md description. The true architectural value of a standardized specification is that it offloads the reliable execution complexity, allowing the Human Expert (or a high-level Agentic Persona) to focus entirely on core domain and design reasoning.

For now, I believe following a model where three distinct pillars of knowledge are defined will work out:

  1. Systematic Workflows (Procedural Skills): Handled perfectly by SKILL.md. (The “What to Do”)
  2. Conceptual Frameworks (Persona Mindsets): Setup by SKILL.md. (How Claude “Thinks”)
  3. Domain Wisdom & Design Reasoning: Passed as the problem context in the main prompt. (Why Claude “Decides”)

Engineering Best Practices for SKILL.md Mastery

Achieving systematic capability definition requires adhering to these foundational best practices:

  1. Strict Decoupling: Never place the execution logic (e.g., Python code) directly within the SKILL.md file. The SKILL.md is the specification & the scripts/ directory is the implementation.


  2. Immutability: Once a skill is deployed, treat its metadata (Name, Description, Commands) as immutable. Any significant change requires a new version (e.g., exec-raise-github-pr-v2). Brittleness often stems from changing definitions in place.


  3. Description as a Trigger: Never write a summary description (e.g., “This skill runs a linter”). It must be written as a trigger definition (e.g., “Triggers automatically after a context save event…”). Skill matching depends entirely on this accuracy.


  4. Token Economy: Adhere to strict size constraints: < 500 lines and < 5k tokens for the SKILL.md. The Progressive Disclosure pattern will handle heavier assets, keeping the SID itself focused and parseable.


  5. Git-Managed Context: Treat SKILL.md files as code. They must be version-controlled in Git, promoting discoverability, reuse and providing a traceable history of how capabilities have evolved throughout the lifecycle.

Final Thought: A Standard for Scaling Autonomy

By adopting the SKILL.md specification, we move from fuzzy conversational AI to a structured engineering discipline, where all agent capabilities, whether they are internal operational requirements, external user workflows or conceptual roles framework – all are defined by precise, version-controlled contracts.

This foundation standardizes reliable execution complexity, not only making your automated SDLC predictable and robust but also ensuring that precious domain expertise remains focused on main design decisions, not common patterns. Mastering the SKILL.md standard is the definitive, interoperable foundation for building scalable, maintainable and engineering-grade AgenticAI architectures.

. Sandeep Mewara Github
News Update
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Agentic AI for Beginners: My Journey into Building with Claude
The Great Inversion: Why AI is Moving from Cloud to Desktop

[DOWNLOADskill.md Quick Reference Guide]

.

Kubernetes – Evolution of application deployment

Kubernetes (K8s) is turning out as the cutting-edge of application deployment. It is becoming core to the creation and operation of modern software (few call it as modern SaaS). Thus, I planned to look into it and see what Kubernetes is and how/what application design will help adapt it in the application deployment evolution.

Kubernetes is a portable, extensible, open-source platform for automating deployment, scaling, and management of containerized applications.

History

Google originally designed and open-sourced the Kubernetes project in 2014. Kubernetes has inputs from over 15 years of Google’s experience to run production workloads at scale with best ideas and practices from the community. It is maintained by the Cloud Native Computing Foundation now. It’s current development repository is here.

First challenge …

With modern goal parameters like: recoverability, release cycle time & release frequency – applications need to be designed and deployed in a way that makes them improve year over year.

This leads to first step of breaking the monolith into microservices such that the changes and impact are compartmentalized for easy deployment and recovery.

monolith2microservice

A monolithic application puts all it’s functionality in a single process. In need of scaling, it replicates entire monolith on multiple servers. On the other hand, a microservice architecture separates out (keeps) each functionality into a separate service. Thus in case of scaling need, these services are distributed across servers as required.

Second challenge …

With multiple microservices in play, a variance of stack versions or deployment styles kicks in as trouble. Each team would have their own set of tools, versions to build the artifacts, store them and then deploy them. Thus, different applications/services can have different patterns and network topology. This in turn makes managing security and infrastructure more challenging.

This leads to the step of abstracting infrastructure out to ease maintenance and relieve from security and other infrastructure related concerns.

deployment-progression
Deployment scheme evolution
  • Traditional: Applications running on a physical server. No way to define resource boundaries for applications.
  • Virtualization: Allows to run multiple Virtual Machines (VMs) on a single physical server’s CPU. This leads to better utilization of resources and better scalability as an application can be added or updated easily. Also, if needed, applications can be isolated between different VMs to provide a level of security.
  • Containers: Like VM, it has its own filesystem, CPU, memory, process space, etc. Are environment consistent, easy to scale, portable across clouds and OS distributions. This leads to loosely coupled setup where application is totally decoupled from infrastructure and makes it easy to move towards smaller, modular microservices.

Containers are abstraction to next level. It does not matter on which OS you are on (although there could be different containers for different OS and how they work underlying), all we need is to package our code and needed libraries together, which then runs inside a container based on configured resource need. Docker is an example of container runtime, a packaging software.

Final challenge …

So, the packaging has been simplified and running the application on a single node has been simplified. When we move to enterprise, we need to scale up/down our containers on need basis automatically. Further, one would scale the application to be served from multiple servers instead of just one for better load distribution and easy recovery/fail safe. Now, while distributing the load, we would need to ensure the availability of nodes, resources like space on node for running a container, etc.

This is where Kubernetes pitch in. It acts as a container orchestrator that help provides with a framework to run distributed systems resiliently. It takes care of scaling and failover of containers having application, provides deployment patterns, and more.

kubernetes-architecture

Kubernetes has master-slave architecture where there is one master node and multiple worker nodes. A Pod is the smallest deployable unit in it. In order to run a single container, we would need to create a Pod for that container. A Pod can contain more than one container if those containers are relatively tightly coupled (like a container to download all secret configs related before application starts in other container).

API Server is the heart of the architecture. User interacts with Kubernetes via it and master node communicates to worker nodes through it. Number of containers requested is stored in the etcd (key-value store). Controller acts as a manager that keeps a constant check on the store, schedules the request for scheduler to pick and execute, spins of another worker node in case of need.

Wrap Up …

I have just touched the surface of both containerization and Kubernetes. They seem to have much more and can be explored in depth. Along with vast benefits, it can also bring new challenges on the table with moving to cloud like security and networking.

It was good to know how application design and deployment are evolving, getting abstracted and loosely coupled.

Keep learning!

Reference: https://kubernetes.io/docs/home/

GitHub Readme Samples

Beginner’s Guide to understand Kafka

It’s a digital age. Wherever there is data, we hear about Kafka these days. One of my projects I work, involves entire data system (Java backend) that leverages Kafka to achieve what deals with tonnes of data through various channels and departments. While working on it, I thought of exploring the setup in Windows. Thus, this guide helps learn Kafka and showcases the setup and test of data pipeline in Windows.

Introduction

<kafka-logo>
An OpenSource Project in Java & Scala

Apache Kafka is a distributed streaming platform with three key capabilities:

  • Messaging system – Publish-Subscribe to stream of records
  • Availability & Reliability – Store streams of records in a fault tolerant durable way
  • Scalable & Real time – Process streams of records as they occur

Data system components

Kafka is generally used to stream data into applications, data lakes and real-time stream analytics systems.

<kafka-highlevel-architecture>

Application inputs messages onto the Kafka server. These messages can be any defined information planned to capture. It is passed across in a reliable (due to distributed Kafka architecture) way to another application or service to process or re-process them.

Internally, Kafka uses a data structure to manage its messages. These messages have a retention policy applied at a unit level of this data structure. Retention is configurable – time based or size based. By default, the data sent is stored for 168 hours (7 days).

Kafka Architecture

Typically, there would be multiples of producers, consumers, clusters working with messages across. Horizontal scaling can be easily done by adding more brokers. Diagram below depicts the sample architecture:

kafka-internals

Kafka communicates between the clients and servers with TCP protocol. For more details, refer: Kafka Protocol Guide

Kafka ecosystem provides REST proxy that allows an easy integration via HTTP and JSON too.

Primarily it has four key APIs: Producer API, Consumer API, Streams API, Connector API

Key Components & related terminology
  • Messages/Records – byte arrays of an object. Consists of a key, value & timestamp
  • Topic – feeds of messages in categories
  • Producer – processes that publish messages to a Kafka topic
  • Consumer – processes that subscribe to topics and process the feed of published messages
  • Broker – It hosts topics. Also referred as Kafka Server or Kafka Node
  • Cluster – comprises one or more brokers
  • Zookeeper – keeps the state of the cluster (brokers, topics, consumers)
  • Connector – connect topics to existing applications or data systems
  • Stream Processor – consumes an input stream from a topic and produces an output stream to an output topic
  • ISR (In-Sync Replica) – replication to support failover.
  • Controller – broker in a cluster responsible for maintaining the leader/follower relationship for all the partitions
Zookeeper

Apache ZooKeeper is an open source that helps build distributed applications. It’s a centralized service for maintaining configuration information. It holds responsibilities like:

  • Broker state – maintains list of active brokers and which cluster they are part of
  • Topics configured – maintains list of all topics, number of partitions for each topic, location of all replicas, who is the preferred leader, list of ISR for partitions
  • Controller election – selects a new controller whenever a node shuts down. Also, makes sure that there is only one controller at any given time
  • ACL info – maintains Access control lists (ACLs) for all the topics

Kafka Internals

Brokers in a cluster are differentiated based on an ID which typically are unique numbers. Connecting to one broker bootstraps a client to the entire Kafka cluster. They receive messages from producers and allow consumers to fetch messages by topic, partition and offset.

A Topic is spread across a Kafka cluster as a logical group of one or more partitions. A partition is defined as an ordered sequence of messages that are distributed across multiple brokers. The number of partitions per topic are configurable during creation.

Producers write to Topics. Consumers read from Topics.

<kafka-partition>

Kafka uses Log data structure to manage its messages. Log data structure is an ordered set of Segments that are collection of messages. Each segment has files that help locate a message:

  1. Log file – stores message
  2. Index file – stores message offset and its starting position in the log file

Kafka appends records from a producer to the end of a topic log. Consumers can read from any committed offset and are allowed to read from any offset point they choose. The record is considered committed only when all ISRs for partition write to their log.

leader-follower

Among the multiple partitions, there is one leader and remaining are replicas/followers to serve as back up. If a leader fails, an ISR is chosen as a new leader. Leader performs all reads and writes to a particular topic partition. Followers passively replicate the leader. Consumers are allowed to read only from the leader partition.

A leader and follower of a partition can never reside on the same node.

leader-follower2

Kafka also supports log compaction for records. With it, Kafka will keep the latest version of a record and delete the older versions. This leads to a granular retention mechanism where the last update for each key is kept.

Offset manager is responsible for storing, fetching and maintaining consumer offsets. Every live broker has one instance of an offset manager. By default, consumer is configured to use an automatic commit policy of periodic interval. Alternatively, consumer can use a commit API for manual offset management.

Kafka uses a particular topic, __consumer_offsets, to save consumer offsets. This offset records the read location of each consumer in each group. This helps a consumer to trace back its last location in case of need. With committing offsets to the broker, consumer no longer depends on ZooKeeper.

Older versions of Kafka (pre 0.9) stored offsets in ZooKeeper only, while newer version of Kafka, by default stores offsets in an internal Kafka topic __consumer_offsets

consumer-groups

Kafka allows consumer groups to read data in parallel from a topic. All the consumers in a group has same group ID. At a time, only one consumer from a group can consume messages from a partition to guarantee the order of reading messages from a partition. A consumer can read from more than one partition.

Kafka Setup On Windows

setup-on-windows
Pre-Requisite
Setup files
  1. Install JRE – default settings should be fine
  2. Un-tar Kafka files at C:\Installs (could be any location by choice). All the required script files for Kafka data pipeline setup will be located at: C:\Installs\kafka_2.12-2.5.0\bin\windows
  3. Configuration changes as per Windows need
    • Setup for Kafka logs – Create a folder ‘logs’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this logs folder location in Kafka config file: C:\Installs\kafka_2.12-2.5.0\config\server.properties as log.dirs=C:\Installs\kafka_2.12-2.5.0\logs
    • Setup for Zookeeper data – Create a folder ‘data’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this data folder location in Zookeeper config file: C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties as dataDir=C:\Installs\kafka_2.12-2.5.0\data
Execute
  1. ZooKeeper – Get a quick-and-dirty single-node ZooKeeper instance using the convenience script already packaged along with Kafka files.
    • Open a command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: zookeeper-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties
    • ZooKeeper started at localhost:2181. Keep it running.
      demo-zookeeper
  2. Kafka Server – Get a single-node Kafka instance.
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • ZooKeeper is already configured in the properties file as zookeeper.connect=localhost:2181
    • Execute script: kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties
    • Kafka server started at localhost: 9092. Keep it running.
      demo-kafka
      Now, topics can be created and messages can be stored. We can produce and consume data from any client. We will use command prompt for now.
  3. Topic – Create a topic named ‘testkafka’
    • Use replication factor as 1 & partitions as 1 given we have made a single instance node
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testkafka
    • Execute script to see created topic: kafka-topics.bat --list --bootstrap-server localhost:9092
      demo-topic
    • Keep the command prompt open just in case.
  4. Producer – setup to send messages to the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-producer.bat --bootstrap-server localhost:9092 --topic testkafka
    • It will show a ‘>’ as a prompt to type a message. Type: “Kafka demo – Message from server”
      demo-producer
    • Keep the command prompt open. We will come back to it to push more messages
  5. Consumer – setup to receive messages from the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic testkafka --from-beginning
    • You would see the Producer sent message in this command prompt window – “Kafka demo – Message from server”
      demo-consumer
    • Go back to Producer command prompt and type any other message to see them appearing real time in Consumer command prompt
      kafka-demo
  6. Check/Observe – few key changes behind the scene
    • Files under topic created – they keep track of the messages pushed for a given topic
      topic-files
    • Data inside the log file – All the messages that are pushed by producer are stored here
      topic-log
    • Topics present in Kafka – once a consumer starts reading messages from topic, __consumer_offsets is automatically created as a topic
      topic-present

NOTE: In case you want to choose Zookeeper to store topics instead of Kafka server, it would require following script commands:

  • Topic create: kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testkafka
  • Topics view: kafka-topics.bat --list --zookeeper localhost:2181

With above, we are able to see messages sent by Producer and received by Consumer using a Kafka setup.

When I tried to setup Kafka, I faced few issues on the way. I have documented them for reference to learn. This should also help others if they face something similar: Troubleshoot: Kafka setup on Windows.

One should not encounter any issues with below shared files and the steps/commands shared above.

Download entire modified setup files for Windows from here: https://github.com/sandeep-mewara/kafka-demo-windows

hurray

References:
https://kafka.apache.org
https://cwiki.apache.org/confluence/display/KAFKA
https://docs.confluent.io/2.0.0/clients/consumer.html