There’s been a lot of buzz around Agentic AI lately, especially around how powerful Claude can be when used beyond simple prompting. Naturally, I got curious.
As an architect, I wanted to understand what “agentic” really means in practice. What changes when we move from prompts to agents? And what does that mean for how we design systems? As I started exploring, it became clear that this is about more than smarter chatbots.
From Prompts to Agents: What’s the Difference?
Before diving in, let’s distinguish between Generative AI and Agentic AI.
- Generative AI (Reactive) – Deals with Prompts, where we provide an input and the model provides a one-time response. We are the orchestrator.
- Agentic AI (Proactive) – Deals with Agents, where we provide a goal and the model determines the steps, uses tools and iterates until the goal is met. The model is the orchestrator.
“Agentic” means moving from chatting to delegating. It’s the difference between asking for instructions and having the task completed for you, like getting a recipe vs hiring a chef or asking for directions vs being driven there.
The “Agentic” Starter Pack
I started with the basics to see how Claude handles the “plumbing” of a real-world project. My exploration focused on three core hyped items:
- Agentic Implementation: I moved away from “one-off” prompts and built loops where Claude runs a Plan -> Execute -> Test -> Fix cycle autonomously.
- Model Context Protocol (MCP): I hooked Claude up to my local filesystem, Slack & GitHub. This was to see how the agent “reaches out” and queries the data it needs directly.
- Role-Based Division: I experimented with “Agent Teams” by giving different Claude instances specific roles: one as the Architect to handle planning and another as the Developer to handle implementation. I also tried giving an agent multiple hats, to make work distribution and decision making clearer for it.
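The Plan -> Execute -> Test -> Fix cycle from the first bullet can be sketched as a plain loop. This is a minimal illustration, not the EWA code: `run_cycle`, `Attempt` and the three stub callables are hypothetical names, and the stubs stand in for real model calls and a real test runner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    plan: str
    passed: bool
    feedback: str


def run_cycle(
    goal: str,
    plan: Callable[[str, str], str],           # (goal, last feedback) -> plan
    execute: Callable[[str], str],             # plan -> artifact (e.g. a patch)
    test: Callable[[str], tuple[bool, str]],   # artifact -> (passed, feedback)
    max_iterations: int = 5,
) -> list[Attempt]:
    """Repeat Plan -> Execute -> Test; failed feedback drives the next plan (Fix)."""
    history: list[Attempt] = []
    feedback = ""
    for _ in range(max_iterations):
        current_plan = plan(goal, feedback)
        artifact = execute(current_plan)
        passed, feedback = test(artifact)
        history.append(Attempt(current_plan, passed, feedback))
        if passed:
            break
    return history


# Hypothetical stubs standing in for model calls and a test runner.
def stub_plan(goal: str, feedback: str) -> str:
    return f"{goal} (revised: {feedback})" if feedback else goal


def stub_execute(plan_text: str) -> str:
    return plan_text.upper()


def stub_test(artifact: str) -> tuple[bool, str]:
    # Passes only once the plan has been revised at least once.
    return ("REVISED" in artifact, "handle timeouts")


history = run_cycle("add retry logic", stub_plan, stub_execute, stub_test)
print([attempt.passed for attempt in history])  # [False, True]
```

The `max_iterations` cap matters: without it, a test that never passes would keep the loop running indefinitely.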
My Learning Project: Endpoint Watch Agent (EWA)
The goal of this project was to build a hands-on learning kit for agentic systems. Endpoint Watch Agent (EWA) is a Python-based agent that continuously monitors configurable endpoints (websites or APIs). When an endpoint is down or unhealthy, the agent autonomously evaluates the incident, avoids duplicate alerts, creates a ticket and sends a contextual Slack notification.
[Figure: Flow diagram plan]
Structuring the Workflow
Starting from nothing, I worked with Claude itself to set up the structure and separate the components so that each has a single responsibility. To keep things simple, I created a single agent (the Orchestrator) that runs one sequential loop: Check Endpoint 1 → Decide → Act → Check Endpoint 2 → Decide → Act...
The PolicyEngine is not an agent but a pure function called by the agent. The tools are interfaces that the agent dispatches to, while the MCP servers are external services.
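To make “a pure function called by the agent” concrete, here is a rough sketch of what such a policy function could look like. The names (`Action`, `CheckResult`, `decide`) and the `slow_ms` threshold are assumptions for illustration and are not taken from the EWA repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"      # endpoint is healthy, nothing to do
    ALERT = "alert"        # new incident: create a ticket and notify
    SUPPRESS = "suppress"  # incident already known: avoid a duplicate alert


@dataclass(frozen=True)
class CheckResult:
    url: str
    status_code: int | None  # None means the endpoint did not respond
    response_ms: float


def decide(
    result: CheckResult,
    open_incidents: frozenset[str],
    slow_ms: float = 2000.0,  # assumed threshold for "unhealthy"
) -> Action:
    """Pure decision: same inputs always give the same output, no I/O."""
    healthy = result.status_code == 200 and result.response_ms <= slow_ms
    if healthy:
        return Action.IGNORE
    if result.url in open_incidents:
        return Action.SUPPRESS
    return Action.ALERT


down = CheckResult("https://api.example.com/health", 503, 120.0)
print(decide(down, frozenset()))            # Action.ALERT
print(decide(down, frozenset({down.url})))  # Action.SUPPRESS
```

Because the function touches no network, files or clock, it can be unit-tested exhaustively without mocking Slack, Jira or the endpoints themselves.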
Explore or build on the Project available here: [Github Link]
What I Learnt: The “Pro” Framework
The real breakthrough wasn’t the model itself, but how I structured the project to guide it. The project structure I ended up with can serve as a baseline architectural template for agentic development. The pattern keeps a clean separation of concerns, so we can add new tools, policy rules or tests without restructuring the entire system.
As a production-ready baseline though, it has gaps: no tests, single-threaded endpoint checking, no metrics, no graceful shutdown. These are solvable without rethinking the architecture, but they’d need to be added before shipping anything real.
I found that the following four pillars are essential for any agentic workflow:
CLAUDE.md (The Project Brain)
This file lives in the root of your repo as the AI’s operating manual. It tells the Claude agent who it is and how it should behave in this specific codebase, so each session starts with shared context instead of inferring everything from scratch.
# Project Context: Endpoint Watch Agent (EWA)
## Role & Mission
You are the **EWA Specialist**. Your goal is to maintain a high-availability monitoring system. You prioritize accuracy in incident detection and clarity in Slack notifications.
## Tech Stack
- **Runtime:** Python 3.12
- **Logic:** Policy-based reasoning (PolicyEngine)
- **Integrations:** Slack (Alerts), Jira (Tickets), GitHub (MCP)
## Architecture Rules
- **Separation of Concerns:** Keep tools in '/tools', logic in '/engine'.
- **Async First:** Use 'asyncio' for all network-bound endpoint checks.
- **No Deletions:** Never delete incident logs, only archive or update status.
## Dev Commands
- **Run:** 'python main.py'
CLAUDE.md is the interface between the human who designed the system and the AI that extends it. It is not documentation for users of the tool; it is documentation for the next builder, human or AI.
SKILLS.md (The Capability Manual)
While CLAUDE.md is about the project, SKILLS.md is about what the agent is capable of doing. It provides pre-verified “recipes” for complex tasks, stopping the agent from hallucinating its own (often broken) logic.
# Agent Skills
## Skill: Incident Evaluation
- **When:** An endpoint returns a non-200 status.
- **Action:**
1. Check 'storage/incidents.json' for active tickets.
2. If new, invoke the 'JiraTool' to create a "Critical" task.
## Skill: Slack Formatting
- **Constraint:** Always include the Status Code, Response Time, and the "Runbook Link" from the configuration file.
- **Tone:** Professional and urgent.
These are the procedural instructions or documentation that teach the agent how to use a tool effectively in a specific context.
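A skill like “Slack Formatting” can also be backed by a small helper, so the constraint is enforced in code rather than left to the model. The sketch below is an assumption about how that might look; `format_slack_alert` and its parameters are hypothetical, and it only builds the message text without calling Slack.

```python
from __future__ import annotations


def format_slack_alert(
    url: str,
    status_code: int | None,
    response_ms: float,
    runbook_url: str,
) -> str:
    """Build alert text that always carries the three required fields."""
    status = "no response" if status_code is None else str(status_code)
    return "\n".join(
        [
            f"URGENT: endpoint unhealthy: {url}",
            f"Status Code: {status}",
            f"Response Time: {response_ms:.0f} ms",
            f"Runbook Link: {runbook_url}",
        ]
    )


message = format_slack_alert(
    "https://api.example.com/health", 503, 1840.2, "https://wiki.example.com/runbook"
)
print(message)
```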
“Plan, then Execute” Workflow
I stopped asking Claude to “just do it”. Instead, I enforced a mandatory approval gate between planning and execution:
- The Plan: Claude must output a step-by-step technical plan first.
- The Approval: I review the plan for architectural alignment.
- The Execution: Only after approval does the agent start writing code. In my experience, this eliminates roughly 90% of the “rabbit holes” agents often fall into.
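The gate above can be sketched as a small wrapper that refuses to execute anything until a plan has been approved. This is an illustrative sketch only: `PlanGate` and `PlanNotApproved` are hypothetical names, and the `approve` lambda is a stand-in for a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable


class PlanNotApproved(Exception):
    """Raised when the reviewer rejects the proposed plan."""


@dataclass
class PlanGate:
    propose: Callable[[str], list[str]]   # task -> ordered plan steps
    approve: Callable[[list[str]], bool]  # reviewer decision on the whole plan
    execute: Callable[[str], str]         # run a single approved step

    def run(self, task: str) -> list[str]:
        steps = self.propose(task)
        if not self.approve(steps):
            # Nothing is executed unless the plan passes review.
            raise PlanNotApproved(f"plan for {task!r} was rejected")
        return [self.execute(step) for step in steps]


executed: list[str] = []


def record_step(step: str) -> str:
    executed.append(step)
    return f"done: {step}"


gate = PlanGate(
    propose=lambda task: [f"write failing test for {task}", f"implement {task}"],
    approve=lambda steps: len(steps) <= 5,  # stand-in for a human reviewer
    execute=record_step,
)
results = gate.run("retry logic")
print(results)
```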
Verification Criteria
Never ask an agent to “fix a bug”. Instead, ask it to “Fix the bug and provide the specific CLI command or test case to verify the fix”. It seems an agent that knows it has to prove its work is significantly more accurate and less likely to hallucinate a “done” state!
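One way to hold the agent to this is to treat a fix as “done” only when the verification command it supplied actually succeeds. The sketch below assumes the agent reports back a command as a list of arguments; `FixReport` and `is_verified` are hypothetical names, not part of any Claude tooling.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class FixReport:
    summary: str
    verify_command: list[str]  # command the agent claims proves the fix


def is_verified(report: FixReport, timeout_s: float = 60.0) -> bool:
    """Accept a fix only if its verification command exits with code 0."""
    if not report.verify_command:
        return False  # no proof offered, so the fix is not accepted
    try:
        completed = subprocess.run(
            report.verify_command,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # a hanging or unrunnable command counts as unverified
    return completed.returncode == 0


# The current interpreter stands in for a real test command here.
good = FixReport("fixed off-by-one", [sys.executable, "-c", "assert 1 + 1 == 2"])
bad = FixReport("claims done", [sys.executable, "-c", "raise SystemExit(1)"])
print(is_verified(good), is_verified(bad))  # True False
```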
What I Learnt: Behavioral System Design
EWA is structured like an agent: it has a brain (orchestrator), reasoning (policy engine), senses (endpoint checker), hands (Jira + Slack tools) and memory (incident store).
Thus, moving beyond simple monitoring, this system creates a truly agentic closed loop: it observes, reasons, decides, acts and remembers, closing the gap between detection and autonomous resolution. This is what differentiates a single prompt from a system that operates.
If designed properly, the orchestrator never does anything directly. It asks tools to observe, asks the policy engine to reason, then dispatches to tools based on the decision. Every component has one job and knows nothing about the others.
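A minimal sketch of this wiring, assuming each component is hidden behind a small interface. All class and method names here are hypothetical, and the fakes replace the real endpoint checker, Jira and Slack tools.

```python
from typing import Protocol


class Checker(Protocol):
    def check(self, url: str) -> int: ...


class Policy(Protocol):
    def decide(self, status: int, already_open: bool) -> str: ...


class IncidentStore(Protocol):
    def is_open(self, url: str) -> bool: ...
    def mark_open(self, url: str) -> None: ...


class AlertTool(Protocol):
    def send(self, url: str, status: int) -> None: ...


class Orchestrator:
    """Coordinates components; performs no checks, decisions or alerts itself."""

    def __init__(self, checker: Checker, policy: Policy,
                 store: IncidentStore, tools: list[AlertTool]) -> None:
        self.checker, self.policy, self.store, self.tools = checker, policy, store, tools

    def run_once(self, urls: list[str]) -> list[str]:
        decisions = []
        for url in urls:
            status = self.checker.check(url)                                # observe
            decision = self.policy.decide(status, self.store.is_open(url))  # reason
            if decision == "alert":                                         # act
                for tool in self.tools:
                    tool.send(url, status)
                self.store.mark_open(url)                                   # remember
            decisions.append(decision)
        return decisions


class FakeChecker:
    def __init__(self, statuses: dict[str, int]) -> None:
        self.statuses = statuses

    def check(self, url: str) -> int:
        return self.statuses[url]


class SimplePolicy:
    def decide(self, status: int, already_open: bool) -> str:
        if status == 200:
            return "ignore"
        return "suppress" if already_open else "alert"


class MemoryStore:
    def __init__(self) -> None:
        self.open_urls: set[str] = set()

    def is_open(self, url: str) -> bool:
        return url in self.open_urls

    def mark_open(self, url: str) -> None:
        self.open_urls.add(url)


class RecordingTool:
    def __init__(self) -> None:
        self.sent: list[tuple[str, int]] = []

    def send(self, url: str, status: int) -> None:
        self.sent.append((url, status))


tool = RecordingTool()
agent = Orchestrator(
    FakeChecker({"https://a.example": 200, "https://b.example": 503}),
    SimplePolicy(), MemoryStore(), [tool],
)
first = agent.run_once(["https://a.example", "https://b.example"])
second = agent.run_once(["https://b.example"])
print(first, second)  # ['ignore', 'alert'] ['suppress']
```

Since the orchestrator only sees the interfaces, swapping the fake tool for a real Slack or Jira client would not change `run_once`.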
Thus, with agentic systems, we start to define goals, shape decision boundaries, orchestrate tools and design workflows. The unit of design has moved from “What does this function do?” to “How does this system behave over time?”. This is very different and is a significant mindset shift.
What I Learnt: The Operational Reality
This is where Agentic AI gets interesting and, at the same time, risky. Agentic systems are not just more capable; they are also more complex to reason about.
What’s Exciting (The Wins)
- Self-Healing Workflows: Automation of operational tasks where systems can adapt to minor changes instead of simply breaking
- Engineering Velocity: Drastic reduction in manual intervention for complex, multi-file refactors
What’s Hard (The Risks)
- Observability & Non-Linear Debugging: Traditional logs don’t help much when an agent enters a logic loop. It becomes difficult to answer: “Why did the agent choose this specific tool at this specific time?” Tracking these non-linear flows requires a completely different observability stack.
- Guardrails & Cost: Without structural “circuit breakers”, agents can enter recursive loops that transform a technical logic error into a financial one. In an agentic world, unguided autonomy doesn’t just crash a service, it can drain token budgets in minutes.
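A structural “circuit breaker” of the kind mentioned above can be as simple as a counter that every agent step must pass through. The sketch below is one possible shape, with hypothetical names (`CircuitBreaker`, `BudgetExceeded`) and made-up limits; real token counts would come from the model API’s usage data.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the agent exceeds its step or token budget."""


@dataclass
class CircuitBreaker:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def record(self, tokens_used: int) -> None:
        """Call once per agent step; raises as soon as a limit is crossed."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


breaker = CircuitBreaker(max_steps=10, max_tokens=1000)
completed_steps = 0
try:
    while True:  # stands in for an agent stuck in a loop
        breaker.record(tokens_used=400)
        completed_steps += 1
except BudgetExceeded as exc:
    print(f"stopped after {completed_steps} steps: {exc}")
```

Here the third call pushes the total to 1200 tokens, so the loop stops after two completed steps instead of running until the budget is gone.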
What I Learnt: The Shift to “Specification of Judgment”
The biggest realization was the shift in our roles: The engineer’s job is becoming the specification of judgment.
We are moving away from writing line-by-line code and towards translating domain knowledge (e.g., don’t auto-close the Jira ticket on recovery; leave that to humans), operational experience (e.g., what if the MCP server subprocess hangs instead of failing?) and trust calibrations (e.g., trust the agent to send Slack alerts without human review: yes) into rules the agent can follow.
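These three kinds of judgment can be written down as explicit, reviewable settings rather than buried in prompts. The sketch below is only one way to do that: the `JudgmentRules` fields mirror the examples in the paragraph above, while the field names, helper functions and the 30-second timeout are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgmentRules:
    # Domain knowledge: recovery does not close tickets; humans do.
    auto_close_ticket_on_recovery: bool = False
    # Operational experience: treat a hung MCP call as a failure (assumed value).
    mcp_call_timeout_s: float = 30.0
    # Trust calibration: Slack alerts may go out without human review.
    slack_alert_needs_human_review: bool = False


def may_close_ticket(rules: JudgmentRules, actor: str) -> bool:
    """Humans may always close tickets; the agent only if the rules allow it."""
    return actor == "human" or rules.auto_close_ticket_on_recovery


def may_send_slack_alert(rules: JudgmentRules, human_approved: bool) -> bool:
    """Alerts go out directly unless the rules demand a human sign-off."""
    return human_approved or not rules.slack_alert_needs_human_review


rules = JudgmentRules()
print(may_close_ticket(rules, "agent"))                    # False
print(may_send_slack_alert(rules, human_approved=False))   # True
```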
Claude handles the execution, but its success depends on our ability to articulate why a system should behave a certain way, not just what it should do. This requires architectural experience to anticipate what could go wrong and the clarity to express those constraints precisely.
Final Thoughts: The Evolution of How We Build
It’s only a matter of time. While the technical risks are real today, the pace of advancement is blistering. We are witnessing a total paradigm shift: we aren’t just writing code anymore; we are managing a digital workforce.
For architects, this means rethinking system boundaries. For developers, it means thinking in workflows. I am excited to adapt! This isn’t the evolution of standard coding but the evolution of how we build.



