Agentic AI for Existing Codebases: A Practical Path to Getting Started

In the current engineering landscape, there is unrelenting pressure to chase the “new”. Our LinkedIn feeds are dominated by AI-native learnings, startups and autonomous agents building entire applications from a single prompt in days. For many of us, this creates a strange disconnect.

Most engineers aren’t working on greenfield AI experiments. They are responsible for systems that have been running for five, ten or even fifteen years. These are the stable, revenue-generating engines that form the backbone of successful businesses. They are battle-tested, high-stakes and complex.

If you are maintaining one of these systems, it is easy to assume the Agentic AI Wave isn’t meant for you. You might look at your unique architectural patterns or your “legacy” constraints and conclude that an AI agent simply wouldn’t understand.

I’d offer a different perspective: These tools are most transformative in the systems you already understand deeply. You haven’t missed the wave; you are simply waiting for the right entry point.

From Manual Assistance to Actual Leverage

You might not have integrated AI into your workflow yet. Many teams have, and those who have started likely use it for tactical tasks: explaining an obscure regex, generating a unit test for a utility function or writing a quick bash script.

This is a significant step forward, but it remains manual and reactive. Using AI this way is like hiring a brilliant senior consultant but refusing to give them a badge, documentation or context. You spend half your mental energy explaining the “why” before they can even start on the “how”.

When you move toward Agentic AI, where you allow an agent to navigate your repository and suggest multi-file changes, this lack of context becomes a technical liability. Without a “Project Constitution”, the agent is forced to make guesses. Usually, it will:

  • Default to modern “generic” patterns that are incompatible with your specific tech stack.
  • Miss hidden architectural constraints decided years ago for specific performance or security reasons.
  • Suggest “best practice” refactors that look correct in isolation but break your production logic.

The result isn’t just a failed task; it’s wasted time and unnecessary token burn.

The Missing Piece: Contextual Onboarding

Agentic AI doesn’t fail because it lacks power. It fails because it lacks context. Much of your system’s “source of truth” doesn’t actually live in the code. It lives in your head, in tribal memory, in wikis or buried in old Jira or PR descriptions.

The goal isn’t to “teach” the AI everything. It is to provide a minimalist, structured map that allows the agent to operate safely within your boundaries.

The same idea applies to structured systems of any kind, such as operations workflows, data pipelines and internal tools. Whether it’s code, processes or documentation, the moment you define the rules clearly, the quality of output improves dramatically.

A Practical Starting Point: The claude.md

You don’t need a massive infrastructure change to begin. You can start by creating a claude.md file in your project root (Claude Code itself looks for the uppercase name CLAUDE.md, which matters on case-sensitive filesystems). This is your “Project Constitution”, a system guide. It should be precise, technical and grounded in reality.

Start simple. An example claude.md:

# Project Guidelines

## Tech Stack
- Node.js 16
- Express
- MongoDB

## Rules
- Do not upgrade dependencies unless asked
- Follow the existing folder structure
- Write tests using Jest

## Notes
- This is a legacy system, avoid large refactors

That’s it. No perfection needed to start. By spending fifteen minutes defining these boundaries, you give the agent more leverage than most teams currently provide. You can refine it over time.

Expanding the Framework: Skills

Once your “Constitution” is set, you can begin defining Skills via a skills.md file. While the claude.md is global, Skills are modular playbooks for recurring workflows.

For example, if you frequently ask the agent to “Add a new API endpoint” or “Migrate a component to TypeScript”, you should document the exact steps those tasks require in your specific environment. These act as repeatable playbooks that reduce the back-and-forth and ensure the agent follows your team’s established SOPs (Standard Operating Procedures) when needed.

A Mentor in Your Pocket: Codex-Claude

As you begin to rely more on these agents, you’ll find that “Instruction Engineering” is a skill in itself. If your agent is still going off-track, the issue is almost always an ambiguity in your instructions.

This is why I have been developing Codex-Claude. Think of it as a Linter for your Agentic Strategy. Just as a code linter catches syntax errors, Codex-Claude analyzes your claude.md and skills.md to catch “intent errors”.

The tool helps you with:

  • Automated Architectural Audit: Instantly evaluates your files against best practices and provides a weighted score across structure, specificity and completeness
  • Precision Refactoring & Compaction: Identifies ambiguity and redundancy, rewriting instructions to be more concise and context-efficient
  • Intelligent Conflict Resolution: Detects contradictions and instruction drift, ensuring rules are placed correctly – either within global rules or specialized skill files
  • Progressive Learning Loop: Turns every optimization into a learning opportunity by explaining the “why” behind each change and linking it to official documentation

You don’t need this to get started, but it helps once you begin refining your setup for more complex tasks.

You can explore and try it out LIVE here: https://sandeep-mewara.github.io/codex-claude/

Watchouts

As you start this journey, keep these three principles in mind:

  • Be precise, not verbose: Every line is context the agent must process. Clear constraints beat long explanations
  • Use tests as safety rails: The agent provides speed. Your test suite provides safety. Never accept changes that have not passed your CI/CD baseline
  • Iterate on Instructions: If an agent fails a task, it likely misunderstood something. Treat it as a bug in your claude.md and fix the instruction

The Architect’s Path Forward

The expectation for delivery speed in our industry is fundamentally shifting. However, adopting Agentic AI isn’t about “coding faster”; it’s about reducing the mental tax of working with mature, complex systems.

You don’t need a new project or deep AI expertise to benefit from this. You just need to start small:

  1. Select one module or one feature
  2. Draft a simple claude.md that defines that module’s rules
  3. Run one task with an agent and observe the difference

The systems that power today’s businesses don’t need to be replaced. They just need the right leverage to move into the future.

. Sandeep Mewara Github
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Architects’ Evolution in the Age of Autonomous AI
Agentic AI for Beginners: My Journey into Building with Claude
The Great Inversion: Why AI is Moving from Cloud to Desktop


The Lifecycle Is the Product: AI Development Engine

Every team eventually finds itself rebuilding the same foundational setup in every project. In many organizations, this is still a manual struggle. We write “how we work” docs, define naming conventions and establish review gates that live only in our wikis. For teams already leveraging AI, this setup often exists in isolated pockets like a collection of disconnected prompts telling an assistant to “act as a PM” or “design like an architect”.

In both cases, the expertise remains trapped in silos. For those working manually, the immediate opportunity is to use AI skills to jump-start their specific tasks. But once you do, you quickly reach a plateau – while individual skills and prompts have become portable, the lifecycle around them has not.

That gap is what the Lifecycle Agent Orchestrator (LAO) tries to close. It’s a plugin for Claude Code and Cursor that uses multi-agent orchestration to ship the development process itself as a versioned, overridable artifact. Not just the individual role skills. The stitching between them.

The Problem: Skills Are Portable, Process Is Not

Current AI tools offer impressive specialized skills – performing architecture reviews or enforcing testing conventions with high precision. However, these tools still operate in isolation.

Modern software delivery doesn’t happen in a vacuum. It flows through a series of high-stakes handoffs. This is where even the most advanced teams hit a wall. Despite the promise of automation, the broader lifecycle remains stubbornly manual:

  • The Cognitive Load of Fragmentation: Engineers must still manually orchestrate which tools to run and when, creating a massive tax on context.
  • Traceability Decay (Drift): We lose the “intent” of a feature as it travels from a product ticket to a design mock and finally into the codebase.
  • Simulated Handoffs: We still rely on manual “persona-switching” – manually checking if a design works for a developer or if an architecture suits a product goal.
  • The Proof Problem: At the point of release, we still rely on assumptions rather than programmatic proof that we’ve satisfied every original requirement.

LAO moves beyond single-prompt interactions by employing multi-agent orchestration to bridge the high-stakes handoffs between roles.

The skills are the actors. The lifecycle is the director.

Step 1: Jump-Start Your Work with AI Skills

The quickest way to see value isn’t by changing your entire workflow. It’s by using individual skills to improve the work you’re already doing.

Each role in the plugin is independently usable. You don’t need the full pipeline to get an immediate win:

  • Engineers can use the Intake skill to turn a messy Jira ticket into a clean scope with real acceptance criteria.
  • PMs can use the Product Management skill to draft a structured PRD.
  • Architects can define a technical design by running the Architecture skill against a requirement.

# Direct use of individual skills
Invoke code-review skill to review these changes
Invoke intake skill to extract scope from PROJ-5678

This “Step 1” approach pays back immediately.

You get a senior-level assistant for specific tasks without committing to a new workflow.

Step 2: The Orchestrator as the Director

Over time, this pattern exposes a critical bottleneck. We find ourselves questioning: Which skill comes next? Did we skip a step? Are we aligning roles or just checking boxes? Here, we hit the ceiling of isolated tools.

Once you trust the individual skills, the orchestrator stitches them together into a cohesive system.

# command inside a Claude Code or Cursor session
# Direct Jira story ingestion
/lao Work on PROJ-1234

# An ungrounded requirement
/lao Add a user notification preferences API endpoint

# a tire-kick before committing to anything
/lao-dry-run

Through multi-agent orchestration, LAO ensures that the PM, Designer and Architect personas actively review and challenge each other’s outputs.

The goal isn’t just automation – it’s coordination.

Core Internals

The following are a few key design decisions that power the LAO.

The Nine-Phase Engine

The pipeline is nine phases, structured into two halves with different personalities:

[Figure: the nine phases of the agentic lifecycle]

  1. Alignment (Phases 1–3): Product, Design and Architecture align early. They cross-review every output to catch gaps before engineers write a single line of code.
  2. Execution (Phases 4–9): Once the team establishes alignment, the system drives the project through scope, design, planning, implementation, validation and shipping.

The key shift is simple: Alignment happens once, upfront. Execution happens without rework.
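
To make the two halves concrete, here is a minimal sketch of the phase numbering. The phase names are my shorthand inferred from the descriptions above, not identifiers taken from the plugin, and the approval flag stands in for the checkpoint each phase ends with.

```python
from enum import Enum


class Half(Enum):
    ALIGNMENT = "alignment"   # Phases 1-3
    EXECUTION = "execution"   # Phases 4-9


# Hypothetical names, one per phase, in pipeline order.
PHASES = (
    "product", "design", "architecture",
    "scope", "tech_design", "plan",
    "implement", "validate", "ship",
)


def half_for(phase_number: int) -> Half:
    """Return which half of the pipeline a 1-based phase number belongs to."""
    if not 1 <= phase_number <= len(PHASES):
        raise ValueError(f"phase must be between 1 and {len(PHASES)}")
    return Half.ALIGNMENT if phase_number <= 3 else Half.EXECUTION


def next_phase(current: int, approved: bool) -> int:
    """Advance only when the current phase's checkpoint was approved."""
    half_for(current)  # validates the phase number
    if approved and current < len(PHASES):
        return current + 1
    return current
```

A rejected checkpoint keeps the pipeline on the same phase, which is the behaviour the “request changes” path implies.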

Project-Specific Infrastructure

Overlays let you define how your specific system works – ensuring your project remains the domain authority. Under the hood, each phase composes up to three layers of knowledge:

| Layer | Lives in | Contains |
|---|---|---|
| Base | Plugin | Universal rules for the role |
| Overlay | Project | Project-specific patterns, stack, conventions |
| Domain | Project | Cross-cutting domain knowledge (auth, payments, compliance) |

A project looks like this once it’s connected:

[Figure: LAO overlay directory structure]


If a project already has its architecture docs scattered across docs/, there’s no need to move anything. A lao.config.yaml at the project root maps existing files into the engine:

project_name: my-app
languages: [python, react]

overlays:
  architecture: docs/architecture/standards.md
  coding-standards: .cursor/rules/coding.md

domain:
  - docs/domain/*.md
  - src/payments/DESIGN.md

extra_roles:
  compliance-review: tools/compliance/SKILL.md

There are two discovery paths – the convention directory or the config file. If both exist, the config file wins because project-specific overlays take priority.

This is the project respecting itself as the domain authority.
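
The precedence rule can be sketched in a few lines. This assumes the config file wins per role rather than replacing the convention directory wholesale, which the text above does not spell out, and the convention paths used in the usage note are invented for illustration.

```python
from typing import Dict, Tuple


def resolve_overlays(
    convention_files: Dict[str, str],
    config_files: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Merge both discovery paths; a config entry wins over a convention entry."""
    resolved: Dict[str, Tuple[str, str]] = {}
    for role, path in convention_files.items():
        resolved[role] = ("convention", path)
    for role, path in config_files.items():
        # Applied second, so the config file overrides the same role.
        resolved[role] = ("config", path)
    return resolved
```

For example, with a hypothetical convention entry of {"architecture": ".lao/overlays/architecture.md"} and a config entry mapping architecture to docs/architecture/standards.md, the resolved source for architecture is the config path.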

Preview, Then Execute

Every run begins in simulation – a preview of the nine-phase pipeline that writes no files, creates no branches and posts no Jira comments. The orchestrator walks through Phases 1–6, simulating execution to produce realistic PhaseOutput objects and checkpoints for your iteration. It then summarizes Phases 7–9 as projected outcomes, as these require real code execution.

When you’re ready, you say proceed and the pipeline replays – but with the preview’s decisions carried forward instead of regenerated:

[Figure: preview-then-execute skills workflow]


The system reduces both the upfront cost and the risk of committing to a flawed plan.

Acceptance Criteria, Tracked Across Phases

The system captures acceptance criteria during Intake and tracks them through to Validation, where you must prove each one with recorded evidence to unlock the “Ship” gate. Every phase emits a PhaseOutput, a structured object with a defined schema. The CLI renders this data as text today, but a dashboard could render it visually tomorrow without requiring any changes to the engine.

--- Phase: Tech Design (Phase 5 of 9) ---
Status: Needs Approval

SUMMARY:
  Add rate limiting middleware to API gateway.
  No new dependencies, config-driven thresholds.

ARTIFACTS:
  - [design_doc] docs/design/rate-limiting.md

ACCEPTANCE CRITERIA (tracked):
  AC1: Rate limit of 100 req/min/user ...... pending
  AC2: Returns 429 with retry header ....... pending
  AC3: Configurable per environment ........ pending

→ Approve to proceed to Plan or request changes.

No claims without fresh proof – that’s the whole point of the validation gate.
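
A structured phase output and the ship gate might look roughly like the sketch below. The field names mirror the text rendering above, but the real PhaseOutput schema is not shown in this post, so treat every attribute here as an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AcceptanceCriterion:
    id: str
    text: str
    evidence: Optional[str] = None  # e.g. a reference to a recorded test run

    @property
    def status(self) -> str:
        return "proven" if self.evidence else "pending"


@dataclass
class PhaseOutput:
    phase: str
    number: int
    status: str
    summary: str
    artifacts: List[str] = field(default_factory=list)
    criteria: List[AcceptanceCriterion] = field(default_factory=list)


def can_ship(output: PhaseOutput) -> bool:
    """The gate opens only when every tracked criterion has recorded evidence."""
    return bool(output.criteria) and all(c.evidence for c in output.criteria)
```

Because the gate reads evidence rather than a free-text status, a criterion cannot be marked done by assertion alone.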

Multi-Language, Without a Fork Per Language

Four skills need to know what language they’re looking at: coding-standards, testing-conventions, code-review and security. Each has a universal base and a language pack for the specifics:

[Figure: LAO multi-language support]


Currently, the plugin supports Python, Java, C# and React. Detection runs once at pipeline start: if lao.config.yaml lists languages, use them; otherwise scan for pyproject.toml, pom.xml, *.csproj and a package.json with a React dependency, and collect every match. A full-stack repo auto-detects as [python, react] and both packs get loaded. The agent applies each to the right file types.

Adding a new language (Go, Rust, anything) means creating a references/<language>/ directory in those four skills with the expected files, plus a couple of lines in the detection and validation scripts.

No change to the universal base. That separation is worth preserving.
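
The detection rule described above can be approximated as follows. This is my own sketch rather than the plugin’s detection script: the language labels "java" and "csharp" are assumed, and a real implementation would skip vendored directories such as node_modules.

```python
import json
from pathlib import Path
from typing import List, Optional


def _has_react(package_json: Path) -> bool:
    """True when a package.json lists react as a dependency."""
    try:
        data = json.loads(package_json.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    return "react" in deps


def detect_languages(root: Path, configured: Optional[List[str]] = None) -> List[str]:
    """Configured languages win; otherwise scan for marker files and collect every match."""
    if configured:
        return list(configured)
    found: List[str] = []
    if any(root.rglob("pyproject.toml")):
        found.append("python")
    if any(root.rglob("pom.xml")):
        found.append("java")
    if any(root.rglob("*.csproj")):
        found.append("csharp")
    if any(_has_react(p) for p in root.rglob("package.json")):
        found.append("react")
    return found
```

Supporting another language in this sketch means adding one more marker check, which matches the “couple of lines in detection” cost mentioned above.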

Role vs. Workflow Split

The design deliberately separates Phases 1–5 (Role-based) from Phases 6–9 (Execution-based).

Phases 1–5 (Roles): These phases use individual skill files (PM, XD, Architecture, Intake) through multi-agent orchestration because judgment varies by project. A fintech audit requires different logic than a game engine pipeline, so project overlays merge with these base skills to provide local context.

Phases 6–9 (Workflows): These phases power the orchestrator’s core engine (TDD, validation, shipping) and maintain tight coupling for continuity. Unlike roles, workflows use substitutions. If you override a workflow, such as swapping TDD for BDD, the new logic replaces the built-in engine entirely rather than layering on top of it.

# Override the workflow for a single phase
workflows:
  # BDD instead of TDD
  implement: docs/workflows/our-bdd-process.md

  # custom release flow
  ship: docs/workflows/our-release-process.md

Separate judgment from execution to protect flexible strategy without sacrificing delivery.
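
The difference between layering and substitution is easy to show side by side. The two functions below are a hypothetical sketch of the semantics described above, operating on plain strings instead of real skill files.

```python
from typing import List, Optional


def compose_role(base: str, overlay: Optional[str], domain: List[str]) -> str:
    """Roles layer: universal base first, then the project overlay, then domain notes."""
    layers = [base]
    if overlay:
        layers.append(overlay)
    layers.extend(domain)
    return "\n\n".join(layers)


def resolve_workflow(builtin: str, override: Optional[str]) -> str:
    """Workflows substitute: an override replaces the built-in text entirely."""
    return override if override is not None else builtin
```

With a role, the base rules are still present after composition. With a workflow, nothing of the built-in TDD text survives once a BDD override is supplied.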

When to Use This and When Not

The Sweet Spot: Use this if you pair Claude Code or Cursor with Jira/PRD-driven intake. It excels for teams that front-load design and track ACs to the finish line. The engine treats multi-language and monorepos as first-class citizens, using config-based discovery to navigate complex structures.

The Breaking Point: Avoid this for ad-hoc work lacking tickets or defined ACs. The fit weakens if your “ship” phase involves unmodeled complexity, such as mobile store submissions, or if you require unattended, autonomous execution. The plugin is designed as a human-in-the-loop engine. It doesn’t chase full autonomy yet.

What Changes When You Adopt This?

The most immediate change is practical: Your development process leaves the wiki and enters your repository as a versioned artifact.

But adoption doesn’t have to be a cliff. You start by using individual skills (single agent) to improve local tasks. As you build trust, you let the orchestrator (multi-agent) handle the parts that are hardest to do manually – the handoffs, the alignment and the validation.

Over time, the shift becomes structural:

  • Handoffs become explicit gates
  • Requirements become traceable
  • Validation becomes evidence-driven
  • The lifecycle becomes consistent

I expect the next iteration of the tool to automate this entire flow. For now, we must build it by hand – or, more precisely, install it.

Closing Thought

This isn’t about replacing how teams work. It’s about making how they work explicit and reliable.

The lifecycle stops being something you document. It becomes something you execute, and once that happens, it’s no longer just process.

It’s part of the product.


Repository & Contribution

The Lifecycle Agent Orchestrator (LAO) is available as an open-source project. I encourage you to explore the repository, use the individual skills to jump-start your own work and contribute to the evolution of a portable development engine.

In practice, this kind of artifact only gets better when it’s applied across different projects and constraints.


Architects’ Evolution in the Age of Autonomous AI

Lately, I’ve been watching the “3X World” move from a concept to a daily reality. In a recent project, AI allowed me to iterate through architectural options and tech stacks in days, exploring directions that would have been far too time-consuming to even consider a few years ago.

It’s a meaningful leap in productivity, but it also highlights a subtle gap. While the machine can optimize for the present with incredible speed, it doesn’t inherently account for longer-term consequences. It can give us a strong version of “today”, but it’s still on us to ensure we’re building for “tomorrow”.

That shift is what stands out to me. As the “grind” of production begins to fade, a more critical responsibility seems to be taking its place – what I’d describe as system-level judgment. Our role is moving from primarily designing and implementing components to being accountable for the integrity of the overall system.

Below are my thoughts on how the Architect’s role is evolving in this new era of autonomous AI and agentic automated stacks.

1. The 2026 Tipping Point: Breaking the “Model Collapse”

I believe we hit a documented wall in early 2026. Data shows that nearly 50% of the world’s software code is now AI-generated (Netcorp, 2026). This has triggered what researchers call “Model Collapse” – a degenerative loop where AI begins learning from its own average, synthetic outputs rather than high-quality human intent (IBM, 2026).

From my perspective, our role is no longer to just “produce” content. If we blindly follow AI, we aren’t just being efficient; we are also contributing to a loop of mediocrity. I see our new job as being the “Circuit Breaker”: the human who injects original, context-rich intelligence that the machine simply cannot generate on its own.

2. The New Blueprint: Governing the AI-First Stack

I believe the “Blueprint” has fundamentally changed. We are no longer just looking at isolated code repositories but are designing Layered Enterprise Systems. A typical architecture today is a sophisticated application layer that combines:

  • Orchestration & Agents: Coordinating complex workflows.
  • Knowledge Retrieval (RAG): Connecting models to vector databases and document stores.
  • Guardrails & Observability: Enforcing policy and monitoring system health.


When I look at this stack, I don’t just see a technical diagram. I see a new mandate for the Architect. We must be the ones to define the governance of these layers. Without our oversight, the “Orchestration” lacks logic and the “Knowledge Retrieval” becomes a graveyard of synthetic data.

3. The Divergent Advantage: Why the “Winner” is Augmented

In the past, we were limited by “Time-to-Sketch”. Today, I believe the “Winner” is the Architect who uses AI as an Iteration Engine to manage risk and explore scale.

  • Exploration at Scale: We can now test several different tech stacks in less than a week. I don’t see this replacing our creativity; I see it liberating it. We can finally ask “What if?” without the fear of wasting a week of production time.


  • The Justified “Rule-Break”: I think about this like a leader looking at a team’s calendar. An AI might see a one-hour team lunch as a 15% drop in productivity and suggest shortening it. But a human leader knows that those lunch discussions help connect lead developers with others and sometimes they even end up solving the most pressing issues through informal conversation. The AI optimizes for output but I believe our value lies in optimizing for the environment that creates the output.

    Thus, while AI can handle 70% of the “grind”, it inevitably hits a ceiling where logic meets human reality. Further, in my experience, a junior engineer using AI can optimize only for Correctness, whereas an architect can optimize for Meaning.

4. The Technical Translator and Context Provider

I’ve always felt that architecture is a bridge between logic and emotion. While a business leader owns the “Why” of the profit, I see the Architect as the Technical Translator.

AI can generate a “perfect” plan, but it cannot explain the trade-offs to a concerned stakeholder or negotiate the “Unspoken Brief”: the fears and desires of a community that never make it into a data prompt. The Architect is the “Context Provider” who supplies the connective tissue that links today’s prompt to a 2031 expansion, ensuring the system doesn’t just work, but scales.

5. The Guardrail Mandate: Catching the 1% Hallucination

I’ve come to see AI as a “Probability Machine”, not a “Judgment Machine”. It designs for the 99% most likely scenarios, often missing the 1% edge-cases that could lead to disaster.

  • The “Technically Legal” Trap: I think of it like a tax professional I spoke with recently. An AI can optimize a return to save a client $10,000 using a cold, logical loophole. It’s “correct” data. But the human professional says, “If we do this, we’ll trigger a three-year audit that will cost $50,000 in fees.” The AI saw a win but the professional saw a systemic risk.


  • The Technical Debt Trap: AI “dumps” 200 lines of code in seconds, creating a Reviewer’s Paradox. Under pressure to match machine speed, I’ve seen engineers fall into “Blind Acceptance”, assuming professional-looking code is logically sound. In 2026, I believe this is our greatest risk and the leading cause of “AI Technical Debt” (Sonar, 2026).


  • Severity-Driven Review: We don’t audit every line. In our workflow, we focus our “scar tissue”, our experience, on the High-Risk Nodes such as accuracy, security, resilience and scalability.

6. Professional Integrity: The Non-Transferable Seal

The global consensus in 2026 is firm: You cannot sue an algorithm. Under the EU Product Liability Directive, liability follows control. If you deploy an AI system, you bear the responsibility for its “hallucinations”.

While a company may carry the financial responsibility, I still feel that the professional integrity largely rests with the individuals. When I approve a project, it feels less like a formality and more like a personal assurance that the solution, whether shaped by AI or otherwise, is robust. Ultimately, our professional reputation plays an important role in bridging the gap between a digital design and a product that is reliable, secure and compliant (NCARB, 2026).

Summary: My View on the Evolutionary Roadmap

| Dimension | Junior / AI (Producer) | Technical Architect (Gatekeeper) |
|---|---|---|
| Focus | Task Execution: “How do I design this?” | System Integrity: “Why are we doing this?” |
| Goal | Optimization: The most efficient path. | Curation: The most meaningful path. |
| System View | Component-level focus. | Full-Stack Governance. |
| Risk Role | Identifying Known Errors. | Predicting Unknown Consequences. |
| Key Value | Speed and Accuracy. | Judgment and Liability. |
| Authority | Operates the Tools. | Signs the Professional Guarantee. |

Final Thoughts: The Promotion of the Profession

In my view, Architects aren’t being replaced. I believe we are being elevated to a higher level of responsibility. This is what I think of as a “3X World”: one where AI significantly accelerates execution and reduces the grind of building, but seems to amplify the weight of our decisions.

I see us moving from being System Implementers to being Intelligence Gatekeepers. I’m not afraid of the machine’s speed – I’m afraid of the moment we stop asking “Why?”. In a world of infinite, automated options, I believe the person who can choose correctly is the only one who truly matters.

“The AI provides the options; the Architect creates meaning, makes decisions and defines guardrails.”

Mastering the SKILL.md File in Agentic AI: A Complete Guide

In modern Agentic AI architectures, the primary engineering challenge is no longer generating language, but bridging the gap between conversational intent and reliable, repeatable and unambiguous execution. To achieve this, we must treat agent capabilities not as conversational shortcuts, but as well-defined engineering assets.

This requires a standardized contract for capability execution. That’s where SKILL.md comes in: a formal, machine-parsable definition file that acts as a Standard Interoperability Definition (SID) contract for systematic task execution within an agentic framework.

In this blog, I’ll dive deep into SKILL.md and share how it serves as a single source of truth for both conceptual planning (roles) and procedural execution (workflows) that power an automated, engineering-grade SDLC.

The Architectural Blueprint: The SKILL.md

SKILL.md is structured as an engineering specification, designed for zero-ambiguity parsing by an LLM like Claude. It defines the contract for interoperability, forcing teams to move from conversational requests to precise capability definitions.

Anatomy of an Engineering Contract

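The specification consists of five required fields that are immutable and machine-parsable. The first four live in the metadata header and the fifth, Examples, lives in the body. (Anthropic’s published Agent Skills format itself requires only name and description in the front matter; commands, constraints and examples are the convention used here on top of it.)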

  • Name: An immutable, unique, system-wide identifier for the capability (e.g., internal-token-manager-v1, exec-raise-github-pr-v1 or sdlc-pm-v1). This is the system’s handle for the skill.


  • Description: Critically, this is not a summary. It is the definitive Trigger Event Definition. It must be written from the perspective of an event, user query or internal signal that activates this capability, allowing the framework to perform accurate skill matching. Example: “Triggers automatically after a successful code analysis scan…”


  • Commands: A list of executable operations or prompts defined by the contract. For procedural skills, these map to API endpoints or internal function calls. For conceptual skills, these map to defined prompt sequences. Example: get-linter-report(timestamp) or refresh-token(service_id).


  • Constraints: A critical safety and resource management section. It defines the limits, rules and error conditions of the contract. Example: “Internal authentication tokens must expire after 1 hour.”


  • Examples: These are not suggestions; they are the gold standard of Expected Behavior. They define the intended output for specific input scenarios, providing the LLM with a definitive blueprint for successful execution and reducing non-deterministic output.

# Code Snippet 1: Sample Procedural SKILL.md (Raise GitHub PR)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)

name: exec-raise-github-pr-v1
description: Triggers automatically after a successful 'exec-linter-code-analyzer-v1' scan or upon user request to systematically raise a new pull request on GitHub for reviewed code.
commands:
  - create-pr(repository_url, head_branch, base_branch, title, body)
constraints:
  - Must use a valid GitHub API token with 'repo' scope.
  - Head branch must differ from the base branch.
---

### Expected Behavior (Examples)

When this skill is matched against a standard JavaScript repository:
  - Input: create-pr("https://github.com/org/repo.git", "feat/new-api", "main", "Feat: Add API v2", "This PR introduces...")
  - Execution: Loads 'scripts/create_pr.py'.
  - Output: New PR URL.
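
A contract is only useful if it can be checked. The sketch below reads the metadata header of a file shaped like Code Snippet 1 and reports which of the four header fields are missing. The parser is deliberately tiny and only understands the flat "key: value" and "- item" shapes used in this post; it is not a YAML implementation, and the function names are my own.

```python
from typing import Dict, List, Optional, Union

REQUIRED_FIELDS = ("name", "description", "commands", "constraints")

Meta = Dict[str, Union[str, List[str]]]


def parse_front_matter(text: str) -> Meta:
    """Parse the block between the first two '---' lines into a dict."""
    lines = [line.strip() for line in text.splitlines()]
    try:
        start = lines.index("---")
        end = lines.index("---", start + 1)
    except ValueError:
        raise ValueError("no front matter block found") from None

    meta: Meta = {}
    current_list: Optional[List[str]] = None
    for line in lines[start + 1:end]:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a key: {line!r}")
            current_list.append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unrecognised line: {line!r}")
        value = value.strip()
        if value:
            meta[key.strip()] = value
            current_list = None
        else:
            current_list = []  # a key with no value starts a list
            meta[key.strip()] = current_list
    return meta


def missing_fields(meta: Meta) -> List[str]:
    """Return the required header fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not meta.get(name)]
```

Running a check like this in CI turns “the contract is required” from a convention into something a pull request can fail on.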

Directory Structure & Progressive Disclosure

The SKILL.md is packaged within a defined directory structure, ensuring all supporting assets are decoupled and version-controlled alongside the specification.

  • 📄 SKILL.md (The only required asset, containing the definitions and contract).
  • 📁 scripts/ (Optional: Decoupled logic – Python, Bash, Node.js, etc. The implementation details of the contract).
  • 📁 references/ (Optional: Docs, checklists, design patterns or standards the skill must adhere to).
  • 📁 assets/ (Optional: Templates or sample data).

This decoupled architecture enables the Progressive Disclosure Pattern, which is critical for system efficiency and managing token constraints. A high-performance agentic system should not load every asset for every skill simultaneously. Progressive disclosure ensures assets are loaded only when necessary.

Agents don’t load everything at once. They discover and expand context only when needed.
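
Progressive disclosure can be sketched as two separate read paths: a cheap one that touches only the header of each SKILL.md, and an expensive one that runs only for the skill that was matched. The class below is a hypothetical illustration of that pattern, not the loader of any particular framework.

```python
from pathlib import Path
from typing import Optional, Tuple


class Skill:
    """Holds only name and description until the skill is activated."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.name, self.description = self._read_header()
        self._body: Optional[str] = None  # full text, loaded lazily

    def _read_header(self) -> Tuple[str, str]:
        name = description = ""
        fences = 0
        with (self.directory / "SKILL.md").open(encoding="utf-8") as handle:
            for raw in handle:
                line = raw.strip()
                if line == "---":
                    fences += 1
                    if fences == 2:
                        break  # stop reading at the end of the header
                    continue
                if fences == 1 and line.startswith("name:"):
                    name = line.partition(":")[2].strip()
                elif fences == 1 and line.startswith("description:"):
                    description = line.partition(":")[2].strip()
        return name, description

    def activate(self) -> str:
        """Load the full SKILL.md only when the skill has been matched."""
        if self._body is None:
            self._body = (self.directory / "SKILL.md").read_text(encoding="utf-8")
        return self._body

    def reference(self, filename: str) -> str:
        """Load a single file from references/ on demand."""
        return (self.directory / "references" / filename).read_text(encoding="utf-8")
```

At startup the agent would build one Skill per directory and match requests against the descriptions alone, so the token cost grows with the number of skills in use rather than the number installed.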

Architecting the Automated SDLC

The standardization offered by SKILL.md allows us to architect and separate the dynamic pillars of an automated SDLC, managing all capabilities via this single specification. In a professional lifecycle, conceptual setup (Defining Roles) always precedes procedural execution (Executing Workflows).

Conceptual Role-Based Skills: Defining the Contract for a Persona (Planning & Setup)

To initiate any SDLC phase (e.g., Requirements), we must first define the conceptual frameworks, knowledge bases and systematic planning workflows of the specific roles involved. These role definitions organise content by domain (behaviour-driven). We apply the identical SKILL.md standard to define a persona’s “mindset”.

  • WHAT: SKILL.md definitions for Product Manager Persona or Lead Developer Persona.


  • APPLICATION: During the “Requirements” and “Design” phases of the SDLC.


  • ARCHITECTURAL FLOW: During planning, you activate the Product Manager Persona (Code Snippet 2). Claude adopts this mindset and leverages knowledge references (e.g., Agile standards) and the command contract (draft-prd(user_stories, acceptance_criteria)) to provide focused, high-quality requirements.
Code Snippet 2: Sample Conceptual SKILL.md (Product Manager)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)

name: sdlc-pm-v1
description: Triggers during project initiation to define the persona, responsibilities, knowledge base and systematic planning workflows of a senior Product Manager.
commands:
  - draft-prd(user_stories, acceptance_criteria)
  - run-feature-prioritization(prd_document)
constraints:
  - Must reference files in the optional 'references/' directory (e.g., 'references/agile-standards.md') for all Agile terminology.
---

### Expected Behavior (Examples)

When this skill is matched to a new project request:
  - Input: draft-prd(user_stories, acceptance_criteria)
  - Execution: Loads 'references/agile-standards.md' to define terminology.
  - Output: A structured PRD document based on the internal persona.

External Workflow Execution Skills: Defining the Contract for the Workflow to ‘Do’

Once the groundwork is established and the build begins, the agent’s focus shifts to user-triggered workflows (e.g., after a commit). These skills are guides that help perform specific, measurable steps in the automated pipeline, providing the user with domain-specific results (task-driven).

  • WHAT: SKILL.md definitions for exec-linter-code-analyzer, exec-raise-github-pr or jira-ticket-update.


  • APPLICATION: During the “Build,” “Test” and “Deploy” phases of the SDLC, typically automated by CI/CD events.


  • ARCHITECTURAL FLOW: After a successful code implementation event, the framework activates the exec-linter-code-analyzer-v1 (Code Snippet 3). Claude reads the inputs and expected behavior. The framework executes the decoupled logic (scripts/) to systematically run the analysis, ensuring a reliable result (the linter report JSON) is provided back to the user’s workflow or CI/CD pipeline. A successful scan can then trigger the pull-request skill, which returns the PR URL.
Code Snippet 3: Sample Procedural SKILL.md (Code Analyzer Workflow)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)
name: exec-linter-code-analyzer-v1
description: Triggers automatically after a code commit event to execute a static analysis and linter scan on the modified files in a specific repository, providing a systematic JSON report.
commands:
  - run-analysis(repository_url, branch)
constraints:
  - Must use a valid GitHub API token with 'repo' scope.
---

### Expected Behavior (Examples)
When this skill is matched following a code commit:
  - Input: run-analysis("https://github.com/org/repo.git", "main")
  - Execution: Loads 'scripts/run_analysis.py'.
  - Output: Linter report JSON.
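
The hand-off from this skill to the pull-request skill (whose description says it triggers after a successful scan) could be sketched as follows. The report schema, the function name and the exact follow-up skill name are assumptions made for illustration; the article does not define the JSON layout.

```python
import json


def next_skill_after_analysis(report_json: str):
    """Return the follow-up skill name, or None when the scan did not pass."""
    report = json.loads(report_json)
    # Assumed schema: {"issues": [{"severity": "error" | "warning", ...}, ...]}
    errors = [
        issue for issue in report.get("issues", [])
        if issue.get("severity") == "error"
    ]
    # Assumed name of the PR skill; warnings alone do not block the hand-off.
    return "exec-raise-github-pr-v1" if not errors else None
```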

Internal Agent Operational Skills: Defining the Contract for the Software to ‘Be’

To ensure system stability, the agent software itself requires precise, standardized contracts for core operational tasks (such as authentication, state, error handling and API calls). These skills are operational and invisible to the SDLC workflow itself. They focus on the agent’s internal robustness and platform integrity.

  • WHAT: SKILL.md definitions for internal-token-manager or agent-state-historian.


  • APPLICATION: Triggered automatically by the agent’s orchestration layer during defined lifecycle events (e.g., establishing a session state, refreshing an expired token after a 401 response).


  • ARCHITECTURAL FLOW: When any skill requires access to a restricted API, it activates the internal-token-manager (Code Snippet 4). Claude reads the command contract (refresh-token(service_id)). The framework executes the decoupled logic (scripts/) to refresh the secure token, ensuring the agent software can authenticate without creating brittle, direct credential dependencies in the domain-level skills. This internal complexity is hidden from the user but critical for security and robustness.
Code Snippet 4: Sample Procedural SKILL.md (Token Manager)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)
name: internal-token-manager-v1
description: An internal operational skill that triggers throughout a workflow when the agent detects it requires a secure token to authenticate against an external service (e.g., GitHub, Slack, Splunk).
commands:
  - refresh-token(service_id)
constraints:
  - Must use a valid agent credential secret (e.g., 'agent_platform_secret').
  - Tokens must expire after 1 hour.
---

### Expected Behavior (Examples)

When this skill is matched because a GitHub operation requires auth:
  - Input: refresh-token("github_api")
  - Execution: Loads 'scripts/refresh_token.py'.
  - Output: New OAuth token JSON.
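
As a sketch of the logic that a `scripts/refresh_token.py` might wrap, the class below caches one token per service and enforces the one-hour expiry from the constraints. The class name, the `fetch_token` callable and the injectable clock are hypothetical; how a token is actually obtained from the credential secret is deliberately left out.

```python
import time


class TokenManager:
    """Caches one token per service and refreshes it once it reaches the TTL."""

    def __init__(self, fetch_token, ttl_seconds=3600, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical callable: service_id -> token
        self._ttl = ttl_seconds          # constraint: tokens expire after 1 hour
        self._clock = clock              # injectable so expiry can be tested
        self._cache = {}                 # service_id -> (token, issued_at)

    def refresh_token(self, service_id):
        """Force a new token, e.g. after the service answers 401 Unauthorized."""
        token = self._fetch_token(service_id)
        self._cache[service_id] = (token, self._clock())
        return token

    def get_token(self, service_id):
        """Return the cached token, refreshing it first if missing or expired."""
        cached = self._cache.get(service_id)
        if cached is None or self._clock() - cached[1] >= self._ttl:
            return self.refresh_token(service_id)
        return cached[0]
```

Domain-level skills would call `get_token("github_api")` and never see the credential secret, which is the decoupling the architectural flow above describes.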

The Boundary of Autonomy and the Expertise Gap

While standardizing capabilities via SKILL.md is essential, I believe it is critical for architects to also define where SKILL.md is not the right tool. My own perspective, based on recent project implementation, is that a common architectural failure is expecting SKILL.md to easily encode true Domain Expertise and Heuristic Judgment.

Offloading Heuristics vs. Offloading Wisdom

A well-defined SKILL.md is designed to be precise, measurable and standardized. It excels at offloading common known items, standard checklists and systematic patterns into reliable workflows (as seen in our Code Snippets 3 & 4). In my recent project, this precision made the skills function as excellent fixed checklists, significantly reducing operational ambiguity.

This same precision, however, means a skill can act only as a checklist. A procedural skill like exec-linter-code-analyzer can identify a syntax error based on a rule, but I found it often lacked the domain wisdom to understand the conceptual design decision that led to that error.

Assisting Expertise, Not Replacing It

Based on my experience so far, I believe that you cannot easily encode a senior engineer’s years of nuanced design thinking into a SKILL.md description. The true architectural value of a standardized specification is that it offloads the complexity of reliable execution, allowing the Human Expert (or a high-level Agentic Persona) to focus entirely on core domain and design reasoning.

For now, I believe a model that defines three distinct pillars of knowledge will work well:

  1. Systematic Workflows (Procedural Skills): Handled perfectly by SKILL.md. (The “What to Do”)
  2. Conceptual Frameworks (Persona Mindsets): Set up by SKILL.md. (How Claude “Thinks”)
  3. Domain Wisdom & Design Reasoning: Passed as the problem context in the main prompt. (Why Claude “Decides”)

Engineering Best Practices for SKILL.md Mastery

Achieving systematic capability definition requires adhering to these foundational best practices:

  1. Strict Decoupling: Never place the execution logic (e.g., Python code) directly within the SKILL.md file. The SKILL.md is the specification & the scripts/ directory is the implementation.


  2. Immutability: Once a skill is deployed, treat its metadata (Name, Description, Commands) as immutable. Any significant change requires a new version (e.g., exec-raise-github-pr-v2). Brittleness often stems from changing definitions in place.


  3. Description as a Trigger: Never write a summary description (e.g., “This skill runs a linter”). It must be written as a trigger definition (e.g., “Triggers automatically after a context save event…”). Skill matching depends entirely on this accuracy.


  4. Token Economy: Adhere to strict size constraints: < 500 lines and < 5k tokens for the SKILL.md. The Progressive Disclosure pattern will handle heavier assets, keeping the SID itself focused and parseable.


  5. Git-Managed Context: Treat SKILL.md files as code. They must be version-controlled in Git, promoting discoverability, reuse and providing a traceable history of how capabilities have evolved throughout the lifecycle.
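
Several of these practices can be checked mechanically, for example in a pre-commit hook. The sketch below is one possible linter, not an official tool: the function name is hypothetical, the token count is a crude characters-divided-by-four estimate rather than a real tokenizer, and the "description mentions a trigger" test is only a heuristic.

```python
import re


def check_skill_md(text: str) -> list:
    """Return a list of best-practice violations for one SKILL.md file."""
    problems = []
    lines = text.splitlines()

    # Practice 4: fewer than 500 lines and fewer than 5k tokens.
    if len(lines) >= 500:
        problems.append(f"too long: {len(lines)} lines")
    approx_tokens = len(text) // 4  # rough estimate, not a real tokenizer
    if approx_tokens >= 5000:
        problems.append(f"too large: ~{approx_tokens} tokens")

    # Pick up the first top-level 'name:' and 'description:' fields.
    fields = {}
    for line in lines:
        match = re.match(r"^(name|description):\s*(.+)$", line)
        if match:
            fields.setdefault(match.group(1), match.group(2).strip())

    # Practice 2: a versioned name such as 'exec-raise-github-pr-v2'.
    if not re.search(r"-v\d+$", fields.get("name", "")):
        problems.append("name has no -vN version suffix")

    # Practice 3: the description should read as a trigger, not a summary.
    if "trigger" not in fields.get("description", "").lower():
        problems.append("description does not state a trigger")

    return problems
```

An empty list means the file passed these checks; decoupling and Git management still need a human or repository-level review.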

Final Thought: A Standard for Scaling Autonomy

By adopting the SKILL.md specification, we move from fuzzy conversational AI to a structured engineering discipline in which all agent capabilities, whether internal operational requirements, external user workflows or conceptual role frameworks, are defined by precise, version-controlled contracts.

This foundation standardizes reliable execution complexity, not only making your automated SDLC predictable and robust but also ensuring that precious domain expertise remains focused on the main design decisions, not common patterns. The SKILL.md standard offers an interoperable foundation for building scalable, maintainable and engineering-grade Agentic AI architectures.

