Mastering the SKILL.md File in Agentic AI: A Complete Guide

In modern Agentic AI architectures, the primary engineering challenge is no longer generating language, but bridging the gap between conversational intent and reliable, repeatable and unambiguous execution. To achieve this, we must treat agent capabilities not as conversational shortcuts, but as well-defined engineering assets.

skill-md-agentic-ai.png


This requires a standardized contract for capability execution. That’s where SKILL.md comes in. A formal, machine-parsable definition file that acts as a Standard Interoperability Definition (SID) contract for systematic task execution within an agentic framework.

In this blog, I’ll dive deep into SKILL.md and share how it serves as a single source of truth for both conceptual planning (roles) and procedural execution (workflows) that power an automated, engineering-grade SDLC.

The Architectural Blueprint: The SKILL.md

SKILL.md is structured as an engineering specification, designed for zero-ambiguity parsing by an LLM like Claude. It defines the contract for interoperability, forcing teams to move from conversational requests to precise capability definitions.

Anatomy of an Engineering Contract

The specification consists of five required metadata fields that are immutable and machine-parsable:

  • Name: An immutable, unique, system-wide identifier for the capability (e.g., internal-token-manager-v1exec-raise-github-pr-v1, or sdlc-pm-v1). This is the system’s handle for the skill.


  • Description: Critically, this is not a summary. It is the definitive Trigger Event Definition. It must be written from the perspective of an event, user query or internal signal that activates this capability, allowing the framework to perform accurate skill matching. Example: “Triggers automatically after a successful code analysis scan…”


  • Commands: A list of executable operations or prompts defined by the contract. For procedural skills, these map to API endpoints or internal function calls. For conceptual skills, these map to defined prompt sequences. Example: get-linter-report(timestamp) or refresh-token(service_id).


  • Constraints: A critical safety and resource management section. It defines the limits, rules and error conditions of the contract. Example: “Internal authentication tokens must expire after 1 hour.”


  • Examples: These are not suggestions but are the gold standard of Expected Behavior. They define the intended output for specific input scenarios, providing the LLM with a definitive blueprint for successful execution and reducing non-deterministic output.
# Code Snippet 1: Sample Procedural SKILL.md (Raise GitHub PR)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)

name: exec-raise-github-pr-v1
description: Triggers automatically after a successful 'exec-linter-code-analyzer-v1' scan or upon user request to systematically raise a new pull request on GitHub for reviewed code.
commands:
  - create-pr(repository_url, head_branch, base_branch, title, body)
constraints:
  - Must use a valid GitHub API token with 'repo' scope.
  - Head branch must differ from the base branch.
---

### Expected Behavior (Examples)

When this skill is matched against a standard JavaScript repository:
  - Input: create-pr("https://github.com/org/repo.git", "feat/new-api", "main", "Feat: Add API v2", "This PR introduces...")
  - Execution: Loads 'scripts/create_pr.py'.
  - Output: New PR URL.

Directory Structure & Progressive Disclosure

The SKILL.md is packaged within a defined directory structure, ensuring all supporting assets are decoupled and version-controlled alongside the specification.

skill-folder-structure.jpg

.Sandeep Mewara Github

  • 📄 SKILL.md (The only required asset, containing the definitions and contract).
  • 📁 scripts/ (Optional: Decoupled logic – Python, Bash, Node.js, etc. The implementation details of the contract).
  • 📁 references/ (Optional: Docs, checklists, design patterns or standards the skill must adhere to).
  • 📁 assets/ (Optional: Templates or sample data).

This decoupled architecture enables the Progressive Disclosure Pattern, which is critical for system efficiency and managing token constraints. A high-performance agentic system should not load every asset for every skill simultaneously. Progressive disclosure ensures assets are loaded only when necessary.

skill-md-activation-flow.jpg


Agents don’t load everything at once. They discover and expand context only when needed.

Architecting the Automated SDLC

The standardization offered by SKILL.md allows us to architect and separate the dynamic pillars of an automated SDLC, managing all capabilities via this single specification. In a professional lifecycle, conceptual setup (Defining Roles) always precedes procedural execution (Executing Workflows).

Conceptual Role-Based Skills: Defining the Contract for a Persona (Planning & Setup)

To initiate any SDLC phase (e.g., Requirements), we must first define the conceptual frameworks, knowledge bases and systematic planning workflows of specific roles that help organise content by domain (behaviour-driven). We apply the identical SKILL.md standard to define a persona’s “mindset”.

  • WHAT: SKILL.md definitions for Product Manager Persona or Lead Developer Persona.


  • APPLICATION: During the “Requirements” and “Design” phases of the SDLC.


  • ARCHITECTURAL FLOW: During planning, you activate the Product Manager Persona (Code Snippet 2). Claude adopts this mindset and leverages knowledge references (e.g., Agile standards) and the command contract (draft-prd(user_stories)) to provide focused, high-quality requirements.
Code Snippet 2: Sample Conceptual SKILL.md (Product Manager)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)

name: sdlc-pm-v1
description: Triggers during project initiation to define the persona, responsibilities, knowledge base and systematic planning workflows of a senior Product Manager.
commands:
  - draft-prd(user_stories, acceptance_criteria)
  - run-feature-prioritization(prd_document)
constraints:
  - Must reference files in the optional 'references/' directory (e.g., 'references/agile-standards.md') for all Agile terminology.
---

### Expected Behavior (Examples)

When this skill is matched to a new project request:
  - Input: draft-prd(user_stories, acceptance_criteria)
  - Execution: Loads 'references/agile-standards.md' to define terminology.
  - Output: A structured PRD document based on the internal persona.

External Workflow Execution Skills: Defining the Contract for the Workflow to ‘Do’

Once the groundwork is established and the build begins, the agent’s focus shifts to user-triggered workflows (e.g., after a commit). These skills are guides that help perform specific, measurable steps in the automated pipeline, providing the user with domain-specific results (task-driven).

  • WHAT:SKILL.md definitions for exec-linter-code-analyzerexec-raise-github-pr, or jira-ticket-update.


  • APPLICATION: During the “Build,” “Test” and “Deploy” phases of the SDLC, typically automated by CI/CD events.


  • ARCHITECTURAL FLOW: After a successful code implementation event, the framework activates the exec-linter-code-analyzer-v1 (Code Snippet 3). Claude reads the inputs and expected behavior. The framework executes the decoupled logic (scripts/) to systematically create the pull request, ensuring a reliable result (the PR URL) is provided back to the user’s workflow or CI/CD pipeline.
Code Snippet 3: Sample Procedural SKILL.md (Code Analyzer Workflow)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)
name: exec-linter-code-analyzer-v1
description: Triggers automatically after a code commit event to execute a static analysis and linter scan on the modified files in a specific repository, providing a systematic JSON report.
commands:
  - run-analysis(repository_url, branch)
constraints:
  - Must use a valid GitHub API token with 'repo' scope.
---

### Expected Behavior (Examples)
When this skill is matched following a code commit:
  - Input: run-analysis("https://github.com/org/repo.git", "main")
  - Execution: Loads 'scripts/run_analysis.py'.
  - Output: Linter report JSON.

Internal Agent Operational Skills: Defining the Contract for the Software to ‘Be’

To ensure system stability, the agent software itself requires precise, standardized contracts for core operational tasks (like authentication, state, error handling, api-call, etc). These skills are operational and invisible to the SDLC workflow itself. They focus on the agent’s internal robustness and platform integrity.

  • WHAT: SKILL.md definitions for internal-token-manager or agent-state-historian.


  • APPLICATION: Triggered automatically by the agent’s orchestration layer during defined lifecycle events (e.g., establishing a session state, refreshing an expired 401 token).


  • ARCHITECTURAL FLOW: When any skill requires access to a restricted API, it activates the internal-token-manager (Code Snippet 4). Claude reads the command contract (refresh-token(service_id)). The framework executes the decoupled logic (scripts/) to refresh the secure token, ensuring the agent software can authenticate without creating brittle, direct credential dependencies in the domain-level skills. This internal complexity is hidden from the user but critical for security and robustness.
Code Snippet 4: Sample Procedural SKILL.md (Token Manager)
---
# REQUIRED METADATA FIELDS (SID CONTRACT)
name: internal-token-manager-v1
description: An internal operational skill that triggers throughout a workflow when the agent detects it requires a secure token to authenticate against an external service (e.g., GitHub, Slack, Splunk).
commands:
  - refresh-token(service_id)
constraints:
  - Must use a valid agent credential secret (e.g., 'agent_platform_secret').
  - Tokens must expire after 1 hour.
---

### Expected Behavior (Examples)

When this skill is matched when a GitHub operation requires auth:
  - Input: refresh-token("github_api")
  - Execution: Loads 'scripts/refresh_token.py'.
  - Output: New OAuth token JSON.

The Boundary of Autonomy and the Expertise Gap

While standardizing capabilities via SKILL.md is essential, I believe it is critical for architects to also define where SKILL.md is not the right tool. My own perspective, based on recent project implementation, is that a common architectural failure is expecting SKILL.md to easily encode true Domain Expertise and Heuristic Judgment.

Offloading Heuristics vs. Offloading Wisdom

A well-defined SKILL.md is designed to be precise, measurable and standardized. It excels at offloading common known items, standard checklists and systematic patterns into reliable workflows (as seen in our Code Snippets 3 & 4). In my recent project, this precision made the skills function as excellent fixed checklists, significantly reducing operational ambiguity.

This same precision, however, means it can appear only as a checklist. A procedural skill like exec-linter-code-analyzer can identify a syntax error based on a rule, but I found it often lacked the domain wisdom to understand the conceptual design decision that led to that error.

Assisting Expertise, Not Replacing It

Based on the experience so far, I believe that you cannot easily encode a senior engineer’s years of nuanced design thinking into a SKILL.md description. The true architectural value of a standardized specification is that it offloads the reliable execution complexity, allowing the Human Expert (or a high-level Agentic Persona) to focus entirely on core domain and design reasoning.

For now, I believe following a model where three distinct pillars of knowledge are defined will work out:

  1. Systematic Workflows (Procedural Skills): Handled perfectly by SKILL.md. (The “What to Do”)
  2. Conceptual Frameworks (Persona Mindsets): Setup by SKILL.md. (How Claude “Thinks”)
  3. Domain Wisdom & Design Reasoning: Passed as the problem context in the main prompt. (Why Claude “Decides”)

Engineering Best Practices for SKILL.md Mastery

Achieving systematic capability definition requires adhering to these foundational best practices:

  1. Strict Decoupling: Never place the execution logic (e.g., Python code) directly within the SKILL.md file. The SKILL.md is the specification & the scripts/ directory is the implementation.


  2. Immutability: Once a skill is deployed, treat its metadata (Name, Description, Commands) as immutable. Any significant change requires a new version (e.g., exec-raise-github-pr-v2). Brittleness often stems from changing definitions in place.


  3. Description as a Trigger: Never write a summary description (e.g., “This skill runs a linter”). It must be written as a trigger definition (e.g., “Triggers automatically after a context save event…”). Skill matching depends entirely on this accuracy.


  4. Token Economy: Adhere to strict size constraints: < 500 lines and < 5k tokens for the SKILL.md. The Progressive Disclosure pattern will handle heavier assets, keeping the SID itself focused and parseable.


  5. Git-Managed Context: Treat SKILL.md files as code. They must be version-controlled in Git, promoting discoverability, reuse and providing a traceable history of how capabilities have evolved throughout the lifecycle.

Final Thought: A Standard for Scaling Autonomy

By adopting the SKILL.md specification, we move from fuzzy conversational AI to a structured engineering discipline, where all agent capabilities, whether they are internal operational requirements, external user workflows or conceptual roles framework – all are defined by precise, version-controlled contracts.

This foundation standardizes reliable execution complexity, not only making your automated SDLC predictable and robust but also ensuring that precious domain expertise remains focused on main design decisions, not common patterns. Mastering the SKILL.md standard is the definitive, interoperable foundation for building scalable, maintainable and engineering-grade AgenticAI architectures.

. Sandeep Mewara Github
News Update
Tech Explore
Trend
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
Agentic AI for Beginners: My Journey into Building with Claude
The Great Inversion: Why AI is Moving from Cloud to Desktop

[DOWNLOADskill.md Quick Reference Guide]

.

Kubernetes – Evolution of application deployment

Kubernetes (K8s) is turning out as the cutting-edge of application deployment. It is becoming core to the creation and operation of modern software (few call it as modern SaaS). Thus, I planned to look into it and see what Kubernetes is and how/what application design will help adapt it in the application deployment evolution.

Kubernetes is a portable, extensible, open-source platform for automating deployment, scaling, and management of containerized applications.

History

Google originally designed and open-sourced the Kubernetes project in 2014. Kubernetes has inputs from over 15 years of Google’s experience to run production workloads at scale with best ideas and practices from the community. It is maintained by the Cloud Native Computing Foundation now. It’s current development repository is here.

First challenge …

With modern goal parameters like: recoverability, release cycle time & release frequency – applications need to be designed and deployed in a way that makes them improve year over year.

This leads to first step of breaking the monolith into microservices such that the changes and impact are compartmentalized for easy deployment and recovery.

monolith2microservice

A monolithic application puts all it’s functionality in a single process. In need of scaling, it replicates entire monolith on multiple servers. On the other hand, a microservice architecture separates out (keeps) each functionality into a separate service. Thus in case of scaling need, these services are distributed across servers as required.

Second challenge …

With multiple microservices in play, a variance of stack versions or deployment styles kicks in as trouble. Each team would have their own set of tools, versions to build the artifacts, store them and then deploy them. Thus, different applications/services can have different patterns and network topology. This in turn makes managing security and infrastructure more challenging.

This leads to the step of abstracting infrastructure out to ease maintenance and relieve from security and other infrastructure related concerns.

deployment-progression
Deployment scheme evolution
  • Traditional: Applications running on a physical server. No way to define resource boundaries for applications.
  • Virtualization: Allows to run multiple Virtual Machines (VMs) on a single physical server’s CPU. This leads to better utilization of resources and better scalability as an application can be added or updated easily. Also, if needed, applications can be isolated between different VMs to provide a level of security.
  • Containers: Like VM, it has its own filesystem, CPU, memory, process space, etc. Are environment consistent, easy to scale, portable across clouds and OS distributions. This leads to loosely coupled setup where application is totally decoupled from infrastructure and makes it easy to move towards smaller, modular microservices.

Containers are abstraction to next level. It does not matter on which OS you are on (although there could be different containers for different OS and how they work underlying), all we need is to package our code and needed libraries together, which then runs inside a container based on configured resource need. Docker is an example of container runtime, a packaging software.

Final challenge …

So, the packaging has been simplified and running the application on a single node has been simplified. When we move to enterprise, we need to scale up/down our containers on need basis automatically. Further, one would scale the application to be served from multiple servers instead of just one for better load distribution and easy recovery/fail safe. Now, while distributing the load, we would need to ensure the availability of nodes, resources like space on node for running a container, etc.

This is where Kubernetes pitch in. It acts as a container orchestrator that help provides with a framework to run distributed systems resiliently. It takes care of scaling and failover of containers having application, provides deployment patterns, and more.

kubernetes-architecture

Kubernetes has master-slave architecture where there is one master node and multiple worker nodes. A Pod is the smallest deployable unit in it. In order to run a single container, we would need to create a Pod for that container. A Pod can contain more than one container if those containers are relatively tightly coupled (like a container to download all secret configs related before application starts in other container).

API Server is the heart of the architecture. User interacts with Kubernetes via it and master node communicates to worker nodes through it. Number of containers requested is stored in the etcd (key-value store). Controller acts as a manager that keeps a constant check on the store, schedules the request for scheduler to pick and execute, spins of another worker node in case of need.

Wrap Up …

I have just touched the surface of both containerization and Kubernetes. They seem to have much more and can be explored in depth. Along with vast benefits, it can also bring new challenges on the table with moving to cloud like security and networking.

It was good to know how application design and deployment are evolving, getting abstracted and loosely coupled.

Keep learning!

Reference: https://kubernetes.io/docs/home/

GitHub Readme Samples

Beginner’s Guide to understand Kafka

It’s a digital age. Wherever there is data, we hear about Kafka these days. One of my projects I work, involves entire data system (Java backend) that leverages Kafka to achieve what deals with tonnes of data through various channels and departments. While working on it, I thought of exploring the setup in Windows. Thus, this guide helps learn Kafka and showcases the setup and test of data pipeline in Windows.

Introduction

<kafka-logo>
An OpenSource Project in Java & Scala

Apache Kafka is a distributed streaming platform with three key capabilities:

  • Messaging system – Publish-Subscribe to stream of records
  • Availability & Reliability – Store streams of records in a fault tolerant durable way
  • Scalable & Real time – Process streams of records as they occur

Data system components

Kafka is generally used to stream data into applications, data lakes and real-time stream analytics systems.

<kafka-highlevel-architecture>

Application inputs messages onto the Kafka server. These messages can be any defined information planned to capture. It is passed across in a reliable (due to distributed Kafka architecture) way to another application or service to process or re-process them.

Internally, Kafka uses a data structure to manage its messages. These messages have a retention policy applied at a unit level of this data structure. Retention is configurable – time based or size based. By default, the data sent is stored for 168 hours (7 days).

Kafka Architecture

Typically, there would be multiples of producers, consumers, clusters working with messages across. Horizontal scaling can be easily done by adding more brokers. Diagram below depicts the sample architecture:

kafka-internals

Kafka communicates between the clients and servers with TCP protocol. For more details, refer: Kafka Protocol Guide

Kafka ecosystem provides REST proxy that allows an easy integration via HTTP and JSON too.

Primarily it has four key APIs: Producer API, Consumer API, Streams API, Connector API

Key Components & related terminology
  • Messages/Records – byte arrays of an object. Consists of a key, value & timestamp
  • Topic – feeds of messages in categories
  • Producer – processes that publish messages to a Kafka topic
  • Consumer – processes that subscribe to topics and process the feed of published messages
  • Broker – It hosts topics. Also referred as Kafka Server or Kafka Node
  • Cluster – comprises one or more brokers
  • Zookeeper – keeps the state of the cluster (brokers, topics, consumers)
  • Connector – connect topics to existing applications or data systems
  • Stream Processor – consumes an input stream from a topic and produces an output stream to an output topic
  • ISR (In-Sync Replica) – replication to support failover.
  • Controller – broker in a cluster responsible for maintaining the leader/follower relationship for all the partitions
Zookeeper

Apache ZooKeeper is an open source that helps build distributed applications. It’s a centralized service for maintaining configuration information. It holds responsibilities like:

  • Broker state – maintains list of active brokers and which cluster they are part of
  • Topics configured – maintains list of all topics, number of partitions for each topic, location of all replicas, who is the preferred leader, list of ISR for partitions
  • Controller election – selects a new controller whenever a node shuts down. Also, makes sure that there is only one controller at any given time
  • ACL info – maintains Access control lists (ACLs) for all the topics

Kafka Internals

Brokers in a cluster are differentiated based on an ID which typically are unique numbers. Connecting to one broker bootstraps a client to the entire Kafka cluster. They receive messages from producers and allow consumers to fetch messages by topic, partition and offset.

A Topic is spread across a Kafka cluster as a logical group of one or more partitions. A partition is defined as an ordered sequence of messages that are distributed across multiple brokers. The number of partitions per topic are configurable during creation.

Producers write to Topics. Consumers read from Topics.

<kafka-partition>

Kafka uses Log data structure to manage its messages. Log data structure is an ordered set of Segments that are collection of messages. Each segment has files that help locate a message:

  1. Log file – stores message
  2. Index file – stores message offset and its starting position in the log file

Kafka appends records from a producer to the end of a topic log. Consumers can read from any committed offset and are allowed to read from any offset point they choose. The record is considered committed only when all ISRs for partition write to their log.

leader-follower

Among the multiple partitions, there is one leader and remaining are replicas/followers to serve as back up. If a leader fails, an ISR is chosen as a new leader. Leader performs all reads and writes to a particular topic partition. Followers passively replicate the leader. Consumers are allowed to read only from the leader partition.

A leader and follower of a partition can never reside on the same node.

leader-follower2

Kafka also supports log compaction for records. With it, Kafka will keep the latest version of a record and delete the older versions. This leads to a granular retention mechanism where the last update for each key is kept.

Offset manager is responsible for storing, fetching and maintaining consumer offsets. Every live broker has one instance of an offset manager. By default, consumer is configured to use an automatic commit policy of periodic interval. Alternatively, consumer can use a commit API for manual offset management.

Kafka uses a particular topic, __consumer_offsets, to save consumer offsets. This offset records the read location of each consumer in each group. This helps a consumer to trace back its last location in case of need. With committing offsets to the broker, consumer no longer depends on ZooKeeper.

Older versions of Kafka (pre 0.9) stored offsets in ZooKeeper only, while newer version of Kafka, by default stores offsets in an internal Kafka topic __consumer_offsets

consumer-groups

Kafka allows consumer groups to read data in parallel from a topic. All the consumers in a group has same group ID. At a time, only one consumer from a group can consume messages from a partition to guarantee the order of reading messages from a partition. A consumer can read from more than one partition.

Kafka Setup On Windows

setup-on-windows
Pre-Requisite
Setup files
  1. Install JRE – default settings should be fine
  2. Un-tar Kafka files at C:\Installs (could be any location by choice). All the required script files for Kafka data pipeline setup will be located at: C:\Installs\kafka_2.12-2.5.0\bin\windows
  3. Configuration changes as per Windows need
    • Setup for Kafka logs – Create a folder ‘logs’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this logs folder location in Kafka config file: C:\Installs\kafka_2.12-2.5.0\config\server.properties as log.dirs=C:\Installs\kafka_2.12-2.5.0\logs
    • Setup for Zookeeper data – Create a folder ‘data’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this data folder location in Zookeeper config file: C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties as dataDir=C:\Installs\kafka_2.12-2.5.0\data
Execute
  1. ZooKeeper – Get a quick-and-dirty single-node ZooKeeper instance using the convenience script already packaged along with Kafka files.
    • Open a command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: zookeeper-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties
    • ZooKeeper started at localhost:2181. Keep it running.
      demo-zookeeper
  2. Kafka Server – Get a single-node Kafka instance.
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • ZooKeeper is already configured in the properties file as zookeeper.connect=localhost:2181
    • Execute script: kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties
    • Kafka server started at localhost: 9092. Keep it running.
      demo-kafka
      Now, topics can be created and messages can be stored. We can produce and consume data from any client. We will use command prompt for now.
  3. Topic – Create a topic named ‘testkafka’
    • Use replication factor as 1 & partitions as 1 given we have made a single instance node
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testkafka
    • Execute script to see created topic: kafka-topics.bat --list --bootstrap-server localhost:9092
      demo-topic
    • Keep the command prompt open just in case.
  4. Producer – setup to send messages to the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-producer.bat --bootstrap-server localhost:9092 --topic testkafka
    • It will show a ‘>’ as a prompt to type a message. Type: “Kafka demo – Message from server”
      demo-producer
    • Keep the command prompt open. We will come back to it to push more messages
  5. Consumer – setup to receive messages from the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic testkafka --from-beginning
    • You would see the Producer sent message in this command prompt window – “Kafka demo – Message from server”
      demo-consumer
    • Go back to Producer command prompt and type any other message to see them appearing real time in Consumer command prompt
      kafka-demo
  6. Check/Observe – few key changes behind the scene
    • Files under topic created – they keep track of the messages pushed for a given topic
      topic-files
    • Data inside the log file – All the messages that are pushed by producer are stored here
      topic-log
    • Topics present in Kafka – once a consumer starts reading messages from topic, __consumer_offsets is automatically created as a topic
      topic-present

NOTE: In case you want to choose Zookeeper to store topics instead of Kafka server, it would require following script commands:

  • Topic create: kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testkafka
  • Topics view: kafka-topics.bat --list --zookeeper localhost:2181

With above, we are able to see messages sent by Producer and received by Consumer using a Kafka setup.

When I tried to setup Kafka, I faced few issues on the way. I have documented them for reference to learn. This should also help others if they face something similar: Troubleshoot: Kafka setup on Windows.

One should not encounter any issues with below shared files and the steps/commands shared above.

Download entire modified setup files for Windows from here: https://github.com/sandeep-mewara/kafka-demo-windows

hurray

References:
https://kafka.apache.org
https://cwiki.apache.org/confluence/display/KAFKA
https://docs.confluent.io/2.0.0/clients/consumer.html