Microsoft Build 2021 Highlights

As shared in my earlier post, Microsoft recently held its Build 2021 conference as a virtual event. It ran for more than two days and touched upon various topics. It was not as big as the in-person editions of the last several years. They shared numerous updates, which I have tried to cover here.

msbuild-highlights

To empower developers, to empower you, to empower the world.

Microsoft vision

Core Themes

Microsoft unveiled a handful of announcements and tools to empower developers, including multiple innovative AI, mixed reality and intelligent cloud solutions. The core themes of the conference are depicted below:

build-theme

With so much changing in the way we work, Microsoft shared how they are looking to support operational excellence with their developer platform. The conference included a deep-dive session on how to increase developer velocity with Microsoft’s end-to-end platform.

Further, with data being key to future transformations, they showcased how Azure Data is being built as a platform that can support AI & data governance.

ms-data-ai
dev-toolchain
ms-ai

They now support building deeper Artificial Intelligence infrastructure for everyone. This should make it easier to onboard and use Azure services, and to take a more proactive approach to solutions. Details about how to harness the power of data in our applications with Azure can be viewed here.

In line with Microsoft’s vision to empower developers, there was a session on how to build cloud native solutions that can run on premises, on the edge and across multiple clouds. A recording of the session can be seen here.

ms-cloud-native

Covid has probably caused the biggest shift in how we work today. A hybrid future work style is being imagined, with new operating models to work, learn or collaborate.

Microsoft Teams is one of Microsoft’s fastest growing products, with 145 million daily active users to date.

With the new working style, transformation looks like an opportunity where cross-device collaboration will be key. Microsoft surely knows this and is investing in it. The following session shares how they are progressing on the collaborative applications journey: Build the next generation of collaborative apps for hybrid work

ms-collaborative-app

The different kinds of services, support needs, scalability options and various other factors make software as a service a much needed model. Microsoft looked focused on this and shared how they are working towards cloud native SaaS apps composed on top of other clouds and components.

Microsoft also held a session showcasing how they are helping to build Metaverse apps (where digital and physical mix), incorporating digital twins, IoT, autonomous systems, Power Platform and Mesh.

digital-twins
autonomous-systems
ms-mesh

Announcements

As always, there were a few key announcements about new initiatives and solutions, as shown below:

ms-announcements

Real World

Through sessions, Microsoft covered a few examples of how various other organizations are leveraging Microsoft solutions to provide an awesome customer experience with speed and accuracy.

ms-toyota
Fusion Teams for mission critical delivery app
ms-twitter
Generate captions for live audio conversations using MS Speech service
ms-walmart
Power eCommerce transactions using MS CosmosDB
ms-servicenow
Incident response using MS Teams, Graph & Bot
ms-finastra
Application development using MS Financial Services cloud, Teams & Azure
ms-abinbev
Track bottles till distribution processes using Metaverse stack

In his keynote, Nadella also shared that because of Covid, digital transformation has accelerated and advanced by 10 years.

ms-tech-intensity
ms-gdp10
ms-dev-growth
ms-github-growth

Post pandemic, the virtual world will still have a significant role in the new normal. Solutions across industries will want to stay connected and continue to use the benefits of digital first responders. This will further fuel innovation and development across technology.

Windows Update

Microsoft’s Build 2021 event didn’t have much Windows-specific news. The few items that came out were:

  • The 21H1 Windows update was rolled out with multiple security fixes
  • Support for Linux GUI apps on Windows 10 will come later this year
  • Microsoft continues to promote ARM – a Qualcomm ARM/Snapdragon Developer Kit was announced
  • The fall update (aka Sun Valley update) might have a renewed UX

Windows10 used by 1.3 billion – work, learn, connect and play

Microsoft

.NET Update

.NET 6 is the next version of .NET, a modern, open-source development platform for building apps for any OS with the best performance and productivity.

.NET 6 completes the unification of the platform and adds new capabilities for building web, native and hybrid apps for Linux, Windows, Mac, iOS and Android with a single codebase. Details can be viewed here: .NET 6 deep dive – what’s new and what’s coming

  • .NET 6 Preview 4 is available; .NET 6 is planned for release during .NET Conf 2021
  • Visual Studio 2022 is in the works, with a preview to be available soon

Imagine Cup 2021

imagine-cup

Make an impact through coding, collaboration, and competition. Innovate with passion to tackle global issues and bring your idea to life in the Imagine Cup.

Imagine Cup

One winner was announced out of four student teams from across the world. Teams brought innovations focused on four social good categories – Earth, Education, Healthcare, and Lifestyle. Multiple teams participated with the intention of solving issues in their local and global communities, and the final four were selected after evaluation on various parameters.

Team REWEBA from Kenya were announced as the Imagine Cup 2021 World Champions.

What’s next

Learn, connect, and explore all of the sessions and on-demand content from Microsoft Build anytime, anywhere.

Microsoft

With virtual as the norm, Microsoft hinted at holding multiple What’s next virtual events to share updates throughout the year. For now, they have asked us to save the date June 24 for an event that will share all about the operating system’s updates.

All the MS Build sessions are recorded and can be viewed from here.

Reference: https://mybuild.microsoft.com/home

Sandeep Mewara Github
News Update
Tech Explore
Data Explore
samples GitHub Profile Readme
Learn Machine Learning with Examples
Machine Learning workflow
What is Data Science
Word Ladder solution
What is Dynamic Programming
Learn Microsoft Tech via Videos LiveTV Streams

Microsoft WebView2 – A new friend for native apps!

Now, we can embed web content (HTML, CSS, and JavaScript) in our native applications with Microsoft Edge WebView2. Initially, it was announced only for Win32 C/C++ apps, but later it was made available for .NET 5, .NET Core, and .NET Framework Windows Forms and WPF applications.

It uses the modern Microsoft Edge (Chromium) platform to host web content within native Windows applications.

edge-webview2

In the future, the Evergreen WebView2 Runtime plans to ship with future releases of Windows. Deploy the Runtime with your production app until the Runtime becomes more universally available.

Microsoft recommendation

Power of Native

With the growing presence of the online world, desktop applications are being pushed to offer more online-like capabilities. Further, a web solution also helps reuse most of the code across different platforms and makes delivering changes easy. This is leading desktop applications more and more towards a hybrid approach, where the best of both (native and web) can be leveraged.

hybrid
Credit: Microsoft

Microsoft WebView2 comes to the rescue. It helps build powerful applications with controlled access to native capabilities.

WebView2 uses the same process model as the Microsoft Edge browser. A browser process is associated with only one user data folder. A request process that specifies more than one user data folder is associated with the same number of browser processes.

browser-process-model
Credit: Microsoft

More details about browser process model can be read here.

WebView2 apps create a user data folder to store data such as cookies, credentials, permissions, and so on. After creating the folder, your app is responsible for managing the lifetime of the user data folder, including clean up when the app is uninstalled

Microsoft – Managing user data folder

Microsoft has laid out some best practices here for developing secure WebView2 applications.

Distribution

When distributing your WebView2 app, ensure the backing web platform, the WebView2 Runtime, is present before the app starts.

By default, WebView2 is evergreen and receives automatic updates to stay on the latest and most secure platform.

  • Evergreen Bootstrapper – a tiny installer that downloads the Evergreen Runtime matching device architecture and installs it locally.
  • Evergreen Standalone Installer – a full-blown installer that can install the Evergreen Runtime in offline environment.
  • Fixed Version – to select and package a specific version of the WebView2 Runtime with your application.

The WebView2 Runtime is a redistributable runtime and serves as the backing web platform for WebView2 apps.

webview2-exception

Download the runtime from here. Supported platforms are mentioned here.

Sample Application

I built a sample WPF application (runs on .NET Framework and not Core) to try WebView2. This was to evaluate how comparatively older .NET applications would work out.

webview2-sample

I tried to display my blog in the WPF application using a WebView2 control. The sample application can post messages to the host application and back, as well as hook events as needed.

public MainWindow()
{
    InitializeComponent();

    // NavigationEvents
    webView.NavigationStarting += WebView_NavigationStarting;
    webView.SourceChanged += WebView_SourceChanged;
    webView.ContentLoading += WebView_ContentLoading;
    webView.NavigationCompleted += WebView_NavigationCompleted;

    // Embedded at CoreWebView2 level
    InitializeOnceCoreWebView2Intialized();
}

/// <summary>
/// initialization of CoreWebView2 is asynchronous.
/// </summary>
private async void InitializeOnceCoreWebView2Intialized()
{
    await webView.EnsureCoreWebView2Async(null);

    // Hook other events
    webView.CoreWebView2.FrameNavigationStarting += CoreWebView2_FrameNavigationStarting;
    webView.CoreWebView2.HistoryChanged += CoreWebView2_HistoryChanged;

    // For communication host to webview & vice versa
    webView.CoreWebView2.WebMessageReceived += CoreWebView2_WebMessageReceived;
    await webView.CoreWebView2.AddScriptToExecuteOnDocumentCreatedAsync("window.chrome.webview.postMessage(window.document.URL);");
    await webView.CoreWebView2.AddScriptToExecuteOnDocumentCreatedAsync("window.chrome.webview.addEventListener(\'message\', event => alert(\'Message from App to WebView2 on navigation!\'));");
}

/// <summary>
/// Web content in a WebView2 control may post a message to the host 
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
private void CoreWebView2_WebMessageReceived(object sender, CoreWebView2WebMessageReceivedEventArgs e)
{
    // Retrieve message from Webview2
    String uri = e.TryGetWebMessageAsString();
    addressBar.Text = uri;

    // Send message to Webview2
    webView.CoreWebView2.PostWebMessageAsString(uri);
    log.Content = $"Address bar updated ({uri}) based on WebView2 message!";
}

/// <summary>
/// Execute URL
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
private void ButtonGo_Click(object sender, RoutedEventArgs e)
{
    try
    {
        Uri uri = new Uri(addressBar.Text);

        if (webView != null && webView.CoreWebView2 != null)
        {
            webView.CoreWebView2.Navigate(uri.OriginalString);
        }
    }
    catch (UriFormatException)
    {
        MessageBox.Show("Please enter correct format of url!");
    }
}

/// <summary>
/// Allow only HTTPS calls
/// WebView2 starts to navigate and the navigation results in a network request. 
/// The host may disallow the request during the event.
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
private void WebView_NavigationStarting(object sender, CoreWebView2NavigationStartingEventArgs e)
{
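    String uri = e.Uri;
    // URI schemes are case-insensitive, so compare without regard to case
    if (!uri.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
    {
        e.Cancel = true;
        //MessageBox.Show("Only HTTPS allowed!");

        // Inject JavaScript code into WebView2 controls at runtime.
        // Escape backslashes and quotes so the URI cannot break out of the JS string literal.
        String safeUri = uri.Replace("\\", "\\\\").Replace("'", "\\'");
        webView.CoreWebView2.ExecuteScriptAsync($"alert('{safeUri} is not safe, try an https link please.')");
    }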
}

Various events run when specific asynchronous actions occur to the content displayed in a WebView2 instance.

navigation-graph

Key events:

  • NavigationStarting – WebView2 starts to navigate and the navigation results in a network request. The host may disallow the request during the event.
  • SourceChanged – The source of WebView2 changes to a new URL. The event may result from a navigation action that does not cause a network request, such as a fragment navigation.
  • ContentLoading – WebView starts loading content for the new page.
  • HistoryChanged – The navigation causes the history of WebView2 to update.
  • NavigationCompleted – WebView2 completes loading content on the new page.
  • ProcessFailed – To react to crashes and hangs in the browser and renderer processes
  • Close – To safely shut down associated browser and renderer processes

Working with the sample application, I was able to display a webpage, intercept calls both ways and inject messages/code as I needed. It provides all the capabilities that seem to be needed for a stable web app display control.

Complete sample application can be downloaded from here: https://github.com/sandeep-mewara/WebView2WpfBrowserApp

Reference

https://developer.microsoft.com/en-us/microsoft-edge/webview2/
https://docs.microsoft.com/en-us/microsoft-edge/webview2/gettingstarted/wpf
https://docs.microsoft.com/en-us/microsoft-edge/webview2/concepts/distribution


Troubleshoot: Kafka setup on Windows

Recently, I set up Kafka on a Windows system and shared a Kafka guide to help understand and learn it. I was using a Win10 VM on my MacBook. The setup was not a breeze and had a few hiccups on the way. It took me some time to resolve them one after another by looking around on the web. I am collating all of them here for quick reference.

ERROR #1

When:
I tried to start Zookeeper.

Command:
zookeeper-server-start.bat config\zookeeper.properties

Error:
java.lang.IllegalArgumentException: config/zookeeper.properties file is missing

Stack trace:

INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2014-08-21 11:53:55,748] FATAL Invalid config, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain)
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing config/zookeeper.properties
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:110)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:99)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.lang.IllegalArgumentException: config/zookeeper.properties file is missing
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:94)
    ... 2 more

How I solved?
It was clearly a relative path problem. The config folder sits two directory levels above the folder that holds the startup script (bin\windows), so config\zookeeper.properties could not be found from there. I had to either correct the level or use an absolute path to move ahead.

zookeeper-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties
rem OR relative path option below

zookeeper-server-start.bat ../../config/zookeeper.properties
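
The effect of the relative path can be checked without starting ZooKeeper. Below is a minimal Python sketch, assuming the command is run from the bin\windows folder of the install location used in this post; the helper name resolve is made up for illustration. It uses ntpath so that Windows path rules apply on any OS.

```python
import ntpath  # Windows path rules, usable on any OS

# Assumption: the command prompt is sitting in the folder that holds the .bat scripts
SCRIPT_DIR = r"C:\Installs\kafka_2.12-2.5.0\bin\windows"

def resolve(relative_path, cwd=SCRIPT_DIR):
    """Return the absolute path that a relative argument points to from cwd."""
    return ntpath.normpath(ntpath.join(cwd, relative_path))

# The path from the failing command: resolved under bin\windows, where no config folder exists
wrong = resolve(r"config\zookeeper.properties")
# Going two levels up first lands in the real config folder
right = resolve(r"..\..\config\zookeeper.properties")

print(wrong)
print(right)
```

The first path ends up under bin\windows\config, which is why the file was reported missing; the second one points to the actual config folder.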

ERROR #2

When:
Zookeeper is up and running. Attempted to start Kafka server and it failed.

Command:
kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties

Error:
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING

Stack trace:

........
........
2020-07-19 01:20:32,081 ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) [main]
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:268)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:264)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:97)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1694)
at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:348)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
2020-07-19 01:20:32,088 INFO shutting down (kafka.server.KafkaServer) [main]
2020-07-19 01:20:32,105 INFO shut down completed (kafka.server.KafkaServer) [main]
2020-07-19 01:20:32,106 ERROR Exiting Kafka. (kafka.server.KafkaServerStartable) [main]
2020-07-19 01:20:32,121 INFO shutting down (kafka.server.KafkaServer) [kafka-shutdown-hook]

How I solved?
Investigation led me to increase the timeout setting for the Kafka-ZooKeeper connection. It turns out the environment (RAM, CPU, etc.) plays some role here.
I updated the ${kafka_home}/config/server.properties file:

# Timeout in ms for connecting to zookeeper (default it was 18000)
zookeeper.connection.timeout.ms=36000 

I read about many other causes and fixes for this error (they did not look applicable to my case), like:
1. ZooKeeper service not running
2. System needs a restart
3. ZooKeeper is hosted on zookeeper:2181 or another server name instead of localhost:2181
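
Before touching timeouts, it can help to rule out the first and third causes by checking whether anything is listening on the ZooKeeper port at all. Here is a minimal Python sketch of such a check; the function name is_port_open and the 3 second timeout are my own choices, and localhost:2181 is the address used in this setup.

```python
import socket

def is_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names
        return False

if __name__ == "__main__":
    # 2181 is the ZooKeeper port configured in this setup
    print("ZooKeeper reachable:", is_port_open("localhost", 2181))
```

A False result only says that no TCP connection could be made; a True result does not prove ZooKeeper is healthy, only that something accepts connections on that port.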

ERROR #3

When:
Zookeeper is up and running. Attempted to start Kafka server and it failed.

Command:
kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties

Error:
java.lang.OutOfMemoryError: Map failed OR java.io.IOException: Map failed

Stack trace:

.......
.......
java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:944)
        at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:115)
        at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:105)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:105)
        at kafka.log.LogSegment.recover(LogSegment.scala:256)
        at kafka.log.Log.kafka$log$Log$$recoverSegment(Log.scala:342)
        at kafka.log.Log.recoverLog(Log.scala:427)
        at kafka.log.Log.loadSegments(Log.scala:402)
        at kafka.log.Log.<init>(Log.scala:186)
        at kafka.log.Log$.apply(Log.scala:1609)
        at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$5$$anonfun$apply$12$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:172)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:941)
        ... 17 more

How I solved?
It turned out to be related to Java heap size. I made a change in the Kafka startup script file: ${kafka_home}/bin/windows/kafka-server-start.bat

IF NOT ERRORLEVEL 1 (
    rem 32-bit OS
    set KAFKA_HEAP_OPTS=-Xmx512M -Xms512M
) ELSE (
    rem 64-bit OS
    rem original line, now commented out: set KAFKA_HEAP_OPTS=-Xmx1G -Xms1G
    rem added the line below
    set KAFKA_HEAP_OPTS=-Xmx512M -Xms512M
)

While looking for a solution, I found that quite a few people also solved it by upgrading their Java from the 32-bit to the 64-bit version. I did not try this, as I had other Java setup dependencies on my system that I wanted to keep intact.

ERROR #4

When:
I tried to delete a Kafka topic because I was having problems while pushing messages from the Producer.

Command:
kafka-topics.bat --bootstrap-server localhost:9092 --delete --topic my_topic_name

Error:
Topic test is already marked for deletion

Stack trace:

Topic test is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.

How I solved?
I enabled the topic deletion configuration by setting delete.topic.enable=true in the file ${kafka_home}/config/server.properties, and restarted the server after updating the config.

# Delete topic enabled
delete.topic.enable=true

ERROR #5

When:
Zookeeper & Kafka are up and running. I get an error when I try to create a Topic.

Command:
kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testkafka

Error:
org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment

Stack trace:

Error while executing topic command : org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
[2020-07-19 01:41:35,094] ERROR java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
    at kafka.admin.TopicCommand$AdminClientTopicService.createTopic(TopicCommand.scala:163)
    at kafka.admin.TopicCommand$TopicService.createTopic(TopicCommand.scala:134)
    at kafka.admin.TopicCommand$TopicService.createTopic$(TopicCommand.scala:129)
    at kafka.admin.TopicCommand$AdminClientTopicService.createTopic(TopicCommand.scala:157)
    at kafka.admin.TopicCommand$.main(TopicCommand.scala:60)
    at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
 (kafka.admin.TopicCommand$)

How I solved?
It worked for me as is once, but when I tried again later, I kept getting this error. On the web, the suggestion was to enable the listener and set it up like: listeners=PLAINTEXT://localhost:9093 in the server config file.

Before attempting this, I rebooted my system, as it was a little sluggish too. It turns out it was most likely a memory issue: I was in a Windows VM that was probably starved of memory. After the reboot, things worked fine for me without any config change.

ERROR #6

When:
This was during another Kafka setup (from scratch) a few days later. Zookeeper is up and running. Attempted to start Kafka server and it failed.

Command:
kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties

Error:
It was about the logs or a lock file.

How I solved?
The details hinted that I should look into something pre-existing, related to my previous setup. I went ahead and deleted the logs and data folders that were auto created when I went through the entire setup process earlier. After this, the error was gone. I believe my server shutdown was not smooth, and thus something was interfering with the current startup.


Hope these help. Keep learning!

Beginner’s Guide to understand Kafka

It’s a digital age. Wherever there is data, we hear about Kafka these days. One of the projects I work on involves an entire data system (Java backend) that leverages Kafka to deal with tonnes of data flowing through various channels and departments. While working on it, I thought of exploring the setup on Windows. Thus, this guide helps you learn Kafka and showcases the setup and test of a data pipeline on Windows.

Introduction

<kafka-logo>
An open-source project in Java & Scala

Apache Kafka is a distributed streaming platform with three key capabilities:

  • Messaging system – Publish-Subscribe to stream of records
  • Availability & Reliability – Store streams of records in a fault tolerant durable way
  • Scalable & Real time – Process streams of records as they occur

Data system components

Kafka is generally used to stream data into applications, data lakes and real-time stream analytics systems.

<kafka-highlevel-architecture>

An application publishes messages to the Kafka server. These messages can be any defined information that you plan to capture. They are passed reliably (thanks to Kafka’s distributed architecture) to another application or service, which can process or re-process them.

Internally, Kafka uses a data structure to manage its messages. These messages have a retention policy applied at a unit level of this data structure. Retention is configurable – time based or size based. By default, the data sent is stored for 168 hours (7 days).
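
To make the two retention modes concrete, here is a minimal Python sketch. It is an illustration under my own assumptions and not Kafka's actual code: the Segment class, the apply_retention function and the sample numbers are made up, and the sketch never removes the newest segment, which is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base_offset: int       # offset of the first message in the segment
    size_bytes: int        # bytes used on disk
    last_modified_ms: int  # time of the newest write to the segment

def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments that survive; the oldest segments are dropped first."""
    kept = sorted(segments, key=lambda s: s.base_offset)
    # Time based: drop old segments whose newest write is older than retention_ms
    if retention_ms is not None:
        while len(kept) > 1 and now_ms - kept[0].last_modified_ms > retention_ms:
            kept.pop(0)
    # Size based: drop old segments until the total size fits in retention_bytes
    if retention_bytes is not None:
        while len(kept) > 1 and sum(s.size_bytes for s in kept) > retention_bytes:
            kept.pop(0)
    return kept

HOUR_MS = 60 * 60 * 1000
segments = [
    Segment(0, 100, last_modified_ms=0),
    Segment(50, 100, last_modified_ms=100 * HOUR_MS),
    Segment(90, 100, last_modified_ms=200 * HOUR_MS),
]

# 168 hours is the default retention mentioned above
survivors = apply_retention(segments, now_ms=200 * HOUR_MS, retention_ms=168 * HOUR_MS)
print([s.base_offset for s in survivors])
```

With the sample data, only the segment last written 200 hours ago falls outside the 168 hour window, so the segments starting at offsets 50 and 90 remain.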

Kafka Architecture

Typically, there are multiple producers, consumers and clusters working with messages. Horizontal scaling can easily be done by adding more brokers. The diagram below depicts a sample architecture:

kafka-internals

Kafka clients and servers communicate over the TCP protocol. For more details, refer: Kafka Protocol Guide

The Kafka ecosystem also provides a REST proxy that allows easy integration via HTTP and JSON.

Primarily it has four key APIs: Producer API, Consumer API, Streams API, Connector API

Key Components & related terminology

  • Messages/Records – byte arrays of an object. Each consists of a key, value & timestamp
  • Topic – feeds of messages in categories
  • Producer – processes that publish messages to a Kafka topic
  • Consumer – processes that subscribe to topics and process the feed of published messages
  • Broker – hosts topics. Also referred to as a Kafka Server or Kafka Node
  • Cluster – comprises one or more brokers
  • Zookeeper – keeps the state of the cluster (brokers, topics, consumers)
  • Connector – connects topics to existing applications or data systems
  • Stream Processor – consumes an input stream from a topic and produces an output stream to an output topic
  • ISR (In-Sync Replica) – a replica that is caught up with the partition leader; used to support failover
  • Controller – broker in a cluster responsible for maintaining the leader/follower relationship for all the partitions

Zookeeper

Apache ZooKeeper is an open-source project that helps build distributed applications. It’s a centralized service for maintaining configuration information. It holds responsibilities like:

  • Broker state – maintains list of active brokers and which cluster they are part of
  • Topics configured – maintains list of all topics, number of partitions for each topic, location of all replicas, who is the preferred leader, list of ISR for partitions
  • Controller election – selects a new controller whenever a node shuts down. Also, makes sure that there is only one controller at any given time
  • ACL info – maintains Access control lists (ACLs) for all the topics

Kafka Internals

Brokers in a cluster are identified by an ID, which is typically a unique number. Connecting to one broker bootstraps a client to the entire Kafka cluster. Brokers receive messages from producers and allow consumers to fetch messages by topic, partition and offset.

A Topic is spread across a Kafka cluster as a logical group of one or more partitions. A partition is an ordered sequence of messages, and a topic’s partitions are distributed across multiple brokers. The number of partitions per topic is configurable during creation.

Producers write to Topics. Consumers read from Topics.

<kafka-partition>

Kafka uses a Log data structure to manage its messages. A Log is an ordered set of Segments, each of which is a collection of messages. Each segment has files that help locate a message:

  1. Log file – stores message
  2. Index file – stores message offset and its starting position in the log file
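
How the two files work together can be sketched in a few lines of Python. This is a simplified model with made-up data: the log file is a plain list, positions are list indexes instead of byte positions, and the index is assumed to be sparse (an entry for every third message), so a lookup jumps to the nearest indexed entry and scans forward from there.

```python
import bisect

# Log file modelled as (offset, payload) pairs in write order
log_file = [(100, "a"), (101, "b"), (102, "c"), (103, "d"), (104, "e"), (105, "f")]

# Index file modelled as (offset, position in log_file) for every third message
index_file = [(100, 0), (103, 3)]

def read_message(target_offset):
    """Locate target_offset by jumping via the index, then scanning forward."""
    indexed_offsets = [offset for offset, _ in index_file]
    # Rightmost index entry whose offset is <= target_offset
    i = bisect.bisect_right(indexed_offsets, target_offset) - 1
    if i < 0:
        return None  # target lies before the first offset of this segment
    _, position = index_file[i]
    for offset, payload in log_file[position:]:
        if offset == target_offset:
            return payload
        if offset > target_offset:
            break
    return None

print(read_message(104))
```

Looking up offset 104 jumps to the index entry for 103 and scans one message further, instead of reading the segment from the start.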

Kafka appends records from a producer to the end of a topic log. Consumers can only read committed records, and within those they may start from any offset they choose. A record is considered committed only when all ISRs for the partition have written it to their logs.
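
The commit rule can be expressed as a small calculation: the committed boundary is the lowest log end offset among the in-sync replicas. The Python sketch below is only an illustration; the function name high_watermark, the broker IDs and the offsets are assumptions made up for the example.

```python
def high_watermark(log_end_offsets, isr):
    """Return the offset below which every in-sync replica has the record.

    log_end_offsets: broker id -> next offset that replica will write
    isr: broker ids that are currently in sync
    """
    return min(log_end_offsets[broker] for broker in isr)

log_end_offsets = {1: 10, 2: 8, 3: 5}

# All three replicas in sync: only offsets 0..4 are committed
print(high_watermark(log_end_offsets, isr={1, 2, 3}))
# Broker 3 has dropped out of the ISR: offsets 0..7 are committed
print(high_watermark(log_end_offsets, isr={1, 2}))
```

A slow replica holds the boundary back only while it is still part of the ISR.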

leader-follower

Each partition has one leader replica; the remaining replicas are followers that serve as backup. If a leader fails, an ISR is chosen as the new leader. The leader performs all reads and writes for its topic partition, and followers passively replicate the leader. By default, consumers read only from the leader.

A leader and follower of a partition can never reside on the same node.

leader-follower2

Kafka also supports log compaction for records. With it, Kafka keeps the latest version of a record for each key and deletes the older versions. This gives a granular, per-key retention mechanism where the last update for each key is kept.
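
The idea behind compaction fits in a few lines of Python. This is a sketch of the concept only, with made-up keys and values; it does not model how or when Kafka actually runs compaction.

```python
def compact(records):
    """Keep only the last record per key, preserving offset order.

    records: list of (offset, key, value) tuples in offset order
    """
    last_offset_for_key = {}
    for offset, key, _ in records:
        last_offset_for_key[key] = offset
    return [r for r in records if last_offset_for_key[r[1]] == r[0]]

records = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),
]

print(compact(records))
```

After compaction, the record at offset 0 is gone because a newer value exists for user-1, while the surviving records keep their original offsets.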

The offset manager is responsible for storing, fetching and maintaining consumer offsets. Every live broker has one instance of an offset manager. By default, a consumer is configured to commit offsets automatically at a periodic interval. Alternatively, a consumer can use the commit API for manual offset management.

Kafka uses a special topic, __consumer_offsets, to save consumer offsets. These offsets record the read location of each consumer in each group. This helps a consumer resume from its last location when needed. With offsets committed to the broker, the consumer no longer depends on ZooKeeper.

Older versions of Kafka (pre 0.9) stored offsets only in ZooKeeper, while newer versions store offsets by default in the internal Kafka topic __consumer_offsets.

consumer-groups

Kafka allows consumer groups to read data in parallel from a topic. All the consumers in a group have the same group ID. At any time, only one consumer from a group can consume messages from a given partition, which guarantees the order in which that partition’s messages are read. A consumer can read from more than one partition.
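
The two rules above (one consumer per partition within a group, possibly several partitions per consumer) can be illustrated with a simple round-robin assignment in Python. The function assign_partitions and the consumer names are hypothetical, and a real consumer may be configured with a different assignment strategy.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: every partition goes to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    ordered = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment

# More partitions than consumers: one consumer reads two partitions
print(assign_partitions([0, 1, 2], ["c1", "c2"]))
# More consumers than partitions: one consumer stays idle
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
```

The second call shows why adding consumers beyond the number of partitions does not increase parallelism.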

Kafka Setup On Windows

setup-on-windows
Pre-Requisite
Setup files
  1. Install JRE – default settings should be fine
  2. Un-tar Kafka files at C:\Installs (could be any location by choice). All the required script files for Kafka data pipeline setup will be located at: C:\Installs\kafka_2.12-2.5.0\bin\windows
  3. Configuration changes as per Windows need
    • Setup for Kafka logs – Create a folder ‘logs’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this logs folder location in the Kafka config file C:\Installs\kafka_2.12-2.5.0\config\server.properties as log.dirs=C:/Installs/kafka_2.12-2.5.0/logs (use forward slashes or doubled backslashes inside the file, because a single backslash is treated as an escape character in Java properties files)
    • Setup for Zookeeper data – Create a folder ‘data’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this data folder location in the Zookeeper config file C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties as dataDir=C:/Installs/kafka_2.12-2.5.0/data
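
Windows paths inside .properties files deserve some care, because Java treats a backslash in a property value as an escape character. The Python sketch below imitates that unescaping in a simplified way (no unicode escapes, no line continuations) to show what different spellings of the same path turn into; the function load_property_value is my own illustration, not part of Kafka or Java.

```python
def load_property_value(raw):
    """Simplified imitation of how a Java properties value is unescaped."""
    special = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes keep the character and silently drop the backslash
            out.append(special.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)

print(load_property_value(r"C:\Installs\kafka_2.12-2.5.0\logs"))      # single backslashes
print(load_property_value(r"C:\\Installs\\kafka_2.12-2.5.0\\logs"))  # doubled backslashes
print(load_property_value("C:/Installs/kafka_2.12-2.5.0/logs"))       # forward slashes
```

Only the doubled backslash and forward slash spellings survive as a usable path; with single backslashes the separators disappear.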
Execute
  1. ZooKeeper – Get a quick-and-dirty single-node ZooKeeper instance using the convenience script already packaged along with Kafka files.
    • Open a command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: zookeeper-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties
    • ZooKeeper started at localhost:2181. Keep it running.
      demo-zookeeper
  2. Kafka Server – Get a single-node Kafka instance.
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • ZooKeeper is already configured in the properties file as zookeeper.connect=localhost:2181
    • Execute script: kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties
    • Kafka server started at localhost:9092. Keep it running.
      demo-kafka
      Now topics can be created and messages can be stored. We can produce and consume data from any client. We will use the command prompt for now.
  3. Topic – Create a topic named ‘testkafka’
    • Use a replication factor of 1 and 1 partition, since we are running a single-node instance
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testkafka
    • Execute script to see created topic: kafka-topics.bat --list --bootstrap-server localhost:9092
      demo-topic
    • Keep the command prompt open just in case.
  4. Producer – setup to send messages to the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-producer.bat --bootstrap-server localhost:9092 --topic testkafka
    • It will show a ‘>’ as a prompt to type a message. Type: “Kafka demo – Message from server”
      demo-producer
    • Keep the command prompt open. We will come back to it to push more messages
  5. Consumer – setup to receive messages from the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic testkafka --from-beginning
    • You should see the message sent by the Producer in this command prompt window – “Kafka demo – Message from server”
      demo-consumer
    • Go back to the Producer command prompt and type more messages to see them appear in real time in the Consumer command prompt
      kafka-demo
  6. Check/Observe – a few key changes behind the scenes
    • Files under topic created – they keep track of the messages pushed for a given topic
      topic-files
    • Data inside the log file – All the messages that are pushed by producer are stored here
      topic-log
    • Topics present in Kafka – once a consumer starts reading messages from a topic, the __consumer_offsets topic is created automatically
      topic-present
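
What the producer, the log file and the consumer are doing in the steps above can be modelled with a toy append-only log. This is a rough sketch, not Kafka code: TopicLog and ConsoleConsumer are hypothetical names, and the from_beginning flag only mimics the effect of the --from-beginning option used in step 5.

```python
from typing import List


class TopicLog:
    """Toy single-partition topic: an append-only list of messages."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message


class ConsoleConsumer:
    def __init__(self, log: TopicLog, from_beginning: bool) -> None:
        self.log = log
        # Start at offset 0, or at the current end of the log.
        self.position = 0 if from_beginning else len(log.messages)

    def poll(self) -> List[str]:
        # Return everything appended since the last poll.
        new_messages = self.log.messages[self.position:]
        self.position = len(self.log.messages)
        return new_messages


log = TopicLog()
log.append("Kafka demo - Message from server")

replaying = ConsoleConsumer(log, from_beginning=True)
tailing = ConsoleConsumer(log, from_beginning=False)

log.append("second message")
replayed = replaying.poll()
tailed = tailing.poll()
```

The consumer that starts from the beginning sees both messages, while the one that starts at the end only sees what was produced after it attached.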

NOTE: If you want the topic commands to connect through ZooKeeper instead of the Kafka server (an older style that still works in this Kafka version but was removed in Kafka 3.0), use the following commands:

  • Topic create: kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testkafka
  • Topics view: kafka-topics.bat --list --zookeeper localhost:2181

With the above, we can see messages sent by the Producer and received by the Consumer using a Kafka setup.

When I tried to set up Kafka, I faced a few issues along the way. I have documented them for reference, which should also help others who face something similar: Troubleshoot: Kafka setup on Windows.

You should not run into any issues if you use the files shared below together with the steps and commands above.

Download entire modified setup files for Windows from here: https://github.com/sandeep-mewara/kafka-demo-windows

hurray


Microsoft Build Conference 2016: Keynote announcements

I wanted to share the keynote announcements from Microsoft’s Build conference, made on March 30, 2016.

Windows 10: Usage
Out for 8 months

  • 810M users world-wide
  • 270M users in US
    • 5B visits to the Windows Store
    • 60% growth in the “last few months alone”
    • Coming soon: Universal Windows Platform apps, including a new Facebook app + Audience Network

Windows 10: Update
Anniversary update of Windows 10 coming this summer (FREE)

  • Update for – new PCs, 5-year-old PCs or brand new Macs
  • Insider version available today along with Update of Visual Studio 2
  • Windows Hello, Ink, Gaming, HoloLens, and Cortana updates

Desktop App Converter:
Takes a modern Win32/.NET app or game installer and runs it through the Centennial tool

  • Sage example – run it through Centennial and submit to the Windows 10 app store.
  • Visual Studio – Win32 code with no modifications
  • Added in Live Tile code
  • Game examples with Age of Empires 2 HD

HoloLens:

  • Starts to ship to developers and enterprise partners today (exclusive to Windows 10)
  • Code examples are available today on GitHub and on the Windows Store (“Galaxy Explorer”)

“Bash” shell:
Coming to Windows (native Ubuntu on Windows)

  • Power of command-line tools
  • Examples using JavaScript, SSH, Ruby and Emacs

Windows Ecosystem:

  • Cortana as a “boundary-less” offering across all devices and user actions/history

To me, if Windows 10 is to capture more of the market, one of the most interesting things to look forward to is support for Win32 desktop apps in the Windows 10 store (coming in June).

Microsoft Products Retirement

Earlier, I was not sure if I could share this information outside, but I got confirmation today that I can.
Microsoft shared the following information with us so that we can be better prepared for the upcoming year.
These are the Microsoft products that are going to retire this year and reach End Of Life, meaning end of support.

Product                           End Of Life Date
SQL Server 2000                   4/9/2013
Commerce Server 2002              7/9/2013
BizTalk Server 2004               7/8/2014
Project Server 2003               4/8/2014
Live Communication Server 2003    1/14/2014
Office 2003                       4/8/2014
Windows XP                        4/8/2014
.NET Framework 1.1                10/8/2013
Visual Studio .NET 2003           10/8/2013

Microsoft suggests that we migrate to a newer version if we are using any of these products.
So please check which version you are currently working on and take the necessary action if needed.