Kubernetes – Evolution of application deployment

Kubernetes (K8s) is turning out as the cutting-edge of application deployment. It is becoming core to the creation and operation of modern software (few call it as modern SaaS). Thus, I planned to look into it and see what Kubernetes is and how/what application design will help adapt it in the application deployment evolution.

Kubernetes is a portable, extensible, open-source platform for automating deployment, scaling, and management of containerized applications.

History

Google originally designed and open-sourced the Kubernetes project in 2014. Kubernetes has inputs from over 15 years of Google’s experience to run production workloads at scale with best ideas and practices from the community. It is maintained by the Cloud Native Computing Foundation now. It’s current development repository is here.

First challenge …

With modern goal parameters like: recoverability, release cycle time & release frequency – applications need to be designed and deployed in a way that makes them improve year over year.

This leads to first step of breaking the monolith into microservices such that the changes and impact are compartmentalized for easy deployment and recovery.

monolith2microservice

A monolithic application puts all it’s functionality in a single process. In need of scaling, it replicates entire monolith on multiple servers. On the other hand, a microservice architecture separates out (keeps) each functionality into a separate service. Thus in case of scaling need, these services are distributed across servers as required.

Second challenge …

With multiple microservices in play, a variance of stack versions or deployment styles kicks in as trouble. Each team would have their own set of tools, versions to build the artifacts, store them and then deploy them. Thus, different applications/services can have different patterns and network topology. This in turn makes managing security and infrastructure more challenging.

This leads to the step of abstracting infrastructure out to ease maintenance and relieve from security and other infrastructure related concerns.

deployment-progression
Deployment scheme evolution
  • Traditional: Applications running on a physical server. No way to define resource boundaries for applications.
  • Virtualization: Allows to run multiple Virtual Machines (VMs) on a single physical server’s CPU. This leads to better utilization of resources and better scalability as an application can be added or updated easily. Also, if needed, applications can be isolated between different VMs to provide a level of security.
  • Containers: Like VM, it has its own filesystem, CPU, memory, process space, etc. Are environment consistent, easy to scale, portable across clouds and OS distributions. This leads to loosely coupled setup where application is totally decoupled from infrastructure and makes it easy to move towards smaller, modular microservices.

Containers are abstraction to next level. It does not matter on which OS you are on (although there could be different containers for different OS and how they work underlying), all we need is to package our code and needed libraries together, which then runs inside a container based on configured resource need. Docker is an example of container runtime, a packaging software.

Final challenge …

So, the packaging has been simplified and running the application on a single node has been simplified. When we move to enterprise, we need to scale up/down our containers on need basis automatically. Further, one would scale the application to be served from multiple servers instead of just one for better load distribution and easy recovery/fail safe. Now, while distributing the load, we would need to ensure the availability of nodes, resources like space on node for running a container, etc.

This is where Kubernetes pitch in. It acts as a container orchestrator that help provides with a framework to run distributed systems resiliently. It takes care of scaling and failover of containers having application, provides deployment patterns, and more.

kubernetes-architecture

Kubernetes has master-slave architecture where there is one master node and multiple worker nodes. A Pod is the smallest deployable unit in it. In order to run a single container, we would need to create a Pod for that container. A Pod can contain more than one container if those containers are relatively tightly coupled (like a container to download all secret configs related before application starts in other container).

API Server is the heart of the architecture. User interacts with Kubernetes via it and master node communicates to worker nodes through it. Number of containers requested is stored in the etcd (key-value store). Controller acts as a manager that keeps a constant check on the store, schedules the request for scheduler to pick and execute, spins of another worker node in case of need.

Wrap Up …

I have just touched the surface of both containerization and Kubernetes. They seem to have much more and can be explored in depth. Along with vast benefits, it can also bring new challenges on the table with moving to cloud like security and networking.

It was good to know how application design and deployment are evolving, getting abstracted and loosely coupled.

Keep learning!

Reference: https://kubernetes.io/docs/home/

GitHub Readme Samples

Troubleshoot: Kafka setup on Windows

Recently, I did a setup of Kafka on a windows system and shared a Kafka guide to understand and learn. I was using a Win10 VM on my MacBook. It was not a breeze setup and had few hiccups on the way. It took some time for me to resolve them one after another looking around on web. Collating all of them here for quick reference.

ERROR #1

When:
I tried to start Zookeeper.

Command:
zookeeper-server-start.bat config\zookeeper.properties

Error:
java.lang.IllegalArgumentException: config/zookeeper.properties file is missing

Stack trace:

INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2014-08-21 11:53:55,748] FATAL Invalid config, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain)
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing config/zookeeper.properties
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:110)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:99)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.lang.IllegalArgumentException: config/zookeeper.properties file is missing
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:94)
    ... 2 more

How I solved?
It was clearly the case of relative path. config/zookeeper.properties was at two roots lower than where the start up script was. Either I had to correct the level or use an absolute path to move ahead.

zookeeper-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties
rem OR relative path option below

zookeeper-server-start.bat ../../config/zookeeper.properties

ERROR #2

When:
Zookeeper is up and running. Attempted to start Kafka server and it failed.

Command:
kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties

Error:
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING

Stack trace:

........
........
2020-07-19 01:20:32,081 ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) [main]
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:268)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:264)
at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:97)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1694)
at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:348)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
2020-07-19 01:20:32,088 INFO shutting down (kafka.server.KafkaServer) [main]
2020-07-19 01:20:32,105 INFO shut down completed (kafka.server.KafkaServer) [main]
2020-07-19 01:20:32,106 ERROR Exiting Kafka. (kafka.server.KafkaServerStartable) [main]
2020-07-19 01:20:32,121 INFO shutting down (kafka.server.KafkaServer) [kafka-shutdown-hook]

How I solved?
Investigation lead to increasing the timeout settings for Kafka-Zookeeper. Because of environment settings (RAM, CPU, etc), it turns out this plays some role.
I updated the ${kafka_home}/config/server.properties file:

# Timeout in ms for connecting to zookeeper (default it was 18000)
zookeeper.connection.timeout.ms=36000 

I read many other reasons for this error (did not look applicable to my case) like:
1. zookeper service not running
2. restarting system
3. zookeper is hosted on zookeeper:2181 or other server name instead of localhost:2181

ERROR #3

When:
Zookeeper is up and running. Attempted to start Kafka server and it failed.

Command:
kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties

Error:
java.lang.OutOfMemoryError: Map failed OR java.io.IOException: Map failed

Stack trace:

.......
.......
java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:944)
        at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:115)
        at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:105)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:105)
        at kafka.log.LogSegment.recover(LogSegment.scala:256)
        at kafka.log.Log.kafka$log$Log$$recoverSegment(Log.scala:342)
        at kafka.log.Log.recoverLog(Log.scala:427)
        at kafka.log.Log.loadSegments(Log.scala:402)
        at kafka.log.Log.<init>(Log.scala:186)
        at kafka.log.Log$.apply(Log.scala:1609)
        at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$5$$anonfun$apply$12$$anon
fun$apply$1.apply$mcV$sp(LogManager.scala:172)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1
149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:
624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:941)
        ... 17 more

How I solved?
It turned out related to Java heap size. I made a change in the Kafka startup script file: ${kafka_home}/bin/windows/kafka-server-start.bat

IF NOT ERRORLEVEL 1 (
        rem 32-bit OS
        set KAFKA_HEAP_OPTS=-Xmx512M -Xms512M
    ) ELSE (
        rem 64-bit OS
        rem set KAFKA_HEAP_OPTS=-Xmx1G -Xms1G => Commented this
        rem added this below line
	set KAFKA_HEAP_OPTS=-Xmx512M -Xms512M
    )

Though, while looking for solution, quite a few also solved it up upgrading their Java from 32bit to 64bit application. I did not try this solution as had other Java setup dependencies on my system that I wanted to keep intact.

ERROR #4

When:
I tried to delete Kafka topic because I was having problems while pushing message from Producer

Command:
kafka-topics.bat --list --bootstrap-server localhost:9092 --delete --topic my_topic_name

Error:
Topic test is already marked for deletion

Stack trace:

Topic test is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.

How I solved?
I enabled topic deletion configuration. It needs to be set as delete.topic.enable = true in file ${kafka_home}/config/server.properties. Restarted the server post updating the config.

# Delete topic enabled
delete.topic.enable=true

ERROR #5

When:
Zookeeper & Kafka is up and running. I get an error when I try to create a Topic.

Command:
kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testkafka

Error:
org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment

Stack trace:

Error while executing topic command : org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
[2020-07-19 01:41:35,094] ERROR java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
    at kafka.admin.TopicCommand$AdminClientTopicService.createTopic(TopicCommand.scala:163)
    at kafka.admin.TopicCommand$TopicService.createTopic(TopicCommand.scala:134)
    at kafka.admin.TopicCommand$TopicService.createTopic$(TopicCommand.scala:129)
    at kafka.admin.TopicCommand$AdminClientTopicService.createTopic(TopicCommand.scala:157)
    at kafka.admin.TopicCommand$.main(TopicCommand.scala:60)
    at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
 (kafka.admin.TopicCommand$)

How I solved?
For once it worked for me as is but when I tried again later, I kept getting this error. While looking on web, suggestions were to enable listener and set it up like: listeners=PLAINTEXT://localhost:9093 in the server config file.

Before attempting this, I rebooted my system as it was little sluggish too. Turns out, mostly it was memory issue. I was in a Windows VM and probably it was craving for memory space. Without a change, things worked fine as is for me.

ERROR #6

When:
This was during another instance of Kafka setup (from start) in few days. Zookeeper is up and running. Attempted to start Kafka server and it failed.

Command:
kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties

Error:
It was around logs or lock file.

How I solved?
Looking at details, it hinted me to look into pre-exisiting (something related to my previous setup). I went ahead and deleted the logs and data folder that was auo created when I moved ahead with the entire process setup. Post this, the error was gone. Believe my server shutdown was not smooth and thus something was interferring with the current startup.

.

Hope these would help. Keep learning!

Beginner’s Guide to understand Kafka

It’s a digital age. Wherever there is data, we hear about Kafka these days. One of my projects I work, involves entire data system (Java backend) that leverages Kafka to achieve what deals with tonnes of data through various channels and departments. While working on it, I thought of exploring the setup in Windows. Thus, this guide helps learn Kafka and showcases the setup and test of data pipeline in Windows.

Introduction

<kafka-logo>
An OpenSource Project in Java & Scala

Apache Kafka is a distributed streaming platform with three key capabilities:

  • Messaging system – Publish-Subscribe to stream of records
  • Availability & Reliability – Store streams of records in a fault tolerant durable way
  • Scalable & Real time – Process streams of records as they occur

Data system components

Kafka is generally used to stream data into applications, data lakes and real-time stream analytics systems.

<kafka-highlevel-architecture>

Application inputs messages onto the Kafka server. These messages can be any defined information planned to capture. It is passed across in a reliable (due to distributed Kafka architecture) way to another application or service to process or re-process them.

Internally, Kafka uses a data structure to manage its messages. These messages have a retention policy applied at a unit level of this data structure. Retention is configurable – time based or size based. By default, the data sent is stored for 168 hours (7 days).

Kafka Architecture

Typically, there would be multiples of producers, consumers, clusters working with messages across. Horizontal scaling can be easily done by adding more brokers. Diagram below depicts the sample architecture:

kafka-internals

Kafka communicates between the clients and servers with TCP protocol. For more details, refer: Kafka Protocol Guide

Kafka ecosystem provides REST proxy that allows an easy integration via HTTP and JSON too.

Primarily it has four key APIs: Producer API, Consumer API, Streams API, Connector API

Key Components & related terminology
  • Messages/Records – byte arrays of an object. Consists of a key, value & timestamp
  • Topic – feeds of messages in categories
  • Producer – processes that publish messages to a Kafka topic
  • Consumer – processes that subscribe to topics and process the feed of published messages
  • Broker – It hosts topics. Also referred as Kafka Server or Kafka Node
  • Cluster – comprises one or more brokers
  • Zookeeper – keeps the state of the cluster (brokers, topics, consumers)
  • Connector – connect topics to existing applications or data systems
  • Stream Processor – consumes an input stream from a topic and produces an output stream to an output topic
  • ISR (In-Sync Replica) – replication to support failover.
  • Controller – broker in a cluster responsible for maintaining the leader/follower relationship for all the partitions
Zookeeper

Apache ZooKeeper is an open source that helps build distributed applications. It’s a centralized service for maintaining configuration information. It holds responsibilities like:

  • Broker state – maintains list of active brokers and which cluster they are part of
  • Topics configured – maintains list of all topics, number of partitions for each topic, location of all replicas, who is the preferred leader, list of ISR for partitions
  • Controller election – selects a new controller whenever a node shuts down. Also, makes sure that there is only one controller at any given time
  • ACL info – maintains Access control lists (ACLs) for all the topics

Kafka Internals

Brokers in a cluster are differentiated based on an ID which typically are unique numbers. Connecting to one broker bootstraps a client to the entire Kafka cluster. They receive messages from producers and allow consumers to fetch messages by topic, partition and offset.

A Topic is spread across a Kafka cluster as a logical group of one or more partitions. A partition is defined as an ordered sequence of messages that are distributed across multiple brokers. The number of partitions per topic are configurable during creation.

Producers write to Topics. Consumers read from Topics.

<kafka-partition>

Kafka uses Log data structure to manage its messages. Log data structure is an ordered set of Segments that are collection of messages. Each segment has files that help locate a message:

  1. Log file – stores message
  2. Index file – stores message offset and its starting position in the log file

Kafka appends records from a producer to the end of a topic log. Consumers can read from any committed offset and are allowed to read from any offset point they choose. The record is considered committed only when all ISRs for partition write to their log.

leader-follower

Among the multiple partitions, there is one leader and remaining are replicas/followers to serve as back up. If a leader fails, an ISR is chosen as a new leader. Leader performs all reads and writes to a particular topic partition. Followers passively replicate the leader. Consumers are allowed to read only from the leader partition.

A leader and follower of a partition can never reside on the same node.

leader-follower2

Kafka also supports log compaction for records. With it, Kafka will keep the latest version of a record and delete the older versions. This leads to a granular retention mechanism where the last update for each key is kept.

Offset manager is responsible for storing, fetching and maintaining consumer offsets. Every live broker has one instance of an offset manager. By default, consumer is configured to use an automatic commit policy of periodic interval. Alternatively, consumer can use a commit API for manual offset management.

Kafka uses a particular topic, __consumer_offsets, to save consumer offsets. This offset records the read location of each consumer in each group. This helps a consumer to trace back its last location in case of need. With committing offsets to the broker, consumer no longer depends on ZooKeeper.

Older versions of Kafka (pre 0.9) stored offsets in ZooKeeper only, while newer version of Kafka, by default stores offsets in an internal Kafka topic __consumer_offsets

consumer-groups

Kafka allows consumer groups to read data in parallel from a topic. All the consumers in a group has same group ID. At a time, only one consumer from a group can consume messages from a partition to guarantee the order of reading messages from a partition. A consumer can read from more than one partition.

Kafka Setup On Windows

setup-on-windows
Pre-Requisite
Setup files
  1. Install JRE – default settings should be fine
  2. Un-tar Kafka files at C:\Installs (could be any location by choice). All the required script files for Kafka data pipeline setup will be located at: C:\Installs\kafka_2.12-2.5.0\bin\windows
  3. Configuration changes as per Windows need
    • Setup for Kafka logs – Create a folder ‘logs’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this logs folder location in Kafka config file: C:\Installs\kafka_2.12-2.5.0\config\server.properties as log.dirs=C:\Installs\kafka_2.12-2.5.0\logs
    • Setup for Zookeeper data – Create a folder ‘data’ at location C:\Installs\kafka_2.12-2.5.0
    • Set this data folder location in Zookeeper config file: C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties as dataDir=C:\Installs\kafka_2.12-2.5.0\data
Execute
  1. ZooKeeper – Get a quick-and-dirty single-node ZooKeeper instance using the convenience script already packaged along with Kafka files.
    • Open a command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: zookeeper-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\zookeeper.properties
    • ZooKeeper started at localhost:2181. Keep it running.
      demo-zookeeper
  2. Kafka Server – Get a single-node Kafka instance.
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • ZooKeeper is already configured in the properties file as zookeeper.connect=localhost:2181
    • Execute script: kafka-server-start.bat C:\Installs\kafka_2.12-2.5.0\config\server.properties
    • Kafka server started at localhost: 9092. Keep it running.
      demo-kafka
      Now, topics can be created and messages can be stored. We can produce and consume data from any client. We will use command prompt for now.
  3. Topic – Create a topic named ‘testkafka’
    • Use replication factor as 1 & partitions as 1 given we have made a single instance node
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testkafka
    • Execute script to see created topic: kafka-topics.bat --list --bootstrap-server localhost:9092
      demo-topic
    • Keep the command prompt open just in case.
  4. Producer – setup to send messages to the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-producer.bat --bootstrap-server localhost:9092 --topic testkafka
    • It will show a ‘>’ as a prompt to type a message. Type: “Kafka demo – Message from server”
      demo-producer
    • Keep the command prompt open. We will come back to it to push more messages
  5. Consumer – setup to receive messages from the server
    • Open another command prompt and move to location: C:\Installs\kafka_2.12-2.5.0\bin\windows
    • Execute script: kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic testkafka --from-beginning
    • You would see the Producer sent message in this command prompt window – “Kafka demo – Message from server”
      demo-consumer
    • Go back to Producer command prompt and type any other message to see them appearing real time in Consumer command prompt
      kafka-demo
  6. Check/Observe – few key changes behind the scene
    • Files under topic created – they keep track of the messages pushed for a given topic
      topic-files
    • Data inside the log file – All the messages that are pushed by producer are stored here
      topic-log
    • Topics present in Kafka – once a consumer starts reading messages from topic, __consumer_offsets is automatically created as a topic
      topic-present

NOTE: In case you want to choose Zookeeper to store topics instead of Kafka server, it would require following script commands:

  • Topic create: kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testkafka
  • Topics view: kafka-topics.bat --list --zookeeper localhost:2181

With above, we are able to see messages sent by Producer and received by Consumer using a Kafka setup.

When I tried to setup Kafka, I faced few issues on the way. I have documented them for reference to learn. This should also help others if they face something similar: Troubleshoot: Kafka setup on Windows.

One should not encounter any issues with below shared files and the steps/commands shared above.

Download entire modified setup files for Windows from here: https://github.com/sandeep-mewara/kafka-demo-windows

hurray

References:
https://kafka.apache.org
https://cwiki.apache.org/confluence/display/KAFKA
https://docs.confluent.io/2.0.0/clients/consumer.html

Quick look into SignalR

Last year, I was looking into various ways of communication between client and server for one of my project work. Evaluated SignalR too as one of the options. I found decent documentation online to put across a test app to see how it works.

Introduction

SignalR is a free and open-source software library for Microsoft ASP.NET that provides ability to server-side code push content to the connected clients in real-time.

Pushing data from the server to the client (not just browser clients) has always been a tough problem. SignalR makes it dead easy and handles all the heavy lifting for you.

https://github.com/SignalR/SignalR

Detailed documentation can be found at: https://dotnet.microsoft.com/apps/aspnet/signalr

Most of the shareout online considers it as ASP.NET/Web application solution only which is not true. As mentioned in quote above, client could be a non-browser too like a desktop application.

I gathered that behind the scenes, SignalR primarily tries to use Web Socket protocol to send data. WebSocket is a new HTML5 API (refer my detailed article on it) that enables bi-directional communication between the client and server. In case of any compatibility gap, SignalR uses other protocols like long polling.

P.S.: Since most of the code to make the test app was picked from web, all the credit to them. One specific source that I can recall: https://docs.microsoft.com/en-us/aspnet/signalr/overview/deployment/tutorial-signalr-self-host

Now, a quick peek into the SignalR test app
  • Using the SignalR Hub class to setup the server. Defined a hub that exposes a method like an end point for clients to send a message. Server can process the message and send back a response to all or few of the clients.
[HubName("TestHub")]
public class TestHub : Hub
{
    public void ProcessMessageFromClient(string message)
    {
        Console.WriteLine($"<Client sent:> {message}");

        // Do some processing with the client request and send the response back
        string newMessage = $"<Service sent>: Client message back in upper case: {message.ToUpper()}";
        Clients.All.ResponseToClientMessage(newMessage);
    }
}

  • Specify which “origins” we want to allow our application to accept. It is setup via CORS configuration. CORS is a security concept that allows end points from different domains to interact with each other.
public void Configuration(IAppBuilder app)
{
     app.UseCors(CorsOptions.AllowAll);
     app.MapSignalR();
}

  • Server is setup using OWIN (Open Web Interface for .NET). OWIN defines an abstraction between .NET web servers and web applications. This helps in self-hosting a web application in a process, outside of IIS.
static void Main(string[] args)
{
    string url = @"http://localhost:8080/";

    using (WebApp.Start<Startup>(url))
    {
        Console.WriteLine($"============ SERVER ============");
        Console.WriteLine($"Server running at {url}");
        Console.WriteLine("Wait for clients message requests for server to respond OR");
        Console.WriteLine("Type any message - it will be broadcast to all clients.");
        Console.WriteLine($"================================");

        // For server broadcast test
        // Get hub context 
        IHubContext ctx = GlobalHost.ConnectionManager.GetHubContext<TestHub>();

        string line = null;
        while ((line = Console.ReadLine()) != null)
        {
            string newMessage = $"<Server sent:> {line}";
            ctx.Clients.All.MessageFromServer(newMessage);
        }

        // pause to allow clients to receive
        Console.ReadLine();
    }
}

In above code, Using IHubContext:

  1. Server is setup to have earlier defined Hub as one of the broadcast end point.
  2. Server is also setup to broadcast any message to all clients by itself if needed be

  • Client is setup to communicate with the server (send and receive message) via hub using HubConnection & IHubProxy. Client can invoke the exposed end point in the hub to a send a message.
static void Main(string[] args)
{
    string url = @"http://localhost:8080/";

    var connection = new HubConnection(url);
    IHubProxy _hub = connection.CreateHubProxy("TestHub");
    connection.Start().Wait();

    // For server side initiation of messages
    _hub.On("MessageFromServer", x => Console.WriteLine(x));
    _hub.On("ResponseToClientMessage", x => Console.WriteLine(x));

    Console.WriteLine($"============ CLIENT ============");
    Console.WriteLine("Type any message - it will be sent as a request to server for a response.");
    Console.WriteLine($"================================");

    string line = null;
    while ((line = Console.ReadLine()) != null)
    {
        // Send message to Server
        _hub.Invoke("ProcessMessageFromClient", line).Wait();
    }

    Console.Read();
}

With above setup, we can see the communication between Client & Server realtime like below:

SignalR test

Things to consider while opting for SignalR

SignalR looks awesome. But, there are couple of things one should know while using it:

  • It’s a connected technology – each client connected through SignalR is using a persistent and dedicated connection on the Web Server. SignalR is shared as scalable but it never harms to look for any queries around connections, memory, cpu, etc with higher number of clients.
  • One would need to setup a valid host server with a PORT open on it that can be used for communication. This could depend on an organisation security protocols and approvals.

Hope this gives a quick overview about SignalR. Keep learning!

Download entire code for lookup from here: https://github.com/sandeep-mewara/SignalRTest

HttpClient single instance or multiple

Recently, we were working on a project that needed numerous HTTP requests to be made. Initial implementation had a new HttpClient object being created for every request being made. It looked to have some performance cost attached to it that led us to evaluate the effect of using single vs multiple instances of HttpClient.

Problem Statement:

Whats the best way to use HttpClient for multiple requests and the performance cost associated with it?

Assessment:

Went through the Microsoft documentation, which seemed updated based on last when I read few years back. Found a fineprint for myself that states:

HttpClient is intended to be instantiated once and re-used throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. This will result in SocketException errors.

This was a straight give away that we should use a single instance HttpClient – irrespective of a usecase, one would want to keep distance from SocketException errors (though probability of it would be high for heavy usage of HTTP requests).

Now, the query was how to have single HttpClient for multiple requests but with different request payload for the calls? Also, does this has any impact on performance of the calls and if so, how much?

Resolution:

I started with looking into performance aspect for the two options. Created a test application that helped evaluate the time taken for various number of requests. Tried with www.google.com but seems they have some kind of check at 1000 requests so went ahead with www.bing.com that looked uniform till 5000 requests that I tried with.

for (var i = 0; i < noOfConnections; i++)
{
    using (var httpClient = new HttpClient())
    {
        var result = httpClient.GetAsync(new Uri("http://www.bing.com/")).Result;
    }
}
//having private static readonly HttpClient _httpClient = new HttpClient();
for (var i = 0; i < noOfConnections; i++)
{
    var result = _httpClient.GetAsync(new Uri("http://www.bing.com/")).Result;
}

With the above, I got the following numbers on an average post few runs:

No of RequestsMultiple Instance (s)Single Instance (s)%age Diff
1002016.6716.65
5001038814.56
100021617419.44
200043035118.37
5000103290612.21

It looked like the difference peaked around 1000 requests and overall there was an improvement with single instance.

Now, given we had a usecase where multiple HTTP requests has to be made simultaneously but with different payloads, looked at how to achieve it with single instance. Keeping multiple types of requests, unit testing, high load – One possible way looked like below that worked out well for us:

// Single instance of HttpClientManager was setup
public class HttpClientManager : IHttpClientManager
{
    ...
    public HttpClientManager(HttpMessageHandler messageHandler)
    {
        _httpClient = new HttpClient(messageHandler);
    }
    private HttpRequestMessage SetupRequest(IRequestPayload requestPayload)
    {
        var request = new HttpRequestMessage
        {
            RequestUri = new Uri(requestPayload.Url)
        };
        switch (requestPayload.RequestType)
        {
            case RequestType.POST_ASYNC:
                request.Method = HttpMethod.Post;
                request.Content = GetHttpContent(requestPayload.ContentJson);
                break;
            case RequestType.PUT_ASYNC:
                request.Method = HttpMethod.Put;
                request.Content = GetHttpContent(requestPayload.ContentJson);
                break;
            case RequestType.DELETE_ASYNC:
                request.Method = HttpMethod.Delete;
                break;
            case RequestType.GET_ASYNC:
                request.Method = HttpMethod.Get;
                break;
            default:
                request.Method = HttpMethod.Get;
                break;
        }
        ...
    }
    public HttpResponseMessage ExecuteRequest(IRequestPayload requestPayload)
    {
        HttpRequestMessage httpRequestMessage = SetupRequest(requestPayload);
        HttpResponseMessage httpResponseMessage = _httpClient.SendAsync(httpRequestMessage, HttpCompletionOption.ResponseHeadersRead).Result;
        return httpResponseMessage;
    }
    private HttpContent GetHttpContent(string contentJson)
    {
        return new StringContent(contentJson, ENCODING, MEDIATYPE_JSON);
    }
}

Since there are numerous articles on the web explaining details of the entire HttpClient workflow and inner details, I will not cover that here but a quick explanation on couple of key info. In the code above:

HttpRequestMessage is used to setup HttpClient object based on our need. We make use of the fact that HttpRequestMessage can be used only once. After the request is sent, it is disposed immediately to ensure that any associated Content object is disposed.

Making use of HttpClient underlying implementation, have used HttpMessageHandler more from the unit test point of view.

Conclusion:

One should use a single instance of HttpClient at application level to avoid create/destroy of it multiple times. Further, results suggest this also has better performance with more than 12% improvement based on the load.

For multiple requests of different payloads, having a single instance HttpClient but a new HttpRequestMessage for every request looked a good approach to use.

P.S.: For .NET Core, Microsoft added a new interface around the same discussion to have better handle at HttpClient instance: https://docs.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/use-httpclientfactory-to-implement-resilient-http-requests

Entire code for lookup can be downloaded from here.