AWS re:Invent – Amazon Virtual Event 2020

Amazon is planning for AWS re:Invent 2020, a cloud learning virtual event that would be open for all for free. It is a conference for the global cloud computing community, hosted by AWS.

Full details of the events are here. Registrations are open.

Event is spread across 3 weeks: Nov 30 – Dec 18, 2020

amazon-reinvent

Get inspired to build
Enjoy unlimited access to hundreds of sessions led by AWS experts. Hear from cloud leaders and be the first to learn what’s new.

Event highlight shared by AWS

There would be multiple interactive ways to learn. Like, Sessions with over 50 content tracks, Training & Certification to advance your professional skills, all mixed with some fun activities in between.

The world is moving towards cloud. Thus, it seems a good opportunity to see how various organizations are using the cloud to solve problems and the skills needed for it.


Happy connecting!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories
Learn Microsoft Tech via Videos LiveTV Streams

How to solve Word Ladder Problem?

Sometime back, a colleague of mine asked me about the word ladder problem. She was looking for a change. So, I believe she stumbled across this while preparing for data structures and algorithms.

graph-header

Problem Statement

Typically, the puzzle shared is a flavor of below:

Find the smallest number of transformations needed to change an initial word to a target word of same length. In every transformation, change only one character and make sure word exists in the given dictionary.

Explanation

Assuming all these 4 letter words are there in the dictionary provided, it takes minimum 4 transitions to convert word from SAIL to RUIN, i.e.
SAIL -> MAIL -> MAIN -> RAIN -> RUIN

Intent here is to know about Graph algorithm. So, what are graphs in context of algorithms and how do we apply them to solve such problems?

Graph Data Structure

Graphs are flow structure that represents entities connection with each other. Visually, they are represented with help of a Node (Vertex) & an Edge (Connector).

graph-general

A tree is an undirected graph in which any two nodes are connected by only one path. In it, each node (except the root node) comprises exactly one parent node.

Most common way to represent a graph is using an Adjacency matrix. In it, Element A[i][j] is 1 if there is an edge from node i to node j or else it is 0. For example, adjacency matrix of above unidirected graph is:

  | 1 2 3 4
------------
1 | 0 1 0 1
2 | 1 0 1 0
3 | 0 1 0 1
4 | 1 0 1 0

Another common way is via Adjacency list. (List format of the data instead of a matrix.)

Related Algorithms

Graphs are applied in search algorithms. Traversing the nodes and edges in a defined order helps in optimizing search. There are two specific approaches to traverse graph:

Breadth First Search (BFS)

Given a graph G and a starting node s, search proceeds by exploring edges in the graph to find all the nodes in G for which there is a path from s. With this approach, it finds all the nodes that are at a distance k from s before it finds any nodes that are at a distance k+1.

For easy visualization, think of it as, in a tree, finding all the child nodes for a parent node as first step. Post it, find all the grandchildren and hence forth.

Depth First Search (DFS)

Given a graph G and a starting node s, search proceeds by exploring edges in the graph to find all the nodes in G traversed from s through it’s edges. With this approach, we go deep in graph connecting as many nodes in the graph as possible and branch where necessary.

For easy visualization, think of it as, in a tree, finding all the family nodes for a parent node. With this, for a given node, we connect its children, grand children, grand grand children and so on before moving to next node of same level.

Thus, with DFS approach, we can have multiple deduced trees.

Knight’s tour is a classic example that leverages Depth First Search algorithm.

Shortest Path First OR Dijkstra’s Algorithm (SPF)

Given a graph G and a starting node s, search the shortest path to reach node d. It uses a concept of weights. It’s an iterative algorithm similar to results of BFS.

Many real world example fits in here, e.g. what would be shortest path from home to office.

With BFS (a simple queue), we visit one node at a time whereas in SPF (a priority queue), we visit a node at any level with lowest cost. In a sense, BFS follows Dijkstra's algorithm, a step at a time with all edge weights equal to 1. The process for exploring the graph is structurally the same in both cases. at times, BFS is preferred with equal weight graphs. This is because, operations on a priority queue are O(log n) compared to operations on a regular queue which is O(1).

Code

I will be using a breadth first graph algorithm here based on the problem need:

import collections
from collections import deque 

class Solution(object):
    # method that will help find the path
    def ladderLength(self, beginWord, 
                        endWord, wordList):
        """
        :type beginWord: str
        :type endWord: str
        :type wordList: Set[str]
        :returntype: int
        """

        # Queue for BFS
        queue = deque()

        # start by adding begin word
        queue.append((beginWord, [beginWord]))

        while queue:
            # let's keep a watch at active queue
            print('Current queue:',queue)

            # get the current node and 
            # path how it came
            node, path = queue.popleft()

            # let's keep track of path length 
            # traversed so far
            print('Current transformation count:',
                                        len(path))

            # find possible next set of 
            # child nodes, 1 diff
            for next in self.next_nodes(node, 
                            wordList) - set(path):
                # traversing through all child nodes
                # if any of the child matches, 
                # we are good               
                if next == endWord:
                    print('found endword at path:',
                                            path)
                    return len(path)
                else:
                    # keep record of next 
                    # possible paths
                    queue.append((next, 
                                path + [next]))
        return 0

    def next_nodes(self, word, word_list):
        # start with empty collection
        possiblenodes = set()

        # all the words are of fixed length
        wl_word_length = len(word)

        # loop through all the words in 
        # the word list
        for wl_word in word_list:
            mismatch_count = 0

            # find all the words that are 
            # only a letter different from 
            # current word those are the 
            # possible next child nodes
            for i in range(wl_word_length):
                if wl_word[i] != word[i]:
                    mismatch_count += 1
            if mismatch_count == 1:
                # only one alphabet different-yes
                possiblenodes.add(wl_word)
        
        # lets see the set of next possible nodes 
        print('possible next nodes:',possiblenodes)
        return possiblenodes

# Setup
beginWord = "SAIL"
endWord = "RUIN"
wordList = ["SAIL","RAIN","REST","BAIL","MAIL",
                                    "MAIN","RUIN"]

# Call
print('Transformations needed: ',
    Solution().ladderLength(beginWord, 
                            endWord, wordList))

# Transformation expected == 4
# One possible shortes path with 4 transformation:
# SAIL -> MAIL -> MAIN -> RAIN -> RUIN

Used deque (doubly ended queue) of Python

deque helps with quicker append and pop operations from both the ends. It has O(1) time complexity for append and pop operations. In comparison, list provides it in O(n) time complexity.

A quick look at the code workflow to validate if all nodes at a particular distance was traversed first and then moved to next level:

Current queue: deque([('SAIL', ['SAIL'])])

Current transformation count: 1
possible next nodes: {'BAIL', 'MAIL'}
Current queue: deque([('BAIL', ['SAIL', 'BAIL']), 
                      ('MAIL', ['SAIL', 'MAIL'])])

Current transformation count: 2
possible next nodes: {'SAIL', 'MAIL'}
Current queue: deque([('MAIL', ['SAIL', 'MAIL']), 
                      ('MAIL', ['SAIL', 'BAIL', 
                       'MAIL'])])

Current transformation count: 2
possible next nodes: {'BAIL', 'MAIN', 'SAIL'}
Current queue: deque([('MAIL', ['SAIL', 'BAIL', 
                                'MAIL']), 
                      ('BAIL', ['SAIL', 'MAIL', 
                                'BAIL']), 
                      ('MAIN', ['SAIL', 'MAIL', 
                                'MAIN'])])

Current transformation count: 3
possible next nodes: {'BAIL', 'MAIN', 'SAIL'}
Current queue: deque([('BAIL', ['SAIL', 'MAIL', 
                                'BAIL']), 
                      ('MAIN', ['SAIL', 'MAIL', 
                                'MAIN']), 
                      ('MAIN', ['SAIL', 'BAIL', 
                                'MAIL', 'MAIN'])])

Current transformation count: 3
possible next nodes: {'SAIL', 'MAIL'}
Current queue: deque([('MAIN', ['SAIL', 'MAIL', 
                                'MAIN']), 
                      ('MAIN', ['SAIL', 'BAIL', 
                                'MAIL', 'MAIN'])])

Current transformation count: 3
possible next nodes: {'RAIN', 'MAIL'}
Current queue: deque([('MAIN', ['SAIL', 'BAIL', 
                                'MAIL', 'MAIN']), 
                      ('RAIN', ['SAIL', 'MAIL', 
                                'MAIN', 'RAIN'])])

Current transformation count: 4
possible next nodes: {'RAIN', 'MAIL'}
Current queue: deque([('RAIN', ['SAIL', 'MAIL', 
                                'MAIN', 'RAIN']), 
                      ('RAIN', ['SAIL', 'BAIL', 
                        'MAIL', 'MAIN', 'RAIN'])])

Current transformation count: 4
possible next nodes: {'MAIN', 'RUIN'}
found endword at path: ['SAIL', 'MAIL', 'MAIN', 
                                        'RAIN']

Transformations needed:  4
Overall path: ['SAIL', 'MAIL', 'MAIN', 
                               'RAIN', 'RUIN']

Complexity

For above code that I used to find the shortest path for transformation:

Time

In next_nodes, for each word in the word list, we iterated over its length to find all the intermediate words corresponding to it. Thus we did M×N iterations, where M is the length of each word and N is the total number of words in the input word list. Further, to form an intermediate word, it takes O(M) time. This adds up to O(M2×N).

In ladderLength, BFS can go to each of the N words and for each word, we need to examine M possible intermediate words. This adds up to O(M2×N).

Overall, it adds up to O2(M2×N) which would be called O(M2×N).

Space

In next_nodes, each word in the word list would have M intermediate combinations. For every word we need a space of M2 to save all the transformations corresponding to it. Thus, it would need a total space of O(M2×N).

In ladderLength, BFS queue would need a space of O(M×N)

Overall, it adds up to O(M2×N) + O(M×N) which would be called O(M2×N)

Wrap Up

It could be little tricky and thus would need some practice to visualize the graph as well to write code for it.

Great, so now we know how to solve problems like word ladder problem. It also touch based other related common graph algorithms that we can refer to.

I had a read of the following reference and it has much more details if needed.


Keep problem solving!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories

Learn Microsoft Tech via Videos & Live Streams

Recently, Microsoft has invested and made extra efforts into putting up the resources to learn.

microsoft-learn

Recordings …

This September, Microsoft launched Video Hub, a new resource home for Tech Community Videos and Interactive Guides to help learn everything about major Microsoft products.

Access: Microsoft Tech Community Video Hub

Live …

Yesterday (Mid October), Microsoft launched .NET Live TV, a one stop shop for all .NET and Visual Studio Live Streams across Twitch and YouTube.

To view, visit: .NET Live TV

Sometime back, Microsoft launched Learn TV. It is a place to find the latest digital content to be updated on the latest announcements, features, and products from Microsoft.

To view, visit: Learn TV

Discover your path …

As shared on the Microsoft’s Learning site, irrespective of which stage you are in your carrier – a fresher or an experience professional, material can help learn hands on and gain confidence faster.

You can choose topics, learn at your own schedule and even do Microsoft based certifications. It even has suggestions and ratings for popular topics to guide further.

Explore: Microsoft Learn

Well, seems knowledge is all there and pretty organized for the taking.


Keep learning!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories

Just reverse alphabets in a string?

Last week it was a simple problem that we discussed to tangle our brain.

string-reverse

Reverse only alphabets in a provided string in a most efficient way. (Special characters, numbers, etc should continue to stay at their original place)

First thought …

Okay, so first thought was to make use of an extra space and then we can get our desired result with two traversals of the string. Yeah, but let’s optimize.

Updated thought …

We can make the swaps in place in a single traversal. A good stopping criteria seems to be when index while moving from front crosses the index moving backwards from end.

Let’s write code:

static void Main(string[] args)
{
    Console.WriteLine($"Please enter a string:");

    // Ignore casing
    var inputString = Console.ReadLine().ToLower();
    char[] inputArray = inputString.ToCharArray();
    ReverseAlphabetsOnly(inputArray);        
    Console.WriteLine(
            $"Reversed: {new String(inputArray)}");
}

static void ReverseAlphabetsOnly(char[] inputArray)
{
    int frontIndex = 0;
    int endIndex = inputArray.Length-1;
    char temp;

    while(frontIndex < endIndex)
    {
        if(!IsAlphabet(inputArray[frontIndex]))
            frontIndex++;
        else if(!IsAlphabet(inputArray[endIndex]))
            endIndex--;    
        else
        {
            temp = inputArray[frontIndex];
            inputArray[frontIndex] 
                 = inputArray[endIndex];
            inputArray[endIndex] = temp;

            frontIndex++;
            endIndex--;
        }
    }
}

static bool IsAlphabet(char x) 
{ 
    return ( (x >= 'a' && x <= 'z') 
            || (x >= 'A' && x <= 'Z') ); 
}

// Input:  Le@rn By In$ig#t...
// Output: tg@in Iy Bn$re#L...

Closure …

Approach looks good, as it would be at maximum a single traversal with no extra space used. Thus we are able to solve it with an overall Order of Time complexity O(n) & Space complexity O(1).


Happy solving …

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories

Dwell in Data Science world

Let’s walk into the Data Science world and see what it means and how does it connect with other terms that we often hear in the context – Artificial Intelligence & Machine Learning.

data-science-explore

What is Data Science?

In a context of a particular domain, data science is gaining insights into data through statistics, computation and visualization. Below diagram represents this multi-focal field well:

data-science-venn

For a real world problem, based on statistics, we make certain assumptions. Based on assumptions, we make a learning model using mathematics. With this model, we make software that can help validate and solve the problem. This leads to solving complex problems fast and more accurately.

How to use Data Science?

Clearly, it can be defined as a process. When followed, it would help move towards destination with proper next steps:

datascience-process
Adapted from: Harvard Data Science lecture slide

There are multiple iterative stages that solves specific queries. Based on answers to the queries, we might have to circle back and reassess the steps of the entire process.

Various stages involved?

Once we have a defined process, it’s easier to break it down into different functional groups. It would help us interpret how to visualize, connect them and know who can help us at each of those stage:

what-is-data-science
Credit: Rubens Zimbres article

There is a strong correlation of business intelligence with data science here. Current advancements in algorithms and tools has helped us improve accuracy for each of the stages above.

Where does AI or ML fits in?

Data Science, Artificial Intelligence & Machine Learning are different but often used interchangeably. There are overlaps and a part covers all of them together:

data-science-overlap

On a high level, part of the data science provides processed data. AI or ML or DL helps to process the data to get the needed output.

Artificial Intelligence (AI)

It is a program that enables machine to mimic human behavior. As such, goal here is to understand human intelligence, learn to imitate and act accordingly. I came across a good AI exhibit by BCG:

ai-bcg-analysis
Credit: BCG Group tweet

Self driving cars and route change suggestion are few common AI solutions

Machine Learning (ML)

These are AI techniques that enables machines to learn from examples, without being explicitly programmed to do so. It incorporates mathematics and statistics to learn from itself and improve with experience.

ml-sample

Recommendation engines & Spam email classification are few common ML solutions

I will cover Machine Learning in much more detail in later posts.

Deep Learning (DL)

These are subset of ML that makes the computation of multi-layer neural network feasible. It does not require feature selection/engineering. They classify information similar to human brains. They often continue to improve as the size of your data increases.

dl-sample

Face detection and number recognition are few common DL solutions

Moving On …

Overall, Data Science is more about extracting insights to build a robust strategy for business solution and is different from AI or ML.

To read more around differences between the Data Science and other terms, Vincent Granville has shared in detail here.

With above, we have entered into the Data Science world. Going forward, I will concentrate more on Machine Learning aspect for now.


Keep exploring!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories

Three A’s of the Next Gen World!

There are many new ideas and research going on in today’s tech world. I believe the potential of maximum disruption is by the three A’s – Analytics, Automation & Artificial Intelligence. Because of their scope of evolution and impact, they tend to be the foundation of the next generation tech world.
.

next-gen

A natural progressive way of doing it would be to:

  • first gather data (analytics)
  • define what’s crucial or repetitive that can be replaced to ease the flow (automate)
  • collate such multiple replaced things to move into predictive analysis and actions to complete a complex task (artificial intelligence)

Day by day, it looks more feasible to achieve above because of technology innovations and the processing power.

Analytics

It is the first step towards anything more concrete and is a great supportive to theories and assumptions. Enables to have data backed decisions which are easy to relate and comprehend.

This would mean collecting data and making better use of it. This would need to have a data architecture setup. With it, it would be easier to know and decide what and how to collect. Post it, using any data visualization tool, it would be easier to deduce insights, patterns and plan accordingly.

It’s not about just any or more data, but the right, quality data that would matter

Automation

It is solving for complex processes using technology. Data analytics help a great deal identify and solve for such processes. These repetitive mundane tasks can be automated and the efforts saved from it can be put elsewhere. Thus helps in concentrating on tasks that needs more human attention.

Just like machines brought industrial revolution to factory floor, automation has similar potential to transform most of our day to day work. This could lead to enhanced productivity, thus better outcomes, leading to more accurate predictions and optimizations.

Only way to scale the massive amount of work in defined time without hiring an army of professionals

Artificial Intelligence

This is exploring the world that was considered impossible a decade ago by most of us. It is easier to relate and understand when comparing with human capabilities like recognizing handwriting or identifying a particular disease cell. Not just it, AI has great potential to automate non-routine tasks.

With AI, we can analyze data, understand concepts, make assumptions, learn behaviors and provide predictions at a scale with detail that would be impossible for individual humans. It can also have self-learning to improve with more and more interactions with data and human.

AI evolution is being greatly supported with advanced algorithms and improved computing power & storage.

Potential

There is a nice data backed assessment and shareout by McKinsey Global Institute on the three A’s. I would suggest a must read of it. In there, they shared AI/ML potential usecases across industries.

ai-ml-potential-industries
Credit: McKinsey & Co.

AI and Automation will provide a much-needed boost to global productivity and may help some ‘moonshot’ challenges.

McKinsey Global Institute

Wrap Up

AI combined with various Process Automation and powerful Data Analytics transforms into an intelligent automated solution. Potential of these solutions are huge.

It would be more appropriate to say that it’s no more a choice but compulsion for all sectors to go through this digital transformation. Those who are able to do it will be setup sooner for the next generation of the tech world. It will provide them with an edge over their competitors, putting them in a position to take advantage big time.


Keep exploring!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories

Linear time partition – a three way split

Linear-time partition is a divide & conquer based selection algorithm. With it, data is split into three groups using a pivot.
.

linear-time-partioning

An integral part of Quick Sort algorithm which uses this partitioning logic recursively. All the elements smaller than the pivot are put on one side and all the larger ones on the other side of the pivot.

Similar to discussion of Dynamic Programming, this algorithm plays on solving sub-problems to solve complex problem.

Algorithm

Post selecting the pivot, Linear-time partition routine separates the data into three groups with values:

  • less than the pivot
  • equal to the pivot
  • greater than the pivot

Generally, this algorithm is done in place. This results in partially sorting the data. There are handful of problems that makes use of this fact, like:

  • Sort an array that contains only 0’s, 1’s & 2’s
  • Dutch national flag problem
  • Print all negative integers followed by positive for an array full of them
  • Print all 0’s first and then 1’s or vice-versa for an array with only 0’s & 1’s
  • Move all the 0’s to the end maintaining relative order of other elements for an array of integers

If done out of place, (i.e. not changing the original data), it would cost O(n) additional space

Example

Let’s take an example of: sort a array that contains only 0’s, 1’s & 2’s

First thought for such problem is to perform a count of 0’s, 1’s and 2’s. Once we have the counts, reset the array with them. Though it has time complexity O(n), it takes two traversal of the array or uses an extra array.

Below is an attempt to solve using Linear-time partition algorithm to avoid that extra traversal/space.

def threeWayPartition(A):
    start = mid = 0
    end = len(A)-1
    
    # define a Pivot
    pivot = 1
    
    while (mid <= end):
        # mid element is less than pivot
        # current element is 0
        
        # so lets move it to start
        # current start is good. 
        # move start to next element
        # move mid to next element to move forward
        if (A[mid] < pivot) :
            swap(A, start, mid)
            start = start + 1
            mid = mid + 1
            
        # mid element is more than pivot
        # current element is 2
        
        # so lets move it to end
        # current end is good. 
        # move end to previous element
        elif (A[mid] > pivot) :
            swap(A, mid, end)
            end = end - 1
        
        # mid element is same as pivot
        # current element is 1
        
        # just move forward: 
        # mid to next element
        else :
            mid = mid + 1
            
# Swap two elements A[i] and A[j] in the list
def swap(A, i, j):
    A[i], A[j] = A[j], A[i]


# Define an array
inputArray = [0, 1, 2, 2, 1, 0, 0, 2]

# Call the Linear-time partition routine
threeWayPartition(inputArray)

# print the final result
print(inputArray)

# Outputs
# [0, 0, 0, 1, 1, 2, 2, 2]

With a defined pivot, we segregated the data on the either side which resulted in desired output. Dutch nation flag problem or printing all negative first and then positive, or printing all 0s first follows the same code.

For moving all 0’s to end maintaining other elements order, we do a tweak in swap index to maintain order:

def threeWayPartition(A):
    current = 0
    nonzero = 0
    end = len(A)-1
    
    # define a Pivot
    pivot = 0
    
    while (current <= end):
        if (A[current] != pivot) :
            swap(A, current, nonzero)
            nonzero = nonzero + 1
        current = current + 1
            
# Swap two elements A[i] and A[j] in the list
def swap(A, i, j):
    A[i], A[j] = A[j], A[i]


# Define an array
inputArray = [7,0,5,1,2,0,2,0,6]

# Call the Linear-time partition routine
threeWayPartition(inputArray)

# print the final result
print(inputArray)

# Output
# [7, 5, 1, 2, 2, 6, 0, 0, 0]

Complexity

With above algorithm approach, we solved our problem with Time complexity O(n) & Space complexity O(1) (with single traversal of the array)


It was fun solving!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories

Data Science Virtual Conference 2020

Open Data Science Conference (ODSC) is one of the biggest specialized data science upcoming event, scheduled on December 8-9, 2020. Read details about the event here. Theme for the year is RETHINK AI.

Virtual Conference is spread across two days:
– December 8th – ODSC India
– December 9th – ODSC Asia & Pacific

The Largest Applied Data Science & AI Conference Returns to India!

Call for speakers are open for submission. Registration is open to book seat.

data-science-conf-2020

Learn the latest AI & data science topics, tools, and languages from some of the best and brightest minds in the field.

Event highlight shared by ODSC team

The conference promises to accelerate your knowledge on data science and related disciplines with insightful sessions, workshops and speakers from various fields. There would be speakers that include core contributors to many open source libraries and languages.

Happy connecting!

samples GitHub Profile Readme
Learn Python – Beginners step by step – Basics and Examples
Sandeep Mewara Github
Sandeep Mewara Learn By Insight
Matplotlib plot samples
Sandeep Mewara Github Repositories