pandas – get started with examples

August 30, 2020 | September 25, 2020 | Sandeep Mewara

This post is a way to get started with pandas and try a few concrete examples. pandas is a Python-based library that helps in reading, transforming, cleaning and analyzing data. It is built on the NumPy package.

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
https://pandas.pydata.org

The key data structure in pandas is the DataFrame. It helps to work with tabular data, organized as rows of observations and columns of features.
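As a minimal sketch of what such a table looks like in code, the snippet below builds the same small DataFrame from a list of tuples and from a dictionary. The names and marks are made-up sample data, not taken from the notebook.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notebook.
rows = [("Asha", 82), ("Ben", 74), ("Chen", 91)]

# From a list of tuples: each tuple becomes one row (one observation).
df_from_tuples = pd.DataFrame(rows, columns=["name", "marks"])

# From a dictionary: each key becomes one column (one feature).
df_from_dict = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "marks": [82, 74, 91],
})

# A single column of a DataFrame is a Series.
marks = df_from_dict["marks"]

print(df_from_tuples)
print(type(marks).__name__)
```

Both routes should give an identical table, so which one to use mostly depends on the shape the raw data already has.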

Download or fork the entire Jupyter notebook from my GitHub to play around: https://github.com/sandeep-mewara/python-examples

pandas basics include:

- Series
- DataFrames
  - Create
    - from list of tuples
    - from a dictionary
    - from a CSV
    - from built-in dataset (e.g. from sklearn.datasets)
  - Data retrieval
  - Modifying data
  - Group by operation
  - Custom Functions – apply method
  - Pre-Processing
    - drop, mean, mode
    - ordinal feature
    - nominal feature
  - Reshaping
    - CrossTab
    - Merge
    - Melt
    - Pivot
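The reshaping operations named in the list can be sketched roughly as follows. The table, column names and the "high"/"low" grading rule are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notebook.
wide = pd.DataFrame({
    "student": ["Asha", "Ben"],
    "maths": [82, 74],
    "physics": [91, 68],
})

# Melt: wide to long, one row per (student, subject) pair.
long_form = wide.melt(id_vars="student", var_name="subject", value_name="marks")

# Pivot: long back to wide, one column per subject.
back_to_wide = long_form.pivot(index="student", columns="subject", values="marks")

# Merge: join two tables on a shared key column.
houses = pd.DataFrame({"student": ["Asha", "Ben"], "house": ["red", "blue"]})
merged = wide.merge(houses, on="student")

# CrossTab: count how often each combination of two columns occurs.
# The 80-mark threshold is an arbitrary choice for this sketch.
grade = long_form["marks"].map(lambda m: "high" if m >= 80 else "low").rename("grade")
counts = pd.crosstab(long_form["subject"], grade)

print(long_form)
print(counts)
```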

Key learnings …
# .info(), .head() and .sample() are handy methods to call first on a DataFrame to get a high-level overview
# an index may not be unique, so a label lookup can return multiple rows
# boolean indexing (masking) helps select a certain set of rows
# .isin() is useful when building a boolean index
# .where() is useful to retain the shape of the original table
# column names and indexes can be set if needed
# to modify the table in place instead of getting a new object back, pass inplace=True to the methods that support it
# aggregate operations can be applied on a groupby object
# dropna() removes missing data, while mean() or mode() can supply a value to fill it in
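A few of these points can be sketched together on a tiny table. The cities and fares below are invented sample data, and assigning results to new variables is just one option; the same methods can often be used with inplace=True instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the notebook.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "fare": [120.0, np.nan, 80.0, 200.0],
})

# Boolean indexing (masking): keep only rows where the condition holds.
expensive = df[df["fare"] > 100]

# .isin() builds a boolean mask from a list of allowed values.
selected = df[df["city"].isin(["Pune", "Mumbai"])]

# .where() keeps the original shape and puts NaN where the condition fails.
same_shape = df["fare"].where(df["fare"] > 100)

# Fill the missing fare with the column mean; this returns a new
# DataFrame and leaves df itself untouched.
filled = df.assign(fare=df["fare"].fillna(df["fare"].mean()))

# Aggregate on a groupby object: mean fare per city.
mean_fare = filled.groupby("city")["fare"].mean()

print(expensive)
print(same_shape)
print(mean_fare)
```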

Examples notebook includes:

Uber taxi drivers
Apple stock price
Day or Night
Students marks
Balance Calculator

Key learnings …
# .describe() is a handy method to get the statistical summary of numerical columns
# one-hot encoding is really helpful for nominal features (that cannot be ordered)
# converting the columns into the right datatype helps
# converting data into meaningful numbers helps analysis
# groupby is a powerful tool for analysis with DataFrames
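As a rough illustration of datatype conversion, .describe() and one-hot encoding, here is a small sketch. The dates, prices and the "shift" column are hypothetical and do not come from the notebook's datasets.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notebook.
df = pd.DataFrame({
    "date": ["2020-08-01", "2020-08-02", "2020-08-03"],
    "price": ["110.5", "112.0", "108.5"],  # numbers stored as text
    "shift": ["day", "night", "day"],      # nominal feature (no order)
})

# Convert columns to the right datatype before analysis.
df["date"] = pd.to_datetime(df["date"])
df["price"] = df["price"].astype(float)

# Statistical summary (count, mean, std, min, quartiles, max) of a numerical column.
summary = df["price"].describe()

# One-hot encode the nominal column: one indicator column per category.
encoded = pd.get_dummies(df, columns=["shift"])

print(summary)
print(encoded)
```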

Cheat sheet

Download the cheat sheet PDF from here.
For more details about pandas, look at the documentation reference.

Keep learning!

Python as statistics workbench

August 22, 2020 | August 30, 2020 | Sandeep Mewara

While reading up on AI/ML (Artificial Intelligence/Machine Learning), I came across a discussion on whether Python can be used as a “statistics workbench” to replace R, SPSS, etc. Several knowledgeable people shared their views there on the languages used for statistical problems, specifically R (read about R here).

Discussion here: https://stats.stackexchange.com/questions/1595/python-as-a-statistics-workbench

For quick reference, I will quote a few of the latest thoughts from there that are in favor of Python and how it has evolved. I concur with most of them:

1. Python is easily the most intuitive syntax of any programming language. This makes for extremely fast development time.
2. Python is performant. It opens large datasets reliably.
3. The packages in Python are fast catching up to R’s packages. Python usage has increased tremendously last few years.
4. Readability is one of the most important qualities good code can possess, and Python is one of the most readable language.
5. Python has an extremely well-thought-out IDE now: PyCharm & Visual Studio Code.
https://stats.stackexchange.com/a/457753

Overall, Python is a general-purpose language with an easy-to-understand syntax, which makes it relatively easy for programmers to learn and adopt. R was developed with statisticians in mind, so it has many features around statistics and data visualization and is currently a tad ahead in that area.

A little research …

Recently, DataCamp also published an article comparing R and Python for data analysis. It has a nice comparison on various parameters; I am picking just a couple of them here:

The final analysis in the article puts R ahead for data analysis, with Python having the potential to catch up quickly and easily.

My thoughts …

My intent was to understand which programming language serves as an essential tool to demonstrate AI/ML capabilities. Looking at both, Python seems good enough for me as a tool to start with AI/ML and probably conquer it.

Ammunition needed …

There are many Python-based libraries and packages that are generally used for statistical work. Below are a few of them that would help in our data analysis exploration going ahead:

- scipy – Python-based ecosystem of open-source software for mathematics, science, and engineering
  - cookbook – many statistical facilities, a collection of various user-contributed recipes already available
  - numpy – base N-dimensional array package. A handful of examples are listed here
  - pandas – a fast, powerful, flexible and easy to use data analysis and manipulation tool
  - matplotlib – a comprehensive library for creating static, animated, and interactive visualizations
- scikit-learn – simple and efficient machine learning tools for predictive data analysis
- keras – API for deep learning
- tensorflow – API to develop and train ML models

Since I am a programmer, I may be biased here. But it seems Python can do everything needed to start the AI/ML journey.

Happy learning!

NumPy – Basics & Examples

August 15, 2020 | September 27, 2020 | Sandeep Mewara

This post is a way to get started with NumPy and try a few concrete examples. NumPy (Numerical Python) is a package for numerical computation, designed for efficient work on large data sets.

The entire Jupyter notebook can be downloaded or forked from my GitHub to play around: https://github.com/sandeep-mewara/python-examples

Reference: https://numpy.org/learn/

NumPy basics include:

- Initialize Matrix via
  - List
  - NULL Matrix
  - IDENTITY Matrix
  - ONES Matrix
- Matrix Transpose
- Matrix Indexing
- Simulation
- Basic CSV file operations
- Matrix Broadcasting
- Basic Image Processing

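Key learnings …
# a matrix in plain Python is a list of lists
# two arrays are compatible for broadcasting when, comparing their shapes from the trailing dimension backwards, each pair of dimensions either matches or one of them is 1
# when an image is read as numbers, the values can lie between 0 & 1 (for example, PNG files loaded with matplotlib)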
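The matrix initializers and the broadcasting rule can be sketched as below. The array values are arbitrary and chosen only to make the shapes easy to follow.

```python
import numpy as np

# Initialize matrices in the ways listed above.
from_list = np.array([[1, 2, 3], [4, 5, 6]])  # from a list of lists
null_matrix = np.zeros((2, 3))                 # all zeros
identity = np.eye(3)                           # identity matrix
ones = np.ones((2, 3))                         # all ones

# Transpose swaps rows and columns: shape (2, 3) becomes (3, 2).
transposed = from_list.T

# Broadcasting: shapes (2, 3) and (3,) are compatible because the
# trailing dimensions match, so the row is added to every row.
row = np.array([10, 20, 30])
added_row = from_list + row

# Shapes (2, 3) and (2, 1) are compatible because the trailing
# dimension of the second array is 1, so the column is stretched.
col = np.array([[100], [200]])
added_col = from_list + col

print(added_row)
print(added_col)
```

Shapes such as (2, 3) and (2,) would not broadcast, because the trailing dimensions 3 and 2 neither match nor equal 1.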

Examples notebook includes:

Random walk simulation
Triangle simulation
Random Number
Correlation coefficient
Mean/Variance of crude oil

Key learnings …
# masking returns all the values that satisfy the mask
# cumsum() is a handy function for cumulative sums
# the numpy.random module has handy methods for random number generation
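A minimal sketch of these three ideas, using a one-dimensional random walk. The seed, the number of steps and the x/y values are arbitrary assumptions, and the notebook's own simulation may be set up differently.

```python
import numpy as np

# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(seed=42)

# Random walk: each step is -1 or +1, and cumsum() gives the position
# after every step.
steps = rng.choice([-1, 1], size=1000)
positions = np.cumsum(steps)

# Masking: a boolean array selects only the values that satisfy it.
above_start = positions[positions > 0]

# Correlation coefficient between two arrays (hypothetical values).
# y is an exact linear function of x, so the coefficient is 1 up to rounding.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
corr = np.corrcoef(x, y)[0, 1]

print(positions[:10])
print(len(above_start), corr)
```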

For learning more about NumPy, look here: https://numpy.org/doc/stable/

Keep learning!

Python – Basics & Examples

July 25, 2020 | September 25, 2020 | Sandeep Mewara

This post is a way to get started with Python and try a few concrete examples. It should help beginners learn, and others do a quick revision, without getting too deep.

The entire Jupyter notebook can be downloaded or forked from my GitHub to look at or play around with: https://github.com/sandeep-mewara/python-examples

I started Python programming using the Jupyter notebook web application. Later, I moved to Visual Studio Code, which looked much more user friendly.
A guide on how to set up VS Code for Python is here.

Python basics include:

Variables
Conditional statements
String manipulations
Type conversion
Formatting strings
Data Structure – List, Tuple
Functions
List comprehension
Zip & Pack

Key learnings …
# items in a list or tuple are indexed by integers, starting from 0
# % is a format operator, and %d, %s, %f are special format sequences
# a negative index is used to access list elements from the end
# [start:end:step] returns a new list from start to end-1, with a default step of 1
# zip pairs up the items of two lists as tuples (in Python 3, wrap it in list() to get a list of tuples)
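These points can be tried out in a few lines. The names and marks are made-up sample data, not taken from the notebook.

```python
# Hypothetical sample data, not taken from the notebook.
names = ["Asha", "Ben", "Chen", "Dev"]
marks = [82, 74, 91, 68]

first = names[0]          # indexing starts at 0
last = names[-1]          # a negative index counts from the end
middle = names[1:3]       # items at index 1 and 2 (the end index is excluded)
every_other = names[::2]  # step of 2

# zip pairs up items; in Python 3 it returns an iterator, so list() is
# needed to materialize the tuples.
pairs = list(zip(names, marks))

# % format operator: %s for strings, %d for integers, %f for floats.
average = sum(marks) / len(marks)
line = "%s scored %d, class average %.2f" % (names[0], marks[0], average)

print(middle)       # ['Ben', 'Chen']
print(every_other)  # ['Asha', 'Chen']
print(line)         # Asha scored 82, class average 78.75
```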

Examples notebook includes:

Palindrome
Sum of Squares
Sort students marks list
Format students marks list
Word Frequency

Key learnings …
# sometimes anonymous (lambda) functions are enough
# storing data in a dictionary as key-value pairs helps
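A small sketch of both points, loosely following the example names above (sorting marks, word frequency, palindrome). The data and the helper name is_palindrome are assumptions made for illustration, and the notebook's versions may differ.

```python
# Hypothetical sample data, not taken from the notebook.
students = [("Asha", 82), ("Ben", 74), ("Chen", 91)]

# An anonymous (lambda) function is enough as a sort key:
# sort by marks, highest first.
ranked = sorted(students, key=lambda student: student[1], reverse=True)

# Word frequency with a plain dictionary of word -> count.
text = "the quick brown fox jumps over the lazy dog the end"
frequency = {}
for word in text.split():
    frequency[word] = frequency.get(word, 0) + 1

# Palindrome check using a reversed slice (hypothetical helper).
def is_palindrome(s):
    return s == s[::-1]

print(ranked)
print(frequency["the"])
print(is_palindrome("level"))
```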

Keep learning!

Learn by Insight…

Explore & Share

Tag AI/ML
