This post is a quick start with pandas, with a few concrete examples. pandas is a Python-based library that helps in reading, transforming, cleaning and analyzing data. It is built on the NumPy package.
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
https://pandas.pydata.org
The key data structure in pandas is the DataFrame. It helps to work with tabular data, arranged as rows of observations and columns of features.
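Here is a minimal sketch of rows as observations and columns as features, built in a few common ways. The names, values and the in-memory CSV text are made up for illustration; a real CSV would normally be read from a file path.

```python
import io

import pandas as pd

# A Series is a one-dimensional labelled array (one column of a table).
marks = pd.Series([82, 91, 77], index=["asha", "ben", "chen"], name="marks")

# DataFrame from a list of tuples: each tuple is one row (observation).
rows = [("asha", 21, "Pune"), ("ben", 25, "Delhi"), ("chen", 23, "Mumbai")]
df_from_tuples = pd.DataFrame(rows, columns=["name", "age", "city"])

# DataFrame from a dictionary: each key is one column (feature).
df_from_dict = pd.DataFrame(
    {
        "name": ["asha", "ben", "chen"],
        "age": [21, 25, 23],
        "city": ["Pune", "Delhi", "Mumbai"],
    }
)

# DataFrame from CSV: an in-memory string stands in for a file here.
csv_text = "name,age,city\nasha,21,Pune\nben,25,Delhi\nchen,23,Mumbai\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_tuples)
print(df_from_csv.dtypes)
```

All three DataFrames hold the same table; which constructor to use mostly depends on the shape the raw data arrives in.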
Download or fork the entire Jupyter notebook from my GitHub to play around: https://github.com/sandeep-mewara/python-examples
The pandas basics notebook includes:
- Series
- DataFrames
  - Create
    - from list of tuples
    - from a dictionary
    - from a CSV
    - from built-in dataset (e.g. from sklearn.datasets)
- Data retrieval
- Modifying data
- Group by operation
- Custom Functions: apply method
- Pre-Processing
  - drop, mean, mode
  - ordinal feature
  - nominal feature
- Reshaping
  - CrossTab
  - Merge
  - Melt
  - Pivot
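The reshaping operations listed above can be sketched roughly as follows. The two small tables and their column names are invented for illustration and are not taken from the notebook.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "student": ["asha", "asha", "ben", "ben"],
        "subject": ["math", "physics", "math", "physics"],
        "score": [82, 75, 91, 68],
    }
)
cities = pd.DataFrame({"student": ["asha", "ben"], "city": ["Pune", "Delhi"]})

# Merge: SQL-style join on a shared key column.
merged = scores.merge(cities, on="student", how="left")

# Pivot: long to wide, one row per student and one column per subject.
wide = scores.pivot(index="student", columns="subject", values="score")

# Melt: wide back to long, one row per (student, subject) pair.
long = wide.reset_index().melt(
    id_vars="student", var_name="subject", value_name="score"
)

# Crosstab: frequency table of two categorical columns.
counts = pd.crosstab(merged["city"], merged["subject"])

print(wide)
print(counts)
```

Pivot and melt are inverses of each other here: melting the pivoted table gives back the same four (student, subject, score) rows.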
Key learnings:
- .info(), .head() and .sample() are handy methods to use first off with a dataframe to get high-level details
- an index may not be unique, so a lookup can return multiple values
- boolean indexing (masking) can help select a certain set of rows
- .isin() is useful when building a boolean index
- .where() is useful to retain the shape of the original table
- column names and indexes can be set if needed
- to modify the table right away, use inplace=True
- aggregate operations can be applied on a groupby object
- dropna(), mean() or mode() are handy ways for pre-processing missing data
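To make these notes concrete, here is a small sketch on a made-up fares table. The column names, the values and the new name fare_inr are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
        "fare": [120.0, np.nan, 80.0, 200.0],
    }
)

df.info()          # column dtypes and non-null counts
print(df.head(2))  # first two rows

# A non-unique index: looking up one label returns every matching row.
by_city = df.set_index("city")
pune_rows = by_city.loc["Pune"]

# Boolean indexing: keep only the rows where the mask is True.
expensive = df[df["fare"] > 100]

# .isin() builds a boolean mask from a list of allowed values.
in_metros = df[df["city"].isin(["Delhi", "Mumbai"])]

# .where() keeps the original shape; rows failing the condition become missing.
same_shape = df.where(df["fare"] > 100)

# inplace=True modifies the object itself instead of returning a new one.
renamed = df.copy()
renamed.rename(columns={"fare": "fare_inr"}, inplace=True)

# Aggregate on a groupby object (missing values are skipped by mean()).
mean_fare = df.groupby("city")["fare"].mean()

# Missing data: drop the incomplete rows, or fill them with the column mean.
dropped = df.dropna()
filled = df.fillna({"fare": df["fare"].mean()})
```

Note the difference between the mask and .where(): the mask returns only the matching rows, while .where() returns all four rows with the non-matching ones blanked out.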
Examples notebook includes:
- Uber taxi drivers
- Apple stock price
- Day or Night
- Students marks
- Balance Calculator
Key learnings:
- .describe() is a handy method to get the statistical summary of numerical columns
- one-hot encoding is really helpful for nominal features (that cannot be ordered)
- converting the columns into the right datatype helps
- converting data into meaningful numbers helps with analysis
- groupby is a powerful tool with dataframes for analysis
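A short sketch of these points, assuming a made-up product table in which prices arrive as text. The column names and the S < M < L ordering are illustrative assumptions, not data from the examples notebook.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "size": ["S", "L", "M", "S"],               # ordinal: S < M < L
        "color": ["red", "blue", "red", "green"],   # nominal: no natural order
        "price": ["10.5", "20.0", "15.0", "11.5"],  # numbers stored as text
    }
)

# Convert the text column to a numeric dtype so it can be summarised.
df["price"] = pd.to_numeric(df["price"])
print(df.describe())  # statistical summary of the numeric columns

# Ordinal feature: map each level to a number that preserves the order.
df["size_code"] = df["size"].map({"S": 0, "M": 1, "L": 2})

# Nominal feature: one-hot encode into one indicator column per value.
encoded = pd.get_dummies(df, columns=["color"])

# groupby: average price per color.
mean_price = df.groupby("color")["price"].mean()
print(mean_price)
```

Mapping sizes to 0, 1, 2 is reasonable because the order is meaningful; doing the same for colors would imply an order that does not exist, which is why one-hot encoding is used there instead.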
Cheat sheet
Download the cheat sheet PDF from here
For more details about pandas, look at the documentation reference.
Keep learning!