pandas – get started with examples

This is to get started with pandas and try few concrete examples. pandas is a Python based library that helps in reading, transforming, cleaning and analyzing data. It is built on the NumPy package.

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

https://pandas.pydata.org


Key data structure in pandas is called DataFrame – it helps to work with tabular data translated as rows of observations and columns of features.

Download or fork entire Jupiter notebook from my GitHub to play around: https://github.com/sandeep-mewara/python-examples

pandas basics includes:

  • Series
  • Dataframes
    • Create
      • from list of tuples
      • from a dictionary
      • from a CSV
      • from built-in dataset (eg: from sklearn.datasets)
    • Data retrieval
    • Modifying data
    • Group by operation
    • Custom Functions – apply method
    • Pre-Processing
      • drop, mean, mode
      • ordinal feature
      • nominal feature
    • Reshaping
      • CrossTab
      • Merge
      • Melt
      • Pivot

# .info(), .head(), .sample are handy method to use first off with dataframe to get a high level details

# index may be not unique – can return multiple values

# boolean indexing (masking) can help select certain set of rows

# .isin() is a useful when building a boolean index

# .where() is useful to retain shape of the original table

# Column names & Indexes can be set if needed

# to modify the table right away, use inplace=True

# aggregate operations can be applied on a groupby object

# dropna(), mean() or mode() are handy ways for pre-processing missing data

Key learning’s …

Examples notebook includes:

  • Uber taxi drivers
  • Apple stock price
  • Day or Night
  • Students marks
  • Balance Calculator

# .describe() is a handy method to get the statistical summary of numerical columns

# one-hot-encoding is really helpful for nominal features (that cannot be ordered)

# converting the columns into right datatype helps

# converting data into meaningful numbers help for analysis

# groupby is a powerful tool with dataframes for analysis

Key learning’s …

Cheat sheet

Credit: Pandas website

Download cheat sheet pdf from here
For more details about pandas, look at the documentation reference.

Keep learning!

Leave a Reply