Author : HASSAN MD TAREQ | Updated : 2020/09/17
What is pandas?
- pandas is a Python package / library
- A library for data manipulation, analysis, and data science
- Python-based data analysis toolkit
- High-level data manipulation tool
- Built on the NumPy package
- Key data structure is called the DataFrame
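As a rough illustration of the points above, the sketch below builds a DataFrame from a NumPy array and converts it back. The sample values and the column names "a" and "b" are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
values = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(values, columns=["a", "b"])

# to_numpy() returns the DataFrame's data as a NumPy array.
arr = df.to_numpy()
print(type(arr))

# High-level operations such as mean() work directly on a column.
col_mean = df["a"].mean()
print(col_mean)
```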
Naming
- The name is also a play on the phrase “Python data analysis”
- The name ‘pandas’ is derived from the term “panel data”, an econometrics term for multidimensional structured data sets (Wikipedia)
Links
- Package overview
- https://pandas.pydata.org/docs/
- pandas API Reference
- https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
- https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
Core components
- Series: essentially a single labeled column (1-dimensional)
- DataFrame: a table made up of a collection of Series (2-dimensional)
A pandas Series is one-dimensional, whereas a DataFrame is two-dimensional.
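The relationship between the two components can be sketched as follows. The names and ages are invented sample data, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
names = pd.Series(["Ann", "Bob", "Cy"], name="name")
ages = pd.Series([30, 25, 41], name="age")

# A DataFrame can be assembled from several Series, one per column.
people = pd.DataFrame({"name": names, "age": ages})

print(ages.ndim)    # number of dimensions of a Series
print(people.ndim)  # number of dimensions of a DataFrame

# Selecting a single column gives back a Series.
age_column = people["age"]
print(type(age_column))
```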
DataFrame
A pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns), similar to an Excel sheet.
Details: Pandas DataFrame
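A minimal sketch of what "labeled axes" means in practice, assuming a small made-up table of scores. The row labels, column labels, and values are all hypothetical.

```python
import pandas as pd

# Hypothetical scores table; labels and values are invented.
scores = pd.DataFrame(
    {"math": [90, 75], "art": [85, 95]},
    index=["alice", "bob"],
)

row_labels = scores.index.tolist()       # labels of the row axis
column_labels = scores.columns.tolist()  # labels of the column axis
print(row_labels)
print(column_labels)

# .loc looks a value up by row label and column label,
# much like addressing a cell in a spreadsheet.
bob_art = scores.loc["bob", "art"]
print(bob_art)
```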
Getting Started
- Download and install Anaconda: https://www.anaconda.com/products/individual
- Start Anaconda Navigator
- Launch JupyterLab
Check the version in the first cell of a JupyterLab notebook
import pandas
pandas.__version__
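In a notebook the last expression of a cell is displayed automatically. In a plain Python script it is not, so a small variant using print and the conventional pd alias may be more convenient:

```python
import pandas as pd

# __version__ is a plain string describing the installed pandas release.
version = pd.__version__
print(version)
```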
Installing pandas separately
Conda
conda install pandas
pip
pip install pandas
More: https://pandas.pydata.org/docs/getting_started/install.html#installing-pandas