Author : HASSAN MD TAREQ

What is a DataFrame?

  • A pandas DataFrame is a two-dimensional, tabular data structure with labeled axes (rows and columns)
  • Data is aligned in rows and columns, much like a table
  • DataFrames let you store and manipulate tabular data as rows of observations and columns of variables
  • DataFrames make common manipulations straightforward, from selecting or replacing columns and indices to reshaping the data
  • Features of DataFrame (Courtesy: tutorialspoint.com/python_pandas_dataframe)
    • Size-mutable (rows and columns can be added or removed)
    • Labeled axes (rows and columns)
    • Columns can be of different types
    • Arithmetic operations can be performed on rows and columns
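
The features listed above can be seen in a small sketch. The column names and values here are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],      # strings
        "age": [31, 25, 40],               # integers
        "height_m": [1.62, 1.80, 1.75],    # floats
    }
)

# Size-mutable: a new column can be added to an existing DataFrame.
# Arithmetic: the multiplication is applied to every value in the column.
df["age_in_months"] = df["age"] * 12

print(df.dtypes)   # one data type per column
print(df.shape)    # (3, 4) after adding the new column
```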

Components

  • index (rows)
  • columns (series)
  • data
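
A minimal sketch of how the three components can be inspected. The row labels "r1".."r3" and the column names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical DataFrame with explicit row labels
df = pd.DataFrame(
    {"bar": [1, 2, 3], "baz": [4.0, 5.0, 6.0]},
    index=["r1", "r2", "r3"],
)

row_labels = list(df.index)        # index (rows): ['r1', 'r2', 'r3']
column_labels = list(df.columns)   # columns (series): ['bar', 'baz']
values = df.to_numpy()             # data: a 2-D NumPy array

print(row_labels)
print(column_labels)
print(values.shape)                # (3, 2)
```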

Overview

Pandas Dataframe overview Step 1

Pandas Dataframe overview Step 2

Accessing series

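Syntax 1: dataframe['SeriesName']
Syntax 2: dataframe.SeriesName    (works only when the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method)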

import pandas as pd

# dataframe
df = pd.read_csv("foo.csv")

Single series

df["bar"]

Multiple series

df[["baz", "bax"]]    # returns a DataFrame
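
Since foo.csv is not available here, the same access patterns can be tried on a small in-memory DataFrame. This is only a sketch; the column "my col" is a made-up name used to show a case where only the bracket syntax works.

```python
import pandas as pd

# Hypothetical data standing in for foo.csv
df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4], "my col": [5, 6]})

s1 = df["bar"]               # Syntax 1: bracket access, returns a Series
s2 = df.bar                  # Syntax 2: attribute access, same Series
both = df[["bar", "baz"]]    # list of names, returns a DataFrame

# A name containing a space is not a valid Python identifier,
# so only the bracket syntax can reach it.
spaced = df["my col"]

print(type(s1).__name__)     # Series
print(type(both).__name__)   # DataFrame
```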

Sub-dataframe

import pandas as pd

# dataframe
df = pd.read_csv("foo.csv")

# sub-dataframe from dataframe (column labels are case-sensitive)
subdf = df[["bar", "baz"]]

type(subdf)    # pandas.core.frame.DataFrame

Shape

  • dataframe.shape => a tuple (an attribute, not a method, so no parentheses)
  • Tuple: (numberOfRows, numberOfColumns), with elements accessed by zero-based index
    • Row count: df.shape[0]
    • Column count: df.shape[1]

import pandas as pd
df = pd.read_csv("foo.csv")

shape = df.shape

rowCount = shape[0]
colCount = shape[1]

print(f"rowCount: {rowCount}")
print(f"colCount: {colCount}")

Head and tail

  • df.head(n) => first n rows
  • df.tail(n) => last n rows
  • default value of n: 5 (for both head and tail)
    • df.head() => first 5 rows
    • df.tail() => last 5 rows
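
A short sketch of head and tail on a made-up ten-row DataFrame (the column name "n" is an assumption for illustration):

```python
import pandas as pd

# Hypothetical DataFrame with 10 rows holding the values 0..9
df = pd.DataFrame({"n": range(10)})

first_five = df.head()     # no argument: first 5 rows (values 0..4)
last_five = df.tail()      # no argument: last 5 rows (values 5..9)
first_three = df.head(3)   # first 3 rows (values 0..2)
last_two = df.tail(2)      # last 2 rows (values 8, 9)

print(first_three)
print(last_two)
```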

Info

  • Prints a concise summary of the DataFrame: the number of entries, plus the name, data type, and non-null count of each series (column), and the memory usage
  • Syntax: dataframe.info()
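
A minimal sketch of info() on made-up data containing missing values. Note that info() prints its summary rather than returning it; to keep the text, it can be written to a buffer through the buf parameter.

```python
import io

import pandas as pd

# Hypothetical data: each column has one missing value
df = pd.DataFrame({"bar": [1.0, None, 3.0], "baz": ["x", "y", None]})

df.info()   # prints the summary to standard output

# Capture the same summary as a string instead of printing it
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```

In the printed summary, each column should report 2 non-null values out of 3 entries.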