Author : MD TAREQ HASSAN | Updated : 2023/07/20
What is a DataFrame?
- A Pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns)
- A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns
- A DataFrame is similar to a table (rows and columns)
- DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables
- DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data
- Features of DataFrame (Courtesy: tutorialspoint.com/python_pandas_dataframe)
  - Size is mutable
  - Labeled axes (rows and columns)
  - Columns can be of different types
  - Arithmetic operations can be performed on rows and columns
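The features listed above can be illustrated with a small sketch. The sample data and the column names ("name", "qty", "price", "total") are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "name": ["a", "b", "c"],    # text column
    "qty": [1, 2, 3],           # integer column
    "price": [0.5, 1.5, 2.5],   # float column
})

# Columns can hold different types.
print(df.dtypes)

# Arithmetic works element-wise on whole columns, and assigning
# the result as a new column changes the size of the DataFrame.
df["total"] = df["qty"] * df["price"]

print(df.shape)  # one more column than before
```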
Components
- index (rows)
- columns (series)
- data
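As a rough sketch, the three components can be inspected directly on a DataFrame. The row labels ("r1", "r2") and the column names below are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical DataFrame with explicit row labels.
df = pd.DataFrame(
    {"bar": [10, 20], "baz": [1.5, 2.5]},
    index=["r1", "r2"],
)

row_labels = list(df.index)    # index (rows): ['r1', 'r2']
col_labels = list(df.columns)  # columns (series): ['bar', 'baz']
values = df.to_numpy()         # data as a 2-D NumPy array

print(row_labels, col_labels, values.shape)
```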
Overview
Accessing series
Syntax 1: dataframe['SeriesName']
Syntax 2: dataframe.SeriesName
import pandas as pd
# dataframe
df = pd.read_csv("foo.csv")
Single series
df["bar"]
Multiple series
df[["baz", "bax"]]  # returns a DataFrame
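The two access syntaxes can be compared without a CSV file. This sketch builds a small in-memory DataFrame instead; the data is invented, and the column names simply mirror the ones used above.

```python
import pandas as pd

# Hypothetical in-memory data standing in for foo.csv.
df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4], "bax": [5, 6]})

by_bracket = df["bar"]      # Syntax 1: works for any column label
by_attr = df.bar            # Syntax 2: attribute access
multi = df[["baz", "bax"]]  # list of labels returns a DataFrame

print(type(by_bracket).__name__)  # Series
print(type(multi).__name__)       # DataFrame
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method (for example, a column named "shape"), so bracket syntax is the safer default.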
Sub-dataframe
import pandas as pd
# dataframe
df = pd.read_csv("foo.csv")
# sub-dataframe from dataframe
subdf = df[["bar", "baz"]]  # column labels are case-sensitive
type(subdf)  # pandas.core.frame.DataFrame
Shape
dataframe.shape => a tuple
- Tuple: (numberOfRows, numberOfColumns) (zero-based index to access tuple elements)
- Row count: df.shape[0]
- Column count: df.shape[1]
import pandas as pd
df = pd.read_csv("foo.csv")
shape = df.shape
rowCount = shape[0]
colCount = shape[1]
print(f"rowCount: {rowCount}")
print(f"colCount: {colCount}")
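The same idea can be tried without a CSV file. This sketch uses a small invented DataFrame and unpacks the shape tuple in one step.

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame({"bar": [1, 2, 3], "baz": [4, 5, 6]})

# shape is a plain tuple, so it can be unpacked directly.
row_count, col_count = df.shape

print(f"row_count: {row_count}")
print(f"col_count: {col_count}")
```

`len(df)` gives the row count as well, which can be handy when only the number of rows is needed.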
Head and tail
df.head(n) => first n rows
df.tail(n) => last n rows
- Default value of n: 5 (for both head and tail)
  - df.head() => first 5 rows
  - df.tail() => last 5 rows
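A quick sketch of head and tail on an invented ten-row DataFrame (the column name "n" is an assumption for the example):

```python
import pandas as pd

# Hypothetical data: ten rows holding the values 0..9.
df = pd.DataFrame({"n": range(10)})

first_default = df.head()  # no argument: first 5 rows
last_default = df.tail()   # no argument: last 5 rows
first_three = df.head(3)   # explicit n: first 3 rows

print(first_default)
print(last_default)
print(first_three)
```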
Info
- Prints a summary of the DataFrame, including the number of entries (rows) and, for each series (column), the data type and the number of non-null entries
- Syntax: dataframe.info()
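A minimal sketch of info() on invented data with a few missing values. By default the summary goes to standard output; here it is captured through the `buf` parameter only so that the text can be inspected afterwards.

```python
import io

import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "bar": [1.0, None, 3.0],
    "baz": ["x", "y", None],
})

# info() writes its summary to a text buffer when buf is given.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```

In everyday use, calling `df.info()` with no arguments is usually enough.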