Tuesday, 23 June 2020

Pandas for data exploration


Using Pandas for data exploration

Pandas is a Python library for data manipulation and analysis. In data science, its key applications are:


  • Reading and writing data 
  • Handling of missing data
  • Reshaping and pivoting of data
  • Label based slicing, indexing and sub-setting of large data sets
  • Column insertion, deletion and filtering
  • Merging and joining
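
A few of these applications can be sketched on a small made-up table. The table, its column names and its values are invented purely for illustration; this is a minimal sketch, not a recommended workflow.

```python
import pandas as pd

# Hypothetical sales table; names and values are invented for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, None, 80.0, 120.0],
})

# Handling missing data: count the gaps, then fill them with 0.
missing_count = sales["revenue"].isna().sum()
filled = sales.fillna({"revenue": 0.0})

# Reshaping and pivoting: one row per region, one column per quarter.
wide = filled.pivot(index="region", columns="quarter", values="revenue")

# Label-based slicing: pick rows by a condition and a column by name.
south_revenue = filled.loc[filled["region"] == "South", "revenue"]

# Column insertion and deletion.
filled["revenue_k"] = filled["revenue"] / 1000
trimmed = filled.drop(columns=["revenue_k"])

# Merging and joining: attach a (hypothetical) manager to each region.
managers = pd.DataFrame({"region": ["North", "South"], "manager": ["Asha", "Ben"]})
merged = trimmed.merge(managers, on="region", how="left")

print(wide)
print(merged)
```

Filling with 0 is only one way to treat missing values; dropping the rows with dropna() may suit other data better.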
Below are some commonly used commands for exploring data:

# Import pandas under its conventional alias
import pandas as pd

# Read a CSV file. In a Jupyter notebook, pressing Shift + Tab after typing pd.read_csv shows the function's signature and parameters
data = pd.read_csv("data.csv")
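
To try read_csv without a file on disk, the CSV text can be supplied in memory. The sketch below assumes a tiny made-up dataset; the column names and the choice to parse "joined" as a date are illustrative only.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file; the columns are hypothetical.
csv_text = """name,city,joined
Ann,Pune,2020-01-15
Raj,Delhi,2020-03-02
"""

# read_csv accepts a path or any file-like object; parse_dates converts
# the listed columns to datetime values instead of leaving them as strings.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["joined"])
print(sample.dtypes)

# Writing goes the other way; index=False leaves out the row labels.
round_trip = sample.to_csv(index=False)
print(round_trip)
```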

type(data)  # Check the object type (a DataFrame)

len(data)  # Number of rows
data.shape  # Dimensions as (rows, columns)
data.head()  # First 5 rows
pd.set_option("display.max_columns", None)  # Show all columns instead of truncating wide output
data.tail()  # Last 5 rows
data.info()  # Non-null count and data type of each column
data.describe()  # Summary of numeric columns: count, mean, std, min, 25%, 50%, 75%, max
data.describe(include="object")  # Summary of object (text) columns: count, unique, top, freq
data["column"].value_counts()  # Frequency of each value in a categorical column
data.loc[data["column1"] == "ABC", "column2"].value_counts()  # Frequencies in column2 for rows where column1 is "ABC"
data.loc[data["column"] == "A", "date_column"].min()  # Earliest date for rows where column is "A"
data.loc[data["column"] == "A", "date_column"].max()  # Latest date for the same rows
data.loc[data["column"] == "A", "date_column"].agg(["min", "max"])  # Both in a single call
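
The conditional selections above can be tried end to end on a small frame. This is a minimal sketch; the "status", "channel" and "order_date" columns and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical order data; column names and values are invented.
orders = pd.DataFrame({
    "status": ["A", "B", "A", "A", "B"],
    "channel": ["web", "web", "store", "web", "store"],
    "order_date": pd.to_datetime(
        ["2020-01-05", "2020-02-10", "2020-03-15", "2020-01-20", "2020-04-01"]
    ),
})

# Frequency of each category in one column.
status_counts = orders["status"].value_counts()

# Frequency of "channel" restricted to rows where status is "A".
channel_counts_a = orders.loc[orders["status"] == "A", "channel"].value_counts()

# Earliest and latest date among the status "A" rows, in one call.
date_range_a = orders.loc[orders["status"] == "A", "order_date"].agg(["min", "max"])

print(status_counts)
print(channel_counts_a)
print(date_range_a)
```

The first argument to .loc is a boolean mask that picks the rows, and the second names the column to return.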
