All about Artificial Intelligence, Machine Learning, Deep Learning and Data Science: Pandas for data exploration

Using Pandas for data exploration

Pandas is a Python library used for data manipulation and analysis. In data science, the key applications of pandas are:

Reading and writing data
Handling of missing data
Reshaping and pivoting of data
Label based slicing, indexing and sub-setting of large data sets
Column insertion, deletion and filtering
Merging and joining
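
To make the list above concrete, here is a minimal sketch on a tiny, made-up data set. The frames `sales` and `regions` and all their column names are assumptions for illustration only, and file reading and writing is left out so the snippet runs without any file on disk.

```python
import pandas as pd

# Hypothetical toy data; the names and values are made up for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100.0, None, 80.0, 120.0],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Handling missing data: replace the missing revenue with 0.
sales["revenue"] = sales["revenue"].fillna(0.0)

# Reshaping and pivoting: one row per store, one column per month.
wide = sales.pivot(index="store", columns="month", values="revenue")

# Label-based slicing: revenue of store "B" in "Feb".
feb_b = wide.loc["B", "Feb"]

# Column insertion, deletion and filtering.
sales["revenue_k"] = sales["revenue"] / 1000
sales = sales.drop(columns="revenue_k")
big = sales[sales["revenue"] > 90]

# Merging and joining: attach the region of each store.
merged = sales.merge(regions, on="store", how="left")
```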

Below are some important commands that are used in analysis:

# Syntax for importing pandas
import pandas as pd

# Reading a CSV file. After typing pd.read_csv, press Shift + Tab in Jupyter Notebook to explore the function's parameters

data = pd.read_csv("data.csv")

type(data) # To check the data type

len(data) # To check the number of rows

data.shape # To check the dimensions (rows, columns)
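
The three checks above can be tried without a file on disk. The sketch below assumes a small, made-up CSV held in a string (`csv_text` is hypothetical) and reads it through `io.StringIO`, which `pd.read_csv` accepts in place of a file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "data.csv".
csv_text = "name,score\nx,1\ny,2\nz,3\n"
data = pd.read_csv(io.StringIO(csv_text))

kind = type(data)    # the pandas DataFrame class
n_rows = len(data)   # number of rows (the header line is not counted)
dims = data.shape    # tuple of (rows, columns)
print(kind, n_rows, dims)
```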

data.head() # To check the top 5 rows

pd.set_option("display.max_columns", None) # To see all columns

data.tail() # To check the bottom 5 rows

data.info() # To show the non-null count and data type of each column

data.describe() # To describe the numeric columns in terms of count, mean, std, min, 25%, 50%, 75%, max
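
As a rough illustration of what these calls return, the sketch below uses a made-up single-column frame (the column name `value` is an assumption). Both `head` and `tail` take an optional row count, and the result of `describe` is itself a DataFrame that can be indexed by statistic name.

```python
import pandas as pd

# Hypothetical frame with 10 rows holding the values 1 to 10.
data = pd.DataFrame({"value": range(1, 11)})

first_three = data.head(3)   # first 3 rows instead of the default 5
last_two = data.tail(2)      # last 2 rows instead of the default 5

summary = data.describe()             # rows: count, mean, std, min, 25%, 50%, 75%, max
median = summary.loc["50%", "value"]  # the 50% row is the median
```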

data.describe(include="object") # To explore object (text) columns; np.object has been removed from recent NumPy versions

data["column"].value_counts() # To explore categorical variables
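
A small sketch of how text and categorical columns can be summarised. The frame and its `city` and `age` columns are made up, and the `city` column is given the `object` dtype explicitly so that the example behaves the same regardless of how a given pandas version stores strings by default.

```python
import pandas as pd

# Hypothetical data; "city" is stored with the object dtype on purpose.
data = pd.DataFrame({
    "city": pd.Series(["Pune", "Delhi", "Pune", "Pune"], dtype="object"),
    "age": [30, 25, 41, 35],
})

# For object columns, describe reports count, unique, top and freq.
text_summary = data.describe(include="object")

# Frequency of each category, most common first.
counts = data["city"].value_counts()

# The same counts expressed as fractions of all rows.
shares = data["city"].value_counts(normalize=True)
```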

data.loc[data["column1"] == "ABC", "column2"].value_counts() # To explore a column with some condition
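
The conditional selection above does two things at once: the comparison builds a boolean mask with one entry per row, and `.loc` then keeps the rows where the mask is True and returns only the named column. A minimal sketch with made-up values (the data is hypothetical, and the column names simply mirror the placeholders used above):

```python
import pandas as pd

# Hypothetical data using the same placeholder column names.
data = pd.DataFrame({
    "column1": ["ABC", "ABC", "XYZ", "ABC"],
    "column2": ["red", "blue", "red", "red"],
})

mask = data["column1"] == "ABC"      # boolean Series, one entry per row
subset = data.loc[mask, "column2"]   # column2 values for the matching rows
counts = subset.value_counts()       # how often each value occurs in the subset
```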

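data.loc[data["column"] == "A", "date_column"].min() # To find the earliest date where column is "A"

data.loc[data["column"] == "A", "date_column"].max() # To find the latest date where column is "A"

data.loc[data["column"] == "A", "date_column"].agg(["min", "max"]) # To get both in one call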
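
One detail worth showing for the date examples: if the date column was read as text, `min` and `max` compare strings rather than dates, so it is usually safer to convert it first. The sketch below assumes made-up values and uses `pd.to_datetime` for the conversion; the column names mirror the placeholders above.

```python
import pandas as pd

# Hypothetical data; the dates start out as plain strings.
data = pd.DataFrame({
    "column": ["A", "B", "A", "A"],
    "date_column": ["2020-06-23", "2020-01-01", "2020-03-15", "2020-12-31"],
})

# Parse the text into datetimes so comparisons work on real dates.
data["date_column"] = pd.to_datetime(data["date_column"])

dates_a = data.loc[data["column"] == "A", "date_column"]
earliest = dates_a.min()
latest = dates_a.max()

# agg with a list returns a Series indexed by the function names.
both = dates_a.agg(["min", "max"])
```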

Tuesday, 23 June 2020