Using Pandas for data exploration
Pandas is a Python library for data manipulation and analysis. In data science, its key applications are:
- Reading and writing data
- Handling of missing data
- Reshaping and pivoting of data
- Label based slicing, indexing and sub-setting of large data sets
- Column insertion, deletion and filtering
- Merging and joining
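The applications listed above can be sketched on a tiny table. This is only an illustration under stated assumptions: the CSV text, the column names (city, year, sales, region) and the variable names are hypothetical and do not come from any real data set.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """city,year,sales
Pune,2020,10
Pune,2021,
Delhi,2020,7
Delhi,2021,9
"""

# Reading data: read_csv accepts a path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# Handling missing data: count the gaps, then fill them with 0.
missing_count = int(sales["sales"].isna().sum())
sales["sales"] = sales["sales"].fillna(0)

# Column insertion and deletion.
sales["sales_k"] = sales["sales"] / 1000
sales = sales.drop(columns=["sales_k"])

# Label-based selection: rows where city is "Pune", only the "sales" column.
pune_sales = sales.loc[sales["city"] == "Pune", "sales"]

# Reshaping: one row per city, one column per year.
wide = sales.pivot(index="city", columns="year", values="sales")

# Merging: attach a region to each row via the shared "city" column.
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})
merged = sales.merge(regions, on="city", how="left")

# Writing data: to_csv returns a string when no path is given.
csv_out = merged.to_csv(index=False)
```

Filling missing values with 0 is just one possible choice here; depending on the analysis, dropping the rows or leaving them as NaN may be more appropriate.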
Below is some important syntax used in analysis:
# Syntax for importing pandas
import pandas as pd
# Reading a CSV file. After typing pd.read_csv, press Shift + Tab in Jupyter Notebook to explore the function's parameters
data = pd.read_csv("data.csv")
type(data) # To check the data type
len(data) # To check the number of rows
data.shape # To check the dimensions (rows, columns)
data.head() # To check the top 5 rows
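To try these first-look calls without a file on disk, the sketch below reads a small in-memory CSV instead. The CSV text and its column names (name, score) are made up for illustration; only the pandas calls themselves match the ones above.

```python
import io

import pandas as pd

# Hypothetical 7-row CSV used in place of "data.csv".
csv_text = "name,score\nA,1\nB,2\nC,3\nD,4\nE,5\nF,6\nG,7\n"
data = pd.read_csv(io.StringIO(csv_text))

kind = type(data)    # the DataFrame class
n_rows = len(data)   # number of rows
dims = data.shape    # tuple of (rows, columns)
top = data.head()    # first 5 rows by default
top3 = data.head(3)  # pass a number to get a different row count
```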
pd.set_option("display.max_columns", None) # To see all columns
data.tail() # To check the bottom 5 rows
data.info() # To show the non-null count and data type of each column
data.describe() # To describe the data in terms of count, mean, std, min, 25%, 50%, 75%, max
data.describe(include=object) # To explore object variables (np.object has been removed from recent NumPy versions)
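The difference between the two describe calls is easier to see on a small frame. The sketch below assumes the text column is stored with the object dtype (set explicitly here); the frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with one text column (object dtype) and one numeric column.
data = pd.DataFrame({
    "column": pd.Series(["A", "B", "A", "A"], dtype=object),
    "amount": [10.0, 20.0, 30.0, 40.0],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max).
numeric_summary = data.describe()

# Object columns only (count, unique, top, freq).
object_summary = data.describe(include=object)
```

Here numeric_summary has a single "amount" column, while object_summary reports the most frequent value of "column" and how often it occurs. If text columns in a given pandas version use a dedicated string dtype rather than object, the include argument may need to name that dtype instead.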
data["column"].value_counts() # To explore categorical variables
data.loc[data["column1"] == "ABC", "column2"].value_counts() # To explore a column with some condition
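data.loc[data["column"] == "A", "date_column"].min() # To find the earliest date for rows matching a condition
data.loc[data["column"] == "A", "date_column"].max() # To find the latest date for rows matching a condition
data.loc[data["column"] == "A", "date_column"].agg(["min", "max"]) # To get both in one call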
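The conditional selections above can be checked end to end on a small frame. This is a sketch with made-up values; the column names mirror the placeholders used above, and the dates are hypothetical.

```python
import pandas as pd

# Hypothetical data: a category column and a datetime column.
data = pd.DataFrame({
    "column": ["A", "B", "A", "A"],
    "date_column": pd.to_datetime(
        ["2021-03-01", "2021-01-15", "2021-01-10", "2021-02-20"]
    ),
})

# Frequency of each category.
counts = data["column"].value_counts()

# Boolean mask selects rows; the second argument to .loc selects the column.
mask = data["column"] == "A"
dates_for_a = data.loc[mask, "date_column"]

earliest = dates_for_a.min()
latest = dates_for_a.max()

# agg with a list of function names returns a Series indexed by those names.
date_range = dates_for_a.agg(["min", "max"])
```

When the dates come from a CSV file they are read as text unless converted, for example with pd.to_datetime or the parse_dates parameter of pd.read_csv, so min and max would otherwise compare strings rather than dates.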