
Learning notes | Data Analysis: 1.1 data evaluation

Programmer Article Station, 2022-10-16 16:59:37

| data evaluation |

- press Shift + Enter (or Shift + Return) to run the cell above so that it displays the formatted text.

- a notebook has two kinds of cells: Markdown cells for writing text and code cells for writing code.

import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
%matplotlib inline

 

  # load the data set with the read_csv function of pandas

shanghai_data = pd.read_csv('shanghaipm20100101_20151231.csv')

 

  # view the basic information of the data with head, info and describe

shanghai_data.head()
shanghai_data.info()
shanghai_data.describe()
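
To show what these three calls report, here is a minimal sketch on a tiny in-memory CSV. The column names and values are invented for illustration and are not taken from the Shanghai file.

```python
import io
import pandas as pd

# tiny in-memory CSV standing in for the real file (contents are made up)
csv_text = "year,season,pm_value\n2010,1,66.0\n2010,2,\n2011,3,70.0\n"
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.head(2))         # first two rows
toy.info()                 # dtypes and non-null counts; prints directly
summary = toy.describe()   # count, mean, std, min, quartiles, max per numeric column
print(summary)
```

The empty field in the second row is read as NaN, so info() and the count row of describe() both report one fewer value for that column than there are rows.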

 

  # print the Python type of the first value in the 'cbwd' column

print(type(shanghai_data['cbwd'][0]))

 

  # replace the spaces in the column names with underscores

shanghai_data.columns = [c.replace(' ', '_') for c in shanghai_data.columns]
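
As a small illustration of the renaming step, the sketch below applies the same list comprehension to a toy DataFrame. The column names are hypothetical and chosen only for the example.

```python
import pandas as pd

# toy frame with hypothetical column names that contain spaces
toy = pd.DataFrame({'pm jingan': [10.0, 12.5], 'pm us post': [11.0, 13.0]})

# same technique as above: replace each space with an underscore
toy.columns = [c.replace(' ', '_') for c in toy.columns]

print(list(toy.columns))
```

After the rename, the columns can be reached with attribute access (toy.pm_jingan) as well as with brackets.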

 

  # map the numeric codes 1, 2, 3, 4 to the corresponding season names (using the map method of pandas):

shanghai_data['season'] = shanghai_data['season'].map({1:'spring', 2:'summer', 3:'autumn', 4: 'winter'})
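
One behaviour of map is worth keeping in mind: any value without a key in the dictionary becomes NaN. The sketch below uses a toy Series (values invented for illustration) to show this. It also suggests why running the mapping cell a second time would turn the whole column into NaN.

```python
import pandas as pd

season_names = {1: 'spring', 2: 'summer', 3: 'autumn', 4: 'winter'}

# toy data; the value 5 is deliberately not a key in the dictionary
codes = pd.Series([1, 4, 2, 5])
mapped = codes.map(season_names)
print(mapped.tolist())

# mapping again: the season strings are not keys, so every value becomes NaN
remapped = mapped.map(season_names)
print(remapped.isna().all())
```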

 

- check for missing data and data types:

  # print the number of rows in the data set

print("the number of rows in this dataset is", len(shanghai_data.index))

 

  # calculate the number of missing records in column "pm_jingan"

print("the number of missing records in pm_jingan is:", len(shanghai_data.index) - len(shanghai_data['pm_jingan'].dropna()))

note: the dropna() function used in the code above returns a copy of the column with the missing values removed; the original data is not modified.
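
The subtraction above can be cross-checked with isna().sum(), which counts the missing values directly. A minimal sketch on a toy column (values invented for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'pm_jingan': [66.0, np.nan, 70.0, np.nan, 55.0]})

# approach from the notes: total rows minus rows left after dropna()
missing_by_subtraction = len(toy.index) - len(toy['pm_jingan'].dropna())

# alternative: count the missing values directly
missing_by_isna = int(toy['pm_jingan'].isna().sum())

print(missing_by_subtraction, missing_by_isna)

# the original column still has all of its rows after the dropna() call
print(len(toy['pm_jingan']))
```

Both approaches should agree. The isna().sum() form is shorter and also works on a whole DataFrame, where it gives one count per column.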