
100 Days Of ML Code: Day 7/11 - KNN

程序员文章站 2022-07-14 20:32:37
...

The index of all posts in the 100 Days of ML Code challenge is linked here.

Contents

Step 1: Data preprocessing

Step 2: Fit KNN to the training set

Step 3: Prediction



Step 1: Data preprocessing

Since the same dataset is used, this step is identical to the preprocessing in Day 6 (logistic regression).

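import pandas as pd
import numpy as np

df = pd.read_csv('Social_Network_Ads.csv')
# print(df)
# Features: Age and EstimatedSalary (columns 2-3); label: Purchased (column 4)
X = df.iloc[:, 2:4].values
Y = df.iloc[:, 4].values
# print(X)
# print(Y)

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
# print(X_train)

# Feature scaling: fit the scaler on the training set only,
# then apply the same mean/std to the test set (no refitting on test data)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# print(X_train)
# print(X_test)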
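A note on the scaling step: the mean and standard deviation should be learned from the training set alone and then reused on the test set, so that both sets are expressed on the same scale. The sketch below checks this by hand with NumPy; the rows are invented stand-ins for (Age, EstimatedSalary), not values from Social_Network_Ads.csv, and it assumes StandardScaler uses the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented toy rows standing in for (Age, EstimatedSalary); not the real dataset
X_train = np.array([[19.0, 19000.0], [35.0, 20000.0], [26.0, 43000.0], [27.0, 57000.0]])
X_test = np.array([[30.0, 87000.0], [45.0, 26000.0]])

# Statistics come from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0)

X_train_manual = (X_train - mean) / std
X_test_manual = (X_test - mean) / std  # reuse the *training* mean and std

scaler = StandardScaler().fit(X_train)
print(np.allclose(scaler.transform(X_train), X_train_manual))
print(np.allclose(scaler.transform(X_test), X_test_manual))
```

Calling fit_transform on the test set instead would compute a second, different mean and std, so the same raw value could map to different scaled values in the two sets.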

Step 2: Fit KNN to the training set

The documentation page for KNeighborsClassifier is here.

from sklearn.neighbors import KNeighborsClassifier
k = 5  # number of nearest neighbors that vote on each prediction
neigh = KNeighborsClassifier(n_neighbors=k)
neigh.fit(X_train, Y_train)
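Conceptually, KNN "training" just stores the data: for each query point it finds the k closest training points and takes a majority vote of their labels. Below is a minimal brute-force sketch in NumPy; knn_predict is a hypothetical helper written for illustration, not scikit-learn's implementation (which may use KD-trees or ball trees and can break voting ties differently). The two-cluster data is synthetic.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_query, k=5, p=2):
    """Brute-force KNN classifier (hypothetical helper for illustration)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_query, dtype=float):
        # Minkowski distance from x to every training point (p=2 is Euclidean)
        dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote among their labels
        label, _ = Counter(y_train[nearest].tolist()).most_common(1)[0]
        preds.append(label)
    return np.array(preds)

# Synthetic two-cluster data, not the Social_Network_Ads dataset
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
queries = np.array([[0.0, 0.0], [4.0, 4.0]])

ours = knn_predict(X, y, queries, k=5)
ref = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(queries)
print(ours, ref)
```

Choosing an odd k for a two-class problem, as with k = 5 here, avoids tied votes.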

The reference solution configures the KNN classifier as follows:

classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)

The two extra parameters, p and metric, are described in the scikit-learn documentation as follows:

p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric : string or callable, default ‘minkowski’

the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
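The Minkowski distance behind these two parameters is (sum over i of |x_i - y_i|^p)^(1/p), so p = 1 gives Manhattan distance and p = 2 gives Euclidean distance. A small sketch of the formula follows; minkowski is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import numpy as np

def minkowski(a, b, p=2):
    """Minkowski distance: (sum_i |a_i - b_i|**p) ** (1/p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

print(minkowski([0, 0], [3, 4], p=1))  # 7.0, Manhattan (l1)
print(minkowski([0, 0], [3, 4], p=2))  # 5.0, Euclidean (l2)
```

This is why metric='minkowski' with p=2 in the reference solution behaves the same as the default settings used in Step 2.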

Step 3: Prediction

Y_pred = neigh.predict(X_test)
# print(Y_pred)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test, Y_pred)
# print(cm)
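The confusion matrix is easier to read with its layout in mind: in scikit-learn's confusion_matrix, rows are true labels and columns are predicted labels, so for 0/1 labels it is [[TN, FP], [FN, TP]], and accuracy is the diagonal sum divided by the total. The labels below are made up for illustration, not results from this dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)
# Rows = true labels, columns = predicted labels; for labels (0, 1):
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
accuracy = np.trace(cm) / cm.sum()  # correct predictions sit on the diagonal

print(cm)
print(accuracy, accuracy_score(y_true, y_pred))  # both 0.6
```

The same np.trace(cm) / cm.sum() expression applied to the cm from Step 3 gives the test-set accuracy of the KNN model.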
