
Machine Learning Series: Clustering

程序员文章站 2022-07-14 19:33:19

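Three commonly used clustering algorithms:

    (1) Basic K-means: a prototype-based, partitional clustering technique that tries to find a user-specified number of clusters in the full set of data objects.

    (2) Agglomerative hierarchical clustering: start with every point as its own cluster, then repeatedly merge the two closest clusters until the specified number of clusters remains.

    (3) DBSCAN: a density-based clustering algorithm that produces a partitional clustering.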
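The script in this post only covers K-means, so here is a minimal sketch of item (2), agglomerative hierarchical clustering. It assumes single-link distance (the distance between two clusters is the distance between their closest members) and 1 <= n_clusters; the function name `agglomerative` and the demo data are made up for illustration.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single link)."""
    pts = np.asarray(points, dtype=float)
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(pts))]
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = dist[np.ix_(clusters[i], clusters[j])].min()
                if d < best[2]:
                    best = (i, j, d)
        i, j, _ = best
        # Merge cluster j into cluster i and drop cluster j.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Hypothetical demo data: two tight pairs and one isolated point.
demo = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
result = agglomerative(demo, 3)
print(result)
```

This brute-force version rescans every pair of clusters on each merge, so it is only suitable for small sample sets like the 80 points used in this post.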

Their strengths and weaknesses are not repeated here; see https://www.cnblogs.com/giserliu/archive/2015/04/05/4394807.html
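For item (3) in the list above, the following is a rough sketch of DBSCAN rather than a reference implementation. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours, counting the point itself, for a core point) and the demo data are assumptions chosen for illustration.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    # Neighbourhood of each point, including the point itself.
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = [None] * n              # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1           # provisionally noise; may become a border point
            continue
        # i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id   # noise reachable from a core point: border
            if labels[j] is not None:
                continue                 # already assigned, do not expand again
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is also a core point: keep expanding
    return labels

# Hypothetical demo data: two dense groups and one outlier.
demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(demo, eps=1.5, min_pts=3)
print(labels)
```

Unlike K-means, the number of clusters is not specified in advance; it follows from `eps` and `min_pts`, and points in sparse regions are left as noise.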

#!/usr/bin/env python
# -*- coding:utf-8 -*-
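__author__ = 'Great'
"""
Specify three initial centroids
Compute the distance from each point to the three centroids
Assign each point to a cluster
Update the centroids
Record the centroids
Plot the clusters and the centroid trajectories
"""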
import numpy as np
# Randomly generate 80 sample points: x in [1, 100), y in [1, 150)
a = np.random.randint(1, 100, 80, dtype='int64')
b = np.random.randint(1, 150, 80)
points = []
for i in range(80):
    points.append((a[i], b[i]))

import pylab as pl

# Specify the initial centroids
current_point1 = [22, 111]
current_point2 = [100, 12]
current_point3 = [50, 56]
# Plot the initial centroids as black dots
pl.plot(current_point1[0], current_point1[1], 'ok')
pl.plot(current_point2[0], current_point2[1], 'ok')
pl.plot(current_point3[0], current_point3[1], 'ok')

# Record the centroid trajectories
current1 = [current_point1]
current2 = [current_point2]
current3 = [current_point3]

# The three clusters
group1 = []
group2 = []
group3 = []

for cost_time in range(100):
    group1 = []; group2 = []; group3 = []

    for onepoint in points:
        # Squared Euclidean distance to each centroid (the square root is not needed for comparison)
        distance1 = (onepoint[0] - current_point1[0]) ** 2 + (onepoint[1] - current_point1[1]) ** 2
        distance2 = (onepoint[0] - current_point2[0]) ** 2 + (onepoint[1] - current_point2[1]) ** 2
        distance3 = (onepoint[0] - current_point3[0]) ** 2 + (onepoint[1] - current_point3[1]) ** 2

        # Assign the point to the nearest cluster; elif/else ensures a tie goes to one cluster only
        min_len = min(distance1, distance2, distance3)
        if min_len == distance1:
            group1.append(onepoint)
        elif min_len == distance2:
            group2.append(onepoint)
        else:
            group3.append(onepoint)

    # Update each centroid to the mean of its cluster; an empty cluster keeps its previous centroid
    if group1:
        current_point1 = [sum([onepoint[0] for onepoint in group1])/len(group1), sum([onepoint[1] for onepoint in group1])/len(group1)]
    if group2:
        current_point2 = [sum([onepoint[0] for onepoint in group2])/len(group2), sum([onepoint[1] for onepoint in group2])/len(group2)]
    if group3:
        current_point3 = [sum([onepoint[0] for onepoint in group3])/len(group3), sum([onepoint[1] for onepoint in group3])/len(group3)]

    # Record the centroid trajectory
    current1.append(current_point1)
    current2.append(current_point2)
    current3.append(current_point3)
# Plot the clusters (red, yellow, green)
pl.plot([onepoint[0] for onepoint in group1], [onepoint[1] for onepoint in group1], "or")
pl.plot([onepoint[0] for onepoint in group2], [onepoint[1] for onepoint in group2], "oy")
pl.plot([onepoint[0] for onepoint in group3], [onepoint[1] for onepoint in group3], "og")

# Plot the centroid trajectories
for center in [current1, current2, current3]:
    pl.plot([eachcenter[0] for eachcenter in center], [eachcenter[1] for eachcenter in center])
# Show the figure
pl.show()
# Print the final centroids and clusters
print(current_point1, current_point2, current_point3)
print(group1)
print(group2)
print(group3)


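This post only implements the basic K-means clustering algorithm and does not optimize the clustering result; the aim is simply to get familiar with the calculations involved. The sample set is randomly generated. The procedure mainly follows https://www.oschina.net/code/snippet_176897_14731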
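As one possible refinement, not part of the original script, the sketch below vectorizes the same steps with NumPy, stops once the centroids no longer move instead of always running 100 iterations, and reports the sum of squared errors (SSE) as a simple quality measure. The function name `kmeans` and the parameters `max_iter` and `tol` are assumptions made for illustration; the initial centroids are the ones used above.

```python
import numpy as np

def kmeans(points, init_centroids, max_iter=100, tol=1e-9):
    """Basic K-means; returns (centroids, labels, sse)."""
    pts = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = pts[labels == k]
            if len(members) > 0:     # an empty cluster keeps its old centroid
                new_centroids[k] = members.mean(axis=0)
        # Largest coordinate change of any centroid in this iteration.
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift <= tol:             # converged: centroids stopped moving
            break
    # Final assignment and SSE with respect to the final centroids.
    d2 = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(pts)), labels].sum()
    return centroids, labels, sse

# Randomly generated sample set, same ranges as in the script above.
rng = np.random.default_rng(0)
points = np.column_stack([rng.integers(1, 100, 80), rng.integers(1, 150, 80)])
centroids, labels, sse = kmeans(points, [[22, 111], [100, 12], [50, 56]])
print(centroids, sse)
```

Because K-means only finds a local optimum, a common follow-up is to run it from several different initial centroids and keep the run with the lowest SSE.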

Here's hoping I keep getting better. Keep going! Still on the road of growing.
