
TensorFlow 2.0 basics: data standardization (study notes)

程序员文章站 2022-07-16 17:11:30

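In deep learning, image data usually needs to be normalized or standardized before training.
The standardization utilities in sklearn simplify this preprocessing step.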

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)

2.0.0
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all,y_train_all),(x_test,y_test) = fashion_mnist.load_data()
x_valid,x_train = x_train_all[:5000],x_train_all[5000:]
y_valid,y_train = y_train_all[:5000],y_train_all[5000:]

print(x_valid.shape,y_valid.shape)
print(x_train.shape,y_train.shape)
print(x_test.shape,y_test.shape)
(5000, 28, 28) (5000,)
(55000, 28, 28) (55000,)
(10000, 28, 28) (10000,)
print(np.max(x_train),np.min(x_train))
255 0

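Normalization maps the data into the range [0, 1].
Standardization removes scale differences by shifting the data to zero mean and rescaling it to unit variance. It does not change the shape of the distribution, so the result follows the standard normal distribution N(0,1) only if the data was Gaussian to begin with.

# x = (x - mu) / std, which gives mean 0 and variance 1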
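The two definitions above can be compared on a handful of values. This is a minimal NumPy sketch; the array `x` holds made-up pixel values chosen for illustration, not data from Fashion-MNIST.

```python
import numpy as np

# Toy pixel values in the 0-255 range (made up for illustration).
x = np.array([0, 51, 102, 204, 255], dtype=np.float32)

# Min-max normalization: maps the values into [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization: subtract the mean, divide by the standard deviation.
mu, std = x.mean(), x.std()
x_std = (x - mu) / std

print(x_norm)                      # smallest value becomes 0, largest becomes 1
print(x_std.mean(), x_std.std())   # close to 0 and 1
```

Normalization fixes the range of the output, while standardization fixes its mean and spread; the standardized values can fall outside [0, 1].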

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
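# x_train: [None,28,28] -> [None*784,1] -> [None,28,28]; reshape(-1,1) infers the number of rows and yields 1 column
# fit_transform expects 2-D input.
# fit computes and stores the training-set mean and standard deviation, which are reused for the validation and test sets.
# The training data are integers in 0-255; standardization involves division, so cast to float32 first.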
x_train_scaled = scaler.fit_transform(
    x_train.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
x_valid_scaled = scaler.transform(
    x_valid.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
x_test_scaled = scaler.transform(
    x_test.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)

print(np.max(x_train_scaled),np.min(x_train_scaled))
2.0231433 -0.8105136
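Because of `reshape(-1,1)`, the scaler sees every pixel as a sample of one single feature, so it learns one global mean and one global standard deviation for the whole training set. The sketch below reproduces that behaviour with plain NumPy, assuming the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The arrays `train` and `valid` are small random stand-ins for `x_train` and `x_valid`, not the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for x_train / x_valid: small batches of 28x28 uint8 images.
train = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
valid = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# "fit": one global mean and std over every pixel of the training set.
train_f = train.astype(np.float32)
mu = train_f.mean()
std = train_f.std()  # ddof=0, the population standard deviation

# "transform": reuse the training statistics on both splits.
train_scaled = (train_f - mu) / std
valid_scaled = (valid.astype(np.float32) - mu) / std

print(train_scaled.shape, valid_scaled.shape)
print(train_scaled.mean(), train_scaled.std())  # close to 0 and 1
```

Only the training split is guaranteed to come out with mean 0 and standard deviation 1; the validation and test splits are scaled with the training statistics so that the model never sees information from them during preprocessing.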
#tf.keras.Sequential()

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28])) # input is a 28x28 image, flattened into a 1-D vector of 784 values
model.add(keras.layers.Dense(300,activation="relu")) # fully connected layer, 300 units, ReLU activation
model.add(keras.layers.Dense(100,activation="relu"))
model.add(keras.layers.Dense(10,activation="softmax")) # output layer: fully connected, 10 classes, softmax activation gives the probability of each class

model.compile(loss="sparse_categorical_crossentropy",
             optimizer = "sgd",
             metrics = ["accuracy"])
model.layers 
model.summary() # print an overview of the model
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
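The parameter counts in the summary can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper `dense_params` below is a hypothetical name introduced only for this check.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Layer widths: flattened 28x28 input, then the three Dense layers.
sizes = [28 * 28, 300, 100, 10]
counts = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]

print(counts, sum(counts))  # [235500, 30100, 1010] 266610
```

The Flatten layer only reshapes its input, which is why it contributes 0 parameters.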
history = model.fit(x_train_scaled,y_train,epochs=10,
                    validation_data=(x_valid_scaled,y_valid))
# epochs: number of full passes over the training set
Train on 55000 samples, validate on 5000 samples
Epoch 1/10
55000/55000 [==============================] - 6s 106us/sample - loss: 0.5310 - accuracy: 0.8142 - val_loss: 0.3946 - val_accuracy: 0.8606
Epoch 2/10
55000/55000 [==============================] - 5s 84us/sample - loss: 0.3895 - accuracy: 0.8612 - val_loss: 0.3797 - val_accuracy: 0.8628
Epoch 3/10
55000/55000 [==============================] - 5s 90us/sample - loss: 0.3516 - accuracy: 0.8735 - val_loss: 0.3500 - val_accuracy: 0.8778
Epoch 4/10
55000/55000 [==============================] - 5s 88us/sample - loss: 0.3267 - accuracy: 0.8832 - val_loss: 0.3321 - val_accuracy: 0.8786
Epoch 5/10
55000/55000 [==============================] - 5s 84us/sample - loss: 0.3088 - accuracy: 0.8886 - val_loss: 0.3473 - val_accuracy: 0.8698
Epoch 6/10
55000/55000 [==============================] - 5s 90us/sample - loss: 0.2933 - accuracy: 0.8941 - val_loss: 0.3127 - val_accuracy: 0.8862
Epoch 7/10
55000/55000 [==============================] - 5s 89us/sample - loss: 0.2795 - accuracy: 0.8994 - val_loss: 0.3015 - val_accuracy: 0.8894
Epoch 8/10
55000/55000 [==============================] - 5s 82us/sample - loss: 0.2656 - accuracy: 0.9041 - val_loss: 0.3118 - val_accuracy: 0.8864
Epoch 9/10
55000/55000 [==============================] - 5s 93us/sample - loss: 0.2561 - accuracy: 0.9072 - val_loss: 0.3165 - val_accuracy: 0.8878
Epoch 10/10
55000/55000 [==============================] - 5s 86us/sample - loss: 0.2463 - accuracy: 0.9109 - val_loss: 0.2931 - val_accuracy: 0.8936
def plot_learning_curve(history):
    pd.DataFrame(history.history).plot(figsize=(8,5))
    plt.grid(True)
    plt.gca().set_ylim(0,1)
    plt.show()
# plot using the DataFrame's built-in plotting
plot_learning_curve(history)
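`pd.DataFrame(history.history)` works because `history.history` is a dict that maps each metric name to a list with one value per epoch. As a rough sketch of reading it without a plot, the snippet below copies the validation columns from the training log above into a dict of the same shape; `history_dict` and `best_epoch` are hypothetical names used only here.

```python
# Per-epoch validation metrics, copied from the training log above.
# Keras stores them in history.history as a dict of lists shaped like this one.
history_dict = {
    "val_loss": [0.3946, 0.3797, 0.3500, 0.3321, 0.3473,
                 0.3127, 0.3015, 0.3118, 0.3165, 0.2931],
    "val_accuracy": [0.8606, 0.8628, 0.8778, 0.8786, 0.8698,
                     0.8862, 0.8894, 0.8864, 0.8878, 0.8936],
}

def best_epoch(values, mode):
    """Return the 1-based epoch with the lowest ("min") or highest ("max") value."""
    pick = min if mode == "min" else max
    return values.index(pick(values)) + 1

print(best_epoch(history_dict["val_loss"], "min"))      # 10
print(best_epoch(history_dict["val_accuracy"], "max"))  # 10
```

In this run both validation metrics are still at their best in the final epoch.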


# evaluate on the test set
y = model.evaluate(x_test_scaled,y_test)

0s 44us/sample - loss: 0.2366 - accuracy: 0.8833