
"An Image Restoration Example as an Introduction to Deep Learning & TensorFlow, Part 10": Visualizing the Training Process


Visualizing the Training Process with TensorBoard

Training takes a long time, so instead of just waiting, let's give ourselves something useful to watch: TensorBoard. (TensorBoard is installed together with TensorFlow, so there is no need to run conda install tensorboard; it is already available in the environment we created.)

1. Introduction to TensorBoard

This part draws on Chapter 9 of 《TensorFlow实战谷歌深度学习框架》 (TensorFlow: Google's Deep Learning Framework in Practice).

TensorBoard is the visualization tool that ships with TensorFlow. It visualizes the state of a TensorFlow program from the log files the program writes while it runs. TensorBoard and TensorFlow run in separate processes; TensorBoard automatically reads the latest TensorFlow log files and shows the most recent state of the running program.

The short program below shows how to write a TensorBoard log. (Note that building a computation graph does not require a Session; the graph is created as soon as the operations are defined.)

import tensorflow as tf
tf.reset_default_graph()                         # reset the default graph
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard' # directory where the TensorBoard logs are saved

# Define a simple computation graph that adds two vectors
input1 = tf.constant([1.0,2.0,3.0],name='input1')
input2 = tf.constant([4.0,5.0,6.0],name='input2')
output = tf.add_n([input1,input2],name='add')

# Create a writer and write the current computation graph to the log;
# the log is saved as a .tfevents file under TensorBoard_path
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
summary_writer.close()

Run the code in Spyder (the one installed in the tensorflow environment), then open Anaconda Prompt, type activate tensorflow to enter the environment we created, and run: tensorboard.exe --logdir=E:\MNIST_data\TensorBoard You get:
[Screenshot: TensorBoard starting up in the Anaconda Prompt]
Open the printed address, e.g. http://HP-HP:6006, in a browser (the hostname differs from machine to machine, so use the address printed on your own machine).
Opening that address gives the page below (the diagram in the red box is the computation graph we have been working with all along):
[Screenshot: the computation graph displayed in TensorBoard]
Easy, right? One call to tf.summary.FileWriter is all it takes. Now let's see what the computation graph of the network we built earlier looks like.
Code (train.py, with the call that writes the graph log added):

import time
time_start=time.time() # time.time() returns the number of seconds since the Unix epoch (1970-01-01)
import tensorflow as tf
import numpy as np

import input_data  # operations for reading the input data
import model       # the network definition

img_W = 28                                                               # image width
img_H = 28                                                               # image height
batch_size = 10                                                          # number of samples per mini-batch
min_after_dequeue = 1000                                                 # minimum number of examples left in the queue
capacity = min_after_dequeue + 3*batch_size                              # maximum number of examples in the queue

train_image_path = 'E:\\MNIST_data\\train_images\\'                      # path of the input images
train_label_path = 'E:\\MNIST_data\\train_labels\\'                      # path of the target (label) images
Train_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\train_data_set.tfrecord'# path of the generated TFRecord file

test_image_path = 'E:\\MNIST_data\\test_images\\'                        # path of the input images
test_label_path = 'E:\\MNIST_data\\test_labels\\'                        # path of the target (label) images
Test_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\test_data_set.tfrecord'  # path of the generated TFRecord file
model_save_path = 'E:\\MNIST_data\\models\\conv_1.ckpt'                  # path where the model is saved
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard'                         # directory where the TensorBoard logs are saved

print('please wait for generating the TFRecord file of training sets...')       
#input_data.generate_TFRecordfile(train_image_path,train_label_path,Train_TFRecord_path)# generate the TFRecord file
print('please wait for generating the TFRecord file of test sets...')
#input_data.generate_TFRecordfile(test_image_path,test_label_path,Test_TFRecord_path)   # generate the TFRecord file

Train_Images_Batch,Train_Labels_Batch = input_data.get_batch(Train_TFRecord_path)       # read the TFRecord file with multiple threads and assemble mini-batches
Test_Images_Batch,Test_Labels_Batch = input_data.get_batch(Test_TFRecord_path)          # read the TFRecord file with multiple threads and assemble mini-batches
# placeholders that feed a mini-batch into the network
x = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'images')
y_label = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'labels')

y_conv = model.inference(x)

loss = tf.reduce_mean(tf.square(y_conv - y_label))     # mean squared error as the loss
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss) # optimize the parameters with the Adam optimizer

init_op = (tf.local_variables_initializer(),tf.global_variables_initializer()) # initialization ops
saver = tf.train.Saver()
# create the summary writer and write the current computation graph to the log
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
with tf.Session() as sess:
    sess.run(init_op)
    coord = tf.train.Coordinator() # coordinates stopping all reader threads together
    threads = tf.train.start_queue_runners(sess=sess,coord=coord) # start the queue-runner threads
    try:
        for step in range(100):
            if coord.should_stop(): # becomes True once the stop signal is received; exit the loop
                break
            train_images_batch,train_labels_batch = sess.run([Train_Images_Batch,Train_Labels_Batch])
            train_images_batch = np.reshape(train_images_batch,[batch_size,img_W,img_H,1]) # reshape so each sample is a 28x28x1 image
            train_labels_batch = np.reshape(train_labels_batch,[batch_size,img_W,img_H,1])
            sess.run(train_op,feed_dict={x:train_images_batch,y_label:train_labels_batch}) # feed the mini-batch to train_op to train the network
            if step%100 == 0:
                test_images_batch,test_labels_batch = sess.run([Test_Images_Batch,Test_Labels_Batch])
                test_images_batch = np.reshape(test_images_batch,[batch_size,img_W,img_H,1])
                test_labels_batch = np.reshape(test_labels_batch,[batch_size,img_W,img_H,1])
                train_loss = sess.run(loss,feed_dict={x:train_images_batch,y_label:train_labels_batch})
                test_loss = sess.run(loss,feed_dict={x:test_images_batch,y_label:test_labels_batch})
                print('step %d: loss on training set batch:%g  loss on testing set batch:%g' % (step,train_loss,test_loss))
                saver.save(sess, model_save_path)

    except tf.errors.OutOfRangeError: # raised when the filename queue signals that the epoch limit was reached
        print('epoch limit reached')
        coord.request_stop() # tell the other threads to stop reading data
    finally:
        coord.request_stop()
        coord.join(threads) # wait for all threads to exit
    saver.save(sess, model_save_path) # save the model
    summary_writer.close()
time_end=time.time() # time.time() returns the number of seconds since the Unix epoch
print('\nTrain Finished\nTotal run time is : %f s \nThe network was saved in %s' %(time_end-time_start,model_save_path))

View the network's computation graph (tensorboard.exe --logdir=E:\MNIST_data\TensorBoard):
[Screenshot: the network's computation graph without namespace management, very cluttered]
What a mess, with no structure at all! Next we use tf.variable_scope() to group related operations under common names and get a much cleaner graph.

2. Namespace management

Put simply, we give related computations a shared name. For example, the construction of each convolution layer can be named conv_1, conv_2, and so on; the computation graph then groups the intermediate operations of each layer under that name. Let's first apply namespace management to the small example above:

import tensorflow as tf
tf.reset_default_graph()                         # reset the default graph
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard' # directory where the TensorBoard logs are saved

# Define a simple computation graph that adds two vectors
with tf.variable_scope('input1'):
    input1 = tf.constant([1.0,2.0,3.0])
with tf.variable_scope('input2'):
    input2 = tf.constant([4.0,5.0,6.0])
with tf.variable_scope('add'):
    output = tf.add_n([input1,input2])

# Create a writer and write the current computation graph to the log;
# the log is saved as a .tfevents file under TensorBoard_path
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
summary_writer.close()

Now the computation graph looks like this:
[Screenshot: the grouped computation graph with the input1, input2 and add nodes]
With namespace management, the operations that implement the addition are folded into a single node named add, which makes the computation graph simple and readable. The grouping works because a scope's name becomes a prefix of every node created inside it, as the short sketch below shows. After that, we add namespace management to our original code; the updated model.py and train.py follow.
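A minimal sketch (TensorFlow 1.x API; the variable names here are purely illustrative) that prints the prefixed names TensorBoard uses for grouping:

import tensorflow as tf
tf.reset_default_graph()

with tf.variable_scope('conv_1'):
    with tf.variable_scope('W_conv1'):
        w = tf.Variable(tf.zeros([5, 5, 1, 32]), name='weights')
    h = tf.nn.relu(tf.zeros([1, 28, 28, 32]), name='activation')

print(w.name)  # conv_1/W_conv1/weights:0
print(h.name)  # conv_1/activation:0
# TensorBoard collapses everything sharing the 'conv_1/' prefix into one expandable node.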

file name: model.py

import tensorflow as tf
batch_size = 10

# helper to create a weight variable
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1) # samples from a truncated normal distribution (values within two standard deviations of the mean)
    return tf.Variable(initial)
# helper to create a bias variable
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
# helper for a convolution
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
# helper for 2x2 max pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME')

# the forward pass
def inference(x):
    # first convolution layer
    with tf.variable_scope('conv_1'):
        with tf.variable_scope('W_conv1'):
            W_conv1 = weight_variable([5, 5, 1, 32])
        with tf.variable_scope('b_conv1'):
            b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    # first pooling layer
    with tf.variable_scope('Max_pooling_1'):
        h_pool1 = max_pool_2x2(h_conv1)
    # second convolution layer
    with tf.variable_scope('conv_2'):
        with tf.variable_scope('W_conv2'):
            W_conv2 = weight_variable([5, 5, 32, 64])
        with tf.variable_scope('b_conv2'):
            b_conv2 = bias_variable([64])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    # second pooling layer
    with tf.variable_scope('Max_pooling_2'):
        h_pool2 = max_pool_2x2(h_conv2)
    # first upsampling (transposed convolution) layer
    with tf.variable_scope('Upsampling_1'):
        W_de_conv1 = weight_variable([5, 5, 32, 64])
        h_de_conv1 = tf.nn.conv2d_transpose(h_pool2,W_de_conv1,output_shape=[batch_size, 14, 14, 32],strides=[1,2,2,1],padding="SAME")
    # second upsampling (transposed convolution) layer
    with tf.variable_scope('Upsampling_2'):
        W_de_conv2 = weight_variable([5, 5, 1, 32])
        h_de_conv2 = tf.nn.conv2d_transpose(h_de_conv1,W_de_conv2,output_shape=[batch_size, 28, 28, 1],strides=[1,2,2,1],padding="SAME")
    # the network's output
    return h_de_conv2

file name: train.py

import time
time_start=time.time() # time.time() returns the number of seconds since the Unix epoch (1970-01-01)
import tensorflow as tf
import numpy as np

import input_data  # operations for reading the input data
import model       # the network definition
tf.reset_default_graph()

img_W = 28                                                               # image width
img_H = 28                                                               # image height
batch_size = 10                                                          # number of samples per mini-batch
min_after_dequeue = 1000                                                 # minimum number of examples left in the queue
capacity = min_after_dequeue + 3*batch_size                              # maximum number of examples in the queue

train_image_path = 'E:\\MNIST_data\\train_images\\'                      # path of the input images
train_label_path = 'E:\\MNIST_data\\train_labels\\'                      # path of the target (label) images
Train_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\train_data_set.tfrecord'# path of the generated TFRecord file

test_image_path = 'E:\\MNIST_data\\test_images\\'                        # path of the input images
test_label_path = 'E:\\MNIST_data\\test_labels\\'                        # path of the target (label) images
Test_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\test_data_set.tfrecord'  # path of the generated TFRecord file
model_save_path = 'E:\\MNIST_data\\models\\conv_1.ckpt'                  # path where the model is saved
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard'                         # directory where the TensorBoard logs are saved

print('please wait for generating the TFRecord file of training sets...')
input_data.generate_TFRecordfile(train_image_path,train_label_path,Train_TFRecord_path)# generate the TFRecord file
print('please wait for generating the TFRecord file of test sets...')
input_data.generate_TFRecordfile(test_image_path,test_label_path,Test_TFRecord_path)   # generate the TFRecord file
with tf.variable_scope('train_mini-batch'):
    Train_Images_Batch,Train_Labels_Batch = input_data.get_batch(Train_TFRecord_path)       # read the TFRecord file with multiple threads and assemble mini-batches
with tf.variable_scope('test_mini-batch'):
    Test_Images_Batch,Test_Labels_Batch = input_data.get_batch(Test_TFRecord_path)          # read the TFRecord file with multiple threads and assemble mini-batches
# placeholders that feed a mini-batch into the network
with tf.variable_scope('input'):
    x = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'images')
    y_label = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'labels')

y_conv = model.inference(x)
with tf.variable_scope('loss_function'):
    loss = tf.reduce_mean(tf.square(y_conv - y_label))     # mean squared error as the loss
with tf.variable_scope('train_step'):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss) # optimize the parameters with the Adam optimizer
with tf.variable_scope('init_step'):
    init_op = (tf.local_variables_initializer(),tf.global_variables_initializer()) # initialization ops

saver = tf.train.Saver()
# create the summary writer and write the current computation graph to the log
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
with tf.Session() as sess:
    sess.run(init_op)
    coord = tf.train.Coordinator() # coordinates stopping all reader threads together
    threads = tf.train.start_queue_runners(sess=sess,coord=coord) # start the queue-runner threads
    try:
        for step in range(100):
            if coord.should_stop(): # becomes True once the stop signal is received; exit the loop
                break
            train_images_batch,train_labels_batch = sess.run([Train_Images_Batch,Train_Labels_Batch])
            train_images_batch = np.reshape(train_images_batch,[batch_size,img_W,img_H,1]) # reshape so each sample is a 28x28x1 image
            train_labels_batch = np.reshape(train_labels_batch,[batch_size,img_W,img_H,1])
            sess.run(train_op,feed_dict={x:train_images_batch,y_label:train_labels_batch}) # feed the mini-batch to train_op to train the network
            if step%100 == 0:
                test_images_batch,test_labels_batch = sess.run([Test_Images_Batch,Test_Labels_Batch])
                test_images_batch = np.reshape(test_images_batch,[batch_size,img_W,img_H,1])
                test_labels_batch = np.reshape(test_labels_batch,[batch_size,img_W,img_H,1])
                train_loss = sess.run(loss,feed_dict={x:train_images_batch,y_label:train_labels_batch})
                test_loss = sess.run(loss,feed_dict={x:test_images_batch,y_label:test_labels_batch})
                print('step %d: loss on training set batch:%g  loss on testing set batch:%g' % (step,train_loss,test_loss))
                #saver.save(sess, model_save_path)

    except tf.errors.OutOfRangeError: # raised when the filename queue signals that the epoch limit was reached
        print('epoch limit reached')
        coord.request_stop() # tell the other threads to stop reading data
    finally:
        coord.request_stop()
        coord.join(threads) # wait for all threads to exit
    saver.save(sess, model_save_path) # save the model
    summary_writer.close()
time_end=time.time() # time.time() returns the number of seconds since the Unix epoch
print('\nTrain Finished\nTotal run time is : %f s \nThe network was saved in %s' %(time_end-time_start,model_save_path))

The result in TensorBoard:
[Screenshot: the network's computation graph after namespace management]
Much cleaner, isn't it?
We now know how to inspect the computation graph in TensorBoard, but the graph is far from the only thing TensorBoard can visualize. The rest of this post covers visualizing other monitoring information in TensorBoard.

3. Visualizing monitoring metrics

Besides the computation graph, TensorBoard can visualize all kinds of metrics that help you understand how a TensorFlow program is running. It is just as simple as visualizing the graph: call the corresponding log-generating function. The table below lists the main TensorFlow summary functions and the TensorBoard tabs they correspond to.

TensorFlow summary function | TensorBoard tab | What it shows
tf.summary.scalar    | EVENTS (SCALARS in newer TensorBoard versions) | how scalar values such as the loss or the accuracy evolve as training proceeds
tf.summary.image     | IMAGES      | image data used by the program, for visualizing the current training/test images
tf.summary.histogram | HISTOGRAMS  | histograms of tensors (e.g. the weights), for watching how their distribution changes over the iterations
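
To make the table concrete, here is a minimal, self-contained sketch (the dummy tensors, the 'demo_*' names and the reuse of TensorBoard_path are just for illustration) that writes one log entry of each kind:

import tensorflow as tf
import numpy as np

tf.reset_default_graph()
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard'  # same log directory as before

images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name='images')
weights = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1), name='weights')
loss = tf.reduce_mean(tf.square(images))          # a dummy scalar just for the demo

tf.summary.scalar('demo_loss', loss)              # shows up in the scalar (EVENTS/SCALARS) tab
tf.summary.image('demo_images', images, 4)        # shows up to 4 images per step in the IMAGES tab
tf.summary.histogram('demo_weights', weights)     # shows up in the HISTOGRAMS tab
merged = tf.summary.merge_all()                   # one op that evaluates every summary above

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(TensorBoard_path, tf.get_default_graph())
    batch = np.random.rand(4, 28, 28, 1).astype(np.float32)
    summary = sess.run(merged, feed_dict={images: batch})
    writer.add_summary(summary, 0)                # 0 is the global step for this entry
    writer.close()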

The concrete implementation in our code (pseudocode):

...
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard'
with tf.variable_scope('input'):
    x = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'images')
    y_label = tf.placeholder(tf.float32,shape=[None,img_W,img_H,1],name = 'labels')
    tf.summary.image('images',x,4)         # log 4 input images (images) from the mini-batch
    tf.summary.image('labels',y_label,4)   # log 4 target images (labels) from the mini-batch

y_conv = model.inference(x)                # forward pass to get the network output
tf.summary.image('outputs',y_conv,4)       # log the network's responses (outputs) to those 4 images

with tf.variable_scope('loss_function'):
    loss = tf.reduce_mean(tf.square(y_conv - y_label))
    tf.summary.scalar('loss_on_training_batch',loss) # log the loss value
with tf.variable_scope('train_step'):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
merged = tf.summary.merge_all() # collect all summary ops into a single op
with tf.Session() as sess:
    # create the writer and write the current computation graph to the log under TensorBoard_path
    summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
    for step in range(100000):
        # run the training step together with all summary ops to get the summaries for this step
        # (summary ops are tensors; they only take concrete values after sess.run)
        summary,_ = sess.run([merged,train_op],feed_dict={x:train_images_batch,y_label:train_labels_batch})
        # write the summaries to the log file so TensorBoard has all the information for this step
        summary_writer.add_summary(summary,step)
    summary_writer.close() # training finished: close the writer

This gives us live monitoring of the loss value and a real-time view of how the network behaves on the training set.
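One practical note: running merged at every step also serializes the image summaries, which is relatively expensive. A common variant (just a sketch, reusing merged, train_op and the feed dict from the pseudocode above; summary_interval is a made-up name) is to write summaries only every few steps:

# inside the training loop:
summary_interval = 10   # hypothetical interval, tune as needed
feed = {x: train_images_batch, y_label: train_labels_batch}
if step % summary_interval == 0:
    summary, _ = sess.run([merged, train_op], feed_dict=feed)
    summary_writer.add_summary(summary, step)
else:
    sess.run(train_op, feed_dict=feed)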
Of course, we can also monitor how the network's parameters change, to see whether the gradients are vanishing (parameters stuck at constant values, i.e. not being updated). For that we add the logging operations in model.py (because that is where the parameters live). The code is as follows:

...
# for each parameter we want its histogram, mean and standard deviation;
# this helper avoids repeating the same code
def variable_summaries(var,name):
    with tf.variable_scope('summaries'):
        tf.summary.histogram(name,var)

        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean/' + name,mean)

        stddev = tf.sqrt(tf.reduce_mean(tf.square(var-mean)))
        tf.summary.scalar('stddev/' + name,stddev)

# the forward pass
def inference(x):
    # first convolution layer
    with tf.variable_scope('conv_1'):
        with tf.variable_scope('W_conv1'):
            W_conv1 = weight_variable([5, 5, 1, 32])
            variable_summaries(W_conv1,'conv_1'+ '/W_conv1')
        with tf.variable_scope('b_conv1'):
            b_conv1 = bias_variable([32])
            variable_summaries(b_conv1,'conv_1'+ '/b_conv1')
        h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    # first pooling layer
    with tf.variable_scope('Max_pooling_1'):
        h_pool1 = max_pool_2x2(h_conv1)
    # second convolution layer
    with tf.variable_scope('conv_2'):
        with tf.variable_scope('W_conv2'):
            W_conv2 = weight_variable([5, 5, 32, 64])
            variable_summaries(W_conv2,'conv_2'+ '/W_conv2')
        with tf.variable_scope('b_conv2'):
            b_conv2 = bias_variable([64])
            variable_summaries(b_conv2,'conv_2'+ '/b_conv2')
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    # second pooling layer
    with tf.variable_scope('Max_pooling_2'):
        h_pool2 = max_pool_2x2(h_conv2)
    # first upsampling (transposed convolution) layer
    with tf.variable_scope('Upsampling_1'):
        W_de_conv1 = weight_variable([5, 5, 32, 64])
        variable_summaries(W_de_conv1,'Upsampling_1'+ '/W_de_conv1')
        h_de_conv1 = tf.nn.conv2d_transpose(h_pool2,W_de_conv1,output_shape=[batch_size, 14, 14, 32],strides=[1,2,2,1],padding="SAME")
    # second upsampling (transposed convolution) layer
    with tf.variable_scope('Upsampling_2'):
        W_de_conv2 = weight_variable([5, 5, 1, 32])
        variable_summaries(W_de_conv2,'Upsampling_2'+ '/W_de_conv2')
        h_de_conv2 = tf.nn.conv2d_transpose(h_de_conv1,W_de_conv2,output_shape=[batch_size, 28, 28, 1],strides=[1,2,2,1],padding="SAME")
    # the network's output
    return h_de_conv2

This gives us visual monitoring of the histogram, mean and standard deviation of every learnable parameter.
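Since the point of watching the parameters is to catch vanishing gradients, it can also help to log the gradients themselves. This is an optional extension, not part of the code above; the sketch below assumes the loss defined in train.py and replaces the single minimize call with compute_gradients/apply_gradients:

with tf.variable_scope('train_step'):
    optimizer = tf.train.AdamOptimizer(1e-4)
    grads_and_vars = optimizer.compute_gradients(loss)    # list of (gradient, variable) pairs
    for grad, var in grads_and_vars:
        if grad is not None:
            # a gradient histogram that collapses towards zero hints at vanishing gradients
            tf.summary.histogram('gradients/' + var.op.name, grad)
    train_op = optimizer.apply_gradients(grads_and_vars)  # together, equivalent to minimize(loss)

These gradient histograms end up in the same HISTOGRAMS tab as the parameter summaries.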
Let's look at the results first (the full code is at the end):
[Screenshots: the TensorBoard tabs showing the loss curve, the input/label/output images, and the parameter histograms, means and standard deviations]

Full code (only model.py and train.py changed):

file name: model.py

import tensorflow as tf
batch_size = 10

# helper to create a weight variable
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1) # samples from a truncated normal distribution (values within two standard deviations of the mean)
    return tf.Variable(initial)
# helper to create a bias variable
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
# helper for a convolution
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
# helper for 2x2 max pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME')

# for each parameter we want its histogram, mean and standard deviation;
# this helper avoids repeating the same code
def variable_summaries(var,name):
    with tf.variable_scope('summaries'):
        tf.summary.histogram(name,var)

        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean/' + name,mean)

        stddev = tf.sqrt(tf.reduce_mean(tf.square(var-mean)))
        tf.summary.scalar('stddev/' + name,stddev)

# the forward pass
def inference(x):
    # first convolution layer
    with tf.variable_scope('conv_1'):
        with tf.variable_scope('W_conv1'):
            W_conv1 = weight_variable([5, 5, 1, 32])
            variable_summaries(W_conv1,'conv_1'+ '/W_conv1')
        with tf.variable_scope('b_conv1'):
            b_conv1 = bias_variable([32])
            variable_summaries(b_conv1,'conv_1'+ '/b_conv1')
        h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    # first pooling layer
    with tf.variable_scope('Max_pooling_1'):
        h_pool1 = max_pool_2x2(h_conv1)
    # second convolution layer
    with tf.variable_scope('conv_2'):
        with tf.variable_scope('W_conv2'):
            W_conv2 = weight_variable([5, 5, 32, 64])
            variable_summaries(W_conv2,'conv_2'+ '/W_conv2')
        with tf.variable_scope('b_conv2'):
            b_conv2 = bias_variable([64])
            variable_summaries(b_conv2,'conv_2'+ '/b_conv2')
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    # second pooling layer
    with tf.variable_scope('Max_pooling_2'):
        h_pool2 = max_pool_2x2(h_conv2)
    # first upsampling (transposed convolution) layer
    with tf.variable_scope('Upsampling_1'):
        W_de_conv1 = weight_variable([5, 5, 32, 64])
        variable_summaries(W_de_conv1,'Upsampling_1'+ '/W_de_conv1')
        h_de_conv1 = tf.nn.conv2d_transpose(h_pool2,W_de_conv1,output_shape=[batch_size, 14, 14, 32],strides=[1,2,2,1],padding="SAME")
    # second upsampling (transposed convolution) layer
    with tf.variable_scope('Upsampling_2'):
        W_de_conv2 = weight_variable([5, 5, 1, 32])
        variable_summaries(W_de_conv2,'Upsampling_2'+ '/W_de_conv2')
        h_de_conv2 = tf.nn.conv2d_transpose(h_de_conv1,W_de_conv2,output_shape=[batch_size, 28, 28, 1],strides=[1,2,2,1],padding="SAME")
    # the network's output
    return h_de_conv2

file name: train.py

import time
time_start=time.time() # time.time() returns the number of seconds since the Unix epoch (1970-01-01)
import tensorflow as tf
import numpy as np

import input_data  # operations for reading the input data
import model       # the network definition
tf.reset_default_graph()

img_W = 28                                                               # image width
img_H = 28                                                               # image height
batch_size = 10                                                          # number of samples per mini-batch
min_after_dequeue = 1000                                                 # minimum number of examples left in the queue
capacity = min_after_dequeue + 3*batch_size                              # maximum number of examples in the queue

train_image_path = 'E:\\MNIST_data\\train_images\\'                      # path of the input images
train_label_path = 'E:\\MNIST_data\\train_labels\\'                      # path of the target (label) images
Train_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\train_data_set.tfrecord'# path of the generated TFRecord file

test_image_path = 'E:\\MNIST_data\\test_images\\'                        # path of the input images
test_label_path = 'E:\\MNIST_data\\test_labels\\'                        # path of the target (label) images
Test_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\test_data_set.tfrecord'  # path of the generated TFRecord file
model_save_path = 'E:\\MNIST_data\\models\\conv_1.ckpt'                  # path where the model is saved
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard'                         # directory where the TensorBoard logs are saved

print('please wait for generating the TFRecord file of training sets...')
input_data.generate_TFRecordfile(train_image_path,train_label_path,Train_TFRecord_path)# generate the TFRecord file
print('please wait for generating the TFRecord file of test sets...')
input_data.generate_TFRecordfile(test_image_path,test_label_path,Test_TFRecord_path)   # generate the TFRecord file

with tf.variable_scope('train_mini-batch'):
    Train_Images_Batch,Train_Labels_Batch = input_data.get_batch(Train_TFRecord_path)       # read the TFRecord file with multiple threads and assemble mini-batches
with tf.variable_scope('test_mini-batch'):
    Test_Images_Batch,Test_Labels_Batch = input_data.get_batch(Test_TFRecord_path)          # read the TFRecord file with multiple threads and assemble mini-batches

# placeholders that feed a mini-batch into the network
with tf.variable_scope('input'):
    x = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'images')
    y_label = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'labels')
    tf.summary.image('images',x,4)         # log 4 input images (images) from the mini-batch
    tf.summary.image('labels',y_label,4)   # log 4 target images (labels) from the mini-batch

y_conv = model.inference(x)                # forward pass to get the network output
tf.summary.image('outputs',y_conv,4)       # log the network's responses (outputs) to those 4 images
with tf.variable_scope('loss_function'):
    loss = tf.reduce_mean(tf.square(y_conv - y_label))     # mean squared error as the loss
    tf.summary.scalar('loss_on_training_batch',loss) # log the loss value
with tf.variable_scope('train_step'):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss) # optimize the parameters with the Adam optimizer
with tf.variable_scope('init_step'):
    init_op = (tf.local_variables_initializer(),tf.global_variables_initializer()) # initialization ops

saver = tf.train.Saver()
# create the summary writer and write the current computation graph to the log
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
merged = tf.summary.merge_all() # collect all summary ops into a single op

with tf.Session() as sess:
    sess.run(init_op)
    coord = tf.train.Coordinator() # coordinates stopping all reader threads together
    threads = tf.train.start_queue_runners(sess=sess,coord=coord) # start the queue-runner threads
    try:
        for step in range(10000):
            if coord.should_stop(): # becomes True once the stop signal is received; exit the loop
                break
            train_images_batch,train_labels_batch = sess.run([Train_Images_Batch,Train_Labels_Batch])
            train_images_batch = np.reshape(train_images_batch,[batch_size,img_W,img_H,1]) # reshape so each sample is a 28x28x1 image
            train_labels_batch = np.reshape(train_labels_batch,[batch_size,img_W,img_H,1])
            # run the training step together with all summary ops; the summaries for this step go to TensorBoard_path
            summary,_ = sess.run([merged,train_op],feed_dict={x:train_images_batch,y_label:train_labels_batch})
            # write the summaries to the log file so TensorBoard has all the information for this step
            summary_writer.add_summary(summary,step)
            if step%100 == 0:
                test_images_batch,test_labels_batch = sess.run([Test_Images_Batch,Test_Labels_Batch])
                test_images_batch = np.reshape(test_images_batch,[batch_size,img_W,img_H,1])
                test_labels_batch = np.reshape(test_labels_batch,[batch_size,img_W,img_H,1])
                train_loss = sess.run(loss,feed_dict={x:train_images_batch,y_label:train_labels_batch})
                test_loss = sess.run(loss,feed_dict={x:test_images_batch,y_label:test_labels_batch})
                print('step %d: loss on training set batch:%g  loss on testing set batch:%g' % (step,train_loss,test_loss))
                saver.save(sess, model_save_path)

    except tf.errors.OutOfRangeError: # raised when the filename queue signals that the epoch limit was reached
        print('epoch limit reached')
        coord.request_stop() # tell the other threads to stop reading data
    finally:
        coord.request_stop()
        coord.join(threads) # wait for all threads to exit
    saver.save(sess, model_save_path) # save the model
    summary_writer.close()
time_end=time.time() # time.time() returns the number of seconds since the Unix epoch
print('\nTrain Finished\nTotal run time is : %f s \nThe network was saved in %s' %(time_end-time_start,model_save_path))