Working through DeepDreaming with TensorFlow (1)

2016-09-07 tensorflow

https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/examples/tutorials/deepdream/deepdream.ipynb

As usual, I'll summarize as I go.

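This notebook explains a technique for generating images with a convolutional neural network. A network consists of a set of layers that apply a sequence of transformations to the input image. The parameters of those transformations are learned during training by a variant of gradient descent. The internal image representations may look obscure, but they can be visualized and interpreted.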

Loading and displaying the model graph

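A protobuf file of the pretrained network is provided, so we download it and use it. The gcr.io/tensorflow/tensorflow image had neither wget nor unzip installed, though, so I went into the container and installed them with apt-get.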

import numpy as np
import tensorflow as tf

model_fn = 'tensorflow_inception_graph.pb'

# creating TensorFlow session and loading the model
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input') # define the input tensor
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input-imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input':t_preprocessed})

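I couldn't find documentation for tf.gfile.FastGFile, so I looked at the source; it appears to be a wrapper around file I/O. It reads the protobuf file, and ParseFromString turns the bytes into a GraphDef.

Passing that GraphDef together with the input tensor to tf.import_graph_def imports it into the Graph.

tf.expand_dims inserts a dimension of size 1 at the specified position. The notebook does not explain why this is done, or why imagenet_mean is subtracted.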
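A likely reason, which is my own reading rather than something the notebook states: the imported network expects a batch of images, so a leading batch axis of size 1 is added, and subtracting a mean pixel value roughly centers the input around zero. The NumPy sketch below mirrors those two steps on a stand-in image; np.expand_dims is used as the analogue of tf.expand_dims, and the image itself is hypothetical.

```python
import numpy as np

imagenet_mean = 117.0  # same constant as in the notebook code above

# Hypothetical stand-in for an input image: height x width x RGB.
img = np.full((224, 224, 3), 100.0, dtype=np.float32)

# Subtract the mean so pixel values are roughly centered around zero.
centered = img - imagenet_mean

# Insert a size-1 axis at position 0, the NumPy analogue of tf.expand_dims(x, 0).
batched = np.expand_dims(centered, 0)

print(img.shape)         # (224, 224, 3)
print(batched.shape)     # (1, 224, 224, 3)
print(batched[0, 0, 0])  # [-17. -17. -17.]
```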

layers = [op.name for op in graph.get_operations() if op.type=='Conv2D' and 'import/' in op.name]
feature_nums = [int(graph.get_tensor_by_name(name+':0').get_shape()[-1]) for name in layers]

The entries in layers look like this:

import/conv2d0_pre_relu/conv
import/conv2d1_pre_relu/conv
import/conv2d2_pre_relu/conv
import/mixed3a_1x1_pre_relu/conv
import/mixed3a_3x3_bottleneck_pre_relu/conv
import/mixed3a_3x3_pre_relu/conv
import/mixed3a_5x5_bottleneck_pre_relu/conv
import/mixed3a_5x5_pre_relu/conv
...

Of these layers, let's visualize mixed4d_3x3_bottleneck_pre_relu.

layer = 'mixed4d_3x3_bottleneck_pre_relu'
channel = 139 # picking some feature channel to visualize

def T(layer):
    '''Helper for getting layer output tensor'''
    return graph.get_tensor_by_name("import/%s:0"%layer)

render_naive(T(layer)[:,:,:,channel])

mixed4d_3x3_bottleneck_pre_relu has 144 channels (filters), and here we pick the one at channel index 139.

print(T(layer))
-> Tensor("import/mixed4d_3x3_bottleneck_pre_relu:0", shape=(?, ?, ?, 144), dtype=float32, device=/device:CPU:0)
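To make the indexing concrete, here is a small NumPy sketch of what T(layer)[:,:,:,channel] selects. Only the channel count of 144 comes from the printed tensor; the batch size and the 14x14 spatial size are made up for illustration.

```python
import numpy as np

# Hypothetical activations shaped like the layer output:
# (batch, height, width, channels). The 14x14 spatial size is an assumption.
rng = np.random.RandomState(0)
activations = rng.rand(1, 14, 14, 144).astype(np.float32)

channel = 139
# Same indexing as T(layer)[:,:,:,channel]: keep one channel, drop the last axis.
feature_map = activations[:, :, :, channel]

print(activations.shape[-1])  # 144
print(feature_map.shape)      # (1, 14, 14)
```

Since the index is zero-based, 139 is a valid position in a layer with 144 channels (valid indices run from 0 to 143).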

The starting point is an image with RGB values of about 100 (gray) plus a little noise.

# start with a gray image with a little noise
img_noise = np.random.uniform(size=(224,224,3)) + 100.0

The score is the mean of that channel's values, and the image is changed step by step so that the score goes up.

def render_naive(t_obj, img0=img_noise, iter_n=20, step=1.0):
    t_score = tf.reduce_mean(t_obj) # defining the optimization objective
    t_grad = tf.gradients(t_score, t_input)[0] # behold the power of automatic differentiation!

    img = img0.copy()
    for i in range(iter_n):
        g, score = sess.run([t_grad, t_score], {t_input:img})
        # normalizing the gradient, so the same step size should work
        g /= g.std()+1e-8         # for different layers and networks
        img += g*step
        print(score, end = ' ')
    clear_output()
    showarray(visstd(img))

tf.gradients(ys, xs) returns, for each x in xs, the sum of the partial derivatives of ys with respect to that x.

a = tf.Variable(tf.constant([
            [1., 2.],
            [3., 4.]]))
b = tf.Variable(tf.constant([
            [2., 3.],
            [4., 5.]]))
c = tf.matmul(a, b)

grad = tf.gradients(c, a)[0]
init = tf.initialize_all_variables()
with tf.Session() as demo_sess:  # separate name, so the global `sess` used by render_naive is not overwritten
    demo_sess.run(init)

    print(demo_sess.run(c))
    # [[ 10.  13.]
    #  [ 22.  29.]]

    print(demo_sess.run(grad))
    # [[ 5.  9.]
    #  [ 5.  9.]]
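The result can be checked by hand without TensorFlow. Because the gradients are summed over all entries of c, the gradient with respect to a is a matrix of ones multiplied by the transpose of b, which gives the row sums of b repeated for each row of a. The NumPy sketch below computes that and compares it with a finite-difference estimate; the variable names are my own.

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[2., 3.], [4., 5.]])

# d(sum(c))/da for c = a.dot(b): every entry of c has weight 1,
# so the gradient is ones.dot(b.T).
grad_analytic = np.ones((2, 2)).dot(b.T)

# Finite-difference estimate of the same quantity, one entry of a at a time.
eps = 1e-6
grad_numeric = np.zeros_like(a)
for i in range(2):
    for j in range(2):
        bumped = a.copy()
        bumped[i, j] += eps
        grad_numeric[i, j] = (bumped.dot(b).sum() - a.dot(b).sum()) / eps

print(grad_analytic)
# [[5. 9.]
#  [5. 9.]]
print(np.allclose(grad_numeric, grad_analytic, atol=1e-4))  # True
```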

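Repeatedly adding this gradient to the input image increases the values whose increase raises the score and decreases the values whose increase lowers it, so the image climbs the gradient. As the score rose, the pattern that the filter responds to gradually emerged in the image.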
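The same update loop can be tried on a toy objective where the gradient is known in closed form. This is only a sketch: the "filter" here is a random 8x8 pattern and the score is a plain weighted mean, not an Inception layer, but the normalization and the uphill step follow render_naive.

```python
import numpy as np

rng = np.random.RandomState(0)
target = rng.randn(8, 8)                 # hypothetical "filter" pattern
img = rng.uniform(size=(8, 8)) + 100.0   # gray start plus a little noise
img0 = img.copy()

def score_and_grad(x):
    """Toy objective: mean of x * target. Its gradient w.r.t. x is target / x.size."""
    return (x * target).mean(), target / x.size

scores = []
step = 1.0
for _ in range(20):
    score, g = score_and_grad(img)
    g = g / (g.std() + 1e-8)  # normalize the gradient, as in render_naive
    img += g * step           # move uphill
    scores.append(score)

# The accumulated change in the image lines up with the pattern being maximized.
correlation = np.corrcoef((img - img0).ravel(), target.ravel())[0, 1]
print(scores[0] < scores[-1], correlation > 0.99)
```

In this linear toy case the image simply drifts toward the target pattern; with a real network layer the gradient changes at every step, which is why the loop recomputes it each iteration.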

To be continued.