

17-06-20


1. Interface study

1.1. Recurrent interface


1.1.1. implementation:

implementation: one of {0, 1, or 2}.
If set to 0, the RNN will use an implementation that uses fewer, larger matrix products, thus running faster on CPU but consuming more memory. If set to 1, the RNN will use more matrix products, but smaller ones, thus running slower (may actually be faster on GPU) while consuming less memory. If set to 2 (LSTM/GRU only), the RNN will combine the input gate, the forget gate and the output gate into a single matrix, enabling more time-efficient parallelization on the GPU. Note: RNN dropout must be shared for all gates, resulting in a slightly reduced regularization.

1.1.2. weights:

weights: list of Numpy arrays to set as initial weights.
The list should have 3 elements, of shapes: [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
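As a rough illustration, a list with the three documented shapes could be built like this. The sizes (8 and 16) and the random initial values are hypothetical, chosen only to make the shapes visible. Note that the LSTM source quoted later in this article allocates self.units * 4 columns for its kernel, so the exact shapes depend on the layer.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
input_dim, output_dim = 8, 16

rng = np.random.default_rng(0)

# Three arrays, in the order given by the docstring above.
weights = [
    rng.standard_normal((input_dim, output_dim)).astype("float32"),   # input weights
    rng.standard_normal((output_dim, output_dim)).astype("float32"),  # recurrent weights
    np.zeros((output_dim,), dtype="float32"),                         # bias
]

shapes = [w.shape for w in weights]
print(shapes)
```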

1.2. LSTM interface

1.2.1. recurrent_activation

Activation function to use for the recurrent step.
Note: the default is 'hard_sigmoid', whereas the original paper uses 'sigmoid'.
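To see how far the two functions differ, here is a minimal sketch in plain Python. It assumes the Keras backend's definition of hard_sigmoid, a piecewise-linear approximation equal to 0.2 * x + 0.5 clipped to [0, 1]; the function names and sample points are just for illustration.

```python
import math

def sigmoid(x):
    # Standard logistic function.
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    # Assumed definition: 0.2 * x + 0.5, clipped to the range [0, 1].
    return min(1.0, max(0.0, 0.2 * x + 0.5))

# Compare the two functions at a few sample points.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(sigmoid(x), 4), round(hard_sigmoid(x), 4))
```

The two agree at x = 0 and diverge most near the points where hard_sigmoid saturates, so a model trained with one may not behave identically when evaluated with the other.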

2. kernel VS recurrent_kernel

2.1. kernel

1. Initialization

self.kernel = self.add_weight(shape=(self.input_dim, self.units * 4),
                              name='kernel',
                              initializer=self.kernel_initializer,
                              regularizer=self.kernel_regularizer,
                              constraint=self.kernel_constraint)

2. What the blocks mean

self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units: self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]

3. kernel is the matrix that is multiplied with the input x.
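The column slicing above can be checked with a small NumPy sketch: multiplying x by the full kernel and slicing the result gives the same values as multiplying x by each block separately. The sizes and random data are hypothetical, and plain NumPy arrays stand in for the layer's weights.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
batch, input_dim, units = 2, 3, 4

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, input_dim))
kernel = rng.standard_normal((input_dim, units * 4))

# Same column slicing as in the layer source: i, f, c, o.
kernel_i = kernel[:, :units]
kernel_f = kernel[:, units: units * 2]
kernel_c = kernel[:, units * 2: units * 3]
kernel_o = kernel[:, units * 3:]

# One large product, sliced afterwards ...
z = x @ kernel
fused = [z[:, :units], z[:, units: 2 * units],
         z[:, 2 * units: 3 * units], z[:, 3 * units:]]

# ... versus four smaller products, one per block.
separate = [x @ k for k in (kernel_i, kernel_f, kernel_c, kernel_o)]

all_equal = all(np.allclose(a, b) for a, b in zip(fused, separate))
print(all_equal)
```

This is the trade-off the implementation option describes: one large product versus several smaller ones, with the same numerical result.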

2.2. recurrent_kernel

1. Initialization:

self.recurrent_kernel = self.add_weight(
    shape=(self.units, self.units * 4),
    # ... remaining arguments elided in the original excerpt
)

2. What the blocks mean

self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]

3. recurrent_kernel is the matrix that is multiplied with h, the hidden-layer output of the previous time step.

3. activation VS recurrent_activation


if self.implementation == 2:
    z = K.dot(inputs * dp_mask[0], self.kernel)
    z += K.dot(h_tm1 * rec_dp_mask[0], self.recurrent_kernel)
    if self.use_bias:
        z = K.bias_add(z, self.bias)

    z0 = z[:, :self.units]
    z1 = z[:, self.units: 2 * self.units]
    z2 = z[:, 2 * self.units: 3 * self.units]
    z3 = z[:, 3 * self.units:]

    i = self.recurrent_activation(z0)
    f = self.recurrent_activation(z1)
    c = f * c_tm1 + i * self.activation(z2)
    o = self.recurrent_activation(z3)
h = o * self.activation(c)

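As the code shows, recurrent_activation produces the gates i, f, and o, while activation produces the candidate value (activation(z2), often written g) and squashes the cell state c when computing the output h. To reproduce the original paper, set activation = tanh and recurrent_activation = sigmoid.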
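The implementation == 2 branch quoted above can be mirrored in NumPy to make the roles of the two activations explicit. This is only a sketch under stated assumptions: the function name lstm_step and its signature are hypothetical, the dropout masks are omitted, and the defaults use the paper-style setting (tanh and sigmoid) rather than the layer's hard_sigmoid default.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_tm1, c_tm1, kernel, recurrent_kernel, bias,
              activation=np.tanh, recurrent_activation=sigmoid):
    """One LSTM time step (hypothetical helper; dropout masks omitted)."""
    units = h_tm1.shape[1]
    # Fused pre-activations for all four blocks: i, f, c, o.
    z = x @ kernel + h_tm1 @ recurrent_kernel + bias
    z0, z1, z2, z3 = (z[:, k * units:(k + 1) * units] for k in range(4))

    i = recurrent_activation(z0)          # input gate
    f = recurrent_activation(z1)          # forget gate
    c = f * c_tm1 + i * activation(z2)    # new cell state
    o = recurrent_activation(z3)          # output gate
    h = o * activation(c)                 # new hidden state
    return h, c

# Usage with hypothetical sizes and random data.
batch, input_dim, units = 2, 3, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, input_dim))
h0 = np.zeros((batch, units))
c0 = np.zeros((batch, units))
kernel = rng.standard_normal((input_dim, units * 4))
recurrent_kernel = rng.standard_normal((units, units * 4))
bias = np.zeros(units * 4)

h, c = lstm_step(x, h0, c0, kernel, recurrent_kernel, bias)
print(h.shape, c.shape)
```

Because the gates lie in (0, 1) and tanh lies in (-1, 1), every entry of h stays strictly between -1 and 1 in this setting.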
