0.1 Neural Network
0.1.1 Diagram
 input layer      hidden layer 1      hidden layer 2      output layer
                     (layer L-2)         (layer L-1)          (layer L)
   N nodes            I nodes             J nodes              K nodes

                       a(z)                a(z)                 a(z)
    x1    ---->    z = wx + b   ---->   z = wa + b   ---->   z = wa + b   ---->
                      relu                 relu                softmax
    x2                 .                    .                    .
    .                  .                    .                    .
    .                  .                    .                    .
    xN                 hI                   hJ                   oK

                    a_i(z_i)            a_j(z_j)             a_k(z_k)
0.1.2 Description
The network has one input layer, two hidden layers, and one output layer.
The input layer has N nodes (neurons). The total number of layers is L, so the last hidden layer is layer L-1, the one before it layer L-2, and so on.
The first hidden layer has I nodes, the second hidden layer has J nodes, and the output layer has K nodes.
The hidden layers use the ReLU activation function; the output layer uses Softmax.
Except for the input layer, each neuron takes as input the activated outputs of all nodes in the previous layer, as in the equations below.
The true (target) output of the k-th output neuron is written \(t_k\).
Layer by layer:
\[ \begin{align*} z_i^{L-2} &= \sum_n^N w_{in}^{L-2} x_n + b_i^{L-2} \\ a_i^{L-2} &= \mathrm{relu}(z_i^{L-2}) \\ &= \max(0, z_i^{L-2}) \\ z_j^{L-1} &= \sum_i^I w_{ji}^{L-1} a_i^{L-2} + b_j^{L-1} \\ a_j^{L-1} &= \mathrm{relu}(z_j^{L-1}) \\ &= \max(0, z_j^{L-1}) \\ z_k^{L} &= \sum_j^{J} w_{kj}^{L} a_j^{L-1} + b_k^{L} \tag{0} \\ a_k^{L} &= \mathrm{softmax}(z_k^{L}) \\ &= \frac{e^{z_k^{L}}}{\sum_c^K e^{z_c^{L}}} \end{align*} \]
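A minimal NumPy sketch of this forward pass; the layer sizes (N=4, I=5, J=5, K=3) and all variable names (W1, b1, ...) are illustrative assumptions, not taken from the text:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())        # subtract max to avoid overflow
    return e / e.sum()

# illustrative sizes: N inputs, I and J hidden nodes, K outputs
N, I, J, K = 4, 5, 5, 3
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(I, N)), np.zeros(I)   # layer L-2
W2, b2 = rng.normal(size=(J, I)), np.zeros(J)   # layer L-1
W3, b3 = rng.normal(size=(K, J)), np.zeros(K)   # layer L

x = rng.normal(size=N)
a1 = relu(W1 @ x + b1)            # a_i^{L-2}
a2 = relu(W2 @ a1 + b2)           # a_j^{L-1}
a3 = softmax(W3 @ a2 + b3)        # a_k^{L}, entries sum to 1
```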
Cross-entropy (error measure):
\[ \begin{align} E = -\sum_k^K t_k \log a_k^L = -\sum_k^K t_k \left(z_k^L - \log\sum_c^K e^{z_c^L}\right) \tag{1} \end{align} \]
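The second form of (1) is the numerically safer one to implement, since it avoids taking \(\log\) of a possibly underflowed softmax output. A sketch under the same illustrative naming as above (t is assumed one-hot):

```python
import numpy as np

def cross_entropy(z, t):
    # E = -sum_k t_k (z_k - log sum_c e^{z_c}), the right-hand side of (1).
    # Shifting z by its max leaves z_k - logsumexp(z) unchanged
    # but keeps exp() from overflowing.
    zs = z - z.max()
    log_probs = zs - np.log(np.exp(zs).sum())   # log a_k^L
    return -(t * log_probs).sum()
```

For a one-hot \(t\) this is just \(-\log a_k^L\) of the true class.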
0.1.3 Gradients
Updating a weight (the parameter connecting node j of the previous layer to node k): \(\Delta W \propto -\frac{\partial E}{\partial W}\)
Output layer:
\[ \begin{align*} \Delta W_{kj}^L &= -\eta \frac{\partial E}{\partial W_{kj}^L} \\ &= -\eta \sum_d^K \frac{\partial E}{\partial a_d^L} \frac{\partial a_d^L}{\partial z_k^L} \frac{\partial z_k^L}{\partial W_{kj}^L} \end{align*} \]
The sum over \(d\) is needed because every softmax output \(a_d^L\) depends on \(z_k^L\), so \(E\) is most easily differentiated through \(z_k^L\).
Compute each factor:
\[ \begin{align} \frac{\partial E}{\partial W_{kj}^L} = \frac{\partial E}{\partial z_{k}^L} \frac{\partial z_{k}^L}{\partial W_{kj}^L} \tag{2} \end{align} \]
\[ \begin{align*} \frac{\partial E}{\partial z_{k}^L} &= - \sum_{d}^K t_d \left(\mathbb{1}_{d=k} - \frac{e^{z_k^L}}{\sum_c^K e^{z_c^L}}\right) \\ &= - \sum_d^K t_d (\mathbb{1}_{d=k} - a_k^L) \\ &= \sum_d^K t_d a_k^L - \sum_d^K t_d \mathbb{1}_{d=k} \\ &= a_k^L \sum_d^K t_d - t_k \\ &= a_k^L - t_k \tag{3} \end{align*} \]
where
\[ \mathbb{1}_{d=k} = \begin{cases} 1 & \quad \text{if } d=k, \\ 0 & \quad \text{otherwise,} \end{cases} \]
and the last step uses \(\sum_d^K t_d = 1\) (the targets are one-hot).
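Equation (3) is easy to check numerically: compare \(a_k^L - t_k\) against a centered finite difference of \(E\) in \(z_k^L\). A small self-contained sketch (all values illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, t):
    return -(t * np.log(softmax(z))).sum()   # eq. (1)

z = np.array([0.5, -1.2, 2.0])
t = np.array([0.0, 1.0, 0.0])                # one-hot target
analytic = softmax(z) - t                    # eq. (3): a_k^L - t_k
eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(3)[k], t) - loss(z - eps * np.eye(3)[k], t)) / (2 * eps)
    for k in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-6)
```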
Substituting \(\frac{\partial z_{k}^L}{\partial W_{kj}^L} = a_j^{L-1}\) (from (0)) then gives:
\[ \begin{align} \frac{\partial E}{\partial W_{kj}^L} &= \frac{\partial E}{\partial z_{k}^L} \frac{\partial z_{k}^L}{\partial W_{kj}^L} = (a_k^L - t_k) a_j^{L-1} \tag{4} \end{align} \]
Partial derivative with respect to b (note \(\frac{\partial z_k^L}{\partial b_k^L} = 1\)): \[ \begin{align} \frac{\partial E}{\partial b_{k}^L} &= \frac{\partial E}{\partial z_{k}^L} \frac{\partial z_{k}^L}{\partial b_{k}^L} = a_k^L - t_k \tag{5} \end{align} \]
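Together, (4) and (5) make the output-layer update an outer product of the error \(a^L - t\) with the previous layer's activations. A self-contained sketch of one update step (sizes and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 5, 3                                  # illustrative sizes
W3, b3 = rng.normal(size=(K, J)), np.zeros(K)
a2 = rng.random(J)                           # pretend hidden activations a_j^{L-1}
t = np.eye(K)[1]                             # one-hot target t_k

z3 = W3 @ a2 + b3
a3 = np.exp(z3 - z3.max()); a3 /= a3.sum()   # softmax output a_k^L

eta = 0.1                                    # learning rate
delta = a3 - t                               # eq. (3)
W3 -= eta * np.outer(delta, a2)              # eq. (4): dE/dW = (a_k - t_k) a_j
b3 -= eta * delta                            # eq. (5): dE/db = a_k - t_k
```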
The other layers are handled with the same method.