| Notation | Description |
| --- | --- |
| \(N, c\) | the total number of training examples and the number of output classes |
| \(x^{(n)}, y^{(n)}\) | the superscript \((n)\) denotes the variable for the \(n^{th}\) individual training example |
| \(x^{[l]}, w^{[l]}, a^{[l]}\) | the superscript \([l]\) denotes the \(l^{th}\) layer; the square brackets can be dropped when working with a single training example |
| \(z^{x,l}, a^{x,l}\) | the weighted input \(z\) and the activation in the \(l^{th}\) layer for the training example \(x\) |
| \(n_H^{[l]}, n_W^{[l]}, n_C^{[l]}\) | the height, width, and number of channels in the \(l^{th}\) layer |
| \(x_i, y_i, b_i\) | the subscript \(i\) denotes the \(i^{th}\) element of the vector \(\mathbf{x}, \mathbf{y}, \mathbf{b}\) |
| \(w^l_{jk}\) | the weight for the connection from the \(k^{th}\) neuron in the \((l-1)^{th}\) layer to the \(j^{th}\) neuron in the \(l^{th}\) layer |
| \(b^l_j\) | the bias of the \(j^{th}\) neuron in the \(l^{th}\) layer |
| \(a^l_j\) | the activation of the \(j^{th}\) neuron in the \(l^{th}\) layer |
| \(\sigma(\cdot)\) | the activation function (or its vectorized form) |
| \(\mathbf{z}^l = \mathbf{w}^l \mathbf{a}^{l-1} + \mathbf{b}^l\) | the intermediate vector quantity called the weighted input |
| \(a^{l}_j = \sigma\left( \sum_k w^{l}_{jk} a^{l-1}_k + b^l_j \right)\) | the activation \(a^{l}_j\) in component form |
| \(a^{l} = \sigma(w^l a^{l-1} + b^l) = \sigma(z^l)\) | the activation \(a^{l}\) in matrix form |
| \(Cost(), C, Loss(), L, Error(), E\) | the cost function (equivalently, the loss or error function) |
| \(\partial C / \partial w, \partial C / \partial b\) | the partial derivatives of the cost function with respect to any weight \(w\) or bias \(b\) in the network |
| \(y = y(x), t = t(x)\) | the desired output corresponding to an individual input \(x\) |
| \(s \odot t\) | the element-wise (Hadamard) product of the two vectors |
| \(\delta^l_j = \dfrac{\partial C}{\partial z^l_j}\) | the error of the \(j^{th}\) neuron in the \(l^{th}\) layer |
| \(\nabla_a C\) | the vector of partial derivatives \(\partial C / \partial a^L_j\), i.e. the rate of change of \(C\) with respect to the output activations |
| \(\delta^L = \nabla_a C \odot \sigma'(z^L)\) | the error of the output layer in matrix-based form |
| \(\delta^L = (a^L - y) \odot \sigma'(z^L)\) | the output error in the case of the quadratic cost |
| \(\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)\) | the error of neurons in the layers other than the output layer |
| \(\dfrac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j\) | the rate of change of the cost with respect to any weight |
| \(\dfrac{\partial C}{\partial w^l} = \delta^l (a^{l-1})^T\) | matrix-based form: the rate of change of the cost with respect to any weight |
| \(\dfrac{\partial C}{\partial b^l_j} = \delta^l_j\) | the rate of change of the cost with respect to any bias in the network |
| \(\dfrac{\partial C}{\partial b^l} = \delta^l\) | matrix-based form: the rate of change of the cost with respect to any bias (the backpropagation equations above are sketched in code after this table) |