Backpropagation

1 Forward Propagation

This section is mainly a summary of how forward propagation works.

We set $a^{[0]} = x$ for the input to the network and let $\ell = 1,2,\dots,N$, where $N$ is the number of layers of the network. Then we have

\[z^{[\ell]} = W^{[\ell]}a^{[\ell-1]} + b^{[\ell]}\] \[a^{[\ell]} = g^{[\ell]}(z^{[\ell]})\]

where $g^{[\ell]}$ is the activation function, which is the same for all layers except the last one. For the last layer, the choice depends on the task:

    1 For regression, $g(x) = x$ (the identity).

    2 For binary classification, $g(x) = \mathrm{sigmoid}(x)$.

    3 For multi-class classification, $g(x) = \mathrm{softmax}(x)$.

Finally, we obtain the output of the network, $\hat{y} = a^{[N]}$, and compute its loss.
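As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The names (`forward`, `params`, `relu`, `sigmoid`) are our own, and ReLU is only an assumed choice for the hidden-layer activation $g$, since the notes do not fix one:

```python
import numpy as np

def relu(z):
    # assumed hidden-layer activation g
    return np.maximum(0.0, z)

def sigmoid(z):
    # output activation for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params, output_activation=sigmoid):
    """Forward pass through the network.

    params: list of (W, b) pairs for layers 1..N
    Returns the activations [a^{[0]}, ..., a^{[N]}] and
    the pre-activations [z^{[1]}, ..., z^{[N]}].
    """
    a = x                                 # a^{[0]} = x
    activations, pre_activations = [a], []
    for ell, (W, b) in enumerate(params, start=1):
        z = W @ a + b                     # z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
        a = output_activation(z) if ell == len(params) else relu(z)
        pre_activations.append(z)
        activations.append(a)
    return activations, pre_activations
```

For regression or multi-class classification, `output_activation` would be the identity or a softmax instead of the sigmoid.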

For regression, we have:

\[\mathcal{L}(\hat{y},y) = \frac{1}{2}(\hat{y} - y)^2\]

For binary classification, we have:

\[\mathcal{L}(\hat{y},y) = -\bigg(y\log\hat{y} + (1-y)\log (1-\hat{y})\bigg)\]

For multi-classification, we have:

\[\mathcal{L}(\hat{y},y) = -\sum\limits_{j=1}^k\mathbb{1}\{y=j\}\log\hat{y}_j\]

Note that for the multi-class case, if $\hat{y}$ is a $k$-dimensional vector of probabilities and $y$ is the corresponding one-hot vector, the loss can be written as the cross-entropy:

\[\mathcal{L}(\hat{y},y) = -\sum\limits_{j=1}^ky_j\log\hat{y}_j\]
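As a sketch, the three losses above translate directly into NumPy. The function names are ours; $\hat{y}$ is assumed to be the network output $a^{[N]}$, and $y$ is a real-valued target, a 0/1 label, or a one-hot vector, respectively:

```python
import numpy as np

def squared_loss(y_hat, y):
    # regression: 1/2 (y_hat - y)^2
    return 0.5 * (y_hat - y) ** 2

def binary_cross_entropy(y_hat, y):
    # binary classification: -(y log y_hat + (1 - y) log(1 - y_hat))
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def cross_entropy(y_hat, y_onehot):
    # multi-class: -sum_j y_j log y_hat_j, with y_onehot a one-hot vector
    return -np.sum(y_onehot * np.log(y_hat))
```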

2 Backpropagation

We define:

\[\delta^{[\ell]} = \nabla_{z^{[\ell]}}\mathcal{L}(\hat{y},y)\]

Then there are three steps for computing the gradient at any layer:

1 For the output layer $N$, we have:

\[\delta^{[N]} = \nabla_{z^{[N]}}\mathcal{L}(\hat{y},y)\]

Since the softmax function is not applied element-wise, its gradient has to be calculated as a whole. The sigmoid, on the other hand, is applied element-wise, so we can use the chain rule:

\[\nabla_{z^{[N]}}\mathcal{L}(\hat{y},y) = \nabla_{\hat{y}}\mathcal{L}(\hat{y},y)\circ (g^{[N]})^{\prime}(z^{[N]})\]

Note that $\circ$ denotes the element-wise (Hadamard) product.
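For instance, for binary classification with a sigmoid output and the loss above, we have

\[\nabla_{\hat{y}}\mathcal{L}(\hat{y},y) = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}, \qquad (g^{[N]})^{\prime}(z^{[N]}) = \hat{y}(1-\hat{y}),\]

and multiplying the two element-wise gives the simple expression

\[\delta^{[N]} = \hat{y} - y.\]

The same expression $\delta^{[N]} = \hat{y} - y$ also holds for a softmax output combined with the cross-entropy loss (with $y$ one-hot).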

2 For $\ell = N-1,N-2,\dots,1$, we have:

\[\delta^{[\ell]} = (W^{[\ell+1]T}\delta^{[\ell+1]})\circ (g^{[\ell]})^{\prime}(z^{[\ell]})\]
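In code, one step of this recursion is essentially a one-liner. A minimal sketch, assuming NumPy arrays and using our own names (`W_next` for $W^{[\ell+1]}$, `delta_next` for $\delta^{[\ell+1]}$, `g_prime` for $(g^{[\ell]})^{\prime}$):

```python
def backward_step(W_next, delta_next, z, g_prime):
    # delta^{[l]} = (W^{[l+1]T} delta^{[l+1]}) o (g^{[l]})'(z^{[l]})
    return (W_next.T @ delta_next) * g_prime(z)
```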

3 For each layer $\ell$, we have:

\[\nabla_{W^{[\ell]}}J(W,b) = \delta^{[\ell]}a^{[\ell-1]T}\] \[\nabla_{b^{[\ell]}}J(W,b) = \delta^{[\ell]}\]

These formulas can be used directly when implementing backpropagation in code.
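Putting the three steps together, a minimal NumPy sketch of the backward pass might look as follows. It assumes a sigmoid (or softmax) output with the matching loss, so that $\delta^{[N]} = \hat{y} - y$, and ReLU hidden layers as in the forward-pass sketch above; all names are illustrative, not from the notes:

```python
import numpy as np

def relu_prime(z):
    # derivative of the assumed hidden-layer activation (ReLU)
    return (z > 0).astype(z.dtype)

def backward(params, activations, pre_activations, y):
    """Gradients of the loss with respect to every W^{[l]} and b^{[l]}.

    params:          list of (W, b) for layers 1..N
    activations:     [a^{[0]}, ..., a^{[N]}] from the forward pass
    pre_activations: [z^{[1]}, ..., z^{[N]}] from the forward pass
    y:               target (0/1 label or one-hot vector)
    """
    N = len(params)
    grads = [None] * N
    # Step 1: output layer, sigmoid/softmax + matching loss => delta^{[N]} = y_hat - y
    delta = activations[N] - y
    for ell in range(N, 0, -1):
        # Step 3: gradients for layer l
        dW = np.outer(delta, activations[ell - 1])   # delta^{[l]} a^{[l-1]T}
        db = delta
        grads[ell - 1] = (dW, db)
        # Step 2: propagate delta back to layer l-1
        if ell > 1:
            W, _ = params[ell - 1]                    # W^{[l]}
            delta = (W.T @ delta) * relu_prime(pre_activations[ell - 2])
    return grads
```

Together with the forward pass above, these gradients can be plugged into a plain gradient-descent update on each $W^{[\ell]}$ and $b^{[\ell]}$.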