Hi
On page 3 it says that, when calculating the derivative of the loss function with respect to w1, the first three expressions are exactly delta2. But delta2 is the derivative of the loss function with respect to w2, not with respect to z1 as in the first three expressions. To get the result of the first three expressions, we would actually need to replace the final multiplication by z1 in delta2 with a multiplication by w2. Am I missing anything?
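
To make the comparison concrete, here is how I read the two chain-rule expansions. This is only a sketch in my own assumed notation, which may not match page 3 exactly: a scalar two-layer network with input x, s1 = w1 x, z1 = f(s1), s2 = w2 z1, output y = f(s2), and loss L.

```latex
% Assumed notation (not taken from the notes): s_1 = w_1 x, z_1 = f(s_1), s_2 = w_2 z_1, y = f(s_2)
\begin{align*}
\frac{\partial L}{\partial w_2}
  &= \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial s_2}\,\frac{\partial s_2}{\partial w_2}
   = \frac{\partial L}{\partial y}\, f'(s_2)\, z_1 \\
\frac{\partial L}{\partial w_1}
  &= \underbrace{\frac{\partial L}{\partial y}\,\frac{\partial y}{\partial s_2}\,\frac{\partial s_2}{\partial z_1}}_{\partial L / \partial z_1}
     \,\frac{\partial z_1}{\partial s_1}\,\frac{\partial s_1}{\partial w_1}
   = \frac{\partial L}{\partial y}\, f'(s_2)\, w_2\, f'(s_1)\, x
\end{align*}
```

Under these assumptions the two expressions share the first two factors, but the third factor is z1 in the derivative with respect to w2 and w2 in the first three expressions of the derivative with respect to w1, which is the mismatch I am asking about.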