Regularizing the cost function (add this below the code above):
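Since the regularization snippet itself is not reproduced here, the following is a minimal sketch of what it would look like, assuming J, Theta1, Theta2, lambda, and m are already defined in nnCostFunction.m as in the preceding code. Note that the first column of each Theta (the bias weights) is excluded from the penalty.

% Add the regularization term to the unregularized cost J computed above.
% The first columns of Theta1 and Theta2 (bias weights) are not penalized.
J = J + (lambda / (2 * m)) * ...
    (sum(sum(Theta1(:, 2:end) .^ 2)) + sum(sum(Theta2(:, 2:end) .^ 2)));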

Backpropagation

Implementing the sigmoid derivative (sigmoidGradient.m):

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
%               each value of z (z can be a matrix, vector or scalar).

g = sigmoid(z) .* (1 - sigmoid(z));

% =============================================================

end
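As a quick sanity check (a hypothetical snippet, assuming sigmoid.m from the earlier exercise is on the path): the sigmoid gradient g'(z) = g(z)(1 - g(z)) peaks at z = 0, where it equals exactly 0.25.

% Evaluate the gradient element-wise on a vector; sigmoidGradient(0) = 0.25.
sigmoidGradient([-1 -0.5 0 0.5 1])
% ans = 0.1966  0.2350  0.2500  0.2350  0.1966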

Random initialization (randInitializeWeights.m). The weights can't all be 0, as explained in the notes: if every weight starts at the same value, all hidden units compute identical activations and receive identical gradient updates, so the symmetry is never broken:

function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

% You need to return the following variables correctly
W = zeros(L_out, 1 + L_in);

% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias unit
%

epsilon_init = 0.12; % keep this value small so initial weights stay near zero and training converges well
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init; % uniform in [-epsilon_init, epsilon_init]

% =========================================================================

end
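For the 400-25-10 network used in this exercise, the function would be called as in the sketch below (assuming input_layer_size, hidden_layer_size, and num_labels are set as in the exercise's main script):

% Each matrix has size (L_out, 1 + L_in); the extra column holds the bias weights.
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size); % 25 x 401
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);       % 10 x 26
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)]; % unroll into one parameter vector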