Coursera: Machine Learning (Week 4) [Assignment Solution] - Andrew NG

▸ One-vs-all logistic regression and neural networks to recognize hand-written digits.


I recently completed the Machine Learning course on Coursera taught by Andrew Ng.

The course includes several quizzes and programming assignments.

Here, I am sharing my solutions to the weekly assignments.

These solutions are for reference only.

I recommend solving the assignments honestly on your own; only then does completing the course make sense.
But if you get stuck along the way, feel free to refer to my solutions.

NOTE:

Don't just copy-paste the code for the sake of completion. 
Even if you copy the code, make sure you understand the code first.

The week-3 assignment solutions are in a separate post; scroll down for the week-4 assignment solutions.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.


It consists of the following files:
  • ex3.m - Octave/MATLAB script that steps you through part 1
  • ex3_nn.m - Octave/MATLAB script that steps you through part 2
  • ex3data1.mat - Training set of hand-written digits
  • ex3weights.mat - Initial weights for the neural network exercise
  • submit.m - Submission script that sends your solutions to our servers
  • displayData.m - Function to help visualize the dataset
  • fmincg.m - Function minimization routine (similar to fminunc)
  • sigmoid.m - Sigmoid function
  • [*] lrCostFunction.m - Logistic regression cost function
  • [*] oneVsAll.m - Train a one-vs-all multi-class classifier
  • [*] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier
  • [*] predict.m - Neural network prediction function
[*] indicates files you will need to complete






lrCostFunction.m :

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
% efficiently vectorized. For example, consider the computation
%
% sigmoid(X * theta)
%
% Each row of the resulting matrix will contain the value of the
% prediction for that example. You can make use of this to vectorize
% the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
% there're many possible vectorized solutions, but one solution
% looks like:
% grad = (unregularized gradient for logistic regression)
% temp = theta;
% temp(1) = 0; % because we don't add anything for j = 0
% grad = grad + YOUR_CODE_HERE (using the temp variable)
%

%DIMENSIONS:
% theta = (n+1) x 1
% X = m x (n+1)
% y = m x 1
% grad = (n+1) x 1
% J = Scalar

z = X * theta; % m x 1
h_x = sigmoid(z); % m x 1

reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);

J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar

grad(1) = (1/m) * (X(:,1)'*(h_x-y)); % 1 x 1
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end); % n x 1

% =============================================================

grad = grad(:);
end
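
A quick way to check lrCostFunction before submitting is to run it on the small test case that ex3.m uses. The sketch below is self-contained for Octave: it repeats compact versions of sigmoid and lrCostFunction (using the temp-variable form of the gradient from the hint, instead of the two-line form above) so it can be pasted into one script file. The leading `1;` is an Octave convention for scripts that define functions; in MATLAB, local functions in a script go at the end of the file instead.

```octave
1; % tells Octave this file is a script, not a function file

function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));   % element-wise, so z may be a scalar, vector or matrix
end

function [J, grad] = lrCostFunction(theta, X, y, lambda)
  m = length(y);
  h_x = sigmoid(X * theta);                                 % m x 1
  J = (1/m) * sum(-y .* log(h_x) - (1 - y) .* log(1 - h_x)) ...
      + (lambda/(2*m)) * sum(theta(2:end).^2);              % scalar
  temp = theta;
  temp(1) = 0;                                              % bias term is not regularized
  grad = (1/m) * (X' * (h_x - y)) + (lambda/m) * temp;      % (n+1) x 1
end

% Test case from ex3.m
theta_t  = [-2; -1; 1; 2];
X_t      = [ones(5,1) reshape(1:15, 5, 3)/10];
y_t      = ([1; 0; 1; 0; 1] >= 0.5);
lambda_t = 3;

[J, grad] = lrCostFunction(theta_t, X_t, y_t, lambda_t);
printf('Cost: %f\n', J);          % expected cost: 2.534819
printf('Gradient: %f\n', grad);   % expected: 0.146561 -0.548558 0.724722 1.398003
```

If the cost comes out as a vector or a very different number, the usual suspects are a non-element-wise sigmoid or a regularization term that includes theta(1).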




oneVsAll.m :

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
% logistic regression classifiers and returns each of these classifiers
% in a matrix all_theta, where the i-th row of all_theta corresponds
% to the classifier for label i

% num_labels = No. of output classifiers (Here, it is 10)

% Some useful variables
m = size(X, 1); % No. of Training Samples == No. of Images : (Here, 5000)
n = size(X, 2); % No. of features == No. of pixels in each Image : (Here, 400)

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
%DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401

%DIMENSIONS: X = m x input_layer_size
%Here, 1 row in X represents 1 training image of 20x20 pixels

% Add ones to the X data matrix
X = [ones(m, 1) X]; %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
% logistic regression classifiers with regularization
% parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
% whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
% function. It is okay to use a for-loop (for c = 1:num_labels) to
% loop over the different classes.
%
% fmincg works similarly to fminunc, but is more efficient when we
% are dealing with large number of parameters.
%
% Example Code for fmincg:
%
% % Set Initial theta
% initial_theta = zeros(n + 1, 1);
%
% % Set options for fmincg
% options = optimset('GradObj', 'on', 'MaxIter', 50);
%
% % Run fmincg to obtain the optimal theta
% % This function will return theta and the cost
% [theta] = ...
% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
% initial_theta, options);
%

initial_theta = zeros(n+1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);

for c=1:num_labels
all_theta(c,:) = ...
fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
initial_theta, options);
end

% =========================================================================
end
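
Three pieces of the loop above tend to cause confusion: the `y == c` mask, the anonymous function `@(t)`, and the row assignment `all_theta(c,:) = ...`. The toy snippet below illustrates each one in isolation. All values and the names y_c, a, b and f are made up for illustration and are not part of the assignment.

```octave
% 1) y == c turns multi-class labels into a 0/1 target for class c
y = [1; 3; 2; 3; 1];       % toy labels: 5 examples, 3 classes
c = 3;
y_c = (y == c);            % logical vector, true where the label is class c
disp(double(y_c'));        % 0 1 0 1 0

% 2) An anonymous function captures workspace variables when it is created.
%    f takes only t; a and b are fixed inside it, just as X, (y == c) and
%    lambda are fixed inside @(t)(lrCostFunction(t, X, (y == c), lambda)).
a = 2; b = 5;
f = @(t) (a * t + b);
disp(f(10));               % 25

% 3) Assigning a column vector into a row of a matrix
all_theta = zeros(3, 4);
theta = [0.1; 0.2; 0.3; 0.4];   % stands in for the vector returned by fmincg
all_theta(c, :) = theta;        % the 4 x 1 vector fills row c (1 x 4)
disp(all_theta);
```

So `t` is simply the parameter vector that fmincg varies while searching for the minimum; everything else in the call stays fixed for the current class c.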




predictOneVsAll.m :

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
% p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
% for each example in the matrix X. Note that X contains the examples in
% rows. all_theta is a matrix where the i-th row is a trained logistic
% regression theta vector for the i-th class. You should set p to a vector
% of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
% for 4 examples)

m = size(X, 1); % No. of Input Examples to Predict (Each row = 1 Example)
num_labels = size(all_theta, 1); %No. of Output Classifiers

% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % No_of_Input_Examples x 1 == m x 1

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters (one-vs-all).
% You should set p to a vector of predictions (from 1 to
% num_labels).
%
% Hint: This code can be done all vectorized using the max function.
% In particular, the max function can also return the index of the
% max element, for more information see 'help max'. If your examples
% are in rows, then, you can use max(A, [], 2) to obtain the max
% for each row.
%
% num_labels = No. of output classifiers (Here, it is 10)
% DIMENSIONS:
% all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)

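prob_mat = X * all_theta'; % 5000 x 10 == no_of_input_images x num_labels
[prob, p] = max(prob_mat,[],2); % m x 1
%returns the maximum element in each row and its index for each input image
%p: predicted output (index of the winning classifier)
%prob: raw score theta'*x of the predicted output (not a probability);
% sigmoid is monotonically increasing, so the class with the largest score
% is also the class with the largest probability sigmoid(score)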

%%%%%%%% WORKING: Computation per input image %%%%%%%%%
% for i = 1:m % To iterate through each input sample
% one_image = X(i,:); % 1 x 401 == 1 x no_of_features
% prob_mat = one_image * all_theta'; % 1 x 10 == 1 x num_labels
% [prob, out] = max(prob_mat);
% %out: predicted output
% %prob: raw score of predicted output
% p(i) = out;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%% WORKING %%%%%%%%%
% for i = 1:m
% RX = repmat(X(i,:),num_labels,1);
% RX = RX .* all_theta;
% SX = sum(RX,2);
% [val, index] = max(SX);
% p(i) = index;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%
% =========================================================================
end
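
predictOneVsAll above takes the maximum of X * all_theta' without applying sigmoid. The small check below shows why that gives the same prediction: sigmoid is monotonically increasing, so it never changes which column is the largest in a row. The scores matrix is a made-up example, not data from the exercise.

```octave
% Raw scores X * all_theta' for 2 examples and 3 classes (toy values)
scores = [ -1.2  0.3  2.5;
            4.0 -0.5  1.1 ];
probs = 1 ./ (1 + exp(-scores));   % sigmoid applied element-wise

[best_score, p_scores] = max(scores, [], 2);   % index of the largest score per row
[best_prob,  p_probs]  = max(probs,  [], 2);   % index of the largest probability per row

disp(p_scores');   % 3 1
disp(p_probs');    % 3 1
```

The second output of max is the column index, which is the predicted label; the first output is the winning value itself.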








predict.m :

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % m x 1

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
%DIMENSIONS:
% theta1 = 25 x 401
% theta2 = 10 x 26

% layer1 (input) = 400 nodes + 1bias
% layer2 (hidden) = 25 nodes + 1bias
% layer3 (output) = 10 nodes
%
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26

% theta1:
% 1st row: weights from all nodes of layer1 (incl. bias) to the 1st node of layer2
% 2nd row: weights from all nodes of layer1 (incl. bias) to the 2nd node of layer2
% and
% 1st column: weights from the bias unit of layer1 to all nodes in layer2
% 2nd column: weights from the 1st input node of layer1 to all nodes in layer2
%
% theta2:
% 1st row: weights from all nodes of layer2 (incl. bias) to the 1st node of layer3
% 2nd row: weights from all nodes of layer2 (incl. bias) to the 2nd node of layer3
% and
% 1st column: weights from the bias unit of layer2 to all nodes in layer3
% 2nd column: weights from the 1st hidden node of layer2 to all nodes in layer3

a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x (no_of_features+1) % Adding the bias column of ones to X
%No. of rows = no. of input images
%No. of columns = no. of features in each image + 1 (bias)

z2 = a1 * Theta1'; % 5000 x 25
a2 = sigmoid(z2); % 5000 x 25

a2 = [ones(size(a2,1),1) a2]; % 5000 x 26

z3 = a2 * Theta2'; % 5000 x 10
a3 = sigmoid(z3); % 5000 x 10

[prob, p] = max(a3,[],2);
%returns maximum element in each row == max. probability and its index for each input image
%p: predicted output (index)
%prob: probability of predicted output

% =========================================================================
end
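
To see the forward propagation and the two-output form of max on something small enough to follow by hand, here is a sketch with a tiny 2-2-2 network. The weights, inputs and labels are invented for illustration (they are not the ex3weights.mat values), chosen so that the first example clearly activates output 1 and the second activates output 2.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));   % element-wise sigmoid

% Hypothetical weights: each row feeds one node of the next layer,
% the first column multiplies the bias unit.
Theta1 = [0 10  0;     % hidden node 1 responds to input 1
          0  0 10];    % hidden node 2 responds to input 2
Theta2 = [0  5 -5;     % output 1 fires when hidden 1 > hidden 2
          0 -5  5];    % output 2 fires when hidden 2 > hidden 1

X = [ 1 -1;            % 2 examples, 2 features each
     -1  1];
y = [1; 2];            % toy ground-truth labels

m  = size(X, 1);
a1 = [ones(m, 1) X];                        % 2 x 3, bias column added
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];    % 2 x 3, bias column added
a3 = sigmoid(a2 * Theta2');                 % 2 x 2, one column per class

[prob, p] = max(a3, [], 2);   % prob: largest activation per row, p: its column index
disp(p');                     % 1 2
printf('Training Set Accuracy: %f\n', mean(double(p == y)) * 100);
```

Here max(a3, [], 2) works along dimension 2 (across columns), so each row of a3 yields one value in prob and one label in p.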

I have tried to provide optimized, vectorized solutions for each assignment. If you think further optimization is possible, please suggest corrections or improvements.


Feel free to ask your questions in the comment section. I will try my best to answer them.
If you find this post helpful in any way, please like, comment, and share it.
This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,
-Akshay P. Daga





45 Comments

  1. hey!
    In predict.m file theta should be = 25*401 not 26*401;

    wrong:
    % theta dimensions = S_(j+1) x ((S_j)+1)
    % theta1 = 26 x 401
    % theta2 = 10 x 26
    correct:
    % theta dimensions = S_(j+1) x ((S_j)+1)
    % theta1 = 25 x 401
    % theta2 = 10 x 26

  2. Hey, could you explain how "[prob, p] = max(a3,[],2);" is working in predict.m

  3. Hi Iam getting error =: nonconformant arguments (op1 is 1x1, op2 is 1x2) at line using the code grad(1) = (1/m) * (X(:,1)'*(h_x-y)); in IrCostFunction

    Replies
    1. Mentioned error says there is some matrix dimension mismatch in variable op1 & op2.
      I don't see any variables as op1 & op2 in my code.
      Please check once again.

    2. Hi Akshay
      I am having the same problem too when trying to submit my solutions. The error message is:
      !! Submission failed: product: nonconformant arguments (op1 is 20x3, op2 is 3x1)
      Function: lrCostFunction
      LineNumber: 46
      Appreciate your help to troubleshoot this? Thanks

    3. I got the same error and figured it out. It is caused by a wrong implementation of sigmoid. You might have written g = 1/(1+exp(-z)), but z can be a matrix, so the operations must be element-wise. Here is a correct implementation:

      ex = exp(z.*(-1));
      din = 1.+ex;
      g = 1./din;

  4. Sigmoid function is missing in predictOneVsAll

    Replies
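    1. Sigmoid is not needed here because we only need the class with the maximum value of theta'*x.
      h(x) = sigmoid(theta'*x) = 1/(1+e^(-theta'*x)), which lies in (0,1).
      Sigmoid is monotonically increasing, so the class with the largest theta'*x also has the largest h(x).
      To predict the most likely class, it is enough to find the maximum of theta'*x.
      Hence sigmoid is not used.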

  5. will you please tell me what is t here?

    @(t)(lrCostFunction(t, X, (y == c), lambda)

  6. why do to separate grad into two line? like seen below
    grad(1) = (1/m) * (X(:,1)'*(h_x-y));
    grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);

    Just writing it as
    grad = (1/m) * (X'*(h_x-y)) + (lambda/m)*theta;
    works fine or am i missing something here?

    Replies
    1. As per the theory, we don't do regularization for first term. and we apply regularization from 2nd term onward. that's why we have to do it separately.

      Watch the related theory video once again carefully.

  7. Thankyou for your help it's really great of you , i just wanted to know 2 things

    (1) always i start with an programming assignment i get really confused and dont understand where and how to start , so i first refer to your code understand it thoroughly and proceed with the assignment , i wanted to know how correct it is to do

    (2) why have we used [prob , p] and and what are it's further intuations in the code , i mean why have we used 2 variables 'prob' & 'p'

    Replies
    1. Hi Rohan,
      (1) I think you should understand the problem first, then try to solve it your own way. Only if you get stuck or can't understand the problem should you check my code to understand it, and then go back to solving your assignment. (Please don't just copy-paste the code as it is.)

      (2) In the predict function, we calculate the probability for each class (for a multi-class problem) and then find the maximum probability.
      The "prob" variable holds the value of that probability and the "p" variable holds its index.
      A higher probability means a better match, so we use "p" to represent the predicted class (category), which is simply the index of the maximum probability (prob).

      I hope, I made it clear. If you still find it difficult to understand, please go through the theory lecture once again.

    2. absolutely clear , thanks for the support

  8. Hi Akshay

    Thanks for creating this amazing forum for us like minded people. Had a couple of queries:

    1. Am not able to understand the variables of fmincg function (despite of using 'help'. It would be great if someone could help me with the same !

    2. What do the three dots (...) in the line preceding the fmincg function specify ? Why are they needed ? (tried running the function without them but it pointed out as syntax error !

    Thanks in advance.

    Replies
    1. Thank you very much for your appreciation.
      1. fmincg is explained briefly in the theory lecture. (Honestly, even I have to check it in detail.)

      2. Three dots (...) are simply the "line continuation" characters in MATLAB.
      DESCRIPTION: Three or more periods at the end of a line continues the current command on the next line. If three or more periods occur before the end of a line, then MATLAB ignores the rest of the line and continues to the next line. This effectively makes a comment out of anything on the current line that follows the three periods.

  9. None of the codes are working, getting 0/100

    Replies
    1. Hi Qwert123, I think you are doing something wrong. Because the codes were 100% working for me and they are still working for many of my viewers. (you can get idea from comments).
      And anyways, these codes are just for understanding. Get the idea from the above codes and make your own solution and try to submit.
      Thank you.

  10. how were you able to solve onevsall.m predictOneVsAll.m and predict.m bc i am trying to understand the problem and i am not getting how should i solve it

  11. Can anyone explain what "theta_t" is? Why and how did they choose some random value "[-2; -1; 1; 2]" (in ex.m)?

  12. Hi Akshay ,
    It is showing error as unprecedented parameter name 'GrabObj'

  13. Hi Akshay,
    In OneVsall.m,it is saying IrCostFunction is undefined.
    Why is it so?

  14. Hello,

    Can you help me resolve this
    octave:7> oneVsAll.m
    error: 'X' undefined near line 11 column 10
    error: called from
    oneVsAll at line 11 column 3

    Replies
    1. Instead of running the oneVsAll.m file, please run the (.m) file in which all the above functions are called. Don't run the individual (.m) files in which the functions are defined.

  15. Hi..... I used same to same implementation but the cost of my set is coming out to be 45.73 in contrast to the expected cost of 2.53.
    I am using the same logic as yours but I dont know why is this happening.
    Can you plz help me out?

    Replies
    1. Did you find the solution? Cos am having the same problem here.

    2. I found the solution. His vectorizing formulas are wrong. He needed to use scalar multipication in some of them. Try the code below. It works %100

      z = X * theta; % m x 1
      h_x = sigmoid(z); % m x 1

      reg_term = (lambda/(2*m)) .* sum(theta(2:end).^2);

      J = (1/m).*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar

      grad(1) = (1/m). * (X(:,1)'*(h_x-y)); % 1 x 1
      grad(2:end) = (1/m). * (X(:,2:end)'*(h_x-y)) + (lambda/m).*theta(2:end); % n x 1

    3. @Ozan Kocabs All vectorized implemented formulas provided by me are 100% right.

      When you multiply a scalar (constant) with any matrix, you don't have to use ".*" (dot star), only "*" (star) is enough to multiply all the elements of the matrix by that constant.

      You might have some other mistake which caused the different cost value.
      Please check and find out the correct root cause of your problem.

      NOTE: For 2nd check, I ran my code once again and tested it just now and it is giving the correct output.
      ...
      Testing lrCostFunction() with regularization
      Cost: 2.534819
      Expected cost: 2.534819
      ...

    4. I dont know why it resulted in 5 different values in my results. It was like 5x1 matrice all resulting 45,73 and after i put some scalar multipication problem solved. I have just used your code once again and it worked. U are right. But i dont know why it didnt work at first. Thanks you mate. You are a life saver:)

  16. Hi Akshay,
    I have used the same code as yours in predict.m
    Within the exercise code i am getting training exercise accuracy as expected (97.5%). Also the digit is also being recognized correctly.

    But when i am submitting the code for grading, i am getting the following error:

    !! Submission failed: unexpected error: Index exceeds the number of array elements (16).
    !! Please try again later.

    Thanks in advance for the help.

    Replies
    1. Please compare your code with the one given above and check whether the dimensions match. Please use the comments given in the code above; they will help you understand what each particular line of code does.

  17. Could you please explain the line all_theta(c,:) = ... in onevsall. I got stuck for this an hour

  18. I dont know , i am getting iteration and cost on output console here i am posting some of them. Please help as i am stuck there for more than one day.

    Iteration 16 | Cost: 1.018509e-01
    Iteration 17 | Cost: 1.018509e-01
    Iteration 18 | Cost: 1.018509e-01
    Iteration 19 | Cost: 1.018509e-01
    Iteration 20 | Cost: 1.018509e-01
    Iteration 21 | Cost: 1.018509e-01
    Iteration 22 | Cost: 1.018509e-01
    Iteration 23 | Cost: 1.018509e-01
    Iteration 24 | Cost: 1.018509e-01
    Iteration 25 | Cost: 1.018509e-01
    Iteration 26 | Cost: 1.018509e-01

    all_theta =

    -0.5595 0.6192 -0.5504 -0.0935
    -5.4744 -0.4716 1.2613 0.6349
    0.0684 -0.3756 -1.6523 -1.4101

  19. Hi could you please help me? this is my code on lrcostfunction:

    H = sigmoid(X*theta);
    T = y.*log(H) + (1 - y).*log(1 - H);
    J = -1/m*sum(T) + lambda/(2*m)*sum(theta(2:end).^2);

    ta = [0; theta(2:end)];
    grad = X'*(H - y)/m + lambda/m*ta;

    but im getting this error:

    >> lrCostFunction
    Not enough input arguments.

    Error in lrCostFunction (line 9)
    m = length(y); % number of training examples

    I try using your code to check if i was wrong but i got the same error could you help me? please

  20. Hey, I have question and that is when we were calculating grad in week 3 assignment we include
    grad(1) = (1/m)* sum(X(:,1)'*(hx-y));
    grad(2:end) = (1/m)* sum(X(:,2:end)'*(hx-y))+(lambda/m)*theta(2:end);
    Now, when we calculate in week 4 we remove "sum" in both equations, my question is why we remove sum and when I calculate with sum it's provides wrong answer.

    Replies
    1. I don't see any sum function used in calculating grad even in assignment 3.
      Here is the link for assignment 3 solution- https://www.apdaga.com/2018/06/coursera-machine-learning-week-3.html#costFunctionReg
      Please check it out.

  21. Hi
    for the oneVsAll.m problem, how would the code look like if you don't use the fmincg function,
    I'm kinda lost on the process of how to get all_theta

  22. can you send submit.m and submit confg file of the of this experiment
