Coursera: Machine Learning (Week 4) [Assignment Solution] - Andrew Ng

by Akshay Daga (APDaga) - June 05, 2021

▸ One-vs-all logistic regression and neural networks to recognize hand-written digits.

I recently completed the Machine Learning course by Andrew Ng on Coursera.

While doing the course we have to go through various quizzes and assignments.

Here, I am sharing my solutions for the weekly assignments throughout the course.

These solutions are for reference only.

> It is recommended that you solve the assignments honestly by yourself; only then does completing the course make sense.

> But in case you get stuck along the way, feel free to refer to the solutions provided by me.

NOTE:

Don't just copy-paste the code for the sake of completion.

Even if you copy the code, make sure you understand the code first.

Click here to check out the week-3 assignment solutions. Scroll down for the week-4 assignment solutions.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

Recommended Machine Learning Courses:
Coursera: Machine Learning
Coursera: Deep Learning Specialization
Coursera: Machine Learning with Python
Coursera: Advanced Machine Learning Specialization
Udemy: Machine Learning
LinkedIn: Machine Learning
Eduonix: Machine Learning
edX: Machine Learning
Fast.ai: Introduction to Machine Learning for Coders

It consists of the following files:

ex3.m - Octave/MATLAB script that steps you through part 1
ex3_nn.m - Octave/MATLAB script that steps you through part 2
ex3data1.mat - Training set of hand-written digits
ex3weights.mat - Initial weights for the neural network exercise
submit.m - Submission script that sends your solutions to our servers
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
[*] lrCostFunction.m - Logistic regression cost function
[*] oneVsAll.m - Train a one-vs-all multi-class classifier
[*] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier
[*] predict.m - Neural network prediction function

* indicates files you will need to complete

lrCostFunction.m :

function [J, grad] = lrCostFunction(theta, X, y, lambda)
  %LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
  %regularization
  %   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
  %   theta as the parameter for regularized logistic regression and the
  %   gradient of the cost w.r.t. the parameters.
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  grad = zeros(size(theta));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta.
  %               You should set J to the cost.
  %               Compute the partial derivatives and set grad to the partial
  %               derivatives of the cost w.r.t. each parameter in theta
  %
  % Hint: The computation of the cost function and gradients can be
  %       efficiently vectorized. For example, consider the computation
  %
  %           sigmoid(X * theta)
  %
  %       Each row of the resulting matrix will contain the value of the
  %       prediction for that example. You can make use of this to vectorize
  %       the cost function and gradient computations. 
  %
  % Hint: When computing the gradient of the regularized cost function, 
  %       there're many possible vectorized solutions, but one solution
  %       looks like:
  %           grad = (unregularized gradient for logistic regression)
  %           temp = theta; 
  %           temp(1) = 0;   % because we don't add anything for j = 0  
  %           grad = grad + YOUR_CODE_HERE (using the temp variable)
  %
  
  %DIMENSIONS: 
  %   theta = (n+1) x 1
  %   X     = m x (n+1)
  %   y     = m x 1
  %   grad  = (n+1) x 1
  %   J     = Scalar
  
  z   = X * theta;   % m x 1
  h_x = sigmoid(z);  % m x 1 
  
  reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
  
  J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar
  
  grad(1) = (1/m) * (X(:,1)'*(h_x-y));                                    % 1 x 1
  grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);  % n x 1
  
  % =============================================================
  
  grad = grad(:);
end
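
The split into grad(1) and grad(2:end) above is one way to keep the bias term theta(1) out of the regularization; the "temp" form from the hint in the comments does the same thing in one line. The sketch below compares the two on a small made-up data set (the values of X, y, theta and lambda are assumptions chosen for illustration, and sigmoid is redefined inline so the snippet runs on its own). It also shows what changes if theta itself is used in the regularization term.

```octave
% Illustrative check only; the data below are assumed toy values.
sigmoid = @(z) 1 ./ (1 + exp(-z));

X      = [ones(5,1) reshape(1:15, 5, 3) / 10];   % 5 examples, bias column + 3 features
y      = [1; 0; 1; 0; 1];
theta  = [-2; -1; 1; 2];
lambda = 3;
m      = length(y);

h = sigmoid(X * theta);                          % m x 1 predictions

% Form 1: the split used in lrCostFunction.m above
grad_split        = zeros(size(theta));
grad_split(1)     = (1/m) * (X(:,1)' * (h - y));
grad_split(2:end) = (1/m) * (X(:,2:end)' * (h - y)) + (lambda/m) * theta(2:end);

% Form 2: the "temp" form from the hint (zero out the bias before regularizing)
temp      = theta;
temp(1)   = 0;
grad_temp = (1/m) * (X' * (h - y)) + (lambda/m) * temp;

% Form 3: regularizing every entry of theta, including the bias
grad_all  = (1/m) * (X' * (h - y)) + (lambda/m) * theta;

fprintf('max |split - temp| = %g\n', max(abs(grad_split - grad_temp)));
fprintf('bias entry shift when theta(1) is regularized = %g\n', ...
        grad_all(1) - grad_split(1));
```

Forms 1 and 2 agree, while form 3 differs from them only in the first entry, by (lambda/m)*theta(1). That is why the one-line version can look fine when lambda is 0 or theta(1) is 0 but does not match the regularized gradient in general.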

oneVsAll.m :

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
  %ONEVSALL trains multiple logistic regression classifiers and returns all
  %the classifiers in a matrix all_theta, where the i-th row of all_theta 
  %corresponds to the classifier for label i
  %   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
  %   logistic regression classifiers and returns each of these classifiers
  %   in a matrix all_theta, where the i-th row of all_theta corresponds 
  %   to the classifier for label i
  
  % num_labels = No. of output classifier (Here, it is 10)
  
  % Some useful variables
  m = size(X, 1);        % No. of Training Samples == No. of Images : (Here, 5000) 
  n = size(X, 2);        % No. of features == No. of pixels in each Image : (Here, 400)
  
  % You need to return the following variables correctly 
  all_theta = zeros(num_labels, n + 1);  
  %DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401
  
  %DIMENSIONS: X = m x input_layer_size
  %Here, each row of X represents one 20x20-pixel training image
  
  % Add ones to the X data matrix
  X = [ones(m, 1) X];   %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: You should complete the following code to train num_labels
  %               logistic regression classifiers with regularization
  %               parameter lambda. 
  %
  % Hint: theta(:) will return a column vector.
  %
  % Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
  %       whether the ground truth is true/false for this class.
  %
  % Note: For this assignment, we recommend using fmincg to optimize the cost
  %       function. It is okay to use a for-loop (for c = 1:num_labels) to
  %       loop over the different classes.
  %
  %       fmincg works similarly to fminunc, but is more efficient when we
  %       are dealing with large number of parameters.
  %
  % Example Code for fmincg:
  %
  %     % Set Initial theta
  %     initial_theta = zeros(n + 1, 1);
  %     
  %     % Set options for fmincg
  %     options = optimset('GradObj', 'on', 'MaxIter', 50);
  % 
  %     % Run fmincg to obtain the optimal theta
  %     % This function will return theta and the cost 
  %     [theta] = ...
  %         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
  %                 initial_theta, options);
  %
  
  initial_theta = zeros(n+1, 1);
  options = optimset('GradObj', 'on', 'MaxIter', 50);
  
  for c = 1:num_labels
    all_theta(c,:) = ...
        fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                initial_theta, options);
  end
  
  % =========================================================================
end
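
For readers wondering what the loop would look like without fmincg, here is a minimal sketch that swaps in plain batch gradient descent. It is not the course's optimizer and not the method used above: the toy data, the learning rate alpha and the iteration count num_iters are all assumptions picked for illustration, and gradFn is a hypothetical helper name. The snippet also shows the two pieces of syntax used in oneVsAll.m: in @(t, ...) the name t is only the placeholder for the handle's argument, and "..." continues a statement onto the next line.

```octave
% A sketch only: plain batch gradient descent standing in for fmincg.
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [0 0; 0.2 0.1; 3 0; 3.1 0.2; 0 3; 0.1 3.2];  % 6 toy examples, 2 features (assumed)
y = [1; 1; 2; 2; 3; 3];                          % labels in 1..3 (assumed)
num_labels = 3;
lambda     = 0.1;
alpha      = 0.5;     % learning rate (assumed)
num_iters  = 500;     % number of descent steps (assumed)

[m, n] = size(X);
Xb = [ones(m, 1) X];                    % add the bias column, as in oneVsAll.m
all_theta = zeros(num_labels, n + 1);

% Regularized logistic-regression gradient. t is the handle's argument name;
% Xb, m, lambda and sigmoid are captured from the workspace when it is created.
gradFn = @(t, yc) (1/m) * (Xb' * (sigmoid(Xb * t) - yc)) + ...
                  (lambda/m) * [0; t(2:end)];

for c = 1:num_labels
  yc    = double(y == c);               % 1 where the label equals c, 0 elsewhere
  theta = zeros(n + 1, 1);
  for iter = 1:num_iters
    theta = theta - alpha * gradFn(theta, yc);
  end
  all_theta(c, :) = theta';             % classifier for label c goes into row c
end

[~, p] = max(Xb * all_theta', [], 2);   % highest-scoring class per example
fprintf('Training accuracy: %.1f%%\n', mean(p == y) * 100);
```

On the real 5000 x 400 digit data a fixed-step loop like this would likely need many more iterations and a tuned step size, which is the practical reason the exercise recommends fmincg.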

predictOneVsAll.m :

function p = predictOneVsAll(all_theta, X)
  %PREDICT Predict the label for a trained one-vs-all classifier. The labels
  %are in the range 1..K, where K = size(all_theta, 1).
  %  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
  %  for each example in the matrix X. Note that X contains the examples in
  %  rows. all_theta is a matrix where the i-th row is a trained logistic
  %  regression theta vector for the i-th class. You should set p to a vector
  %  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
  %  for 4 examples)
  
  m = size(X, 1);     % No. of Input Examples to Predict (Each row = 1 Example)
  num_labels = size(all_theta, 1); % No. of output classes (one classifier per class)
  
  % You need to return the following variables correctly
  p = zeros(size(X, 1), 1);    % No_of_Input_Examples x 1 == m x 1
  
  % Add ones to the X data matrix
  X = [ones(m, 1) X];
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned logistic regression parameters (one-vs-all).
  %               You should set p to a vector of predictions (from 1 to
  %               num_labels).
  %
  % Hint: This code can be done all vectorized using the max function.
  %       In particular, the max function can also return the index of the
  %       max element, for more information see 'help max'. If your examples
  %       are in rows, then, you can use max(A, [], 2) to obtain the max
  %       for each row.
  %
  % num_labels = No. of output classifier (Here, it is 10)
  % DIMENSIONS:
  % all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)
  
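  prob_mat = X * all_theta';      % 5000 x 10 == no_of_input_images x num_labels (raw scores, not probabilities)
  [prob, p] = max(prob_mat,[],2); % each m x 1
  %returns the maximum score in each row and its column index for each input image
  %sigmoid is monotonically increasing, so skipping it does not change which index is largest
  %p: predicted output (index of the largest score)
  %prob: largest raw score; sigmoid(prob) would give the corresponding probability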
  
  %%%%%%%% WORKING: Computation per input image %%%%%%%%%
  % for i = 1:m                               % To iterate through each input sample
  %     one_image = X(i,:);                   % 1 x 401 == 1 x no_of_features
  %     prob_mat = one_image * all_theta';    % 1 x 10  == 1 x num_labels
  %     [prob, out] = max(prob_mat);
  %     %out: predicted output
  %     %prob: probability of predicted output
  %     p(i) = out;
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  %%%%%%%% WORKING %%%%%%%%%
  % for i = 1:m
  %     RX = repmat(X(i,:),num_labels,1);
  %     RX = RX .* all_theta;
  %     SX = sum(RX,2);
  %     [val, index] = max(SX);
  %     p(i) = index;
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%
  % =========================================================================
end
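
The two-output form of max used above can be tried on a tiny matrix. The sketch below uses made-up numbers (A and scores are assumptions for illustration) to show what [prob, p] = max(A, [], 2) returns, and that applying sigmoid before max does not change the winning index.

```octave
% Toy matrix: 2 examples (rows) x 3 classes (columns), values assumed.
A = [0.1 0.7  0.2;
     0.9 0.05 0.05];

% First output: the largest value in each row. Second output: its column index.
[prob, p] = max(A, [], 2);
% prob is [0.7; 0.9] and p is [2; 1]

% sigmoid is monotonically increasing, so the argmax is the same with or without it.
sigmoid = @(z) 1 ./ (1 + exp(-z));
scores  = [-1  2 0.5;
            3 -2 0  ];
[~, p_raw] = max(scores, [], 2);
[~, p_sig] = max(sigmoid(scores), [], 2);

fprintf('same prediction with and without sigmoid: %d\n', isequal(p_raw, p_sig));
```

Only p is needed for the prediction; prob is kept so the strength of the winning class can be inspected if wanted.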


predict.m :

function p = predict(Theta1, Theta2, X)
  %PREDICT Predict the label of an input given a trained neural network
  %   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
  %   trained weights of a neural network (Theta1, Theta2)
  
  % Useful values
  m = size(X, 1);
  num_labels = size(Theta2, 1);
  
  % You need to return the following variables correctly 
  p = zeros(size(X, 1), 1);  % m x 1
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned neural network. You should set p to a 
  %               vector containing labels between 1 to num_labels.
  %
  % Hint: The max function might come in useful. In particular, the max
  %       function can also return the index of the max element, for more
  %       information see 'help max'. If your examples are in rows, then, you
  %       can use max(A, [], 2) to obtain the max for each row.
  %
  %DIMENSIONS:
  % theta1 = 25 x 401
  % theta2 = 10 x 26
  
  % layer1 (input)  = 400 nodes + 1 bias
  % layer2 (hidden) = 25 nodes + 1 bias
  % layer3 (output) = 10 nodes
  % 
  % theta dimensions = S_(j+1) x ((S_j)+1)
  % theta1 = 25 x 401
  % theta2 = 10 x 26
  
  % theta1:
  %     1st row: weights from all nodes of layer1 to the 1st node of layer2
  %     2nd row: weights from all nodes of layer1 to the 2nd node of layer2
  %     and
  %     1st column: weights from node1 of layer1 (the bias unit) to all nodes of layer2
  %     2nd column: weights from node2 of layer1 to all nodes of layer2
  %     
  % theta2:
  %     1st row: weights from all nodes of layer2 to the 1st node of layer3
  %     2nd row: weights from all nodes of layer2 to the 2nd node of layer3
  %     and
  %     1st column: weights from node1 of layer2 (the bias unit) to all nodes of layer3
  %     2nd column: weights from node2 of layer2 to all nodes of layer3
      
  a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x no_of_features % Adding 1 in X 
  %No. of rows = no. of input images
  %No. of Column = No. of features in each image
  
  z2 = a1 * Theta1';  % 5000 x 25
  a2 = sigmoid(z2);   % 5000 x 25
 
  a2 =  [ones(size(a2,1),1) a2];  % 5000 x 26
  
  z3 = a2 * Theta2';  % 5000 x 10
  a3 = sigmoid(z3);  % 5000 x 10
  
  [prob, p] = max(a3,[],2); 
  %returns maximum element in each row  == max. probability and its index for each input image
  %p: predicted output (index)
  %prob: probability of predicted output
  
  % =========================================================================
end
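
The dimension rule S_(j+1) x (S_j + 1) is easier to follow on a network small enough to print. The sketch below runs the same forward pass as predict.m on an assumed toy network with 2 input units, 3 hidden units and 2 output units; the weight values and inputs are made up for illustration and are not trained weights.

```octave
% Toy forward pass; sizes and weights are assumptions for illustration.
sigmoid = @(z) 1 ./ (1 + exp(-z));

Theta1 = [ 0.1  0.5 -0.5;
          -0.2  0.3  0.8;
           0.0 -0.6  0.4];            % 3 x (2+1): hidden units x (inputs + bias)
Theta2 = [ 0.2  1.0 -1.0  0.5;
          -0.3 -1.0  1.0  0.5];       % 2 x (3+1): outputs x (hidden units + bias)
X = [1 0; 0 1; 1 1; 0 0];             % 4 examples x 2 features

m  = size(X, 1);
a1 = [ones(m, 1) X];                        % 4 x 3, bias column added
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];    % 4 x 4, bias column added
a3 = sigmoid(a2 * Theta2');                 % 4 x 2, one column per output class
[prob, p] = max(a3, [], 2);                 % each 4 x 1

fprintf('a1: %d x %d, a2: %d x %d, a3: %d x %d\n', ...
        size(a1, 1), size(a1, 2), size(a2, 1), size(a2, 2), size(a3, 1), size(a3, 2));
```

Each multiplication by a transposed Theta maps "examples x (units in this layer + 1)" to "examples x units in the next layer", which is why the bias column has to be added before every layer except the output.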

I have tried to provide optimized solutions, such as vectorized implementations, for each assignment. If you think further optimization is possible, please suggest corrections or improvements.

Feel free to ask questions in the comment section. I will try my best to answer them.

If you find this helpful in any way, please like, comment, and share the post.

This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,

-Akshay P. Daga

45 Comments

Unknown, 3 December 2018 at 8:02 PM
hey!
In predict.m file theta should be = 25*401 not 26*401;

wrong:
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 26 x 401
% theta2 = 10 x 26
correct:
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26

harshita, 28 December 2018 at 3:19 AM
predict.m is not working

Unknown, 28 January 2019 at 11:51 AM
Hey, could you explain how "[prob, p] = max(a3,[],2);" is working in predict.m

aspiring DS, 11 March 2019 at 2:16 AM
Hi, I am getting this error: nonconformant arguments (op1 is 1x1, op2 is 1x2) at the line with the code grad(1) = (1/m) * (X(:,1)'*(h_x-y)); in IrCostFunction

as, 19 June 2019 at 1:28 PM
Sigmoid function is missing in predictOneVsAll

Unknown, 6 August 2019 at 12:55 AM
Will you please tell me what t is here?

@(t)(lrCostFunction(t, X, (y == c), lambda)

Blitz, 24 March 2020 at 7:16 PM
Why separate grad into two lines, as shown below?
grad(1) = (1/m) * (X(:,1)'*(h_x-y));
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);

Just writing it as
grad = (1/m) * (X'*(h_x-y)) + (lambda/m)*theta;
works fine or am i missing something here?

Rohan Patil, 29 March 2020 at 12:01 PM
Thank you for your help, it's really great of you. I just wanted to know two things:

(1) Whenever I start a programming assignment I get really confused and don't understand where and how to start, so I first refer to your code, understand it thoroughly, and then proceed with the assignment. I wanted to know whether that is the right way to do it.

(2) Why have we used [prob, p], and what is the intuition behind it in the code? I mean, why have we used two variables, 'prob' and 'p'?

Akshay, 27 April 2020 at 2:08 PM
Hi Akshay

Thanks for creating this amazing forum for us like minded people. Had a couple of queries:

1. I am not able to understand the variables of the fmincg function (despite using 'help'). It would be great if someone could help me with the same!

2. What do the three dots (...) in the line preceding the fmincg function specify? Why are they needed? (I tried running the function without them, but it was flagged as a syntax error!)

Thanks in advance.

Qwert123, 24 May 2020 at 8:15 PM
None of the code is working, getting 0/100

Unknown, 29 May 2020 at 7:34 PM
How were you able to solve onevsall.m, predictOneVsAll.m and predict.m? I am trying to understand the problem and I am not getting how I should solve it.

Aravindh, 7 June 2020 at 10:10 AM
Can anyone explain what "theta_t" is? Why and how did they choose some random value "[-2; -1; 1; 2]" (in ex.m)?

Unknown, 28 June 2020 at 9:46 PM
Hi Akshay ,
It is showing error as unprecedented parameter name 'GrabObj'

Kailas, 5 July 2020 at 7:32 PM
Hi Akshay,
In OneVsall.m,it is saying IrCostFunction is undefined.
Why is it so?

Vedant Patil, 21 July 2020 at 7:05 PM
Hello,

Can you help me resolve this
octave:7> oneVsAll.m
error: 'X' undefined near line 11 column 10
error: called from
oneVsAll at line 11 column 3

Prateek Srivastava, 9 August 2020 at 3:05 PM
Hi, I used exactly the same implementation, but the cost for my set is coming out as 45.73 instead of the expected cost of 2.53.
I am using the same logic as yours, but I don't know why this is happening.
Can you please help me out?

pratik, 10 August 2020 at 9:42 AM
Hi Akshay,
I have used the same code as yours in predict.m
Within the exercise code i am getting training exercise accuracy as expected (97.5%). Also the digit is also being recognized correctly.

But when i am submitting the code for grading, i am getting the following error:

!! Submission failed: unexpected error: Index exceeds the number of array elements (16).
!! Please try again later.

Thanks in advance for the help.

jimmy, 12 August 2020 at 3:40 AM
Could you please explain the line all_theta(c,:) = ... in onevsall? I was stuck on this for an hour.

Unknown, 9 September 2020 at 4:22 PM
I don't know why, but I am getting iteration and cost lines on the output console; I am posting some of them here. Please help, as I have been stuck on this for more than a day.

Iteration 16 | Cost: 1.018509e-01
Iteration 17 | Cost: 1.018509e-01
Iteration 18 | Cost: 1.018509e-01
Iteration 19 | Cost: 1.018509e-01
Iteration 20 | Cost: 1.018509e-01
Iteration 21 | Cost: 1.018509e-01
Iteration 22 | Cost: 1.018509e-01
Iteration 23 | Cost: 1.018509e-01
Iteration 24 | Cost: 1.018509e-01
Iteration 25 | Cost: 1.018509e-01
Iteration 26 | Cost: 1.018509e-01

all_theta =

-0.5595 0.6192 -0.5504 -0.0935
-5.4744 -0.4716 1.2613 0.6349
0.0684 -0.3756 -1.6523 -1.4101

Theresa, 19 September 2020 at 10:47 PM
Hi could you please help me? this is my code on lrcostfunction:

H = sigmoid(X*theta);
T = y.*log(H) + (1 - y).*log(1 - H);
J = -1/m*sum(T) + lambda/(2*m)*sum(theta(2:end).^2);

ta = [0; theta(2:end)];
grad = X'*(H - y)/m + lambda/m*ta;

but im getting this error:

>> lrCostFunction
Not enough input arguments.

Error in lrCostFunction (line 9)
m = length(y); % number of training examples

I try using your code to check if i was wrong but i got the same error could you help me? please

Unknown, 31 October 2020 at 11:59 AM
Hey, I have question and that is when we were calculating grad in week 3 assignment we include
grad(1) = (1/m)* sum(X(:,1)'*(hx-y));
grad(2:end) = (1/m)* sum(X(:,2:end)'*(hx-y))+(lambda/m)*theta(2:end);
Now, when we calculate it in week 4, we remove "sum" from both equations. My question is: why do we remove sum? When I calculate with sum, it gives the wrong answer.

Cron, 20 February 2021 at 6:34 AM
Hi
for the oneVsAll.m problem, how would the code look like if you don't use the fmincg function,
I'm kinda lost on the process of how to get all_theta

rths, 27 April 2021 at 9:51 PM
Can you send the submit.m and submit config files for this experiment?