Coursera: Machine Learning (Week 9) [Assignment Solution] - Andrew Ng

▸ Anomaly detection algorithm to detect failing servers on a network.
▸ Collaborative filtering to build a recommender system for movies.

I have recently completed the Machine Learning course from Coursera by Andrew Ng.

While doing the course, we have to go through various quizzes and assignments.

Here, I am sharing my solutions for the weekly assignments throughout the course.

These solutions are for reference only.

It is recommended that you solve the assignments honestly by yourself; only then does it make sense to complete the course.
But in case you get stuck along the way, feel free to refer to the solutions provided by me.


Don't just copy and paste the code for the sake of completion.
Even if you copy the code, make sure you understand it first.

Click here to check out the week-8 assignment solutions. Scroll down for the week-9 assignment solutions.

In this exercise, you will implement the anomaly detection algorithm and apply it to detect failing servers on a network. In the second part, you will use collaborative filtering to build a recommender system for movies. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

It consists of the following files:
  • ex8.m - Octave/MATLAB script for first part of exercise
  • ex8_cofi.m - Octave/MATLAB script for second part of exercise
  • ex8data1.mat - First example Dataset for anomaly detection
  • ex8data2.mat - Second example Dataset for anomaly detection
  • ex8_movies.mat - Movie Review Dataset
  • ex8_movieParams.mat - Parameters provided for debugging
  • multivariateGaussian.m - Computes the probability density function for a Gaussian distribution
  • visualizeFit.m - 2D plot of a Gaussian distribution and a dataset
  • checkCostFunction.m - Gradient checking for collaborative filtering
  • computeNumericalGradient.m - Numerically compute gradients
  • fmincg.m - Function minimization routine (similar to fminunc)
  • loadMovieList.m - Loads the list of movies into a cell-array
  • movie_ids.txt - List of movies
  • normalizeRatings.m - Mean normalization for collaborative filtering
  • submit.m - Submission script that sends your solutions to our servers
  • [*] estimateGaussian.m - Estimate the parameters of a Gaussian distribution with a diagonal covariance matrix
  • [*] selectThreshold.m - Find a threshold for anomaly detection
  • [*] cofiCostFunc.m - Implement the cost function for collaborative filtering
[*] indicates files you will need to complete

estimateGaussian.m :

function [mu sigma2] = estimateGaussian(X)
%ESTIMATEGAUSSIAN This function estimates the parameters of a
%Gaussian distribution using the data in X
% [mu sigma2] = estimateGaussian(X),
% The input X is the dataset with each n-dimensional data point in one row
% The output is an n-dimensional vector mu, the mean of the data set
% and the variances sigma^2, an n x 1 vector

% Useful variables
[m, n] = size(X);

% You should return these values correctly
mu = zeros(n, 1);
sigma2 = zeros(n, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the mean of the data and the variances
% In particular, mu(i) should contain the mean of
% the data for the i-th feature and sigma2(i)
% should contain variance of the i-th feature.

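mu = ((1/m)*sum(X))'; % n x 1 vector of feature means
% X-mu' relies on implicit expansion (Octave, or MATLAB R2016b and later)
sigma2 = ((1/m)*sum((X-mu').^2))'; % n x 1 variances, normalized by m (not m-1)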

% =============================================================
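A quick way to sanity-check estimateGaussian is to compare it with Octave's built-ins on a small made-up matrix (the toy X below is an assumption, not ex8data1.mat). With the 1/m normalization used above, `mu` should equal `mean(X)'` and `sigma2` should equal `var(X, 1)'`, where the second argument 1 selects normalization by m instead of m-1. The sketch also evaluates each example's density as a product of per-feature univariate Gaussians, which is what a Gaussian with a diagonal covariance matrix reduces to.

```octave
% Toy data (made up for illustration): 4 examples, 2 features
X = [1 2;
     3 4;
     5 6;
     7 8];
[m, n] = size(X);

% Same computations as estimateGaussian.m
mu = ((1/m)*sum(X))';               % n x 1 feature means, [4; 5] for this X
sigma2 = ((1/m)*sum((X-mu').^2))';  % n x 1 variances, [5; 5] for this X

% Cross-check against built-ins: var(X, 1) normalizes by m, not m-1
mu_builtin = mean(X)';
sigma2_builtin = var(X, 1)';

% Density of each example under independent per-feature Gaussians
% (equivalent to a multivariate Gaussian with diagonal covariance)
p = prod((1 ./ sqrt(2*pi*sigma2')) .* exp(-((X - mu').^2) ./ (2*sigma2')), 2);  % m x 1
```

Rows 1 and 4 sit equally far from the mean on both features, so they get the same density, while rows 2 and 3 (closer to the mean) get a higher one.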

selectThreshold.m :

function [bestEpsilon bestF1] = selectThreshold(yval, pval)
%SELECTTHRESHOLD Find the best threshold (epsilon) to use for selecting outliers
% [bestEpsilon bestF1] = SELECTTHRESHOLD(yval, pval) finds the best
% threshold to use for selecting outliers based on the results from a
% validation set (pval) and the ground truth (yval).

bestEpsilon = 0;
bestF1 = 0;
F1 = 0;

stepsize = (max(pval) - min(pval)) / 1000;
for epsilon = min(pval):stepsize:max(pval)

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the F1 score of choosing epsilon as the
% threshold and place the value in F1. The code at the
% end of the loop will compare the F1 score for this
% choice of epsilon and set it to be the best epsilon if
% it is better than the current choice of epsilon.
% Note: You can use predictions = (pval < epsilon) to get a binary vector
% of 0's and 1's of the outlier predictions

cvPredictions = (pval < epsilon); % m x 1 binary vector (1 = predicted anomaly)

tp = sum((cvPredictions == 1) & (yval == 1)); % true positives (scalar)
fp = sum((cvPredictions == 1) & (yval == 0)); % false positives (scalar)
fn = sum((cvPredictions == 0) & (yval == 1)); % false negatives (scalar)

prec = tp/(tp+fp);
rec = tp/(tp+fn);

F1 = 2*prec*rec / (prec + rec); % NaN whenever tp == 0, so that epsilon is never selected

% =============================================================

if F1 > bestF1
bestF1 = F1;
bestEpsilon = epsilon;
end

end

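To see what one pass of the selectThreshold loop computes, here is the F1 arithmetic for a single epsilon on a made-up validation set (the six labels and densities are illustrative, not taken from ex8data1.mat). It also shows the degenerate case: when no true anomaly falls below epsilon, tp is 0, F1 comes out as NaN, and the `F1 > bestF1` comparison is false, so that epsilon is simply skipped.

```octave
% Made-up validation labels (1 = anomaly) and densities p(x)
yval = [0; 0; 1; 1; 0; 1];
pval = [0.9; 0.05; 0.01; 0.02; 0.5; 0.3];
epsilon = 0.1;

cvPredictions = (pval < epsilon);               % [0;1;1;1;0;0]
tp = sum((cvPredictions == 1) & (yval == 1));   % 2 (rows 3 and 4)
fp = sum((cvPredictions == 1) & (yval == 0));   % 1 (row 2)
fn = sum((cvPredictions == 0) & (yval == 1));   % 1 (row 6)

prec = tp/(tp+fp);                              % 2/3
rec = tp/(tp+fn);                               % 2/3
F1 = 2*prec*rec / (prec + rec);                 % 2/3

% Degenerate threshold: nothing is flagged, so tp = fp = 0
epsilon0 = 0.001;
pred0 = (pval < epsilon0);
tp0 = sum(pred0 & (yval == 1));
fp0 = sum(pred0 & (yval == 0));
fn0 = sum(~pred0 & (yval == 1));
prec0 = tp0/(tp0+fp0);                          % 0/0 = NaN
rec0 = tp0/(tp0+fn0);                           % 0/3 = 0
F1_0 = 2*prec0*rec0 / (prec0 + rec0);           % NaN
isBetter = F1_0 > 0;                            % false: comparisons with NaN are false
```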

cofiCostFunc.m :

function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, ...
num_features, lambda)
%COFICOSTFUNC Collaborative filtering cost function
% [J, grad] = COFICOSTFUNC(params, Y, R, num_users, num_movies, ...
% num_features, lambda) returns the cost and gradient for the
% collaborative filtering problem.

% Unfold the X and Theta matrices from params
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
num_users, num_features);

% You need to return the following values correctly
J = 0;
X_grad = zeros(size(X)); % Nm x n
Theta_grad = zeros(size(Theta)); % Nu x n

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost function and gradient for collaborative
% filtering. Concretely, you should first implement the cost
% function (without regularization) and make sure it
% matches our costs. After that, you should implement the
% gradient and use the checkCostFunction routine to check
% that the gradient is correct. Finally, you should implement
% regularization.
% Notes: X - num_movies x num_features matrix of movie features
% Theta - num_users x num_features matrix of user features
% Y - num_movies x num_users matrix of user ratings of movies
% R - num_movies x num_users matrix, where R(i, j) = 1 if the
% i-th movie was rated by the j-th user
% You should set the following variables correctly:
% X_grad - num_movies x num_features matrix, containing the
% partial derivatives w.r.t. each element of X
% Theta_grad - num_users x num_features matrix, containing the
% partial derivatives w.r.t. each element of Theta

% Cost and gradients without regularization (multiplying by R masks out unrated entries)
Error = (X*Theta') - Y;

J = (1/2)*sum(sum(Error.^2.*R));

X_grad = (Error.*R)*Theta; % Nm x n
Theta_grad = (Error.*R)'*X; % Nu x n

% Add the regularization terms to the cost and gradients
Reg_term_theta = (lambda/2)*sum(sum(Theta.^2));
Reg_term_x = (lambda/2)*sum(sum(X.^2));

J = J + Reg_term_theta + Reg_term_x;

X_grad = X_grad + lambda*X; % Nm x n
Theta_grad = Theta_grad + lambda*Theta; % Nu x n

% =============================================================

grad = [X_grad(:); Theta_grad(:)];
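The header comments suggest verifying the gradient with checkCostFunction. The sketch below shows the same idea without the course helpers: it rebuilds the vectorized cost and gradient from cofiCostFunc.m on a tiny random problem and compares the gradient with centered finite differences, in the spirit of computeNumericalGradient.m. The problem sizes, `lambda = 1.5`, the step `e = 1e-4`, and the helper names `unpackX`, `unpackT` and `costFn` are illustrative assumptions, not part of the course files.

```octave
% Tiny random problem (sizes chosen only for illustration)
num_movies = 4; num_users = 3; num_features = 2; lambda = 1.5;
Y = round(5*rand(num_movies, num_users));        % made-up ratings
R = double(rand(num_movies, num_users) > 0.3);   % made-up "rated" mask

% Regularized cost as a function of the unrolled parameter vector,
% using the same formulas as cofiCostFunc.m
unpackX = @(p) reshape(p(1:num_movies*num_features), num_movies, num_features);
unpackT = @(p) reshape(p(num_movies*num_features+1:end), num_users, num_features);
costFn = @(p) (1/2)*sum(sum(((unpackX(p)*unpackT(p)') - Y).^2 .* R)) ...
              + (lambda/2)*sum(sum(unpackT(p).^2)) + (lambda/2)*sum(sum(unpackX(p).^2));

params = randn(num_movies*num_features + num_users*num_features, 1);

% Analytic gradient, same expressions as cofiCostFunc.m
X = unpackX(params);
Theta = unpackT(params);
Error = (X*Theta') - Y;
X_grad = (Error.*R)*Theta + lambda*X;
Theta_grad = (Error.*R)'*X + lambda*Theta;
grad = [X_grad(:); Theta_grad(:)];

% Centered finite differences, one parameter at a time
e = 1e-4;
numgrad = zeros(size(params));
for k = 1:numel(params)
  step = zeros(size(params));
  step(k) = e;
  numgrad(k) = (costFn(params + step) - costFn(params - step)) / (2*e);
end

% A correct gradient gives a very small relative difference
relDiff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Relative difference: %g\n', relDiff);
```

Because the cost is quadratic along any single coordinate, the centered difference is exact up to rounding here, so a buggy gradient stands out clearly.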


I tried to provide optimized solutions, such as vectorized implementations, for each assignment. If you think more optimization can be done, please suggest corrections or improvements.
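On the vectorization point: multiplying by R zeroes out every unrated (movie, user) pair, so the one-line cost in cofiCostFunc.m equals the sum the exercise defines over pairs with R(i,j) = 1. The sketch below computes the unregularized cost both ways on made-up 2 x 2 data (the values of X, Theta, Y and R are assumptions for illustration) so the equivalence is easy to check.

```octave
% Made-up example: 2 movies, 2 users, 1 feature
X = [1; 2];          % movie features (num_movies x num_features)
Theta = [3; 0.5];    % user parameters (num_users x num_features)
Y = [4 0;
     5 2];           % ratings; Y(1,2) is only a placeholder because it is unrated
R = [1 0;
     1 1];           % R(i,j) = 1 if movie i was rated by user j

% Vectorized: error matrix, squared, masked, summed
Error = (X*Theta') - Y;
J_vec = (1/2)*sum(sum(Error.^2.*R));

% Loop: explicit sum over rated pairs only
J_loop = 0;
for i = 1:size(Y, 1)
  for j = 1:size(Y, 2)
    if R(i, j) == 1
      J_loop = J_loop + (1/2)*(X(i, :)*Theta(j, :)' - Y(i, j))^2;
    end
  end
end
```

Both give 1.5 here; the vectorized form avoids the double loop, which matters on the full movie dataset.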

Click here to see solutions for all Machine Learning Coursera Assignments.

Feel free to ask your doubts in the comment section. I will try my best to solve them.
If you find this helpful in any way, like, comment on, and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,
-Akshay P. Daga


  1. I want to thank you so much. Dr. Ng did a great job and I am grateful to have you complement this course.

    1. Dr. Ng really did a great job, and I am glad to know that you found my post helpful.
      Thank you.

  2. Excuse me, why do I get a different answer in the last part of this assignment with J = (1/2)*sum(sum(Error.^2.*R)) than when I enter J = (1/2)*sum(sum(Error.*R .^2))?
    I really have no idea, thanks in advance!

    1. In the 1st expression, the error is squared and then multiplied by R.
      In the 2nd expression, R is squared and then multiplied by the error.
      That's the difference, and that's why you are getting different answers.
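The difference comes from operator precedence: `.^` binds more tightly than `.*`, so `Error.*R .^2` is parsed as `Error.*(R.^2)`. Because R contains only 0s and 1s, `R.^2` equals `R`, and the second expression therefore sums the signed errors instead of the squared errors. A small made-up example (the Error and R values are assumptions for illustration) shows the results side by side:

```octave
Error = [-1 0.5;
          1 -1];   % made-up prediction errors
R = [1 0;
     1 1];         % made-up rated mask

J_squared = (1/2)*sum(sum(Error.^2.*R));    % (Error.^2).*R: squared errors, 1.5
J_signed  = (1/2)*sum(sum(Error.*R .^2));   % Error.*(R.^2) = Error.*R: signed errors, -0.5
J_paren   = (1/2)*sum(sum((Error.*R).^2));  % explicit parentheses, also 1.5
```

Writing `(Error.*R).^2` with explicit parentheses also works, since squaring a 0/1 mask leaves it unchanged.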

  3. Submission failed: unexpected error: Unrecognized function or variable 'm'.
