Coursera: Machine Learning (Week 3) Quiz - Logistic Regression | Andrew NG

▸ Logistic Regression:



  1. Suppose that you have trained a logistic regression classifier, and it outputs on a new example x a prediction h_θ(x) = 0.2. This means (check all that apply):
    • Our estimate for P(y = 1|x; θ) is 0.8.
      h_θ(x) gives P(y = 1|x; θ), not 1 − P(y = 1|x; θ).
    • Our estimate for P(y = 0|x; θ) is 0.8.
      Since we must have P(y = 0|x; θ) = 1 − P(y = 1|x; θ), the former is
      1 − 0.2 = 0.8.
    • Our estimate for P(y = 1|x; θ) is 0.2.
      h_θ(x) is precisely P(y = 1|x; θ), so each is 0.2.
    • Our estimate for P(y = 0|x; θ) is 0.2.
      h_θ(x) is P(y = 1|x; θ), not P(y = 0|x; θ).
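As a quick numerical check of the reasoning above, here is a minimal Python sketch (my own illustration, not from the course); the θ and x values are made up so that h_θ(x) comes out to exactly 0.2:

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical theta and x chosen so that theta^T x = log(0.2 / 0.8),
# which makes h_theta(x) equal exactly 0.2.
theta = np.array([np.log(0.25), 0.0])
x = np.array([1.0, 3.7])   # x_0 = 1 is the intercept term

h = sigmoid(theta @ x)     # h_theta(x) = P(y = 1 | x; theta)
print(round(h, 2))         # 0.2 -> our estimate for P(y = 1 | x; theta)
print(round(1 - h, 2))     # 0.8 -> P(y = 0 | x; theta) = 1 - P(y = 1 | x; theta)
```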






  2. Suppose you have the following training set, and fit a logistic regression classifier h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2).
    [Figure: the training set plotted in the (x1, x2) plane]
    Which of the following are true? Check all that apply.
    • Adding polynomial features (e.g., instead using h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2 + θ_3x_1² + θ_4x_1x_2 + θ_5x_2²)) could increase how well we can fit the training data.
    • At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0.
    • Adding polynomial features (e.g., instead using h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2 + θ_3x_1² + θ_4x_1x_2 + θ_5x_2²)) would increase J(θ) because we are now summing over more terms.
    • If we train gradient descent for enough iterations, for some examples x^(i) in the training set it is possible to obtain h_θ(x^(i)) > 1.
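To make the polynomial-features option concrete, here is a short sketch (assuming the degree-2 feature mapping written in the option above; the parameter values are made up). Note that the output of g(·) always lies strictly between 0 and 1, which is why h_θ(x^(i)) > 1 can never be obtained:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_polynomial_features(x1, x2):
    # Degree-2 expansion matching the option above:
    # [1, x1, x2, x1^2, x1*x2, x2^2]
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

# Hypothetical parameters theta_0 .. theta_5 for the expanded hypothesis.
theta = np.array([-1.0, 0.5, 0.5, 1.0, -2.0, 1.0])

h = sigmoid(theta @ map_polynomial_features(0.3, -0.7))
print(h)  # always in (0, 1): the sigmoid can never output a value above 1
```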






  3. For logistic regression, the gradient is given by ∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i). Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply.
    • θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) (simultaneously update for all j).
    • θ := θ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i).
    • θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i) (simultaneously update for all j).
    • θ_j := θ_j − α (1/m) Σ_{i=1}^{m} ((1/(1 + e^(−θᵀx^(i)))) − y^(i)) x_j^(i) (simultaneously update for all j).
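Below is a minimal sketch of the batch update above, written in Python rather than the course's Octave, with made-up toy data. The vectorized form θ := θ − (α/m) Xᵀ(h − y) updates all θ_j simultaneously:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(theta, X, y, alpha):
    # One batch update: theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij,
    # vectorized so that every theta_j is updated simultaneously.
    m = len(y)
    h = sigmoid(X @ theta)            # h_theta(x^(i)) for every example
    return theta - alpha * (X.T @ (h - y)) / m

# Hypothetical toy data: a column of ones (intercept) plus two features.
X = np.array([[1.0, 0.5, 1.5],
              [1.0, 2.0, 0.3],
              [1.0, 1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0])

theta = np.zeros(3)
for _ in range(100):
    theta = gradient_descent_step(theta, X, y, alpha=0.1)
print(theta)
```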






  4. Which of the following statements are true? Check all that apply.
    • The one-vs-all technique allows you to use logistic regression for problems in which each y^(i) comes from a fixed, discrete set of values.
      If each y^(i) is one of k different values, we can give a label to each and use one-vs-all as described in the lecture.
    • For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc.).
      The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum. We still might use a more advanced optimization algorithm since it can be faster and doesn't require you to select a learning rate.
    • The cost function J(θ) for logistic regression trained with m ≥ 1 examples is always greater than or equal to zero.
      The cost for any single example is always ≥ 0, since it is the negative log of a quantity less than one. The cost function J(θ) is a summation over the cost for each example, so the cost function itself must be greater than or equal to zero.
    • Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).
      We will need 3 classifiers, one for each class.
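To illustrate the "one classifier per class" point, here is a rough one-vs-all sketch (my own, with made-up data): for k = 3 classes we train 3 binary classifiers and predict the class whose classifier is most confident:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.1, iters=500):
    # Plain batch gradient descent for one binary logistic classifier.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

# Hypothetical training set with k = 3 classes (labels 0, 1, 2).
X = np.array([[1.0, 0.2, 0.1],
              [1.0, 0.9, 0.8],
              [1.0, 0.1, 0.9],
              [1.0, 0.8, 0.2]])
y = np.array([0, 1, 2, 2])

# One-vs-all: train one classifier per class (class c vs. everything else).
thetas = {c: train_binary(X, (y == c).astype(float)) for c in (0, 1, 2)}

# Predict the class whose classifier outputs the highest probability.
scores = np.column_stack([sigmoid(X @ thetas[c]) for c in (0, 1, 2)])
print(scores.argmax(axis=1))
```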



  5. Suppose you train a logistic classifier h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2). Suppose θ_0 = 6, θ_1 = −1, θ_2 = 0. Which of the following figures represents the decision boundary found by your classifier?
    • Figure A:
      [Plot: vertical decision boundary at x1 = 6, with the positive (y = 1) region to its left]

      In this figure, the decision boundary is the vertical line x1 = 6, and we predict y = 1 for x1 ≤ 6 (everything to the left of the line), which is what the given values of θ produce.
    • Figure B:
      [Plot]

    • Figure C:
      [Plot]

    • Figure D:
      [Plot]
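You can verify answer A numerically; a small sketch (assuming the usual intercept convention x_0 = 1) shows h_θ(x) crossing 0.5 exactly at x1 = 6, with y = 1 predicted to the left:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([6.0, -1.0, 0.0])   # theta_0 = 6, theta_1 = -1, theta_2 = 0

# x2 is irrelevant here since theta_2 = 0; probe x1 on both sides of 6.
for x1 in (4.0, 6.0, 8.0):
    x = np.array([1.0, x1, 0.0])     # x_0 = 1 is the intercept term
    h = sigmoid(theta @ x)
    print(f"x1 = {x1}: h = {h:.3f}, predict y = {int(h >= 0.5)}")
# x1 = 4 -> h ≈ 0.881 (y = 1); x1 = 6 -> h = 0.5; x1 = 8 -> h ≈ 0.119 (y = 0)
```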



Feel free to ask doubts in the comment section. I will try my best to answer them.
If you find this helpful, please like, comment, and share the post.
That is the simplest way to encourage me to keep doing such work.


Thanks & Regards,
- APDaga DumpBox

12 Comments

  1. Replies
    1. OK. What do you think the correct answer is?

    2. Thanks for the response, but that's simply not possible.
      If you substitute the values of θ1 and θ2 into the equation given in the question, you get x1 = 6. If you plot the line x1 = 6, it will always be parallel to the x2 axis.
      The correct answer is "A" only.

      Please check your question. The only possible explanation is that your question is slightly different from mine, but you didn't read it carefully and copied the answer directly.
      Please check it once again.

    3. No, your answer is wrong. It will be D. Thank me later.

    4. Can you please provide an explanation supporting your answer?
      Thanks in advance.

    5. Hi @Akshay, your answer is correct. I took the test, and the reason why Shivam and Ravi say it's D is because their question is different. (I got the same question that they got, and the answer to that one is D.) Their version swaps the values, with theta1 = 0 and theta2 = 1, in which case D is correct. They read the question wrong.

    6. Thank you Sneha.
      This will help many others.

    7. Is there a Discord server for discussion or asking questions?

  2. The 2nd question has stupid options which don't make much sense without knowing the formula.

  3. theta = [6; -1; 0]

    y = 1 if 6 + (−1)·x1 + (0)·x2 ≥ 0
        ⇔ 6 − x1 ≥ 0
        ⇔ −x1 ≥ −6
        ⇔ x1 ≤ 6
    Therefore the decision boundary is the vertical line x1 = 6, and everything to the left
    of it denotes y = 1, while everything to the right denotes y = 0.

    You don't have to copy somebody's answer; it is in the course notes.



    Replies
    1. Thanks for the detailed explanation.

      So for other viewers, to avoid confusion: the correct answer for Question 5 is "A" only.
