Coursera: Neural Networks and Deep Learning (Week 3) Quiz [MCQ Answers] - deeplearning.ai

by Akshay Daga (APDaga) - June 07, 2021

▸ Shallow Neural Networks:

Which of the following are true? (Check all that apply.)

$\boldsymbol{X}$ is a matrix in which each column is one training example.
Correct

$a^{[2](12)}$ denotes the activation vector of the 12th layer on the 2nd training example.

$a^{[2]}_{4}$ is the activation output of the 2nd layer for the 4th training example.

$\boldsymbol{a^{[2](12)}}$ denotes the activation vector of the 2nd layer for the 12th training example.
Correct

$\boldsymbol{a^{[2]}_{4}}$ is the activation output by the 4th neuron of the 2nd layer.
Correct

$X$ is a matrix in which each row is one training example.

$\boldsymbol{a^{[2]}}$ denotes the activation vector of the 2nd layer.
Correct
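
To make the notation concrete, here is a minimal NumPy sketch. The sizes (3 features, 20 examples, 5 units in layer 2) and the random values are assumptions chosen only for illustration; `A2` stands in for the layer-2 activations of all examples stacked column by column.

```python
import numpy as np

rng = np.random.default_rng(0)

n_x, m = 3, 20                      # hypothetical sizes: 3 features, 20 examples
X = rng.standard_normal((n_x, m))   # each COLUMN of X is one training example

n_2 = 5                             # hypothetical number of units in layer 2
A2 = rng.standard_normal((n_2, m))  # stand-in for layer-2 activations, one column per example

x_12 = X[:, 11]    # 12th training example (0-based index 11)
a2_12 = A2[:, 11]  # a^{[2](12)}: layer-2 activation vector for the 12th example
a2_4 = A2[3, :]    # a^{[2]}_4: output of the 4th neuron of layer 2, for every example

print(x_12.shape, a2_12.shape, a2_4.shape)
```

The superscript in round brackets selects a column (an example), while the subscript selects a row (a neuron).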

The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

True
Correct
Yes. As seen in lecture, the output of tanh lies between -1 and 1, so it centers the data around zero, which makes learning simpler for the next layer.

False
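
The centering claim can be checked numerically. The sketch below assumes zero-mean, standard-normal pre-activations, which is an illustrative assumption and not something the quiz specifies.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)   # assumed zero-mean pre-activations

tanh_out = np.tanh(z)
sigmoid_out = 1.0 / (1.0 + np.exp(-z))

# tanh outputs average close to 0, sigmoid outputs average close to 0.5
print("mean of tanh outputs:   ", tanh_out.mean())
print("mean of sigmoid outputs:", sigmoid_out.mean())
```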

Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 ≤ l ≤ L?

$\boldsymbol{Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}}$
$\boldsymbol{A^{[l]} =g^{[l]}(Z^{[l]})}$
Correct

$Z^{[l]} = W^{[l-1]} A^{[l]} + b^{[l-1]}$
$A^{[l]} =g^{[l]}(Z^{[l]})$

$Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$
$A^{[l+1]} =g^{[l+1]}(Z^{[l]})$

$Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$
$A^{[l+1]} =g^{[l]}(Z^{[l]})$
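
As a rough sketch of the correct option, the two equations map directly onto NumPy. The helper name `layer_forward`, the layer sizes, and the choice of tanh are assumptions made for this example.

```python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """Forward step for one layer: Z = W A_prev + b, then A = g(Z)."""
    Z = W @ A_prev + b   # b has shape (n_l, 1) and broadcasts across the m columns
    A = g(Z)
    return Z, A

rng = np.random.default_rng(1)
n_prev, n_l, m = 3, 4, 5                      # hypothetical layer sizes and batch size
A_prev = rng.standard_normal((n_prev, m))     # A^{[l-1]}
W = rng.standard_normal((n_l, n_prev)) * 0.01 # W^{[l]}
b = np.zeros((n_l, 1))                        # b^{[l]}

Z, A = layer_forward(A_prev, W, b, np.tanh)
print(Z.shape, A.shape)
```

Note that layer l consumes the activations of layer l-1 and uses its own parameters and activation function, which is what distinguishes the correct option from the others.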

You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

ReLU

Leaky ReLU

sigmoid
Correct
Yes. Sigmoid outputs a value between 0 and 1 which makes it a very good choice for binary classification. You can classify as 0 if the output is less than 0.5 and classify as 1 if the output is more than 0.5. It can be done with tanh as well but it is less convenient as the output is between -1 and 1.

tanh
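
A small sketch of the thresholding rule described in the explanation. The pre-activation values in `z_out` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical output-layer pre-activations for four examples
z_out = np.array([[-2.0, -0.1, 0.3, 4.0]])

a_out = sigmoid(z_out)               # values strictly between 0 and 1
y_pred = (a_out > 0.5).astype(int)   # 1 = cucumber, 0 = watermelon

print(y_pred)  # [[0 0 1 1]]
```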

Consider the following code:
```
A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)
```
What will be B.shape? (If you’re not sure, feel free to run this in Python to find out.)

(4, 1)
Correct
Yes, we use (keepdims = True) to make sure that B.shape is (4, 1) and not (4, ). It makes our code more rigorous.

(4, )

(, 3)

(1, 3)
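
The difference is easy to see by running both variants side by side. The names `B_keep` and `B_flat` are introduced here only for the comparison.

```python
import numpy as np

A = np.random.randn(4, 3)

B_keep = np.sum(A, axis=1, keepdims=True)  # row sums kept as a column vector
B_flat = np.sum(A, axis=1)                 # row sums collapsed to a rank-1 array

print(B_keep.shape)  # (4, 1)
print(B_flat.shape)  # (4,)

# The (4, 1) result broadcasts cleanly against the (4, 3) matrix, row by row
centered = A - B_keep
print(centered.shape)  # (4, 3)
```

With the rank-1 version, `A - B_flat` would fail to broadcast, since a trailing dimension of 3 does not match 4.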

Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
Correct

Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.

Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.

The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
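
The symmetry problem can be reproduced in a few lines. This is a minimal sketch under stated assumptions: a 2-4-1 network, sigmoid units in both layers, cross-entropy loss, synthetic data, and an arbitrary learning rate. None of these details come from the quiz itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 51))               # hypothetical data: 2 features, 51 examples
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)  # hypothetical labels, shape (1, 51)
m = X.shape[1]

# All weights and biases start at zero
W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))
alpha = 0.5  # learning rate (arbitrary choice)

for _ in range(50):
    # Forward pass
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass for the cross-entropy loss
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1.0 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Gradient descent update
    W1 -= alpha * dW1
    b1 -= alpha * db1
    W2 -= alpha * dW2
    b2 -= alpha * db2

print(W1)                       # every row is the same
print(np.allclose(W1, W1[0]))   # True: the 4 hidden neurons never diverge
```

Every hidden neuron receives the same gradient at every step, so the rows of `W1` stay identical no matter how many iterations run.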

Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?

True

False
Correct
Yes. Logistic regression doesn't have a hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression gives z = 0 (a sigmoid output of 0.5), but the derivatives of logistic regression depend on the input x (because there is no hidden layer), which is not zero. So at the second iteration, the weight values follow x's distribution and differ from each other if x is not a constant vector.
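
A single gradient step from zero weights illustrates this. The tiny dataset and the learning rate of 0.1 are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: 2 features, 3 examples (one column per example)
X = np.array([[1.0,  2.0, -1.0],
              [0.5, -1.5,  3.0]])
Y = np.array([[1.0, 0.0, 1.0]])
m = X.shape[1]

w = np.zeros((2, 1))   # zero initialization
b = 0.0

A = sigmoid(w.T @ X + b)   # every prediction is 0.5 on the first pass
dw = X @ (A - Y).T / m     # gradient depends on X, so its entries differ
w = w - 0.1 * dw           # one gradient descent step

print(A)
print(w.ravel())           # the two weights are already different
```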

You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.

It doesn’t matter. So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.

This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
Correct
Yes. tanh becomes flat for large values, which leads its gradient to be close to zero. This slows down the optimization algorithm.
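
The saturation effect can be seen by comparing the local tanh gradient, 1 - tanh(z)^2, under large and small initializations. The input data, layer sizes, and the 0.01 scale used for comparison are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100))            # hypothetical inputs: 2 features, 100 examples

W_large = rng.standard_normal((4, 2)) * 1000 # the initialization from the question
W_small = rng.standard_normal((4, 2)) * 0.01 # a small initialization for comparison

def tanh_grad(W, X):
    """Derivative of tanh evaluated at Z = W X, i.e. 1 - tanh(Z)^2."""
    A = np.tanh(W @ X)
    return 1.0 - A ** 2

g_large = tanh_grad(W_large, X)
g_small = tanh_grad(W_small, X)

print("mean tanh gradient, large init:", g_large.mean())
print("mean tanh gradient, small init:", g_small.mean())
```

With the large initialization almost every unit sits on the flat part of tanh, so very little gradient flows back through the hidden layer.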

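Consider the following 1 hidden layer neural network (figure not shown; the answers marked correct below imply 2 input features, 4 hidden units, and 1 output unit):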

Which of the following statements are True? (Check all that apply).

$W^{[1]}$ will have shape (2, 4)

$\boldsymbol{b^{[1]}}$ will have shape (4, 1)
Correct

$\boldsymbol{W^{[1]}}$ will have shape (4, 2)
Correct

$b^{[1]}$ will have shape (2, 1)

$\boldsymbol{W^{[2]}}$ will have shape (1, 4)
Correct

$b^{[2]}$ will have shape (4, 1)

$W^{[2]}$ will have shape (4, 1)

$\boldsymbol{b^{[2]}}$ will have shape (1, 1)
Correct
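
The shapes follow the rule that W for a layer is (units in this layer, units in the previous layer) and b is (units in this layer, 1). The sketch below assumes the layer sizes implied by the answers marked correct: 2 inputs, 4 hidden units, 1 output. The 0.01 scaling is a common convention, not something the question states.

```python
import numpy as np

n_x, n_h, n_y = 2, 4, 1   # sizes implied by the correct answers (assumption)

W1 = np.random.randn(n_h, n_x) * 0.01   # (4, 2)
b1 = np.zeros((n_h, 1))                 # (4, 1)
W2 = np.random.randn(n_y, n_h) * 0.01   # (1, 4)
b2 = np.zeros((n_y, 1))                 # (1, 1)

for name, p in [("W1", W1), ("b1", b1), ("W2", W2), ("b2", b2)]:
    print(name, p.shape)
```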

In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$ ?

$Z^{[1]}$ and $A^{[1]}$ are (1,4)

$Z^{[1]}$ and $A^{[1]}$ are (4,1)

$\boldsymbol{Z^{[1]}}$ and $\boldsymbol{A^{[1]}}$ are (4,m)
Correct

$Z^{[1]}$ and $A^{[1]}$ are (4,2)
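
The second dimension is the number of training examples m, not a property of the network. A short sketch, assuming 2 inputs and 4 hidden units with tanh activation, shows the shape tracking m:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 2)) * 0.01   # assumed: 4 hidden units, 2 input features
b1 = np.zeros((4, 1))

shapes = {}
for m in (1, 7):                          # two hypothetical batch sizes
    X = rng.standard_normal((2, m))
    Z1 = W1 @ X + b1                      # (4, 2) @ (2, m) -> (4, m)
    A1 = np.tanh(Z1)
    shapes[m] = (Z1.shape, A1.shape)
    print(m, Z1.shape, A1.shape)
```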

--------------------------------------------------------------------------------

Feel free to ask questions in the comment section. I will try my best to answer them.

If you find this helpful in any way, like, comment, and share the post.

This is the simplest way to encourage me to keep doing such work.

Thanks & Regards,
- APDaga DumpBox
