Day 1
Identify the following machine learning task:
In this task, the program is asked to predict a numerical value given some input. To solve it, the learning algorithm is asked to output a function f: ℝⁿ → ℝ:
- A) Classification
- B) Regression
- C) Structured output
Resources
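A regression task can be sketched in a few lines: fit a function f(x) = wx + b to noisy numeric targets by least squares. This is a minimal NumPy-only sketch; the synthetic data and the true slope/intercept are made up for illustration.

```python
import numpy as np

# Synthetic regression data: y = 3x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Solve for [w, b] minimizing squared error via least squares.
X = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(X, y, rcond=None)[0]

print(w, b)  # close to the true slope 3.0 and intercept 1.0
```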
Day 2
Distinguish between these two machine learning tasks:
1 | 2 |
---|---|
In this task, the algorithm is given a new example x, but with some entries of x missing, and must predict the missing values | In this task, the algorithm is given a corrupted example x̃ and must predict the clean example x |
- A) 1 is Denoising, 2 is Imputation
- B) 1 is Synthesis, 2 is Denoising
- C) 1 is Imputation, 2 is Denoising
Resources
Day 3
Which of the following is not a machine learning technique:
- A) Supervised Learning
- B) Unsupervised Learning
- C) Semi-supervised Learning
- D) Reinforcement Learning
- E) Hyper Learning
- F) Representation Learning
Resources
Day 4
The ability of a machine learning model to perform well on previously unobserved inputs is called:
- A) Capacity
- B) Generalization
- C) Overfitting
Resources
Day 5
Define Regularization
- A) Any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error
- B) A technique used to reduce the errors by fitting the function appropriately on the given training set and avoid overfitting
- C) A technique used for tuning the function by adding an additional penalty term in the error function
- D) All of the above
Resources
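Option C's "additional penalty term" is easy to make concrete with L2 regularization (ridge regression), which adds λ‖w‖² to the training objective. A minimal NumPy sketch, with made-up synthetic data; note how the penalty shrinks the learned weights.

```python
import numpy as np

# Synthetic data: only the first feature actually matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
true_w = np.zeros(10)
true_w[0] = 2.0
y = X @ true_w + rng.normal(scale=0.5, size=50)

def ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_unreg = ridge(X, y, 0.0)   # plain least squares
w_reg = ridge(X, y, 10.0)    # penalized: weights pulled toward zero

print(np.linalg.norm(w_unreg), np.linalg.norm(w_reg))
```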
Day 6
Which of the following are true about a validation set?
Value | Description |
---|---|
1 | A validation set contains examples that the training algorithm does not observe |
2 | No example from the test set can be used in the validation set |
3 | A validation set should always be constructed from the training set |
4 | A validation set should always be constructed from the test set |
- A) 1 and 2
- B) 1, 2 and 3
- C) 1 and 4
Resources
Day 7
Which of the following techniques can be used to help compensate for small datasets?
- A) K-Fold cross-validation
- B) Using a simpler classifier that is less susceptible to overfitting
- C) Using ensemble methods, where voting between classifiers may compensate for overfitting
- D) Applying transfer learning
- E) Applying data augmentation
- F) All of the above
Resources
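Option A, K-fold cross-validation, can be written from scratch in a few lines: split the data into k folds and let each fold serve once as the held-out set while the rest is used for training. A minimal sketch with a least-squares linear model and made-up synthetic data.

```python
import numpy as np

def k_fold_indices(n_examples, k, seed=0):
    """Shuffle example indices and split them into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_examples), k)

def cross_val_mse(X, y, k=5):
    """Average held-out mean squared error over k train/test splits."""
    folds = k_fold_indices(len(y), k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit a least-squares linear model on the training folds only.
        w = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)[0]
        errors.append(np.mean((X[test_idx] @ w - y[test_idx]) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)
print(cross_val_mse(X, y))  # small held-out error on this easy problem
```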
Day 8
It is common to say that algorithm A is better than algorithm B if the upper bound of the 95 percent confidence interval for the error of algorithm A is less than the lower bound of the 95 percent confidence interval for the error of algorithm B.
- A) True
- B) False
Resources
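The comparison in the question is straightforward to compute: build a 95% confidence interval for each algorithm's error rate using the normal approximation, mean ± 1.96 × standard error. The per-example 0/1 losses below are simulated, hypothetical data for illustration.

```python
import numpy as np

def error_ci(errors):
    """95% normal-approximation confidence interval for the mean error."""
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    se = errors.std(ddof=1) / np.sqrt(len(errors))
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical per-example 0/1 losses on 1000 test examples.
rng = np.random.default_rng(0)
errs_a = rng.binomial(1, 0.10, size=1000)  # algorithm A: ~10% error
errs_b = rng.binomial(1, 0.25, size=1000)  # algorithm B: ~25% error

lo_a, hi_a = error_ci(errs_a)
lo_b, hi_b = error_ci(errs_b)
# A is "better" in the sense above when its upper bound sits below
# B's lower bound.
print(hi_a < lo_b)
```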
Day 9
Distinguish between these two statistical approaches:
1 | 2 |
---|---|
This approach is based on estimating a single value of the parameter θ, then making all predictions thereafter based on that one estimate | This approach considers all possible values of θ when making a prediction |
- A) 1 Frequentist Statistics, 2 Bayesian Statistics
- B) 1 Bayesian Statistics, 2 Frequentist Statistics
- C) 1 Einstein Statistics, 2 Bayesian Statistics
Resources
Day 10
Unlike logistic regression, the support vector machine (SVM) does not provide probabilities, but only outputs a class identity.
- A) True
- B) False
Resources
Day 11
The category of algorithms that employ the kernel trick is known as:
- A) kernel machines
- B) kernel methods
- C) popcorn methods
- D) Both A and B
Resources
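The kernel trick itself fits in a few lines: a kernel evaluates an inner product in a higher-dimensional feature space without ever constructing that space. A minimal NumPy check for the polynomial kernel k(x, y) = (x·y)² on 2-D inputs, whose explicit feature map is shown alongside; the example vectors are arbitrary.

```python
import numpy as np

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return (x @ y) ** 2

def phi(x):
    """Explicit feature map for (x . y)^2 with x in R^2."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
# The kernel value equals the inner product in feature space.
print(poly_kernel(x, y), phi(x) @ phi(y))  # both 16.0
```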
Day 12
When can supervised learning algorithms be useful?
- A) When computational resources are constrained
- B) When building intuition for more sophisticated learning algorithms
- C) When annotations are reliable and consistent
- D) All of the above
Resources
Day 13
Which of the following could be accomplished with unsupervised learning?
- A) Density estimation
- B) Distribution sampling
- C) Data denoising
- D) Finding a manifold near which the data lies
- E) Clustering into groups
- F) All of the above
Resources
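Option E, clustering, can be illustrated with a from-scratch k-means sketch: alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points, with no labels involved. The two synthetic blobs and the naive initialization (one centroid seeded at each end of the data) are made up for the demo; real implementations use random restarts or k-means++ initialization.

```python
import numpy as np

def kmeans(X, init_centroids, n_iters=20):
    """Plain k-means: assign to nearest centroid, then recompute means."""
    centroids = init_centroids.astype(float).copy()
    k = len(centroids)
    for _ in range(n_iters):
        # Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs; k-means should recover the grouping.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(30, 2)),
               rng.normal(5, 0.3, size=(30, 2))])
labels, _ = kmeans(X, X[[0, -1]])  # naive init: one point per blob
print(labels[:5], labels[-5:])
```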
Day 14
Which of the following techniques is not useful for simplifying data representations:
- A) lower-dimensional representation
- B) sparse representation
- C) dense representation
- D) independent representation
Resources
Day 15
Which of the following are true about Principal Component Analysis?
Value | Description |
---|---|
1 | This algorithm provides a means of compressing data |
2 | It is an unsupervised learning algorithm that learns a representation of data |
3 | It learns a representation that has lower dimensionality than the original input |
4 | It learns a representation whose elements have no linear correlation with each other |
- A) 1 and 2 are true
- B) 1 and 3 are true
- C) All are true
Resources
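The PCA statements above can be checked directly with a minimal NumPy sketch: eigendecompose the covariance matrix of centered data, project onto the leading component to get a lower-dimensional code, and verify that the projected coordinates are linearly uncorrelated. The correlated 2-D data is synthetic.

```python
import numpy as np

# Correlated 2-D data, mostly varying along one direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)  # PCA operates on centered data

# Principal components = eigenvectors of the covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
components = eigvecs[:, ::-1]           # reorder to descending variance

# Project onto the top component: a 1-D compressed code (statement 3).
z = Xc @ components[:, :1]

# Full projection: off-diagonal covariance is ~0 (statement 4).
Z = Xc @ components
print(np.cov(Z.T))
```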
Day 16
To achieve full independence, a representation learning algorithm must also remove the nonlinear relationships between variables.
- A) True
- B) False
Day 17
Which of these are true about p-value?
- A) It can be defined as the probability of obtaining test results at least as extreme as the results actually observed
- B) It represents how likely it is to get a particular result when the null hypothesis is assumed to be true.
- C) It is the probability of getting a sample like ours or more extreme than ours if the null hypothesis is correct
- D) All of the above
Day 18
Which of the following is true regarding singular value decomposition (SVD) and PCA?
- A) SVD is another name for PCA
- B) SVD is an alternative derivation of the principal components
- C) PCA and SVD are unrelated
Resources
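Option B is easy to verify numerically: the right singular vectors of the centered data matrix coincide (up to sign) with the eigenvectors of its covariance matrix, so SVD is an alternative route to the principal components. A minimal NumPy check on synthetic data whose feature scales are chosen so the spectrum is well separated.

```python
import numpy as np

# Synthetic data with clearly distinct variances per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
pc_eig = eigvecs[:, ::-1]  # descending variance order

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_svd = Vt.T              # right singular vectors, descending order

# Same directions up to sign: |pc_eig^T pc_svd| is the identity.
agree = np.allclose(np.abs(pc_eig.T @ pc_svd), np.eye(3), atol=1e-6)
print(agree)
```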
Day 19
Which of the following are advantages of using a sparse representation (e.g., one-hot encoding) in a clustering algorithm?
Value | Description |
---|---|
1 | A natural conveyance of the idea that all examples in the same cluster are similar to each other |
2 | Confers the computational advantage that the entire representation may be captured by a single integer |
- A) 1
- B) 2
- C) 1 and 2
- D) Neither 1 nor 2
Resources
Day 20
Which of the following are common difficulties pertaining to clustering?
Value | Description |
---|---|
1 | There is no single criterion that measures how well a clustering of the data corresponds to the real world |
2 | There may be many different clusterings that all correspond well to some property |
3 | It is possible to obtain a different, equally valid clustering that is not relevant to the task at hand |
4 | It is difficult to measure Euclidean distance from a cluster centroid to the member of the cluster |
- A) 1,3
- B) 1,2,3
- C) 2,4
- D) All
Resources
Day 21
Nearly all of deep learning is powered by one very important algorithm: stochastic gradient descent (SGD).
- A) True
- B) False
Resources
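The algorithm in question fits in a short loop: at each step, sample one example, compute the gradient of the loss on that example alone, and take a small step downhill. A minimal SGD sketch for a linear model with squared error; the data, learning rate, and step count are made up for illustration.

```python
import numpy as np

# Synthetic data: y = x . true_w plus a little noise.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(1000, 2))
y = X @ true_w + rng.normal(scale=0.05, size=1000)

w = np.zeros(2)
lr = 0.02  # learning rate
for step in range(5000):
    i = rng.integers(len(X))                # sample one example
    grad = 2.0 * (X[i] @ w - y[i]) * X[i]   # gradient of (x.w - y)^2
    w -= lr * grad                          # descend along the gradient

print(w)  # close to true_w
```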
Day 22
A recurring problem in machine learning is that large training sets are necessary for good generalization, but large training sets are also more computationally expensive.
- A) True
- B) False