# Machine Learning FAQ

It is always a pleasure to engage in discussions about machine learning. Below, I have collected some of the most frequently asked questions that I have answered via email or on social network platforms, in the hope that they are useful to others!

The only thing to do with good advice is to pass it on. It is never of any use to oneself.

— Oscar Wilde

### General Questions about Machine Learning and ‘Data Science’

- What are machine learning and data science?
- Why do you and other people sometimes implement machine learning algorithms from scratch?
- What learning path/discipline in data science should I focus on?
- At what point should one start contributing to open source?
- How important do you think having a mentor is to the learning process?
- Where are the best online communities centered around data science/machine learning or Python?
- How would you explain machine learning to a software engineer?
- What would your curriculum for a machine learning beginner look like?
- What is the definition of data science?
- How do data scientists perform model selection? Is it different from Kaggle?

### Questions about the Machine Learning Field

- How are Artificial Intelligence and Machine Learning related?
- What are some real-world examples of applications of machine learning in the field?
- What are the different fields of study in data mining?
- What are the differences in the nature of research between machine learning and data mining?
- How do I know if the problem is solvable through machine learning?
- What are the origins of machine learning?
- How was classification, as a learning machine, developed?
- Which machine learning algorithms can be considered among the best?
- What are the broad categories of classifiers?
- What is the difference between a classifier and a model?
- What is the difference between a parametric learning algorithm and a nonparametric learning algorithm?
- What is the difference between a cost function and a loss function in machine learning?

### Questions about Machine Learning Concepts and Statistics

##### Cost Functions and Optimization

- Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning – what is the difference?
- How do you derive the Gradient Descent rule for Linear Regression and Adaline?

##### Regression Analysis

##### Tree models

- How does the random forest model work? How is it different from bagging and boosting in ensemble models?
- What are the disadvantages of using the classic decision tree algorithm for a large dataset?
- Why are implementations of decision tree algorithms usually binary, and what are the advantages of the different impurity metrics?
- Why are we growing decision trees via entropy instead of the classification error?
- When can a random forest perform terribly?
- Does random forest select a subset of features for every tree or every node?

##### Model evaluation

- What is overfitting?
- How can I avoid overfitting?
- Is it always better to have the largest possible number of folds when performing cross-validation?
- When training an SVM classifier, is it better to have a large or small number of support vectors?
- How do I evaluate a model?
- What is the best validation metric for multi-class classification?
- What factors should I consider when choosing a predictive model technique?
- What are the best toy datasets to help visualize and understand classifier behavior?
- How do I select SVM kernels?
- Interlude: Comparing and Computing Performance Metrics in Cross-Validation – Imbalanced Class Problems and 3 Different Ways to Compute the F1 Score

##### Logistic Regression

- What is Softmax regression and how is it related to Logistic regression?
- Why is logistic regression considered a linear model?
- What is the probabilistic interpretation of regularized logistic regression?
- Does regularization in logistic regression always result in a better fit and better generalization?
- What is the major difference between naive Bayes and logistic regression?
- What exactly is the “softmax and the multinomial logistic loss” in the context of machine learning?
- What is the relation between Logistic Regression and Neural Networks, and when should each be used?
- Logistic Regression: Why the sigmoid function?
- Is there an analytical solution to Logistic Regression similar to the Normal Equation for Linear Regression?

##### Neural Networks and Deep Learning

- What is the difference between deep learning and usual machine learning?
- Can you give a visual explanation for the backpropagation algorithm for neural networks?
- Why did it take so long for deep networks to be invented?
- What are some good books/papers for learning deep learning?
- Why are there so many deep learning libraries?
- Why do some people hate neural networks/deep learning?
- How can I know if Deep Learning works better for a specific problem than SVM or random forest?
- What is wrong when my neural network’s error increases?
- How do I debug an artificial neural network algorithm?
- What is the difference between a Perceptron, Adaline, and neural network model?
- What is the basic idea behind the dropout technique?
- Is dropout applied before or after the non-linear activation function?

##### Other Algorithms for Supervised Learning

##### Unsupervised Learning

##### Semi-Supervised Learning

##### Ensemble Methods

##### Preprocessing, Feature Selection and Extraction

- Why do we need to re-use training parameters to transform test data?
- What are the different dimensionality reduction methods in machine learning?
- What is the difference between LDA and PCA for dimensionality reduction?
- When should I apply data normalization/standardization?
- Does mean centering or feature scaling affect a Principal Component Analysis?
- How do you attack a machine learning problem with a large number of features?
- What are some common approaches for dealing with missing data?
- What is the difference between filter, wrapper, and embedded methods for feature selection?
- Should the data preparation/pre-processing step be considered a part of feature engineering? Why or why not?
- Is a bag of words feature representation for text classification considered a sparse matrix?

##### Naive Bayes

- Why is the Naive Bayes Classifier naive?
- What is the decision boundary for Naive Bayes?
- Is it possible to mix different variable types in Naive Bayes, for example, binary and continuous features?

##### Other

- What is Euclidean distance in terms of machine learning?
- When should one use the median, as opposed to the mean or average?