Just finished reading the book “The Master Algorithm”, where the author tries to find the ultimate Machine Learning algorithm that can solve different varieties of problems (text, image, predictive, time series etc)
In the book, he goes over the 5 main branches (or tribes) of Machine Learning. They are:
- The Evoluntionaries
- The Connectionist
- The Symbolist
- The Bayesians
- The Analogizers
Lets look at what the 5 tribes have to offer, and their each individual “Master Algorithm”
Master Algorithm: Genetic Algorithm
Loss Function: Fitness Function
Optimization Function: Genetic Search
This tribe tries to represent learning by mimicking evolution, where the it uses survival-of-the-fittest idea to pick the best model.
Starting with weak and not-fit models, the models go through different biologically inspired evolutionary stages: Mutation, Cross Over and Selection
Evolutionaries focus on learning the structure of the model, and not so much of the parameters (something which Connectionist with neural networks focus on)
Master Algorithm: Neural Networks
Loss Function: RMSE
Optimization Function: Gradient Descent and Back Propagation
The Connectionist represent learning by modelling it after the brain and the neuron connections inside them. Dendrites, Axons and the Cell made up the Neural Network.
In Neural Networks, we are trying to learn the weights of the connections between the cells, and not the structure of the neural network itself (Perhaps there could be a combination of GA and NN, where GA finds the structure, and NN finds the weights)
It follows the notion that a concept can be represented by firing various neurons, and concepts that are similar to each other will cause the same set of neurons firing.
Master Algorithm: Logic
Loss Function: Accuracy
Optimization Function: Inverse Deduction
Symbolist believe that all intelligence can be reduced to symbols, or rules. They build rule base systems, which obviously have huge limitations, because we can’t possibly convert all intelligence and edge cases to a hard coded rule. The moment the rule encounters just a slightest bit of noise, it fails to perform.
Also, real life scenarios and concepts are seldom defined in black and white rules, and are usually stochastic and in the grey area.
In some cases, they can perform quite well. A model that Symbolist use are decision trees, which are huge tree that have if-else rules at each branch.
Master Algorithm: Graphical Models
Loss Function: Posterior Probability
Optimization Function: Probabilistic Inference
As opposed to frequentist, which gives no value to prior beliefs, Bayesians hold prior beliefs at the center of their model.
The Bayesian works on conditional probability, or Bayes Theorem, that tells the probability of event A happening given that B has occurs
During the training phase, the model takes the training data to build a list of conditional probabilities. During testing, given certain features, the model will use the conditional probabilities to perform predictions.
A Naive Bayes algorithm is one that assumes that all probabilities are independent of each other. Although it is not a valid assumption in the real world, it still performs quite well.
Master Algorithm: Support Vector Machines
Loss Function: Margin between Support Vectors
Optimization Function: Constrained Optimization
This tribe uses similarities among various data points to categorize them to distinct classes. Based on the similarity between data points, we can relate them together.
Models in this tribe are K-Nearest-Neighbor and Support Vector Machines.
The book tries to amalgamate ideas together to form the Master Equation.
We could use posterior probability as the evaluation function, and genetic search coupled with gradient descent as the optimizer