What are the origins of machine learning?

I think it all started with the McCulloch-Pitts (MCP) neuron, an early model of how a neuron in a mammal’s brain might work (W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943).

Note that other methods, such as linear regression, had already been invented (F. Galton. Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, pages 246–263, 1886). Here, I would like to distinguish between ML and statistics in terms of how ML evolved. I see ML as a field that emerged from artificial intelligence research, hence my starting point, the MCP neuron. However, ML is deeply intertwined with statistics. For example, I would describe a linear regression analysis based on the closed-form solution (the normal equation) primarily as a technique from the statistics field, whereas I would consider linear regression fit via stochastic gradient descent an ML technique. I think the early goal in ML was to have an algorithm “learn” a function by itself rather than solving an equation analytically.
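To make that distinction concrete, here is a minimal Python sketch that fits the same one-feature linear model y = w*x + b in two ways: analytically, via the closed-form least-squares solution (which, for a single feature, reduces to w = cov(x, y) / var(x)), and iteratively, via stochastic gradient descent on the squared error. The function names, the toy data, and the hyperparameters (learning rate, number of epochs, random seed) are illustrative assumptions, not taken from any of the cited works.

```python
import random


def fit_closed_form(xs, ys):
    """Ordinary least squares for one feature, solved analytically.
    For y = w*x + b the normal equations reduce to
    w = cov(x, y) / var(x) and b = mean(y) - w * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def fit_sgd(xs, ys, lr=0.01, epochs=500, seed=0):
    """Same model, but the weights are "learned" iteratively with
    stochastic gradient descent, one training example at a time.
    lr, epochs, and seed are illustrative choices, not tuned values."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]  # prediction minus target
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Hypothetical toy data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

print("closed form:", fit_closed_form(xs, ys))
print("SGD:        ", fit_sgd(xs, ys))
```

On this toy data, both approaches should end up at nearly the same line. The difference lies in how they get there: the closed-form version solves an equation once, while the SGD version repeatedly nudges its weights in the direction that reduces the error on individual examples.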

So, I would say that the first ML algorithm really was the perceptron (F. Rosenblatt. The perceptron, a perceiving and recognizing automaton (Project PARA). Cornell Aeronautical Laboratory, 1957). It was followed by the gradient descent algorithm used in adaptive linear neurons (B. Widrow. Adaptive “Adaline” neuron using chemical “memistors”. Technical Report 1553-2. Stanford Electron. Labs., Stanford, CA, October 1960). These single learning units were later connected into multi-layer architectures, which led to the multi-layer perceptron in the second half of the 20th century.
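The step from the perceptron to Adaline comes down to what the error is computed from. The perceptron rule uses the thresholded class prediction. Adaline uses the continuous net input and performs gradient descent on the squared error. The Python sketch below contrasts the two on the logical AND problem. It is a simplified illustration, not Rosenblatt's or Widrow's original formulation. The function names, the AND dataset, the {-1, 1} labels, and the hyperparameters are assumptions chosen for illustration. Also, Widrow's LMS rule updated the weights after each example, whereas this sketch uses full-batch gradient descent for simplicity.

```python
def net_input(w, b, x):
    """Weighted sum of the inputs plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Unit-step threshold: class 1 if the net input is >= 0, else -1."""
    return 1 if net_input(w, b, x) >= 0.0 else -1


def train_perceptron(X, y, lr=1.0, epochs=10):
    """Perceptron-style rule: the error is computed from the thresholded
    prediction, so weights change only when an example is misclassified.
    With zero initialization, lr merely rescales the weights."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            update = lr * (target - predict(w, b, x))  # 0 if correct, +/-2*lr if wrong
            w = [wi + update * xi for wi, xi in zip(w, x)]
            b += update
    return w, b


def train_adaline(X, y, lr=0.5, epochs=200):
    """Adaline-style rule: gradient descent on 0.5 * mean squared error of
    the continuous net input (no threshold during training).
    Full-batch updates are used here for simplicity."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = [target - net_input(w, b, x) for x, target in zip(X, y)]
        # The negative gradient w.r.t. each weight is the mean of error * input.
        w = [wi + lr * sum(e * x[j] for e, x in zip(errors, X)) / n
             for j, wi in enumerate(w)]
        b += lr * sum(errors) / n
    return w, b


# Logical AND with class labels in {-1, 1}; linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]

for name, train in [("perceptron", train_perceptron), ("adaline", train_adaline)]:
    w, b = train(X, y)
    print(name, w, b, [predict(w, b, x) for x in X])
```

Both rules should classify all four AND examples correctly. The Adaline weights should also approach the least-squares solution for this data, w = (1, 1) and b = -1.5. Because Adaline minimizes a smooth, continuous cost, it opens the door to the gradient-based training later used for multi-layer networks.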

Likewise, I would place, for example, Fisher’s Linear Discriminant Analysis (R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936) among the developments on the statistics side; note that it predates the term “machine learning” itself.