It’s not that “deep networks” weren’t around in the 1960s; the problem was how to train them. In the 1970s, backpropagation was “invented” or rediscovered – I don’t want to cite a single source here so as not to offend any of the parties involved, since this is a sensitive topic these days … In any case, the problem was the “vanishing gradient”: when gradient-based methods were used to learn the weights, the error signal shrank with every layer it was propagated back through, so the lower layers barely learned anything. In practice, it was observed that there was no gain in going beyond 1–3 hidden layers.
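To make the vanishing-gradient problem concrete, here is a minimal NumPy sketch (the depth, layer width, and weight scale are arbitrary toy values for illustration): it pushes an input through a stack of sigmoid layers, backpropagates a unit error signal, and prints the gradient norm at each layer. Since the sigmoid derivative is at most 0.25, the signal shrinks roughly geometrically with depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_layers, dim = 20, 50  # toy values; deep enough to show the effect
weights = [rng.normal(0, 1.0 / np.sqrt(dim), (dim, dim)) for _ in range(n_layers)]

# Forward pass: store each layer's activations for the backward pass.
a = rng.normal(size=dim)
activations = []
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: sigmoid'(z) = a * (1 - a) <= 0.25, so the gradient
# norm shrinks layer by layer as we move toward the input.
grad = np.ones(dim)  # unit error signal at the output
for W, a in zip(reversed(weights), reversed(activations)):
    grad = W.T @ (grad * a * (1 - a))
    print(f"gradient norm: {np.linalg.norm(grad):.3e}")
```

Running this, the printed norms drop by many orders of magnitude before the signal reaches the first layers, which is exactly why adding more sigmoid layers yielded no gain.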

So, back to the question: deep network structures existed, but it was hard, if not impossible, to train them appropriately. I’d say the two main reasons why this field has experienced such a leap in recent years are:

  1. the availability of computing resources
  2. clever ideas for pre-training neural networks

The second point is what deep learning is all about: in a nutshell, we pre-train our deep neural networks using unsupervised learning before fine-tuning them on the supervised task, but a full discussion goes beyond the scope of this question …
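Still, a rough sketch may help make the idea concrete. Below is a minimal, illustrative take on greedy layer-wise pre-training using simple autoencoders in PyTorch (the classic formulation used restricted Boltzmann machines; the data, layer sizes, and training loop here are made-up toy values, not a definitive recipe): each layer is trained to reconstruct the representation produced by the layers before it, and the resulting encoders would then initialize a deep network for supervised fine-tuning.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Toy stand-in for unlabeled data: 256 samples, 64 features.
X = torch.randn(256, 64)

layer_sizes = [64, 32, 16]  # arbitrary illustrative architecture
pretrained_encoders = []

inputs = X
for in_dim, out_dim in zip(layer_sizes[:-1], layer_sizes[1:]):
    # Train one layer as an autoencoder: encode, then reconstruct
    # the layer's own input through a throwaway decoder.
    encoder = nn.Linear(in_dim, out_dim)
    decoder = nn.Linear(out_dim, in_dim)
    opt = optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2
    )
    for _ in range(200):
        opt.zero_grad()
        codes = torch.sigmoid(encoder(inputs))
        loss = nn.functional.mse_loss(decoder(codes), inputs)
        loss.backward()
        opt.step()
    pretrained_encoders.append(encoder)
    # Freeze this layer and feed its codes to the next one.
    with torch.no_grad():
        inputs = torch.sigmoid(encoder(inputs))

# The pretrained encoders would then initialize a deep network
# that is fine-tuned end-to-end on the supervised task.
```

Because each layer is trained on a well-behaved, shallow objective, the weights start in a sensible region, which is what made deep architectures trainable before better initializations and activations (e.g., ReLUs) largely replaced this procedure.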




If you like this content and are looking for similar, more polished Q&As, check out my new book, Machine Learning Q and AI.