The canonical example is AlexNet (2012) by Krizhevsky, Sutskever, and Hinton [1]. However, despite this common belief, Ciresan et al. from Schmidhuber’s lab had already published the successful training of convolutional neural networks (CNNs) on GPUs one year before AlexNet, in “Flexible, High Performance Convolutional Neural Networks for Image Classification” [2].

Note that, according to the paper mentioned above, CNN training on GPUs goes back even further, to the works of Chellapilla et al. (2006) [3], Uetz and Behnke (2009) [4], and Strigl et al. (2010) [5]. However, these earlier GPU implementations were largely hard-coded and less flexible; for example, they did not support online stochastic gradient descent, that is, updating the weights after each training image.
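To make the distinction concrete, here is a minimal sketch of online stochastic gradient descent, illustrated with a simple logistic-regression model rather than a CNN. The function name, data, and hyperparameters are illustrative assumptions, not taken from any of the papers above; the point is only that the weights are updated immediately after every single example (batch size 1) instead of after a batch.

```python
import numpy as np

def online_sgd(X, y, lr=0.5, epochs=50):
    """Online SGD: update the weights after *each* example (batch size 1),
    in contrast to batch or minibatch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):           # one example (image) at a time
            z = xi @ w + b
            p = 1.0 / (1.0 + np.exp(-z))   # logistic prediction
            grad = p - yi                  # gradient of the log loss w.r.t. z
            w -= lr * grad * xi            # update immediately ...
            b -= lr * grad                 # ... before seeing the next example
    return w, b

# Toy linearly separable data: the label equals the first feature
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])
w, b = online_sgd(X, y)
preds = (X @ w + b > 0).astype(int)
```

A minibatch variant would instead accumulate the gradients over several examples and apply a single averaged update, which maps more naturally onto GPU-friendly matrix multiplications.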

References

[1] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems 25 (2012).

[2] Ciresan, Dan Claudiu, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber. “Flexible, high performance convolutional neural networks for image classification.” In Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2011.

[3] Chellapilla, Kumar, Sidd Puri, and Patrice Simard. “High performance convolutional neural networks for document processing.” In International Workshop on Frontiers in Handwriting Recognition, 2006.

[4] Uetz, Rafael, and Sven Behnke. “Large-scale object recognition with CUDA-accelerated hierarchical neural networks.” In 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, vol. 1, pp. 536–541. IEEE, 2009.

[5] Strigl, Daniel, Klaus Kofler, and Stefan Podlipnig. “Performance and scalability of GPU-based convolutional neural networks.” In 18th Euromicro Conference on Parallel, Distributed, and Network-Based Processing, 2010.

If you like this content and are looking for similar, more polished Q&As, check out my new book Machine Learning Q and AI.