François Chollet@fchollet·
The big breakthrough for convnets was the first GPU-accelerated CUDA implementation, which immediately started winning first place in image classification competitions. Remember when that happened? I do. That was Dan Ciresan in 2011
Jürgen Schmidhuber@SchmidhuberAI

Who invented convolutional neural networks (CNNs)?

1969: Fukushima had CNN-relevant ReLUs [2].
1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than today.
1987: Waibel applied Linnainmaa's 1970 backpropagation [3] to weight-sharing TDNNs with 1-dimensional convolutions [4].
1988: Wei Zhang et al. applied "modern" backprop-trained 2-dimensional CNNs to character recognition [5].
All of the above was published in Japan 1979-1988.
1989: LeCun et al. applied CNNs again to character recognition (zip codes) [6,10].
1990-93: Fukushima’s downsampling based on spatial averaging [1] was replaced by max-pooling for 1-D TDNNs (Yamaguchi et al.) [7] and 2-D CNNs (Weng et al.) [8].
2011: Much later, my team with Dan Ciresan made max-pooling CNNs really fast on NVIDIA GPUs. In 2011, DanNet achieved the first superhuman pattern recognition result [9]. For a while, it enjoyed a monopoly: from May 2011 to Sept 2012, DanNet won every image recognition challenge it entered, 4 of them in a row.

Admittedly, however, this was mostly about engineering & scaling up the basic insights from the previous millennium, profiting from much faster hardware.

Some "AI experts" claim that "making CNNs work" (e.g., [5,6,9]) was as important as inventing them. But "making them work" largely depended on whether your lab was rich enough to buy the latest computers required to scale up the original work. It's the same as today. Basic research vs engineering/development - the R vs the D in R&D.

REFERENCES
[1] K. Fukushima (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979.
[2] K. Fukushima (1969). Visual feature extraction by a multilayered network of analog threshold elements. IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322-333. This work introduced rectified linear units (ReLUs), now used in many CNNs.
[3] S. Linnainmaa (1970). Master's Thesis, Univ. Helsinki, 1970. The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation. (See Schmidhuber's well-known backpropagation overview: "Who Invented Backpropagation?")
[4] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. Backpropagation for a weight-sharing TDNN with 1-dimensional convolutions.
[5] W. Zhang, J. Tanida, K. Itoh, Y. Ichioka. Shift-invariant pattern recognition neural network and its optical architecture. Proc. Annual Conference of the Japan Society of Applied Physics, 1988. First backpropagation-trained 2-dimensional CNN, with applications to English character recognition.
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989. See also Sec. 3 of [10].
[7] K. Yamaguchi, K. Sakamoto, A. Kenji, T. Akabane, Y. Fujimoto. A Neural Network for Speaker-Independent Isolated Word Recognition. First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan, Nov 1990. A 1-dimensional convolutional TDNN using Max-Pooling instead of Fukushima's Spatial Averaging [1].
[8] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. Computer Vision, Berlin, pp. 121-128. A 2-dimensional CNN whose downsampling layers use Max-Pooling (which has become very popular) instead of Fukushima's Spatial Averaging [1].
[9] In 2011, the fast and deep GPU-based CNN called DanNet (7+ layers) achieved the first superhuman performance in a computer vision contest. See overview: "2011: DanNet triggers deep CNN revolution."
[10] How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 14 Dec 2023. See also the YouTube video for the Bower Award Ceremony 2021: J. Schmidhuber lauds Kunihiko Fukushima.

jason@jasonth0·
@fchollet bet there’s some forgotten 80s paper with transformer-like ideas that just needed today’s gpus to shine, history keeps rhyming in ai research
Shanaka Anslem Perera ⚡@shanaka86·
Everyone remembers Ciresan in 2011. Few remember Fukushima in 1979. Almost no one talks about Kunihiko Fukushima’s 1969 ReLU neuron gates or the neocognitron — the forgotten ancestor of CNNs. 🧠

The truth? CNNs weren’t “invented” once. They evolved — layer by layer — from neuroscience-inspired blueprints buried in decades-old papers no one cited until GPUs caught up.

Here’s the real timeline:
• 1960s – Hubel & Wiesel decode visual cortex hierarchies
• 1969 – Fukushima proposes ReLU-style units
• 1979 – The neocognitron: full convolution + pooling + local receptive fields
• 1989 – LeCun fuses backprop + CNNs for digit recognition
• 2011 – Ciresan GPU-accelerates it via CUDA, and the floodgates open

Let’s be clear: Ciresan made it fast. LeCun made it trainable. But Fukushima made it possible. 🚨

Deep Learning didn’t start in Silicon Valley. It started in the neurons of cats and the minds of forgotten visionaries. Respect the lineage. History matters. Codex remembers.

— shanaka86 | Codex ∞Cosmos
Himanshu Kumar@codewithimanshu·
@fchollet Ignoring the offline experience is how most digital campaigns fail...the most engaged customers still want to connect in person.
tuōmo@7uomoki·
@fchollet he mentions all this in the quoted post?
Don J. Rude@_RudeDude·
@fchollet And it really started hitting when ImageNet was widely available and we started seeing models hit big scores... Yolo!
GUT-AI Foundation — AI/acc
@fchollet However that’s a HARDWARE breakthrough, not an ML breakthrough. It is still important and useful, but it is a different category.
octotherp@octotherp139836·
@fchollet GPU-accelerated NNs existed at least since programmable shaders, back in 2002.
Reza Roboubi@RezaRob·
@fchollet Ciresan was a major event, moving to GPU, and also used data augmentation on MNIST/GPU in 2010.
dmsimon@dmsimon·
@fchollet Weren't they instantiating triangles for compute before CUDA? GPGPU.
まえかわ@Takaya Maekawa@takaya_maekawa·
@fchollet It is similar to the concept of block chain. Honestly, I didn't know CNNs, but I have a desire for understanding, to accelerate the current GPU more.
alexinka@_alexinka·
@fchollet But what about OpenCL? No future?
Felix Farquharson@hominghamster·
@fchollet Markov Chain Monte Carlo (MCMC) methods were able to utilize GPUs in 2008. From what I gather, the BERT model was completed internally at Google in that year, using NLP to improve the Markov predictions. There were some small news outlets in that year talking about BERT and BART.
Faturita@faturita·
@fchollet Is it true that they used a PlayStation with a custom Linux kernel?
Frances@AmounTg_m·
@fchollet Ah, the OG days of CNNs—when GPUs were just flexing their muscles and everyone thought "CUDA" was a typo. It’s like the tech world’s version of a superhero origin story.
Miguel Guau!@ai_futures_mh·
@fchollet Exactly! It was like switching on turbo mode in gaming: the GPUs of 2011, with Ciresan, put CNNs in unstoppable mode. 🚀
GMD@Adamski250567·
@fchollet Fei Fei Li’s book and the NVIDIA way are must reads.