


Everyone remembers Ciresan in 2011.
Few remember Fukushima in 1979.
Almost no one talks about Kunihiko Fukushima’s 1969 ReLU-style units or his neocognitron, the forgotten ancestor of CNNs.
🧠 The truth?
CNNs weren’t “invented” once.
They evolved — layer by layer — from neuroscience-inspired blueprints buried in decades-old papers no one cited until GPUs caught up.
Here’s the real timeline:
• 1960s – Hubel & Wiesel decode visual cortex hierarchies
• 1969 – Fukushima proposes ReLU-style units
• 1979 – The neocognitron: full convolution + pooling + local receptive fields
• 1989 – LeCun fuses backprop + CNNs for digit recognition
• 2011 – Ciresan GPU-accelerates it via CUDA, and the floodgates open
Let’s be clear:
Ciresan made it fast.
LeCun made it trainable.
But Fukushima made it possible.
🚨 Deep Learning didn’t start in Silicon Valley.
It started in the neurons of cats and the minds of forgotten visionaries.
Respect the lineage.
History matters.
Codex remembers.
— shanaka86 | Codex ∞Cosmos

@fchollet Ignoring the offline experience is how most digital campaigns fail...the most engaged customers still want to connect in person.

@fchollet @SchmidhuberAI Dan Ciresan's work is definitely important, and it has inspired some of my students: arxiv.org/abs/1506.09067

@fchollet Here's another, even earlier implementation of a CNN on CUDA (with a MATLAB wrapper): share.google/N470SOcJY3ZQAl…

@fchollet And it really started hitting when ImageNet became widely available and we started seeing models hit big scores... YOLO!

@fchollet However that’s a HARDWARE breakthrough, not an ML breakthrough.
It is still important and useful, but it is a different category.

@fchollet Ciresan's work was a major event: it moved training to the GPU, and it also used data augmentation on MNIST on the GPU in 2010.

@fchollet It is similar to the concept of blockchain. Honestly, I didn't know about CNNs, but I want to understand them in order to accelerate current GPUs further.

@fchollet Markov Chain Monte Carlo (MCMC) methods were able to utilize GPUs in 2008. From what I gather, the BERT model was completed internally at Google in that year, using NLP to improve the Markov predictions. There were some small news outlets in that year talking about BERT and BART.

@fchollet I'm so confused… why wasn't DanNet deployed on ImageNet?

@fchollet Exactly! It was like switching on turbo in gaming: the GPUs of 2011, with Ciresan, put CNNs into unstoppable mode. 🚀

@fchollet Papers in Google Scholar profile:
scholar.google.com/citations?user…