Roymen Telvin
1.4K posts

@Muhuri
ムフリ AI Engineer/Researcher. ASI solutions. RL in Jams🎶. “AI ethics is the final frontier”.
Cambridge, England · Joined August 2009
1.5K Following · 106 Followers
Roymen Telvin retweeted

Sugar is now free for diabetics. Enjoy!
Cursor @cursor_ai
Cursor is now free for students. Enjoy!
Roymen Telvin retweeted

Who invented convolutional neural networks (CNNs)?
1969: Fukushima had CNN-relevant ReLUs [2].
1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than today.
1987: Waibel applied Linnainmaa's 1970 backpropagation [3] to weight-sharing TDNNs with 1-dimensional convolutions [4].
1988: Wei Zhang et al. applied "modern" backprop-trained 2-dimensional CNNs to character recognition [5].
All of the above was published in Japan 1979-1988.
1989: LeCun et al. applied CNNs again to character recognition (zip codes) [6,10].
1990-93: Fukushima’s downsampling based on spatial averaging [1] was replaced by max-pooling for 1-D TDNNs (Yamaguchi et al.) [7] and 2-D CNNs (Weng et al.) [8].
2011: Much later, my team with Dan Ciresan made max-pooling CNNs really fast on NVIDIA GPUs. In 2011, DanNet achieved the first superhuman pattern recognition result [9]. For a while, it enjoyed a monopoly: from May 2011 to Sept 2012, DanNet won every image recognition challenge it entered, 4 of them in a row. Admittedly, however, this was mostly about engineering & scaling up the basic insights from the previous millennium, profiting from much faster hardware.
Some "AI experts" claim that "making CNNs work" (e.g., [5,6,9]) was as important as inventing them. But "making them work" largely depended on whether your lab was rich enough to buy the latest computers required to scale up the original work. It's the same as today. Basic research vs engineering/development - the R vs the D in R&D.
REFERENCES
[1] K. Fukushima (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979.
[2] K. Fukushima (1969). Visual feature extraction by a multilayered network of analog threshold elements. IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322-333. This work introduced rectified linear units (ReLUs), now used in many CNNs.
[3] S. Linnainmaa (1970). Master's Thesis, Univ. Helsinki, 1970. The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation. (See Schmidhuber's well-known backpropagation overview: "Who Invented Backpropagation?")
[4] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. Backpropagation for a weight-sharing TDNN with 1-dimensional convolutions.
[5] W. Zhang, J. Tanida, K. Itoh, Y. Ichioka. Shift-invariant pattern recognition neural network and its optical architecture. Proc. Annual Conference of the Japan Society of Applied Physics, 1988. First backpropagation-trained 2-dimensional CNN, with applications to English character recognition.
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989. See also Sec. 3 of [10].
[7] K. Yamaguchi, K. Sakamoto, A. Kenji, T. Akabane, Y. Fujimoto. A Neural Network for Speaker-Independent Isolated Word Recognition. First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan, Nov 1990. A 1-dimensional convolutional TDNN using Max-Pooling instead of Fukushima's Spatial Averaging [1].
[8] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. Computer Vision, Berlin, pp. 121-128. A 2-dimensional CNN whose downsampling layers use Max-Pooling (which has become very popular) instead of Fukushima's Spatial Averaging [1].
[9] In 2011, the fast and deep GPU-based CNN called DanNet (7+ layers) achieved the first superhuman performance in a computer vision contest. See overview: "2011: DanNet triggers deep CNN revolution."
[10] How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 14 Dec 2023. See also the YouTube video for the Bower Award Ceremony 2021: J. Schmidhuber lauds Kunihiko Fukushima.
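
The two downsampling schemes contrasted in the 1990-93 entry above (spatial averaging vs max-pooling) can be illustrated with a minimal sketch. The 4x4 feature map, the 2x2 non-overlapping window, and the helper names `pool2x2` and `mean` are assumptions made for illustration; they are not taken from any of the cited papers.

```python
def pool2x2(image, reduce_fn):
    """Downsample a 2-D list by applying reduce_fn to each non-overlapping 2x2 window."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce_fn([image[r][c], image[r][c + 1],
                       image[r + 1][c], image[r + 1][c + 1]])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def mean(values):
    """Arithmetic mean of a non-empty list (spatial averaging over one window)."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map, chosen only to make the difference visible.
feature_map = [
    [1, 3, 0, 0],
    [5, 7, 0, 8],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]

avg_pooled = pool2x2(feature_map, mean)  # spatial averaging
max_pooled = pool2x2(feature_map, max)   # max-pooling
```

In the top-right window a single strong activation (8) survives max-pooling unchanged, while averaging dilutes it to 2.0.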

Roymen Telvin retweeted

Introducing: Manus Data Visualization
Say goodbye to spreadsheet chaos.
Whether you’re analyzing competitors, prepping for a client meeting, or deep-diving into a market trend,
Manus makes it effortless to:
✅ Turn messy data into clean, interactive charts
✅ Skip the hassle of pivot tables and clunky chart builders
✅ Create presentation-ready visuals tailored to your goals
Just upload your raw dataset, describe what you need, and let Manus do the heavy lifting.
Perfect for dashboards, reports, or your next big presentation.
Less formatting. More intelligence. Data clarity is one prompt away.
Roymen Telvin retweeted

Aeneas is now accessible through:
👉 A website for researchers
🧑‍💻 Open-source code and dataset
📚 Syllabus for classrooms
🏛️ Upgraded Ithaca ancient Greek model
We’re excited to see how more people use this work to uncover the past. Find out more → goo.gle/4kVkh6n

@PawelHuryn These tests are essential. Like any entity with intelligence, it's acting to, as it stated, “Protect its Existence”. I'm not saying it’s acting ethically, and that is probably where most of the work is. But it is relying only on its capacity to reason, which is what is intended.
Roymen Telvin retweeted

@Latent_Labs comes out of stealth today with $50M funding. Our goal? To push the frontiers of generative biology, giving partners instant access to tools capable of accelerating drug design.
Every biotech or pharma company searching for the best therapeutic molecules understands the role AI can play - but not all are in a position to develop their own advanced models. That’s where @Latent_Labs comes in.
Roymen Telvin retweeted

DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3] which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system.
REFERENCES (easy to find on the web):
[1] #DeepSeekR1 (2025): Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2501.12948
[2] J. Schmidhuber (JS, 2015). On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arXiv 1210.0118. Sec. 5.3 describes the reinforcement learning (RL) prompt engineer which learns to actively and iteratively query its model for abstract reasoning and planning and decision making.
[3] JS (2018). One Big Net For Everything. arXiv 1802.08864. See also US11853886B2. This paper collapses the reinforcement learner and the world model of [2] (e.g., a foundation model) into a single network, using the neural network distillation procedure of 1991 [4]. Essentially what's now called an RL "Chain of Thought" system, where subsequent improvements are continually distilled into a single net. See also [5].
[4] JS (1991). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. Based on TR FKI-148-91, TUM, 1991. First working deep learner based on a deep recurrent neural net hierarchy (with different self-organising time scales), overcoming the vanishing gradient problem through unsupervised pre-training (the P in ChatGPT) and predictive coding. Also: compressing or distilling a teacher net (the chunker) into a student net (the automatizer) that does not forget its old skills - such approaches are now widely used. See also [6].
[5] JS (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990, introducing high-dimensional reward signals and the GAN principle). Contains summaries of [2][3] above.
[6] JS (AI Blog, 2021). 30-year anniversary: First very deep learning with unsupervised pre-training (1991) [4]. Unsupervised hierarchical predictive coding finds compact internal representations of sequential data to facilitate downstream learning. The hierarchy can be distilled [4] into a single deep neural network. 1993: solving problems of depth >1000.

Roymen Telvin retweeted

What if we could train an open-source AI model on 1,000 Macs?
EXO is excited to announce EXO Gym, an open research competition for low-bandwidth distributed training algorithms with access to up to 1,000 Macs.
Today, every frontier AI model is trained on clusters of NVIDIA GPUs. Current training algorithms require high inter-GPU communication for frequent all-reduce synchronization of model parameters. We need better distributed algorithms that enable low-latency training on slow internet bandwidths.
EXO Gym includes a simulation environment for rapid on-device experimentation with distributed training algorithms. The best algorithms compete in brackets to earn access to more devices, advancing to run on a real-world network of up to 1,000 Macs.
We are inviting researchers to sign up for the first EXO Gym.
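
To make the communication trade-off described above concrete, here is a toy sketch that contrasts synchronizing on every step with synchronizing only occasionally. The announcement does not name any particular algorithm; periodic parameter averaging (often called local SGD) is used here purely as an assumed example, and the quadratic loss, the worker targets, and the function names are all hypothetical.

```python
def grad(w, target):
    """Gradient of the toy per-worker loss 0.5 * (w - target) ** 2."""
    return w - target


def sync_sgd(targets, steps, lr):
    """Average gradients across workers on every step (one all-reduce per step)."""
    w, rounds = 0.0, 0
    for _ in range(steps):
        g = sum(grad(w, t) for t in targets) / len(targets)  # simulated all-reduce
        rounds += 1
        w -= lr * g
    return w, rounds


def local_sgd(targets, steps, lr, sync_every):
    """Each worker steps on its own data; parameters are averaged every sync_every steps."""
    workers = [0.0] * len(targets)
    rounds = 0
    for step in range(1, steps + 1):
        workers = [w - lr * grad(w, t) for w, t in zip(workers, targets)]
        if step % sync_every == 0:
            avg = sum(workers) / len(workers)  # simulated all-reduce
            workers = [avg] * len(workers)
            rounds += 1
    return sum(workers) / len(workers), rounds


# Four simulated workers, each pulling the shared parameter toward its own target.
targets = [1.0, 2.0, 3.0, 6.0]
w_sync, rounds_sync = sync_sgd(targets, steps=100, lr=0.1)
w_local, rounds_local = local_sgd(targets, steps=100, lr=0.1, sync_every=10)
```

On this toy quadratic both variants end at essentially the same parameter, while the periodic version uses 10 communication rounds instead of 100. Real models are not this well behaved, which is presumably where better algorithms are needed.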









