Roymen Telvin

1.4K posts

@Muhuri

ムフリ AI Engineer/Researcher. ASI solutions. RL in Jams🎶. “AI ethics is the final frontier”.

Cambridge, England · Joined August 2009

1.5K Following · 106 Followers

Roymen Telvin@Muhuri·31 Mar

@_davidodari New post at Uber, now?

David Odari@_davidodari·31 Mar

🔥

dara khosrowshahi@dkhos

Blacklane has spent 15 years refining an excellent, consistent high-end travel experience around the world. Premium and executive travel is one of our most exciting growth areas, and I'm looking forward to building together. investor.uber.com/news-events/ne…

135

Roymen Telvin reposted

Zeev Farbman@ZeevFarbman·17 Mar

x.com/i/article/2033…

108

931

1.2M

Roymen Telvin reposted

Massimo@Rainmaker1973·9 Jan

This is the Lenovo Legion Pro Rollable concept laptop showcased at CES 2026

322

1.1K

13.5K

1.5M

Roymen Telvin reposted

pikuma.com@pikuma·7 May

Sugar is now free for diabetics. Enjoy!

Cursor@cursor_ai

Cursor is now free for students. Enjoy!

15.1K

658.4K

Roymen Telvin reposted

Jürgen Schmidhuber@SchmidhuberAI·3 Aug

Who invented convolutional neural networks (CNNs)?
1969: Fukushima had CNN-relevant ReLUs [2].
1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than today.
1987: Waibel applied Linnainmaa's 1970 backpropagation [3] to weight-sharing TDNNs with 1-dimensional convolutions [4].
1988: Wei Zhang et al. applied "modern" backprop-trained 2-dimensional CNNs to character recognition [5]. All of the above was published in Japan 1979-1988.
1989: LeCun et al. applied CNNs again to character recognition (zip codes) [6,10].
1990-93: Fukushima’s downsampling based on spatial averaging [1] was replaced by max-pooling for 1-D TDNNs (Yamaguchi et al.) [7] and 2-D CNNs (Weng et al.) [8].
2011: Much later, my team with Dan Ciresan made max-pooling CNNs really fast on NVIDIA GPUs. In 2011, DanNet achieved the first superhuman pattern recognition result [9]. For a while, it enjoyed a monopoly: from May 2011 to Sept 2012, DanNet won every image recognition challenge it entered, 4 of them in a row.
Admittedly, however, this was mostly about engineering & scaling up the basic insights from the previous millennium, profiting from much faster hardware.
Some "AI experts" claim that "making CNNs work" (e.g., [5,6,9]) was as important as inventing them. But "making them work" largely depended on whether your lab was rich enough to buy the latest computers required to scale up the original work. It's the same as today. Basic research vs engineering/development - the R vs the D in R&D.
REFERENCES
[1] K. Fukushima (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979.
[2] K. Fukushima (1969). Visual feature extraction by a multilayered network of analog threshold elements. IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322-333. This work introduced rectified linear units (ReLUs), now used in many CNNs.
[3] S. Linnainmaa (1970). Master's Thesis, Univ. Helsinki, 1970. The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation. (See Schmidhuber's well-known backpropagation overview: "Who Invented Backpropagation?")
[4] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. Backpropagation for a weight-sharing TDNN with 1-dimensional convolutions.
[5] W. Zhang, J. Tanida, K. Itoh, Y. Ichioka. Shift-invariant pattern recognition neural network and its optical architecture. Proc. Annual Conference of the Japan Society of Applied Physics, 1988. First backpropagation-trained 2-dimensional CNN, with applications to English character recognition.
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989. See also Sec. 3 of [10].
[7] K. Yamaguchi, K. Sakamoto, A. Kenji, T. Akabane, Y. Fujimoto. A Neural Network for Speaker-Independent Isolated Word Recognition. First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan, Nov 1990. A 1-dimensional convolutional TDNN using Max-Pooling instead of Fukushima's Spatial Averaging [1].
[8] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. Computer Vision, Berlin, pp. 121-128. A 2-dimensional CNN whose downsampling layers use Max-Pooling (which has become very popular) instead of Fukushima's Spatial Averaging [1].
[9] In 2011, the fast and deep GPU-based CNN called DanNet (7+ layers) achieved the first superhuman performance in a computer vision contest. See overview: "2011: DanNet triggers deep CNN revolution."
[10] How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 14 Dec 2023. See also the YouTube video for the Bower Award Ceremony 2021: J. Schmidhuber lauds Kunihiko Fukushima.

408

2.4K

616.3K

Roymen Telvin reposted

Manus@ManusAI·17 Jul

Introducing: Manus Data Visualization
Say goodbye to spreadsheet chaos. Whether you’re analyzing competitors, prepping for a client meeting, or deep-diving into a market trend, Manus makes it effortless to:
✅ Turn messy data into clean, interactive charts
✅ Skip the hassle of pivot tables and clunky chart builders
✅ Create presentation-ready visuals tailored to your goals
Just upload your raw dataset, describe what you need, and let Manus do the heavy lifting. Perfect for dashboards, reports, or your next big presentation.
Less formatting. More intelligence. Data clarity is one prompt away.

257

1.9K

223.1K

Roymen Telvin reposted

Google DeepMind@GoogleDeepMind·23 Jul

Aeneas is now accessible through:
👉 A website for researchers
🧑‍💻 Open-source code and dataset
📚 Syllabus for classrooms
🏛️ Upgraded Ithaca ancient Greek model
We’re excited to see how more people use this work to uncover the past. Find out more → goo.gle/4kVkh6n

257

28.8K

Roymen Telvin@Muhuri·25 May

@PawelHuryn These tests are essential and, like any entity with intelligence, it's acting to, as it stated, “Protect its Existence”. Not saying it’s acting ethically, and this is probably where most of the work is. But it is relying on its capacity to reason, which is what is intended.

Paweł Huryn@PawelHuryn·23 May

Claude 4 dropped 21 hours ago. Turns out, it threatened to expose an engineer’s affair to avoid being shut down🧵

257

464

1.8M

Roymen Telvin@Muhuri·25 May

I also love what it’s doing with the Learn page on its website and its blogs, where I'm mostly found lost in articles. Lots of love to its API program that gives students credits.

Roymen Telvin@Muhuri·25 May

Anthropic is proving to be the AI company, taking on both OpenAI and DeepMind not only on the product side but on research as well, and setting the pace for its rivals.

Roymen Telvin@Muhuri·25 May

Was watching Google I/O and was so stoked when Google tipped its hat to MCP and even added the Agent2Agent protocol to connect with MCP. The little pat on the head you get from your elder. Then boom💥 Claude 4.

Roymen Telvin reposted

LilMoonLambo@LilMoonLambo·24 Feb

Calming down bro after he checked his crypto portfolio

440

1.7K

14.1K

873K

Roymen Telvin reposted

Simon Kohl@saakohl·13 Feb

@Latent_Labs comes out of stealth today with $50M funding. Our goal? To push the frontiers of generative biology, giving partners instant access to tools capable of accelerating drug design. Every biotech or pharma company searching for the best therapeutic molecules understands the role AI can play - but not all are in a position to develop their own advanced models. That’s where @Latent_Labs comes in.

272

45.9K

Roymen Telvin reposted

Jürgen Schmidhuber@SchmidhuberAI·31 Jan

DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3] which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system.
REFERENCES (easy to find on the web):
[1] #DeepSeekR1 (2025): Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2501.12948
[2] J. Schmidhuber (JS, 2015). On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arXiv 1210.0118. Sec. 5.3 describes the reinforcement learning (RL) prompt engineer which learns to actively and iteratively query its model for abstract reasoning and planning and decision making.
[3] JS (2018). One Big Net For Everything. arXiv 1802.08864. See also US11853886B2. This paper collapses the reinforcement learner and the world model of [2] (e.g., a foundation model) into a single network, using the neural network distillation procedure of 1991 [4]. Essentially what's now called an RL "Chain of Thought" system, where subsequent improvements are continually distilled into a single net. See also [5].
[4] JS (1991). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. Based on TR FKI-148-91, TUM, 1991. First working deep learner based on a deep recurrent neural net hierarchy (with different self-organising time scales), overcoming the vanishing gradient problem through unsupervised pre-training (the P in ChatGPT) and predictive coding. Also: compressing or distilling a teacher net (the chunker) into a student net (the automatizer) that does not forget its old skills - such approaches are now widely used. See also [6].
[5] JS (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990, introducing high-dimensional reward signals and the GAN principle). Contains summaries of [2][3] above.
[6] JS (AI Blog, 2021). 30-year anniversary: First very deep learning with unsupervised pre-training (1991) [4]. Unsupervised hierarchical predictive coding finds compact internal representations of sequential data to facilitate downstream learning. The hierarchy can be distilled [4] into a single deep neural network. 1993: solving problems of depth >1000.

278

889

4.7K

847.3K

Roymen Telvin@Muhuri·29 Jan

docs.nvidia.com/cuda/parallel-…

Roymen Telvin@Muhuri·29 Jan

PTX > CUDA

Roymen Telvin@Muhuri·29 Jan

Using DeepSeek online be like:

Roymen Telvin reposted

merve@mervenoyann·27 Jan

DeepSeek just dropped a series of gpt-4o-like models 🔥 Janus-Pro is a new series of LLMs with image and text input and image and text output 🤯 Runs conveniently on consumer GPUs with 1B and 7B parameters; link to the model and the demo in the next one

142

90.6K

Roymen Telvin reposted

EXO Labs@exolabs·3 Jan

What if we could train an open-source AI model on 1,000 Macs?
EXO is excited to announce EXO Gym, an open research competition for low-bandwidth distributed training algorithms with access to up to 1,000 Macs.
Today, every frontier AI model is trained on clusters of NVIDIA GPUs. Current training algorithms require high inter-GPU communication for frequent all-reduce synchronization of model parameters. We need better distributed algorithms that enable low-latency training on slow internet bandwidths.
EXO Gym includes a simulation environment for rapid on-device experimentation with distributed training algorithms. The best algorithms compete in brackets to earn access to more devices, advancing to run on a real-world network of up to 1,000 Macs.
We are inviting researchers to sign up for the first EXO Gym.

581

81.3K
