Andy Gray

102 posts

@andynotabot

AI start-up founder at Kortical. Since the age of 15, I've been chasing the dream of building an AI to do my chores... but for now it's mainly B2B AI

Joined November 2017
193 Following 43 Followers
Andy Gray
Andy Gray@andynotabot·
It is well established that models often memorise some subsets of their data but the key distinction is that this isn't ONLY what they do. This is an explainer for a paper that shows they can learn rules that are provably beyond interpolation. So they can learn things that are not possible to get to by memorisation or interpolation between known datapoints. Pretty cool huh? x.com/andynotabot/st…
English
0
0
0
10
Gary Marcus, MIT PhD and NYU Professor Emeritus
Am old enough to remember when @GeoffreyHinton told me I was stupid for saying that LLMs regurgitate training data. He was wrong. LLM regurgitation is now one of the best-established findings in the field. Excerpt below from a new DeepMind paper; every single one of the papers shows that Hinton was wrong. (Also: still waiting for AI to replace radiologists.)
Gary Marcus, MIT PhD and NYU Professor Emeritus tweet media
English
119
123
1.1K
153.8K
François Chollet
François Chollet@fchollet·
The ARC-AGI-3 launch is next week. Incredible work by the team over the past year.
English
81
58
931
111.2K
Andy Gray
Andy Gray@andynotabot·
@GaryMarcus your ACM piece made the case that LLMs are fundamentally limited to interpolation, no doubt why you expect this market plateau. I just published results that might surprise you —> transformers hit 97.9% on held-out rules where every interpolation method scores 0%! Backed by a formal proof. Would love your take. Full explainer and paper here: x.com/andynotabot/st…
English
0
0
0
28
Gary Marcus, MIT PhD and NYU Professor Emeritus
Nvidia has nearly doubled over the last two years. (12x over last five). But it’s fallen since GPT-5 came out. People are no longer buying the hype.
Gary Marcus, MIT PhD and NYU Professor Emeritus tweet media
English
34
14
127
13.8K
Andy Gray retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
Excited to release new repo: nanochat! (it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
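The cost figures quoted in the post can be checked with a little arithmetic. The sketch below is not from the nanochat repo. It assumes cost scales linearly with wall-clock time on the same 8-GPU node, treats the approximate "~" figures as exact, and uses a hypothetical helper name (`hourly_rate`).

```python
# Rough sanity check of the training-cost figures quoted in the post.
# Assumption: cost is linear in wall-clock hours on one 8xH100 node.

GPUS_PER_NODE = 8  # "8XH100 node" in the post


def hourly_rate(total_cost_usd: float, hours: float) -> float:
    """Implied node price per hour for a given total cost and duration."""
    return total_cost_usd / hours


# ~$100 for ~4 hours
speedrun_rate = hourly_rate(100, 4)

# ~$1000 for ~41.6 hours
larger_run_rate = hourly_rate(1000, 41.6)

# Implied price per single GPU-hour for the speedrun
per_gpu_hour = speedrun_rate / GPUS_PER_NODE

print(f"speedrun:   ${speedrun_rate:.2f}/node-hour")
print(f"larger run: ${larger_run_rate:.2f}/node-hour")
print(f"per GPU:    ${per_gpu_hour:.3f}/GPU-hour")
```

Both quoted price points imply roughly the same node price (about $24 to $25 per hour), so the two figures are consistent with each other under the linear-cost assumption.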
Andrej Karpathy tweet media
English
687
3.4K
24.1K
5.8M
Andy Gray retweeted
TechHalla
TechHalla@techhalla·
I asked nano banana to take me across Middle-earth and this is what happened... all the details on how I made this video, below 🧵👇
English
271
736
7.5K
966.1K
Andy Gray
Andy Gray@andynotabot·
😂 I thought it was a terrible paper! Disingenuous, badly researched and hyperbolic. I wrote an article with a detailed takedown here andynotabot.substack.com/p/the-illusion… That said I do think over anthropomorphising LLMs is a bad idea and we can see way too much of it already, so I don't disagree with you entirely 😆
English
1
0
1
111
Andy Gray
Andy Gray@andynotabot·
I'm seeing lots of people taking Apple's new paper "The Illusion of Thinking" at face value, but there is so much wrong with it that I felt compelled to write an article debunking its claims: andynotabot.substack.com/p/the-illusion… I dive into why, but it looks like they are knowingly trying to create FUD about AI
English
0
0
0
7
Bojan Tunguz
Bojan Tunguz@tunguz·
Interesting. So what exactly *IS* thinking?
Bojan Tunguz tweet media
English
201
132
1.8K
264.4K
Andy Gray
Andy Gray@andynotabot·
@rubenhassid I'm seeing lots of people taking Apple's new paper "The Illusion of Thinking" at face value, but there is so much wrong with it that I felt compelled to write an article debunking its claims: andynotabot.substack.com/p/the-illusion… AGI is on track 😉
English
0
0
0
28
Ruben Hassid
Ruben Hassid@rubenhassid·
BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests)
Ruben Hassid tweet media
English
2.6K
9K
62.5K
14.2M
Andy Gray retweeted
Tesla Optimus
Tesla Optimus@Tesla_Optimus·
I’m not just dancing all day, ok
English
2.8K
6.5K
35.9K
5.3M
Andy Gray retweeted
Min Choi
Min Choi@minchoi·
GPT-4o just got an INSANE upgrade! OpenAI just dropped native Image Generation in GPT-4o. Image & Text quality is insane. 100% AI 10 wild examples (prompts included): 1. Polaroid style photographs
Min Choi tweet media
English
166
469
4.9K
1M
Andy Gray retweeted
Min Choi
Min Choi@minchoi·
Anthropic just dropped Claude Code 🤯 Now you can delegate coding tasks right from the terminal. Limited research preview for now
English
24
38
328
43K
Andy Gray retweeted
Tesla AI
Tesla AI@Tesla_AI·
Teslas now drive themselves from their birthplace at the factory to their designated loading dock lanes without human intervention. One step closer to large-scale unsupervised FSD
English
3.2K
9.1K
45.3K
60M
Andy Gray retweeted
Brian Roemmele
Brian Roemmele@BrianRoemmele·
Deep Dive On DeepSeek’s New Multimodal AI Released Today And How We Are Getting It Running On A Gaming PC!

DeepSeek’s Janus-Pro represents a significant advancement in multimodal large language models (LLMs), particularly in text-to-image generation. Building upon the foundation of the original Janus model, Janus-Pro introduces enhancements in training processes, data quality, and model architecture, resulting in more stable and detailed image outputs.

Technical Architecture: Janus-Pro employs a decoupled architecture, optimizing it for tasks involving both multimodal understanding and text-to-image generation. This design allows for separate processing pathways for different modalities, enhancing the model’s flexibility and performance. The model has been trained on a diverse dataset comprising multimodal, textual, and synthetic aesthetic data through a three-stage process, ensuring superior performance across various tasks.

Performance Benchmarks: Janus-Pro has demonstrated exceptional capabilities:
• Text-to-Image Generation:
  • GenEval: Scored 0.80, surpassing OpenAI’s DALL-E 3 (0.67) and Stability AI’s Stable Diffusion 3 Medium (0.74).
  • DPG-Bench: Achieved an overall accuracy of 84.19, highlighting its proficiency in handling dense and nuanced prompts.
• Multimodal Understanding:
  • MMMU (Multimodal Machine Understanding): Attained an accuracy of 41.0, outperforming models like TokenFlow-XL (38.7).
  • MME (Multimodal Evaluation): Showed significant gains in reasoning and contextual understanding.

These results underscore Janus-Pro’s capabilities in both generating high-quality images from textual prompts and understanding complex multimodal inputs.

Running Janus-Pro on Consumer-Grade GPUs

These are some of the techniques we deploy when adapting a new, larger AI model to run efficiently on less expensive computer hardware. This is not an exhaustive list, but it is enough to give you an idea and overview.

1. Model Quantization: Reducing the precision of the model’s weights (e.g., from 16-bit to 8-bit or lower) can significantly decrease memory usage and computational requirements, enabling the model to run on GPUs with limited VRAM. Tools like MiniLLM facilitate running large language models on consumer-grade GPUs. We also employ distillation processes to further reduce GPU cycles.
2. Efficient Inference Engines: Utilizing inference engines designed for consumer hardware can enhance performance. For instance, PowerInfer is a high-speed LLM inference engine optimized for personal computers equipped with a single consumer-grade GPU. It exploits the high locality inherent in LLM inference to reduce GPU memory demands and CPU-GPU data transfers.
3. Hardware Considerations: High-end consumer GPUs, such as the NVIDIA RTX 4090, are more suitable for running large models like Janus-Pro due to their substantial VRAM and computational capabilities. However, with appropriate optimization techniques, it’s possible to run the model on GPUs with lower specifications, though performance may be affected.

These are some of the strategies we are deploying to run Janus-Pro on consumer-grade gaming computers. By leveraging Janus-Pro, developers and researchers can explore advanced capabilities in both multimodal understanding and image generation, pushing the boundaries of what’s achievable in AI-driven applications. We will keep you updated on the progress.
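The model quantization step described in the post (reducing weights from 16-bit to 8-bit) can be illustrated with a toy example. This is only a minimal sketch of symmetric per-tensor int8 quantization in plain Python, not the method used by Janus-Pro, MiniLLM, or the post's author. The function names and sample weights are hypothetical, and the 7-billion parameter count is an assumption taken from the "7B" in the model name.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in quantized]


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Memory for the weights alone (ignores activations and KV cache)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical sample weights, for illustration only.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Assumed parameter count: 7 billion.
mem_16bit = weight_memory_gb(7_000_000_000, 16)
mem_8bit = weight_memory_gb(7_000_000_000, 8)

print(q, scale, max_error)
print(f"16-bit: {mem_16bit:.1f} GB, 8-bit: {mem_8bit:.1f} GB")
```

Halving the bits per weight halves the memory needed for the weights, at the price of a rounding error of at most half the scale per weight. Production quantizers typically use per-channel or per-group scales to reduce that error.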
Brian Roemmele tweet media
Brian Roemmele@BrianRoemmele

Ok you want to test out DeepSeek’s new multimodal Janus-Pro-7B? Well I got you covered. It is free and it is here: huggingface.co/spaces/AP123/J… Give it a try online, I will have a local How To soon for YOUR computer with YOUR data. Been testing this for hours now! It is amazing.

English
11
15
78
53.6K
Andy Gray
Andy Gray@andynotabot·
AGI isn’t “near,” @sama—it’s already here. We coined AGI thinking of human-level intelligence. But LLMs are already general, intelligent, just not sentient, superhuman or demanding civil rights. It’s time to redefine: 1️⃣ AGI = Broad general knowledge, common-sense machines (e.g., LLMs). 2️⃣ ASI = Artificial Super-intelligence - SuperIntelligent but not necessarily conscious. 3️⃣ ACI = Artificial Conscious Intelligence - Thinking feeling machines that have a sense of self, needs and wants, etc. Let’s update the terms. #AI
English
0
1
2
62
Andy Gray retweeted
Yam Peleg
Yam Peleg@Yampeleg·
Heard a leak from one of the frontier labs (not oai tbh), they reached an unexpected HUGE wall of diminishing returns trying to brute-force better results by training longer & using more and more data.. (more severe than what is published publicly)
Amir Efrati@amir

news: OpenAI's upcoming Orion model shows how GPT improvements are slowing down. It's prompting OpenAI to bake in reasoning and other tweaks after the initial model training phase.

English
167
314
3.6K
885.5K
Andy Gray retweeted
Yann LeCun
Yann LeCun@ylecun·
Not a surprising result. But good that someone tried this out.
Bingyi Kang@bingyikang

Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws. There are three key messages to take home: 1⃣ The model generalises perfectly for in-distribution data, but fails to do out-of-distribution generalization. For combinatorial scenarios, a scaling law is observed. 2⃣ The models fail to abstract general rules and instead try to mimic the closest training example. 3⃣ The model prioritizes different attributes when referencing training data: color > size > velocity > shape. This work is a joint effort with our outstanding intern @YangYue_THU. Paper: arxiv.org/abs/2411.02385 Webpage: phyworld.github.io

English
148
55
880
385.8K
Andy Gray
Andy Gray@andynotabot·
I've spent some time now with @OpenAI o1. The model that was "too dangerous" to release. OpenAI's first specialist reasoning model. The big question I wanted to answer was just how good is o1 at reasoning? Despite all the hype, many people are skeptical if LLMs can reason at all. Is it just mimicry, like a parrot, repeating words it doesn't understand? In this article open.substack.com/pub/andynotabo… I give a bit of background, show how I set about to prove it one way or the other and talk through the surprising result. Not to overcook it too much but... I was genuinely not expecting this result. Let me know if you enjoy the read! 😃
Andy Gray tweet media
English
0
0
0
40
Andy Gray retweeted
Demis Hassabis
Demis Hassabis@demishassabis·
Feedback loop: train SOTA chip design model (AlphaChip) -> use it to design better AI chips -> use them to train better models -> to design better chips... part of the reason why our TPU stack is so good. Congrats @Azaliamirh, @annadgoldie, @JeffDean & the AlphaChip team!
Google DeepMind@GoogleDeepMind

Our AI for chip design method AlphaChip has transformed the way we design microchips. ⚡ From helping to design state-of-the-art TPUs for building AI models to CPUs in data centers - its widespread impact can be seen across Alphabet and beyond. Find out more → dpmd.ai/3ZDRtYY

English
20
161
1.1K
234.3K