cOfDirac

47 posts

@cOfDirac

exploring synthetic a priori

Poincaré disk · Joined January 2026
57 Following · 1 Follower
cOfDirac (@cOfDirac)
@cloneofsimo AGI doesn't need to do what Ramanujan did, but it should solve problems as he did.
Replies 0 · Reposts 0 · Likes 0 · Views 31
NIK (@ns123abc)
🚨 Google DeepMind CEO Sir Demis Hassabis: “Today’s systems, are nowhere near [AGI]. Doesn’t matter how many Erdős problems you solve… I think it’s far, far from what a true invention or someone like a Ramanujan would have been able to do” it’s over for the Erdős hype
Replies 147 · Reposts 208 · Likes 2.8K · Views 425K
cOfDirac (@cOfDirac)
@Yuchenj_UW I agree that a title shouldn't decide how your capability is perceived, but I do think it helps other people understand your role and responsibilities more quickly. What would be best is if you had a hierarchy but progression didn't have to be sequential.
Replies 0 · Reposts 0 · Likes 1 · Views 628
Yuchen Jin (@Yuchenj_UW)
Tech industry spent decades building a title and leveling system. Greg brought the “Member of Technical Staff”, originally invented at Bell Labs, to OpenAI. It has been adopted by Anthropic, xAI, Thinky, and many AI startups. Young MTS can have huge impact. Alec created GPT for example. In a traditional system, he was just an “L4 software engineer”. Databricks AI recently started using MTS as well. I think this is a very positive change in Silicon Valley.
Yuchen Jin (@Yuchenj_UW)

Whoever invented “Member of Technical Staff” was a genius. It filters out Staff/Principal title-maxxers, protects engineering and research from corporate ladder brain, and leaves recruiters staring at LinkedIn like: “Is this person L4 or L7?” MTS is the best title. Happy to be MTS.

Replies 67 · Reposts 52 · Likes 1.1K · Views 286.1K
cOfDirac (@cOfDirac)
@zackabrams @emollick I did. It is very impressive, but not overly surprising, and it doesn't detract from my previous point. In fact, I would say this is more evidence that current models are just getting better at pretending to be intelligent than that they are actually getting closer to AGI.
Replies 0 · Reposts 0 · Likes 0 · Views 57
cOfDirac (@cOfDirac)
@emollick not true. models are still unable to solve very basic problems, which reveals that their complex problem solving is a result of good pattern recognition as opposed to any sort of intelligence. we will not reach the latter by scaling the same architectures in the same training loops.
Replies 1 · Reposts 0 · Likes 0 · Views 477
cOfDirac (@cOfDirac)
@Google @Android have you considered removing the backdoor that lets governments circumvent vpns and spy on people?
Replies 0 · Reposts 0 · Likes 9 · Views 2K
Google (@Google)
We’re rolling out new updates to make your everyday @Android experience even better, including:
🤳 Screen Reactions, so you can record yourself and your screen at the same time — without switching apps or setting up a green screen
📸 An improved Instagram experience in partnership with Meta, including ultra HDR video, Night Mode integrations, brand new tools in the Edits app and more
📴 New digital wellbeing tools, like Pause Point, to help you reclaim your time and use apps more mindfully
😀 Nearly 4,000 redesigned emoji
🤝 New features to make it even easier to switch to Android from another phone, so your passwords, photos, messages, favorite apps, contacts and even your homescreen travel with you
🛜 Expanded Quick Share compatibility, so you can easily share files with more types of devices
#TheAndroidShow
Replies 141 · Reposts 189 · Likes 2.3K · Views 216.4K
cOfDirac (@cOfDirac)
@alex_whedon From my understanding, the attention itself is O(m^2), where m is the number of selected sparse tokens with m <= sqrt(n) and n is the total number of tokens, as opposed to actual linear attention, yes? Is the attention algorithm novel as well? If not, which one did you choose?
Replies 0 · Reposts 0 · Likes 0 · Views 24
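The complexity argument in the reply above can be checked with a little arithmetic. The sketch below assumes, as the reply does, that attention runs only among m selected tokens with m = floor(sqrt(n)); the function names are hypothetical and nothing here reflects SubQ's actual implementation.

```python
import math


def dense_pairs(n: int) -> int:
    """Query-key scores computed by standard attention over n tokens."""
    return n * n


def sparse_pairs(n: int) -> int:
    """Scores computed if attention runs only among m = floor(sqrt(n)) selected tokens.

    Assumption taken from the reply: the attention step costs O(m^2) with m <= sqrt(n).
    The cost of choosing those m tokens is deliberately not counted here.
    """
    m = math.isqrt(n)
    return m * m


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 12_000_000):
        d, s = dense_pairs(n), sparse_pairs(n)
        print(f"n={n:>12,}  dense={d:.2e}  sparse={s:>12,}  ratio={d / s:.1e}")
```

Under this assumption m^2 <= n, so the attention step itself is at most linear in n even though it is not "linear attention" in the usual sense. Any step that scores all n tokens to pick the m is a separate cost that this count leaves out.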
Alexander Whedon (@alex_whedon)
Introducing SubQ - a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.
Replies 1.5K · Reposts 2.9K · Likes 23K · Views 12.7M
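The tweet does not say how SubQ finds the relationships that matter. As a hypothetical illustration of the general idea only, the sketch below uses block-selection sparse attention: keys are grouped into contiguous blocks, each query scores only the block centroids, and ordinary softmax attention runs over the tokens of the few best-scoring blocks. The block size, centroid scoring, and function names are all assumptions. With a block size near sqrt(n) and a constant number of selected blocks, each query touches O(sqrt(n)) keys, so the total is O(n·sqrt(n)) rather than O(n²).

```python
import math
import random


def softmax_attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys and values."""
    d = len(query)
    scores = [sum(a * b for a, b in zip(query, k)) / math.sqrt(d) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def block_sparse_attention(queries, keys, values, block, top_blocks):
    """Hypothetical block-selection sparse attention (illustrative, not SubQ's method).

    Keys are split into contiguous blocks of `block` tokens. Each query scores
    only the block centroids (mean keys), keeps the `top_blocks` highest-scoring
    blocks, and runs ordinary softmax attention over the tokens in those blocks.
    """
    d = len(keys[0])
    blocks = [range(i, min(i + block, len(keys))) for i in range(0, len(keys), block)]
    centroids = [[sum(keys[t][j] for t in b) / len(b) for j in range(d)] for b in blocks]
    outputs = []
    for q in queries:
        # Cheap selection step: one dot product per block instead of per token.
        ranked = sorted(range(len(blocks)),
                        key=lambda i: sum(a * c for a, c in zip(q, centroids[i])),
                        reverse=True)
        chosen = [t for i in ranked[:top_blocks] for t in blocks[i]]
        # Expensive step: full attention, but only over the selected tokens.
        outputs.append(softmax_attend(q, [keys[t] for t in chosen],
                                      [values[t] for t in chosen]))
    return outputs


if __name__ == "__main__":
    random.seed(0)
    n, d = 256, 16
    block = math.isqrt(n)  # block size ~ sqrt(n), an assumption
    keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    out = block_sparse_attention(keys[:4], keys, values, block=block, top_blocks=2)
    print(len(out), len(out[0]))
```

When top_blocks covers every block, the output matches dense attention, which is a useful sanity check; smaller values trade accuracy for compute.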
cOfDirac (@cOfDirac)
@vikhyatk optimal transport is important though
Replies 0 · Reposts 0 · Likes 1 · Views 170
vik (@vikhyatk)
signs that an AI researcher has llm psychosis:
- random matrix theory
- optimal transport
- went to ayahuasca retreat
- "I've been thinking a lot about Yoneda lately"
- wife left him
Replies 20 · Reposts 27 · Likes 460 · Views 56.2K
How To AI (@HowToAI_)
MIT proved every major AI model is secretly converging on the same "brain."

It’s called the “platonic representation hypothesis,” and it’s one of the most mind-blowing papers you’ll ever read.

You train a vision model purely on images. You train a language model purely on text. They use completely different architectures. They process completely different data. They should have completely different "brains."

But as these models scale up, something impossible is happening. When researchers measure how they organize information, the mathematical geometry is identical. A model that only "sees" images and a model that only "reads" text are measuring the distance between concepts in the exact same way.

The models are converging.

The researchers named this after Plato’s Allegory of the Cave. Plato believed that everything we experience is just a shadow of a deeper, hidden, perfect reality.

The paper argues that AI models are doing the exact same thing. They are looking at the different "shadows" of human data: text, images, audio. And they are independently discovering the exact same underlying structure of the universe to make sense of it.

It doesn't matter what company built the AI. It doesn't matter what data it was trained on. As models get larger, they stop memorizing their specific tasks. They are forced to build a statistical model of reality itself.

And there is only one reality to map.

2024, Arxiv
Replies 243 · Reposts 825 · Likes 3.9K · Views 295.9K
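"Measuring the distance between concepts in the same way" can be made concrete by comparing neighbourhood structure: embed the same n items with two models and check how many of each item's nearest neighbours agree. The sketch below is a simplified mutual k-nearest-neighbour alignment score, similar in spirit to the alignment metric the paper describes; the cosine similarity choice and the function names are assumptions, and it is not the paper's exact code.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_sets(feats, k):
    """For each item, the set of indices of its k most cosine-similar other items."""
    neighbours = []
    for i, a in enumerate(feats):
        sims = sorted(((cosine(a, b), j) for j, b in enumerate(feats) if j != i), reverse=True)
        neighbours.append({j for _, j in sims[:k]})
    return neighbours


def mutual_knn_alignment(feats_a, feats_b, k):
    """Mean fraction of k-nearest neighbours shared by two representations.

    feats_a[i] and feats_b[i] must describe the same item i (e.g. an image and
    its caption embedded by two different models). 1.0 means identical
    neighbourhood structure; random, unrelated features give about k / (n - 1).
    """
    na, nb = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return sum(len(x & y) / k for x, y in zip(na, nb)) / len(na)


if __name__ == "__main__":
    random.seed(0)
    a = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
    b = [[2.0 * x for x in row] for row in a]                        # same geometry, different scale
    c = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]  # unrelated features
    print(mutual_knn_alignment(a, b, k=5), mutual_knn_alignment(a, c, k=5))
```

Because the score depends only on which items are near which, two models can agree perfectly even if their embedding dimensions, scales, or coordinate systems differ.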
cOfDirac (@cOfDirac)
@fhuszar see, I don't mind genuine grounding of my ideas, but sometimes it's just picking at air for the sake of it
Replies 0 · Reposts 0 · Likes 5 · Views 927
Ferenc Huszár (@fhuszar)
I noticed Claude has started to very methodically push back on every idea I discuss. In every response there are always caveats and the "the one thing I'd push back on" section. Oh my god, is this what it feels like to talk to me? Sorry, everyone.
Replies 21 · Reposts 5 · Likes 620 · Views 29.4K
cOfDirac (@cOfDirac)
@CrumbsSpace @eliebakouch I'm not saying that our current approaches are useless; they're super useful. They're just not steps towards AGI.
Replies 0 · Reposts 0 · Likes 0 · Views 7
spaceCrumbs (@CrumbsSpace)
@cOfDirac @eliebakouch True, but "dumb" intelligence is also useful enough to start capitalizing on AI. Similar to how you don't need AlphaZero to beat most humans at chess - Stockfish running on a potato CPU will do. Agency > intelligence in the real world.
Replies 1 · Reposts 0 · Likes 1 · Views 30
elie (@eliebakouch)
i might be very wrong here, but i don't think "no human data, no pre-training" is the right approach to get frontier models or scientific breakthroughs any time soon
Ineffable Intelligence (@IneffableLabs)

Introducing Ineffable Intelligence. Led by David Silver, we're assembling the best engineers and researchers in the world to make first contact with superintelligence. We’ll be solving the hardest problems in AI on the way. Come join us. ineffable.ai

Replies 38 · Reposts 11 · Likes 299 · Views 72.5K
紫云 (@dviolettchan)
@cOfDirac Maybe you can try applying for some micro‑funding programs from companies or government agencies. These programs don’t require a lengthy proposal and are far simpler than applying for something like an NSF grant.
Replies 1 · Reposts 0 · Likes 0 · Views 31
紫云 (@dviolettchan)
The trickiest cost of being an independent researcher may be conference registration and publication fees. This is especially painful for researchers in low-income countries. Nowadays, even remote registration for some top conferences can still cost $500-$1,000. If you are doing unpaid remote research without institutional support, you may end up paying a lot to publish your own work.
Replies 4 · Reposts 1 · Likes 52 · Views 7.1K
cOfDirac (@cOfDirac)
@dviolettchan I would love to do some more classical ML research some day, but I've been so entrenched in LLM research for so long that it's all I think about. And no, I've never had a good idea for finetuning; every idea I have involves training from scratch.
Replies 1 · Reposts 0 · Likes 0 · Views 34
紫云 (@dviolettchan)
@cOfDirac Maybe you should work on topics that are less expensive. Even API calls alone can be a huge cost in some cases, let alone LLM training.
Replies 1 · Reposts 0 · Likes 0 · Views 81
cOfDirac (@cOfDirac)
@jino_rohit I would say that learning to read PTX is the only absolute requirement, though, if you hope to do any serious profiling
Replies 1 · Reposts 0 · Likes 1 · Views 19
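Reading PTX for profiling often starts with a quick pass over the instruction stream, for example checking whether global loads were vectorized (`ld.global.v4`) or whether arithmetic became fused multiply-adds. The sketch below is a hypothetical helper for that first pass: it tallies instruction mnemonics in a PTX listing. The PTX excerpt is hand-written for illustration, not compiler output from any real kernel, and the regex only counts plain and predicated (`@%p1`) instruction lines; directives, labels, and comment lines are skipped because they do not start with a lowercase letter.

```python
import re
from collections import Counter

# Hypothetical hand-written PTX excerpt for illustration only.
PTX_SNIPPET = """
    ld.global.v4.f32 {%f1, %f2, %f3, %f4}, [%rd4];
    ld.global.f32 %f5, [%rd5];
    fma.rn.f32 %f6, %f1, %f2, %f5;
    fma.rn.f32 %f7, %f3, %f4, %f6;
    st.global.f32 [%rd6], %f7;
    bar.sync 0;
"""

# Optional predicate guard (@%p1 or @!%p1), then the full dotted mnemonic.
OPCODE = re.compile(r"^\s*(?:@!?%p\d+\s+)?([a-z][\w.]*)", re.MULTILINE)


def opcode_histogram(ptx: str) -> Counter:
    """Count instruction mnemonics, keeping type/vector suffixes such as .v4.f32."""
    return Counter(m.group(1) for m in OPCODE.finditer(ptx))


def global_load_widths(ptx: str) -> dict:
    """Split global loads into vectorized (ld.global.v2/.v4) and scalar ones."""
    hist = opcode_histogram(ptx)
    loads = {op: c for op, c in hist.items() if op.startswith("ld.global.")}
    vec = sum(c for op, c in loads.items() if op.startswith("ld.global.v"))
    return {"vectorized": vec, "scalar": sum(loads.values()) - vec}


if __name__ == "__main__":
    for op, count in opcode_histogram(PTX_SNIPPET).most_common():
        print(f"{count:3d}  {op}")
    print(global_load_widths(PTX_SNIPPET))
```

For a CUDA C++ kernel, real PTX can be emitted with `nvcc -ptx kernel.cu` (the file name here is hypothetical) and fed to the same helper.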
cOfDirac (@cOfDirac)
@jino_rohit you can skip CUDA (but learn to read PTX); Triton is only good if you're willing to learn Gluon/TLX; TileLang and CuTe seem pretty nice if you wanna max numbers; Helion seems great for cross-platform and easy code. either way, they all do the job, so pick your poison.
Replies 1 · Reposts 0 · Likes 2 · Views 314
Jino Rohit (@jino_rohit)
cuda, triton, cutlass, cute, tilelang, thunderkittens, mojo, helion. so which one do you even learn at this point?
Replies 45 · Reposts 4 · Likes 255 · Views 17.7K
cOfDirac (@cOfDirac)
@eliebakouch It's undeniable that LLMs currently produce impressive results, but there are tiny cracks on the surface that reveal they're the furthest thing from any sort of general intelligence. I think this is a fault of how we train them and the data we use.
Replies 1 · Reposts 0 · Likes 0 · Views 18
cOfDirac (@cOfDirac)
@eliebakouch I can't speak for their approach, but personally I have a strong feeling that we'll never get to AGI without moving past our current regime of shoveling data into models. It will never be enough, and it's debatable whether it works at all.
Replies 1 · Reposts 0 · Likes 0 · Views 213
Lunexa (@Lunexalith)
@pigeon__s Isn't data set quality far more important than data set size?
Replies 2 · Reposts 0 · Likes 16 · Views 1.7K
Hanchi Sun (@sun_hanchi)
If ur goal is AGI:
1. Is mHC’s sinkhorn AGI?
2. Is sqrt(softplus(•)) AGI?
3. Is HashMoE AGI?
4. Is using two sets of coefficients for muon AGI?
How do they serve the purpose of AGI?
DeepSeek (@deepseek_ai)

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

Replies 20 · Reposts 2 · Likes 45 · Views 22K