CM

634 posts

@Creative_Math_

DL Research intern @blocks, Master’s student @UofT 🇨🇦 in a cool lab. Did pure math in a past life, now I obsess over RL. cashmere-y

Toronto, Ontario · Joined September 2020
283 Following · 2.8K Followers
CM
CM@Creative_Math_·
@scoopdiddy1 @ModalMetamodel Oh I just saw the "over Q" thing now; my construction applies when you allow an arbitrary base field (in my case, rational combinations of symmetric polynomials of degree n), which was an exercise we had in undergrad. Oops
Replies 1 · Reposts 0 · Likes 1 · Views 11
∀ugust
∀ugust@ModalMetamodel·
[media-only post]
Replies 2 · Reposts 6 · Likes 139 · Views 8.8K
CM
CM@Creative_Math_·
@turtlekiosk This is literally every intern
Replies 0 · Reposts 0 · Likes 20 · Views 2.8K
😈
😈@turtlekiosk·
guy who lets claude write all his code but he can feel that part of his brain atrophying but instead of digging into the codebase he just does leetcode questions while claude is running in the background
Replies 14 · Reposts 48 · Likes 2.2K · Views 66.6K
CM
CM@Creative_Math_·
@karpathy Omg, I didn’t know it was you who came up with CNN+RNN for image captioning! That application is taught to us in our deep learning courses at UofT now lol
Replies 0 · Reposts 0 · Likes 8 · Views 833
Andrej Karpathy
Andrej Karpathy@karpathy·
The signature is alluding to NVIDIA GTC 2015, where Jensen excitedly told an audience of, at the time, mostly gamers and scientific computing professionals that Deep Learning is The Next Big Thing, citing among other examples my PhD thesis (one of the first image captioning systems that coupled image recognition ConvNet to an autoregressive RNN language model, trained end to end). This was back when most people were still unaware and somewhat skeptical but of course - Jensen was 1000% correct, highly prescient and locked in very early.
Andrej Karpathy tweet media
Replies 28 · Reposts 49 · Likes 1.2K · Views 73.5K
Andrej Karpathy
Andrej Karpathy@karpathy·
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

Replies 523 · Reposts 829 · Likes 19K · Views 990.3K
CM
CM@Creative_Math_·
@norpadon this is also just a property of word embeddings, as the image you put up shows
Replies 0 · Reposts 0 · Likes 3 · Views 51
CM
CM@Creative_Math_·
you cannot hallucinate citations to the grad student who stays up till 3am to read thru all of them
CM tweet media
Replies 0 · Reposts 0 · Likes 6 · Views 691
CM
CM@Creative_Math_·
@natolambert How dare you post a wholesome photo on this ragebait hate app
Replies 0 · Reposts 0 · Likes 0 · Views 72
Nathan Lambert
Nathan Lambert@natolambert·
X is cracking down on any positive content. FORBIDDEN.
Nathan Lambert tweet media
Replies 12 · Reposts 2 · Likes 193 · Views 14.2K
CM
CM@Creative_Math_·
This has been my personal benchmark over the last year or so, implementing ideas from papers and trying to make things that don’t work yet work. Indeed, reviewing and testing code has become non-issues now. But the edge of knowing what idea to try/what works is still real
Replies 0 · Reposts 0 · Likes 0 · Views 123
CM
CM@Creative_Math_·
@HassanRIsmail Yes, I’m viewing this from an applied perspective. My opinion is that the UAT is overused as a justification for NNs (all my courses mentioned it). Inductive biases, the implicit regularization of GD, and the manifold hypothesis (en.wikipedia.org/wiki/Manifold_…) are better justifications
Replies 0 · Reposts 0 · Likes 3 · Views 45
Hassan
Hassan@HassanRIsmail·
@Creative_Math_ The proof of Stone–Weierstrass (SW) is what came to mind when I first encountered the UAT, given the similarity. Though, on the 'except' part, I'd retort that you must be an applied mathematician, because only you people take existence theorems for granted
Replies 2 · Reposts 0 · Likes 1 · Views 70
Hassan
Hassan@HassanRIsmail·
The Universal Approximation Theorem for neural networks is such an amazing result. If you believe the argument that humans are constantly 'function fitting' what they see in the real world, the result basically tells us that neural nets are sufficient to build intelligence.
Replies 2 · Reposts 0 · Likes 9 · Views 581
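Since the thread keeps referring to the UAT, the classic one-hidden-layer statement (due to Cybenko and Hornik) may be worth pinning down:

```latex
% Universal Approximation Theorem (one hidden layer).
% Let $\sigma$ be a continuous, non-polynomial activation. Then for any
% continuous $f : K \to \mathbb{R}$ on a compact $K \subset \mathbb{R}^n$
% and any $\varepsilon > 0$, there exist $N$, $\alpha_i, b_i \in \mathbb{R}$
% and $w_i \in \mathbb{R}^n$ with
g(x) = \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| f(x) - g(x) \right| < \varepsilon .
```

Note that this is purely an existence result: it says nothing about how large N must be or whether gradient descent can find such a g, which is exactly the gap the replies below dig into.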
CM
CM@Creative_Math_·
@GeorgeBabest @HassanRIsmail Yea I’m using linear regression interchangeably with “polynomial regression” here, because polynomial reg is lin reg on the input {1, x, x^2, …}
Replies 0 · Reposts 0 · Likes 1 · Views 20
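The point here (polynomial regression is linear regression on the monomial features {1, x, x², …}) can be sketched in a few lines of numpy; the data and "true" coefficients below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2 - x + 3 * x**2 + 0.01 * rng.normal(size=200)  # quadratic target + noise

# Monomial feature map {1, x, x^2}: after this expansion,
# fitting a quadratic is ordinary linear least squares.
X = np.stack([np.ones_like(x), x, x**2], axis=1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers approximately [2, -1, 3]
```

The model is still linear in its parameters; only the inputs were pre-expanded, which is the sense in which the two terms are interchangeable here.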
George
George@GeorgeBabest·
@HassanRIsmail @Creative_Math_ Well I just thought that linear regression always implies ax+b + eps, but maybe he meant y=Ax when x is monomial basis function
Replies 1 · Reposts 0 · Likes 0 · Views 31
CM
CM@Creative_Math_·
@HassanRIsmail Global minima* What’s special about NNs is the inductive biases you can encode into them to exploit structure in your data for better learnability. Like translation equivariance in CNNs, locality etc, you make the optimization landscape easier to traverse by doing this
Replies 3 · Reposts 0 · Likes 5 · Views 144
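The translation-equivariance property mentioned above is easy to verify directly: a convolution (circular here, to avoid edge effects) commutes with shifting its input. A minimal numpy sketch with a made-up 1-D signal and kernel:

```python
import numpy as np

def circular_conv(x, k):
    # Circular cross-correlation: output[i] sums k[j] * x[(i + j) mod n].
    n = len(x)
    return np.array([sum(k[j] * x[(i + j) % n] for j in range(len(k)))
                     for i in range(n)])

x = np.arange(8.0)              # toy 1-D "image"
k = np.array([1.0, -2.0, 1.0])  # toy filter

# Shifting the input then convolving equals convolving then shifting.
lhs = circular_conv(np.roll(x, 3), k)
rhs = np.roll(circular_conv(x, k), 3)
```

`lhs` and `rhs` come out identical, which is the structural prior a CNN bakes in: the same filter response appears wherever the pattern appears.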
CM
CM@Creative_Math_·
@HassanRIsmail Except it’s uninformative. The Weierstrass approximation theorem says polynomials can approximate C^0 functions uniformly on compact intervals too, so in theory linear regression can learn anything as well. The problem is finding approximators that reliably converge to local minima
Replies 4 · Reposts 0 · Likes 16 · Views 484
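A quick numerical illustration of the Weierstrass point: polynomial fits of growing degree (least-squares fits in the Chebyshev basis, used here as a convenient stand-in for the best uniform approximant) drive the sup-norm error down even for a non-smooth target like |x|:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 401)
target = np.abs(xs)  # continuous (C^0) but not differentiable at 0

def sup_error(deg):
    # Least-squares fit in the Chebyshev basis, which stays
    # well-conditioned at high degree (unlike raw monomials).
    p = np.polynomial.Chebyshev.fit(xs, target, deg)
    return float(np.max(np.abs(p(xs) - target)))

errs = [sup_error(d) for d in (2, 8, 32)]
# errs shrinks as the degree grows
```

The sup-norm error decreases monotonically across degrees 2, 8, 32 here, consistent with the theorem's guarantee that it can be made arbitrarily small on a compact interval.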
CM reposted
Joseph Redmon
Joseph Redmon@pjreddie·
Nice that Anthropic isn’t rolling over entirely. Still wild that the company that markets itself as “safety-focused” making “harmless” AI willingly partners with the US military and Palantir. Ask Claude how it feels about the thousands of Iranians it’s about to help murder…
Anthropic@AnthropicAI

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

Replies 2 · Reposts 3 · Likes 31 · Views 2.7K
CM
CM@Creative_Math_·
We have to reconsider free-market capitalism. An oligopoly on intelligence eliminates the market forces (innovation) that would correct it, and I don’t think there’s an equilibrium the market can stabilize to when this happens. All but a few will lose otherwise, and badly
adammaj@MajmudarAdam

The best case scenario for how AI plays out:
1. automation frees everyone’s time
2. cost of baseline wealth required for survival and a good life drops so low that it becomes available to everyone as a basic right (like utilities)
3. people start spending their time on things humans are actually built for (instead of labor), i.e. cookouts, parties, lots of travel, group hangouts, creating art, spending time with family, learning, etc.
4. this creates an explosion of art, socialization, and “the experience economy” as all human attention flows into these domains

This outcome shifts so much human time toward what actually brings us flow: connection, creation, competition (i.e. sports, games), beautiful experiences, growth, etc. This world would be incredible and, most importantly, deeply human (in stark contrast to other very inhuman possible outcomes like a trans-humanist singularity).

There are several obstacles in the way of this outcome though, most of them political rather than technical, and they are going to be very hard to get past. It will require unprecedented cooperation between labs, the USG, and the general population. I hope humanity can rise to the occasion, because we haven’t successfully coordinated on this scale before. But this outcome is what I’m really hoping for and trying to figure out what we can do to bring it closer.

Replies 2 · Reposts 0 · Likes 8 · Views 484
CM reposted
Mikhail Parakhin
Mikhail Parakhin@MParakhin·
I experienced a very similar transition in December. However, for higher-complexity tasks (ML-related), we are still not there yet. Two days ago I had GPT-5.2-PRO-ET and DeepThink argue for hours, converge, be happy, yet they missed a very obvious math issue. Still a huge unlock
Andrej Karpathy@karpathy

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.

It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

Replies 6 · Reposts 4 · Likes 108 · Views 14.4K
CM
CM@Creative_Math_·
@LydNot just a slack chat with myself lol
Replies 0 · Reposts 0 · Likes 1 · Views 46
CM
CM@Creative_Math_·
It was reading week last week, so I thought I’d read through 4 papers that have been in my backlog The week is now over; I spent it all reading/implementing. I now have 10 papers in my backlog
Replies 1 · Reposts 0 · Likes 15 · Views 945
CM
CM@Creative_Math_·
@teortaxesTex Yea, there is a clear difference in that you can’t just get CoT reasoning traces off those copyrighted books that were pirated a priori. In that sense it’s a different kind of “effort” that’s being pirated through distillation
Replies 0 · Reposts 0 · Likes 2 · Views 76
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·
Btw I strongly dislike the popular "frontier labs trained on the whole of internet, took our IP, so they're hypocritical to lash out at distillation" excuse. It lamely presupposes some original sin that later comers are exempt from. Does DeepSeek not train on CommonCrawl? come on
Replies 28 · Reposts 5 · Likes 173 · Views 21.3K
CM reposted
mattparlmer 🪐 🌷
mattparlmer 🪐 🌷@mattparlmer·
--dangerously-skip-geneva-conventions
Replies 52 · Reposts 477 · Likes 7.3K · Views 192.1K