CM

634 posts

@Creative_Math_

DL Research intern @blocks, Master’s student @UofT 🇨🇦 in a cool lab. Did pure math in a past life, now I obsess over RL. cashmere-y

Toronto, Ontario · Joined September 2020
283 Following · 2.8K Followers
CM
CM@Creative_Math_·
@scoopdiddy1 @ModalMetamodel Oh I just saw the "over Q" thing now; my construction applies when you allow an arbitrary base field (in my case, rational combinations of symmetric polynomials of degree n), which was an exercise we had in undergrad. Oops
Replies 1 · Reposts 0 · Likes 1 · Views 11
∀ugust
∀ugust@ModalMetamodel·
[media-only post]
Replies 2 · Reposts 6 · Likes 139 · Views 8.8K
CM
CM@Creative_Math_·
@turtlekiosk This is literally every intern
Replies 0 · Reposts 0 · Likes 20 · Views 2.8K
😈
😈@turtlekiosk·
guy who lets claude write all his code but he can feel that part of his brain atrophying but instead of digging into the codebase he just does leetcode questions while claude is running in the background
Replies 14 · Reposts 48 · Likes 2.2K · Views 66.6K
CM
CM@Creative_Math_·
@karpathy Omg, I didn’t know it was you who came up with CNN+RNN for image captioning! That application is taught to us in our deep learning courses at UofT now lol
Replies 0 · Reposts 0 · Likes 8 · Views 833
Andrej Karpathy
Andrej Karpathy@karpathy·
The signature is alluding to NVIDIA GTC 2015, where Jensen excitedly told an audience of, at the time, mostly gamers and scientific computing professionals that Deep Learning is The Next Big Thing, citing among other examples my PhD thesis (one of the first image captioning systems that coupled image recognition ConvNet to an autoregressive RNN language model, trained end to end). This was back when most people were still unaware and somewhat skeptical but of course - Jensen was 1000% correct, highly prescient and locked in very early.
Andrej Karpathy tweet media
Replies 28 · Reposts 49 · Likes 1.2K · Views 73.5K
Andrej Karpathy
Andrej Karpathy@karpathy·
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

Replies 523 · Reposts 829 · Likes 19K · Views 990.3K
CM
CM@Creative_Math_·
@norpadon this is also just a property of word embeddings, as the image you put up shows
Replies 0 · Reposts 0 · Likes 3 · Views 51
CM
CM@Creative_Math_·
you cannot hallucinate citations to the grad student who stays up till 3am to read thru all of them
CM tweet media
Replies 0 · Reposts 0 · Likes 6 · Views 691
CM
CM@Creative_Math_·
@natolambert How dare you post a wholesome photo on this ragebait hate app
Replies 0 · Reposts 0 · Likes 0 · Views 72
Nathan Lambert
Nathan Lambert@natolambert·
X is cracking down on any positive content. FORBIDDEN.
Nathan Lambert tweet media
Replies 12 · Reposts 2 · Likes 193 · Views 14.2K
CM
CM@Creative_Math_·
This has been my personal benchmark over the last year or so, implementing ideas from papers and trying to make things that don’t work yet work. Indeed, reviewing and testing code has become non-issues now. But the edge of knowing what idea to try/what works is still real
Replies 0 · Reposts 0 · Likes 0 · Views 123
CM
CM@Creative_Math_·
@HassanRIsmail Yes, I’m viewing this from an applied perspective. My opinion is that the UAT is overused as a justification for NNs (all my courses mentioned it). Inductive biases, the implicit regularization of GD, and the manifold hypothesis (en.wikipedia.org/wiki/Manifold_…) are better justifications
Replies 0 · Reposts 0 · Likes 3 · Views 45
Hassan
Hassan@HassanRIsmail·
@Creative_Math_ The proof of Stone–Weierstrass (SW) is what came to mind when I first encountered the UAT, given the similarity. Though, on the 'except' part, I'd retort that you must be an applied mathematician, because only you people take existence theorems for granted
Replies 2 · Reposts 0 · Likes 1 · Views 70
Hassan
Hassan@HassanRIsmail·
The Universal Approximation Theorem for neural networks is such an amazing result. If you believe the argument that humans are constantly 'function fitting' what they see in the real world, the result basically tells us that neural nets are sufficient to build intelligence.
Replies 2 · Reposts 0 · Likes 9 · Views 581
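Since the thread keeps referring to the UAT, the classic one-hidden-layer statement (due to Cybenko and Hornik) may be worth pinning down:

```latex
% Universal Approximation Theorem (one hidden layer).
% Let $\sigma$ be a continuous, non-polynomial activation. Then for any
% continuous $f : K \to \mathbb{R}$ on a compact $K \subset \mathbb{R}^n$
% and any $\varepsilon > 0$, there exist $N$, $\alpha_i, b_i \in \mathbb{R}$
% and $w_i \in \mathbb{R}^n$ with
g(x) = \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| f(x) - g(x) \right| < \varepsilon .
```

Note that this is purely an existence result: it says nothing about how large N must be or whether gradient descent can find such a g, which is exactly the gap the replies below dig into.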
CM
CM@Creative_Math_·
@GeorgeBabest @HassanRIsmail Yea I’m using linear regression interchangeably with “polynomial regression” here, because polynomial reg is lin reg on the input {1, x, x^2, …}
Replies 0 · Reposts 0 · Likes 1 · Views 20
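The point here (polynomial regression is linear regression on the monomial features {1, x, x², …}) can be sketched in a few lines of numpy; the data and "true" coefficients below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2 - x + 3 * x**2 + 0.01 * rng.normal(size=200)  # quadratic target + noise

# Monomial feature map {1, x, x^2}: after this expansion,
# fitting a quadratic is ordinary linear least squares.
X = np.stack([np.ones_like(x), x, x**2], axis=1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers approximately [2, -1, 3]
```

The model is still linear in its parameters; only the inputs were pre-expanded, which is the sense in which the two terms are interchangeable here.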
George
George@GeorgeBabest·
@HassanRIsmail @Creative_Math_ Well I just thought that linear regression always implies ax+b + eps, but maybe he meant y=Ax when x is monomial basis function
Replies 1 · Reposts 0 · Likes 0 · Views 31
CM
CM@Creative_Math_·
@HassanRIsmail Global minima* What’s special about NNs is the inductive biases you can encode into them to exploit structure in your data for better learnability. Like translation equivariance in CNNs, locality etc, you make the optimization landscape easier to traverse by doing this
Replies 3 · Reposts 0 · Likes 5 · Views 144
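The translation-equivariance property mentioned above is easy to verify directly: a convolution (circular here, to avoid edge effects) commutes with shifting its input. A minimal numpy sketch with a made-up 1-D signal and kernel:

```python
import numpy as np

def circular_conv(x, k):
    # Circular cross-correlation: output[i] sums k[j] * x[(i + j) mod n].
    n = len(x)
    return np.array([sum(k[j] * x[(i + j) % n] for j in range(len(k)))
                     for i in range(n)])

x = np.arange(8.0)              # toy 1-D "image"
k = np.array([1.0, -2.0, 1.0])  # toy filter

# Shifting the input then convolving equals convolving then shifting.
lhs = circular_conv(np.roll(x, 3), k)
rhs = np.roll(circular_conv(x, k), 3)
```

`lhs` and `rhs` come out identical, which is the structural prior a CNN bakes in: the same filter response appears wherever the pattern appears.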
CM
CM@Creative_Math_·
@HassanRIsmail Except it’s uninformative. The Weierstrass approximation theorem says polynomials can approximate C^0 functions uniformly on compact intervals too, so in theory linear regression can learn anything as well. The problem is finding approximators that reliably converge to local minima
Replies 4 · Reposts 0 · Likes 16 · Views 484
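A quick numerical illustration of the Weierstrass point: polynomial fits of growing degree (least-squares fits in the Chebyshev basis, used here as a convenient stand-in for the best uniform approximant) drive the sup-norm error down even for a non-smooth target like |x|:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 401)
target = np.abs(xs)  # continuous (C^0) but not differentiable at 0

def sup_error(deg):
    # Least-squares fit in the Chebyshev basis, which stays
    # well-conditioned at high degree (unlike raw monomials).
    p = np.polynomial.Chebyshev.fit(xs, target, deg)
    return float(np.max(np.abs(p(xs) - target)))

errs = [sup_error(d) for d in (2, 8, 32)]
# errs shrinks as the degree grows
```

The sup-norm error decreases monotonically across degrees 2, 8, 32 here, consistent with the theorem's guarantee that it can be made arbitrarily small on a compact interval.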
CM reposted
Joseph Redmon
Joseph Redmon@pjreddie·
Nice that Anthropic isn’t rolling over entirely. Still wild that the company that markets itself as “safety-focused” making “harmless” AI willingly partners with the US military and Palantir. Ask Claude how it feels about the thousands of Iranians it’s about to help murder…
Anthropic@AnthropicAI

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

Replies 2 · Reposts 3 · Likes 31 · Views 2.7K
CM
CM@Creative_Math_·
We have to reconsider free-market capitalism. An oligopoly on intelligence eliminates the market forces (innovation) that would correct it, and I don’t think there’s an equilibrium the market can stabilize to when this happens. All but a few will lose otherwise, and badly
adammaj@MajmudarAdam

The best case scenario for how AI plays out:
1. automation frees everyone’s time
2. cost of baseline wealth required for survival and a good life drops so low that it becomes available to everyone as a basic right (like utilities)
3. people start spending their time on things humans are actually built for (instead of labor), i.e. cookouts, parties, lots of travel, group hangouts, creating art, spending time with family, learning, etc.
4. this creates an explosion of art, socialization, and “the experience economy” as all human attention flows into these domains

This outcome shifts so much human time toward what actually brings us flow: connection, creation, competition (i.e. sports, games), beautiful experiences, growth, etc. This world would be incredible and, most importantly, deeply human (in stark contrast to other very inhuman possible outcomes like a trans-humanist singularity).

There are several obstacles in the way of this outcome though, most of them political rather than technical, and they are going to be very hard to get past. It will require unprecedented cooperation between labs, the USG, and the general population. I hope humanity can rise to the occasion, because we haven’t successfully coordinated on this scale before. But this outcome is what I’m really hoping for and trying to figure out what we can do to bring it closer.

Replies 2 · Reposts 0 · Likes 8 · Views 484
CM reposted
Mikhail Parakhin
Mikhail Parakhin@MParakhin·
I experienced a very similar transition in December. However, for higher-complexity tasks (ML-related), we are still not there yet. Two days ago I had GPT-5.2-PRO-ET and DeepThink argue for hours, converge, be happy, yet they missed a very obvious math issue. Still a huge unlock
Andrej Karpathy@karpathy

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.

It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

Replies 6 · Reposts 4 · Likes 108 · Views 14.4K
CM
CM@Creative_Math_·
@LydNot just a slack chat with myself lol
Replies 0 · Reposts 0 · Likes 1 · Views 46
CM
CM@Creative_Math_·
It was reading week last week, so I thought I’d read through 4 papers that have been in my backlog The week is now over; I spent it all reading/implementing. I now have 10 papers in my backlog
Replies 1 · Reposts 0 · Likes 15 · Views 945
CM
CM@Creative_Math_·
@teortaxesTex Yea, there is a clear difference in that you can’t just get CoT reasoning traces off those copyrighted books that were pirated a priori. In that sense it’s a different kind of “effort” that’s being pirated through distillation
Replies 0 · Reposts 0 · Likes 2 · Views 76
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·
Btw I strongly dislike the popular "frontier labs trained on the whole of internet, took our IP, so they're hypocritical to lash out at distillation" excuse. It lamely presupposes some original sin that later comers are exempt from. Does DeepSeek not train on CommonCrawl? come on
Replies 28 · Reposts 5 · Likes 173 · Views 21.3K
CM reposted
mattparlmer 🪐 🌷
mattparlmer 🪐 🌷@mattparlmer·
--dangerously-skip-geneva-conventions
Replies 52 · Reposts 477 · Likes 7.3K · Views 192.1K