Vikramank Singh

8K posts

@vikramank14

Senior Applied Scientist @AmazonScience working on ML and Database Systems. Grad @UCBerkeley ’20. Speaker @TEDx ’18.

San Francisco, CA · Joined October 2013
539 Following · 353 Followers
roon@tszzl·
satya nadella was promoted to ceo of microsoft because he was willing to be paid less than heritage american steve ballmer, and due to ethnic nepotistic hiring practices by the majority shareholder billu gateswala
Vikramank Singh retweeted
Andrej Karpathy@karpathy·
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent).

I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations.

Data has historically been seen as a separate category from compute, but even data is downstream of compute to a large extent - you can spend compute to create data. Tons of it. You've heard this called synthetic data generation, but less obviously, there is a very deep connection (equivalence even) between "synthetic data generation" and "reinforcement learning". In the trial-and-error learning process in RL, the "trial" is the model generating (synthetic) data, which it then learns from based on the "error" (/reward). Conversely, when you generate synthetic data and then rank or filter it in any way, your filter is straight up equivalent to a 0-1 advantage function - congrats, you're doing crappy RL.

Last thought. Not sure if this is obvious. There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all *magic*, is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol.

And 2 is the "aha moment" when DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.

(Last last thought/reference this time for real is that RL is powerful but RLHF is not. RLHF is not RL. I have a separate rant on that in an earlier tweet x.com/karpathy/statu…)
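The "filter is a 0-1 advantage function" equivalence can be made concrete in a few lines. This is a toy sketch under assumptions: the generator, quality score, and filter here are all hypothetical stand-ins, not anyone's actual pipeline.

```python
import random

random.seed(0)  # deterministic for illustration

def generate_samples(prompt, n=8):
    # The RL "trial": a stand-in model emits n candidate completions,
    # each tagged with a made-up quality score in [0, 1).
    return [(f"{prompt}/candidate-{i}", random.random()) for i in range(n)]

def keep(sample):
    # Any keep/discard filter over synthetic data...
    _, quality = sample
    return quality > 0.5

def binary_advantage(samples):
    # ...is the same thing as assigning advantage 1 to kept samples and
    # advantage 0 to discarded ones before a policy-gradient-style update:
    # training only on the kept subset is RL with this binary advantage.
    return [(s, 1.0 if keep(s) else 0.0) for s in samples]

weighted = binary_advantage(generate_samples("math-word-problem"))
kept = [s for s, adv in weighted if adv == 1.0]
```

Ranking instead of filtering just swaps the 0-1 advantage for a graded one; the structure of the update is unchanged.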
Andrej Karpathy@karpathy

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints. Does this mean you don't need large GPU clusters for frontier LLMs? No but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms. Very nice & detailed tech report too, reading through.
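The compute comparison in the quoted tweet checks out arithmetically. A quick sanity check, using only the figures stated in the tweet itself (no independent sources):

```python
# GPU-hour figures as quoted in the tweet above.
llama3_405b_gpu_hours = 30.8e6   # Llama 3 405B
deepseek_v3_gpu_hours = 2.8e6    # DeepSeek-V3
ratio = llama3_405b_gpu_hours / deepseek_v3_gpu_hours  # ~11x less compute

# The "2048 GPUs for 2 months" claim roughly matches that GPU-hour count:
gpu_hours = 2048 * 60 * 24  # 2048 GPUs * ~60 days * 24 h ≈ 2.95M GPU-hours
```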

Vikramank Singh retweeted
María Ballesteros 💚@m_ballesteross·
I’m a US citizen because I was born while my dad was a student. My citizenship allowed me to work at UChicago, which then got me into a Harvard PhD, a degree I hope to use to teach fellow Americans. The US has given me so many opportunities. It hurts to think I’m not wanted here.
Vikramank Singh retweeted
Atal Agarwal 💜🚐🌍@atalovesyou·
The impact of the EO would be huge. Story I received today 👇

I don't know if this feels like a pricking case of betrayal on many levels just for me or if this applies to a lot of others. I've been brought up to believe in merit, the value of hard work and dedication, and doing what's right by the law. I completed my master's in this country from a reputed institute, majoring in cyber security. I also have an approved petition in the National Interest Waiver and exceptional ability category (EB2-NIW), but because of my country of birth I'm stuck in backlogs.

My husband works for one of the companies that Elon founded and is also an accomplished engineer with skills in advanced manufacturing. He doesn't have US citizenship, like me, and we are both on work visas. We are expecting our first kid in the next two months, and as we have built a whole life together here for over 8+ years, after having worked extremely hard, it feels like a last stab to watch birthright citizenship also being taken away from our yet-to-be-born child.

I know this might seem like a rant, but I'm sure there are human stories like mine that make this new EO extremely unfair. My request to you is: can you help amplify stories like ours? I would like to keep names out of this until we figure out the situation going forward. However, I'm sure I'm not the only one.

We are both legal immigrants who are very much within the "jurisdictional authority" of the US. We studied here. We have built our lives here. We are not 'birth tourists'. We pay our taxes. We pay for an education system that even we don't see benefitting our own kid in the years to come. We are in perpetual fear and uncertainty. There is an unjust and unforgiving lack of political will to do right by the folks abiding by the law, even if we are a silent minority.

I hope this gets amplified and I hope you'd help in doing that. Thank you very much for reading and I appreciate your time. @elonmusk - what are you even doing?
Vikramank Singh retweeted
Andrej Karpathy@karpathy·
# scheduling workloads to run on humans

Some computational workloads in human organizations are best "run on a CPU": take one single, highly competent person and assign them a task to complete in a single-threaded fashion, without synchronization. Usually the best fit when starting something new. Comparable to "building the skeleton" of a thing.

Other workloads are best run on a GPU: take a larger number of (possibly more junior) people and assign tasks in parallel: massively multi-threaded, requiring synchronization overhead. Usually a good fit for later stages of a project, or parts that naturally afford parallelism, comparable to "fleshing out" a thing when the skeleton is there. There's some middle ground here - sometimes you can imagine a multi-threaded CPU execution of a small team collaborating.

A good manager will understand the computational geometry of the project at hand and know when to delegate parts of it on the CPU or on the GPU.

One notable place where the analogy breaks down a bit is that the worst thing that can happen when you misallocate computer resources is that your program will run slower. But in human organizations it can be much worse - not just slower, but the result can be of lower quality overall, more brittle, more disorganized, less consistent, uglier.

The most common stumbling point here is trying to parallelize something that was supposed to run on the CPU. In the common tongue, this comes from the misunderstanding that something can go faster if you put more people on it, usually leading to outcomes where something is "designed by a committee" - not only is the thing actually slower, but the philosophy is inconsistent, the entropy is high, and the long-term outcomes are much worse. The opposite problem is more rare and usually looks like someone doing something repetitive, uninteresting or tedious, where they could really benefit from more help.

I think this is one accidental advantage of startups - they lack the resources of large companies and run compute on powerful CPUs, winning in cases where that is the right thing to do. Larger companies, especially in cases where something is deemed of high strategic importance, will almost always reach for too much parallelism.

TLDR: Think about your project, its computational geometry, its inherent parallelism, and which parts are a best fit for a CPU or a GPU.
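The scheduling point above has a classical quantitative form: Amdahl's law, where the serial fraction of a job caps the gain from adding workers. A minimal sketch; the function name and the example fractions are illustrative, not from the tweet.

```python
def speedup(parallel_fraction, workers):
    # Amdahl's law: total speedup is limited by the serial portion,
    # no matter how many workers attack the parallel portion.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# "Skeleton" work is mostly serial: 16 people buy you only ~1.2x.
skeleton = speedup(0.2, 16)
# "Fleshing out" work is mostly parallel: 16 people buy you ~9x.
flesh_out = speedup(0.95, 16)
```

The asymmetry is the whole argument: parallelizing a 20%-parallel job wastes 15 of the 16 workers, while a 95%-parallel job actually absorbs them.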
Vikramank Singh retweeted
Andrew Akbashev@Andrew_Akbashev·
This is how I advise my #PhD students to write research manuscripts (in case someone finds it helpful).

General points:
1. The research questions addressed by your manuscript are key and should guide you.
2. Don’t view your manuscript as an article. See it as a STORY.
3. Pick a writing style that is easily understood by a broader community. Make reading easy.
4. Most of the data should get into the paper. If some doesn’t support the hypothesis, it still must be in the Suppl. Information. It must show the reproducibility limits.
5. Make the paper shorter, not longer. Cut out things that may sound like ‘bluff’ or ‘decoration’ of the story. Use well-defined terminology; don’t invent it unless clearly necessary.
6. Focus on reporting & explaining the numbers. Minimize discussions of qualitative outcomes and your imagination.

Specific steps:
1️⃣ First, formulate and polish the key questions that your study addresses. It may take hours or even days (even though you've been doing research in this area for years). A single study should address no more than 1-3 key questions. It’s your perfect start for writing.
2️⃣ Write down the structure of your STORY first: Sections and Subsections that will answer those questions. Into each subsection, put 1-2 sentences that formulate the message(s) from this subsection. It will hugely help you navigate the manuscript later and save a lot of time.
3️⃣ Write approximate messages in the conclusion section. Usually, no more than 1-4 sentences.

At this point, SHARE your [structure+questions+messages] document with your advisor for feedback. Toss it back and forth until you both converge. You can also include major collaborators if needed.

4️⃣ Write the introduction part. Put down the paragraphs that introduce a reader to the key question(s) of the manuscript and the background of your story.
5️⃣ Write the main text for each section, smoothly and firmly. Each paragraph should add separate value and end with a message-like sentence. Follow the “First… Second… Third…” structure for paragraphs when possible; it gives rigor and readability to your story.
6️⃣ Write the conclusions. Add a broader perspective that is justified and not generic.
7️⃣ Write the abstract. It must use simple terminology and clearly explain what readers can find inside the paper. It also should contain the key conclusions.
8️⃣ Write up 4-5 different titles and spend >30 mins with your team discussing which title sounds best.

Finally, iterate on the resulting draft within your team. The number of drafts can easily exceed 20.

In addition, I always emphasize that the high quality of your research paper:
- sharpens your writing and analytical skills.
- shapes your reputation.
- shows who you are as a researcher and communicator.

p.s. Everyone has a different style of advising and writing. You can adopt only the specific steps you find helpful.
p.p.s. Another approach that we sometimes use is starting with the figures ('story in figures' style). #AcademicTwitter #AcademicChatter
Vikramank Singh retweeted
Ayman Al-Abdullah 🧱@aymanalabdul·
The five levels of wealth:
Level 1 - No Watch
Level 2 - Apple Watch
Level 3 - Rolex
Level 4 - Patek
Level 5 - No Watch
Vikramank Singh retweeted
Eliezer Yudkowsky ⏹️@ESYudkowsky·
So the actual scary part to me is that GPT4 understands what it means to say, "Compress this in a way where *you* can decompress it." Humans take for granted that we know our own capabilities, that we reflect, that we can imagine how we would react to a future input, we can optimize over possible future inputs like that. Since when does GPT4 know what "you" refers to, what "you" can do, since when can it imagine "you" well enough to answer an unprecedented and untrained question about how to compress a sentence that "you" will decompress? And solve it in a way no human would, that couldn't just be imitative? Like. It found a short sentence with, I assume, embedding similar to the long sentence. How does it KNOW? It doesn't have access to the encodings! ("Stochastic parrot" my fucking ass.)
gfodor.id@gfodor

MY JAW IS ON THE FLOOR.

Vikramank Singh retweeted
Matt Miller@brainiaq2000·
It’s been a ride
Vikramank Singh retweeted
Papers with Code@paperswithcode·
🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights: galactica.org
Vikramank Singh retweeted
Jacky Liang@jackyliang42·
How can robots perform a wide variety of novel tasks from natural language? Excited to present Code as Policies - using language models to directly write robot policy code from language instructions. See paper, colabs, blog, and demos at code-as-policies.github.io. Long 🧵👇
Vikramank Singh retweeted
Davis Blalock@davisblalock·
"What Makes Convolutional Models Great on Long Sequence Modeling?" CNNs—not transformers—now dominate the hardest sequence modeling benchmark. Here's how this happened: [1/14]
Vikramank Singh@vikramank14·
@ylecun @jamesbuchanan27 Also, these systems are fundamentally deterministic, but what makes them stochastic is the millions of end-points (users) interacting with them in an unpredictable manner. It'll be interesting to see how this uncertainty can be factorized into aleatoric & epistemic, represented as 'Z' in JEPA, and estimated.
Vikramank Singh@vikramank14·
@ylecun @jamesbuchanan27 More specifically, in our group we look at building these world models for systems like databases, which are well studied and interpretable but complex and configurable. So modular world models or causal models seem to be a better approach than a large monolithic architecture. (4/n)
Yann LeCun@ylecun·
I gave an interview to Spanish national TV (channel 1), which will air today at 15:00 on the news.