will ye
1.6K posts

will ye
@will__ye
member of whimsical staff, ex-ramp applied ai
nyc → sf Joined May 2016
473 Following 5K Followers

@vmfunc Hi, we don't have a dedicated phone line for support but please create a ticket here so we can investigate: frame.work/support

@threebarebears My sister said you named the protagonist after them, is that true or are they capping

EVERYONE! If you're in SF this Sunday, come hear an awesome behind the scenes talk about the making of #hoppers , featuring some talented key folks from the movie!! (i'll be in the back cheering them on, come say hi) 💛
The Walt Disney Family Museum@WDFMuseum
“Hop” over to our theater on Sunday, March 22 to learn how story and animation inform each other to create the world of Disney and Pixar’s latest feature film, "Hoppers" (2026), with Pixar’s Margaret Spencer, John Cody Kim, James W. Brown, and Cody Lyon. bit.ly/4l6kXaC

@juliarturc @_avichawla This guy’s content is either wrong or stolen, what a slop merchant

@_avichawla I’m glad you found my video helpful since 80% of this post is repurposing it. Next time you borrow my content just tag me, we can cross promote.
youtu.be/jrJKRYAdh7I?si…


You're in a Research Scientist interview at DeepMind.
The interviewer asks:
"Our investors want us to contribute to open-source.
Gemini crushed benchmarks.
But we'll lose competitive edge by open-sourcing it.
What to do?"
You: "Release a research paper."
Here's what you missed:
LLMs today don't just learn from raw text; they also learn from each other.
For example:
- Llama 4 Scout & Maverick were trained using Llama 4 Behemoth.
- Gemma 2 and 3 were trained using Gemini.
Distillation helps us do so, and the visual explains 3 popular techniques.
1️⃣ Soft-label distillation
Generate token-level softmax probabilities over the entire corpus using:
- A frozen pre-trained Teacher LLM.
- An untrained Student LLM.
Train the Student LLM to match the Teacher's probabilities.
In soft-label distillation, access to Teacher's probabilities gives max knowledge transfer.
But you must have access to the Teacher’s weights to get the probability distribution.
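A minimal sketch of the soft-label objective for a single token position, assuming the loss is the KL divergence from the Teacher's distribution to the Student's (a common choice; the post does not name the exact loss). The logits and the function name `soft_label_loss` are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits):
    # KL(teacher || student) over the vocabulary for one token position.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy 3-token vocabulary (hypothetical values).
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.5, 0.5, 0.5]
loss = soft_label_loss(teacher_logits, student_logits)
print(f"soft-label loss: {loss:.4f}")
```

The loss is zero only when the Student reproduces the Teacher's full distribution, which is why this variant needs the Teacher's probabilities and not just its sampled tokens.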
Even if you have access, there's another problem!
Say your vocab has 100k tokens and data has 5 trillion tokens.
Storing softmax probabilities over the entire vocab for each input token needs 500M GBs of memory under fp8 precision.
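The storage figure can be checked with quick arithmetic, assuming one byte per probability under fp8 and decimal gigabytes (10^9 bytes). The variable names are illustrative.

```python
vocab_size = 100_000          # tokens in the vocabulary
corpus_tokens = 5 * 10**12    # 5 trillion tokens of training data
bytes_per_prob = 1            # fp8 stores each probability in one byte

# One probability per vocabulary entry for every token in the corpus.
total_bytes = vocab_size * corpus_tokens * bytes_per_prob
total_gb = total_bytes // 10**9
print(f"{total_gb:,} GB")
```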
The second technique solves this.
2️⃣ Hard-label distillation
- Use the Teacher LLM to get the output token.
- Get the softmax probs. from the Student LLM.
- Train the Student to match Teacher's output.
DeepSeek-R1 was distilled into Qwen & Llama using this technique.
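A rough sketch of the hard-label objective for one token position, assuming the Teacher's output token is picked greedily (argmax) and the Student is trained with ordinary cross-entropy against it. The logits and the name `hard_label_loss` are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hard_label_loss(teacher_token_id, student_logits):
    # Cross-entropy of the Student against the Teacher's single output token.
    q = softmax(student_logits)
    return -math.log(q[teacher_token_id])

# Toy 3-token vocabulary (hypothetical values).
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.5, 0.5, 0.5]

# Only the Teacher's chosen token is kept, not its full distribution.
teacher_token = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
loss = hard_label_loss(teacher_token, student_logits)
print(f"teacher token: {teacher_token}, hard-label loss: {loss:.4f}")
```

Because only a token id is stored per position, the per-token storage cost no longer scales with vocabulary size.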
3️⃣ Co-distillation
- Start with an untrained Teacher and Student LLM.
- Generate softmax probs over the current batch from both models.
- Train the Teacher LLM on the hard labels.
- Train the Student LLM to match softmax probs of the Teacher.
Meta used co-distillation to train Llama 4 Scout and Maverick from Llama 4 Behemoth.
Of course, during the initial stages, soft labels of the Teacher LLM won't be accurate.
That is why Student LLM is trained using both soft labels + ground-truth hard labels.
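One way the two co-distillation losses might be combined for a single token position is sketched below. The mixing weight `alpha`, the function name, and the toy logits are assumptions for illustration; this is not Meta's published recipe.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def co_distillation_losses(teacher_logits, student_logits, target_id, alpha=0.5):
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    # Teacher learns from the ground-truth hard label only.
    teacher_loss = -math.log(p[target_id])
    # Student mixes a soft-label term (match the Teacher) with a hard-label term.
    soft = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    hard = -math.log(q[target_id])
    student_loss = alpha * soft + (1 - alpha) * hard
    return teacher_loss, student_loss

# Toy 3-token vocabulary; token 0 is the ground-truth next token (hypothetical).
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.5, 0.5, 0.5]
teacher_loss, student_loss = co_distillation_losses(
    teacher_logits, student_logits, target_id=0, alpha=0.5
)
print(f"teacher: {teacher_loss:.4f}, student: {student_loss:.4f}")
```

Setting `alpha` low early in training would lean on the ground-truth labels while the Teacher's soft labels are still unreliable.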
____
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.


@Cmillet77 @im_roy_lee @amasad Not true. People of all ages and professions would rather not have to put in headphones to watch something on twitter in public. And deaf people exist?

@im_roy_lee @amasad Their videos are not for people who need subtitles 😁

We’ve raised $400M at a $9B valuation.
Investors include Georgian, G Squared, Prysm, 1789, YC, Coatue, a16z, Craft, and QIA, with strategic investments from Accenture, Databricks, Okta, and Tether. We’re also lucky to have incredible individuals backing us, including Shaq and Jared Leto.
This funding will help us scale our ambition and expand beyond coding into AI systems that center human creativity.
Replit is now used at 85% of the Fortune 500. We have an opportunity to help shape the future of work. One where AI abstracts away the boring parts and humans shine as creative directors.
We’re also investing more globally, particularly in Europe, Asia, and the Middle East. Innovation can come from anywhere in the world, and we want to help unlock it.

a student took the ELO rating system from chess
ran it through 95,491 tennis matches over 43 years, and trained an XGBoost model that predicts winners with 85% accuracy
he tested it on the Australian Open 2025 completely outside the training data
99 out of 116 matches correct
called every single Sinner win through the entire tournament
the champion, before the first ball was hit
no team, no funding, a laptop and free CSVs from the internet
this is the best breakdown of a real sports prediction model I've seen
study it or feed it to your AI agent
Phosphen@phosphenq

@mrvo5 @YIMBYLAND How many people were living in the office building?

@YIMBYLAND YAS! Displacement of Brown and Black people. YAS! Gentrification masked as urbanization. YAS! Even more congestion. YAS! More UNAFFORDABLE housing.

Butt ugly 6-story office building turns into 72-story mixed-use tower with 1,200+ new homes in Downtown Brooklyn.
LET'S FREAKING GOOOO!



NYC Planning@NYCPlanning
1,200+ new homes are coming to Downtown Brooklyn! With today's @nyccouncil approval, 395 Flatbush Ave Ext is the FIRST project in Brooklyn to use our new high-density zoning districts. The project also creates new jobs + services, public open space & subway entrance upgrades!

@courtne This article is 10x longer than it needs to be, it’s the same stuff rehashed. And as other ppl pointed out, calvin is not the best example lol. A much better one is @AndrewSteinborn, who dropped out of university of west georgia and programmed Minecraft servers

@NWischoff Would feel cozier at a different venue! Looks like a meeting room, especially with the bright lights and office chairs.

@NYSocialBee @calder_mchugh why is it corny? seems reasonable that one should know what a neighborhood is called to judge it. “mind your business” he’s from new york this IS his business

@calder_mchugh I think this is very corny and I think if it doesn’t apply to you then you should probably just mind your business :) sorry 😇

Sorry for New York native posting but I don’t think you should be allowed to mourn the death of a neighborhood if you don’t actually know what it’s called :)
Social Bee@NYSocialBee
Just walked through west village.

This one screenshot alone has his face 5 times 🤣🤣🤣 bro is pranking us all

@levelsio@levelsio
So after my last tweet icon.com now became the founder's Tinder profile

Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.
The title says it all: "RNA-seq analysis in seconds using GPUs" now on biorxiv biorxiv.org/content/10.648…
Figure 1 shows the key result


Dear @AnthropicAI, my lab builds a lot of OSS for genomics (github.com/COMBINE-lab). While we lack the widespread OSS market of popular NPM packages, pieces of our software are critical in biomedical research. Please consider extending your Claude Max offer to such labs!
English

I would like to purchase a handful of code problems that modern LLMs can’t solve.
Requirements:
- programmatically verifiable (can be tested without human interaction)
- “before” state (repo before the commit that implements the solution)
- example code that actually solves the problem
I am willing to pay up to $500 per problem that I can easily test locally and confirm current models (gpt-5.3-codex, opus 4.6) are unable to solve.
If you can’t tell, I’m running out of “too hard for LLM” code tasks 🙃🙃🙃

















