Stephen Fernandes

1.4K posts


@stephennfern

building LLMs for Indian Languages

Goa, India · Joined April 2013
783 Following · 62 Followers
Siddhant Arora @Sid_Arora_18
Finished my PhD at @LTIatCMU. Excited to start at @Meta as a Research Scientist, where I'll be working on speech processing and conversational AI for wearables like smart glasses. Grateful to my advisor @shinjiw_at_cmu, collaborators, friends, and family for all the support.
13 replies · 5 reposts · 161 likes · 8.1K views
Julian @julianboolean_
It's interesting to think about how LeCun got this so wrong.

In a sense, he was perfectly correct: LLMs almost always get answers "wrong", if by "wrong" you mean that somewhere in the reasoning trace there was a misstep. But we don't care about the reasoning trace and its numerous misfires; we only care about the final answer.

So "the probability that any produced token takes us outside the set of correct answers" is meaningless: we can't define correctness until the last token. There is no exponential divergence.
58 replies · 16 reposts · 303 likes · 91.6K views
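For context, here is the compounding-error model behind the "exponential divergence" claim being pushed back on; the per-token error rate ε and answer length n are illustrative symbols, not from the tweet. If each generated token independently "derails" with probability ε, then

    P(\text{correct}) = (1 - \varepsilon)^n = e^{n \ln(1 - \varepsilon)} \approx e^{-\varepsilon n}

which decays exponentially in the answer length n. The tweet's counterpoint is that correctness is a predicate on the final answer alone, so the per-token event "leaves the set of correct answers" is undefined mid-trace, and the independence assumption the bound rests on has nothing to attach to.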
maharshi @maharshii
what being a nerd does to you
15 replies · 0 reposts · 250 likes · 8.4K views
Stephen Fernandes @stephennfern
the polarization in mindset that AI brings
0 replies · 0 reposts · 1 like · 12 views
0xSero @0xSero
Best models to run at your hardware level. I'll be doing this every week, I hope you guys enjoy.

---- 8 GB ----
Autocomplete for coding (like Cursor Tab):
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style:
- huggingface.co/nvidia/NVIDIA-…

---- 16 GB ----
Here things get better.
Multimodal:
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…

---- 24 GB ----
- The best model you can get (thanks Qwen): huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents): huggingface.co/nvidia/Nemotro…
- Mine hehe: huggingface.co/0xSero/Qwen-3.…

I'm doing a weekly series.
219 replies · 376 reposts · 3.7K likes · 565.8K views
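A rough way to sanity-check which VRAM tier a model fits in: weight memory is roughly parameter count times bits per weight, and KV cache plus runtime overhead add some headroom. A minimal back-of-envelope sketch in Python; the 20% overhead factor and the example sizes are assumptions, not from the thread:

# Back-of-envelope VRAM estimate for running an LLM locally.
# Assumptions (not from the thread): weights dominate; KV cache and
# runtime overhead add roughly 20%; quantization metadata is folded
# into an effective bits-per-weight figure.

def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.20) -> float:
    """Rough VRAM need in GB for a model with params_b billion weights."""
    weight_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * (1 + overhead)

# Illustrative sizes only, not the exact models listed above:
for params_b, bits in [(9, 4.5), (9, 8), (24, 4.5)]:
    print(f"{params_b}B @ {bits} bits/weight ≈ {est_vram_gb(params_b, bits):.1f} GB")
# 9B @ 4.5 bits ≈ 6.1 GB (8 GB tier), 9B @ 8 bits ≈ 10.8 GB (16 GB tier),
# 24B @ 4.5 bits ≈ 16.2 GB (24 GB tier)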
Stephen Fernandes reposted
Tom Turney @no_stp_on_snek
the original TurboQuant paper tested on A100 with models up to 8B. 6 days later, a bunch of strangers on the internet had it built and running on:
- Apple Silicon M1 through M5
- NVIDIA 3080 Ti through DGX Spark Blackwell
- AMD RX 6800 XT and 9070
- a 10-year-old Tesla P40
- an 8GB MacBook Air
- models from 3.8B to 70B across 6 architecture families
- 30+ independent testers

along the way we found new optimizations the paper didn't cover and failure modes it didn't test.

the fact that a loose group of people across the world can read a paper, build implementations from scratch, stress-test across hardware none of us could individually afford, and push the research further in under a week is genuinely one of the best things about this era. the tools and the community make it possible. open source is something else.
51 replies · 483 reposts · 4.9K likes · 138.9K views
Jędrzej Maczan @jedmaczan
I built a tiny-vllm in C++ and CUDA:
- paged attention
- continuous batching
- educational
- 100% human-written™

And now I'm writing a course where you will build your own vLLM yourself. Still work in progress, I'll finish by the end of April. All for free ofc, just a GitHub repo.
15 replies · 30 reposts · 593 likes · 17.9K views
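For readers new to the bullet points above: paged attention stores each sequence's KV cache in fixed-size blocks addressed through a per-sequence block table, so memory is claimed on demand rather than as one contiguous max-context buffer. A minimal sketch of that bookkeeping; BLOCK_SIZE, KVBlockAllocator, and Sequence are illustrative names, not vLLM's (or the course repo's) actual structures:

# Toy sketch of paged-attention KV-cache bookkeeping: logical token
# positions map to physical blocks via a per-sequence block table.
# Illustrative only, not real vLLM internals.

BLOCK_SIZE = 16  # tokens per KV block (assumed)

class KVBlockAllocator:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # pool of physical block ids

    def alloc(self) -> int:
        return self.free.pop()  # IndexError here == out of KV memory

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)  # blocks return to the pool when a sequence finishes

class Sequence:
    def __init__(self):
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self, allocator: KVBlockAllocator) -> None:
        # A new physical block is claimed only when the previous one fills,
        # so short sequences never reserve max-context-length memory.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.alloc())
        self.num_tokens += 1

allocator = KVBlockAllocator(num_blocks=1024)
seq = Sequence()
for _ in range(40):        # 40 tokens -> ceil(40/16) = 3 blocks
    seq.append_token(allocator)
print(seq.block_table)     # three physical block ids, e.g. [1023, 1022, 1021]

Continuous batching falls out of the same idea: because blocks are claimed and released per token, finished requests can leave the batch and new ones can join at any step instead of waiting for the whole batch to drain.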
Stephen Fernandes @stephennfern
It's officially an epidemic right now. Every single day, another tech-illiterate grifter, aka wannabe Steve Jobs, starts yapping about their billion-dollar "unicorn" idea like they've just cracked the matrix. Like bro, first just learn to center a div.
0 replies · 0 reposts · 1 like · 15 views
Stephen Fernandes @stephennfern
The VC-subsidized $20/month plans are gradually coming to an end. You can barely make ends meet on a $20 Claude Code/Cursor subscription anymore. These multi-trillion-parameter models were never economical to run at $20/month; you were just subsidized by VC money.
0 replies · 0 reposts · 1 like · 28 views
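A back-of-envelope version of that claim. Every number below is an assumption for illustration; real serving costs and usage profiles are not public:

# Toy unit economics for a $20/month coding-assistant plan.
# All inputs are illustrative assumptions, not published figures.

price_per_month = 20.00        # what the subscriber pays
cost_per_m_tokens = 3.00       # assumed blended serving cost, $ per million tokens
tokens_per_day = 2_000_000     # assumed heavy agentic-coding usage
days_per_month = 30

monthly_cost = tokens_per_day * days_per_month * cost_per_m_tokens / 1_000_000
print(f"serving cost ≈ ${monthly_cost:.0f}/month against ${price_per_month:.0f} in revenue")
# serving cost ≈ $180/month against $20 in revenue: under these assumptions,
# every heavy user burns ~$160/month of someone else's capital.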
Stephen Fernandes @stephennfern
@asmah2107 isn't RadixAttention in SGLang superior to PagedAttention in vLLM in terms of inference performance?
0 replies · 0 reposts · 0 likes · 195 views
Ashutosh Maheshwari @asmah2107
Best KV cache out there to speed up inference? Anything better than vLLM (PagedAttention)?
14 replies · 3 reposts · 96 likes · 13.2K views
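The distinction the reply above is pointing at: PagedAttention manages KV memory in fixed-size blocks, while RadixAttention in SGLang additionally reuses KV cache across requests that share a prompt prefix by keeping cached tokens in a radix tree. A toy sketch of that prefix reuse, using a plain character-level trie in place of SGLang's token-level radix tree; all names are illustrative:

# Toy prefix cache in the spirit of RadixAttention: a second request
# that shares a prompt prefix reuses the cached part instead of
# recomputing it. Character-level trie, for illustration only.

class Node:
    def __init__(self):
        self.children: dict[str, "Node"] = {}

class PrefixCache:
    def __init__(self):
        self.root = Node()

    def match_and_insert(self, prompt: str) -> int:
        """Return how many leading characters were already cached; cache the rest."""
        node, hits = self.root, 0
        for ch in prompt:
            if ch in node.children:    # still on a previously cached path
                node = node.children[ch]
                hits += 1
            else:                      # first miss; everything after is new
                child = Node()
                node.children[ch] = child
                node = child
        return hits

cache = PrefixCache()
print(cache.match_and_insert("You are a helpful assistant. Hi"))   # 0 (cold cache)
print(cache.match_and_insert("You are a helpful assistant. Bye"))  # 29 (shared system-prompt prefix reused)

In a real serving stack the reused hits are KV-cache entries rather than characters, so the shared system prompt is never recomputed; that cross-request reuse, not the paging itself, is where the throughput difference comes from.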
Etash Guha @etash_guha
Career Update: I’m joining Anthropic on the pretraining team! Excited to learn from all the brilliant and creative people there. Let’s go train some models!
69 replies · 7 reposts · 736 likes · 34.4K views
Stephen Fernandes @stephennfern
With the release of TurboQuant, Google just gave everyone a free performance upgrade on all of their existing hardware. All of a sudden, GPUs became marginally more economical to run inference on.
0 replies · 0 reposts · 0 likes · 18 views
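Why a quantization release reads as a "free performance upgrade": LLM inference on GPUs is largely memory-bandwidth-bound, so shrinking the weights shrinks the time per token on the same card. The tweet doesn't describe TurboQuant's actual method, so here is the textbook round-to-nearest int8 baseline as a stand-in, with symmetric per-tensor scaling in NumPy:

# Textbook symmetric int8 round-to-nearest quantization; a generic
# baseline standing in for TurboQuant, whose method isn't described here.
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.abs(w).max()) / 127.0   # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)   # a stand-in weight matrix
q, s = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s)).mean())
print(f"{w.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB, mean abs error {err:.4f}")
# 4x smaller weights -> roughly 4x less memory traffic per matmul,
# which is where the "upgrade on existing hardware" comes from.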
Stephen Fernandes @stephennfern
Let's assume we theoretically scaled the V-JEPA architecture to 1 trillion parameters and fed it the whole YouTube corpus. What could JEPA possibly understand about the world? Would it understand concepts like gravity, wind, texture, viscosity, etc., all coalescing together?
0 replies · 0 reposts · 0 likes · 21 views