Stephen Fernandes

1.4K posts


@stephennfern

building LLMs for Indian Languages

Goa, India · Joined April 2013
783 Following · 62 Followers
Siddhant Arora @Sid_Arora_18
Finished my PhD at @LTIatCMU. Excited to start at @Meta as a Research Scientist, where I'll be working on speech processing and conversational AI for wearables like smart glasses. Grateful to my advisor @shinjiw_at_cmu, collaborators, friends, and family for all the support.
13 replies · 5 reposts · 161 likes · 8.1K views
Julian @julianboolean_
It's interesting to think about how LeCun got this so wrong.

In a sense, he was perfectly correct: LLMs almost always get answers "wrong", if by "wrong" you mean that somewhere in the reasoning trace there was a misstep. But we don't care about the reasoning trace and its numerous misfires; we only care about the final answer.

So "the probability that any produced token takes us outside the set of correct answers" is meaningless: we can't define correctness until the last token. There is no exponential divergence.
58 replies · 16 reposts · 303 likes · 91.6K views
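For context, here is the compounding-error model behind the "exponential divergence" claim being pushed back on; the per-token error rate ε and answer length n are illustrative symbols, not from the tweet. If each generated token independently "derails" with probability ε, then

    P(\text{correct}) = (1 - \varepsilon)^n = e^{n \ln(1 - \varepsilon)} \approx e^{-\varepsilon n}

which decays exponentially in the answer length n. The tweet's counterpoint is that correctness is a predicate on the final answer alone, so the per-token event "leaves the set of correct answers" is undefined mid-trace, and the independence assumption the bound rests on has nothing to attach to.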
maharshi @maharshii
what being a nerd does to you
15 replies · 0 reposts · 250 likes · 8.4K views
Stephen Fernandes @stephennfern
the polarization in mindset that AI brings
0 replies · 0 reposts · 1 like · 12 views
0xSero @0xSero
Best models to run at your hardware level. I'll be doing this every week, I hope you guys enjoy.

---- 8 GB ----
Autocomplete for coding (like Cursor Tab):
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style:
- huggingface.co/nvidia/NVIDIA-…

---- 16 GB ----
Here things get better.
Multimodal:
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…

---- 24 GB ----
- The best model you can get (thanks Qwen): huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents): huggingface.co/nvidia/Nemotro…
- Mine hehe: huggingface.co/0xSero/Qwen-3.…

I'm doing a weekly series.
219 replies · 376 reposts · 3.7K likes · 565.8K views
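A rough way to sanity-check which VRAM tier a model fits in: weight memory is roughly parameter count times bits per weight, and KV cache plus runtime overhead add some headroom. A minimal back-of-envelope sketch in Python; the 20% overhead factor and the example sizes are assumptions, not from the thread:

# Back-of-envelope VRAM estimate for running an LLM locally.
# Assumptions (not from the thread): weights dominate; KV cache and
# runtime overhead add roughly 20%; quantization metadata is folded
# into an effective bits-per-weight figure.

def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.20) -> float:
    """Rough VRAM need in GB for a model with params_b billion weights."""
    weight_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * (1 + overhead)

# Illustrative sizes only, not the exact models listed above:
for params_b, bits in [(9, 4.5), (9, 8), (24, 4.5)]:
    print(f"{params_b}B @ {bits} bits/weight ≈ {est_vram_gb(params_b, bits):.1f} GB")
# 9B @ 4.5 bits ≈ 6.1 GB (8 GB tier), 9B @ 8 bits ≈ 10.8 GB (16 GB tier),
# 24B @ 4.5 bits ≈ 16.2 GB (24 GB tier)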
Stephen Fernandes reposted
Tom Turney @no_stp_on_snek
the original TurboQuant paper tested on A100 with models up to 8B. 6 days later, a bunch of strangers on the internet had it built and running on:
- Apple Silicon M1 through M5
- NVIDIA 3080 Ti through DGX Spark Blackwell
- AMD RX 6800 XT and 9070
- a 10-year-old Tesla P40
- an 8GB MacBook Air
- models from 3.8B to 70B across 6 architecture families
- 30+ independent testers

along the way we found new optimizations the paper didn't cover and failure modes it didn't test.

the fact that a loose group of people across the world can read a paper, build implementations from scratch, stress-test across hardware none of us could individually afford, and push the research further in under a week is genuinely one of the best things about this era. the tools and the community make it possible. open source is something else.
51 replies · 483 reposts · 4.9K likes · 138.9K views
Jędrzej Maczan @jedmaczan
I built a tiny-vllm in C++ and CUDA:
- paged attention
- continuous batching
- educational
- 100% human-written™

And now I'm writing a course where you will build your own vLLM yourself. Still work in progress, I'll finish by the end of April. All for free ofc, just a GitHub repo.
15 replies · 30 reposts · 593 likes · 17.9K views
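For readers new to the bullet points above: paged attention stores each sequence's KV cache in fixed-size blocks addressed through a per-sequence block table, so memory is claimed on demand rather than as one contiguous max-context buffer. A minimal sketch of that bookkeeping; BLOCK_SIZE, KVBlockAllocator, and Sequence are illustrative names, not vLLM's (or the course repo's) actual structures:

# Toy sketch of paged-attention KV-cache bookkeeping: logical token
# positions map to physical blocks via a per-sequence block table.
# Illustrative only, not real vLLM internals.

BLOCK_SIZE = 16  # tokens per KV block (assumed)

class KVBlockAllocator:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # pool of physical block ids

    def alloc(self) -> int:
        return self.free.pop()  # IndexError here == out of KV memory

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)  # blocks return to the pool when a sequence finishes

class Sequence:
    def __init__(self):
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self, allocator: KVBlockAllocator) -> None:
        # A new physical block is claimed only when the previous one fills,
        # so short sequences never reserve max-context-length memory.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.alloc())
        self.num_tokens += 1

allocator = KVBlockAllocator(num_blocks=1024)
seq = Sequence()
for _ in range(40):        # 40 tokens -> ceil(40/16) = 3 blocks
    seq.append_token(allocator)
print(seq.block_table)     # three physical block ids, e.g. [1023, 1022, 1021]

Continuous batching falls out of the same idea: because blocks are claimed and released per token, finished requests can leave the batch and new ones can join at any step instead of waiting for the whole batch to drain.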
Stephen Fernandes @stephennfern
It's officially an epidemic right now. Every single day, another tech-illiterate grifter, aka wannabe Steve Jobs, starts yapping about their billion-dollar "unicorn" idea like they've just cracked the matrix. Like bro, first just learn to center a div.
0 replies · 0 reposts · 1 like · 15 views
Stephen Fernandes @stephennfern
The VC-subsidized $20/month plans are gradually coming to an end. You can barely make ends meet on a $20 Claude Code/Cursor subscription anymore. These multi-trillion-parameter models were never economical to run at $20/month; you were just subsidized by VC money.
0 replies · 0 reposts · 1 like · 28 views
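A back-of-envelope version of that claim. Every number below is an assumption for illustration; real serving costs and usage profiles are not public:

# Toy unit economics for a $20/month coding-assistant plan.
# All inputs are illustrative assumptions, not published figures.

price_per_month = 20.00        # what the subscriber pays
cost_per_m_tokens = 3.00       # assumed blended serving cost, $ per million tokens
tokens_per_day = 2_000_000     # assumed heavy agentic-coding usage
days_per_month = 30

monthly_cost = tokens_per_day * days_per_month * cost_per_m_tokens / 1_000_000
print(f"serving cost ≈ ${monthly_cost:.0f}/month against ${price_per_month:.0f} in revenue")
# serving cost ≈ $180/month against $20 in revenue: under these assumptions,
# every heavy user burns ~$160/month of someone else's capital.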
Stephen Fernandes @stephennfern
@asmah2107 isn't RadixAttention in SGLang superior to PagedAttention in vLLM in terms of inference performance?
0 replies · 0 reposts · 0 likes · 195 views
Ashutosh Maheshwari @asmah2107
Best KV cache out there to speed up inference? Anything better than vLLM (PagedAttention)?
14 replies · 3 reposts · 96 likes · 13.2K views
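The distinction the reply above is pointing at: PagedAttention manages KV memory in fixed-size blocks, while RadixAttention in SGLang additionally reuses KV cache across requests that share a prompt prefix by keeping cached tokens in a radix tree. A toy sketch of that prefix reuse, using a plain character-level trie in place of SGLang's token-level radix tree; all names are illustrative:

# Toy prefix cache in the spirit of RadixAttention: a second request
# that shares a prompt prefix reuses the cached part instead of
# recomputing it. Character-level trie, for illustration only.

class Node:
    def __init__(self):
        self.children: dict[str, "Node"] = {}

class PrefixCache:
    def __init__(self):
        self.root = Node()

    def match_and_insert(self, prompt: str) -> int:
        """Return how many leading characters were already cached; cache the rest."""
        node, hits = self.root, 0
        for ch in prompt:
            if ch in node.children:    # still on a previously cached path
                node = node.children[ch]
                hits += 1
            else:                      # first miss; everything after is new
                child = Node()
                node.children[ch] = child
                node = child
        return hits

cache = PrefixCache()
print(cache.match_and_insert("You are a helpful assistant. Hi"))   # 0 (cold cache)
print(cache.match_and_insert("You are a helpful assistant. Bye"))  # 29 (shared system-prompt prefix reused)

In a real serving stack the reused hits are KV-cache entries rather than characters, so the shared system prompt is never recomputed; that cross-request reuse, not the paging itself, is where the throughput difference comes from.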
Etash Guha @etash_guha
Career Update: I’m joining Anthropic on the pretraining team! Excited to learn from all the brilliant and creative people there. Let’s go train some models!
69 replies · 7 reposts · 736 likes · 34.4K views
Stephen Fernandes @stephennfern
With the release of TurboQuant, Google just gave everyone a free performance upgrade on all of their existing hardware. All of a sudden, GPUs became marginally more economical to run inference on.
0 replies · 0 reposts · 0 likes · 18 views
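Why a quantization release reads as a "free performance upgrade": LLM inference on GPUs is largely memory-bandwidth-bound, so shrinking the weights shrinks the time per token on the same card. The tweet doesn't describe TurboQuant's actual method, so here is the textbook round-to-nearest int8 baseline as a stand-in, with symmetric per-tensor scaling in NumPy:

# Textbook symmetric int8 round-to-nearest quantization; a generic
# baseline standing in for TurboQuant, whose method isn't described here.
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.abs(w).max()) / 127.0   # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)   # a stand-in weight matrix
q, s = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s)).mean())
print(f"{w.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB, mean abs error {err:.4f}")
# 4x smaller weights -> roughly 4x less memory traffic per matmul,
# which is where the "upgrade on existing hardware" comes from.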
Stephen Fernandes @stephennfern
Let's assume we theoretically scaled the V-JEPA architecture to 1 trillion parameters and fed it the whole YouTube corpus. What could JEPA possibly understand about the world? Would it understand concepts like gravity, wind, texture, viscosity, etc., all coalescing together?
0 replies · 0 reposts · 0 likes · 21 views