Prem Viswanathan

423 posts

@prempv

Building @swift_cx. Adjunct @ CMU. Prev @aws

Pittsburgh, PA · Joined August 2009
2.1K Following · 598 Followers
Nikunj Kothari@nikunj·
TIL - you can spawn subagents for skills in Claude Code. What.. I feel so stupid now. This would have saved me SO much time. Every day, you learn something new.
Prem Viswanathan@prempv·
Claude Opus 4.5 is definitely having some hiccups. Similar quality issues with @WisprFlow this evening. AI quality degradation is the new "stackoverflow / aws is down" moment
Prem Viswanathan retweeted
Graham Neubig@gneubig·
We're hiring! We have positions open for Members of the Technical Staff for Agent R&D, plus many other roles. Think of the best researcher or engineer you know: don't you want them building in the open? Listings below! allhandsai.applytojob.com/apply/
Prem Viswanathan@prempv·
Compression's always relative: today's model capacities were sci-fi 10 yrs ago. For coding, we thought we'd need 100M-token context, but smaller contexts with many parallel explorations that converge are winning. I expect the same here for the end goal: how does one go about reading McKinsey slides?
Dileep George@dileeplearning·
I love Andrej...but this makes no sense to me. I don't see how converting text to image ('pixels') makes it any better for language modeling. What am I missing?
Andrej Karpathy@karpathy

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:

- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not an end-to-end stage. It "imports" all the ugliness of Unicode and byte encodings, it inherits a lot of historical baggage and security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look like two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, with all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa. So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side quest an image-input-only version of nanochat...

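The compression claim in the thread above can be made concrete with some back-of-envelope arithmetic. Everything here is an illustrative assumption, not a measurement from the DeepSeek-OCR paper: ~4 characters per BPE text token is a common rule of thumb, and one vision token per 16×16 pixel patch is a ViT-style convention. The point the sketch makes is that whether pixels "compress" relative to text depends entirely on the render resolution:

```python
# Back-of-envelope sketch of the text-tokens vs. vision-tokens tradeoff.
# All constants are illustrative assumptions, not measurements:
#   CHARS_PER_TEXT_TOKEN: ~4 chars/token is a common BPE rule of thumb
#   PATCH: a ViT-style square patch size, one vision token per patch

CHARS_PER_TEXT_TOKEN = 4   # assumed average for an English BPE tokenizer
PATCH = 16                 # assumed patch size in pixels

def text_tokens(n_chars: int) -> int:
    """Text tokens needed for n_chars of plain text (ceiling division)."""
    return -(-n_chars // CHARS_PER_TEXT_TOKEN)

def vision_tokens(width_px: int, height_px: int) -> int:
    """Vision tokens for a rendered page of the given pixel dimensions."""
    return (width_px // PATCH) * (height_px // PATCH)

# A dense page of ~3000 characters:
t = text_tokens(3000)             # 750 text tokens
hi_res = vision_tokens(640, 640)  # 1600 vision tokens: pixels lose here
lo_res = vision_tokens(432, 432)  # 729 vision tokens: pixels barely win
print(t, hi_res, lo_res)
```

Under these assumed constants, rendering only pays off once resolution drops low enough — which is exactly where the legibility concerns raised in the replies below start to bite.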
Prem Viswanathan@prempv·
@_cartick @dileeplearning We humans read text visually, don't we? With sufficient resolution, surely this isn't a problem. Vision and audio as the two universal input modalities make a ton of sense IMO.
Karthik Ramasamy@_cartick·
@dileeplearning A major issue with the visual approach is that, given how the image gets tokenized, it is harder to differentiate 1,000 vs 1.000 or 10.00. Think of use cases involving finance and medicine, where that would be very brutal.
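The failure mode described above can be shown with a toy: two hand-drawn 4×4 binary "glyphs" (a made-up period and comma — real rendered glyphs would differ) that are distinct at the pixel level but collapse to the same representation after a crude 2×2 average-and-threshold downsampling, the kind of information loss a patch embedding can introduce:

```python
# Toy demo: aggressive downsampling can merge visually similar glyphs.
# The 4x4 "glyphs" below are hypothetical hand-drawn stand-ins for '.'
# and ','; a real renderer and patch embedder differ in detail, but the
# failure mode (a one-pixel tail vanishing) is the same.

PERIOD = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
COMMA = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],   # tiny tail: the only pixel that differs
]

def downsample_2x2(bitmap):
    """Average each 2x2 block and threshold at 0.5 (a crude patch code)."""
    out = []
    for r in range(0, 4, 2):
        row = []
        for c in range(0, 4, 2):
            block = [bitmap[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)]
            row.append(1 if sum(block) / 4 >= 0.5 else 0)
        out.append(row)
    return out

print(PERIOD != COMMA)                                   # True: pixels differ
print(downsample_2x2(PERIOD) == downsample_2x2(COMMA))   # True: patches collapse
```

At this point "1,000" and "1.000" become indistinguishable downstream, which is the finance/medicine concern in the reply above: the fix has to come from resolution or patch granularity, not from the language model.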
Prem Viswanathan@prempv·
@arafatkatze @andrew_melby @cline @AmpCode @Cursor When you refer to RAG, you are essentially talking about pure vector search, which is indeed quite problematic. But having it as an optional tool should be fine. Your concern about the overhead of vector search relative to the lift it offers is very valid and quite underrated.
Ara@arafatkatze·
Ara@arafatkatze

> an eval and benchmark would be great

That's a great thing to publish! (And would bolster y'all's position.) I might not have the bandwidth at the moment, but we can share this internally. Although if you really wanna prove your case, you can try tweaking Aider: github.com/Aider-AI/aider/ It has all the standard benchmarks, and based on my understanding they use Tree-sitter and other techniques, not RAG chunks. You can keep everything else exactly the same, so that the search retrieval tactic is the only thing tweaked. Personally I do not trust the SWE benchmark and most benchmarks, as they have been gamed, but if you specifically A/B test between the search mechanisms, that will be very insightful.

Ara@arafatkatze·
In building AI agents @cline, we've identified three mind viruses. Mind viruses are seductive ideas that sound smart but don't work in practice:
1. Multi-Agent Orchestration
2. RAG (Retrieval-Augmented Generation)
3. More Instructions = Better Results
Let's explore why!
Ara tweet media
Graham Neubig@gneubig·
Because it's so easy to write code now, I also think of new ways to do things with code. For instance, I'm creating slides using reveal.js (revealjs.com), sending emails with resend.com, and writing music with strudel (strudel.cc).
Graham Neubig@gneubig·
I'm preparing for a talk on agents and the future of work, so I decided to check the effect of agents on my own work. The attached chart is the number of pull requests I made by month w/ and w/o code by OpenHands agents. A few observations 🧵
Graham Neubig tweet media
Delip Rao e/σ@deliprao·
Anthropic or Anthropic-sponsored safety papers
Delip Rao e/σ tweet media