Ishan Anand

841 posts


@ananis25

reading tweets

Mumbai, India · Joined June 2013
478 Following · 66 Followers
Ishan Anand@ananis25·
Useful discussion. A simpler maxim could be: does the code (no matter where it sits on the typing -> vibe-coding axis) help hone your mental model of the problem space? That alone determines whether it is a net gain or future tech debt.
David Cramer@zeeg

I'm fully convinced that LLMs are not an actual net productivity boost (today). They remove the barrier to get started, but they create increasingly complex software which does not appear to be maintainable. So far, in my situations, they appear to slow down long-term velocity.

Ishan Anand@ananis25·
@johngfriedman To me, embeddings feel like fuzzy query expansion at ingestion time: synonyms get grouped together in the vector index. Now that LLM agents can do query expansion at test time, and only if necessary, it does dilute the necessity of embeddings. But definitely useful at scale!
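A toy sketch of that contrast. Everything here is hypothetical illustration: `SYNONYMS` stands in for what an embedding model folds together at ingestion time, and `expand_query` stands in for an LLM proposing alternate phrasings at test time, only when the first lookup comes back thin.

```python
# A tiny keyword index: term -> set of doc ids.
DOCS = {1: "car repair manual", 2: "automobile maintenance guide"}

def build_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

INDEX = build_index(DOCS)

# Ingestion-time: an embedding-like synonym table groups variants
# together before any query arrives.
SYNONYMS = {"car": {"car", "automobile"}}

def search_with_synonyms(query):
    hits = set()
    for term in SYNONYMS.get(query, {query}):
        hits |= INDEX.get(term, set())
    return hits

# Test-time: a stand-in for an LLM call that expands the query,
# invoked only when needed.
def expand_query(query):
    return ["automobile"] if query == "car" else []

def search_with_expansion(query):
    hits = set(INDEX.get(query, set()))
    if len(hits) < 2:  # expand "only if necessary"
        for alt in expand_query(query):
            hits |= INDEX.get(alt, set())
    return hits
```

Both routes recover the same documents; the difference is whether the expansion cost is paid once at indexing or lazily at query time.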
John Friedman@johngfriedman·
But my general feeling is that RAG is a useful tool, oversold by hype people who used it for dumb stuff, as is typical. Now people are overcorrecting negatively, partly due to the original annoying hype push, with some irony, since the hype people have driven both cycles.
John Friedman@johngfriedman·
I agree. I am fairly close to the financial agents space, and people were doing silly RAG pipelines that were better off as greps. Those people are now saying 'RAG is dead'.
ry@rywalker

@ctjlewis rag has its place but the amount of companies that spent 6 months building vector pipelines...

Ishan Anand@ananis25·
@johngfriedman Yeah, this seems like the standard tradeoff. You gotta batch somewhere when writing to TB-sized datasets incrementally. Do you have to support relational queries (filters/aggregations) on top of the data too? Otherwise it feels pretty solid.
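A minimal sketch of that batching trade-off: buffer incoming rows in memory and flush in large chunks, trading durability (a crash loses the unflushed buffer, as noted in the reply) for far fewer, larger writes. `flush_fn` is a hypothetical stand-in for a real sink (a Parquet writer, an object-store PUT, etc.).

```python
class BatchedWriter:
    """Accumulate rows and hand them to flush_fn in batches."""

    def __init__(self, flush_fn, batch_size=10_000):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Anything still in the buffer at crash time is lost.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []

# Tiny demo: flush every 3 rows into an in-memory list of batches.
batches = []
writer = BatchedWriter(batches.append, batch_size=3)
for row in range(7):
    writer.write(row)
```

Supporting relational queries on top would mean layering a table format over the flushed batches, which is where the extra complexity comes in.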
John Friedman@johngfriedman·
This does risk missing data if the instance gets knocked out for some reason. For my use cases, not a big issue.
John Friedman@johngfriedman·
Finally at projects where R2 latency is an issue, so S3, and S3 Express in particular, looks really attractive.
Ishan Anand@ananis25·
This is neat. Though it feels like the gains mostly come from expending more tokens. For example, you could get Claude Code to summarize all your successful sessions (rollouts) into concise feature docs, and let it reuse them for future changes.
Shangyin Tan@ShangyinT

GEPA for skills is here! Introducing gskill, an automated pipeline to learn agent skills with @gepa_ai. With learned skills, we boost Claude Code’s repository task resolution rate to near-perfect levels, while making it 47% faster. Here's how we did it:

Ishan Anand@ananis25·
@jsuarez It is a little harsh. Like anything else, if you distance yourself from the source, you cede having a good mental model of it. If used to strengthen your understanding (plan / ask questions / review), they are very useful.
Joseph Suarez 🐡@jsuarez·
I am officially sick and tired of the LLM brogrammer con. Models are most useful when the codebase you are dealing with is complete and utter crap. You were too lazy to write something human readable and are now proudly paying AI companies to deal with your stupidity.
Ishan Anand@ananis25·
@johngfriedman Great! Worth playing with some knobs in their indexing config. I imagine using S3 Express to host the index plus a cheap instance would be snappy enough for a TB of data.
John Friedman@johngfriedman·
@ananis25 Interesting! 800 MB index is not big. Will look into it. Thanks!
John Friedman@johngfriedman·
I'm trying to implement some sort of text search API on ~1 TB of CSV data with columns: date, id, filename, text, ... I'm used to throwing text data into AWS RDS and using defaults, but at this size it's ballpark $500/month. Very interested in optimization strategies. Currently looking into preprocessing the data to make it a lot smaller.
Ishan Anand@ananis25·
@johngfriedman Quickwit was neat: an 800 MB index and pretty quick ingestion. I _think_ they support offloading the index to S3 too, but I haven't tried it. (They also wrote the Tantivy library everyone uses for text search, including Lance, so it's likely feature complete.)
Ishan Anand@ananis25·
@johngfriedman I tried out a bunch of them with ~10 GB of SEC data today, and Quickwit was the most straightforward. Lance is nice, but took twice the space compared to Parquet. DuckDB stalled while indexing; since it's a single statement, there's no visibility into how far it got.
Ishan Anand@ananis25·
@johngfriedman Looking through LLM datasets: fetched slices of the fineweb-edu dataset and wanted to look through / cluster it. Also, some RAG prototyping at work back when vector search seemed like the thing, before query expansion with LLMs took over.
Ishan Anand@ananis25·
@johngfriedman Curious: do you need the data to be CSV? If it is flat data (even nested is okay), putting it into columnar formats like LanceDB/DuckDB and using their text search extensions would do, I think.
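The pattern being suggested (load the flat rows once, build a full-text index, query with ranked matches instead of scanning CSVs) can be sketched without the DuckDB or LanceDB extensions themselves. As an assumption-light stand-in, SQLite's FTS5 module from the stdlib `sqlite3` shows the same shape; this assumes your Python build ships FTS5, which most do.

```python
import sqlite3

# In-memory store with a full-text index over the text column.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE filings USING fts5(filename, body)")

# Stand-in rows for the "filename, text, ..." CSV columns.
rows = [
    ("a.html", "quarterly revenue grew due to cloud segment"),
    ("b.html", "litigation risk disclosed in annual report"),
]
con.executemany("INSERT INTO filings VALUES (?, ?)", rows)

def search(query):
    # MATCH uses the FTS index; ORDER BY rank returns best matches first.
    cur = con.execute(
        "SELECT filename FROM filings WHERE filings MATCH ? ORDER BY rank",
        (query,),
    )
    return [r[0] for r in cur]
```

The real extensions (DuckDB's `fts`, Lance's text index, Quickwit/Tantivy) do the same thing with proper BM25 scoring and columnar storage behind the index.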
John Friedman@johngfriedman·
The data is the SEC HTML corpus (9 TB) converted to text, while preserving hierarchy (sections, etc). A lot of my users have requested some sort of full-text search API, but I haven't implemented it yet, due to cost.
Ishan Anand reposted
neural oscillator of uncertain significance
you have all lost sight of the way. the purpose of automation was to allow us to focus on the things that actually matter in life: the implementation details.
Ishan Anand@ananis25·
@CFGeek - let it write the control flow in code, delegate to a regular interpreter, and only retrieve the output. The latter saves a lot of tokens and just makes sense. The RLM work also talks about these tool calls being LLM calls themselves, which is neat!
Ishan Anand@ananis25·
@CFGeek Not a lot imo, since most agent harnesses were moving to something similar (code mode, programmatic tool calling). The crux really is: if the LLM agent needs to perform a composition of tool calls, do you let the LLM manage the control flow and append tool outputs to context, vs
Charles Foster@CFGeek·
I’ll admit: I haven’t been paying much attention to the recursive language models (RLM) discourse. Am I missing out big-time?
alex zhang@a1zhang

Fundamentally, what really is the difference between an RLM and S={context folding, Codex, Claude Code, Terminus, agents, etc.}? This is the last and most important RLM post I'll make for a while, to finally answer all the "this is trivially obvious" from HackerNews, Reddit, X, etc. I know there's a lot of noise rn, but this is the one thread I'd rly ask you not to skip!

For a while I didn't have a super clear answer to this. And no, it's NOT that:
1. CC sub-agents are user-defined while the LM defines the sub-agent in RLMs. This is a minor difference that I suspect Anthropic will phase out at some point.
2. Coding scaffolds use a file system while the original paper uses a Python REPL. In fact, a FS is a REPL.
3. RLMs offload context into a variable. CC / Codex implicitly do this by saving to files, and yes, I know that people have been doing this for time.

But I think after some long convos with @lateinteraction @zli11010 and @ChenSun92, I can articulate a lot better that all of these things are important but are missing what actually matters. RLMs enable **symbolic recursion**: the RLM can spawn recursive calls embedded in symbolic logic. In simple terms, the recursive LM call lives inside the REPL, while for CC / Codex, the sub-agent call is spawned directly as a tool by the main model. This is a subtle difference but extremely significant.

Consider the following example. Say I want my agent to ingest 1M files and find a function I'm looking for.
1. The CC model will hopefully sequentially launch 1M JSON-like sub-agent tool calls per file in its main context, get the answer for each (maybe save to a file), and return.
2. The RLM will write a for-loop / parallel map over each sub-agent call to open each file, save to a var / file, and grab the answer.

The problem here is that we rely on Opus 4.5 to perform a programmatic action (launch 1M sub-agent calls) without the guarantee that it'll actually do it. Now maybe it's good enough to do this, but consider a nastier task, where we want to launch sub-calls only for files satisfying some weird property P (this is quite common for, say, looking at databases or complicated monorepos). In fact, all tool calls are launched this way, which is fundamentally limiting; we already write code to programmatically perform operations (e.g. search if XYZ)!

The tl;dr here is that the REPL and the sub-calling tool being *separate* is not a good thing. It's such a subtle / simple point, but it lends itself extremely well to more robust / programmatic model reasoning through training. Beyond the fact that CC / Codex are trained specifically for coding tasks, this minor difference leads to a whole class of new solutions that RLMs can solve. (Thanks to @zli11010 for this point.)

From a PL perspective, the way Codex / CC handle sub-agents is almost *silly*. If we think of the REPL that the RLM uses as a "language" of sorts, sub-calling should be a feature of this language. It shouldn't be separate, and the tool-spawned design is strictly less expressive than the RLM design.

Hope this clears things up, happy to answer more questions, but I plan on updating the paper to articulate this better and make things clearer :)
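The "symbolic recursion" point can be sketched in a few lines: the recursive model call is an ordinary value inside the REPL, so it can sit inside loops and filters that the interpreter guarantees to execute. `llm` and `satisfies_p` below are hypothetical stand-ins for a real sub-model call and the thread's "weird property P".

```python
def llm(prompt):
    # Stand-in recursive LM call: here it just checks the file content
    # it was handed for the function we're hunting.
    return "found" if "target_fn" in prompt else "not found"

def satisfies_p(path):
    # The "weird property P" from the thread, checked in plain code.
    return path.endswith(".py")

def rlm_search(files):
    # The RLM writes this loop itself; the interpreter, not the model,
    # guarantees every qualifying file gets exactly one sub-call.
    return {
        path: llm(body)
        for path, body in files.items()
        if satisfies_p(path)
    }
```

In the CC / Codex design, by contrast, each `llm(...)` would be a separate tool launch decided by the main model, with no interpreter-level guarantee that the loop or the filter actually runs over all 1M files.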

Ishan Anand@ananis25·
This is pretty cool, but imo a better benchmark for SWE agents would be: can they keep a fork of a giant project up to date? Like if Cursor's fork could be kept up to date with the VS Code upstream, w/o any eng input. That'd show 'em!
Cursor@cursor_ai

We've been working on very long-running coding agents. In a recent week-long run, our system peaked at over 1,000 commits per hour across hundreds of agents. We're sharing our findings and an early research preview inside Cursor.

Ishan Anand@ananis25·
@matsonj MDS with an agent, in a (sand)box! Would you say the grammar-of-graphics libraries are almost that?
Ishan Anand@ananis25·
@joodalooped @threepointone Hah, thin difference imo! If I only write getOne/getAll/editOne and get back a SQL-like "database" in memory, I'd say they are handling sync for me. Super curious how well it works out with non-relational databases!
judah@joodalooped·
@ananis25 @threepointone that's what i mean, they wrap objects that are being synced in a convenient/efficient query model, not doing the syncing themselves
Ishan Anand@ananis25·
@joodalooped @threepointone I think they work with arbitrary data sources too, as long as they are wrapped in a certain way. So, effectively they handle the syncing.
judah@joodalooped·
@threepointone if i'm understanding TaDb correctly, it's a view/subscription layer over synced data?
Ishan Anand@ananis25·
@suchenzang Could it be a good trade-off? Breaking the task into more steps/tool calls affords more opportunities for grounding, and so more reliability, albeit not directly reflected in the benchmarks.