Ishan Anand

841 posts

@ananis25

reading twetes

Mumbai, India · Joined June 2013

478 Following · 66 Followers

Ishan Anand@ananis25·17 Mar

Useful discussion. A simpler maxim could be: is the code (no matter where on the typing -> vibe-code axis) helping hone your mental model of the problem space? That alone determines if it is a net gain or future tech debt.

David Cramer@zeeg

im fully convinced that LLMs are not an actual net productivity boost (today). they remove the barrier to get started, but they create increasingly complex software which does not appear to be maintainable. so far, in my situations, they appear to slow down long term velocity

Ishan Anand@ananis25·24 Feb

@johngfriedman To me, embeddings feel like fuzzy query expansion at ingestion time, synonyms get grouped together in the vector index. Now that LLM agents can do query expansion at test-time, and only if necessary, it does dilute the necessity of embeddings. But definitely useful at scale!
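
A minimal sketch of the "query expansion at test time, and only if necessary" idea from the reply above. Everything here is an illustrative assumption: `expand_query` stands in for an LLM call and just reads a hard-coded table, `grep` is a plain regex scan, and the sample documents are made up.

```python
import re

# Hypothetical stand-in for an LLM call that proposes related terms.
# A real agent would ask a model; this table is illustrative only.
_RELATED_TERMS = {
    "revenue": ["sales", "turnover"],
    "lawsuit": ["litigation", "legal proceedings"],
}


def expand_query(term: str) -> list[str]:
    """Return the term plus hypothetical related terms (placeholder for an LLM call)."""
    return [term] + _RELATED_TERMS.get(term.lower(), [])


def grep(docs: dict[str, str], terms: list[str]) -> list[str]:
    """Return ids of documents containing any of the terms (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if pattern.search(text)]


def search(docs: dict[str, str], term: str, min_hits: int = 1) -> list[str]:
    """Try the literal query first; expand only when it returns too few hits."""
    hits = grep(docs, [term])
    if len(hits) >= min_hits:
        return hits
    return grep(docs, expand_query(term))


# Made-up sample corpus for the usage example.
DOCS = {
    "10-K-a": "Net sales rose 4% year over year.",
    "10-K-b": "The company is party to ongoing litigation.",
    "8-K-c": "Revenue guidance was withdrawn.",
}
```

Here `search(DOCS, "revenue")` is answered by the literal scan alone, while `search(DOCS, "lawsuit")` finds nothing literally and only then pays for expansion. An embedding index would instead do that grouping once, for every document, at ingestion time.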

John Friedman@johngfriedman·24 Feb

But my general feeling is that RAG is a useful tool, oversold by hype people who used it for dumb stuff, as is typical, and now people are overcorrecting negatively partially due to original annoying hype push, with some irony as the hype people have driven both cycles.

John Friedman@johngfriedman·24 Feb

I agree. I am fairly close to the financial agents space, and people were doing silly RAG pipelines that were better off as greps. Those people are now saying 'RAG is dead'.

ry@rywalker

@ctjlewis rag has its place but the amount of companies that spent 6 months building vector pipelines...

Ishan Anand@ananis25·23 Feb

@johngfriedman Yeah, this seems like the standard tradeoff. You gotta batch somewhere when writing to TB sized datasets incrementally. Do you have to support relational queries (filter/aggregations) on top of the data too? Else it feels pretty solid.
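
A rough sketch of the batching tradeoff mentioned above, not a description of the actual pipeline in this thread. The `BatchedWriter` class, the JSONL layout, and the use of a local directory in place of an object store are all assumptions made for illustration.

```python
import json
import os


class BatchedWriter:
    """Buffer rows in memory and write one file per full batch.

    Rows still sitting in the buffer are lost if the process dies,
    which is the data-loss risk that comes with batching.
    """

    def __init__(self, out_dir: str, batch_size: int = 1000):
        self.out_dir = out_dir
        self.batch_size = batch_size
        self._buffer: list[dict] = []
        self._batches_written = 0

    def append(self, row: dict) -> None:
        """Add a row and flush automatically once the batch is full."""
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write buffered rows to a new numbered file; no-op when empty."""
        if not self._buffer:
            return
        name = f"batch-{self._batches_written:06d}.jsonl"
        with open(os.path.join(self.out_dir, name), "w", encoding="utf-8") as f:
            for row in self._buffer:
                f.write(json.dumps(row) + "\n")
        self._buffer.clear()
        self._batches_written += 1

    @property
    def pending(self) -> int:
        """Number of rows that would be lost on a crash right now."""
        return len(self._buffer)
```

A larger `batch_size` means fewer, bigger writes but more rows at risk at any moment; calling `flush()` on shutdown narrows that window without removing it.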

John Friedman@johngfriedman·23 Feb

This does risk missing data if instance gets knocked out for some reason. For my use cases, not a big issue.

John Friedman@johngfriedman·23 Feb

Finally at projects where R2 latency is an issue, and so S3, and S3 express looks really attractive.

Ishan Anand@ananis25·21 Feb

This is neat, though it feels like the gains mostly come from expending more tokens. For example, you could get Claude Code to summarize all your successful sessions (rollouts) into concise feature docs, and let it reuse them for future changes.

Shangyin Tan@ShangyinT

GEPA for skills is here! Introducing gskill, an automated pipeline to learn agent skills with @gepa_ai. With learned skills, we boost Claude Code’s repository task resolution rate to near-perfect levels, while making it 47% faster. Here's how we did it:

Ishan Anand@ananis25·20 Feb

@jsuarez It is a little harsh - like anything else, if you distance yourself from the source, you cede having a good mental model for it. If used to strengthen your understanding (plan/ask-questions/review), they are very useful.

Joseph Suarez 🐡@jsuarez·20 Feb

I am officially sick and tired of the LLM brogrammer con. Models are most useful when the codebase you are dealing with is complete and utter crap. You were too lazy to write something human readable and are now proudly paying AI companies to deal with your stupidity.

Ishan Anand@ananis25·19 Feb

@johngfriedman Great! Worth playing with some knobs in their indexing config. I imagine using s3 express to host the index and a cheap instance would be snappy enough for a TB of data.

John Friedman@johngfriedman·19 Feb

@ananis25 Interesting! 800 MB index is not big. Will look into it. Thanks!

John Friedman@johngfriedman·18 Feb

I'm trying to implement some sort of text search API on ~1tb of csv data with columns: date, id, filename, text, ... I'm used to throwing text data into AWS RDS, and using defaults, but for this size, it's ballpark $500/month. Very interested in optimization strategies. Currently looking into preprocessing the data to make it a lot smaller.

Ishan Anand@ananis25·18 Feb

@johngfriedman Quickwit was neat - 800MB index and pretty quick ingestion. I _think_ they support offloading the index to S3 too, but haven't tried it. (They also wrote the Tantivy library which everyone uses for text search, including Lance. So, likely feature complete.)

Ishan Anand@ananis25·18 Feb

@johngfriedman I tried out a bunch of them with ~10GB of SEC data today, and Quickwit was the most straightforward. - Lance is nice, but took twice the space compared to Parquet. - DuckDB stalled while indexing. Since it is a single instruction, there is no visibility into how far it got.

Ishan Anand@ananis25·18 Feb

@johngfriedman Looking through LLM datasets - fetched slices of the fineweb-edu dataset and wanted to look through/cluster it. Also, some RAG prototyping at work when it seemed like vector search was the thing, before query expansion with LLMs took over.

John Friedman@johngfriedman·18 Feb

@ananis25 What have you used lance for?

Ishan Anand@ananis25·18 Feb

@johngfriedman Curious - do you need the data to be csv? If it is flat data (even nested is okay), putting it into columnar formats like lancedb/duckdb and using their text search extensions would do I think.

John Friedman@johngfriedman·18 Feb

The data is the SEC html corpus (9tb) converted to text, while preserving hierarchy (sections, etc). A lot of my users have requested some sort of full text search API, but I haven't implemented it yet, due to cost.

Ishan Anand retweeted

neural oscillator of uncertain significance@mycoliza·10 Feb

you have all lost sight of the way. the purpose of automation was to allow us to focus on the things that actually matter in life: the implementation details.

Ishan Anand@ananis25·8 Feb

@CFGeek - let it write the control flow in code, delegate to a regular interpreter, and only retrieve the output. The latter saves a lot of tokens, and just makes sense. The RLM work also talks about these tool calls being LLM calls themselves, which is neat!

Ishan Anand@ananis25·8 Feb

@CFGeek Not a lot imo since most agent harnesses were moving to something similar (code-mode, programmatic-tool-calling). The crux really is - if the LLM agent needs to perform a composition of tool calls, do you - let LLM manage the control flow, append tool outputs to context, vs

Charles Foster@CFGeek·8 Feb

I’ll admit: I haven’t been paying much attention to the recursive language models (RLM) discourse. Am I missing out big-time?

alex zhang@a1zhang

Fundamentally, what really is the difference between an RLM and S={context folding, Codex, Claude Code, Terminus, agents, etc.}? This is the last and most important RLM post I'll make for a while to finally answer all the "this is trivially obvious" from HackerNews, Reddit, X, etc. I know there's a lot of noise rn, but this is the one thread I'd rly ask you not to skip!

For a while I didn't have a super clear answer to this. and no, it's NOT that:

1. CC sub-agents are user-defined while the LM defines the sub-agent in RLMs. this is a minor difference that I suspect Anthropic will phase out at some point
2. Coding scaffolds use a file system while the original paper uses a Python REPL. In fact, FS is a REPL.
3. RLMs offload context into a variable. CC / Codex implicitly do this by saving to files, and yes, I know that people have been doing this for time

But I think after some long convos with @lateinteraction @zli11010 and @ChenSun92, I can articulate a lot better that all of these things are important but are missing what actually matters.

RLMs enable **symbolic recursion** -- this means the RLM can spawn recursive calls embedded in symbolic logic. In simple terms, the recursive LM call lives inside the REPL. While for CC / Codex, the sub-agent call is spawned directly as a tool by the main model. This is a subtle difference but extremely significant.

Consider the following example. Say I want my agent to ingest 1M files and find a function I'm looking for.

1. The CC model will hopefully sequentially launch 1M JSON-like sub-agent tool calls per file in its main context, get the answer for each (maybe save to a file), and return.
2. The RLM will write a for-loop / parallel map over each sub-agent call to open each file, save to a var / file, and grab the answer.

The problem here is that we rely on Opus 4.5 to perform a programmatic action (launch 1M sub-agent calls) without the guarantee that it'll actually do it. Now maybe it's good enough to do this, but consider a nastier task, where we want to launch sub-calls only for files satisfying some weird property P (this is quite common for say looking at databases or complicated monorepos). In fact, all tool calls are launched this way, which is fundamentally limiting (we already write code to programmatically perform operations (e.g. search if XYZ))!

The tldr; here is that the REPL and sub-calling tool being *separate* is not a good thing. It's such a subtle / simple point but it lends itself extremely well to more robust / programmatic model reasoning through training. Beyond the fact that CC / Codex are trained specifically for coding tasks, this minor difference leads to a whole class of new solutions that RLMs can solve.

(Thanks to @zli11010 for this point) From a PL perspective, the way Codex / CC handle sub-agents is almost *silly*. If we think of the REPL that the RLM uses as a "language" of sorts, sub-calling should be a feature of this language. It shouldn't be separate, and is strictly less expressive than the RLM design.

Hope this clears things up, happy to answer more questions but I plan on updating the paper to articulate this better and make things clearer :)
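
A rough sketch of the pattern this thread describes, where the sub-call is an ordinary function inside the code the model writes, so filtering and fan-out are plain control flow. This is not the RLM paper's implementation: `sub_llm` is a hypothetical stand-in for a recursive model call (here just a substring check), and `find_files`, the prompt, and the sample files are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_llm(prompt: str, text: str) -> str:
    """Hypothetical stand-in for a recursive LLM call.

    A real system would send `prompt` and `text` to a model; this
    placeholder ignores the prompt and does a substring check.
    """
    return "yes" if "def target_fn" in text else "no"


def find_files(files: dict[str, str], has_property) -> list[str]:
    """The kind of program a model could write inside its REPL.

    Control flow (filter, parallel map, collect) is ordinary code;
    only the per-file judgement is delegated to the sub-call.
    """
    # Launch sub-calls only for files satisfying property P.
    candidates = {name: text for name, text in files.items() if has_property(name)}
    with ThreadPoolExecutor(max_workers=8) as pool:
        answers = pool.map(
            lambda text: sub_llm("Does this file define target_fn?", text),
            candidates.values(),
        )
        # pool.map preserves input order, so names and answers line up.
        return [name for name, ans in zip(candidates, answers) if ans == "yes"]


# Made-up inputs for the usage example.
FILES = {
    "a.py": "def target_fn():\n    pass\n",
    "b.py": "x = 1\n",
    "notes.txt": "def target_fn is mentioned here",
}
```

Calling `find_files(FILES, lambda name: name.endswith(".py"))` never issues a sub-call for `notes.txt`, and only the final list of names needs to return to the main model's context rather than one tool result per file.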

Ishan Anand@ananis25·7 Feb

This is pretty cool, but imo a better benchmark for SWE agents would be whether they can keep a fork of a giant project up to date. Like if Cursor's fork could be kept up to date with VS Code upstream, w/o any eng input. That'd show 'em!

Cursor@cursor_ai

We've been working on very long-running coding agents. In a recent week-long run, our system peaked at over 1,000 commits per hour across hundreds of agents. We're sharing our findings and an early research preview inside Cursor.

Ishan Anand@ananis25·3 Feb

@matsonj MDS with an agent, in a (sand)box! Would you say grammar-of-graphics libraries are almost that?

Jacob Matson@matsonj·3 Feb

This is exactly why I've been working on a charting skill built for AI first.

Ashpreet Bedi@ashpreetbedi

I'm fairly confident we're at the cusp of a new architecture for agents. Going from stateless tools in a loop to machines that learn and improve. Every Agent 1.0 will evolve into this pattern. Dash not only solves a clear pain point, it does so with an architecture that enables the agent to learn from its mistakes, layer in context as needed, and get smarter with use. Github repo if you want to check it out: github.com/agno-agi/dash

Ishan Anand@ananis25·15 Oct

@joodalooped @threepointone Hah, thin difference imo! If I only write a getOne/getAll/editOne and get back a sql-like "database" in memory, I'd say they are handling sync for me. Super curious how well it works out with non relational databases!

judah@joodalooped·15 Oct

@ananis25 @threepointone that's what i mean, they wrap objects that are being synced in a convenient/efficient query model not doing the syncing themselves

sunil pai@threepointone·15 Oct

I've been playing with tanstack db and I smell a winner

Kyle Mathews@kylemathews

We just posted an ambitious new RFC for TanStack DB! "On-Demand Collection Loading via loadSubset" github.com/TanStack/db/di…

Ishan Anand@ananis25·15 Oct

@joodalooped @threepointone I think they work with arbitrary data sources too, as long as they are wrapped in a certain way. So, effectively they handle the syncing.

judah@joodalooped·15 Oct

@threepointone if i'm understanding TaDb correctly, it's a view/subscription layer over synced data?

Ishan Anand@ananis25·30 Sep

@suchenzang Could it be a good trade-off? Breaking the task into more steps/tool calls affords more opportunity for grounding, so more reliability, albeit not directly reflected in the benchmarks.

Susan Zhang@suchenzang·30 Sep

👀👀

Kilian Lieret@KLieret

Sonnet 4.5 takes significantly more steps to solve instances than Sonnet 4, making it more expensive to run in practice

Ishan Anand@ananis25·7 Sep

This sentiment always surprises me. Google ships its org chart, sure, but not integrating its offerings everywhere is more likely a way to avoid antitrust lawsuits.

roon@tszzl

it is pretty telling that when you ride in a waymo you can’t give instructions to gemini to play a song or change destination or drive differently. when one of the great gilded tech monopolies of the world does not yet have a cohesive ai picture, what hope has the broader economy
