Kayvane Shakerifar


LLMs are really good at writing code, so why are we giving them 100 different tools instead of just giving them code execution?
This idea came up in a conversation, and it immediately made sense, like it had been right in front of us the whole time. It feels like a much cleaner way to structure things. Instead of turning the context window into a dumping ground of raw outputs, you let the model write code, process the data, and return only what actually matters.
You are not just making things cleaner, you are likely saving a lot of tokens as well. The model only sees the results it needs instead of parsing through noise.
This becomes even more obvious with things like web search or scraping. HTML is mostly garbage, and pushing all of it into the context is just inefficient. Filtering it through code first makes far more sense.
I haven’t tested this deeply yet, but it’s interesting to see Anthropic leaning into a similar direction. Feels like a strong validation of the idea.
Intuitively, this should improve latency, cost, and accuracy by turning the LLM into more of a controller than a processor.
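A minimal sketch of the pattern, in plain stdlib Python (the extractor class and the sample page are made up for illustration): rather than pushing raw scraped HTML into the context, model-written code filters it first, and only the small result reaches the model.

```python
from html.parser import HTMLParser

# Hypothetical example of model-written filtering code: the raw page is
# mostly markup noise, so extract only the headings before returning
# anything to the model's context.
class HeadlineExtractor(HTMLParser):
    """Collects the text of <h1>/<h2> tags and nothing else."""
    def __init__(self):
        super().__init__()
        self._in_heading = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headlines.append(data.strip())

raw_html = """
<html><head><script>trackEverything();</script></head>
<body><nav>Home | About</nav>
<h1>LLMs as controllers</h1>
<div class="ad">Buy now!</div>
<h2>Filter first, then reason</h2>
</body></html>
"""

parser = HeadlineExtractor()
parser.feed(raw_html)
# Only this tiny result would go back into the context, not the page:
print(parser.headlines)
```

The model acts as the controller that writes `HeadlineExtractor`; the sandbox does the processing, and the context only ever sees the two headlines.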


@jxnlco - I’ve seen you asking around for feedback on codex. I’ve been using it alongside CC for about 6 weeks and it’s now my preferred tool of the two. Here’s some feedback:

@samuelcolvin @mitsuhiko I think the excellent name choice is going over everyone’s head

Fuck it, a bit early but here goes:
Monty: a new python implementation, from scratch, in rust, for LLMs to run code without host access.
Startup time measured in single digit microseconds, not seconds.
@mitsuhiko here's another sandbox/not-sandbox to be snarky about 😜
Thanks @threepointone @dsp_ (inadvertently) for the idea.
github.com/pydantic/monty

@chrisalbon The codex app has a really nice diff section where you can comment on the files themselves in the app and push those comments back to codex. It feels similar to the IDE experience but more focused. Switching from the app to the IDE is an integrated one-click. I’m a big fan

@mervenoyann @pcuenq I’ve wanted this for so long, built one myself but will check it out 🙌🏼

we just shipped daggr, a new library to build complex AI workflows 🤗
it's a breeze to code and debug apps, and visualize the workflow itself 🙌🏻
try it out and let us know what you think!
Hugging Face @huggingface
Introducing daggr: a new way of building apps 🔥 daggr combines best of all worlds, mix-and-match model endpoints, Gradio apps, functions programmatically, inspect the pipeline visually 🙌🏻 Try it out, build and share to get featured!

@vamsibatchuk @GoogleAIStudio This is so so cool, well done 💪🏼🙌🏼

Ever wondered where words come from? 🗺️ built this app on @GoogleAIStudio called 'Wanderword' to map the evolution of language through time and space.
It uses Gemini to trace linguistic roots and D3.js to animate the geographic migration of words through history.
…derword-141284551734.us-west1.run.app

@jxnlco Waiting for ruff to release a custom rules feature so I can do this in Python - until then I’m looking at semgrep custom AST rules

ai coding - you could be writing more lint rules
One of my big takeaways from working with Vignesh is that, while oftentimes I will add style preferences to the agent files, Vignesh, on the other hand, will actually have the AI write a new ESLint rule and just turn on pre-commit hooks. I'm curious if folks are doing the same.
What do you do?
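The same idea works in Python today with nothing but the stdlib `ast` module: have the AI write a small AST-based check and run it as a pre-commit hook. This is a hypothetical rule (the class name, message, and `print` ban are made up for illustration), not anyone's actual rule set.

```python
import ast

# Hypothetical custom lint rule built on Python's stdlib ast module:
# flag every bare print() call so the check can run in a pre-commit hook.
class NoPrintChecker(ast.NodeVisitor):
    def __init__(self):
        self.violations = []  # (line_number, message) pairs

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Name) and func.id == "print":
            self.violations.append((node.lineno, "use logging instead of print()"))
        self.generic_visit(node)  # keep walking nested calls

def lint(source: str):
    """Return a list of rule violations for a source string."""
    checker = NoPrintChecker()
    checker.visit(ast.parse(source))
    return checker.violations

sample = "x = 1\nprint(x)\n"
print(lint(sample))  # one violation, on line 2
```

Wired into a pre-commit hook that fails when `lint()` returns anything, this gives you the same "encode the preference once, enforce it forever" loop as the ESLint approach.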

@deanimatedmonk @Vignesh_ey @rive_app Companion iOS app that turns your plants into Tamagotchis, they can prompt you to feed them!

@Vignesh_ey @rive_app I actually thought about OLEDs early on, but they’re very limited compared to what I can do with Rive on the web. You’d need a different, much lighter visual language for a pot display. Still, the idea of a “pet plant” feels promising, worth exploring.

Made this plant persona (I call him Tiny) using @rive_app 's data binding, GPT and some hardware (ESP32 and a few sensors like touch, moisture, humidity, light, temp).
Most of the logic is on my client side rn. But will try and see how far I can take the logic part with scripting (They keep on building so fast!)
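A rough sketch of what that client-side logic could look like, mapping sensor readings to a mood string that drives the character animation. The function name, thresholds, and mood labels here are all hypothetical, not the actual app's code.

```python
# Hypothetical persona logic: turn raw sensor readings (moisture, light,
# touch) into a mood that an animation layer like Rive's data binding
# could consume. Thresholds are made up for illustration.
def plant_mood(moisture: float, light: float, touched: bool) -> str:
    if touched:
        return "happy"      # immediate reaction to being petted
    if moisture < 0.2:
        return "thirsty"    # prompt the owner to water the plant
    if light < 0.3:
        return "sleepy"     # low light, wind down
    return "content"

print(plant_mood(moisture=0.1, light=0.8, touched=False))  # thirsty
```

Keeping the mapping as a pure function like this also makes it easy to move later, whether to on-device scripting or an LLM-driven controller.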

@adamdotdev The usage limits on codex $200 plan are waaay more generous than claude code





