Arihan
54 posts

@arihanxv
19, CS @Stanford, founder @watolabs
🇺🇸 Joined June 2023
213 Following 174 Followers

@YuvrajSShergill These are increasingly being automated through AI SREs and broader automation tools like @antimetal

@arihanxv Maybe responding to issues during scale-up and product breakdowns, and ending up with technical debt

Working with Composer 2.5 is like working with a senior engineer who types at 11,000+ words per minute. @cursor_ai now owns the model and the distribution. I'm bullish

@CustomAIMath We keep seeing humans being pulled out of both by the many prompt-to-company apps, where we provide only a generic idea and it’s up to the model to decide how to frame, build, distribute, and maintain the product.

@arihanxv All these posts fail to mention one key factor: did the AI think about the code first, or was it given the structure by the human? #theChickenOrTheEgg

@NorthSecureAI True. Unfortunately, many today are optimizing for speed to ship under the assumption that no one will ever read the code. That’s why we see so much slop and overcomplexity.

@arihanxv Choosing tradeoffs, owning outcomes, and explaining why the system should exist in the first place. Also cleaning up after three confident AIs agree on the same bad idea.

Companies are now choosing between keeping humans or agents. This is modern day offshoring.
unusual_whales@unusual_whales
Artificial intelligence is causing a net U.S. loss of 16,000 jobs per month, per Goldman Sachs.

codex vs claude code is the most exciting race in technology rn (and has been since the take-off late last year)
Theo - t3.gg@theo
Anthropic has to add a bunch of features because their public models haven’t improved since December. Codex has gotten 10x better since, and it’s largely the models. Also, Codex is far ahead on a lot of “end to end” stuff: computer use, goals, remote control, all the things that actually make me more productive. Claude Code has better features to tweet about. Codex is quietly becoming the best solution for actually writing software.

Grok foundation model V9-Medium (1.5T) has finished training. Evals look good. A lot of Cursor data was added in supplementary training and there is more to come.
Fine-tuning is underway and reinforcement learning begins in a few days. 2 to 3 weeks to public release.
This will be a major improvement over the 0.5T v8-small that currently serves all Grok production traffic, especially for difficult coding tasks.

Token usage billing has become unsustainable for most teams where adoption is growing, while frontier model prices have risen steadily over the past years. Even companies like Cursor are managing costs by shifting more usage to their own frontier models and moving away from being the "Costco of tokens".
The best value today comes from a heavily subsidized plan from a frontier lab, such as ChatGPT or Claude. But this inherently diminishes freedom of choice, as you must either commit to one provider or pay for multiple subscriptions just to use other models.
At enterprise scale, we’ve seen the lock-in get even worse, as companies are pushed into expensive ChatGPT or Claude Enterprise plans, with workflows, permissioning, and procurement all baked into one stack. The problem is that nobody knows who the best frontier model provider will be in a year, a month, or even a week.


@zayvik12667 Models can easily produce lots of code, but it's difficult to thoroughly verify it outside of just throwing slop into production, especially if the code reviewer is also AI.