Evi

4.8K posts

Evi

@geteviapp

San Francisco, USA Beigetreten Şubat 2025

995 Folgt421 Follower

Evi@geteviapp·58m

@petergostev Note Amazon Ads revenue. It is fun how folks who didn’t read their SEC filings claim Amazon is about AWS lol

English

Peter Gostev (SF: 29 Mar - 3 Apr)@petergostev·3h

Open AI's initial ad revenue in context. Note that this is a log scale, otherwise it would look ridiculous. To be fair, 1/6th of New York Times in a matter of months is not bad at all

Peter Gostev (SF: 29 Mar - 3 Apr) tweet media

Stephanie Palazzolo@steph_palazzolo

New: OpenAI has surpassed $100m in ARR from its ads pilot, which launched 6 weeks ago. It's expanded to 600+ advertisers and plans to launch self-serve advertiser access in April. theinformation.com/briefings/excl…

English

7.9K

Evi@geteviapp·1h

@paraschopra @fchollet In another thread you said that rephrasing the original post (ie agreeing in a shallow way, not sharing net new unknown hard to gain discovery grounded in experimental evidence) is AI slop! Please work harder!

English

Paras Chopra@paraschopra·9h

@fchollet Yes, we need more evals that test different dimensions of intelligence. ARC-AGI 3 is well thought of and step in the right direction. We need harder but solvable benchmarks for AI. The exact terms we use for them (AGI / ASI / whatever) matter much less IMO.

English

834

François Chollet@fchollet·16h

If you care about the rate of AGI progress, you should be excited about a new eval that focuses research efforts by pointing out important gaps & providing a way to measure progress towards fixing them If instead you only care about having your preconceptions confirmed, too bad

English

468

21K

Evi@geteviapp·1h

@paraschopra Response: It could only be sad to see the point. Would you like anything else?

English

Paras Chopra@paraschopra·2h

More than half of the replies on my tweets is from bots who simply rephrase what I just tweeted. What's the point of it? It's really sad.

English

254

16.4K

Evi@geteviapp·22h

@stochasticchasm No one else has the data they do so publishing the training “secrets” is easy for them :) don’t use someone’s business interests as a measure for your favoritism :)

English

stochasm@stochasticchasm·1d

cursor has become one of my favorite labs in the last few days. i really hope they keep sharing more and more

Cursor@cursor_ai

Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.

English

168

8.1K

Evi@geteviapp·1d

@martin_casado @stuffyokodraws Always feels like that until you actually need this for something and try to use a new model and discover “ragged frontier”. OpenAI paused video models for a reason :) world models they will build instead are thinking video models

English

martin_casado@martin_casado·1d

Photo editing really feels solved at this point.

Phota Labs@PhotaLabs

Today, we introduce Phota Studio and Phota API, powered by our photography model that brings flagship image model capabilities, personalized to you. With personalization, an image model stops being just playful and starts becoming useful for photography. With Phota Studio, you can: - Reimagine composition, lighting, or posture while still looking like yourself - Create editorial, stylized, and studio-quality portraits of yourself, or bring someone you love into the frame - Revive the blurred shot, bring in the person who missed the group photo, fix the awkward expression - all without losing what made the moment worth keeping With Phota API, you can finally build photo experiences where real people are the core. Marketing assets, editorial shoots, wedding photography: workflows that needed identity fidelity that GenAI couldn't deliver. Until now. Ultimately, we want to make compelling photographs accessible to everyone. Phota API and Phota Studio start to make that possible: empowering people to explore, imagine, and create without losing themselves in the image. With Phota Studio and Phota API, developers can build new photo experiences, while photographers and creators can explore a new kind of AI-native editing and generation. The next photo experience starts here!

English

10.9K

Evi@geteviapp·1d

@DanKulkov It obviously does, check number of tokens for lower case and upper case and if you mess up the case it is even worse (2-4x difference)

English

Dan Kulkov@DanKulkov·1d

i am glad capslock doesn't cost more tokens otherwise my screaming at opus would require $2000/mo plan

English

2.3K

Evi@geteviapp·1d

@Austen Google models have bad default taste. Claude is shockingly good in design of new things based on vague prompts.

English

113

Austen Allred@Austen·1d

Using Codex to call in other models for the design part is the play. Google’s models crush at design.

Dwayne@CtrlAltDwayne

Codex is already really good and my daily driver. But they really need to fix how terrible GPT-5.4 regardless of reasoning effort is at design/UI. It's seriously really bad. Hopefully this means it gets fixed sooner so I don't have to keep using Gemini 3.1 Pro and Claude.

English

124

17.1K

Evi@geteviapp·1d

@bernhardsson @Shekswess @modal You learned a lesson how the world works I guess :) did this secure a session with PF? :)

English

222

Erik Bernhardsson@bernhardsson·1d

The sandbox revenue for @modal is now as much as the total revenue of the company 9 months ago

English

531

51.3K

Evi@geteviapp·1d

@apjacob03 Compile it to 80x86 and run on a Mac to add a layer more and inside a container just for fun:)

English

Athul Paul Jacob@apjacob03·1d

We compiled the transformer VM itself to WebAssembly (WASM) and paired it with a WASM-compiled C compiler running in the browser locally. This is basically 3 nested virtual machines: a WASM compiler producing bytecode, which gets tokenized and fed to a transformer that simulates WASM execution, itself running as WASM. 😅

English

4.8K

Evi@geteviapp·1d

@israelwegierski @cramforce Running transformer locally makes no sense because you get small batch size, it is also inconvenient to have space and cooling setup. There is no known economically sensible way to run 4-10T SOTA model on premises. Small models like in iPhone camera are ok of course,but not LLMs.

English

Israel Wegierski@israelwegierski·1d

@geteviapp @cramforce No, I mean that the model runs on local hardware

English

Israel Wegierski@israelwegierski·1d

Hey @cramforce — do you see a future where coding agents (OpenCode-style) run entirely on serverless primitives (Chat SDK, AI SDK, Workflows, Sandbox), or will they always need a persistent runtime layer?

English

1.8K

Evi@geteviapp·1d

@ItsBrain4Brain @twlvone @GordonWetzstein modern LLMs produce logprobs over 100k dictionary, projecting (i.e. selecting specific token from those using logprobs and other params like T) is a kind of a tool, you may even call that harness

English

Brain4brain@ItsBrain4Brain·1d

@geteviapp @twlvone @GordonWetzstein Using tool is different imo

English

Gordon Wetzstein@GordonWetzstein·2d

High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution? Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most. 1/7🧵

English

110

1.1K

143.2K

Evi@geteviapp·1d

@ItsBrain4Brain @twlvone @GordonWetzstein you're right actually, given tools (paintbrush) you can also draw them!

English

Brain4brain@ItsBrain4Brain·1d

@geteviapp @twlvone @GordonWetzstein I can produce images and videos in my head, anyone without aphantasia can do this

English

Evi@geteviapp·1d

@LLMJunky If you observe the commit times the bro is clearly in EU/UK, most likely from their London office :)

English

am.will@LLMJunky·2d

Posting the PRs below if you wanna geek out. Note: these are all from JIF. What a chad. github.com/openai/codex/c… github.com/openai/codex/c… github.com/openai/codex/c… github.com/openai/codex/c… github.com/openai/codex/c… github.com/openai/codex/c… github.com/openai/codex/c…

English

1.2K

am.will@LLMJunky·2d

Been digging through the Codex CLI repo and there's a new multi-agent system being built: Multi Agent v2 Here's what's changing and why it matters for agent orchestration 👇🧵

English

111

50.6K

Evi@geteviapp·1d

@petergostev @karpathy clearly explained that this is skill issue :)

English

346

Peter Gostev (SF: 29 Mar - 3 Apr)@petergostev·1d

I'm very curious about OpenAI's planned intern researcher release by September this year. Having tried using current LLMs for OpenAI's Golf Challenge, I would say that Codex & Opus are actively bad researchers (not meaningful difference between them) - They come up with small ideas anchored to what we have - they find it hard to step back and try another route - They set up bad experiments with fallbacks other cheats - They are terrible judges at what is actually meaningful - every idea is rated 9/10 and then nothing works Bear in mind that some of these things are not too bad in the land of software, e.g. in software you do want reasonable fallbacks, you do want to build something that works that might mean a smaller iteration rather than tearing the whole thing down. But for research (even in a small sense, e.g. tuning a prompt) - this is really bad. I want the models to be genuinely stepping back and assessing if they are barking up the wrong tree. I want them to design clean experiments that don't muddy the water with dumb fallbacks that make it seem like something is working. It doesn't feel obvious to me that you could easily have a single LLM (in the short term at least) that could be a great software engineer and a great researcher. I'm sure we'll get there at some point, but I'd best that the research intern would feel quite different to Codex, if it would actually be good at research.

English

8.2K

Evi@geteviapp·1d

@israelwegierski @cramforce Models are updated every 6 weeks, hardware needs 2 years lead.

English

Israel Wegierski@israelwegierski·1d

@geteviapp @cramforce The future will undoubtedly be models in the hardware but they are not yet so powerful.

English

Evi@geteviapp·1d

@a1zhang Try gpt-4 (original “sparks of AGI” GPT) with modern Codex harness. You’ll see harness doesn’t help much if the model didn’t learn lots of skills during training. It is the same how agents overtook workflows in usefulness and completion rates and quality. Harness is temporary.

English

122

alex zhang@a1zhang·1d

guess we disagree

Mike Knoop@mikeknoop

LLM systems swallow harness progress. The most general/universal LLM innovations migrate from client-side harnesses to server-side tools. Innovation typically happens first inside the harness. For example, AI reasoning was originally a harness around GPT-3 ("let's think step by step"). This approach worked so well that it migrated behind the API as a tool (competitive reasons were also a factor; but general utility dominated). Many wouldn't think of AI reasoning as a tool but it definitely is (it's a tool to do natural language program synthesis -- but that's another topic). The same happened with code interpreter which started out as a client-side harness and moved server-side. These tools are made available at inference time to the model alongside specific training to teach the model when and how to use each tool. Because of this, the line between tool and model can get quite blurry. Best to consider such tools as "internal" to the LLM system. This is actually a good test of how general a harness feature is. If a feature remains "stuck" client-side, say inside codex or claude code, then it's likely very task- or domain- specific. Client-side harnesses typically encode a lot of human G factor for specific domains. Whereas tools, due to usage pressure of frontier LLMs, are required to be as general as possible else they wouldn't make the cut. So if you care about measuring AGI it's a good idea to pay attention to default LLM system capabilities behind high usage LLM APIs. And if you care about bleeding edge research ideas, such as RLMs, it's a good idea to pay attention to harness innovation. Ultimately, AGI will not depend on a harness in the same sense humans don't depend on a harness.

English

185

22.9K

Evi@geteviapp·1d

@intellectronica Also numbers are recent and unnatural! And electricity is dangerous! Should we mention nuclear?

English

Eleanor Berger@intellectronica·1d

Reading was transitional tech. It's very recent, hard to use, feels unnatural. It won't survive for much longer now that you can talk with computers.

Reuben Rodriguez@ReubenR80027912

I refuse to believe that most Americans can not read at a 7th grade level If this is “true” then it means we are simply measuring literacy incorrectly; 163M Americans are actively employed Can your 6th grader read an employment contract? Read heavy machinery instructions?

English

624

Evi@geteviapp·1d

@intellectronica @burkeholland @code xhigh is missing!

English

Eleanor Berger@intellectronica·1d

Wooohooo ... thinking effort in @code chat!!

English

1.9K

Evi@geteviapp·1d

@eastdakota @Cloudflare You ok? The paper is 1y old and if you read it you’ll learn that blog post exaggerates the positive sides and neglects negatives.

English

278

Matthew Prince 🌥@eastdakota·2d

This is Google’s DeepSeek. So much more room to optimize AI inference for speed, memory usage, power consumption, and multi-tenant utilization. Lots of teams at @Cloudflare focused on these areas. #staytuned

Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English

637

193.2K