Phil

2.2K posts

@phill__1

Currently working on Tech Support AI

Joined January 2019

426 Following · 2K Followers

Phil@phill__1·2d

@sumitdotml Dario was acting all high and mighty about not making huge infra bets compared to sama. Guess OpenAI actually forecast AI demand better

sumit@sumitdotml·2d

just incredible. what's the point of shipping every single day when your users keep hitting API errors at an alarming frequency? maybe the people who're getting praised for their shipping speed should actually sleep properly so that their brains can comprehend the enormity of this?

sumit@sumitdotml

cool

Phil@phill__1·22 Mar

@diegocabezas01 Highly recommend impeccable.style

Diego | AI 🚀 - e/acc@diegocabezas01·22 Mar

GPT-5.4 front end skill in codex should be on by default

Phil@phill__1·22 Mar

@HanchungLee Also like, the largest customers almost certainly pay by invoice + bank transfer, not credit card

Han@HanchungLee·22 Mar

financial literacy psa: payment processing volume is 1000x larger than gdp. so, yes, ramp data is used only because it's available. and it's tiny enough that it can't be extrapolated.

Eric Glyman@eglyman

If your kid’s lemonade stand processes 0.5–1% of US GDP, then yes, that’s a fair analogy for @tryramp. Ramp’s data is useful for the same reason it gets cited at all: it is quite consistent with the revenue figures OpenAI and Anthropic release. If it weren’t, no one would care.

Phil@phill__1·2 Mar

@peterom Can't wait!

POM@peterom·2 Mar

@phill__1 There's a new version coming out that's a LOT better, i feel almost bad for having released what I have!

Phil@phill__1·2 Mar

I have tried a bunch of these projects, but after having Codex run Desloppify on my code for 8h straight, I have to say this is by far the best. The code has gotten so much better, stuff I would never have touched is cleanly refactored, and everything works. Huge props

POM@peterom

Introducing Desloppify v.0.8. Thanks to many workflow improvements + new agent planning tools, it can now run for days on end - autonomously finding, understanding, & fixing large and small code quality problems. There's no reason your slop code can't be beautiful!

Phil@phill__1·28 Feb

>new model out
>damn, it just one-shots everything, never shipped so fast
>it's adding esoteric performance tweaks I don't even understand. we are so back
>codebase growing 5k LOC per day
>what? model makes more mistakes now?!
>they clearly quantized it in the backend
>:(

Phil@phill__1·26 Feb

@scaling01 This is obviously extremely useful for web development assets, PowerPoint creation etc. Probably replaces 80% of my image gen model usage

Lisan al Gaib@scaling01·26 Feb

I'm so unimpressed by that, honestly, because I can instantly think of an RL environment + reward function. SVGs are no fun when you can just RL

Jimmy Apples 🍎/acc@apples_jimmy

Ok svg benchmark has been saturated. It does it in real time so you can watch. These guys released the StarVector paper last year.

Phil@phill__1·21 Feb

@Bayesian0_0 If they don't add more 8h+ tasks quickly, more than 30h for Opus 5 seems possible. There are only 31 tasks at 8h+ and only 5 at 16h+, so solving just 1-2 more per bucket leads to weird behavior in the logistic regression
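
The "weird behavior" point can be illustrated with a minimal Python sketch. It assumes a METR-style definition in which a model's 50% time horizon is the task length at which a logistic regression of success on log2(task length) predicts a 50% success rate. The task lengths and outcomes below are invented for illustration, not real benchmark data, and `fit_logistic` and `horizon_minutes` are hypothetical helper names.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iters=50):
    """Fit P(success | x) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0          # gradient of the log-likelihood
        haa = hab = hbb = 0.0  # Fisher information matrix entries
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: (a, b) += I^{-1} g, using the closed-form 2x2 inverse
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

def horizon_minutes(minutes, solved):
    """Task length (minutes) at which the fitted success probability is 50%."""
    xs = [math.log2(m) for m in minutes]
    a, b = fit_logistic(xs, solved)
    # sigmoid(a + b*x) = 0.5  <=>  x = -a / b; then undo the log2
    return 2.0 ** (-a / b)

# Hypothetical task suite: many short tasks, only a few long ones
minutes = [1, 2, 4, 8, 15, 30, 30, 60, 60, 60, 120, 120, 240, 240, 480, 480, 960, 960]
before  = [1, 1, 1, 1, 1,  1,  0,  1,  1,  0,  1,   0,   0,   1,   0,   0,   0,   0]

# Same suite after the model solves one more 480-min task and one more 960-min task
after = list(before)
after[14] = 1  # first 480-minute task
after[16] = 1  # first 960-minute task

h_before = horizon_minutes(minutes, before)
h_after = horizon_minutes(minutes, after)
print(f"50% horizon before: {h_before:.0f} min, after: {h_after:.0f} min")
```

Flipping just two long-task failures to successes pushes the estimated horizon upward. Because so few tasks sit in the long buckets, each one carries a lot of weight in where the fitted curve crosses 50%.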

Bayesian@Bayesian0_0·20 Feb

I've been creating time horizon markets on manifold. It's getting hard to pick reasonable buckets for opus 5. <10h, 10-15, 15-20, 20-25, 25-30? seems too wide, but also 1h wide buckets are not gonna work, i aim for ~10 buckets. also considering moving to 80% time horizon

Bayesian@Bayesian0_0

This is crazy

Phil@phill__1·16 Feb

@test_tm7873 Not benchmaxxed, the benchmarks aren't even that spectacular

testtm@test_tm7873·16 Feb

Qwen 3.5 again a benchmaxxed model? :( or just bad at selling stuff.

Andon Labs@andonlabs

Qwen 3.5 goes bankrupt on Vending-Bench 2

Phil@phill__1·12 Feb

@scaling01 There are only 7 ppl on Codeforces with a higher Elo, so it's close to being superhuman

Lisan al Gaib@scaling01·12 Feb

Gemini 3 Deep Think (feb update) Benchmarks

Lisan al Gaib@scaling01

Gemini 3 Deep Think scores 84.6% on ARC-AGI-2

Phil@phill__1·9 Feb

@_AashishReddy Claude ai extracts everything that can't be OCR'd into images, so a PDF can lead to 100 images being added on upload, depending on the PDF's formatting

Aashish Reddy@_AashishReddy·9 Feb

Anyone know why Claude is hitting me with this for a pdf that's only 36k tokens?

Phil@phill__1·6 Feb

@adonis_singh @nicdunz Making the balls bounce in 3d in a dodecahedron seems decently harder. At least opus 4.6 can't get the collisions right on a first try

adi@adonis_singh·6 Feb

@nicdunz yeah hexagon test is saturated asf now, someone needs to make a similar but much harder variant

nic@nicdunz·6 Feb

its basically the same

Melvyn • Builder@melvynx

First test with the polygon: → Opus takes 2 minutes while Codex takes 4 minutes. → Opus adds interesting details (speed value, colors) not Codex. In the first iterations, Codex outputs the index.html in the chat instead of creating the file. So stupid...

Phil@phill__1·1 Feb

@Lydskia @ArmandDoma Like 2g for me

Julia Robots@Lydskia·1 Feb

@phill__1 @ArmandDoma You have to take massive amounts of phenibut for it to make you feel goofy/good like you would after drinking a few. And don’t ever take it WITH alcohol or you’ll be blacking out after like five

Armand Domalewski@ArmandDoma·1 Feb

they really should invent a form of alcohol that gives you the same buzzy benefits without all the negative health consequences. I’m a Polish Catholic, I like to drink, but I’m 36 and one wine glass tanks my Oura ring sleep score lmao

Phil@phill__1·29 Jan

@test_tm7873 I think it's this new agentic vision announced today blog.google/innovation-and…

testtm@test_tm7873·29 Jan

Did you know that Gemini Thinking (flash thinking) can split image to many smaller images to better read ? :)

Phil@phill__1·28 Jan

@thogge We still don't have an answer but it's now a 1.4T question

tyler hogge@thogge·27 Jan

Remember this article? It’s been 2.5 years. So, how’d it shake out?

Phil@phill__1·26 Jan

@Angaisb_ Needed 20x in the opus 4/4.1 era, downgraded to 5x with opus 4.5 and it's enough for me. I feel like 20x with opus 4 and 5x with opus 4.5 have very similar limits

Angel 🌼@Angaisb_·25 Jan

Claude Max users ($100): are the usage limits enough? I'm thinking about maybe getting it for a month but I don't wanna waste that much money without first knowing how good the limits are

Phil@phill__1·15 Jan

@btibor91 Same with poke from @interaction

Tibor Blaho@btibor91·15 Jan

ChatGPT is no longer available on WhatsApp

Phil@phill__1·13 Jan

@Presidentlin Yea I feel like the future of slide creation will just be optimized skill[.]md files. As soon as models are consistently very good at it, Claude ai, Gemini app and chatgpt will all get the feature natively

Lincoln 🇿🇦@Presidentlin·13 Jan

I am once again thankful I dodged the @phill__1 snipe of making a slide maker tool. The model makers are going to make one, it will be 90% good, but will require some sort of wizardry to get the type of results Elie or Alex gets. For those who fail at being a wizard, you'll get the Nano Banana look. It will be hard to explain to the market why your tool is better than the dozen out there. Most have tried and churned from Gamma. It will be a product category that will quietly get shelved away or move to the equivalent of Webflow creators. LLMs and Vibe coding should have killed Webflow, but it's still here. The arc is long for this AI wave, we are yet to see the v2 and v3 Lovable clones. As an idea, it has a high TAM, but it honestly just doesn't seem all that fun.

Phil@phill__1·7 Jan

@dejavucoder Why did they stop publishing the valuation at which they are raising tho?

sankalp@dejavucoder·6 Jan

xai raises 20B "looking ahead, grok 5 is currently in training, and we are focused on launching innovative new consumer and enterprise products that harness the power of grok, colossus, and 𝕏 to transform how we live, work, and play."

xAI@xai

x.ai/news/series-e
