Santiago

28 posts

@extindar

Joined March 2022

38 Following, 3 Followers

Santiago retweeted

Rhys@RhysSullivan·1d

how it feels to open 50 slop PRs on your own repo

679

19.7K

Santiago@extindar·5h

@heyblake movienight.gg !

Blake Emal@heyblake·15h

Fork it. Drop your landing page URL. I'll give 1 piece of advice to as many of you as I can.

365

139

17.4K

Santiago@extindar·5h

@LauraLunaTech movienight.gg tracking my movie nights with my friends and working on a public version!

Laura Luna@LauraLunaTech·16h

4 days ago I had 0 followers. Yesterday I gained 218 in a single day. I genuinely underestimated how powerful X is for tech networking. 🤯 The best part so far has been meeting smart people building cool things. What are you working on right now? 👇

178

211

8.8K

Santiago@extindar·5h

Interesting how much better it gets if you directly ask for the specific exercise

Santiago@extindar·5h

Using GPT 5.5 Thinking to make a PT plan and got some pretty funky looking proportions

Santiago@extindar·6h

@LizBeal12 @can Looks like a little under 1% of taxpayers get audited annually, with the likelihood going up the higher your income.

566

Liz Beal@LizBeal12·7h

@extindar @can Has anyone been audited on their HSA spending?

835

can@can·14h

ill regret asking this but what’s the grift behind everything being hsa/fsa eligible now?

2.5K

417.4K

Santiago@extindar·6h

@exigentveracity @can apps.irs.gov/app/vita/conte… irs.gov/pub/irs-pdf/p5… You pay a 20% penalty and ordinary income tax for non-qualified expenses if you’re under 65.

528

Eren Yeager@exigentveracity·6h

@extindar @can This is not true

801

Santiago@extindar·7h

@SummersJohns69 @Sky1821084 @bryan_king @can You’re gambling that the IRS doesn’t audit you. If they do and you can’t prove you used HSA money for allowable purchases you have to pay it back, pay ordinary income taxes and pay a 20% penalty

237

Johnston Summers@SummersJohns69·7h

@Sky1821084 @bryan_king @can You most definitely don’t need to provide receipts to hsa company. You use their card to pay and this is it

474

Santiago@extindar·21h

@paul_cal @KLieret I think the point is to see how the underlying models perform, not how good the harnesses are

Paul Calcraft@paul_cal·1d

@KLieret Thanks Kilian, yes this para was the source. Good agentic benchmarks should use standard sota harnesses to be relevant imo. Overwhelming majority of agentic coding is currently done in Codex & Claude Code, so those are the important results. Look forward to the leaderboard

215

Paul Calcraft@paul_cal·1d

ProgramBench is an *agentic coding* benchmark where everything scores 0%. But they DIDN'T test Codex or Claude Code! Harness doesn't even have context mngment/compaction. Models apparently never hit ctxt limits, so this obvs isn't a fair test of the sota (cc/cx would run long)

Deedy@deedydas

The creators of SWE-Bench just dropped a really simple new benchmark every LLM gets 0% on. ProgramBench asks: can models recreate real executable programs (ffmpeg, SQLite, ripgrep) from scratch with no internet? We are far from saturated on model quality.

8.5K

Santiago retweeted

Ofir Press@OfirPress·1d

1) Our team at Meta has a tough new coding benchmark challenging models to code entire programs including ffmpeg and the PHP compiler from scratch. 2) Top accuracy is 0% 3) We will be making the benchmark harder.

John Yang@jyangballin

How much of SQLite, FFmpeg, PHP compiler can LMs code from scratch? Given just an executable and no starter code or internet access. Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

120.5K

Santiago@extindar·1d

@xyzabcrgb @codetaur Fun

internet@xyzabcrgb·1d

@codetaur What is the purpose of this?

508

Codetard@codetaur·Oct 5

got kinda sniped by seeing interactive bookshelf on twitter and thinking "there's no way that's not automatable"

37.3K

Santiago@extindar·2d

@AnechoicMedia_ @cremieuxrecueil Traffic hitting sites isn’t free and agents aren’t paying

AnechoicMedia@AnechoicMedia_·2d

@cremieuxrecueil Completely unnecessary imo. More of the internet should be llm-enabled, especially reference material like source code repository and documentation that was difficult for humans, but readily ingested by agents.

2.8K

Crémieux@cremieuxrecueil·2d

A real downside of the rise of LLMs is that we now have to sit through Cloudflare verification pages all the time.

3.7K

63.5K

Santiago@extindar·2d

@0ximjosh If you don’t use and live your product you are too far removed

Josh@0ximjosh·2d

It feels like people are forgetting the best companies are ones born from friction in your own life. I see too many tech startups solving problems in fields not a single employee has worked in

218

10.2K

Santiago@extindar·2d

@AshtonAU @steveruizok @orcdev This looks sick, gonna try it out

Ashton@AshtonAU·3d

@steveruizok I've seen warcraftcn.com by @orcdev !

1.4K

Steve Ruiz@steveruizok·4d

github if peak blizzard made it

121

164

3.7K

168.4K

Santiago@extindar·2d

@benhylak thanks for the motivation to post - i built this for my friends and figured i'd release it, appreciate any feedback! movienight.gg

1.2K

ben hylak@benhylak·2d

the most annoying person you've ever met is always a few weeks away from shipping

1.1K

77.6K

Santiago@extindar·3d

@favo_rion @divydend yeah they just act dumb / send you another link. it's not worth the time to respond

619

vsp@favo_rion·3d

@divydend has anyone ever tried reverse baiting them for the hell of it? act like you already opened the file, and theres an actual game in there, saying its the most dogshit thing ever? or say they sent the wrong zip file bcs its filled with porn or their school work or something

8.4K

div_y@divydend·3d

yo i think i'm good

638

4.2K

90.3K

Santiago@extindar·6d

It's ok Codex, you can say goblins

Santiago@extindar·Apr 25

@maddiedreese Did you track how much % of usage it took up compared to 5.4? Would be interesting to see if it was more efficient as well

Maddie D. Reese@maddiedreese·Apr 24

GPT-5.5 Extra High’s Codex Computer Use portrait of me! Definitely an improvement over 5.4. I think I’m going to call this “MaddieBench”

100

6.1K

Santiago@extindar·Apr 15

@spunkweaver @felixrieseberg You good bro?

Spuuunk@spunkweaver·Apr 15

@felixrieseberg Have you ever done anything valuable, or is it just this kind of piddly shit for your whole career?

595

Felix Rieseberg@felixrieseberg·Apr 14

Today is a big day! We're launching a ~ new ~ version of Claude Code in the desktop app. It's been redesigned from the ground up for parallel work and is a lot faster. It's been my main way to use Claude Code for the last few weeks.

617

461

9.9K

945.6K
