John Tian

706 posts

@johnrtian

18 | incoming @stanford | empowering teachers @ gradewithai

LA · Joined April 2023

375 Following · 400 Followers

John Tian@johnrtian·26 Mar

Actually a great question; I was working on my own chess benchmark a couple weeks ago and I measured the number of moves top models could make against Stockfish in a random position that's equal (+-0.5 centipawns) given the legal move options every turn. If I recall correctly, top models like 3.1 Pro were making it 10-20 moves before losing via checkmate. I couldn't continue and publish my results due to it becoming prohibitively expensive, unfortunately.

Mikeysee@mikeysee·26 Mar

@johnrtian Are current LLMs good at Chess and Go? I wonder if we benchmark those? Answering my own question: reddit.com/r/ChatGPTPro/c…

Mikeysee@mikeysee·26 Mar

Also, is playing games a valid benchmark for LLMs? Seems like a problem for Alpha Go or whatever

Tim Rocktäschel@_rockt

"The only unsaturated agentic intelligence benchmark in the world" Excuse me? @NetHack_LE is unsaturated since 2020.

607

John Tian@johnrtian·26 Mar

@mikeysee If it is AGI, I expect it to be quite good at Chess and Go!

Mikeysee@mikeysee·26 Mar

@johnrtian Alright fair enough, I guess to be "AGI" then you need to be good at everything humans do and that includes playing games. We should expect LLMs to be good at Chess and Go too then right?

John Tian@johnrtian·25 Mar

@max_spero_ When you focus solely on SEO, you don't have much bandwidth left for making the product actually good! If numbers keep going up through SEO, it doesn't really make sense to do anything else. Until Pangram ups its SEO game, these crappy detectors have no incentive to improve.

120

Max Spero@max_spero_·25 Mar

Please please please PLEASE stop using crappy AI detectors that sell humanizers. Is there some psyop that makes people use the worst possible tools instead of the accurate ones?

Paul Dinas@pauldinaseditor

The ‘Shy Girl’ Fiasco Shows Why Trust in Writers Is Plummeting nytimes.com/2026/03/25/opi… #writingcommunity

110

5.8K

John Tian@johnrtian·25 Mar

@gum1h0x here's my implementation with 1.26 bpb! main difference is i have a decoder to go from latent to bytes, and i patchify before the encoder instead of doing byte-level attention first and then mean-pooling into chunks. github.com/openai/paramet… would love your thoughts :)

125

gum@gum1h0x·25 Mar

implemented a naive baseline at 2.3 bpb which maybe has some small issues. seems quite hard ngl. would be very interested in a solution. i unfortunately don’t have the capacity rn to spend a deserving amount of time on this chall’s version. gl to all who try :)

will depue@willdepue

i’ll send merch to anyone that can get a JEPA model to beat the parameter golf baseline! only rule is no tokenizer (use byte level) to be true to JEPA

John Tian@johnrtian·25 Mar

here's the pr for anyone curious: github.com/openai/paramet…

John Tian@johnrtian·25 Mar

got a bpb of 1.26 with JEPA! for context: every single entry on the parameter golf leaderboard is a GPT. this is a JEPA encoder-decoder that predicts latent representations, not next tokens. it uses a pure byte-level tokenizer (vocab 260 vs 1024 BPE). there's no tokenizer: the model has to learn everything from raw bytes. even with untuned hyperparams, it's within 0.04 bpb of the GPT baseline. the gap to close is small and i have a ton of ideas about how to beat it! (would love some credits)

will depue@willdepue

i’ll send merch to anyone that can get a JEPA model to beat the parameter golf baseline! only rule is no tokenizer (use byte level) to be true to JEPA
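For readers unfamiliar with the metric in the post above: bits per byte (bpb) is the model's total negative log-likelihood, converted from nats to bits and divided by the number of raw bytes in the evaluated text. The sketch below is illustrative only and is not taken from the linked PR or the parameter golf evaluation code; the function name `bits_per_byte` and the example loss value are hypothetical.

```python
import math


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes per byte of raw text.
    return total_nll_nats / (math.log(2) * num_bytes)


# Hypothetical numbers: a mean loss of 0.8734 nats per byte over 1,000,000 bytes.
bpb = bits_per_byte(0.8734 * 1_000_000, 1_000_000)
print(round(bpb, 2))  # 1.26
```

Because the denominator is always raw bytes, the metric lets a byte-level model and a BPE model be compared directly: for the BPE model, the loss is summed over its tokens but still divided by the byte count of the text.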

214

John Tian@johnrtian·20 Mar

@theo @crierlon Talk about it next stream

3.3K

Theo - t3.gg@theo·20 Mar

@crierlon Long story

176

25.5K

Theo - t3.gg@theo·20 Mar

T3 Code now supports Claude. If you have the Claude Code CLI installed and signed in, you can use it with T3 Code. Hopefully the lawyers won't make us remove this 🙃

225

2.6K

516.9K

John Tian@johnrtian·12 Mar

@jskoiz @cheatyyyy love*

John Tian@johnrtian·12 Mar

@jskoiz @cheatyyyy amazing project, look the hover effects!! fyi tho hovering over this icon causes it to re-render, resetting the animation

cheaty@cheatyyyy·11 Mar

can someone update hascodexratelimitreset.today

Tibo@thsottiaux

OK, Codex is back and stable and we should be good for a while. Reset button pressed, should see it in a bit

217

25.4K

John Tian@johnrtian·12 Mar

@jonathanzliu @heyruchir ohh... what's ur churn %?

213

jonathan liu@jonathanzliu·12 Mar

@heyruchir LTV per subscriber is $21 so my CAC needs to be below that

1.6K

jonathan liu@jonathanzliu·12 Mar

just burned $1.4k on paid ads and only made ~$600 back

to organic it is

jonathan liu@jonathanzliu

my new marketing strategy: - figure out how to scale with paid ads.

111

311

44.5K

John Tian@johnrtian·12 Mar

@heyruchir @jonathanzliu yeppp! a CAC of $46 is not bad at all if the app is sticky

Ruchir@heyruchir·12 Mar

@jonathanzliu What’s your LTV for every subscriber? If it’s high enough then this is worth it.

1.5K

John Tian@johnrtian·12 Mar

@jonathanzliu $600 LTV?

260

John Tian@johnrtian·7 Mar

@mikeysee @theo @OpenRouter @convex @thsottiaux

QAM

Mikeysee@mikeysee·7 Mar

So just following up yesterday's discussions with @theo, here are the results testing various reasoning efforts through @OpenRouter on the @convex evals. The GPT 5.4 xhigh result was the most surprising to me, so I re-ran it to check and it got the same result, which is in line with what Theo was saying: that xhigh is worse than high.

543

189K

John Tian@johnrtian·23 Feb

@jackfriks @subtlebytes you mean 5x?

397

jack friks@jackfriks·23 Feb

@subtlebytes 10x

1.5K

jack friks@jackfriks·22 Feb

claude code VS codex

codex is quite good, 100x better than anything i used a year ago. but coding with claude makes everything feel like a video game, and i get things done in seemingly less time while having more fun?

141

701

49.5K

John Tian@johnrtian·18 Feb

@angehyc the colors are pristine

ange (˶♡ᴗ♡)@angehyc·17 Feb

testing posters

John Tian@johnrtian·18 Feb

@0xajka @theo @t3dotchat pretty sure @r_marked added it!

AJ@0xajka·17 Feb

@theo “Wait Sonnet 4.6 dropped?” as if you didn’t already have it available on @t3dotchat

780

Theo - t3.gg@theo·17 Feb

Wait Sonnet 4.6 dropped? Is it worth a video?

160

85.2K

John Tian@johnrtian·18 Feb

@theo exa instant for the win!

1.3K

Theo - t3.gg@theo·18 Feb

T3 CHAT LAUNCH WEEK DAY 2 Search is now 10x faster, and models can do multiple searches per request

1.1K

53.8K

John Tian@johnrtian·16 Feb

@theo It's genuinely so evil that I have to believe it's ragebait..

1.5K

Theo - t3.gg@theo·16 Feb

I miss likes being public. Dude literally said god killed my best friend to make a point, and 20 people agreed enough to hit the like button. If you're one of those 20, speak up below. I want to know who you are.

#SadBarcaFan@Thatguy_107

@theo @yacinelearning Jesus you pick to fight everyone you miserable piece of sheet. No wonder god takes your friend out of this world 😂

115.7K

John Tian@johnrtian·14 Feb

@wilsonhou @lowercaseclub @angehyc perhaps.. do you guys do pseo?

wilson hou@wilsonhou·14 Feb

@johnrtian @lowercaseclub @angehyc would love to work together sometime !!

100

wilson hou@wilsonhou·14 Feb

in a month @lowercaseclub will be 6 people, 5 fulltime. incredible, talented people who for some reason or other joined the lil studio @angehyc and i started after quitting our jobs ~2 years ago. i feel like an imposter but also more excited than ever for whats to come

life is surreal

2.4K
