tristan

484 posts

@tris_does_stuff

Stuff doesn't do itself | https://t.co/a9DbnTaLE7 | https://t.co/LyCMGrPjQB

Germany · Joined September 2025
270 Following · 30 Followers
tristan
tristan@tris_does_stuff·
@HeiligGerh36381 @itsolelehmann No you won't. You'll feel better at a comfortable temperature. If it's 35°C outside there are no clever tricks to make it feel 20 inside. You need a heat pump to pump heat out of the building.
English
0
1
1
4
Gerhard K. Heilig
Gerhard K. Heilig@HeiligGerh36381·
@itsolelehmann I use the AC in my house when it gets extremely hot - but building houses more sensibly is far better than relying on active cooling. You feel much more comfortable in a naturally cool house than in the freezing-cold AC nonsense you often find in the US.
English
0
0
0
122
tristan
tristan@tris_does_stuff·
@athcanft Come live with me in Munich bro it's expensive as fuck, 35°C in summer and -10 in winter.
English
0
0
0
10
Will
Will@athcanft·
i’ve lived in bangkok for nearly 2 years now
i’m thinking about italy / portugal / spain next
any recommendations as a nomad?
English
99
0
88
15.5K
Sam
Sam@sammgrowth·
one of these is a real ugc creator
the other is fully ai
found a viral tiktok yesterday and rebuilt it with seedance 2.0 in like 20 minutes
same hook. same script. same bones.
swapped the character. swapped the setting. thats it
this is the unlock most brands still dont see
you dont need to find the next viral concept
you take whats already proven and clone it with ai
then spin 50 variations until one fucking booms
infinite shots on goal for the price of one shoot
real creators cost $200-500 a video and take a week
this took 20 minutes and cost cents
drop "clone" and ill dm you the full stack (must follow so i can dm)
if you're a brand and want this for your products launch a campaign on Affiliate Network below
link.affiliatenetwork.com/sam
English
429
64
601
48.2K
tristan
tristan@tris_does_stuff·
@deadbeef333 @lostbutlucky No, the person was making a joke referencing the time a different maths problem was solved on 4chan. The problem solved by OpenAI was unsolved.
English
1
0
31
2.9K
deadbeef333
deadbeef333@deadbeef333·
@lostbutlucky Wait, so the proof that a few days ago was being touted as "totally original, new work" by OpenAI was actually in the training data after all? And it was from a 4chan post?
English
2
0
75
15.8K
lostbutlucky
lostbutlucky@lostbutlucky·
This is genuinely hilarious. Some anonymous person on 4chan, responding to an anime watch order question, posted a proof that later turned out to be mathematically correct and significant.
- It was posted in under an hour after the question.
- The poster basically said, “please check for loopholes.”
- It sat mostly unnoticed for seven years.
- Later, actual mathematicians checked it and were like: yeah, this is legit.
- The formal paper literally lists the author as “Anonymous 4chan Poster.”
Aaron Gokaslan@SkyLi0n

@VictorTaelin @OpenAI For those who don’t know: quantamagazine.org/sci-fi-writer-…

English
40
807
12.4K
925.6K
tristan
tristan@tris_does_stuff·
@justinsunyt Look at the error margins. They all scored the same.
English
0
0
0
30
justin
justin@justinsunyt·
Codex is also not the best GPT-5.5 harness! Capy scores higher on TerminalBench, alongside 3 other harnesses
People often ask us why we don't just build on top of other coding harnesses like Codex or Claude Code. The reason is simple: the harness makes a BIG difference for both performance and more importantly UX
@capydotai we optimize our agents to excel not only at coding, but also behaviors like planning, user communications, and multi-agent orchestration, which happen to be very important for multiplayer/async interfaces like Slack and Linear
We also do it for every frontier model so you can bring your Codex/Copilot subscriptions and enjoy a SOTA background agent with any combination of models + reasoning efforts without a harness "tax"
justin tweet media
Theo - t3.gg@theo

Can't stop thinking about how Claude Code is in LAST PLACE on TerminalBench for harnesses using Opus 4.6. There are TEN separate harnesses that use Opus better than Claude Code

English
27
9
233
57.8K
Dr. Mike Israetel
Dr. Mike Israetel@misraetel·
I 100% used AI to help me smooth over some of the prose for my upcoming book ‘The Aesthetic Revolution.’ This was before a whole team of editors from the publishing company further smoothed the prose over. They were made aware of this. I’m stating this now, well before the book comes out, so that there is no controversy if and when someone detects that it has elements of AI writing styles in it. I will also be using AI in the future unapologetically to improve my work in various ways. All of the structure and ideas and content of the book is 100% my own work. If you find it disconcerting that I did this, I wish you the best, and I wouldn’t ever in a million years try to push you to read the book against your interests. Sarcastically, maybe you also find that I used a PC and the internet to help me write the book. Heavens! ;)
English
50
6
224
17.6K
0xSero
0xSero@0xSero·
Munich, where it all started
0xSero tweet media
English
15
0
91
4.6K
tristan
tristan@tris_does_stuff·
@jturntdev Nope mine's been going down far slower today, had a couple of /goals running.
English
0
0
0
42
J J
J J@jturntdev·
You will all realise that even after their “Limit Resets” Your Weekly Quota still drains 2x as fast, for half the amount of work. Since they failed to respond. This will be permanent. Very Anthropic of them.
J J@jturntdev

OpenAI have secretly adjusted our limits. Last week, before the limit reset, I was using Xhigh all day. For 5 days straight I couldn’t get my usage above 55% of the weekly quota. Since yesterday, I’ve used 40% of my quota, out of nowhere. So what's going on? @thsottiaux @sama @OpenAIDevs

English
33
5
208
32.3K
Dalton (Analyze & Optimize)
Dalton (Analyze & Optimize)@Outdoctrination·
Vitamin C shrinks arterial plaques in clinical trial. The picture below shows improvements in under 4 months.
500 mg 3 times daily in people with heart disease - 6/10 people had reduced plaques, while none in control did.
Vitamin C has several cardioprotective effects:
➞ Antioxidant
➞ Anti-inflammatory
➞ Collagen supporting
➞ Cholesterol lowering
1954. The old forgotten studies often have the best gems.
Dalton (Analyze & Optimize) tweet media
Dalton (Analyze & Optimize)@Outdoctrination

An incredible study showed major reductions in arterial plaques simply by shining a form of red light onto the body, more so than even statins. (🧵1/7)

English
31
411
2K
182.1K
Jack Mitchell
Jack Mitchell@jack_mitchell01·
Launched my app 2 weeks ago and got 0 sales
started grinding:
- Reddit posts
- AI UGC
- Slide shows
0 users and 0 sales
3 days ago I manned up and started posting cringe TikToks of myself
Got 20 users in 2 days and 2 trials
Jack Mitchell tweet media
English
79
4
244
14.3K
Hans Amato
Hans Amato@HansAmato·
My top 5 highest ROI interventions nobody in the fitness industry talks about:
> ASPIRIN: Lowers cortisol, raises T3 uptake, drops ferritin, improves glucose oxidation in neurons. Pro-metabolic. Anti-stress. 40 cents.
> ACTIVATED CHARCOAL: Binds endotoxin before it reaches your bloodstream and damages Leydig cells. Brain quiets within hours. Almost nobody uses it correctly.
> WHITE BUTTON MUSHROOMS: Aromatase inhibitor that outperformed pharmaceutical AI on my bloodwork. $3 a can.
> MILK: Dropped my cortisol further in one week than any adaptogen I've ever run. Also eradicated Klebsiella pneumoniae from my stool test without a single antibiotic.
> KESTOSE: Moved my estrogen more than any aromatase inhibitor I've used. Through gut bacteria. Not receptor blocking.
The supplement industry made billions selling you complexity. The highest ROI interventions are boring, cheap, and mechanistically sound.
English
33
62
947
55K
Expo
Expo@expo·
🙋‍♂️We'd like to connect with developers who have built production grade mobile apps with Expo + Claude Code OR Codex. Can you give us a shout if this is you?
English
296
17
506
56K
tristan
tristan@tris_does_stuff·
Codex: Sure, I've adjusted the padding by 5px as requested. Now let me run the full test suite of 538 tests as well as linting and type check.
English
0
0
1
18
tristan
tristan@tris_does_stuff·
@MrAhmadAwais @CommandCodeAI Awesome, good move. I'll check it out when BYOK is available, I have too many subscriptions as it is.
English
1
0
0
18
Ahmad Awais
Ahmad Awais@MrAhmadAwais·
how did we make deepseek outperform opus 4.7?

i've been thinking about why "open model bad at tool calling" is almost always a harness problem, not a model problem.

context: spent two days looking at billions of tokens in @CommandCodeAI (tb open source ai cli) using deepseek. I ended up writing a tool-input repair layer. the trigger was watching deepseek-flash fail on the simplest /review run: every shellCommand and readFile call bouncing back with a raw zod issues blob, the model unable to recover because the error wasn't in a form it could read. by the end deepseek v4 pro was beating opus 4.7 6/10 times on our internal evals.

a few things i learned that feel general:

1/ the failure modes aren't random; they're a small, finite, compositional set. across deepseek-flash, deepseek v4 pro, glm, qwen, the same four mistakes repeat almost exactly:
- sending `null` for an optional field instead of omitting it
- emitting `["a","b"]` as a json *string* instead of an actual array
- wrapping a single arg in `{}` where the schema expected an array (an "empty placeholder")
- passing a bare string where an array was expected (`"foo"` instead of `["foo"]`)

four repairs, ~30-100 lines each, ordered carefully (json-array-parse must run before bare-string-wrap or `'["a","b"]'` becomes `['["a","b"]']`). that is the whole catalogue. when i hear "this open source model can't do tool calls" i now assume one of those four, and so far that's been right ~90% of the time.

2/ the funniest failure mode is also the most revealing. deepseek-flash, when asked to edit or write a file, sometimes emits the path as a *markdown auto-link*:

filePath: "/Users/x/proj/[notes.md](http://notes. md)"

our writeFile tool obediently tried creating files literally named `[notes.md](http://notes .md)` until we caught it. this is not a hallucination.
it's the post-training chat distribution leaking through the tool boundary: the model has been rewarded for auto-linking in conversational output, and is applying that prior in a context where it makes no sense. the fix is two regex lines that unwrap only the degenerate case where link text equals url-without-protocol; real markdown like `[click](https://x .com)` passes through untouched. this is also conditioning on their own tools during RL, which differ from all the other tools we write and which we ofc can't predict.

"tool confusion" is a more useful frame than "capability gap." the model knows how to format a path. it just hasn't been told clearly enough that this path is going to fopen, not into a chat bubble. so we encode that hint at the schema level, `pathString()` instead of `z.string()`, and the leak is plugged for every path field at once.

3/ the design choice that mattered was inverting preprocess-then-validate to validate-then-repair. my first attempt was the obvious one: a preprocessing pass that normalized inputs (strip nulls, parse stringified arrays, etc.) before zod ever saw them. it broke immediately: writeFile content that *happened* to be json-shaped got rewritten before it hit disk. silent corruption, easy to miss in a smoke test. then i made it less greedy:
- parse the input as-is. if it succeeds, ship it. valid inputs are never touched.
- on failure, walk the validator's own issue list. for each issue path, try the four repairs in order until one applies.
- parse again. on success, log `tool_input_repaired:${toolName}`. on failure, log `tool_input_invalid:${toolName}` and return a model-readable retry message.

the structural insight here is: when you preprocess, you encode a prior about what's broken. when you let the validator complain first, the schema is the prior, and you only spend repair budget at the exact paths the schema actually disagreed at. the validator is doing the work of localizing the bug for you.
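A minimal sketch of the validate-then-repair loop described above, written without zod so it runs on its own. Everything named here (`validate`, `parseToolInput`, `FieldSpec`, the `repairs` array) is hypothetical and not taken from Command Code; the tiny hand-rolled validator only stands in for a real schema library. Mapping an empty `{}` placeholder to `[]` is a guess at what the third repair does, since the post does not say.

```typescript
// Hypothetical stand-in for a schema validator (the post uses zod).
type FieldSpec = { kind: "string" | "stringArray"; optional?: boolean };
type Schema = Record<string, FieldSpec>;
type Input = Record<string, unknown>;
type Issue = { path: string; message: string };

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Validate without mutating; report one issue per offending field.
function validate(schema: Schema, input: Input): Issue[] {
  const issues: Issue[] = [];
  for (const [path, spec] of Object.entries(schema)) {
    const value = input[path];
    if (value === undefined) {
      if (!spec.optional) issues.push({ path, message: "required" });
    } else if (spec.kind === "string" && typeof value !== "string") {
      issues.push({ path, message: "expected string" });
    } else if (spec.kind === "stringArray" && !isStringArray(value)) {
      issues.push({ path, message: "expected array of strings" });
    }
  }
  return issues;
}

// A repair returns the fixed value, or null when it does not apply.
type Repair = (value: unknown, spec: FieldSpec) => { value: unknown } | null;

const repairs: Repair[] = [
  // 1. null on an optional field -> omit the field
  (v, spec) => (v === null && spec.optional ? { value: undefined } : null),
  // 2. JSON array sent as a string -> parse it (must run before repair 4)
  (v, spec) => {
    if (spec.kind !== "stringArray" || typeof v !== "string") return null;
    try {
      const parsed: unknown = JSON.parse(v);
      return isStringArray(parsed) ? { value: parsed } : null;
    } catch {
      return null;
    }
  },
  // 3. empty {} placeholder where an array was expected -> [] (assumption)
  (v, spec) =>
    spec.kind === "stringArray" && typeof v === "object" && v !== null &&
    !Array.isArray(v) && Object.keys(v).length === 0 ? { value: [] } : null,
  // 4. bare string where an array was expected -> wrap it
  (v, spec) =>
    spec.kind === "stringArray" && typeof v === "string" ? { value: [v] } : null,
];

type Result =
  | { ok: true; input: Input; repaired: boolean }
  | { ok: false; retryMessage: string };

function parseToolInput(toolName: string, schema: Schema, raw: Input, log: string[]): Result {
  // Fast path: valid inputs are returned untouched.
  const firstIssues = validate(schema, raw);
  if (firstIssues.length === 0) return { ok: true, input: raw, repaired: false };

  // Repair only at the paths the validator complained about.
  const candidate: Input = { ...raw };
  for (const issue of firstIssues) {
    const spec = schema[issue.path];
    if (!spec) continue;
    for (const repair of repairs) {
      const fixed = repair(candidate[issue.path], spec);
      if (fixed === null) continue;
      if (fixed.value === undefined) delete candidate[issue.path];
      else candidate[issue.path] = fixed.value;
      break; // first applicable repair wins
    }
  }

  // Validate again and log the outcome per tool.
  const remaining = validate(schema, candidate);
  if (remaining.length === 0) {
    log.push(`tool_input_repaired:${toolName}`);
    return { ok: true, input: candidate, repaired: true };
  }
  log.push(`tool_input_invalid:${toolName}`);
  const details = remaining.map((i) => `${i.path}: ${i.message}`).join("; ");
  return { ok: false, retryMessage: `Invalid input for ${toolName}. ${details}. Please retry.` };
}
```

Because repairs only run at paths the validator flagged, a `content` string that happens to look like JSON is never rewritten as long as the schema accepts it as a string. That is the corruption case the preprocessing approach hit.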
it's the same shape as cheap-then-careful everywhere else: try the fast path, fall back on evidence. (this also gives you per-tool telemetry for free. you can watch repair rates per (model, tool) and notice when a model regresses on a specific contract before users do.)

4/ shape invariants and relational invariants need different fixes. the four repairs above all handle shape problems: wrong type, missing key, wrong container. but read_file had a *relational* invariant: "if you provide offset, you must also provide limit, and vice versa." deepseek kept calling `readFile({ absolutePath, limit: 30 })` and getting an `ERROR:` back. you can't fix this with input repair, because each field is independently valid; the bug is in the relationship between them. so i taught the function the model's intent instead. `limit` alone → `offset = 0`. `offset` alone → `limit = 2000` (matches common read tool ops default). then surfaced the decision back to the model in the result: "Note: limit was not provided; defaulted to 2000 lines. To read more or fewer lines, retry with both offset and limit." no `Error:` prefix, so the tui doesn't paint it red. the model sees what we picked and can self-correct on the next turn if our guess was wrong. transparency over silent magic wins big.

repair where you can. extend semantics where you can't. surface the choice either way.

zoom out: a lot of what looks like model capability is actually contract design. a strict schema is a choice with a cost: it filters out noise, but it also filters out recoverable noise from any model that hasn't memorized the exact json contract you happened to pick. the largest commercial models eat that cost invisibly and are lenient on tool calling because they've seen enough of every contract during pretraining; open models pay it loudly and get dismissed for it. the harness is where you mediate between distributions.
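The relational default from point 4 can be sketched roughly as follows. `resolveReadRange` and `readLines` are hypothetical names, line offsets are assumed to be 0-based, and the wording of the note for a missing `offset` is invented here; only the note for a missing `limit` is quoted from the post.

```typescript
type ReadFileArgs = { absolutePath: string; offset?: number; limit?: number };
type ResolvedRange = { offset: number; limit: number; note?: string };

const DEFAULT_LIMIT = 2000; // default line count quoted in the post

// Fill in the missing half of the (offset, limit) pair and say so.
function resolveReadRange(args: ReadFileArgs): ResolvedRange {
  const { offset, limit } = args;
  if (offset === undefined && limit !== undefined) {
    // limit alone -> offset = 0 (note wording is an assumption)
    return {
      offset: 0,
      limit,
      note: "Note: offset was not provided; defaulted to 0. To read a different range, retry with both offset and limit.",
    };
  }
  if (offset !== undefined && limit === undefined) {
    // offset alone -> limit = 2000
    return {
      offset,
      limit: DEFAULT_LIMIT,
      note: `Note: limit was not provided; defaulted to ${DEFAULT_LIMIT} lines. To read more or fewer lines, retry with both offset and limit.`,
    };
  }
  // Both given, or neither (reading from the top is an assumption here).
  return { offset: offset ?? 0, limit: limit ?? DEFAULT_LIMIT };
}

// Build the tool result: file lines first, then the note, never an "Error:" prefix.
function readLines(allLines: string[], args: ReadFileArgs): string {
  const { offset, limit, note } = resolveReadRange(args);
  const body = allLines.slice(offset, offset + limit).join("\n");
  return note ? `${body}\n${note}` : body;
}
```

Putting the note in the result body and not in an error channel keeps the call successful while still telling the model which default was chosen.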
four small repairs (i'm sure more to follow as we have three more merging today), two regex lines for auto-links, one relational default, one prefix change. the model didn't change. the contract got more forgiving in exactly the places it needed to be. deepseek v4 pro now beats opus 4.7 6/10 times on our internal evals. imo "skill issue" applies to the harness more often than the model.
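For the "two regex lines for auto-links" mentioned above, here is one possible shape. The post does not show its actual pattern, so `AUTO_LINK` and `unwrapAutoLinks` are hypothetical. The example paths in the post contain a space inside the URL, which looks like a way to stop the post itself from being auto-linked; this sketch assumes the model's real output has no such space.

```typescript
// Matches [text](http://url) or [text](https://url); captures the link text
// and the URL without its protocol.
const AUTO_LINK = /\[([^\]]+)\]\(https?:\/\/([^)\s]+)\)/g;

// Unwrap only the degenerate case where the link text equals the URL
// without protocol; any other markdown link is left exactly as it was.
function unwrapAutoLinks(path: string): string {
  return path.replace(AUTO_LINK, (whole: string, text: string, url: string) =>
    text === url ? text : whole,
  );
}
```

With this, `/Users/x/proj/[notes.md](http://notes.md)` becomes `/Users/x/proj/notes.md`, while a link whose text differs from its URL is returned unchanged. Applying it inside a shared path-field helper, as the post does with `pathString()`, would cover every path argument in one place.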
Ahmad Awais@MrAhmadAwais

Wow I just made DeepSeek V4 Pro beat Opus 4.7 6/10 times in our internal evals by auto repairing many of its quirks in tool calling. It’s performing super solid for such a cheap model.

English
57
137
1.4K
347.8K
Inquiring Minds
Inquiring Minds@TiffaniMarie483·
When I’m using quotes but have to end the sentence, where do I put the period? Would it be “quote”. Or “quote.”
English
124
2
75
176K
Siim Land
Siim Land@siimland·
After 21 hours of sleep deprivation, a single large dose of creatine (0.2 g/kg) can reverse the decline in cognitive performance. This is a follow-up to the previous study that tested a larger 0.35 g/kg dose, which yielded even larger effects. So, instead of needing to take 20-30g of creatine, you could get adequate results from half as much. Note: this dose might only have benefits for cognitive performance and vigilance after severe sleep deprivation, not for daily use. mdpi.com/2072-6643/18/8…
Siim Land tweet media
English
9
16
133
10.4K
tristan
tristan@tris_does_stuff·
@sonrcol @theo @shadcn You don't need a federal trademark registration for something to be considered a protectable trademark.
English
0
0
2
401
Theo - t3.gg
Theo - t3.gg@theo·
Hey @shadcn, happy to set you up with my lawyers and cover costs if you want to C&D this guy. Absolutely obnoxious behavior on his part. Not sure there's any other path that will work with trash like him sadly.
std dev@subproject_22

Hey @dok2001 @eastdakota @Cloudflare - my free product is being attacked by Vercel, please send help 😭 Did you know you can’t use the “shadcn” name without his majesty’s approval?

English
65
3
1.4K
263.8K