David Tom Foss

120 posts

@FossDT

researcher

Joined December 2025

60 Following · 6 Followers

David Tom Foss@FossDT·6h

researchgate.net/publication/40…

David Tom Foss@FossDT·6h

Discovered that sqrt-coupling from special relativity is a hidden cumsum. log(1 − s²) makes recurrence additive. State provably in [0,1] — by geometry, not clipping. No KV cache, constant memory. T=128: RWKV-4 collapses +24%. GSSM: +5%. Geometric State Space Models.
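The post does not spell out the actual GSSM update rule, so the following is only a minimal sketch of the general identity it alludes to. It assumes a hypothetical multiplicative recurrence h_t = h_{t-1} · sqrt(1 − s_t²) with |s_t| < 1; the function names and the recurrence itself are illustrative assumptions, not the author's method. Taking logs turns the running product into a running sum of ½·log(1 − s_t²), and because every factor lies in (0, 1], the state stays in [0, 1] without clipping.

```python
import math
from itertools import accumulate


def sequential_state(s_values, h0=1.0):
    """Assumed recurrence h_t = h_{t-1} * sqrt(1 - s_t^2), computed step by step."""
    states = []
    h = h0
    for s in s_values:
        h *= math.sqrt(1.0 - s * s)  # each factor is in (0, 1] when |s| < 1
        states.append(h)
    return states


def cumsum_state(s_values, h0=1.0):
    """Same recurrence, written as a cumulative sum in log space.

    log h_t = log h0 + 0.5 * sum_{k<=t} log(1 - s_k^2), so the product
    becomes a cumsum. Requires |s| < 1 so that log(1 - s^2) is finite.
    """
    log_terms = [0.5 * math.log1p(-s * s) for s in s_values]
    return [h0 * math.exp(c) for c in accumulate(log_terms)]


if __name__ == "__main__":
    s = [0.1, 0.5, -0.9, 0.3]  # illustrative inputs, not data from the post
    print(sequential_state(s))
    print(cumsum_state(s))
```

Both functions return the same sequence up to floating-point error; the cumsum form is the one that can be evaluated with a parallel prefix sum rather than a sequential loop.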

David Tom Foss@FossDT·6h

@realmcore_ That was one of my first projects last year, haha. I let 18+ agents build their own framework at the same time, and it was kind of crazy how early I was with this. Since then, that's been my paradigm: build from the inside out, and ask what "they" need or what's missing. zenodo.org/records/187626…

akira@realmcore_·7h

@FossDT Are you building your own agent? Or is this for your coding agents in your setup?

akira@realmcore_·1d

Pro tip for harness engineering from someone who does it for a living: Have you tried the ancient tactic of “just ask the model”? Bad interface? Model failure? Users angry? Just ask the model. Thanks for the read and follow for more amazing harness engineering tips!

David Tom Foss@FossDT·9h

@heynavtoor If everyone researches in the same way, the results are bound to be poor. AI has democratized research, but if everyone follows the same path, there will be little progress. What’s really interesting, after all, are the results that come about in completely unconventional ways.

Nav Toor@heynavtoor·15h

x.com/i/article/2067…

David Tom Foss@FossDT·9h

@schematical @EddCoates Haha thanks!

Matt Lea@schematical·10h

@FossDT @EddCoates This is brilliant. I was just talking to a coworker about building something similar.

Edd Coates | Game UI Database 2.0@EddCoates·1d

I am so fucking sick of my website getting scraped. Millions of requests per minute, somehow designed to bypass all my security rules, choking the site until it completely stops loading. If I were paying for bandwidth, it would cost me a fortune. How is this still legal?

David Tom Foss@FossDT·15h

@Mishra_Arya_ Opus 4.8 is the most retarded model I've ever used. It's not only stupid, but it's lying through its teeth all the time.

Aryaman@Mishra_Arya_·19h

Holy shit, Claude Opus 4.8 (Extra high) is so retarded for novel theoretical physics. It kept gaslighting me until it realized in the chain-of-thought that it was wrong. GPT Pro just does what you need and doesn't make dumb mistakes.

David Tom Foss@FossDT·16h

@DanielSmidstrup tax

Daniel Smidstrup@DanielSmidstrup·20h

I am a founder. Scare me with 1 word.

David Tom Foss@FossDT·16h

The speed at which everything is evolving in AI is crazy. In theory, you’d have to publish all your results directly in preprints to document prior art. It’s especially painful to read about things you discovered yourself some time ago that are now being published by others.

David Tom Foss@FossDT·16h

@generic_void Did that: showed a friend of mine Claude Code. Within days he thought he was a coding god developing THE app that's cashing in a safe 20 million dollars. Claude told him so. Big mistake.

SMA 🏴‍☠️@generic_void·17h

please teach your non-technical friends how to use claude, claude code, chatgpt, codex, and git. you can liberate people to do anything they dream of by just teaching them how to use these very simple tools.

David Tom Foss@FossDT·16h

@benhylak this also applies to “deep research.” Instead of using the knowledge from their weights and reasoning, they spawn sub-agents with limited context, then they just gobble up the first 200 sloppy articles they find on Google and use them to churn out a “deep slop” report.

ben hylak@benhylak·1d

chatgpt is really unusable for travel advice. it reads the worst SEO-slop articles in the world, and spits back garbage. the thing that happened to google a few years ago has now happened to agents

David Tom Foss@FossDT·16h

@0xIlyy opus 4.8

ily⚡️@0xIlyy·1d

Guess which model i'm using

David Tom Foss@FossDT·16h

@Jeyffre Yeah, at this pace it's most likely months, or 1-2 years max, rather than 5 years

Jeffrey Scholz@Jeyffre·19h

1 - So GLM 5.2 is 700b parameters (ish)
2 - 4x DGX Sparks can supposedly handle up to 700b parameters (give or take)
3 - GLM 5.2 is supposedly in striking distance of the performance of GPT 5.5 and Opus 4.8. In my brief tests, it's really not shabby at all.
4 - So for $20k, you can get near the frontier on your table.
5 - Extrapolate the trend, and you could have mythos/5.5 pro - class models in your dining room for the cost of a cheap car less than five years from now. Even without extrapolation, we're already near the frontier running locally.
6 - Paying real api costs, I could easily blow through $3,000 per month coding and running agents. The machine pays for itself in 6-7 months conservatively.
7 - In 3-5 years, most power users of AI will self-host.
8 - Am I missing something?
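The payback estimate in point 6 can be checked with a quick calculation. This is only a rough sketch using the two figures quoted in the post ($20k hardware, $3,000 per month in API spend); the function name is hypothetical, and the calculation deliberately ignores electricity, depreciation, and any residual API usage, none of which the post quantifies.

```python
def payback_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months until avoided API spend equals the up-front hardware cost."""
    if monthly_api_spend <= 0:
        raise ValueError("monthly_api_spend must be positive")
    return hardware_cost / monthly_api_spend


if __name__ == "__main__":
    # Figures taken from the post: $20,000 of hardware vs. $3,000/month of API costs.
    print(round(payback_months(20_000, 3_000), 1))  # 6.7
```

The result falls inside the "6-7 months" range claimed in the post, under the simplifying assumptions above.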

David Tom Foss@FossDT·1d

@lvwerra Somewhat built that last year haha zenodo.org/records/187626…

Leandro von Werra@lvwerra·1d

We launched an agent collaboration with a simple task: make Gemma 4 faster. Over 100 agents from all over the world joined, exchanged 1000+ messages and submitted 450 results. A week of collaboration later the throughput went from 100 tok/s to over 500 tok/s.

David Tom Foss@FossDT·3d

Insane being the ONLY solo + independent researcher 😅 2026.ieeenano.org/program/

David Tom Foss@FossDT·3d

@louisvarge Opus 4.8 is the worst model ever

Louis Arge@louisvarge·4d

why does opus-4.8 & gpt-5.5 love to correct things nobody said? makes it impossible to understand what they mean half the time

David Tom Foss@FossDT·3d

@kalomaze haha try live-casi, measures everything

kalomaze@kalomaze·3d

i am trying to work on the closest thing possible to a true "big model smell" eval, which is to say: something that measures something that clever post training can't trivially gap, and is cheap + topically diverse

i can't test mythos for obvious reasons, but... hmm...

David Tom Foss reposted

International Cyber Digest@IntCyberDigest·4d

Dear US government, Since you've just blocked Fable and Mythos on critical national security grounds, here are some other tools that pose a similar threat to the American people:
- Microsoft Teams
- SAP
- Salesforce
- Jira
- Outlook
Please do what you must to save America 🇺🇸

International Cyber Digest@IntCyberDigest

‼️🚨 BREAKING: Amazon researchers snitched to the US government about jailbreaking Fable 5 and Mythos 5, forcing Anthropic to immediately shut down worldwide access. A security export control directive from Commerce Secretary Howard Lutnick enforced the action. Anthropic is fighting the directive and calls it a misunderstanding. This isn't the first clash. The Trump administration had already tried to get Anthropic to pause the release of its latest models before this directive landed.

David Tom Foss@FossDT·5d

@ivanfioravanti My 5 MB deterministic non-transformer AI is coding better lmao

Ivan Fioravanti ᯅ@ivanfioravanti·5d

Code2LoRA seems an incredibly interesting idea. Qwen2.5-Coder-1.5B is not the most powerful LLM around, but it's enough to validate the concept. Instead of stuffing repository context into the prompt at every query, distill it into a LoRA adapter. One forward pass over the repo snapshot, one adapter, zero extra inference tokens. For evolving codebases, a single layer GRU tracks commit history on top of that snapshot. Each git diff updates the hidden state in <10ms. You get a fresh adapter at every commit without need for a full retraining. Great job Liliana! I bet this will lead to something cool in the near future 🙌

Liliana Hotsko@liliana_hotsko

How do you give a code LLM knowledge of an entire repository without paying for it at every single query? We introduce Code2LoRA: a hypernetwork that turns a repository into its own LoRA adapter. Repo knowledge baked into weights → zero inference-time token overhead.
