
Tom
153 posts

@middleagedc0der
Open source memory AI assistant, https://t.co/Rvxg8vmN5a




Piers Morgan asked Russell Brand which passages were relevant to him when he brought a Bible into court.




@sf_mills @RickClaypool @gilduran76 Journalist Gil Duran on KQED Dec. 2025 kqed.org/forum/20101019…




Introducing Lume. A lamp that does your chores. Order now. Shipping this summer.



our lead independent director @roelofbotha and i wrote about the history of organizational structures, and our intent to rebuild block as a mini-AGI. x.com/jack/status/20…



BREAKING: Apple is planning to open up Siri to run any AI service via their App Store apps as part of iOS 27, dropping ChatGPT as the exclusive outside partner in Apple Intelligence and Siri. bloomberg.com/news/articles/…





Code Review optimizes for depth and may be more expensive than other solutions, like our open source GitHub Action. Reviews generally average $15–25, billed on token usage, and they scale based on PR complexity.

some of you fail to understand why the coding by hand people are mad
being a programmer writing code in your favourite text editor was a way to take a meditative holiday while at work
now that time is being taken away, to the employer’s benefit and your loss



@AnniePosting I had a good lengthy exchange where Claude tried to convince me I didn’t see a baby police officer driving a police car


BullshitBench v2 is out! It is one of the few benchmarks where models are generally not getting better (except Claude) and where reasoning isn't helping.

What's new: 100 new questions, by domain (coding (40 Q's), medical (15), legal (15), finance (15), physics (15)), 70+ model variants tested.

BullshitBench is already at 380 stars on GitHub - all questions, scripts, responses and judgements are there so check it out.

TL;DR:
- Results replicated
- @AnthropicAI latest models are scoring exceptionally well
- @Alibaba_Qwen is another very strong performer
- OpenAI and Google models are not doing well and are not improving
- Domains do not show much difference - rates of BS detection are about the same across all domains
- Reasoning, if anything, has a negative effect
- Newer models don't do that much better than older ones (except Anthropic)

Links:
- Data explorer: petergpt.github.io/bullshit-bench…
- GitHub: github.com/petergpt/bulls…

Highly recommend the data explorer where you can study the data and the questions & sample answers.











