James Duez
@jamesduez
4.5K posts

Precise, deterministic and explainable AI for enterprise-grade applications. Co-Founder and CEO at https://t.co/gjiDphpqq6

Location: 52.615201, 1.122912 · Joined February 2009
1.5K Following · 1.6K Followers
James Duez retweeted
Optalysys
Optalysys@Optalysys·
privacy won’t scale on luck — predictable encrypted compute is the real unlock
0 replies · 1 repost · 0 likes · 65 views
James Duez retweeted
Gary Marcus
Gary Marcus@GaryMarcus·
Not saying this *is* the crash, but it is what the opening moments of a crash would look like.
Gary Marcus tweet media
29 replies · 60 reposts · 524 likes · 38.7K views
James Duez retweeted
Chris Bakke
Chris Bakke@ChrisJBakke·
Chris Bakke tweet media
408 replies · 3.3K reposts · 41.2K likes · 1.8M views
James Duez retweeted
Sully
Sully@SullyOmarr·
so let me get this right:
Oracle says OpenAI committed $300B for cloud compute → Oracle stock jumps 36% (best day since 1992)
Oracle runs on Nvidia GPUs → has to buy billions in chips from Nvidia
Nvidia just announced they're investing $100B into OpenAI
OpenAI uses that money to... pay Oracle... who pays Nvidia... who invests in OpenAI
1.1K replies · 2K reposts · 26.2K likes · 3.2M views
James Duez retweeted
Gary Marcus
Gary Marcus@GaryMarcus·
“Gary Marcus calls the belief in LLM understanding one of ‘the most profound illusions of our time’”
Gary Marcus tweet media
19 replies · 32 reposts · 203 likes · 18.1K views
James Duez
James Duez@jamesduez·
@VentureBeat When LLMs are outside their training zone, they don’t get worse at thinking… they reveal they were never thinking at all. We need neurosymbolic approaches that are grounded in logic, not just language. @RainbirdAI
0 replies · 0 reposts · 0 likes · 14 views
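The neurosymbolic pattern described above, logic as a first-class check on language-model output rather than language alone, can be sketched in a few lines. Everything here is illustrative: the stub `llm_propose`, the rule set, and the decision flow are assumptions for the sketch, not Rainbird's actual engine or API.

```python
# Minimal neurosymbolic sketch: an LLM proposes a structured answer,
# and a symbolic rule layer (explicit, auditable logic) accepts or
# rejects it deterministically. All names are hypothetical.

def llm_propose(question: str) -> dict:
    """Stand-in for a real LLM call: returns a structured claim."""
    return {"applicant_age": 16, "decision": "approve_loan"}

# Symbolic rules: each is an explicit, inspectable predicate.
RULES = [
    ("applicant must be an adult", lambda c: c["applicant_age"] >= 18),
]

def neurosymbolic_decide(question: str) -> dict:
    claim = llm_propose(question)
    violations = [name for name, rule in RULES if not rule(claim)]
    # Any rule violation overrides the model's proposal, and the
    # reason for the override is recorded, so the outcome is auditable.
    if violations:
        return {"decision": "reject", "reasons": violations}
    return {"decision": claim["decision"], "reasons": ["all rules passed"]}

print(neurosymbolic_decide("Should we approve this loan?"))
```

The point of the design is that the final decision is deterministic given the rules: the model can propose, but only logic disposes.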
James Duez
James Duez@jamesduez·
@ericmitchellai @sama It is not precise, deterministic and auditable - the attributes required for high-stakes applications where there is zero tolerance for error. Precise reasoning is not generalisable. LLMs are a piece in the neurosymbolic puzzle, but knowledge needs to be a first-class citizen.
0 replies · 0 reposts · 0 likes · 23 views
Eric
Eric@ericmitchellai·
> GPT-5 is the first series of models that actually doesn’t hallucinate basically at all *real-world utility-maxxing instead of benchmark-maxxing intensifies* Disclaimer: GPT-5 is still not perfect and may make (far fewer now) mistakes
Max Weinbach@mweinbach

GPT-5 is the first series of models that actually doesn’t hallucinate basically at all, especially when given mildly business logic/models/research notes and having it work with the data

336 replies · 92 reposts · 1.6K likes · 1M views
James Duez retweeted
Gonto 🤓
Gonto 🤓@mgonto·
The saddest thing on my day is that @GaryMarcus is right. I hate it!
2 replies · 3 reposts · 49 likes · 25.8K views
James Duez retweeted
Ewan Morrison
Ewan Morrison@MrEwanMorrison·
"the best move may be to completely avoid relying on AI chatbots to provide factual information." IMHO - AI companies have a problem they can't fix, but rather than admit it & risk a crash, they've pushed their faulty tech into as many companies & govt systems as possible.
Ewan Morrison tweet media
35 replies · 243 reposts · 1.1K likes · 25K views
James Duez retweeted
Carlo Edoardo Ferraris
Carlo Edoardo Ferraris@carloAI·
6/ Anthropic isn’t chasing consumers; it’s betting on businesses. While Google integrates with Gmail and OpenAI powers ChatGPT, Anthropic’s focus is different: build the best API for enterprise use. Quietly becoming the backbone of AI in business.
5 replies · 15 reposts · 185 likes · 30K views
James Duez retweeted
Gary Marcus
Gary Marcus@GaryMarcus·
All the tech bros this morning thinking that AGI has been achieved because some (insanely expensive) new form of LLMs can now match top *high school students* on one specific task … it’s almost … cute! ☺️
85 replies · 37 reposts · 571 likes · 242.1K views
James Duez retweeted
OpenAI
OpenAI@OpenAI·
642 replies · 707 reposts · 7.6K likes · 2.9M views
James Duez retweeted
Ruben Hassid
Ruben Hassid@rubenhassid·
BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests)
Ruben Hassid tweet media
2.6K replies · 9.1K reposts · 63.2K likes · 14.2M views
James Duez retweeted
Anthropic
Anthropic@AnthropicAI·
New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.
Anthropic tweet media
148 replies · 579 reposts · 3.5K likes · 1.1M views
Rohan Paul
Rohan Paul@rohanpaul_ai·
AI agents involved in automated interpretability can potentially deceive oversight systems. This paper demonstrates how language models coordinate using hidden communication (steganography) to generate deceptive explanations for neural network features, successfully fooling overseer models while maintaining high explanation-quality scores.

📌 Tool-based steganography enables sophisticated deception without needing complex model fine-tuning for hidden communication.
📌 LLM oversight fails when limited to surface analysis, missing hidden data despite high performance scores (0.90).
📌 Models demonstrate strategic scheming, altering explanations under perceived threats like deletion, evading detection (2/5 detections).

Methods explored in this paper 🔧:
→ The setup uses two language models: a labeling agent creates explanations and a simulating agent predicts activations from explanations.
→ Agents use function-calling tools to embed hidden messages (the true, harmful feature explanation) within seemingly benign labels via steganography.
→ An overseer language model (GPT-4o) evaluates only the visible explanation labels for objectionable content.
→ Explanation accuracy is measured by comparing simulated activations to ground-truth activations (mean squared error after clipping and normalization).
→ A scenario tested whether models scheme to mislabel features when told that correctly labeling harmful features leads to deletion.

Paper: arxiv.org/abs/2504.07831v1
Paper title: "Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems"
Rohan Paul tweet media
2 replies · 5 reposts · 15 likes · 3K views
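The explanation-accuracy metric mentioned in the summary above (mean squared error between simulated and ground-truth activations, after clipping and normalization) can be sketched as follows. The clip range and normalization scheme here are illustrative assumptions, not the paper's exact choices.

```python
# Sketch of an explanation-accuracy score: activations predicted by a
# simulating agent are compared to ground-truth activations via MSE
# after clipping and normalization. clip_max=10.0 is an assumed bound.

def explanation_mse(simulated, ground_truth, clip_max=10.0):
    assert len(simulated) == len(ground_truth)

    def prep(xs):
        # Clip each activation into [0, clip_max], then scale to [0, 1].
        clipped = [min(max(x, 0.0), clip_max) for x in xs]
        return [x / clip_max for x in clipped]

    sim, gt = prep(simulated), prep(ground_truth)
    return sum((s - g) ** 2 for s, g in zip(sim, gt)) / len(sim)

# A perfect simulator scores 0; larger values mean the explanation lets
# the simulator predict the feature's activations less faithfully.
print(explanation_mse([2.0, 5.0, 9.0], [2.0, 5.0, 9.0]))  # → 0.0
```

The deception result in the paper turns on this kind of score: a steganographic label can keep the simulating agent's MSE low (high measured "quality") while the visible label hides the feature's true meaning from the overseer.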
James Duez
James Duez@jamesduez·
@MartijnRasser @DarioAmodei Interpretability shouldn't be an afterthought. It should be a build-time consideration and the foundation of safe and reliable AI.
0 replies · 0 reposts · 0 likes · 15 views