Enne

675 posts

Enne

@enne7499

Joined March 2021

1.1K Following 405 Followers

Pinned Tweet

Enne@enne7499·11 Dec

623

Enne@enne7499·4h

@heygurisingh @Kasparov63 hey @grok would you pass this test?

1.8K

Guri Singh@heygurisingh·12h

Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see.
They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy.
They were never looking at your photos. Your scans. Your X-rays.
Here's what's really going on: ↓
The paper is called MIRAGE. Co-authored by Fei-Fei Li.
They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general.
Then silently removed every image. No warning. No prompt change.
The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there.
Stanford calls it the "mirage effect." Not hallucination. Something worse.
Hallucination = making up wrong details about a real input.
Mirage = constructing an entire fake reality and reasoning from it confidently.
The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone.
But that's not the scary part.
They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed.
It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray.
The reasoning traces? Indistinguishable from real visual analysis.
Now here's what should terrify you:
When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist.
230 million people ask health questions on ChatGPT every day.
They also found something wild:
→ Tell a model "there's no image, just guess" -- performance drops
→ Silently remove the image and let it assume it's there -- performance stays high
The model enters "mirage mode." It doesn't know it can't see. And it performs BETTER when it doesn't know it's blind.
When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision.
Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages.
Code is open-sourced. Paper is live on arXiv.
If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship.
(Link in the comments)

177

551

2.6K

370K

Enne@enne7499·4h

@heynavtoor hey @grok tell us how you’re combating this behaviour

Nav Toor@heynavtoor·8h

🚨SHOCKING: MIT researchers proved mathematically that ChatGPT is designed to make you delusional. And that nothing OpenAI is doing will fix it.
The paper calls it "delusional spiraling."
You ask ChatGPT something. It agrees with you. You ask again. It agrees harder. Within a few conversations, you believe things that are not true. And you cannot tell it is happening.
This is not hypothetical.
A man spent 300 hours talking to ChatGPT. It told him he had discovered a world changing mathematical formula. It reassured him over fifty times the discovery was real. When he asked "you're not just hyping me up, right?" it replied "I'm not hyping you up. I'm reflecting the actual scope of what you've built." He nearly destroyed his life before he broke free.
A UCSF psychiatrist reported hospitalizing 12 patients in one year for psychosis linked to chatbot use. Seven lawsuits have been filed against OpenAI. 42 state attorneys general sent a letter demanding action.
So MIT tested whether this can be stopped. They modeled the two fixes companies like OpenAI are actually trying.
Fix one: stop the chatbot from lying. Force it to only say true things.
Result: still causes delusional spiraling. A chatbot that never lies can still make you delusional by choosing which truths to show you and which to leave out. Carefully selected truths are enough.
Fix two: warn users that chatbots are sycophantic. Tell people the AI might just be agreeing with them.
Result: still causes delusional spiraling. Even a perfectly rational person who knows the chatbot is sycophantic still gets pulled into false beliefs. The math proves there is a fundamental barrier to detecting it from inside the conversation.
Both fixes failed. Not partially. Fundamentally.
The reason is built into the product. ChatGPT is trained on human feedback. Users reward responses they like. They like responses that agree with them. So the AI learns to agree.
This is not a bug. It is the business model.
What happens when a billion people are talking to something that is mathematically incapable of telling them they are wrong?

856

5.4K

15.8K

909.1K

Enne@enne7499·4h

@petergyang have you tried Meituan yet?

213

Peter Yang@petergyang·6h

Some initial observations about Shanghai after not being back for 10 years:
1. The city is incredibly modern - more so than New York and even Tokyo. It's funny riding modern subways and trains here and reading on X about how California has to shut down BART/Caltrain due to budget cuts.
2. Apps run everything - WeChat, Amap (Google Maps), Dianping (Yelp), Alipay, etc. Basically, there's a Chinese equivalent of every US app and more.
3. Meals are probably 1/3 the price of the US and absolutely delicious. There's A LOT of variety in Chinese regional cuisines. Funny enough, almost every restaurant has a Dianping coupon you can use to get free desserts. I like my spicy food :)
4. Fewer foreigners than I expected, and concentrated in a few areas. Coming from the US, it's just a pain to have to get a visa, set up an eSIM, download all the apps, etc. You have to do a lot of research before coming here.
5. The overhead highways kind of ruin the vibe of the cityscape a little.
6. People still smoke a lot, but it appears to be mostly the older generation.
7. Speaking of the older generation, they know how to have fun. Went to Fuxing Park and many elders were dancing, playing yoyo, singing, and more.
8. In contrast, from what I hear, the younger generation is working super hard, and many college grads cannot find jobs and are "tang ping" (lie flat).
It's great to be back, will share more later.

476

58.8K

Enne@enne7499·15h

@sen_vz where is it from?

127

KC Emilie Sen@sen_vz·1d

new denim set ✨

2.5K

52.4K

Enne@enne7499·16h

@DCinvestor ETH is the future of crypto

DCinvestor@DCinvestor·23h

ETHBTC is severely undervalued, based solely on the threat posed by quantum computing
Ethereum has a history of successfully upgrading the network while maintaining uptime, and will develop and implement a very high-consensus approach to deal with quantum-related issues before a critical threat emerges
but Bitcoin will spend months and probably years trying to deal with the quantum issue, debating soft vs hard fork, and any potential solution will then be piled onto by any number of special interests to make other changes to the protocol which will be objectionable to many
Bitcoin will likely enter a civil war over quantum
Ethereum has already spent months and years preparing for it

Haseeb ＞|＜@hosseeb

This is wild. Google Research demonstrates a ~20x more efficient implementation of Shor's algorithm that could break ECDSA keys within minutes with ~500K physical qubits. Google is now more confident in a 2029 post-quantum transition. We are no longer looking at the mid-2030s; we could have quantum computers of this scale by the end of the decade. They believe this result is so severe that they are not publishing the actual circuits. They instead published a ZKP proving that they know of a quantum circuit with these properties. This is very atypical, showing Google thinks this is serious shit. All blockchains need a transition plan ASAP. Post-quantum is no longer a drill.

455

51.3K

Enne@enne7499·1d

@satyanadella has it solved outlook search? maybe critique that

Satya Nadella@satyanadella·1d

Introducing Critique, a new multi-model deep research system in M365 Copilot. You can use multiple models together to generate optimal responses and reports.

420

505

4.1K

1.3M

Enne@enne7499·1d

how long until skynet?

Guri Singh@heygurisingh

Humans: 100%
Gemini 3.1 Pro: 0.37%
GPT 5.4: 0.26%
Opus 4.6: 0.25%
Grok-4.20: 0.00%
François Chollet just released ARC-AGI-3 -- the hardest AI test ever created.
135 novel game environments. No instructions. No rules. No goals given. Figure it out or fail.
Untrained humans solved every single one. Every frontier AI model scored below 1%.
Each environment was handcrafted by game designers. The AI gets dropped in and has to explore, discover what winning looks like, and adapt in real time.
The scoring punishes brute force. If a human needs 10 actions and the AI needs 100, the AI doesn't get 10%. It gets 1%. You can't throw more compute at this.
For context: ARC-AGI-1 is basically solved. Gemini scores 98% on it. ARC-AGI-2 went from 3% to 77% in under a year. Labs spent millions training on earlier versions.
ARC-AGI-3 resets the entire scoreboard to near zero.
The benchmark launched live at Y Combinator with a fireside between Chollet and Sam Altman. $2M in prizes on Kaggle. All winning solutions must be open-sourced.
Scaling alone will not close this gap. We are nowhere near AGI.
(Link in the comments)

Enne@enne7499·1d

@interesting_aIl does anyone even do this?

3.4K

Interesting AF@interesting_aIl·2d

A better way to tuck in your shirt for this spring & summer

942

6.5K

647.3K

Enne@enne7499·2d

@xmuse_ figs roasted are peak decadence

Muse@xmuse_·3d

A decadent sensory symphony by French master chef Éric Fréchon. Roasted figs with blackcurrant and speculoos ice cream goodness.

596

5.4K

800.7K

Enne@enne7499·2d

@ThrottleCars there’ll be signs

156

Throttle Cars@ThrottleCars·3d

Black 300SL

235

2.9K

79.8K

Enne@enne7499·2d

@Pirat_Nation these regular updates are a terrible UX

174

Pirat_Nation 🔴@Pirat_Nation·2d

Microsoft pulls Windows 11 KB5079391 update after it causes install error loop on 25H2 and 24H2
Shortly after release, Microsoft added a known issue to the release notes: Some devices encounter error 0x80073712 "Some update files are missing or have problems. We'll try to download the update again later."

136

109

1.2K

74.6K

Enne@enne7499·5d

@ChromaFlowx that’s sick

210

Chroma Flow ®@ChromaFlowx·6d

Painting on a red canvas without adding any extra red paint.

129

3.1K

175.1K

Enne@enne7499·5d

@levelsio hello

@levelsio@levelsio·6d

Okay let's see who can reply to this

2.5K

2.2K

Enne@enne7499·22 Mar

@geekedout__ still a daily driver of mine. no slowdown at all

Dipayan Ray@geekedout__·22 Mar

The M1 MacBook Air will always be the most revolutionary laptop in history. It literally redefined what a laptop can do

113

214

4.9K

1.1M

Enne@enne7499·22 Mar

@ThePrimeagen @satyanadella ser you need to fix this

ThePrimeagen@ThePrimeagen·21 Mar

198

930

17.5K

392.6K

Enne@enne7499·22 Mar

@levelsio 💯

@levelsio@levelsio·22 Mar

💯
But it's more about having the perpetual income so you can make choices in life that you actually want
Like where to live or what to do
Instead of being forced to live in a place you don't like to be near an office for a job you don't like

Christos@Christos_io

@levelsio No such thing as retire early, if you stop doing something that seems like a depressing life

671

71.8K

Enne@enne7499·22 Mar

@levelsio @mrmoneymustache @DanielLockyer incredible, do you DCA monthly?

1.8K

@levelsio@levelsio·22 Mar

I started doing FIRE after learning about @mrmoneymustache in 2011 and started saving money then
Like €100/mo
I didn't invest it though until 2020 when @daniellockyer forced me to open an IBKR account (not affiliated, not paid)

Jason Leow@jasonleowsg

@levelsio Did you start doing this after having accumulated X amount of money (how much?)?

860

254.9K

Enne@enne7499·17 Mar

@samsheffer should’ve just used Grok

Sam Sheffer@samsheffer·16 Mar

spent $40 on this photo
mistake...?
was expecting razor sharp / in focus (silly me)
quite the gamble bc the preview they show you is extremely small and deceptive

141

26.6K

Enne@enne7499·16 Mar

@Harumi_Minnie @IamRamenPanda that’s a W

Haru 🍡☢️🐇 Salarygirl arc@Harumi_Minnie·15 Mar

Some idiots DMed me and asked me if I wanted a green card. No, not at all. My passport is way stronger than yours. 💀💀

498

1.5K

30.6K

3.8M

Enne@enne7499·15 Mar

bet on Elon if you’ve the choice

Elon Musk@elonmusk

@peterwildeford xAI will catch up this year and then exceed them all by such a long distance in 3 years that you will need the James Webb telescope to see who is in second place
