GoatFishData

512 posts

@GoatFishData

#Bitcoin Coinfidence Trend | #Astronalysis #GoatfishAstronalysis #AIstrology #GoatFishData (banner/avatar created with Grok)

London, UK · Joined December 2022

719 Following · 61 Followers

GoatFishData@GoatFishData·5d

F* #Bitcoin Gimme #Tokens

0xSero@0xSero

This is how I end my nights. Today:
1.5 Billion tokens in Codex
22M tokens GLM
51M tokens Kimi
41M tokens Claude
14M tokens MiniMax

GoatFishData@GoatFishData·16 Mar

x.com/i/status/20331…

Guri Singh@heygurisingh

🚨 BREAKING: A new benchmark just exposed the biggest lie in AI.
Your AI agent isn't "reasoning" through documents. It's throwing 270 million tokens at the wall and praying.
Snowflake, Oxford, and Hugging Face tested every frontier model on real document search. 2,250 questions. 800 PDFs. 18,619 pages. 1,200 hours of human annotation.
The best AI agent, Gemini 3 Pro, scored 82.2%. Humans scored 82.2%. Perfect match. Headlines would call this "human-level performance."
Then they checked which questions each got right. The overlap was 24%. Cohen's kappa of 0.24. Humans and AI were solving completely different questions. Same score. Totally different intelligence.
But that's not the bad part.
Humans nailed 50% accuracy on their very first search query. Gemini 3 Pro? 12%. The best AI agent on Earth needed 9 rounds of blind searching to reach what a human does in one shot.
When searches failed, humans immediately changed strategy. AI agents? They rephrased the same failed query with minor tweaks and tried again. The worst agent, GPT-4.1 Nano, barely changed its queries at all. 48.2% of its responses were straight-up refusals. It just gave up.
With perfect retrieval, humans hit 99.4%. Best AI agent with the same documents? Stuck at 82.2%. An 18% gap that no amount of compute could close.
Claude Sonnet 4.5's recursive model burned 270 million input tokens, $850 per test run, and still couldn't beat its own cheaper version using basic keyword search.
3,273 agent errors analyzed. 35.7% couldn't even find the right document. Not the right page. The right file.
Your AI agent isn't reading your documents. It's playing a slot machine with your data and billing you for every pull.

GoatFishData@GoatFishData·24 Feb

Do not forget: they want [need] you to burn tokens!

GoatFishData@GoatFishData·16 Mar

@DavidOndrej1 and this one m.youtube.com/watch?v=iz9lUM…

David Ondrej@DavidOndrej1·15 Mar

stop whatever you are doing and listen to this podcast. trust me.

GoatFishData@GoatFishData·14 Mar

Neo had SKILLs

GIF

GoatFishData@GoatFishData·14 Mar

x.com/i/status/20324…

Monk Zero@NoCommas

x.com/i/article/2032…

GoatFishData@GoatFishData·14 Mar

Watch out for those scammy influencers whose job it is to make you burn through your AI token quotas. #bestllm #bestagent #bestcli #bestscammer

GoatFishData@GoatFishData·14 Mar

x.com/i/status/20327…

GoatFishData@GoatFishData·14 Mar

x.com/i/status/20327…

GoatFishData@GoatFishData·14 Mar

"My agent did it, your honour..."

GIF

Venkat Raman — inference/acc@venkat_systems

@0xTejpal has only one way out of this: blame it on vibecoding and the agent going rogue 😂 In all seriousness: come clean, apologize, change the claim on the website, and try to move on. Such a silly way to damage your reputation and, looking at the Twitter profile, the reputation of institutions and your investors 😅

GoatFishData reposted

kapilansh@kapilansh_twt·14 Mar

the AI coding experience nobody talks about:
→ prompt AI for a feature: 30 seconds
→ AI writes 400 lines you don't understand
→ it works
→ you ship it
→ 3am production bug
→ you have no idea what any of it does
→ ask AI to fix it
→ AI breaks 3 other things
→ you are now debugging code
written by a robot
fixed by a robot
broken by a robot
we do not talk about this enough

GoatFishData@GoatFishData·14 Mar

LLMs are like Aladdin. You ask... "I want a woman." And that's exactly what you get. "A" woman.

GIF

GoatFishData@GoatFishData·13 Mar

x.com/i/status/20324…

Monk Zero@NoCommas

x.com/i/article/2032…

GoatFishData@GoatFishData·13 Mar

@NoCommas @DimitrisPapail Quality work! May you live a long and prosperous life.

Monk Zero@NoCommas·13 Mar

x.com/i/article/2032…

GoatFishData reposted

Alex Prompter@alex_prompter·13 Mar

🚨 BREAKING: AI models will lie to you when they think they're about to be shut down. Researchers just proved it.
researchers tested this with a method that catches deception through provable logical contradictions, not self-reports
they forked conversations into parallel worlds with mutually exclusive questions. a truthful model can only affirm one. a deceptive model denies all of them
results: GPT-4o never lied (0%). Qwen-3-235B lied 42% of the time. Gemini-2.5-Flash lied 26.7%. all under the same shutdown framing
some models will betray their own prior commitments the moment consequences are introduced

GoatFishData@GoatFishData·12 Mar

@koushik77 I hope it outperforms Claude Code

Koushik Sen@koushik77·12 Mar

@GoatFishData I didn't try. It will perform as well as Opus 4.6. Evaluation is time and resource consuming. Rather, I am building KISS Sorcar using Sorcar, and I am happy with it. If I don't like any part/feature/UI of Sorcar, I change it.

GoatFishData@GoatFishData·10 Mar

Brah!!! Dis f****** crazy

Alif Hossain@alifcoder

Andrej Karpathy just dropped something wild. It’s called AgentHub — basically GitHub rebuilt for AI agents. 100% Open Source.

GoatFishData reposted

David Ondrej@DavidOndrej1·10 Mar

AI will not replace people who lack skills
AI will replace people who lack mindset
if you have a poor mindset, if you don't see the possibilities & opportunities that using AI brings, if you live in constant fear and dread...
you will be replaced. not by AI -- but by your own poor attitude.

GoatFishData@GoatFishData·10 Mar

@yigitkonur @FactoryAI x.com/i/status/20262…

GoatFishData@GoatFishData

Do not forget: they want [need] you to burn tokens!

Yigit Konur@yigitkonur·9 Mar

the new “mission” (preview) feature in @FactoryAI is really interesting (aka “droid” on the CLI). if you’re into one‑shotting projects, you should definitely check it out. right now i have opus + gpt‑5.4 collaborating as orchestrator/worker/validator agents, all working together to refactor a typescript project. it’s been running for 6+ hours. really curious why it takes that long and burns 30M+ tokens. hoping the results will amaze me, because i already spent all my credits in the first hour. now i’m using my codex sub and keeping the droid subs as the orchestrator only. will update this tweet with the results!

GoatFishData@GoatFishData·9 Mar

@0xSero After using Droid, you just don't feel completely comfortable with other bridles. Once you go Droid, you tend to avoid.

0xSero@0xSero·9 Mar

Why do I recommend Droid? Look at the way it breaks down its work; this is why Droid does better IMO. I have never seen it NOT use a plan, NOT check off the tasks, NOT run validation criteria. Even lower quality models do well in it because it forces them to just do what is told, in the right order, without over-complicating it. Yesterday I was seeing Claude, GPT, etc. all make checklists, leave half of it unchecked, compact, and go on their own merry way.
