CricketInSand

16.6K posts

@CricketInSand

Joined February 2025
319 Following · 473 Followers
Pinned Tweet
CricketInSand
CricketInSand@CricketInSand·
ecency.com/hive-122315/@emstone/why-body-count-matters-what
ZXX
6
8
62
33.4K
CricketInSand
CricketInSand@CricketInSand·
@callebtc Nor should you. The industry created benchmarks to represent progress within their own interests. They do not reflect reality in a meaningful way. x.com/sukh_saroy/sta…
Sukh Sroay@sukh_saroy

New research just exposed the biggest lie in AI coding benchmarks. LLMs score 84-89% on standard coding tests. On real production code? 25-34%. That's not a gap. That's a different reality.
Here's what happened. Researchers built a benchmark from actual open-source repositories: real classes with real dependencies, real type systems, real integration complexity. Then they tested the same models that dominate HumanEval leaderboards. The results were brutal. The models weren't failing because the code was "harder." They were failing because it was *real*.
Synthetic benchmarks test whether a model can write a self-contained function with a clean docstring. Production code requires understanding inheritance hierarchies, framework integrations, and project-specific utilities. Different universe. Same leaderboard score.
But it gets worse. A separate study ran 600,000 debugging experiments across 9 LLMs. They found a bug in a program. The LLM found it too. Then they renamed a variable. Added a comment. Shuffled function order. Changed nothing about the bug itself. The LLM couldn't find the same bug anymore. 78% of the time, cosmetic changes that don't affect program behavior completely broke the model's ability to debug. Function shuffling alone reduced debugging accuracy by 83%. The models aren't reading code. They're pattern-matching against what code *looks like* in their training data.
A third study confirmed this from another angle: when researchers obfuscated real-world code, changing symbols, structure, and semantics while keeping functionality identical, LLM pass rates dropped by up to 62.5%. The researchers call this the "Specialist in Familiarity" problem. LLMs perform well on code they've memorized. The moment you show them something unfamiliar with the same logic, they collapse.
Three papers. Three different methodologies. Same conclusion: the benchmarks we use to evaluate AI coding tools are measuring memorization, not understanding.
If you're shipping code generated by LLMs into production without review, these numbers should concern you. If you're building developer tools, the question isn't "what's your HumanEval score." It's "what happens when the code doesn't look like the training data."

English
0
0
0
75
calle
calle@callebtc·
I don't believe AI benchmarks anymore
English
30
10
189
8.1K
Right Angle News Network
Right Angle News Network@Rightanglenews·
BREAKING - An Africa-based research team aiming to disprove Western claims about low IQ in African countries is going viral after conducting mass IQ tests in Lagos, Nigeria, only for over 50% of participants to score below 70, with a median score of 69.7.
English
811
3.7K
22K
1.3M
CricketInSand
CricketInSand@CricketInSand·
@hayasaka_aryan @Rightanglenews The forced insertion into every non-African nation can only ever create conflict and envy. Which in turn drives further conflict. Which is the very nature of their weaponization against European nations.
CricketInSand tweet media
CricketInSand tweet media
English
1
0
0
30
CricketInSand
CricketInSand@CricketInSand·
@hayasaka_aryan @Rightanglenews They are well adapted to surviving in Africa. They are simply well well adapted to the modern world created by Europeans. Conflict only exists because a certain group has weaponized them and forced them into European nations. All African civilization was created by Europeans.
English
2
0
0
145
Nick Huber
Nick Huber@sweatystartup·
@ChrisDrz AI is insanely capital intensive. They are building billions in data centers and buying even more in chips. Plus electricity!
English
21
0
86
10.7K
Nick Huber
Nick Huber@sweatystartup·
Ford shutting down Electric Vehicle production. Meta shutting down Virtual Reality investment. Only a matter of time until companies pull way back on Artificial Intelligence.
English
514
174
2.3K
184.3K
CricketInSand
CricketInSand@CricketInSand·
@CarriePrejean1 Christianity is the rejection of them and their religion. They have themselves repeatedly labeled Christianity as antisemitism. Christianity is inherently antisemitic. Anyone claiming Christians should embrace what God rejects is antichrist.
English
0
0
0
5
CricketInSand
CricketInSand@CricketInSand·
@BreeSolstad The Bible says to pray against your enemies. Physical presence is not required.
CricketInSand tweet media
English
0
0
0
3
Bree Solstad
Bree Solstad@BreeSolstad·
There is a witches' convention coming to town and there's an internal debate amongst a group at my Church. Half of us want to go to the convention to pray outside it with Rosary prayers & hymns. The other half are fearful & say it could put us in spiritual danger. What do you say?
English
2.8K
43
1.2K
91.7K
Land-Stander
Land-Stander@LandStander04·
@Kneon no, this incompetent will be the death of Microsoft...
Land-Stander tweet media
English
2
0
5
139
CricketInSand
CricketInSand@CricketInSand·
@Kneon Indian ownership will be the death of the country. Microslop is unofficially an Indian company now. They almost exclusively only hire Indians. The downward spiral is directly related to Indian ownership and participation.
English
1
0
6
118
CricketInSand
CricketInSand@CricketInSand·
@IanCarrollShow No such thing as "modern Judaism." It is as it has been. Always been a Canaanite/Babylonian religion. This is why Jesus rejected them and their religion. This is why Jesus says their religion is about the glorification of themselves and not God. Hebraism -> Christianity
English
0
0
0
2
Ian Carroll
Ian Carroll@IanCarrollShow·
Modern Judaism has rot at its core and Jews worldwide have to either pull it out at the root (Israel) or go to war with the whole world to fulfill its demonic goals.
Seethroughitall@seethroughit2

Rabbi Mizrachi says Tucker Carlson shouldn't worry about the 3rd Temple being built bc when it is built "there will not be one anti-semite left in the world...they won't be around anyway" "When God will send the Messiah to purify the world...not one wicked gentile will be left"

English
418
2.3K
9.1K
223.9K
Ian Carroll
Ian Carroll@IanCarrollShow·
Tulsi turning out to be one of the biggest letdowns of the administration. As DNI she was perfectly positioned to make a difference. Instead she seems to have joined the swamp.
Tulsi Gabbard 🌺@TulsiGabbard

Trump promised to get the US out of “stupid wars.” But now he and John Bolton are on the brink of launching us into a very stupid and costly war with Iran. Join me in sending a strong message to President Trump: The US must NOT go to war with Iran. #TULSI2020

English
677
1.3K
9.1K
183.1K
MoonBaseX
MoonBaseX@MoonBaseSpaceX·
No, America doesn't subsidize the world. Basically, you deploy your technology to the world and take payments without ever paying tax in that country: Amazon, no tax; Google, no tax; Microsoft, no tax; Apple, no tax. It's funneled through tax havens. What is worse is that your own government then rips off American citizens even more with tariffs, and you all love it, apparently.
English
1
0
0
16
Derrick Evans
Derrick Evans@DerrickEvans4WV·
I had no idea that GPS signals are free worldwide & were funded by U.S. taxpayers at roughly $2 billion/year.
Derrick Evans tweet media
English
1K
1.5K
5.7K
352.9K
Govind
Govind@Govindtwtt·
Why AI won't replace developers (it'll just make the good ones richer)
2025: AI writes 90% of code. Devs celebrate. Managers post about 10x gains.
2026: Code reviews are just checking prompts. Tech Twitter says "coding is dead lol"
2027: Reality check.
- 10 hours to debug what took 1 hour to generate
- Nobody understands the abstractions
- Everything breaks in production
Senior engineers now paid 10x because they can delete 5000 lines of AI spaghetti and replace it with 50 clean ones.
English
63
38
699
65.2K
CricketInSand
CricketInSand@CricketInSand·
@nafonsopt Microsoft is an Indian company now in all but name. It's the singular reason Microsoft is now called Microslop. Everything about it is in steady decline.
English
1
0
0
26
Nuno Afonso
Nuno Afonso@nafonsopt·
Anybody who thinks that it is ok for telemetry to use 100% of your CPU should be fired immediately.
Nuno Afonso tweet media
English
136
394
10.1K
242.1K
CricketInSand
CricketInSand@CricketInSand·
@Itsfoss They are all beholden to corporations for funding. These same corporations are globalist loyalists. They are not guardians of anything other than globalist corporate interests.
English
0
0
2
156
It's FOSS
It's FOSS@Itsfoss·
I find it frustrating that none of these "guardians" of Linux and open source have reacted to the OS-level age verification law:
- Linux Foundation
- Open Source Initiative
- Free Software Foundation
- Software Freedom Conservancy
English
210
713
4.7K
110.7K