alexusadays

773 posts

@alexusadays

Learn Quality Assurance from Scratch https://t.co/fNTFKJVQh3

Joined December 2022

629 Following, 159 Followers

alexusadays retweeted

TFTC@TFTC21·1d

Jensen Huang: "If that $500,000 engineer did not consume at least $250,000 worth of tokens, I am going to be deeply alarmed. This is no different than a chip designer who says 'I'm just going to use paper and pencil. I don't think I'm going to need any CAD tools.'"

English

421

552

7.4K

2.2M

alexusadays retweeted

Coder girl 👩‍💻@dev_maims·4d

POV: you’re a developer in 2026😂

English

207

2.1K

224.7K

alexusadays retweeted

Priyanka Vergadia@pvergadia·4d

🤯BREAKING: Alibaba just proved that AI Coding isn't taking your job, it's just writing the legacy code that will keep you employed fixing it for the next decade. 🤣 Passing a coding test once is easy. Maintaining that code for 8 months without it exploding? Apparently, it’s nearly impossible for AI. Alibaba tested 18 AI agents on 100 real codebases over 233-day cycles. They didn't just look for "quick fixes"—they looked for long-term survival. The results were a bloodbath: 75% of models broke previously working code during maintenance. Only Claude Opus 4.5/4.6 maintained a >50% zero-regression rate. Every other model accumulated technical debt that compounded until the codebase collapsed. We’ve been using "snapshot" benchmarks like HumanEval that only ask "Does it work right now?" The new SWE-CI benchmark asks: "Does it still work after 8 months of evolution?" Most AI agents are "Quick-Fix Artists." They write brittle code that passes tests today but becomes a maintenance nightmare tomorrow. They aren't building software; they're building a house of cards. The narrative just got honest: Most models can write code. Almost none can maintain it.

English

486

1.9K

9.4K

1.7M

alexusadays retweeted

Lukasz Olejnik@lukOlejnik·10 Mar

Amazon is holding a mandatory meeting about AI breaking its systems. The official framing is "part of normal business." The briefing note describes a trend of incidents with "high blast radius" caused by "Gen-AI assisted changes" for which "best practices and safeguards are not yet fully established." Translation to human language: we gave AI to engineers and things keep breaking? The response for now? Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off. AWS spent 13 hours recovering after its own AI coding tool, asked to make some changes, decided instead to delete and recreate the environment (the software equivalent of fixing a leaky tap by knocking down the wall). Amazon called that an "extremely limited event" (the affected tool served customers in mainland China).

English

975

3.3K

19K

29.8M

alexusadays@alexusadays·10 Mar

@elonmusk @karpathy @farzyness Thank god. When can we start receiving our UBI and figure out our real purpose? 😂

English

Elon Musk@elonmusk·10 Mar

@karpathy @farzyness We are in the Singularity

English

900

874

8.2K

648.1K

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

English

961

2.1K

19.3K

3.5M

alexusadays@alexusadays·10 Mar

@brockpierson Like hello sir, how are you, kind of normal?

English

⭕ Brock Pierson@brockpierson·10 Mar

One of the best growth hacks on this platform: Actually being a normal human in replies. It’s shockingly rare.

English

149

265

11.4K

alexusadays@alexusadays·10 Mar

@_falsi1ke Hello?

English

Ja Leto@_falsi1ke·9 Mar

No account should be under 1K followers Say hello, I will boost you

English

6.3K

439

5.1K

632.3K

alexusadays@alexusadays·9 Mar

@Grummz Are you at GDC this year?

English

241

Grummz@Grummz·9 Mar

Good afternoon

English

1.3K

27.5K

alexusadays@alexusadays·9 Mar

If anyone is interested in #AI testing in gaming, check out #NodeMori (booth 439) at #GDC

English

152

alexusadays@alexusadays·9 Mar

@sylviawangv NodeMori is building AI for testing

English

Sylvia Wang@sylviawangv·9 Mar

Is AI interactive content just gaming? And a very small sub-stack of it? My biggest question lately and leaning more and more towards yes after what I see at GDC on day 1. #gdc

English

170

alexusadays retweeted

Volodymyr Zelenskyy / Володимир Зеленський@ZelenskyyUa·4 Mar

I held a meeting on the situation in the Middle East and the Gulf region – the challenges for Ukraine and our partners, as well as our capacity to help protect lives, prevent the war from expanding, and stabilize global markets. The Iranian regime, which is striving to survive at any cost, poses clear threats to all states in the region and to global stability. No country close to Iran can feel secure. Shipping through the Strait of Hormuz has practically stopped. So far, the Iranian regime has shown no genuine intent for honest diplomacy or fundamental change. Ukraine is consulting with partners in Europe and the United States, as well as with countries neighboring Iran. Yesterday, I spoke with the leaders of the UAE and Qatar. Today, I held discussions with the leaders of Jordan and Bahrain. There will also be talks with Kuwait and other countries in the region. All of them face a serious challenge and speak openly about it: Iranian attack drones are the same “shaheds” that have been striking our cities, villages, and our Ukrainian infrastructure throughout this war. In just a few days, Iran has launched over 800 missiles of various types and more than 1,400 attack drones. It is Iranian drones and missiles that pose the main threat to free navigation, destabilizing global prices for oil, petroleum products, and gas. Ukraine can contribute to protecting lives and stabilizing the situation. Partners are reaching out to us. I have tasked Ukraine’s Minister of Foreign Affairs, together with intelligence agencies, the Minister of Defense, our military command, and the NSDC Secretary, to present options for assisting the relevant countries and to provide aid in a way that does not weaken our own defense here in Ukraine. Our military possesses the necessary capabilities. Ukrainian experts will operate on-site, and teams are already coordinating these efforts. 
And we are ready to help protect lives, defend civilians, and support real efforts to stabilize the situation and, in particular, restore safe navigation in the region. We expect the European Union, European countries, and the G7 to take active measures both in dismantling the Iranian regime’s terrorist capabilities and in protecting lives in the region and global stability. We will continue to coordinate with our partners. Glory to Ukraine!

English

596

2.1K

9.5K

1.3M

alexusadays retweeted

kitze 🛠️ tinkerer.club@thekitze·28 Feb

me and claude code all day every day

English

413

1.9K

21.7K

4.5M

alexusadays retweeted

Aakash Gupta@aakashgupta·26 Feb

The real story is the $25 million per mile price tag they’re betting on. Nashville’s own 2018 light rail plan priced at $200 million per mile. New York’s East Side Access cost $3.5 billion per mile. The LA Metro expansion is running $1 billion per mile. The Boring Company says it can build 13 miles of twin tunnels through Nashville for $240-300 million total. That’s a 95% cost reduction from the industry average. If the number holds, it rewrites the economics of every transit project in America. If it doesn’t, a few hundred million in private capital evaporates and taxpayers lose nothing. That risk asymmetry explains why Tennessee said yes when LA, Chicago, Baltimore, and DC all said no. The engineering gamble is wild. 12-foot diameter tunnels instead of 28-foot. Fully electric Prufrock machines that mine continuously instead of stopping every 5 feet to install lining segments. Zero people in the tunnel during operations. A machine that “porpoises” into the ground from a truck instead of requiring million-dollar launch pits and cranes. Every one of those innovations has worked in Las Vegas sand. None have been tested in karst limestone, the geology that creates sinkholes, caves, and underground streams. Their own CEO said at the unveiling that Nashville would not be their choice if they were optimizing for easiest places to tunnel. This tells you everything about what The Boring Company is actually trying to prove. Nashville is where the thesis meets the hardest possible geology. 50 inches of annual rainfall versus Vegas’s 4. Rock that creates underground caves and streams. They just signed a construction contract in Dubai too, meaning they need Nashville to work before the next project launches. The internal memo from the governor’s office estimates 1 mile per month. The Boring Company’s website claims 1 mile per week. That 4x gap between political planning and corporate marketing will determine whether this finishes in 2027 or 2030. 
Week 7, when Prufrock-MB2 arrives, is when this gets real. Two machines boring simultaneously through Tennessee limestone will answer the question the entire tunneling industry has been debating for a decade: whether a startup can actually outrun the physics that made infrastructure the slowest-moving sector in construction.

The Boring Company@boringcompany

Tunneling has begun in Nashville - we are 2.5 feet in! Looking ahead:

- Weeks 1-3: Prufrock-MB1 launches and undergoes a series of tests and calibrations (low production)
- Weeks 4-6: scale to high production
- Week 7: Prufrock-MB2 arrives

English

673

1.2K

10.5K

37M

alexusadays retweeted

NIK@ns123abc·23 Feb

🚨 META’s head of AI safety and alignment gets her emails nuked by OpenClaw

>be director of AI Safety and Alignment at Meta
>install OpenClaw
>give it unrestricted access to personal emails
>it starts nuking emails
>“Do not do that”
>*keeps going*
>“Stop don’t do anything”
>*gets all remaining old stuff and nukes it aswell*
>“STOP OPENCLAW”
>“I asked you to not do that”
>“do you remember that?”
>“Yes I remember. And I violated it.”
>“You’re right to be upset”

LMAOOOOOOOO

English

1.1K

2.6K

29.9K

2.9M

alexusadays retweeted

Proudofus.uk@ProudofusUK·16 Feb

Britain took on fifty nations. African kings. Arab sultans. Bombarded ports. Deposed rulers who refused. Lost 1,600 men. Spent 40% of the entire Treasury. A debt so big it wasn't paid off until 2015. You or your parents were still paying for it. Not for land. Not for gold. To end the slave trade. They captured 1,600 ships. Freed 150,000 people. Patrolled 3,000 miles of coastline for sixty years. And none of it was the government's idea. It was 400,000 ordinary people who signed petitions and 300,000 families who refused to buy sugar. They forced Parliament's hand. Your ancestors changed the world and nobody told you. If you think this should be taught in schools, help us reach more people: proudofus.co.uk/support Be part of us Be Proud Of Us 🇬🇧

English

826

8.6K

31.6K

1.2M

alexusadays retweeted

Juan@7uanF·13 Feb

HBO gave us the Game of Thrones ending that they wanted, and a lot of people lost their minds. Seedance 2.0 (AI) gave us the ending that a lot of people wanted; see it as a demonstration of what AI is capable of.

Español

576

1.5K

18.2K

2.6M

alexusadays@alexusadays·13 Feb

@xAIMemphis @elonmusk @xai Why not put them underground?

English

639

xAI Memphis@xAIMemphis·13 Feb

From Day One, @xAI has been upgrading grid infrastructure and ensuring ratepayers don’t pick up our tab. Today, we are continuing our commitment by installing our own power lines to power our MACROHARD facilities.

English

514

1.1K

7.9K

941.1K

alexusadays retweeted

Elon Musk@elonmusk·12 Feb

🤨

DogeDesigner@cb_doge

BREAKING: Anthropic's Claude AI has shown in testing that it's willing to blackmail and kill in order to avoid being shut down. Elon Musk was right about everything. 💀

ART

5.9K

25.5K

133.8K

23.5M

alexusadays retweeted

tetsuo@tetsuoai·11 Feb

21 people are controlling computers with their thoughts. Not in a research demo. In their daily lives. Neuralink's Telepathy trial has nearly doubled since September. The implant reads your motor cortex and translates neural signals into digital commands. No voice. No eye tracking. No hands. One participant hit able-bodied cursor speeds in his first week. Others are typing at 40 words per minute using imagined finger movements. A med student uses it 17 hours a day to study. A woman who hadn't controlled a computer in 20 years is making art. A father with ALS rigged a 360 camera to his wheelchair and controls it telepathically so he can watch his kids. Next up: 3x the electrodes, a new trial targeting real-time speech at 140 WPM, and zero serious device-related adverse events across every participant so far. Brain-computer interfaces are no longer theoretical.

English

503

1.2K

6.1K

736.2K

alexusadays retweeted

Elon Musk@elonmusk·1 Feb

Making the dry electrode process work at scale, which is a major breakthrough in lithium battery production technology, was incredibly difficult. Congratulations to the @Tesla engineering, production and supply chain teams and our strategic partner suppliers for this excellent achievement!

English

866

1.8K

20.8K

3.8M
