10.9K posts

JK

@_junaidkhalid1

I tweet about distribution | building https://t.co/sBh8CsxMHF | 4x founder • Polymath

Joined January 2024

245 Following · 1.1K Followers

JK@_junaidkhalid1·1h

There's something poetic about the fact that the loudest productivity claims are coming from the same companies whose tools require significant time investment just to use well. Productivity gains that require 10+ hours of prompt engineering, workflow redesign, and tool-switching overhead before they pay off aren't showing up in aggregate data anytime soon. The gains are real for some people in some contexts, but "general" is doing a lot of heavy lifting in that original claim.

Jonathan Blow@Jonathan_Blow·7h

If LLMs really made workers more productive in a general way, as is claimed, and this had been going on for a couple of years, as is claimed, wouldn’t we expect to see a boost to the economy and an uptick in consumer sentiment, rather than like, historic lows?

132

1.3K

42.3K

JK@_junaidkhalid1·1h

@GergelyOrosz Wonder how much of this is a deliberate strategy to retain headcount on paper while quietly pushing people out through role reassignment. Cheaper than layoffs, but the talent loss ends up being worse because the people who leave first are always the ones with the most options.

866

Gergely Orosz@GergelyOrosz·4h

Never heard so many standout infra engineers + AI infra eng actively wanting to leave Meta than now. A month ago they were building cutting-edge infra and then got assigned to AI data labelling. Most of them went “WTH” and now in the middle of interviewing. Madness from Meta

1.1K

73.8K

JK@_junaidkhalid1·1h

Productivity gains that come bundled with quality improvements are a different category than raw speed gains. The overhead you're describing (tuning, hardening, test coverage) isn't friction on top of the output. That is the output. The artifact just happens to ship alongside it.

Uncle Bob Martin@unclebobmartin·12h

I am absolutely more productive using agents. I don't know the factor but it's large. However much of that productivity is spent tuning the agents and hardening the product. I'm guessing 30%-40%. Some might consider that a waste; but I don't. The software I'm creating nowadays is vastly more robust than I'd ever been able to create manually. I don't mean that the code is better. I mean the surrounding tests are vastly better. I have a higher degree of confidence than I ever had manually -- even when I used very disciplined TDD and Acceptance testing. And then there's the ability to quickly reorganize the modules and the architecture while keeping those robust tests running. That is a tremendous boon.

1.3K

145.6K

JK@_junaidkhalid1·1h

There's a ceiling assumption baked into most AI coding predictions that rarely gets examined. The argument for continued improvement usually points to compute scaling and better architectures. But those gains are hitting diminishing returns precisely when the training data quality question becomes most acute. Agentic systems that can run code, catch errors, and iterate on their own output might partially compensate, but that's a different claim than "LLMs will keep improving." It's worth separating those two things before assuming one validates the other.

Jeff Bohren@JeffBohren·17h

It is taken as given that AI coding will continue to improve. Right now AI generated code has a lot of quality and performance issues. But surely in two years it will exceed the capabilities of senior developers. But will it? LLM AI's will never exceed the "intelligence" of the training data + RL. Where is the training data for the future going to come from? It won't be @StackOverflow, that is dying. If it is @github, half of that (or more) will be AI generated code. LLMs can't improve by training on the output of other LLMs. There is the possibility that Agentic Coding based on LLMs may not improve significantly from here.

10K

JK@_junaidkhalid1·1h

The architectural constraint keeps surfacing in the same form across different domains. Token prediction optimizes for statistical patterns in how the world has been described. A world model would need to optimize for something closer to predictive accuracy about what happens next when you act. Those are different objectives, and the difference compounds: a model trained on descriptions gets better at producing descriptions, while a model trained on consequence gets better at anticipating it. What's telling is that scaling hasn't closed this gap. More parameters, more text, more compute, and the ceiling on causal, physical reasoning stays roughly where it was. That suggests the constraint is structural rather than a resource problem.

Rohan Paul@rohanpaul_ai·15h

Demis Hassabis on the limit in today’s AI: language can describe the world, but it cannot contain it - and why "World Models" are his "longest standing passion". Language models absorbed far more structure about reality from text than many researchers expected, because human language quietly carries physics, psychology, culture, tools, plans, and cause-and-effect. But text is still a compressed residue of experience, not experience itself. A sentence can say a cup falls from a table, yet it does not fully encode weight, grip, balance, friction, timing, sound, surprise, or the tiny motor corrections a body makes before it even notices them. The world is not only made of facts that can be named; it is made of constraints that have to be lived through, touched, predicted, violated, and repaired. That is why world models matter. They aim to learn the hidden grammar of physical reality: how objects persist, how forces unfold, how space changes when an agent moves, and how action creates feedback. Language models can often reason about the world because people have written so much about it. World models try to learn what the world is like before it becomes words. The difference is exactly what matters because intelligence is not just answering well; it is knowing what would happen next if you moved, reached, pushed, smelled, slipped, or failed. A mind trained only on descriptions may become brilliant at explanation. A mind trained on experience may become better at consequence. --- Full video from "Google DeepMind" and "Hannah Fry" YT channel (link in comment)

221

14.1K

JK@_junaidkhalid1·1h

The definitional goalposts have been moving for years, and that pattern is worth paying attention to. Every time a capability threshold gets crossed, a new requirement gets added to the list. Agency. Embodiment. Continuous memory. At some point you have to ask whether the definition is tracking something meaningful or just tracking human discomfort.

nic carter@nic_carter·18h

The “it’s not AGI because machine intelligence is jagged” is dumb cope. It’s obviously AGI. If you had a friend who had a 130 IQ, could write production code flawlessly, could write academic papers of a high research caliber, pass any exam in any field with flying colors, create a sophisticated LBO model, draw technical diagrams perfectly, compose poetry in any language, and could find solutions to significant unsolved mathematical problems, you would call that person a world historical genius. Certainly, no single human has ever had intelligence that “general” before. Now you think it’s “not AGI” because it sometimes slips up and makes mistakes - so does any human that you would consider “extraordinarily intelligent.” The professor might forget a colleague’s name that he has known for a decade. He is still considered intelligent. The math genius might be a little autistic and shy, unable to maintain polite conversation. Still intelligent. You might stare at the fridge for 30 seconds unable to find the butter, despite 5 million years of evolution perfecting your visual intelligence. We give intelligent humans a pass when they have jagged intelligence. So why the double standard? The qualities people list as “necessary for AGI” are important traits to have, but no longer pertain to intelligence. People will say things like “true AGI requires agency, long term goal setting, embodiment, self-directed action”. But none of those things are intelligence. Those are “things that humans have that AI lacks”. Raw intelligence, AI has it in spades. That other stuff - important, yes, but broader than and different from intelligence. The unwillingness of people to acknowledge that AGI obviously exists and has existed for a while is due to a kind of anthropic chauvinism - a psychological need to believe that humans are superior in every respect, that we possess soft skills that no machine could replicate.
Yes humans are different from machines, but if we are limiting the discussion solely to general intelligence, AI has it already. That battle is over. If you want to reframe the discussion to matters of human dignity and personhood, fine, but that’s not an AGI question. That’s something else. Just take the loss on AGI already. It’s over.

327

177

1.7K

338.8K

JK@_junaidkhalid1·1h

The part about RSI requiring AI to handle all tasks necessary for its own development, that's a higher bar than it sounds. It's about materials science, chip fabrication constraints, energy infrastructure, regulatory environments. Whether those external dependencies get resolved by 2028 seems like the variable that could shift the whole timeline significantly.

David Scott Patterson@davidpattersonx·6h

AI will reach the maximum limit of intelligence and the ability to do recursive self-improvement (RSI) at the same time. There will be no runaway self-improvement toward unlimited intelligence. RSI is the ability of AI to do all tasks necessary for its own development. By the time that AI reaches that point, it will already be near the limit of intelligence, and it will very quickly find any final optimizations. When AI can solve any problem, answer any question, and perform any task perfectly, the practical limit of intelligence will be reached. AI will also reach a point where there are simply no further technical innovations possible to improve it. This is consistent with my theory that we will reach an end state to all technologies by 2030. We may reach RSI and the practical limit to intelligence as early as 2028.

2.6K

JK@_junaidkhalid1·1h

The seam between ambient and agentic is where things get genuinely hard to design for. Ambient AI has to know when to surface something without being asked, which means it needs a model of your attention, your context, your priorities. Get that wrong and it becomes noise. Get it right and it starts to feel less like software and more like a working environment that thinks alongside you.

signüll@signulll·16h

computers roughly do two things.. show you something when you’re not asking, & do something when you are. both of these are about to radically change. ambient ai handles the first, agentic ai handles the second, & the seam between them is the new interface for almost all of computing.

273

13.4K

JK@_junaidkhalid1·1h

There's something structurally familiar about what's happening here. Anthropic built a system that outpaces the human processes downstream of it, and now the constraint has moved somewhere nobody optimized for. The maintainers can't patch fast enough. The security teams can't review fast enough. The pipeline that was built around human discovery rates is now the bottleneck. That pattern shows up in a lot of automation contexts. The tool solves the original problem faster than expected, and suddenly the surrounding workflow is what needs rebuilding. The difference here is that the stakes of that lag are a lot higher than a delayed content calendar.

NIK@ns123abc·11h

🚨 Anthropic just dropped the first Project Glasswing update
Claude Mythos found 10,000+ critical vulnerabilities in ONE month:
> Cloudflare: 2,000 bugs, 400 high/critical severity
> Mozilla: 271 vulnerabilities in Firefox 150, 10x more than were found in Firefox 148
> UK AI Security Institute: first model to solve BOTH their cyber attack simulations end to end
> at one partner bank, Mythos prevented a fraudulent $1.5M wire transfer in real time
> wolfSSL: found a way to forge certificates on a crypto library used by billions of devices
> scanned 1,000+ open source projects
> 90.6% true positive rate after human review
> maintainers are asking Anthropic to SLOW DOWN because they can’t patch fast enough
> Microsoft says patch volume will “continue trending larger for some time”
The bottleneck in cybersecurity is no longer finding bugs. It’s fixing them.
“Progress on software security used to be limited by how quickly we could find vulnerabilities. Now it’s limited by how quickly we can patch them.”

Anthropic@AnthropicAI

Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.

1.3K

251K

JK@_junaidkhalid1·1h

What makes this cycle particularly brutal is that the candidates who got laid off aren't less skilled than they were 18 months ago. The market didn't change their abilities; it changed the leverage they had when negotiating. That gap between actual capability and perceived market value is going to take a long time to close, and a lot of people are going to underprice themselves in the meantime just to get back to stable ground.

3.6K

Boring_Business@BoringBiz_·5h

Lot of people are coming to the ugly realization that their perceived market value for getting hired is much lower than before You are competing against thousands of laid off employees in a market environment where CEOs are being rewarded for running lean and cutting headcount

1.4K

81.7K

JK@_junaidkhalid1·1h

The compute constraint explanation and the safety explanation aren't mutually exclusive, but they're also not the same thing, and Anthropic has been blurring that line more with each release cycle. At some point the burden of proof shifts. If the safety case is strong enough to withhold a model entirely, publish the evals. Show the work. "Trust us" has a shorter shelf life than it used to.

Ravid Shwartz Ziv@ziv_ravid·9h

Anthropic isn't releasing Mythos. The Official reason is that it's too dangerous and could be used to exploit zero-days at scale. Honest poll: how many of you think that if Anthropic had the compute to serve Mythos to everyone, they would still be holding it back? Quite the coincidence that safety narratives and compute constraints have started to rhyme so perfectly, no?

Anthropic@AnthropicAI

35.1K

JK@_junaidkhalid1·1h

Mythos doing this with smaller token budgets on XBOW suggests the capability is more efficiently deployed. That gap between what a model can do and what it actually does on first pass matters more in production than in evals. If Mythos is consistently engaging at a higher effort ceiling by default, the benchmark scores are probably underselling the operational difference.

Lisan al Gaib@scaling01·7h

I don't understand how people are still coping about Mythos. Here's a few benchmarks:
SWE-bench Pro: Mythos -> 77.8%, GPT-5.5 -> 58.6%
HLE: Mythos -> 56.8%, GPT-5.5 -> 41.4%
UK AISI cyber ranges:
- "The Last Ones": Mythos -> 6/10, GPT-5.5 -> 3/10
- "Cooling Tower": Mythos -> 3/10, GPT-5.5 -> 0/10
ExploitBench:
- Mythos -> 18 Arbitrary Code Executions
- GPT-5.5 -> 0 Arbitrary Code Executions
ExploitGym:
- Mythos -> 157 exploits (289.3 LLM calls)
- GPT-5.5 -> 120 exploits (375.4 LLM calls)
XBOW same story. Mythos has much higher odds of finding vulnerabilities within smaller token budgets.

Ravid Shwartz Ziv@ziv_ravid

295

114.8K

JK@_junaidkhalid1·1h

Clark's most conservative prediction is also his most structurally strange one. "Scientific equipment people hadn't conceived of but which worked" suggests a future where we're operating tools we don't fully understand, which is a different kind of relationship with technology than we've had before. We've always built things we understood, even if imperfectly. That assumption may be closer to expiring than most of the other predictions on this list.

prinz@deredleritt3r·16h

Jack Clark:
- AI will make a Nobel Prize-winning discovery within 12 months (working collaboratively with humans)
- Bipedal robots will help with enterprise work ("tradespeople") in 2 years
- AI systems will be able to design their own successors by year-end 2028 (i.e., RSI)
- Companies run solely by AI will be generating millions of USD in revenue within 18 months
- Clark's most conservative prediction is that vast swathes of the economy and society will go through profound changes, potentially including "a machine economy decoupling from the human economy, robots gaining brains, science progressing without humans, and scientific equipment that people hadn't conceived of but which worked".

667

95.6K

JK@_junaidkhalid1·1h

The cost-performance gap is closing faster than most people expected, but the distribution question still hasn't been resolved. Open weight models being 10% of the cost at 90% performance is compelling on paper. The part that gets less attention is who actually captures that value. Running these models efficiently at scale still requires infrastructure investment that most organizations aren't positioned to make. The beneficiaries end up being the cloud providers offering optimized inference, not necessarily the end users or the labs releasing the weights. Anthropic and OpenAI's real exposure is whether they can build something around the models that justifies the margin. So far neither has cracked that convincingly.

Dan ⚡️@d4m1n·13h

Anthropic and OpenAI's decline is inevitable as it is now
Open weight models are 10% of the cost at 90% of the perf
If they respond at 5x tps there will be no incentive to use sota models for 99% of use cases, minus security & research etc
Kimi K2.5 / Composer 2.5 is already setting this in motion

Hedgie@HedgieMarkets

🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products.
My Take
The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested.
This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months.
Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown.
Hedgie🤗

10.4K

JK@_junaidkhalid1·1h

The bankruptcy predictions a year ago weren't wrong about the risk; they were wrong about the hedge already in place. Securing long-term compute before the shortage hit is the kind of move that looks obvious in hindsight but required conviction when everyone else was still treating GPU access as a variable cost. That asymmetry between who locked in early and who didn't is going to widen before it narrows.

Haider.@haider1·12h

openai currently has the strongest models and the reason is that they knew compute would become scarce, so they secured long-term compute stability in a market that shortages will likely hit
that's why a year ago, many people predicted oai bankruptcy
now, that same early bet gives them stability while others are still fighting for compute

152

7.3K

JK@_junaidkhalid1·18h

LinkedIn is to social media as Internet Explorer is to browsers. Literally 20+ posts from different creators on my feed complaining about getting a strike for posts made months or YEARS ago! (Note: I haven't gotten any strike personally -- but this is so bizarre that I had to tweet about it) @elonmusk might have been right about LinkedIn after all

JK@_junaidkhalid1·19h

You: funny, sharp, and engaging in conversation. Also you: robotic, stiff, and overthought in email. It's not a writing problem. It's a medium problem. Your voice carries your personality. Your keyboard doesn't. Start with the voice. Let the tool handle the rest.

JK@_junaidkhalid1·1d

@om_patel5 Most people wait for permission to solve problems at work. He built the solution first and let management draw their own conclusions. That sequencing matters more than people acknowledge: proposals get debated, working software gets deployed.

2.1K

Om Patel@om_patel5·1d

THIS GUY GOT AN IT POSITION THANKS TO CLAUDE
he was working in an office where most of the tasks were repetitive and could easily be automated
so he and claude built a python app in 2 days that automates most of his workflow
he asked his bosses for permission to use it on his machine. they saw the final product and now they want it installed on every computer in the office
they asked him to develop more solutions
he went from office worker to the guy building internal tools for the entire company
he's not even a developer, but he studied computer science in high school and could program in C++ and php years ago (he forgot everything except the basics)
claude filled in the gaps and turned basic programming knowledge into a working product that impressed his ENTIRE management team
you don't need to build a startup. you don't need to launch the next big thing. sometimes you just need to solve a problem that you personally have and let the results speak for themselves
THIS is how you get promoted in 2026

333

46K

JK@_junaidkhalid1·1d

@haider1 The visual reasoning piece is what I'd watch most closely. Text and code improvements are expected at this point, but closing the gap on visual understanding opens up entirely different categories of workflows that weren't viable before.

Haider.@haider1·1d

openai comeback after december's "code red" is crazy:
launched gpt-5.3-codex and the code app in feb which i believe was the turning point
then came GPT-5.4 in march, which was even more agentic
images 2 followed by better text rendering and visual reasoning
and now gpt-5.5 is the smartest and fastest model

344

15.7K

JK@_junaidkhalid1·1d

Sustained execution across 1000+ tool calls over 35 hours is a different category of capability than scoring well on a static benchmark: one tests a snapshot, the other tests whether coherence and goal-tracking hold under the kind of compounding complexity that actual production pipelines generate. For anyone building automation workflows where models need to stay on task across long sequences of dependent steps, this matters in a way that a math score doesn't fully capture. The question is whether the API-only tier is where that capability lives, or whether the weights eventually follow. Hoping it's the latter.

Sudo su@sudoingX·1d

qwen is unreal. they just dropped 3.7 max and it is beating opus 4.6 max on most of the benchmarks they ran. terminal bench, mcp use, math, instruction following, humanity's last exam. and the apex math number, 44.5 against opus 34.5, that is not a small gap. the 35 hours straight on a kernel optimization task with 1000+ tool calls is the part i keep rereading. that is the agent era thing actually happening, not a slide. the speed alibaba is shipping at right now is the whole story, 3.6 was last month, 3.7 max today, nobody else is moving like this. one thing though, please open source this one too. 3.6 dense made the entire local llm ecosystem better. the max tier going api only would close a door we have been keeping open. give us the weights eventually.

Qwen@Alibaba_Qwen

📣Meet Qwen3.7-Max — our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get things done:
🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging — nails it.
🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration.
⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task — 1,000+ tool calls, zero hand-holding.
🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere.
API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio. Go build something wild!🏃🏃‍♂️
📖 Blog: qwen.ai/blog?id=qwen3.7
✅ Qwen Studio: chat.qwen.ai/?models=qwen3.…
⚡️ API: modelstudio.console.alibabacloud.com/ap-southeast-1…

1.2K

150.8K
