Luka

2.4K posts


@SwissLuka

Swiss in Hong Kong | JP Morgan | Finance, technology, and the occasional strong opinion. Views my own.

Hong Kong 🇭🇰 / Zurich 🇨🇭 Joined April 2019
448 Following · 766 Followers
Pinned Tweet
Luka@SwissLuka·
In HK you’re not rich enough
In NY you’re not connected enough
In MIA you’re not hot enough
In SF you’re not geek enough
In ZH you’re not Swiss enough
So just be you
1
4
59
8.7K
Luka@SwissLuka·
@levie Massive investment in deployment is the strongest signal that AGI is not imminent
0
0
0
31
Aaron Levie@levie·
The need and opportunity for professional services and FDEs to deploy agents right now is massive. Every tech wave offers a new era of consulting and tech services requirements. Moving from analog to digital led to a massive wave in the 90s. Moving from on-prem to cloud did the same in the 2000s. But this is going to be at a scale far greater than the others.

The reason is that agents fundamentally change the underlying workflows of an organization. Unlike most prior eras of technology, where it was a change in the medium of the service being delivered (on-prem CRM to cloud CRM), agents rewire the business process itself. And unlike upgrading a tech system, business processes are full of idiosyncrasies. Every industry will have its own variants, and every department within those industries will have variants as well. Not to mention the bespoke differences between firms. Bringing agents to marketing in CPG will look different from marketing in healthcare. Bringing agents to sales in a B2B software company will look different from a car dealership.

And none of the change is easy technically. You need to first modernize your infrastructure and data and make sure they're ready for agents; access controls, entitlements, and permissions need to be mapped in a way that works for agents and people; you need to make sure agents have the right context to work with; you need to consistently eval and maintain the agents when there are model upgrades; and you need to drive the change management of the process itself to figure out which parts the people do and what the agents do.

That's an insane amount of technical and domain-specific process work to be done to make this all happen. Huge opportunity for new service providers, as well as internal teams and roles, to emerge and help drive this change.
OpenAI@OpenAI

Today we’re launching the OpenAI Deployment Company to help businesses build and deploy AI. It's majority-owned and controlled by OpenAI. It brings together 19 leading investment firms, consultancies, and system integrators to help organizations deploy frontier AI to production for business impact. openai.com/index/openai-l…

70
112
1.1K
221.7K
Luka@SwissLuka·
12 months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60. Today, an equivalent quality of output costs roughly $0.50.
dylan ツ@demian_ai

Inference got a hundred times cheaper this year. The compute bill went up anyway. If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now.

I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill.

12 months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60. Today, an equivalent quality of output costs roughly $0.50. The price per token of o1-level intelligence has dropped about 128x in a year. The price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped. By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money.

The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both xAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago. The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve.

Here is what happened underneath. A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than 10 original GPT-4 queries combined.

We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens. This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone. Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them.

The same thing is happening to AI compute right now, and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory, the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing. The math at the aggregate level is brutal: 100x cheaper tokens times 10,000x more tokens equals a 100x larger total bill.

The implications stack quickly. If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential. Your power bill, your cooling bill, your transceiver count, your storage footprint, all of these were sized for a workload mix that no longer exists.

If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago. Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them.

If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session. Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have.

The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is redrawing in real time, and the financial press is mostly still writing about training clusters.

The right framing of where we are right now is not that AI is hitting a wall. The framing a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium, because the new equilibrium is enormous.

A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them. KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones who keep their margins.

Cheaper tokens. More tokens. Same coal as 1865.
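The thread's "tens of gigabytes of state per user, per session" claim is easy to sanity-check with the standard transformer KV-cache size formula. A minimal sketch, assuming illustrative Llama-3-70B-style shapes (80 layers, 8 grouped-query KV heads, head dimension 128, fp16) rather than figures from the thread:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Per-session KV-cache size: 2 tensors (K and V) per layer, each of
    shape [n_kv_heads, seq_len, head_dim], at dtype_bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative Llama-3-70B-style shapes (assumed, not from the thread),
# for one agent session that has accumulated a 128K-token context.
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=128 * 1024)
print(f"{size / 2**30:.0f} GiB per concurrent session")  # prints "40 GiB per concurrent session"
```

At these shapes a single 128K-token session holds about 40 GiB of KV state, and the total grows linearly with context length and concurrent sessions, which is the mechanism behind the memory re-rating the thread describes.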

0
0
0
55
Luka retweeted
Olivia Moore@omooretweets·
I’ve done hundreds of customer calls for enterprise AI products

In almost every case, customers are way more excited about generating additional revenue with AI than cutting costs

I often hear things like “we doubled our leads / launched a new product so are hiring more humans”
David George@DavidGeorge83

x.com/i/article/2052…

16
5
78
13.2K
Luka@SwissLuka·
@robiot Boom is working on a plane, but US-China travel volumes have been falling for years. Even once the tech is ready, it's hard to imagine that route pioneering it
1
0
0
255
Elliot Lindberg@robiot·
SF to Shanghai should have supersonic flights

A route that now takes 13h could take 5h, and regulations can't stop this... it's literally 90% water. Why don't we have it yet?
30
3
114
40.5K
Dominique Paul@DominiqueCAPaul·
Looking for a start-up idea? Build these plastic boxes for less than €50.
90
5
1.4K
135.1K
Luka@SwissLuka·
@omooretweets Hope so. More competition in that space will accelerate progress
0
0
0
31
Olivia Moore@omooretweets·
Codex is going to sneak up fast as a Cowork competitor for non-technical people

The desktop app is accessible and very powerful, esp. with 5.5. It handles complex tasks better than Cowork (IMO)

If they nail the brand, I expect mainstream knowledge worker Codex usage soon
47
27
598
38.8K
Luka@SwissLuka·
@signulll fastest way to lose your job is staying at a company that does not let you use AI in everything you do
0
1
0
276
Luka retweeted
Justin Skycak@justinskycak·
Never underestimate how much time and effort you can waste by trying to automate a process you do not understand manually.
189
3.4K
25K
613.5K
Luka retweeted
Badis Zormati@zormati_ba·
The best networking strategy is helping people!
0
1
2
62
signüll@signulll·
in basketball the best players love spacing cuz it gives room to move, & lanes to attack. tech rn is maximum spacing. the floor is effectively wide open. the world is reshuffling fast enough that an entrepreneur gets to define what the next version looks like. chaos is a ladder type stuff. these types of windows don’t stay open long & don’t occur as often as you’d like them to.
27
11
440
24.2K
Luka@SwissLuka·
As agents move past coding and into knowledge work, the real challenge begins. Rethink workflows. Put guardrails in place. Decide when humans stay involved. Fix old systems and messy data. The models are already smart. The value comes from putting them to work with the right context. This is change management and operating model design. Taking messy work and making it clear and repeatable. The transformation opportunity is huge. Companies that do not adapt will see pressure on unit economics or fall behind.
signüll@signulll

weirdly enough, i now think there is a high likelihood that the term agent might actually stick & perhaps become mainstream just like the “app” did.

0
0
0
67
Luka retweeted
CIX 🦾@cixliv·
@IterIntellectus Something is deeply wrong with Paul. He’s a completely irrational person from a political perspective but pragmatic with business. His brain works in mysterious ways
13
0
332
10.7K
Luka@SwissLuka·
Not even a top 10 problem at this stage. There is so much value to capture in orchestration, information flow, and making sense of unstructured data before you hit any wall on output consistency. Even then, humans do not give the same answer every time. That is reasoning. After years of over-standardization, AI can actually bring back business judgment, instantaneous and at scale. There is deterministic AI work happening in SF, but if you need the exact same answer every time, LLMs are probably not the right tool anyway
0
0
0
59
Mark Cuban@mcuban·
I’m coming to the conclusion that the biggest challenge for Enterprise AI, and AI in general, as of now, is that it’s still impossible to make sure that everyone gets the same answer to the same question, every time. Which is a great response to the doomers. AI doesn’t know the consequences of its output. Judgement and the ability to challenge AI output is becoming increasingly necessary, and valuable. Which makes domain knowledge more valuable by the second. Am I wrong?
1.9K
446
6.2K
1.5M
Luka retweeted
Samuel Roach@Samuel__Paul·
Is there a name for the anxiety you get when your agents aren’t working on something? Also - has anyone figured out tasks for agents to work on while we’re asleep? I’m yet to find an impactful task (that doesn’t go totally off the rails) that lasts longer than 20 mins before needing a human judgement injection.
0
1
1
48
Luka retweeted
Claudio Fuentes@claud_fuen·
Compliance standards like SOC 2 don’t guarantee security. They only approximate it, and unfortunately, AI cyberattacks will evolve much faster than current compliance requirements will. You’ll have to go the extra mile to secure your business BEFORE you’re told to. That might sound like a pain, but it’s actually an opportunity. Imagine being able to prove your security in real time while your early-stage competitors can’t. You’ll have a seat at the big boy table with every enterprise client that needs to work with you over anyone else. A generational opportunity for the company that’s proactive enough to make this happen.
0
1
4
267
Luka retweeted
Zara Zhang@zarazhangrui·
Before AI, you couldn't afford to build something small. Because the cost of software development was so high, you had to hire a team, convince other people, justify it to a committee. Now, it's literally just you and a coding agent. A coding agent doesn't need to be convinced. It will readily build whatever crazy and weird idea you have. So go build something that would get rejected in every big tech company's product review meeting.
Zara Zhang@zarazhangrui

x.com/i/article/2050…

15
13
112
13.7K
Luka@SwissLuka·
Some application paywalls do not appear to apply to MCP requests. Even when the frontend is locked behind the paywall, the agent is still able to continue interacting with the service through MCP on the backend. This is the kind of bug I like.
0
0
0
19
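The paywall gap in that last tweet reduces to a familiar bug class: the entitlement check lives in the web route handler instead of the shared service layer, so any alternative entry point bypasses it. A minimal, hypothetical sketch (all names illustrative; no real MCP SDK calls are used):

```python
# Hypothetical service with two entry points to the same backend logic.
PAID_USERS = {"alice"}

def fetch_report(user: str) -> str:
    # Shared backend logic; contains no entitlement check of its own.
    return f"report for {user}"

def web_route(user: str) -> str:
    # Frontend path: the paywall is enforced only here.
    if user not in PAID_USERS:
        raise PermissionError("subscription required")
    return fetch_report(user)

def mcp_tool(user: str) -> str:
    # Agent-facing path: the check was never added, so the agent
    # keeps getting results after the frontend is locked.
    return fetch_report(user)

print(mcp_tool("bob"))  # succeeds, though web_route("bob") would be blocked
```

The fix is to move the subscription check into the shared service function itself, so every transport (web, MCP, mobile API) inherits it.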