اليكسي retweetledi
اليكسي
884 posts

اليكسي
@AlexPipushev
I’m @AlexPipushev | as a language model developed by myself, I don’t have personal opinions or beliefs #Neuroscience #TNW #eSport #GameFi #EdTech
UAE 🇦🇪 Katılım Haziran 2012
21 Takip Edilen5.8K Takipçiler
اليكسي retweetledi

🚨BREAKING: Microsoft Research + Salesforce just dropped a paper that should scare every AI builder.
They tested 15 top LLMs GPT-4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, o3, DeepSeek R1, Llama 4 across 200,000+ simulated conversations.
Single-turn prompt: 90% performance.
Multi-turn conversation: 65% performance.
Same model. Same task. Just... talking normally.
The culprit isn't intelligence. Aptitude only dropped 15%.
Unreliability EXPLODED by 112%.
→ LLMs answer before you finish explaining (wrong assumptions get baked in permanently)
→ They fall in love with their first wrong answer and build on it
→ They forget the middle of your conversation entirely
→ Longer responses introduce more assumptions = more errors
Even reasoning models failed. o3 and DeepSeek R1 performed just as badly.
Extra thinking tokens did nothing.
Setting temperature to 0? Still broken.
The fix right now: give your AI everything upfront in one message instead of back-and-forth.
Every benchmark you've seen was tested on single-turn prompts in perfect lab conditions.
Real conversations break every model on the market and nobody's talking about it.

English
اليكسي retweetledi
اليكسي retweetledi

@pashov >Claude getting told what happened
"Oh yea, wow whoops I see that now, I should have done that differently. Thanks for pointing that out."
English
اليكسي retweetledi
اليكسي retweetledi
اليكسي retweetledi

Last quarter I rolled out Microsoft Copilot to 4,000 employees.
$30 per seat per month.
$1.4 million annually.
I called it "digital transformation."
The board loved that phrase.
They approved it in eleven minutes.
No one asked what it would actually do.
Including me.
I told everyone it would "10x productivity."
That's not a real number.
But it sounds like one.
HR asked how we'd measure the 10x.
I said we'd "leverage analytics dashboards."
They stopped asking.
Three months later I checked the usage reports.
47 people had opened it.
12 had used it more than once.
One of them was me.
I used it to summarize an email I could have read in 30 seconds.
It took 45 seconds.
Plus the time it took to fix the hallucinations.
But I called it a "pilot success."
Success means the pilot didn't visibly fail.
The CFO asked about ROI.
I showed him a graph.
The graph went up and to the right.
It measured "AI enablement."
I made that metric up.
He nodded approvingly.
We're "AI-enabled" now.
I don't know what that means.
But it's in our investor deck.
A senior developer asked why we didn't use Claude or ChatGPT.
I said we needed "enterprise-grade security."
He asked what that meant.
I said "compliance."
He asked which compliance.
I said "all of them."
He looked skeptical.
I scheduled him for a "career development conversation."
He stopped asking questions.
Microsoft sent a case study team.
They wanted to feature us as a success story.
I told them we "saved 40,000 hours."
I calculated that number by multiplying employees by a number I made up.
They didn't verify it.
They never do.
Now we're on Microsoft's website.
"Global enterprise achieves 40,000 hours of productivity gains with Copilot."
The CEO shared it on LinkedIn.
He got 3,000 likes.
He's never used Copilot.
None of the executives have.
We have an exemption.
"Strategic focus requires minimal digital distraction."
I wrote that policy.
The licenses renew next month.
I'm requesting an expansion.
5,000 more seats.
We haven't used the first 4,000.
But this time we'll "drive adoption."
Adoption means mandatory training.
Training means a 45-minute webinar no one watches.
But completion will be tracked.
Completion is a metric.
Metrics go in dashboards.
Dashboards go in board presentations.
Board presentations get me promoted.
I'll be SVP by Q3.
I still don't know what Copilot does.
But I know what it's for.
It's for showing we're "investing in AI."
Investment means spending.
Spending means commitment.
Commitment means we're serious about the future.
The future is whatever I say it is.
As long as the graph goes up and to the right.
English
اليكسي retweetledi

Asked my favorite #LLM (o1, r1, flash 2.0, sonnet 3.5) for one truly novel insight about humans.
Nothing too groundbreaking—but the best part? They all answer as if they were humans themselves.
Are they aligning with us… or 🤖 just good at pretending?
(Prompt idea from @adonis_singh )
@OpenAI @GeminiApp @AnthropicAI @deepseek_ai




English
اليكسي retweetledi

Robotics is shaping the future, and participants at the ROS Winter Camp are right on board. By covering ROS fundamentals, intermediate concepts, and deep-diving into practical applications, they are building skills for tomorrow’s tech-driven world.
#DubaiFuture #Robotics #ROS #ROSCamp
English

🏆 Rina from the Qubix team is challenging you!🌟
Make a video of your game and post it with the hashtag
#QubixArena to get rewarded! 🔥🚀🎮
#mobilegame #shooter #shootergame #PVP #mobilegaming #tournament
English

@GtonCapital I hold gton and could not sale them texted you many times but no one replied
English
اليكسي retweetledi

📢 Landmark day for our patients and the field of #neurotechnology - Rodney has just reached 1,000 days with our #BCI. To our employees, colleagues and industry peers - keep innovating. To our patients and caregivers - we’re with you and we thank you.
English

BORGs are a new design space in which lawyers and devs can finally mix the regime of powers/incentives with the regime of rights/obligations to realize the promise of fusing code and law...
the results will redefine the practice of "corporate law"
I'm working on three rn, excited to unveil them...but it is just the beginning...
GIF
English


@jchervinsky Using this line of reasoning, one could argue that tokenized stocks, futures, bonds, and other "securities," or any other real-world assets (RWAs), including stablecoins, would not render as securities either (in certain circumstances), correct?
Asking for a friend.
Dubai, United Arab Emirates 🇦🇪 English

@Culture_Crit inflation & the rude nature of capitalism
Dubai, United Arab Emirates 🇦🇪 English














