Bill Leoutsakos

2.1K posts

@Bi11Leou

Computer Engineering @cambridge_uni | ex-ML Engineer @CosineAI | Eurotech Fellow

Cambridge, England · Joined November 2024
779 Following · 288 Followers
Pinned Tweet
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
Just built an open-source real-time AI assistant using the newest gpt-realtime API from @OpenAI (watch the demo below) Check out the repo and feel free to reach out for any questions & take ideas from it to build SOTA real-time voice agents! github.com/BillLeoutsakos… Here are some of the challenges I faced while building it, how I solved them and what I learned: 👇 (also pls like / comment / repost, it's my first launch on Twitter and I wanna go a bit viral 😁)
English
12
5
33
6.2K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
@AndrewCurran_ The thing is... gpt 4.5 was a huge model but the result underperformed. Maybe the methods have advanced a lot since then
English
0
0
2
190
Andrew Curran
Andrew Curran@AndrewCurran_·
Three weeks ago there were rumors that one of the labs had completed its largest ever successful training run, and that the model that emerged from it performed far above both internal expectations and what people assumed the scaling laws would predict. At the time these were only rumors, and no lab was attached to them. But in light of what we now know about Mythos, they look more credible, and the lab was probably Anthropic. Around the same time there were also rumors that one of the frontier labs had made an architectural breakthrough. If you are in enough group chats, you hear claims like this constantly, and most turn out to be nothing. But if Anthropic found that training above a certain scale, or in a certain way at that scale, produces capabilities that sit far above the prior trendline, then that is an architectural breakthrough. I think the leaked blog post was real, but still a draft. Mythos and Capybara were both candidate names for the new tier, though Mythos may now have enough mindshare that they end up keeping it. The specific rumor in early March was that the run produced a model roughly twice as performant as expected. That remains unconfirmed. What is confirmed is that Anthropic told Fortune the new model is a 'step change,' a sudden 2x would certainly fit the definition. We will find out in April how much of this is true. My own view is that the broad shape of this is correct even if some of the numbers are wrong. And if it is substantially accurate, then it also casts OpenAI's recent restructuring in a new light. If very large training runs are about to become essential to staying in the game, then a lot of their recent decisions, like dropping Sora, make even more sense strategically. For the public, this would mean the best models in the world are about to become much more expensive to serve, and therefore much more expensive to use. 
That will put pressure on rate limits, pricing, and subscription plans that are already subsidized to some unknown degree. Instead of becoming too cheap to meter, frontier intelligence may be about to become too expensive for most of humanity to afford. Second-order effects; compute, memory, and energy are about to become much more important than they already are. In the blog they describe the new model as not just an improvement, but having 'dramatically higher scores' than Opus 4.6 in coding and reasoning, and as being 'far ahead' of any other current models. If this is the new reality, then scale is about to become king in a whole new way. It would also mean, as usual, that Jensen wins again.
English
171
306
3.9K
836.7K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
How me and the boys be chilling in 🇬🇷 before big release
Bill Leoutsakos tweet media
English
0
0
1
54
NIK
NIK@ns123abc·
Dario Amodei’s sister loved stuffed animals so much that her fiancé proposed via a movie of her dolls coming to life Dario wore a panda suit to their wedding Their clique at openai then became “the pandas”
NIK tweet media
NIK tweet media
English
58
31
1.1K
111.2K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
"Getting in is the hard part" is what everyone, including me, thought after I got in. It's utter nonsense. The university exams and content are 100x harder than any test you'll take before uni. In high school I was hitting 90 and 100% with modest preparation; now I am grinding and hoping for a 60 or a 70%. If you are in a STEM course at a good uni, doing the course is a pain in the ass.
English
0
0
0
254
vas
vas@vasuman·
There is no alpha in graduating from college. Getting in is the hard part; pretty much everyone graduates. Therefore the optimal path is to get into the best college you can, then drop out immediately while using that name brand as signal and getting real world experience at a very fast paced and high growth job opportunity. Or idk just have fun while making friends and memories like a normal college student. Just a thought.
English
27
3
316
76.7K
Jasper Dekoninck
Jasper Dekoninck@j_dekoninck·
Last year, models miserably failed on USAMO 2025. This year, GPT-5.4 scores an amazing 95%, essentially saturating the benchmark. Yes, LLMs still make many mistakes, but overall, one can be nothing but amazed at what they are achieving and how steep progress in AI4Math is.
Jasper Dekoninck tweet media
English
27
67
578
65.4K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
@creepydotorg If they put 1% of this creativity in a job interview they would have been employed a long time ago
English
0
0
3
239
Creepy.org
Creepy.org@creepydotorg·
German climate activists stood on melting blocks of ice with ropes around their necks, warning that “time is running out.”
Creepy.org tweet media
English
3.1K
1.1K
28K
9.4M
vas
vas@vasuman·
> the rumors are not true > *comments off*
Karun Kaushik@karunkaushik_

Over the past week, you may have seen an anonymous post about Delve. While we responded to it in a day, we want to provide more details about what’s true, what's not, and some changes we’ve made. There’s one question behind everything: did Delve fabricate compliance evidence or issue fraudulent audit reports? No. We did not. → Delve is an AI compliance platform that connects customers with independent auditors. We are not an auditor, just as tax preparation software is not an accountant. We have never signed an audit report. → Using default templates for our customers, just like any other compliance platform, is not “faking evidence.” These are meant to serve as a starting point for customers. → Delve does have automation in the platform, with 600+ automated integration tests, an AI Copilot to guide customers through compliance, AI code scanning, and more. -- We built Delve to accelerate innovation by bringing AI to compliance. In doing that, we pushed hard on automation. However, we now realize we didn’t provide enough clarity about what is automated, what is customer-provided, and what is independently audited. We have been working relentlessly to make improvements over the last week. -- On our auditor network: Delve connects customers with independent auditors. Some customers choose their own auditors, but many use firms in our network. Questions have been raised about some of those firms, including ones used by other platforms. Going forward we will set a higher bar in how our auditor relationships are structured and how the process is experienced by customers. Delve is rebuilding our auditor network, removing firms that don’t meet our standards, and offering complimentary re-audits and penetration tests to every customer. On platform templates for our customers: Delve provides default templates, just like many other platforms, for policies, board meetings, risk assessments, and more. These are designed to be starting points only. 
We should have been more explicit about how they are meant to be reviewed and customized by customers. We are making that indisputably clearer within the platform. On draft audit reports: Third-party auditors are responsible for independently reviewing all evidence and issuing final reports. We built automation that interacts closely with independent audit workflows to help expedite the process on behalf of our customers. However, this contributed to confusion about where automation ends and independent judgment begins. From now on, Delve will no longer automate these parts of the process. Furthermore, customers have a direct line of communication with their auditor to enhance transparency in any audit communications. -- We started Delve because we went through compliance ourselves and saw how slow, expensive, and manual it was. To anyone that wants to sit down and discuss our product philosophy and improvements, please reach out and let’s chat about it.

English
15
16
1.1K
96.8K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
Why did bro limit the answers tho 🤔
Karun Kaushik@karunkaushik_

Over the past week, you may have seen an anonymous post about Delve. While we responded to it in a day, we want to provide more details about what’s true, what's not, and some changes we’ve made. There’s one question behind everything: did Delve fabricate compliance evidence or issue fraudulent audit reports? No. We did not. → Delve is an AI compliance platform that connects customers with independent auditors. We are not an auditor, just as tax preparation software is not an accountant. We have never signed an audit report. → Using default templates for our customers, just like any other compliance platform, is not “faking evidence.” These are meant to serve as a starting point for customers. → Delve does have automation in the platform, with 600+ automated integration tests, an AI Copilot to guide customers through compliance, AI code scanning, and more. -- We built Delve to accelerate innovation by bringing AI to compliance. In doing that, we pushed hard on automation. However, we now realize we didn’t provide enough clarity about what is automated, what is customer-provided, and what is independently audited. We have been working relentlessly to make improvements over the last week. -- On our auditor network: Delve connects customers with independent auditors. Some customers choose their own auditors, but many use firms in our network. Questions have been raised about some of those firms, including ones used by other platforms. Going forward we will set a higher bar in how our auditor relationships are structured and how the process is experienced by customers. Delve is rebuilding our auditor network, removing firms that don’t meet our standards, and offering complimentary re-audits and penetration tests to every customer. On platform templates for our customers: Delve provides default templates, just like many other platforms, for policies, board meetings, risk assessments, and more. These are designed to be starting points only. 
We should have been more explicit about how they are meant to be reviewed and customized by customers. We are making that indisputably clearer within the platform. On draft audit reports: Third-party auditors are responsible for independently reviewing all evidence and issuing final reports. We built automation that interacts closely with independent audit workflows to help expedite the process on behalf of our customers. However, this contributed to confusion about where automation ends and independent judgment begins. From now on, Delve will no longer automate these parts of the process. Furthermore, customers have a direct line of communication with their auditor to enhance transparency in any audit communications. -- We started Delve because we went through compliance ourselves and saw how slow, expensive, and manual it was. To anyone that wants to sit down and discuss our product philosophy and improvements, please reach out and let’s chat about it.

English
0
0
3
766
Cursor
Cursor@cursor_ai·
Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.
Cursor tweet media
English
97
122
1.6K
454.1K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
@scaling01 Claude 4.5 was already phenomenal... imagine what Claude 5 is gonna be. It's over for OpenAI and Google
English
0
0
0
304
Lisan al Gaib
Lisan al Gaib@scaling01·
APRIL IS GOING TO BE SICK GPT-5.5 CLAUDE 5 MYTHOS DEEPSEEK-V4
English
120
154
3.8K
158.5K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
Chatgpt reachout final boss 🤦‍♂️
Bill Leoutsakos tweet media
English
0
0
0
62
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
@DEhnts All the growth beforehand came from building houses anyway. After the 70s we never invested in anything important in tech or otherwise.
English
0
0
0
89
Dirk Ehnts
Dirk Ehnts@DEhnts·
Greece used to be at 95 percent of the EU average when it comes to GDP per capita. Now it stands at 68%. Those that talk about a "Greek recovery" need to get their facts straight. Greece stabilized at a low level of economic activity, then grew a bit, but it did not "recover".
Dirk Ehnts tweet media
EU_Eurostat@EU_Eurostat

The preliminary 2025 results show that gross domestic product (GDP) per capita — expressed in purchasing power standards — ranged between 68% of the EU average in 🇬🇷Greece and 🇧🇬Bulgaria and 239% in 🇱🇺Luxembourg. Read more 👉link.europa.eu/94N43x

English
52
389
1.2K
57K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
@chatgpt21 Now they will see what it has, put it in the training data, and a year from now we'll all be like wow, ARC AGI 3 is saturated, that's insane
English
0
0
0
107
Chris
Chris@chatgpt21·
WOW! Models perform HORRIBLY on ARC AGI 3: Gemini 3.1 pro 0.37%, GPT 5.4 (High) 0.26%, Opus 4.5 (Max) 0.25%. I wonder how long it'll take for this benchmark to be solved
Chris tweet media
English
145
85
1.7K
165.5K
Financial Times
The billionaire’s legal team said that because Chancellor Kathaleen McCormick had liked a post celebrating his recent legal defeat and thereby created ‘a perception of bias against Mr. Musk in these cases, recusal is necessary and warranted’. ft.trib.al/6h9vFAH?
Financial Times tweet media
English
95
135
987
261.3K
Bill Leoutsakos
Bill Leoutsakos@Bi11Leou·
POV: You moved to Athens but the weather is the same as Cambridge 🤦‍♂️. When is it gonna start heatmaxxing here? When I was a kid it was like 25 degrees by the start of April at least.
Bill Leoutsakos tweet media
English
1
0
1
48