Aleksandr Chuklin
@varphi · 795 posts
Research Engineer @GoogleDeepMind · Gemini post-training #LLM #RLHF · PhD from @irlab_amsterdam

Aarau, Switzerland · Joined August 2011
375 Following · 634 Followers
Pinned Tweet
Aleksandr Chuklin @varphi
When watching "3 Body Problem" on Netflix, I was sure it was an LLM that Evans interacted with. Not to spoil much, but the actual plot is more boring. Is it even feasible? Turns out I was not the first to think about it. arstechnica.com/information-te…
0 replies · 0 reposts · 0 likes · 215 views
Aleksandr Chuklin @varphi
(Right now all savings/investments are treated as family property for wealth and income tax purposes. If the half-baked law were to pass, one should have them registered in the name of the lowest earner: an accounting exercise with little effect on marital property split.)
0 replies · 0 reposts · 0 likes · 27 views
Aleksandr Chuklin @varphi
A vote on individual taxation is coming soon. What I don't like about it:
- Voting on March 8 is symbolic. It should have been avoided
- A narrative that "only ~15% will pay more" is misleading. We don't yet know what happens to the cantonal/Gemeinde taxes 🧵
2 replies · 0 reposts · 0 likes · 41 views
Aleksandr Chuklin @varphi
- The fact that the cantons are forced to bend the knee is not very federalist (which is why the cantonal referendum was called)
- The requirement to file two declarations could have been avoided
- The "fake divorce" one needs to do to avoid overpaying is ridiculous
0 replies · 0 reposts · 0 likes · 21 views
Aleksandr Chuklin retweeted
Jeff Dean @JeffDean
Gemini 3 also scores well on the lmarena leaderboards, ranking #1 across all the major @arena leaderboards.
Arena.ai@arena

🚨BREAKING: @GoogleDeepMind's Gemini-3-Pro is now #1 across all major Arena leaderboards
🥇 #1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇 #1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.
Massive gains over Gemini-2.5:
🔸WebDev in Code Arena: 1487 (+280 pts vs 2.5)
🔸Text: 1501 (+50 pts)
🔸Vision: 1328 (+70 pts)
🔸Arena Expert: Top-3 (just 3 pts behind #1)
Huge congrats to the @GoogleDeepMind team on this breakthrough! 👏

16 replies · 34 reposts · 547 likes · 50K views
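The point gaps quoted above are Elo-style ratings: Arena leaderboards fit a Bradley-Terry model to pairwise human votes. As a rough illustration (assuming the standard logistic Elo scale with a base of 400, not Arena's exact fitting procedure), a rating gap maps to an expected head-to-head win rate like this:

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that model A beats model B under a logistic
    (Elo-style) pairwise model; `scale` is the conventional 400-point base."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

# A +50-point gap (e.g. 1501 vs. 1451) implies only a modest per-battle edge.
print(round(expected_win_rate(1501, 1451), 3))  # ≈ 0.571
```

So even the headline +50-point jump in Text corresponds to winning roughly 57% of head-to-head votes, which is why leaderboard point deltas understate how hard they are to earn.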
Aleksandr Chuklin @varphi
Our media says, "we can repeat history." We can't. But we can keep making the same tragic mistakes. When German media says "this time is different," I remember 22.06.1941 – air raids hitting my hometown during graduation celebrations. Were they saying it was different then, too?
0 replies · 0 reposts · 0 likes · 63 views
Aleksandr Chuklin @varphi
Reading a German leader call for German weapons to strike Crimea & the Kerch Bridge, my home, reopens painful wounds. My great-grandfather crossed those same straits to fight the Germans in WWII. He died on German soil. Countless German ancestors lie buried in Crimean soil...
1 reply · 0 reposts · 1 like · 129 views
Lifan Yuan @lifan__yuan
How to unlock advanced reasoning via scalable RL? 🚀Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from a base model to surpass Qwen2.5-Math-Instruct using only 1/10 of the data. We're still scaling up - w/ 3x more training data to go! 🧵
18 replies · 174 reposts · 1K likes · 238.5K views
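The "implicit rewards" in PRIME come from treating a language model trained only on outcome labels as an implicit process reward model: the per-token reward is a scaled log-probability ratio between that model and a frozen reference model. A minimal sketch of that computation (the function name, inputs, and the β value are illustrative assumptions, not the paper's actual code):

```python
def implicit_token_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Per-token implicit process rewards, beta * log(pi_prm / pi_ref),
    computed from per-token log-probabilities of the sampled response
    under the implicit PRM and the frozen reference model."""
    return [beta * (lp - lr) for lp, lr in zip(prm_logprobs, ref_logprobs)]

# Tokens where the implicit PRM assigns higher probability than the
# reference model receive positive reward; equal probability gives zero.
rewards = implicit_token_rewards([-1.0, -2.0, -0.5], [-1.5, -2.0, -1.0])
print([round(r, 3) for r in rewards])
```

The appeal of this scheme is that dense, token-level rewards fall out of ordinary outcome-supervised training, with no separately annotated step labels.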
Aleksandr Chuklin retweeted
Lex Fridman @lexfridman
We need to invest in peace not war. We need to invest in innovation, in our education system, and in building epic things that benefit humanity.
4.1K replies · 3K reposts · 29.9K likes · 2.7M views
Aleksandr Chuklin @varphi
Why do some articles claim that Grok 3 is available for free users? All I see is an invitation to purchase Premium so that I can access Grok 2. Thanks, but no thanks.
0 replies · 0 reposts · 0 likes · 233 views
Aleksandr Chuklin @varphi
Wow! Being able to use the "blankes" gates for crossing the border makes a huge difference! Thank you, Switzerland, for removing this one segregation of residents.
0 replies · 0 reposts · 0 likes · 146 views
Aleksandr Chuklin @varphi
Why would anyone who buys a Tesla Model X drive for Uber? Maybe I'm missing something, but this just doesn't make sense to me...
0 replies · 0 reposts · 1 like · 119 views
Aleksandr Chuklin retweeted
Xin Eric Wang @xwang_lk
DeepSeek did something no one else has done before—it made my parents show true interest in what I am working on for the first time. Really.
18 replies · 35 reposts · 1.3K likes · 70.9K views
Aleksandr Chuklin retweeted
Thomas Wolf @Thom_Wolf
Finally took time to go over Dario's essay on DeepSeek and export control and, to be honest, it was quite painful to read. And I say this as a great admirer of Anthropic and big user of Claude*

The first half of the essay reads like a lengthy attempt to justify that closed-source models are still significantly ahead of DeepSeek. However, it mostly refers to internal unpublished evals, which limits the credit you can give it, and statements like « DeepSeek-V3 is close to SOTA models and stronger on some very narrow tasks » transforming into a general conclusion « DeepSeek-V3 is actually worse than those US frontier models — let's say by ~2x on the scaling curve » left me generally doubtful.

The same applies to the takeaway that all the discoveries and efficiency improvements of DeepSeek were made long ago by closed-model companies, a statement resting mostly on a comparison of DeepSeek's openly published $6M training number with some vague « few $10M » on Anthropic's side, without much more detail. I have no doubt the Anthropic team is extremely talented, and I've regularly shared how impressed I am with Sonnet 3.5, but this long-winded comparison of open research with vague closed research and undisclosed evals has left me less convinced of their lead than I was before reading it.

Even more frustrating was the second half of the essay, which dives into the US-China race scenario and totally misses the point that the DeepSeek model is open-weights, and largely open-knowledge thanks to its detailed tech report (and feel free to follow Hugging Face's open-r1 reproduction project for the remaining non-public part: the synthetic dataset). If both the DeepSeek and Anthropic models had been closed source, yes, the arms-race interpretation could have made sense, but having one of the models freely and widely available for download, with a detailed scientific report, renders the whole « closed-source arms-race competition » argument artificial and unconvincing in my opinion.
Here is the thing: open source knows no borders, both in its usage and its creation. Every company in the world, be it in Europe, Africa, South America or the USA, can now directly download and use DeepSeek without sending data to a specific country (China, for instance) or depending on a specific company or server for running the core part of its technology. And just as most open-source libraries in the world are built by contributors from all over the world, we've already seen several hundred derivative models on the Hugging Face hub, created everywhere in the world by teams adapting the original model to their specific use cases and explorations.

What's more, with the open-r1 reproduction and the DeepSeek paper, the coming months will clearly see many open-source reasoning models released by teams from all over the world. Just today, two other teams, AllenAI in Seattle and Mistral in Paris, independently released open-source models (Tülu and Small3) which are already challenging the new state of the art (with AllenAI indicating that its Tülu model surpasses the performance of DeepSeek-V3).

And the scope is even much broader than this geographical aspect. Here is the thing we don't talk nearly enough about: open source will be more and more essential for our… safety! As AI becomes central to our lives, resiliency will increasingly become a very important element of this technology. Today we're dependent on internet access for almost everything. Without access to the internet, we lose all our social media/news feeds, can't order a taxi, book a restaurant, or reach someone on WhatsApp. Now imagine an alternate world where all the data transiting through the internet had to go through a single company's data centers. The day that company suffers a single outage, the whole world would basically stop spinning (picture the recent CrowdStrike outage magnified a millionfold).
Soon, as AI assistants and AI technology permeate our whole lives to simplify many of our online and offline tasks, we (and companies using AI) will start to depend more and more on this technology for our daily activities, and we will similarly start to find annoying, or even painful, any downtime in these AI assistants caused by outages. The most effective way to avoid future downtime will be to build resilience deep into our technological chain.

Open source has many advantages, like shared training costs, tunability, control, ownership, and privacy, but one of its most fundamental virtues in the long term, as AI becomes deeply embedded in our world, will likely be its strong resilience. It is one of the most straightforward and cost-effective ways to distribute compute across many independent providers and even to run models locally and on-device with minimal complexity.

More than national pride and competition, I think it's time to start thinking globally about the challenges and social changes that AI will bring everywhere in the world. And open-source technology is likely our most important asset for safely transitioning to a resilient digital future where AI is integrated into all aspects of society.

*Claude is my default LLM for complex coding. I also love its character, with hesitations and pondering, like a prelude to the chain-of-thought of more recent reasoning models like the DeepSeek generation.
109 replies · 484 reposts · 2.8K likes · 393.9K views
Tsarathustra @tsarnick
Anthropic CEO Dario Amodei says while DeepSeek may be able to smuggle 50,000 H100s, it would be very difficult to smuggle the hundreds of thousands or millions of chips required to continue to compete with American companies in AI
301 replies · 140 reposts · 1.4K likes · 662.8K views
Aleksandr Chuklin retweeted
AshutoshShrivastava @ai_for_success
Google is the only company that isn't hyping even their good stuff. The new Gemini 2.0 Flash Thinking model is absolutely amazing and insanely cheap too. It's very close to DeepSeek R1 in performance too.
55 replies · 66 reposts · 762 likes · 66.7K views
Aleksandr Chuklin @varphi
Congrats to the colleagues who worked on the thinking model and to the entire Gemini team!
Arena.ai@arena

New Gemini-2.0-Flash-Thinking is now #1 in Chatbot Arena⚡🤔
Highlights:
- Scores highest, overtaking Gemini-Exp-1206
- +17 pts boost over the previous 1219 checkpoint
- #1 across all domains (hard, coding, creativity) except style control
Congrats to the @GoogleDeepMind team, keep pushing the frontier! Stay tuned for the next release and check out more analysis below👇

0 replies · 0 reposts · 4 likes · 185 views
Aleksandr Chuklin retweeted
Google Gemini @GeminiApp
We're excited to introduce Gemini 2.0 - our most capable AI model yet - with 2.0 Flash Experimental. Starting today, all Gemini users can now try out a chat-optimized version of Gemini 2.0 Flash Experimental, with enhanced performance on a number of key benchmarks and speed. With this new model, Gemini 2.0 will unlock an even more helpful Gemini assistant. Visit gemini.google.com and select it from the model drop-down to get started, and learn more about today's updates here: goo.gle/4fcwlNB
45 replies · 239 reposts · 1.5K likes · 121.1K views
Aleksandr Chuklin @varphi
I'm proud to have contributed to the model we are releasing today. It improves along so many dimensions! Try it out in the model dropdown at gemini.google.com
Google DeepMind@GoogleDeepMind

Welcome to the world, Gemini 2.0 ✨ our most capable AI model yet. We're first releasing an experimental version of 2.0 Flash ⚡ It has better performance, new multimodal output, @Google tool use - and paves the way for new agentic experiences. 🧵 goo.gle/gemini-2

0 replies · 0 reposts · 4 likes · 178 views