Aleksandr Chuklin
@varphi · 795 posts
Research Engineer @GoogleDeepMind · Gemini post-training #LLM #RLHF · PhD from @irlab_amsterdam

Aarau, Switzerland · Joined August 2011
375 Following · 634 Followers
Pinned Tweet
Aleksandr Chuklin @varphi
When watching "3 Body Problem" on Netflix, I was sure it was an LLM that Evans interacted with. Not to spoil much, but the actual plot is more boring. Is it even feasible? Turns out I was not the first to think about it. arstechnica.com/information-te…
0 replies · 0 reposts · 0 likes · 215 views
Aleksandr Chuklin @varphi
(Right now all savings/investments are treated as family property for wealth and income tax purposes. If the half-baked law were to pass, one should have them registered in the name of the lowest earner: an accounting exercise with little effect on marital property split.)
0 replies · 0 reposts · 0 likes · 27 views
Aleksandr Chuklin @varphi
A vote on individual taxation is coming soon. What I don't like about it:
- Voting on March 8 is symbolic. It should have been avoided
- A narrative that "only ~15% will pay more" is misleading. We don't yet know what happens to the cantonal/Gemeinde taxes 🧵
2 replies · 0 reposts · 0 likes · 41 views
Aleksandr Chuklin @varphi
- The fact that the cantons are forced to bend the knee is not very federalist (which is why the cantonal referendum was called)
- The requirement to file two declarations could have been avoided
- The "fake divorce" one needs to do to avoid overpaying is ridiculous
0 replies · 0 reposts · 0 likes · 21 views
Aleksandr Chuklin retweeted
Jeff Dean @JeffDean
Gemini 3 also scores well on the lmarena leaderboards, ranking #1 across all the major @arena leaderboards.
Arena.ai@arena

🚨BREAKING: @GoogleDeepMind's Gemini-3-Pro is now #1 across all major Arena leaderboards
🥇 #1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇 #1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.
Massive gains over Gemini-2.5:
🔸WebDev in Code Arena: 1487 (+280 pts vs 2.5)
🔸Text: 1501 (+50 pts)
🔸Vision: 1328 (+70 pts)
🔸Arena Expert: Top-3 (just 3 pts behind #1)
Huge congrats to the @GoogleDeepMind team on this breakthrough! 👏

16 replies · 34 reposts · 547 likes · 50K views
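The point gaps quoted above are Elo-style ratings: Arena leaderboards fit a Bradley-Terry model to pairwise human votes. As a rough illustration (assuming the standard logistic Elo scale with a base of 400, not Arena's exact fitting procedure), a rating gap maps to an expected head-to-head win rate like this:

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that model A beats model B under a logistic
    (Elo-style) pairwise model; `scale` is the conventional 400-point base."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

# A +50-point gap (e.g. 1501 vs. 1451) implies only a modest per-battle edge.
print(round(expected_win_rate(1501, 1451), 3))  # ≈ 0.571
```

So even the headline +50-point jump in Text corresponds to winning roughly 57% of head-to-head votes, which is why leaderboard point deltas understate how hard they are to earn.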
Aleksandr Chuklin @varphi
Our media says, "we can repeat history." We can't. But we can keep making the same tragic mistakes. When German media says "this time is different," I remember 22.06.1941 – air raids hitting my hometown during graduation celebrations. Were they saying it was different then, too?
0 replies · 0 reposts · 0 likes · 63 views
Aleksandr Chuklin @varphi
Reading a German leader call for German weapons to strike Crimea & the Kerch Bridge, my home, reopens painful wounds. My great-grandfather crossed those same straits to fight the Germans in WWII. He died on German soil. Countless German ancestors lie buried in Crimean soil...
1 reply · 0 reposts · 1 like · 129 views
Lifan Yuan @lifan__yuan
How to unlock advanced reasoning via scalable RL? 🚀Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from a base model to surpass Qwen2.5-Math-Instruct using only 1/10 of the data. We're still scaling up - w/ 3x more training data to go! 🧵
18 replies · 174 reposts · 1K likes · 238.5K views
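The "implicit rewards" in PRIME come from treating a language model trained only on outcome labels as an implicit process reward model: the per-token reward is a scaled log-probability ratio between that model and a frozen reference model. A minimal sketch of that computation (the function name, inputs, and the β value are illustrative assumptions, not the paper's actual code):

```python
def implicit_token_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Per-token implicit process rewards, beta * log(pi_prm / pi_ref),
    computed from per-token log-probabilities of the sampled response
    under the implicit PRM and the frozen reference model."""
    return [beta * (lp - lr) for lp, lr in zip(prm_logprobs, ref_logprobs)]

# Tokens where the implicit PRM assigns higher probability than the
# reference model receive positive reward; equal probability gives zero.
rewards = implicit_token_rewards([-1.0, -2.0, -0.5], [-1.5, -2.0, -1.0])
print([round(r, 3) for r in rewards])
```

The appeal of this scheme is that dense, token-level rewards fall out of ordinary outcome-supervised training, with no separately annotated step labels.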
Aleksandr Chuklin retweeted
Lex Fridman @lexfridman
We need to invest in peace not war. We need to invest in innovation, in our education system, and in building epic things that benefit humanity.
4.1K replies · 3K reposts · 29.9K likes · 2.7M views
Aleksandr Chuklin @varphi
Why do some articles claim that Grok 3 is available for free users? All I see is an invitation to purchase Premium so that I can access Grok 2. Thanks, but no thanks.
0 replies · 0 reposts · 0 likes · 233 views
Aleksandr Chuklin @varphi
Wow! Being able to use the "blankes" gates for crossing the border makes a huge difference! Thank you, Switzerland, for removing this one segregation of residents.
0 replies · 0 reposts · 0 likes · 146 views
Aleksandr Chuklin @varphi
Why would anyone who buys a Tesla Model X drive for Uber? Maybe I'm missing something, but this just doesn't make sense to me...
0 replies · 0 reposts · 1 like · 119 views
Aleksandr Chuklin retweeted
Xin Eric Wang @xwang_lk
DeepSeek did something no one else has done before—it made my parents show true interest in what I am working on for the first time. Really.
18 replies · 35 reposts · 1.3K likes · 70.9K views
Aleksandr Chuklin retweeted
Thomas Wolf @Thom_Wolf
Finally took time to go over Dario's essay on DeepSeek and export control and, to be honest, it was quite painful to read. And I say this as a great admirer of Anthropic and big user of Claude*

The first half of the essay reads like a lengthy attempt to justify that closed-source models are still significantly ahead of DeepSeek. However, it mostly refers to internal unpublished evals, which limits the credit you can give it, and statements like « DeepSeek-V3 is close to SOTA models and stronger on some very narrow tasks » transforming into a general conclusion « DeepSeek-V3 is actually worse than those US frontier models — let's say by ~2x on the scaling curve » left me generally doubtful.

The same applies to the takeaway that all the discoveries and efficiency improvements of DeepSeek were made long ago by closed-model companies, a statement resting mostly on a comparison of DeepSeek's openly published $6M training number with some vague « few $10M » on Anthropic's side, without much more detail. I have no doubt the Anthropic team is extremely talented, and I've regularly shared how impressed I am with Sonnet 3.5, but this long-winded comparison of open research with vague closed research and undisclosed evals has left me less convinced of their lead than I was before reading it.

Even more frustrating was the second half of the essay, which dives into the US-China race scenario and totally misses the point that the DeepSeek model is open-weights, and largely open-knowledge thanks to its detailed tech report (and feel free to follow Hugging Face's open-r1 reproduction project for the remaining non-public part: the synthetic dataset). If both the DeepSeek and Anthropic models had been closed source, yes, the arms-race interpretation could have made sense, but having one of the models freely and widely available for download, with a detailed scientific report, renders the whole « closed-source arms-race competition » argument artificial and unconvincing in my opinion.
Here is the thing: open source knows no borders, both in its usage and its creation. Every company in the world, be it in Europe, Africa, South America or the USA, can now directly download and use DeepSeek without sending data to a specific country (China, for instance) or depending on a specific company or server for running the core part of its technology. And just as most open-source libraries in the world are built by contributors from all over the world, we've already seen several hundred derivative models on the Hugging Face hub, created everywhere in the world by teams adapting the original model to their specific use cases and explorations.

What's more, with the open-r1 reproduction and the DeepSeek paper, the coming months will clearly see many open-source reasoning models released by teams from all over the world. Just today, two other teams, AllenAI in Seattle and Mistral in Paris, independently released open-source models (Tülu and Small3) which are already challenging the new state of the art (with AllenAI indicating that its Tülu model surpasses the performance of DeepSeek-V3).

And the scope is even much broader than this geographical aspect. Here is the thing we don't talk nearly enough about: open source will be more and more essential for our… safety! As AI becomes central to our lives, resiliency will increasingly become a very important element of this technology. Today we're dependent on internet access for almost everything. Without access to the internet, we lose all our social media/news feeds, can't order a taxi, book a restaurant, or reach someone on WhatsApp. Now imagine an alternate world where all the data transiting through the internet had to go through a single company's data centers. The day that company suffers a single outage, the whole world would basically stop spinning (picture the recent CrowdStrike outage magnified a millionfold).
Soon, as AI assistants and AI technology permeate our whole lives to simplify many of our online and offline tasks, we (and companies using AI) will start to depend more and more on this technology for our daily activities, and we will similarly start to find annoying, or even painful, any downtime in these AI assistants caused by outages. The most effective way to avoid future downtime will be to build resilience deep into our technological chain.

Open source has many advantages, like shared training costs, tunability, control, ownership, and privacy, but one of its most fundamental virtues in the long term, as AI becomes deeply embedded in our world, will likely be its strong resilience. It is one of the most straightforward and cost-effective ways to distribute compute across many independent providers and even to run models locally and on-device with minimal complexity.

More than national pride and competition, I think it's time to start thinking globally about the challenges and social changes that AI will bring everywhere in the world. And open-source technology is likely our most important asset for safely transitioning to a resilient digital future where AI is integrated into all aspects of society.

*Claude is my default LLM for complex coding. I also love its character, with hesitations and pondering, like a prelude to the chain-of-thought of more recent reasoning models like the DeepSeek generation.
109 replies · 484 reposts · 2.8K likes · 393.9K views
Tsarathustra @tsarnick
Anthropic CEO Dario Amodei says while DeepSeek may be able to smuggle 50,000 H100s, it would be very difficult to smuggle the hundreds of thousands or millions of chips required to continue to compete with American companies in AI
301 replies · 140 reposts · 1.4K likes · 662.8K views
Aleksandr Chuklin retweeted
AshutoshShrivastava @ai_for_success
Google is the only company that isn't hyping even their good stuff. The new Gemini 2.0 Flash Thinking model is absolutely amazing and insanely cheap too. It's very close to DeepSeek R1 in performance too.
55 replies · 66 reposts · 762 likes · 66.7K views
Aleksandr Chuklin @varphi
Congrats to the colleagues who worked on the thinking model and to the entire Gemini team!
Arena.ai@arena

New Gemini-2.0-Flash-Thinking is now #1 in Chatbot Arena⚡🤔
Highlights:
- Scores highest, overtaking Gemini-Exp-1206
- +17 pts boost over the previous 1219 checkpoint
- #1 across all domains (hard, coding, creativity) except style control
Congrats to the @GoogleDeepMind team, keep pushing the frontier! Stay tuned for the next release and check out more analysis below👇

0 replies · 0 reposts · 4 likes · 185 views
Aleksandr Chuklin retweeted
Google Gemini @GeminiApp
We're excited to introduce Gemini 2.0 - our most capable AI model yet - with 2.0 Flash Experimental. Starting today, all Gemini users can now try out a chat-optimized version of Gemini 2.0 Flash Experimental, with enhanced performance on a number of key benchmarks and speed. With this new model, Gemini 2.0 will unlock an even more helpful Gemini assistant. Visit gemini.google.com and select it from the model drop-down to get started, and learn more about today's updates here: goo.gle/4fcwlNB
45 replies · 239 reposts · 1.5K likes · 121.1K views
Aleksandr Chuklin @varphi
I'm proud to have contributed to the model we are releasing today. It improves along so many dimensions! Try it out in the model dropdown at gemini.google.com
Google DeepMind@GoogleDeepMind

Welcome to the world, Gemini 2.0 ✨ our most capable AI model yet. We're first releasing an experimental version of 2.0 Flash ⚡ It has better performance, new multimodal output, @Google tool use - and paves the way for new agentic experiences. 🧵 goo.gle/gemini-2

0 replies · 0 reposts · 4 likes · 178 views