Hasan Can

4.7K posts

@HCSolakoglu

SWE & AI: News, Insights. Posts in ENG & TUR. Exploring AI

Proxima C B · Joined July 2020

2.6K Following · 1.4K Followers

Hasan Can@HCSolakoglu·1d

Still wild that Codex doesn’t seem to run regression checks on the metrics that actually matter:
- cache hit ratio
- context rot
- input/output tokens
- avg runtime
- tool-call stats/behavior
- SWE-bench Pro subset score

Every big PR/release should answer one question: Did the model get worse? For a product used by millions, this should be table stakes.

Tibo@thsottiaux

Some of you noticed limits drained faster in Codex, we root caused it to an optimization that we rolled back that had an impact on cache hit rates when compacting across long running sessions. We fixed this and have now reset usage limits for all accounts. Enjoy the weekend.
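A minimal sketch of the kind of release gate the post above asks for, assuming a baseline and a candidate build each report one number per metric. The metric keys, tolerances, and the `find_regressions` helper are hypothetical illustrations, not anything Codex is known to run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRule:
    higher_is_better: bool
    max_relative_worsening: float  # tolerated worsening as a fraction of baseline


# Hypothetical metric names and tolerances, loosely following the list in the post.
RULES = {
    "cache_hit_ratio": MetricRule(True, 0.02),
    "input_tokens_per_task": MetricRule(False, 0.05),
    "output_tokens_per_task": MetricRule(False, 0.05),
    "avg_runtime_s": MetricRule(False, 0.10),
    "tool_calls_per_task": MetricRule(False, 0.10),
    "swe_bench_pro_subset_score": MetricRule(True, 0.01),
}


def find_regressions(baseline, candidate, rules=RULES):
    """Return one message per metric that got worse by more than its tolerance."""
    regressions = []
    for name, rule in rules.items():
        base, cand = baseline[name], candidate[name]
        if base == 0:
            continue  # relative change is undefined; skipped in this sketch
        change = (cand - base) / abs(base)
        # Normalize so that a positive value always means "worse".
        worsening = -change if rule.higher_is_better else change
        if worsening > rule.max_relative_worsening:
            regressions.append(f"{name}: {base} -> {cand} ({worsening:.1%} worse)")
    return regressions
```

A CI job could fail the release whenever `find_regressions` returns a non-empty list, which turns "Did the model get worse?" into a yes/no check per PR.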

Hasan Can@HCSolakoglu·1d

@thsottiaux Yep, /slow mode would’ve been great for /goal tasks. It’d be even better if interactive coding and /goal /slow also ran at different speeds.

Tibo@thsottiaux·1d

Should we bring batch compute to codex? Aka /slow mode

Hasan Can@HCSolakoglu·2d

Gemini 3.5 Flash is definitely much better in AI Studio than it is in the Gemini app. I don’t know how Google manages it, but the Gemini app consistently feels heavily constrained by its system and orchestration layer, to the point where it performs noticeably worse than the raw model.

Hasan Can reposted

Peter Gostev@petergostev·2d

So average days between releases is 52 days, if we exclude the first long one, it is 40 days. So a couple of weeks at least is a reasonable bet. Could be a bit longer if it is a new pre-train and they need more time to adjust.

Hasan Can reposted

DeepSeek@deepseek_ai·3d

We are making our discount permanent! 🎉 Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life! 🚀

DeepSeek@deepseek_ai

The DeepSeek-V4-Pro discount has been extended until May 31, 2026, 15:59 UTC!

Hasan Can@HCSolakoglu·3d

Good. This still isn’t over until Gemini AI Pro limits become comparable to ChatGPT Plus. Limits are still behind Codex and ChatGPT limits. ChatGPT Plus gives around 3k weekly GPT-5.5 Thinking usage, and that model is extremely agentic. Google has the power and resources to do this.

Varun Mohan@_mohansolo

Yesterday, we 3x’d limits on Antigravity and are seeing you build so much more. One thing we heard was people are worried about hitting their weekly limits after a couple work sessions. To give you more runway, we’re 3x’ing the weekly Gemini quotas AGAIN on all paid plans. We’ve also gone ahead and reset Gemini quotas on all paid plans. Don’t stop building!

Hasan Can@HCSolakoglu·3d

Lately, Google has been disappointing on multiple fronts. From the models they’ve released to changes in their consumer apps, restrictive usage limits, and the overall direction the company seems to be heading in, it’s all been a major letdown.

Mechanize@MechanizeWork

We evaluated Gemini 3.5 Flash on GBA Eval. It could not build a working GBA emulator. On Piugba, the game just flashes on screen, unplayable and with no sound. Overall, it achieves a score of 6.7%.

Hasan Can reposted

Ali Hatamizadeh@ahatamiz1·3d

Gated DeltaNet-2 is here. 🚀

🔥 New paper: Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Gated DeltaNet-2 outperforms KDA and Mamba-3, the latest and best recurrent architectures, head to head at 1.3B. 🏆

💡 Here's the idea behind it:
Linear attention squeezes an unbounded KV cache into a fixed-size recurrent state. The hard part isn't just what to forget, it's how to edit that memory without scrambling the associations already in it.
Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to do two jobs at once: erasing old content and writing new content. But these two decisions act on different axes of the state, so tying them together is a real limitation.

Gated DeltaNet-2 decouples them.
✂️ a channel-wise erase gate b_t picks which key-side coordinates to read and remove
✍️ a channel-wise write gate w_t picks which value-side coordinates to commit
🔁 recovers KDA when both gates collapse to a scalar, and Gated DeltaNet when the decay collapses too
⚡ still trains fast: chunkwise WY algorithm with gate-aware backward, fused in Triton

📊 Results:
We train 1.3B models on 100B tokens of FineWeb-Edu, matched in recurrent state size, against Mamba-2, Gated DeltaNet, KDA, and Mamba-3.
- Best average on language modeling + commonsense reasoning, in both recurrent and hybrid settings
- Biggest gains on long-context RULER retrieval. S-NIAH-3 jumps from 63 to 90 over KDA, and multi-key needle retrieval climbs from 28 to 38

Joint work with @YejinChoinka and @jankautz.

📄 Paper: shorturl.at/AAlVb
💻 Code: github.com/NVlabs/GatedDe…

#LinearAttention #StateSpaceModels #Mamba #LLM
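To make the "fixed-size recurrent state" and "one scalar gate doing two jobs" points in the post above concrete, here is a minimal pure-Python sketch of the classic delta-rule update with a single scalar `beta` and no decay term. It is not the Gated DeltaNet-2 update, which the post does not spell out; the function names `delta_rule_step` and `read` are illustrative assumptions.

```python
def delta_rule_step(S, k, v, beta):
    """One delta-rule write: S <- S - beta * (S k - v) k^T.

    S is a d_v x d_k matrix (list of rows), k a key of length d_k,
    v a value of length d_v. The single scalar beta scales both the
    erasure of the old value stored under k and the write of v.
    """
    # What the memory currently returns for key k (the content to be erased).
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    return [
        [s - beta * (p - vi) * kj for s, kj in zip(row, k)]
        for row, p, vi in zip(S, pred, v)
    ]


def read(S, q):
    """Query the memory: o = S q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


# The state stays 2 x 2 no matter how many (key, value) pairs are written.
S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_rule_step(S, [1.0, 0.0], [2.0, 3.0], beta=1.0)
S = delta_rule_step(S, [0.0, 1.0], [5.0, 7.0], beta=1.0)
S = delta_rule_step(S, [1.0, 0.0], [9.0, 9.0], beta=1.0)  # overwrite first key
```

With unit-norm, orthogonal keys and `beta=1.0`, the third write replaces the value stored under the first key and leaves the second association intact. Because `beta` multiplies the erase and write terms together, they cannot be tuned separately, which is the coupling the post says the channel-wise gates `b_t` and `w_t` remove.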

Hasan Can reposted

spidey@lochan_twt·4d

"Claude usage limit reached. Your limit will reset at 3:30 PM"

Hasan Can reposted

Capybara@retroniccs·4d

How to fix the insane usage limits of new Gemini: Cancel the subscription and move to ChatGPT or Claude. ✌️

Hasan Can reposted

ModelScope@ModelScope2022·4d

Tencent HY just open-sourced Hy-MT2, a multilingual translation model series with Dense and MoE variants. 🚀
🤖 modelscope.ai/collections/Te…

🌟 The standout: 1.8B with 1.25-bit quantization (via AngelSlim) fits in just 440MB and runs 1.5x faster than traditional 4-bit inference on Apple A15. Practical on-device translation without the usual storage or speed tradeoff.

🏆 Three variants across 33 languages and 5 Chinese dialects:
- 1.8B: outperforms Microsoft Translate and other commercial APIs on FLORES-200
- 7B and 30B-A3B: beat DeepSeek-V4-Pro, reaching 97.9% and 98.6% of Gemini 3.1 Pro (Think)
- All three hit 96%~99% of Gemini 3.1 Pro (Think) on real-world and domain benchmarks.

IFMTBench (translation instruction-following eval) also open-sourced alongside.

Hasan Can reposted

Anshul Ramachandran@_anshulr·4d

3x limit increase forever

Varun Mohan@_mohansolo

An update: we’re 3xing the rate limits for Gemini models across all paid tiers in Antigravity and resetting everyone’s Gemini quota for the week. We understand some people hit their rate limits quickly and wanted to respond fast. Lots more to come and enjoy building!

Hasan Can reposted

Kushal Byatnal@kushalbyatnal·5d

we've been benchmarking Gemini 3.5 Flash internally after the release yesterday...and the results don't paint a great picture
so far it barely edges out 3 flash in most cases on our long-horizon tasks, and when it does win, it's at the cost of completion rate
It's a bit quicker, but the ~3x cost increase is hard to justify in production

Hasan Can@HCSolakoglu·4d

Google made a huge mistake by killing Gemini CLI. In the same way, it basically destroyed its consumer-facing apps in a single day by turning the paid subscriptions of millions of users into trash with extremely low usage limits. Using Google's models through the API has become much more reasonable. At least it is not a recurring monthly fee. And the models are not even that good. Even GPT-5.5 Instant is better than Gemini models.

Hasan Can reposted

Dwayne@CtrlAltDwayne·4d

Google are still behind AI compared to other SOTA labs and yet they're acting like they're the ones winning. These new changes for Gemini and AI usage are actually even less generous and worse than Anthropic. Who is making these terrible decisions?

Hasan Can reposted

Michael Truell@mntruell·5d

Gemini Flash 3.5 is now on CursorBench, our main coding agent eval. We’ll keep updating the leaderboard as new models come out. cursor.com/evals

Hasan Can@HCSolakoglu·5d

Another ridiculous pricing strategy from Google.

Ariel@Rob3rtWozny

Not only does Gemini 3.5 Flash cost more to run Artificial Analysis Index benchmark, but it's also dumber. You pay more for less intelligence, RiP. 😅

Hasan Can@HCSolakoglu·6d

It seems like Google's compute bottleneck has started to reflect in their pricing as well. I don't think this price increase is solely related to improvements in model intelligence.

Logan Kilpatrick@OfficialLoganK

Welcome to Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own. We spent the last 6 months making sure Flash is great for real world use cases. It's available everywhere now!

Hasan Can@HCSolakoglu·16 May

@mweinbach Good news x.com/thsottiaux/sta…

Tibo@thsottiaux

We found and fixed two issues that could explain this degradation of the capability of GPT-5.5 in Codex over the last ~ 48 hours. We are monitoring over the coming hours to fully confirm and I will reset usage limits this evening. Apologies and now is the time for /fast maxxing.

Max Weinbach@mweinbach·16 May

@HCSolakoglu I don't like how Sam dismissed this the way he did

Max Weinbach@mweinbach·15 May

Pass rate dropped 6% today for GPT 5.5 from Margin Labs, something def changed
Feels like we're back to GPT 5.4

Max Weinbach@mweinbach

I’m not trying to be that guy
But GPT 5.5 feels significantly worse today. I don’t think I’m alone with that one.

Hasan Can reposted

Tibo@thsottiaux·15 May

Codex team is aware of reports of GPT-5.5 performing worse for some users and investigating. We don't have anything conclusive yet and systems are healthy but we will share updates as we go.
