Hasan Can

4.7K posts

@HCSolakoglu

SWE & AI - News, Insights. Posts in ENG & TUR. Exploring AI.

Proxima C B · Joined July 2020
2.6K Following · 1.4K Followers
Hasan Can (@HCSolakoglu)
Still wild that Codex doesn’t seem to run regression checks on the metrics that actually matter:
- cache hit ratio
- context rot
- input/output tokens
- avg runtime
- tool-call stats/behavior
- SWE-bench Pro subset score
Every big PR/release should answer one question: Did the model get worse?
For a product used by millions, this should be table stakes.
Quoted post by Tibo (@thsottiaux):

Some of you noticed limits drained faster in Codex, we root caused it to an optimization that we rolled back that had an impact on cache hit rates when compacting across long running sessions. We fixed this and have now reset usage limits for all accounts. Enjoy the weekend.
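
The release gate asked for in the post above could be sketched roughly as follows. This is a minimal illustration, not anything Codex is known to run: the metric names, directions, and tolerances are all assumptions, and "context rot" is left out because it would need its own long-context eval to turn into a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    name: str
    direction: int    # +1: higher is better, -1: lower is better
    tolerance: float  # allowed relative worsening, e.g. 0.02 = 2%


# Hypothetical metric names and tolerances, chosen for illustration only.
SPECS = [
    MetricSpec("cache_hit_ratio", +1, 0.02),
    MetricSpec("input_tokens", -1, 0.05),
    MetricSpec("output_tokens", -1, 0.05),
    MetricSpec("avg_runtime_s", -1, 0.05),
    MetricSpec("tool_calls_per_task", -1, 0.10),
    MetricSpec("swe_bench_pro_subset_score", +1, 0.01),
]


def find_regressions(baseline, candidate, specs=SPECS):
    """Return (name, relative_worsening) for metrics that got worse beyond tolerance."""
    regressions = []
    for spec in specs:
        old, new = baseline[spec.name], candidate[spec.name]
        if old == 0:
            continue  # relative change is undefined; skipped in this sketch
        # A positive value always means "worse", whatever the metric's direction.
        worsening = spec.direction * (old - new) / abs(old)
        if worsening > spec.tolerance:
            regressions.append((spec.name, round(worsening, 4)))
    return regressions


if __name__ == "__main__":
    baseline = {
        "cache_hit_ratio": 0.80,
        "input_tokens": 10000,
        "output_tokens": 2000,
        "avg_runtime_s": 100.0,
        "tool_calls_per_task": 20,
        "swe_bench_pro_subset_score": 0.50,
    }
    # Simulate a release whose only change is a lower cache hit ratio.
    candidate = dict(baseline, cache_hit_ratio=0.60)
    print(find_regressions(baseline, candidate))
```

A non-empty result would block the release or at least force someone to answer "did the model get worse?" before shipping. A drop in cache hit rate, like the one described in the quoted post, is the kind of change such a gate is meant to flag.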

Hasan Can (@HCSolakoglu)
@thsottiaux Yep, /slow mode would’ve been great for /goal tasks. It’d be even better if interactive coding and /goal /slow also ran at different speeds.
Tibo (@thsottiaux)
Should we bring batch compute to codex? Aka /slow mode
Hasan Can (@HCSolakoglu)
Gemini 3.5 Flash is definitely much better in AI Studio than it is in the Gemini app. I don’t know how Google manages it, but the Gemini app consistently feels heavily constrained by its system and orchestration layer, to the point where it performs noticeably worse than the raw model.
Hasan Can reposted
Peter Gostev (@petergostev)
So the average gap between releases is 52 days; if we exclude the first long one, it is 40 days. So a couple of weeks at least is a reasonable bet. Could be a bit longer if it is a new pre-train and they need more time to adjust.
Hasan Can (@HCSolakoglu)
Good. This still isn’t over until Gemini AI Pro limits become comparable to ChatGPT Plus. The limits are still behind Codex and ChatGPT limits. ChatGPT Plus gives around 3k weekly GPT-5.5 Thinking usage, and that model is extremely agentic. Google has the power and resources to do this.
Quoted post by Varun Mohan (@_mohansolo):

Yesterday, we 3x’d limits on Antigravity and are seeing you build so much more. One thing we heard was people are worried about hitting their weekly limits after a couple work sessions. To give you more runway, we’re 3x’ing the weekly Gemini quotas AGAIN on all paid plans. We’ve also gone ahead and reset Gemini quotas on all paid plans. Don’t stop building!

Hasan Can (@HCSolakoglu)
Lately, Google has been disappointing on multiple fronts. From the models they’ve released to changes in their consumer apps, restrictive usage limits, and the overall direction the company seems to be heading in, it’s all been a major letdown.
Quoted post by Mechanize (@MechanizeWork):

We evaluated Gemini 3.5 Flash on GBA Eval. It could not build a working GBA emulator. On Piugba, the game just flashes on screen, unplayable and with no sound. Overall, it achieves a score of 6.7%.

Hasan Can reposted
Ali Hatamizadeh (@ahatamiz1)
Gated DeltaNet-2 is here. 🚀

🔥 New paper: Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Gated DeltaNet-2 outperforms KDA and Mamba-3, the latest and best recurrent architectures, head to head at 1.3B. 🏆

💡 Here's the idea behind it: Linear attention squeezes an unbounded KV cache into a fixed-size recurrent state. The hard part isn't just what to forget, it's how to edit that memory without scrambling the associations already in it.

Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to do two jobs at once: erasing old content and writing new content. But these two decisions act on different axes of the state, so tying them together is a real limitation.

Gated DeltaNet-2 decouples them.
✂️ a channel-wise erase gate b_t picks which key-side coordinates to read and remove
✍️ a channel-wise write gate w_t picks which value-side coordinates to commit
🔁 recovers KDA when both gates collapse to a scalar, and Gated DeltaNet when the decay collapses too
⚡ still trains fast: chunkwise WY algorithm with gate-aware backward, fused in Triton

📊 Results: We train 1.3B models on 100B tokens of FineWeb-Edu, matched in recurrent state size, against Mamba-2, Gated DeltaNet, KDA, and Mamba-3.
Best average on language modeling + commonsense reasoning, in both recurrent and hybrid settings
Biggest gains on long-context RULER retrieval. S-NIAH-3 jumps from 63 to 90 over KDA, and multi-key needle retrieval climbs from 28 to 38

Joint work with @YejinChoinka and @jankautz.

📄 Paper: shorturl.at/AAlVb
💻 Code: github.com/NVlabs/GatedDe…

#LinearAttention #StateSpaceModels #Mamba #LLM
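
To make the erase/write split described above concrete, here is a toy single-step sketch in plain Python. It is not taken from the paper or its code: the update form, the function names, and the argument names `decay`, `erase`, and `write` are all assumptions, built only so that tying both gates to one scalar gives back an ordinary scalar-gated delta-rule step.

```python
# Illustrative only: NOT the paper's equations or its Triton kernels.
# The state S is a d_v x d_k matrix (rows: value side, columns: key side).


def matvec(S, x):
    """Multiply matrix S (list of rows) by vector x."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]


def step_decoupled(S, k, v, decay, erase, write):
    """One update with separate gates (hypothetical form).

    decay: per-key-channel decay, length d_k
    erase: per-key-channel erase gate, length d_k
    write: per-value-channel write gate, length d_v
    """
    # 1) Decay each key-side column of the state.
    S = [[s * a for s, a in zip(row, decay)] for row in S]
    # 2) Read what the state currently stores for the erase-gated key.
    read = matvec(S, [b * ki for b, ki in zip(erase, k)])
    # 3) Remove that read-out along k, then commit the write-gated value along k.
    return [
        [s - r * ki + w * vi * ki for s, ki in zip(row, k)]
        for row, r, w, vi in zip(S, read, write, v)
    ]


def step_scalar(S, k, v, decay, beta):
    """Reference step where a single scalar beta both erases and writes."""
    S = [[s * a for s, a in zip(row, decay)] for row in S]
    pred = matvec(S, k)
    return [
        [s - beta * (p - vi) * ki for s, ki in zip(row, k)]
        for row, p, vi in zip(S, pred, v)
    ]
```

In this toy form, calling `step_decoupled` with `erase=[beta] * d_k` and `write=[beta] * d_v` matches `step_scalar` up to floating-point rounding, which mirrors the reduction claimed in the post. Setting individual entries of `erase` or `write` to zero leaves the corresponding key-side or value-side channels unedited.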
Hasan Can reposted
spidey (@lochan_twt)
"Claude usage limit reached. Your limit will reset at 3:30 PM"
Hasan Can reposted
Capybara (@retroniccs)
How to fix the insane usage limits of new Gemini: Cancel the subscription and move to ChatGPT or Claude. ✌️
Hasan Can reposted
ModelScope (@ModelScope2022)
Tencent HY just open-sourced Hy-MT2, a multilingual translation model series with Dense and MoE variants. 🚀
🤖 modelscope.ai/collections/Te…
🌟 The standout: 1.8B with 1.25-bit quantization (via AngelSlim) fits in just 440MB and runs 1.5x faster than traditional 4-bit inference on Apple A15. Practical on-device translation without the usual storage or speed tradeoff.
🏆 Three variants across 33 languages and 5 Chinese dialects:
- 1.8B: outperforms Microsoft Translate and other commercial APIs on FLORES-200
- 7B and 30B-A3B: beat DeepSeek-V4-Pro, reaching 97.9% and 98.6% of Gemini 3.1 Pro (Think)
- All three hit 96%~99% of Gemini 3.1 Pro (Think) on real-world and domain benchmarks.
IFMTBench (translation instruction-following eval) also open-sourced alongside.
Hasan Can reposted
Kushal Byatnal (@kushalbyatnal)
we've been benchmarking Gemini 3.5 Flash internally after the release yesterday...and the results don't paint a great picture
so far it barely edges out 3 flash in most cases on our long-horizon tasks, and when it does win, it's at the cost of completion rate
It's a bit quicker, but the ~3x cost increase is hard to justify in production
Hasan Can (@HCSolakoglu)
Google made a huge mistake by killing Gemini CLI. In the same way, it basically destroyed its consumer-facing apps in a single day by turning the paid subscriptions of millions of users into trash with extremely low usage limits. Using Google's models through the API has become much more reasonable. At least it is not a recurring monthly fee. And the models are not even that good. Even GPT-5.5 Instant is better than the Gemini models.
Hasan Can reposted
Dwayne (@CtrlAltDwayne)
Google are still behind in AI compared to other SOTA labs, and yet they're acting like they're the ones winning. These new changes for Gemini and AI usage are actually even less generous and worse than Anthropic's. Who is making these terrible decisions?
Hasan Can reposted
Michael Truell (@mntruell)
Gemini Flash 3.5 is now on CursorBench, our main coding agent eval. We’ll keep updating the leaderboard as new models come out. cursor.com/evals
Hasan Can (@HCSolakoglu)
It seems like Google's compute bottleneck has started to show up in their pricing as well. I don't think this price increase is solely related to improvements in model intelligence.
Quoted post by Logan Kilpatrick (@OfficialLoganK):

Welcome to Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own. We spent the last 6 months making sure Flash is great for real world use cases. It's available everywhere now!

Hasan Can reposted
Tibo (@thsottiaux)
Codex team is aware of reports of GPT-5.5 performing worse for some users and investigating. We don't have anything conclusive yet and systems are healthy but we will share updates as we go.