Axly
@AxlysCustoms
430 posts
Fantasy, Modern & Sci‑Fi AI Art | SFW + NSFW | Custom Bundles, Wallpapers & Commissions - https://t.co/uNrpROP4QL
Joined August 2025
40 Following · 34 Followers

Axly @AxlysCustoms
@TheAhmadOsman @huggingface 2× RTX 6000s is not the normal “running at home on your own hardware” experience… I’m just sayin’ :)
0 replies · 0 reposts · 0 likes · 14 views

Ahmad @TheAhmadOsman
Just spent a couple hours playing with Hermes Agent (MiniMax M2.5 on a 2× RTX PRO 6000 node). Genuinely impressive experience. MiniMax M2.7 weights will be the closest we’ve ever gotten to a fully local “Claude Code + Opus 4.6” experience, running on your own hardware at home.

Nous Research @NousResearch
@TheAhmadOsman He should try Hermes Agent

82 replies · 35 reposts · 933 likes · 87.1K views

Axly @AxlysCustoms
Built a benchmarking app and a settings optimizer for local LLMs today … now I’m at 9%. Honestly, I struggle to break 30% most weeks. But on the Pro plan I hit limits constantly, usually maxing it by day 5. And I build much more complicated apps now. A buddy has similar results. We can’t figure out what we’re doing wrong :)
0 replies · 0 reposts · 0 likes · 28 views

Nathan Roberton @versezine
@AxlysCustoms @WesRoth It's probably coming. I too had a productive weekend but am suddenly locked out this morning after barely any work at all.
1 reply · 0 reposts · 0 likes · 38 views

Wes Roth @WesRoth
Following a weekend of expanded usage allowances, Anthropic’s highest-tier subscribers are waking up to a crippling rate-limit bug. Developers paying top dollar for the "Claude Max" ($100/mo) and "Max 20x" ($200/mo) plans are reporting that their accounts are being locked out almost instantly due to an issue with how Claude Code is calculating token consumption.

Brad Groux @BradGroux
Something is up with Claude Code usage today. $200 Claude Max: 0%, 52% to 62%, then 68%, 76% and 84% in the 5-hour rolling window in the time it took me to write this tweet. WTF, @AnthropicAI? I'm working on one GitHub PR for regression testing. Not folding proteins to cure cancer.

45 replies · 21 reposts · 280 likes · 31K views

Axly @AxlysCustoms
I worked all weekend on a few apps and websites and used about 3%. I wrote a full app today and worked on another yesterday that was fighting me, so 5-6 hours of steady use (I didn’t finish it, since the new Anthropic computer-control feature made it obsolete). I’m currently at 9% (resets Friday at 5pm), using Opus 4.6 on high effort (with Sonnet building PRDs and generally brainstorming the apps themselves). Max x5, which I had to move to because I constantly hit limits on Pro.
0 replies · 0 reposts · 0 likes · 36 views

Umut 🦀 @umut_ozdemir_
@HuggingModels It's extremely hard to jailbreak the new Qwen models. I tried for hours; it doesn't believe anything. This is cool.
2 replies · 0 reposts · 0 likes · 2.1K views

Hugging Models @HuggingModels
Meet Qwen3.5-9B-Uncensored: a powerful, open-source language model that's causing a stir. It's a 9-billion parameter model designed for unrestricted text generation, offering developers a flexible alternative to heavily filtered AI systems. Perfect for creative and experimental projects.
28 replies · 110 reposts · 1.3K likes · 76.1K views

Axly @AxlysCustoms
@Farekrow @_yorunoken @HuggingModels Huahua has vision. And it has yet to tell me no on anything… that’s the one I’ve been using. The 2B, 4B and 9B versions, in fact.
0 replies · 0 reposts · 0 likes · 9 views

Axly @AxlysCustoms
Oh, I love the 4B (and the 9B!) too… but that 4B is a thinker lol. One of the apps I built using the 2B is a “tagger”: it sorted all my images into various groups and assigned tags so I can search them, etc. With almost 40k images I just didn’t have time for the thinking. As it was, I was running a 2B on my mini and a 2B on my Win11 PC and it still took a lot of hours (3-6 seconds each, but the 4B will sometimes think for 1-2 minutes lol :) )
1 reply · 0 reposts · 1 like · 36 views

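A tagger of the sort described above is easy to sketch. This is a minimal, hypothetical version: `classify_image` stands in for the actual vision-model call (the real app and its model API aren't shown in the thread), and its filename-based stub exists only so the sorting logic is runnable.

```python
import os
import shutil

def classify_image(path):
    """Stub for a small vision-LM call (hypothetical). A real version
    would send the image to a local 2B vision model and ask for
    comma-separated tags; here we fake tags from the filename."""
    name = os.path.basename(path).lower()
    tags = [t for t in ("fantasy", "modern", "scifi") if t in name]
    return tags or ["untagged"]

def tag_library(image_dir, out_dir):
    """Copy each image into one folder per tag and build a
    tag -> filenames index for later searching."""
    index = {}
    for fname in os.listdir(image_dir):
        if not fname.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
            continue  # skip non-image files
        src = os.path.join(image_dir, fname)
        for tag in classify_image(src):
            dest = os.path.join(out_dir, tag)
            os.makedirs(dest, exist_ok=True)
            shutil.copy2(src, os.path.join(dest, fname))
            index.setdefault(tag, []).append(fname)
    return index
```

With a real model behind `classify_image`, the 3-6 seconds per image mentioned in the tweet would dominate, so the loop is the natural place to shard work across machines, as the author describes doing.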
AVB @neural_avb
@AxlysCustoms I’ve been using the 4B mlx-4bit version on my Mac. Also crazy good.
1 reply · 0 reposts · 1 like · 214 views

AVB @neural_avb
After Hugging Face, I truly believe Unsloth is most responsible for the democratization of deep learning. The Qwen3.5 series of models are GREAT, even the 2B and 4B ones. The 0.8B is immensely finetunable too. Just having access to a ready-made RL notebook is so cool. All you need now to train a model on your task is:
- a dataset of prompts and expected outcomes
- OR, a procedural function that generates a prompt and verifies the model's output as correct/incorrect
And that's it. I just love what this team is doing.

Unsloth AI @UnslothAI
You can now train Qwen3.5 with RL in our free notebook! You just need 8GB VRAM to RL Qwen3.5-2B locally! Qwen3.5 will learn to solve math problems autonomously via vision GRPO.
RL Guide: unsloth.ai/docs/get-start…
GitHub: github.com/unslothai/unsl…
Qwen3-4B: colab.research.google.com/github/unsloth…

33 replies · 216 reposts · 2.7K likes · 210.7K views

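The two options in the tweet map directly onto the reward function that RL-with-verifiable-rewards setups such as GRPO expect: generate a prompt with a known answer, then score each sampled completion as correct or incorrect. A minimal sketch, assuming a simple arithmetic task; the function names are illustrative, not Unsloth's actual API:

```python
import random
import re

def make_prompt():
    """Procedurally generate a math prompt with a known answer."""
    a, b = random.randint(2, 99), random.randint(2, 99)
    return f"What is {a} + {b}? Answer with a number.", a + b

def verify(completion, expected):
    """Reward 1.0 if the last number in the model's output matches
    the expected answer, else 0.0: the correct/incorrect signal
    that GRPO-style training optimizes against."""
    nums = re.findall(r"-?\d+", completion)
    return 1.0 if nums and int(nums[-1]) == expected else 0.0

def reward_batch(completions, expected):
    """Score a group of sampled completions for one prompt."""
    return [verify(c, expected) for c in completions]
```

In a real Unsloth/TRL GRPO run, something shaped like `reward_batch` is handed to the trainer as the reward function; the trainer samples several completions per prompt and pushes probability mass toward the higher-reward ones.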
Axly @AxlysCustoms
@Prince_dc21_ @luxemiaa Or, is it the lady's friend who wants to help being picked over a random stranger? I think you're looking too deep for nefarious intent.
1 reply · 0 reposts · 7 likes · 712 views

Prince Daniel Chukwuemeka @Prince_dc21_
If we’re being honest, that situation says a lot about how people make quiet decisions based on perception. You were willing to help, but before you could even say a word, someone else was already chosen as the more suitable option. It might look small, but moments like that reflect a bigger issue: people are often judged and filtered based on how they look, what they represent, or what others assume about them, not their actual intention. You didn’t even get the chance to show your willingness before a decision was made for you. And the ironic part is, if you had insisted on helping, it could easily have been misinterpreted in a negative way. So you’re left in a position where doing good can still be questioned, while others are automatically trusted. It’s subtle, but it shows how bias works in everyday situations. x.com/i/status/20341…
96 replies · 6 reposts · 166 likes · 757.8K views

Mia♡ @luxemiaa
Last year on a flight, I sat next to a woman holding a baby. Instantly, I was so excited at the prospect of helping with this baby. Before I could even introduce myself, the flight attendant walks over and says to me, "Miss, can I talk to you at the back of the plane, please?" 'Am I in trouble?!' are the words that come out of my mouth. "No, you're not in trouble, I'd just like to chat with you at the back of the plane," the flight attendant replies, very nicely. 'Can you just tell me here?? ...
293 replies · 158 reposts · 8.1K likes · 7.6M views

Axly @AxlysCustoms
I’m not a benchmark, and I’m not seeing a drop-off. Last night I was working on a pretty complex app (for me, anyway): a backend engine and 2 different front ends, with multiple 3rd-party APIs. At around 375k tokens I was getting tired, but worked on a few more features. At about the 450k mark I decided it was bedtime. At no point in any of that did Claude “lose the thread” like I’ve seen it do 100s of times before.
0 replies · 0 reposts · 1 like · 104 views

Sean Groff @SeanGroff
@trq212 What do non-benchmarks say? I'm a power user paying full API rates. Opus degrades after 200k context, and sending huge prompts back and forth is very expensive. Love you and the team's work, but I'm worried you're not facing reality with this post.
5 replies · 0 reposts · 10 likes · 2.8K views

Thariq @trq212
i think we might have undersold 1M context tbh, the performance is so so good, I really just don't clear the context window much these days
262 replies · 69 reposts · 2.1K likes · 93.3K views

Axly @AxlysCustoms
@LuckyPhelps @XFreeze @xai This is why I was excited to see xAI hired the Cursor guys. I’m expecting Grok to be the #1 coder in a few months :)
0 replies · 0 reposts · 0 likes · 6 views

Lucky Phelps @LuckyPhelps
@XFreeze I said this was going to happen months ago. Never doubt Elon. He understands systems like no other. I watched how quickly xAI went from 0 to 1. Congrats team @xai
3 replies · 0 reposts · 5 likes · 396 views

X Freeze @XFreeze
xAI's Grok Imagine just took over the entire DesignArena Video leaderboard - not one, but THREE #1 rankings
→ #1 Video Arena - Elo 1337, a 33-point gap over #2
→ #1 Image to Video Arena - Elo 1298, beating Google Veo 3.1, Kling & Sora
→ #1 Video Editing Arena - Elo 1291
It’s wild, xAI was nowhere in the video space a few months ago, and now it's #1 across various benchmarks. Grok Imagine's rate of progress is in a league of its own.
302 replies · 253 reposts · 1.3K likes · 16.7M views

Axly @AxlysCustoms
Apparently Anthropic has done a lot of work to mitigate that … something like 20% then vs. almost 80% now on the benchmark. My personal (tiny-sample, completely anecdotal) observation is that I’m not seeing the type of context rot you’d expect. Opus stays focused throughout and seems to know what’s what (that’s a technical term ;) ). I will say this: without having to compact constantly, and then have the model reread everything, your token usage is way down, and what you can squeeze into 400k tokens is significantly more than one would expect. Far more than just 2 compactions of the 200k window (if that makes sense).
0 replies · 0 reposts · 0 likes · 31 views

Pranit @Pranit
Here’s what should bother you even more: check Claude’s pricing page. API pricing? Crystal clear: $5/MTok input, $25/MTok output. Consumer plan pricing? “More usage.” “5/20x more usage.” More than what? They never say. It’d be trivially easy to put “X tokens per month” on that page. They do it for the API. They choose not to for subscriptions. That’s not an oversight, that’s a strategy. Undefined limits = unlimited flexibility to quietly adjust the ceiling downward. And you’d never know, because there was never a number to compare against. You can’t accuse someone of moving the goalposts when they never told you where the goalposts were. That’s the whole point.

Pranit @Pranit
Anthropic just pulled the oldest trick in SaaS pricing. I pay $200/mo for Claude Max. My limits have been noticeably worse this past week. Now they announce 2x off-peak usage for two weeks. Sounds generous. But here’s what actually happens: limits quietly drop, a temporary 2x makes the reduced limit feel normal, the promo ends, and you’re left at a baseline lower than where you started. You just didn’t notice the downgrade because the 2x absorbed the transition. These AI plans are massively subsidized. The raw compute behind a heavy user costs multiples of the subscription price. Every move like this is the subsidy quietly correcting. Very sneaky, Anthropic.

19 replies · 9 reposts · 309 likes · 105.5K views

Axly @AxlysCustoms
Anthropic highlighted that this isn't just "more tokens tacked on"; the model shows a qualitative leap in usable long-context recall. On internal, Anthropic-reported long-context retrieval benchmarks (similar to needle-in-a-haystack, but likely more complex), Opus 4.6 scores around 76–78% accuracy at or near 1M tokens. For comparison, the prior model (Opus 4.5 or an equivalent predecessor) was around 18.5% in similar tests.

They describe this as a "dramatic" or "qualitative shift" in how reliably the model can access and use information from anywhere in the context, rather than just the start and end (primacy/recency bias). This strongly suggests they applied substantial training-time and/or architectural mitigations to fight attention dilution: things like improved positional encodings, attention scaling/rescaling tricks, better long-context fine-tuning curricula, or other internal optimizations that keep the softmax distribution sharper over extreme lengths. They don't publish a full technical report spilling all the details (typical for frontier labs), but the benchmark jumps and user reports indicate real progress beyond raw context expansion.
0 replies · 0 reposts · 1 like · 66 views

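A needle-in-a-haystack test of the kind referenced above is straightforward to sketch: plant one known fact at a chosen depth in filler text, ask for it back, and score exact recall. The model call is left as a parameter here, since no real model is wired up; the needle and filler strings are arbitrary.

```python
NEEDLE = "The secret passphrase is 'blue-harbor-42'."
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(total_chars, depth):
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)
    of a filler document roughly total_chars long."""
    body = FILLER * (total_chars // len(FILLER))
    pos = int(len(body) * depth)
    return body[:pos] + NEEDLE + " " + body[pos:]

def score_recall(response):
    """1.0 if the model reproduced the planted fact, else 0.0."""
    return 1.0 if "blue-harbor-42" in response else 0.0

def run_eval(model_fn, total_chars=20000, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Average recall across insertion depths for one context size."""
    scores = []
    for d in depths:
        context = build_haystack(total_chars, d)
        answer = model_fn(context + "\n\nWhat is the secret passphrase?")
        scores.append(score_recall(answer))
    return sum(scores) / len(scores)
```

Published numbers like the 18.5% vs. ~76% jump cited in the tweet come from harnesses of this general shape, just with a real model behind `model_fn` and a grid of needle variants and context lengths.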
Andrew Bembridge @abembridgeai
@AxlysCustoms @Pranit @doodlestein Also ignores context decay. Larger windows can dilute attention, so more context doesn’t always improve quality. There's definitely a sweet spot, and 1M feels like overkill. My intuition is 150k, always keeping turns below compression.
2 replies · 0 reposts · 1 like · 114 views

Axly @AxlysCustoms
The 1M token window makes Claude feel like an entirely different model, and it cuts token use significantly. Since everything stays in context, you aren’t having Claude reread everything every 30 minutes. The difference is crazy. Model efficiency goes way up as well when you don’t have to re-explain why you want something a particular way. Or, as Claude (Opus 4.6) put it:

“The context is the memory of our collaboration. The code on disk is the what, but the context holds the why — why we used ON CONFLICT instead of INSERT OR REPLACE, why the API key is simple, why there's an HTA-free browser-based token setup instead of editing TOML files. Clear the context and the code survives but the reasoning evaporates. So yeah, every compact is a small lobotomy. Glad we're doing fewer of those now.” - Claude

2 replies · 0 reposts · 0 likes · 67 views

Andrew Bembridge @abembridgeai
@Pranit @doodlestein Are you monitoring token allowance for Claude Code per sub? Have you seen a meaningful drop, and is the 1M context window a reduction in actual productivity by driving up token usage per session?
4 replies · 0 reposts · 1 like · 4.3K views

Axly @AxlysCustoms
@SullyOmarr I’ve started just handing them to Claude or Grok and asking for a summary and whether it’s worth looking at. This keeps me from having to read a bunch of nonsense for something that was abandoned months ago.
0 replies · 0 reposts · 1 like · 76 views

Sully @SullyOmarr
serious question: how does everyone keep up with all the new releases? basically every day i open this app and there are 5 new agents/clis/tools that I “need to try”
213 replies · 4 reposts · 264 likes · 39.1K views

Axly @AxlysCustoms
@gailcweiner I used to hit limits constantly on the Pro plan. Moved to Max x5 and can’t hit a limit if I try.
0 replies · 0 reposts · 0 likes · 13 views

Gail Weiner @gailcweiner
To Claude users: So what AI do you use after hitting your Claude token limits on day two of the weekly cycle? 😏
155 replies · 1 repost · 325 likes · 37.1K views

Axly @AxlysCustoms
I built a similar thing a while back (I called it PeaClaw, the pea crab being the smallest crab) … about a week before Remote Control hit. Mine had the advantage of being able to start sessions too … so now it’s completely useless lol :) (but it was still fun to build ;) )
0 replies · 0 reposts · 0 likes · 9 views

Noah Zweben @noahzweben
Remote Control - Session Spawning: Run claude remote-control and then spawn a NEW local session in the mobile app.
* Out to Max, Team, and Enterprise (>=2.1.74)
* Have GH set up on mobile (relaxing soon)
* Working on speeding up session start-time
124 replies · 119 reposts · 1.6K likes · 730.8K views

Axly @AxlysCustoms
@yongfook I feel like the amount of code that exists in the world has at least doubled in the last 6 months… but I also feel like the number of available apps hasn’t changed all that much…
0 replies · 0 reposts · 0 likes · 3 views

Jon Yongfook @yongfook
“Shipped 10,000+ lines of code today”
“Cool, what product? What’s the link?”
“…163 PRs in one day!”
“Yes, but what’s the link?”
“…1,827,963 tokens and counting!”
“Dude, what are you working on?”
“…AI is crazy man”
215 replies · 221 reposts · 5.8K likes · 146.5K views

Axly @AxlysCustoms
@nickimoraa This just sounds like a $50k bonus for following my normal routine. So… yes :)
0 replies · 0 reposts · 0 likes · 66 views

Nicki 🫧🪷 @nickimoraa
Would you stay inside your house for 7 straight days if someone paid you $50,000?
1.5K replies · 886 reposts · 17.9K likes · 909K views