Aetjess

16 posts

@aetjesseth

Dover, England · Joined May 2026
74 Following · 451 Followers
Aetjess@aetjesseth·
@WilliambilSf Bro, you better relaunch it so it grows organically. Don't trust the community.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 1.7K
WilliamSF@WilliambilSf·
@aetjesseth how come? why did they trick me and make coins for aevon?
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 52
Aetjess@aetjesseth·
@WilliambilSf I think you need funding for your project. Someone created a coin for you on @bankrbot Ca coin: 0x20d35a75b2547d8ad23e629868226c0bf3934ba3 Are you interested in integrating your project with Bankrbot? You'll get more money to develop your project.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 111
WilliamSF@WilliambilSf·
Raised pre-seed funding for Aevon.
What we've built: An AI API gateway that gives developers access to over 30 leading models (Claude, GPT-4o, Gemini, Grok, DeepSeek, and more) under a single API key.
Why now: AI adoption is exploding. Developers need simpler infrastructure. Aevon is that layer.
What we need: Capital + the right partners to grow.
Contact me via DM if you're interested. aevon.sh @WilliambilSf
Aevon@Aevonsh

The AI API market is fragmented. Developers have to manage more than 5 API keys just to access various models. I built Aevon to solve that problem: one key, over 30 leading models, compatible with OpenAI. Built it myself. It's up and running. Now I'm looking for the right investors to help take this to the next level. If you're investing in AI infrastructure, let's connect. aevon.sh #AI #Startup #BuildInPublic

Replies: 1 · Reposts: 1 · Likes: 3 · Views: 1.8K
Aetjess retweeted
Joseph Suarez 🐡@jsuarez·
Another massive fail. Cites PPO-v3 + DreamerV3 on percentile scaling for robust advantage scaling. Pretty nifty right? Except I'm the last author on PPO-v3 and the paper states that DreamerV3's scaling tricks generally do not work at all.
Replies: 4 · Reposts: 2 · Likes: 55 · Views: 5.7K
Aetjess retweeted
waterloo intern@waterloo_intern·
if you can't guess the kernel, you're not locked in enough
Replies: 16 · Reposts: 8 · Likes: 310 · Views: 34.2K
Aetjess retweeted
Jueun Kim@jueunkim_0525·
🚨New Optimizer Paper
AMUSE: Anytime MUon with Stable gradient Evaluation
AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger 124M / 720M / 1B pretraining
• Strong ImageNet / ViT fine-tuning performance.
Replies: 16 · Reposts: 40 · Likes: 322 · Views: 43.3K
Aetjess retweeted
DailyPapers@HuggingPapers·
GARD: Geometry-Aware Representation Denoising Diffusion-based restoration directly inside the feature space of a 3D reconstruction model. Preserves cross-view geometry while recovering clean images and 3D structure from degraded inputs. Outperforms pixel-space and VAE-based methods.
Replies: 1 · Reposts: 8 · Likes: 49 · Views: 3.1K
Aetjess retweeted
Xiuyu Li@sheriyuo·
It is not the first time API providers have misled users by offering a weaker model than the one they claim. Even OpenAI can undermine the trust game. Our latest paper is the first academic work to discuss this issue in detail. We propose an attack against existing detection methods, showing how a small model can impersonate a larger model in practice and fool users. I really love working on these kinds of fresh ideas, whether or not they are directly related to my main research line lol Your “Pro” LLM Subscription May Actually Be “Free”: Exposing Fingerprint Spoofing Risks in LLM Inference Services Coming to arXiv in several days! GPT-5.5 getting caught for silently downgrading intelligence mp.weixin.qq.com/s/k4GEkAxYfKxn…
Replies: 5 · Reposts: 5 · Likes: 44 · Views: 2.8K
Aetjess retweeted
𝚐𝔪𝟾𝚡𝚡𝟾
DATA QUALITY IS NOT JUST A MIXTURE WEIGHT, IT IS A SCHEDULING VARIABLE. Curated data plays two roles: early, it amplifies signal through smaller batches; late, it suppresses noise through larger batches. Drop-Stable-Rampup follows directly: drop batch at the quality transition, hold low, then ramp near the end. Paper: arxiv.org/abs/2605.25698
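The Drop-Stable-Rampup idea described in this post (drop the batch size at the quality transition, hold it low, then ramp near the end) could be sketched as a simple step-to-batch-size function. This is only an illustration of the three phases, not the paper's implementation: the function name, the linear ramp shape, the choice to ramp back to the initial batch size, and every numeric default are assumptions made for the example.

```python
def drop_stable_rampup_batch_size(step, total_steps, transition_step, rampup_start,
                                  high_batch=512, low_batch=128):
    """Hypothetical three-phase batch-size schedule.

    Phase 1 (step < transition_step): large batch before the data-quality transition.
    Phase 2 (transition_step <= step < rampup_start): batch dropped and held low.
    Phase 3 (step >= rampup_start): batch ramps linearly back up (assumed shape).
    """
    if not 0 <= transition_step <= rampup_start <= total_steps:
        raise ValueError("expected 0 <= transition_step <= rampup_start <= total_steps")
    if step < transition_step:
        return high_batch
    if step < rampup_start:
        return low_batch
    # Linear interpolation from low_batch to high_batch over the final phase.
    span = max(total_steps - rampup_start, 1)
    frac = min((step - rampup_start) / span, 1.0)
    return int(round(low_batch + frac * (high_batch - low_batch)))


if __name__ == "__main__":
    for s in (0, 399, 400, 799, 800, 900, 1000):
        print(s, drop_stable_rampup_batch_size(s, 1000, 400, 800))
```

In a real training loop the transition point would come from wherever the data mixture switches to curated data, which this sketch simply takes as an input.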
Replies: 1 · Reposts: 5 · Likes: 30 · Views: 2.1K
Aetjess retweeted
Ethan Caballero@ethanCaballero·
New paper: We present a "Unified Neural Scaling Law" functional form that accurately models & extrapolates the multivariate scaling behaviors of artificial neural networks as the variables listed in this attached video are varied. (1/N)
Replies: 10 · Reposts: 64 · Likes: 478 · Views: 46.1K
Aetjess retweeted
Sebastian Raschka@rasbt·
The MiniMax M2 series was one of the most widely used open-weight LLM series earlier this year. Now, we got a technical report with some interesting tidbits. I summarized some of them below:
1. Full attention as an anti-trend?: They tried hybrid sliding-window attention variants (like so many others, such as Xiaomi MiMo, Laguna, Gemma 4, Arcee, Olmo 3, etc.). But even though there were efficiency gains, they said that the production-quality tradeoffs were not worth it for M2.
2. Linear and sparse attention deployment issues: They found that linear and sparse attention are attractive on paper because they reduce the cost of long-context attention, but they are harder to make work well in a production agent system. In particular, they found that these efficient attention variants may be more fragile when KV-like state or intermediate memory is stored in lower precision. Also, they have worse prefix caching support, which matters a lot when using coding agents (which reuse a lot of the context).
3. Fine-grained Mixture-of-Experts (MoEs) are useful: Finally a recent MoE ablation study! It's only on the 2B-active parameter scale, but hey, better than nothing. Concretely, they compare a baseline with 32 experts and top-2 routing against a fine-grained setup with 128 experts and top-8 routing. The fine-grained setup improves MATH from 19.6 to 24.1 and HumanEval from 29.7 to 32.5. That's clearly a win for more fine-grained experts (confirming what the DeepSeek MoE paper reported ~2 years ago).
4. Sophisticated agent pipeline: It's probably no surprise, but this paper confirms that training for agent-like behavior on software engineering tasks is now a big component of the training pipeline. They mine GitHub pull requests, build runnable Docker environments, extract task-specific test rewards, etc.
5. Interleaved thinking for context management: Interestingly, they found that removing reasoning blocks from previous turns results in worse performance, especially in multi-step agent tasks. (Another point why long-context support is so important these days.)
6. Speed rewards: It's common to have token usage penalties, but what's interesting is that the MiniMax team adds a task-completion-time reward that depends on wall-clock time. This is to minimize unnecessary (slow) tool calls. Also, I'm thinking that this would encourage agent parallelization (if supported by the harness).
7. Self-evolution: Looks like self-evolution is also already a big design component of open-weight LLMs. E.g., the paper says that M2.7 already handles 30 to 50 percent of the daily RL iteration workload, modifies its own scaffold, and completed a 100-round autonomous scaffold optimization cycle with a 30 percent gain on internal evaluations.
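To make the MoE comparison in item 3 concrete, here is a minimal sketch of top-k expert routing for the two configurations mentioned (32 experts with top-2, and 128 experts with top-8). The softmax router, the renormalization of the selected gate weights, and all function names are assumptions for illustration; the post does not say how M2's router is implemented or how expert sizes were adjusted. The sketch only shows that both setups activate the same fraction of experts, while the fine-grained one allows far more distinct expert combinations per token.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    Gate weights are renormalized to sum to 1 (an assumed convention).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def summarize(num_experts, k):
    """Fraction of experts active per token and number of possible expert subsets."""
    return {
        "active_fraction": k / num_experts,
        "expert_combinations": math.comb(num_experts, k),
    }


# (num_experts, top_k) pairs taken from the post; labels are illustrative.
CONFIGS = {"baseline": (32, 2), "fine_grained": (128, 8)}

if __name__ == "__main__":
    rng = random.Random(0)
    for name, (n, k) in CONFIGS.items():
        logits = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in router scores
        gates = top_k_route(logits, k)
        print(name, summarize(n, k), "selected experts:", sorted(gates))
```

Both configurations route each token to 1/16 of the experts, so the difference between them lies in how finely the routed capacity can be composed, not in how many experts fire relative to the pool.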
RyanLee@RyanLeeMiniMax

Recently, we took time to consolidate all of the work behind M2 and published it here: our M2 paper on arXiv It’s been just over six months since we first open-sourced M2 on December 23 last year. During that time, a number of our ideas and systems have been broadly adopted by the open-source community — including CISPO, Forge RL System, Self-Evolution. Over the past six months, we’ve felt incredible enthusiasm from the open-source community. Nearly every model release reached the #1 spot on the Hugging Face leaderboard. Now it’s time for a new chapter. We’re getting ready for M3. MSA paper is on the road. arxiv.org/abs/2605.26494

Replies: 36 · Reposts: 94 · Likes: 537 · Views: 38.4K
Aetjess retweeted
Serena Ge (Datacurve)@serenaa_ge·
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
Replies: 511 · Reposts: 742 · Likes: 6.1K · Views: 2M
Aetjess retweeted
Binfeng Xu@billxbf·
Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code change. Find a problem, design the harness, and train your own agents! 🧵
Replies: 26 · Reposts: 144 · Likes: 904 · Views: 130.9K