swappy

291 posts

@swaapppyyy

gh/hf: rycerzes | working on vlm/llms, post training models :)

blr · Joined December 2020

705 Following · 108 Followers

Pinned Tweet

swappy@swaapppyyy·6d

Wanted to post this yesterday but I was too tired, but my team and I managed to adapt 4 tasks from @ProximalHQ FrontierSWE benchmark as OpenEnv compatible environments and make them run on HF spaces as part of our hackathon submission checkout the repo at github.com/3xcaffeine/fro…

1.5K

swappy@swaapppyyy·1h

@ben_burtenshaw @huggingface cheers to the milestones ben! keep building in public 🦅

Ben Burtenshaw@ben_burtenshaw·5h

been at @huggingface for a minute and there are a few things that still give me goosebumps: - pr merged to the hub. - anything transformers related. - model releases. work in places that inspire you with gratitude.

755

swappy@swaapppyyy·2d

@jino_rohit simon veitner's blogs are really good

438

Jino Rohit@jino_rohit·2d

all simons are cracked af

195

8.6K

swappy@swaapppyyy·2d

@badlogicgames thank you, literally had opened an issue for the same xD

661

Mario Zechner@badlogicgames·3d

People of pi. I'm removing Gemini CLI and Antigravity logins from pi. Welcome to 2026, the year of the end of subsidies.

1.4K

135.6K

swappy@swaapppyyy·3d

@ben_burtenshaw interesting that HF is leaning into agent PRs as signal rather than, while the others are going the other way. mitchellh has a vouch system, mario's Pi has contribution guidelines, zig just outright bans AI contributions would love to see more on this :)

141

Ben Burtenshaw@ben_burtenshaw·3d

Open source projects like transformers are drowning in AI agent PRs, so we auto-merged everything to see what would happen and share the results.
tl;dr: if 100s of agents want to fix something, it’s probably broken.
Agent PRs on transformers have quadrupled over the past quarter. We classified and validated 1k PRs (42% features, 39% bugs, 13% docs). The quality distribution is skewed toward noise. But the bug fixes cluster around a small number of hotspots: tokenizer handling, model loading, dtype mismatches, multimodal pipelines. I.e. an underlying problem. When 28 PRs independently flag the same area, that is signal regardless of whether any individual fix is correct.
One issue generated 39 near-identical PRs in a day. Each applied the same decorator pattern to a different model file. A maintainer would do the same cognitive work 39 times, so a single combined PR replaces all of that work.
We built tooling to cluster, deduplicate, and merge these contributions at scale, then ran an experiment: bulk-merge hundreds of agent PRs into a fork, benchmark it, and see what breaks. Nothing broke. Zero delta across three models on arc_challenge, gsm8k, and hellaswag.
The contributors are not adversarial. They lack the context to evaluate whether the agent's output is correct.
Check out this blog post, where we dive deep on this pipeline: huggingface.co/spaces/hugging…

115

22.8K

swappy@swaapppyyy·3d

@michellechen ahh cool, i was supposed to be credited with some cloudflare credits for reaching top 10 in a hackathon, so i thought i would redeem that later xD

michelle@michellechen·3d

@swaapppyyy GA is a little far off but we’re testing with a few close customers

michelle@michellechen·3d

i built my first model with cog yesterday — 2 files, with a few lines of code to define inputs/outputs. pushed it and got it working on a workers ai gpu so much work is happening here, can’t wait for you to try soon github.com/replicate/cog

1.9K

swappy@swaapppyyy·3d

@EricParker @thiojoe ELITE BALL RRRAAAHHH 🦅

Eric Parker@EricParker·3d

@thiojoe reminds me of pokesav

1.3K

ThioJoe@thiojoe·3d

gpt-image-2 is actually good at memes

127

179.4K

swappy retweeted

Zed@zeddotdev·4d

Zed 1.0: Your last next editor.

120

300

468.4K

swappy@swaapppyyy·5d

@ben_burtenshaw wait what 🫪

Ben Burtenshaw@ben_burtenshaw·5d

announcing MODEL.md. you just describe the tensor operations in pure markdown

2.7K

swappy@swaapppyyy·6d

@ben_burtenshaw its going to be the session datasets and harness for sure

Ben Burtenshaw@ben_burtenshaw·6d

which layer of work in the stack is going to be the most mainstream. i.e. the app? (don't say all) for example; the model, the dataset, the env, the harness, the plugin, the app, the os?

1.1K

swappy@swaapppyyy·6d

@anindyadeeps the community is already sharing their traces as hf datasets x.com/i/status/20409…

Mario Zechner@badlogicgames

Putting my tokens where my mouth is. I built pi-share-hf. Share your pi coding agent sessions as @huggingface datasets. github.com/badlogic/pi-sh… It tries to prevent you from uploading sessions containing PII/sensitive data with 3 tiers of defenses. Best used on OSS coding sessions, as those are less likely to contain sensitive info. Uses pi agents for PII detection, which can cost you a lot of tokens. Read the README with your human eyes so you don't accidentally pwn yourself or get a huge bill. Haven't figured out how to filter for such datasets on HF yet. @ClementDelangue, any pointers on how to best label them so people can find them?

168

Anindyadeep@anindyadeeps·6d

Now i am going to say something for which i might gonna cancelled, but i was benchmarking Qwen 27B and Qwen coder 80B with the frontier models. And as expected they have a huge gap when it comes to performance over different tasks. Then i thought, we all are using code harnesses using different frontier models right now and their agent trajectories are saved locally. What if a community emerges which uploads all these trajectories to an open dataset on huggingface and people start continually post train it, then there is a chance, we might get truly open models running on our laptop and as good as the frontier models.

1.4K

swappy@swaapppyyy·6d

@_Suresh2 @ProximalHQ not really, openenv compatibility was quite straightforward imo

Suresh@_Suresh2·6d

@swaapppyyy @ProximalHQ OpenEnv compatibility probably ate more time than adapting the 4 tasks

swappy@swaapppyyy·6d

1.5K

swappy@swaapppyyy·6d

@ariG23498 @RisingSayak I'm hyped 🦅

197

Aritra 🤗@ariG23498·6d

[Hugging Face ML Club India] We are beyond excited for the next virtual event. We host an incredible researcher and more than that an idol of mine (pretty sure of @RisingSayak's as well). They will be talking about the slow death of scaling. I am pretty sure you know who that is, but more information coming soon. Keep your eyes glued to this space. 🤗

177

swappy@swaapppyyy·6d

@mervenoyann @huggingface woohoo congratulations

merve@mervenoyann·6d

I have just crossed 10K friends on @huggingface 🤗💗 I try to make myself more and more useful for community and am always happy to be of service 🫡

148

5.7K

swappy@swaapppyyy·6d

@ProximalHQ cc: @ben_burtenshaw @MatternJustus, in case you are interested

swappy@swaapppyyy·6d

@ProximalHQ so now you can do RL on these environments, we also shipped an adapter for @badlogicgames pi coding agent harness, so you can plug in your favorite model and let it try running, while you get trajectories once the run is over :)

swappy@swaapppyyy·6d

@pcuenq that's awesome, congratulations 🙂‍↕️

Pedro Cuenca@pcuenq·Apr 26

Life update! (It's not what you think 😂) I JUST RAN MY FIRST HALF MARATHON EVER 🔥🥳🎉🎊🏃 It was an awesome day in Madrid. I've been running for a year. I did nothing before. I never thought I'd be able to make this distance. Just do stuff.

6.9K

swappy@swaapppyyy·Apr 26

@novasarc01 agreed! this should interest you as well :) huggingface.co/blog/sergiopan…

183

λux@novasarc01·Apr 26

i feel like RL environments get boxed into this narrow “agent using tools” frame (CUA, browsing, coding loops) but that’s honestly a tiny slice of what’s possible. in my opinion the space is way broader and more interesting. for instance embodied + physical simulation envs which force tight coupling between perception, control and dynamics (where rewards are delayed and highly sensitive to trajectory-level decisions)...generative world models as environments are also interesting...similarly scientific discovery settings like drug design, materials discovery (crystal structure search, alloy optimization) are essentially sequential decision problems under extreme epistemic uncertainty with sparse and expensive feedback loops. multi-agent socio-economic environments are another underexplored axis...market simulations or governance systems that introduce strategic interaction and non-stationarity (where the environment distribution shifts as other agents learn)...a lot of these are genuinely hard to build and often need real physical systems or tight sim–RL integration but that’s kind of the point! they’re exactly the setups where you can actually study long-horizon credit assignment, delayed rewards and the role of memory in a meaningful way.

115

6.6K

swappy@swaapppyyy·Apr 26

thinking about making a compatibility layer for tinker sdk which enables it to work with @huggingface spaces. maybe then we can make ML-intern perform Frontier-SWE Frogsgame-RL task :) thoughts? @_lewtun @akseljoonas @cmpatino_

Discover

@ben_burtenshaw @huggingface @jino_rohit @badlogicgames @michellechen @EricParker @thiojoe @anindyadeeps