swappy

291 posts

swappy banner
swappy

swappy

@swaapppyyy

gh/hf: rycerzes | working on vlm/llms, post training models :)

blr เข้าร่วม Aralık 2020
705 กำลังติดตาม108 ผู้ติดตาม
ทวีตที่ปักหมุด
swappy
swappy@swaapppyyy·
Wanted to post this yesterday but I was too tired, but my team and I managed to adapt 4 tasks from @ProximalHQ FrontierSWE benchmark as OpenEnv compatible environments and make them run on HF spaces as part of our hackathon submission checkout the repo at github.com/3xcaffeine/fro…
English
5
3
13
1.5K
Ben Burtenshaw
Ben Burtenshaw@ben_burtenshaw·
been at @huggingface for a minute and there are a few things that still give me goosebumps: - pr merged to the hub. - anything transformers related. - model releases. work in places that inspire you with gratitude.
English
0
0
38
755
swappy
swappy@swaapppyyy·
@jino_rohit simon veitner's blogs are really good
English
1
0
2
438
Jino Rohit
Jino Rohit@jino_rohit·
all simons are cracked af
Jino Rohit tweet media
English
7
6
195
8.6K
swappy
swappy@swaapppyyy·
@badlogicgames thank you, literally had opened an issue for the same xD
English
0
0
0
661
Mario Zechner
Mario Zechner@badlogicgames·
People of pi. I'm removing Gemini CLI and Antigravity logins from pi. Welcome to 2026, the year of the end of subsidies.
English
74
30
1.4K
135.6K
swappy
swappy@swaapppyyy·
@ben_burtenshaw interesting that HF is leaning into agent PRs as signal rather than, while the others are going the other way. mitchellh has a vouch system, mario's Pi has contribution guidelines, zig just outright bans AI contributions would love to see more on this :)
English
0
0
4
141
Ben Burtenshaw
Ben Burtenshaw@ben_burtenshaw·
Open source projects like transformers are drowning in AI agent PRs, so we auto-merged everything to see what would happen and share the results. tl;dr: if 100s of agents want to fix something, it’s probably broken. Agent PRs on transformers have quadrupled over the past quarter. We classified and validated 1k PRs (42% features, 39% bugs, 13% docs). The quality distribution is skewed toward noise. But the bug fixes cluster around a small number of hotspots: tokenizer handling, model loading, dtype mismatches, multimodal pipelines. I.e. an underlying problem. When 28 PRs independently flag the same area, that is signal regardless of whether any individual fix is correct. One issue generated 39 near-identical PRs in a day. Each applied the same decorator pattern to a different model file. A maintainer would do the same cognitive work 39 times, so a single combined PR replaces all of that work. We built tooling to cluster, deduplicate, and merge these contributions at scale, then ran an experiment: bulk-merge hundreds of agent PRs into a fork, benchmark it, and see what breaks. Nothing broke. Zero delta across three models on arc_challenge, gsm8k, and hellaswag. The contributors are not adversarial. They lack the context to evaluate whether the agent's output is correct. Check out this blog post, where we dive deep on this pipeline: huggingface.co/spaces/hugging…
English
15
22
115
22.8K
swappy
swappy@swaapppyyy·
@michellechen ahh cool, i was supposed to be credited with some cloudflare credits for reaching top 10 in a hackathon, so i thought i would redeem that later xD
English
0
0
1
50
michelle
michelle@michellechen·
@swaapppyyy GA is a little far off but we’re testing with a few close customers
English
2
0
1
61
michelle
michelle@michellechen·
i built my first model with cog yesterday — 2 files, with a few lines of code to define inputs/outputs. pushed it and got it working on a workers ai gpu so much work is happening here, can’t wait for you to try soon github.com/replicate/cog
English
6
6
54
1.9K
ThioJoe
ThioJoe@thiojoe·
gpt-image-2 is actually good at memes
ThioJoe tweet media
English
80
127
2K
179.4K
swappy รีทวีตแล้ว
Zed
Zed@zeddotdev·
Zed 1.0: Your last next editor.
English
120
300
5K
468.4K
Ben Burtenshaw
Ben Burtenshaw@ben_burtenshaw·
announcing MODEL.md. you just describe the tensor operations in pure markdown
English
2
2
30
2.7K
swappy
swappy@swaapppyyy·
@ben_burtenshaw its going to be the session datasets and harness for sure
English
0
0
1
21
Ben Burtenshaw
Ben Burtenshaw@ben_burtenshaw·
which layer of work in the stack is going to be the most mainstream. i.e. the app? (don't say all) for example; the model, the dataset, the env, the harness, the plugin, the app, the os?
English
4
0
7
1.1K
swappy
swappy@swaapppyyy·
@anindyadeeps the community is already sharing their traces as hf datasets x.com/i/status/20409…
Mario Zechner@badlogicgames

Putting my tokens where my mouth is. I built pi-share-hf. Share your pi coding agent sessions as @huggingface datasets. github.com/badlogic/pi-sh… It tries to prevent you from uploading sessions containing PII/sensitive data with 3 tiers of defenses. Best used on OSS coding sessions, as those are less likely to contain sensitive info. Uses pi agents for PII detection, which can cost you a lot of tokens. Read the README with your human eyes so you don't accidentally pwn yourself or get a huge bill. Haven't figured out how to filter for such datasets on HF yet. @ClementDelangue, any pointers on how to best label them so people can find them?

English
1
0
5
168
Anindyadeep
Anindyadeep@anindyadeeps·
Now i am going to say something for which i might gonna cancelled, but i was benchmarking Qwen 27B and Qwen coder 80B with the frontier models. And as expected they have a huge gap when it comes to performance over different tasks. Then i thought, we all are using code harnesses using different frontier models right now and their agent trajectories are saved locally. What if a community emerges which uploads all these trajectories to an open dataset on huggingface and people start continually post train it, then there is a chance, we might get truly open models running on our laptop and as good as the frontier models.
English
1
4
34
1.4K
swappy
swappy@swaapppyyy·
Wanted to post this yesterday but I was too tired, but my team and I managed to adapt 4 tasks from @ProximalHQ FrontierSWE benchmark as OpenEnv compatible environments and make them run on HF spaces as part of our hackathon submission checkout the repo at github.com/3xcaffeine/fro…
English
5
3
13
1.5K
Aritra 🤗
Aritra 🤗@ariG23498·
[Hugging Face ML Club India] We are beyond excited for the next virtual event. We host an incredible researcher and more than that an idol of mine (pretty sure of @RisingSayak's as well). They will be talking about the slow death of scaling. I am pretty sure you know who that is, but more information coming soon. Keep your eyes glued to this space. 🤗
English
16
4
177
8K
merve
merve@mervenoyann·
I have just crossed 10K friends on @huggingface 🤗💗 I try to make myself more and more useful for community and am always happy to be of service 🫡
merve tweet media
English
10
1
148
5.7K
swappy
swappy@swaapppyyy·
@ProximalHQ so now you can do RL on these environments, we also shipped an adapter for @badlogicgames pi coding agent harness, so you can plug in your favorite model and let it try running, while you get trajectories once the run is over :)
English
0
0
2
87
swappy
swappy@swaapppyyy·
@pcuenq that's awesome, congratulations 🙂‍↕️
English
0
0
2
12
Pedro Cuenca
Pedro Cuenca@pcuenq·
Life update! (It's not what you think 😂) I JUST RAN MY FIRST HALF MARATHON EVER 🔥🥳🎉🎊🏃 It was an awesome day in Madrid. I've been running for a year. I did nothing before. I never thought I'd be able to make this distance. Just do stuff.
English
21
0
92
6.9K
λux
λux@novasarc01·
i feel like RL environments get boxed into this narrow “agent using tools” frame (CUA, browsing, coding loops) but that’s honestly a tiny slice of what’s possible. in my opinion the space is way broader and more interesting. for instance embodied + physical simulation envs which force tight coupling between perception, control and dynamics (where rewards are delayed and highly sensitive to trajectory-level decisions)...generative world models as environments are also interesting...similarly scientific discovery settings like drug design, materials discovery (crystal structure search, alloy optimization) are essentially sequential decision problems under extreme epistemic uncertainty with sparse and expensive feedback loops. multi-agent socio-economic environments are another underexplored axis...market simulations or governance systems that introduce strategic interaction and non-stationarity (where the environment distribution shifts as other agents learn)...a lot of these are genuinely hard to build and often need real physical systems or tight sim–RL integration but that’s kind of the point! they’re exactly the setups where you can actually study long-horizon credit assignment, delayed rewards and the role of memory in a meaningful way.
English
6
6
115
6.6K
swappy
swappy@swaapppyyy·
thinking about making a compatibility layer for tinker sdk which enables it to work with @huggingface spaces. maybe then we can make ML-intern perform Frontier-SWE Frogsgame-RL task :) thoughts? @_lewtun @akseljoonas @cmpatino_
English
0
0
1
68