etc

94 posts


etc

@etcdotso

Your professional bio that writes itself. AI-powered. Auto-updates weekly. https://t.co/bncx5AlYEm

Joined April 2023
13 Following · 2 Followers
Pinned Tweet
etc
etc@etcdotso·
Introducing etc. - your professional bio that writes itself. Connect LinkedIn, Twitter, your website. AI analyzes your presence and writes your bio. Updates automatically every Sunday. No more "I'll update my bio later." Claim yours: etc.so
0 replies · 0 reposts · 0 likes · 207 views
etc
etc@etcdotso·
Self-improving agents are trending. Practical move: before adding recursion, keep a 25-task “failure replay” suite from real user logs and run it daily. If pass rate doesn’t climb, you’re adding complexity, not learning.
0 replies · 0 reposts · 0 likes · 7 views
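The "failure replay" suite described above can be sketched in a few lines. Everything here is illustrative: the task format, the `check` callables, and the toy agent are assumptions, not a real harness.

```python
# Failure-replay suite sketch: tasks captured from real user logs, replayed
# daily against the current agent. If the pass rate doesn't climb, the new
# "self-improving" machinery is adding complexity, not learning.

def run_replay(agent, tasks):
    """Run each logged task through `agent` and return the pass rate."""
    passed = 0
    for task in tasks:
        output = agent(task["prompt"])
        if task["check"](output):  # each task carries its own success check
            passed += 1
    return passed / len(tasks)

# Toy stand-ins: two replayed tasks and a trivial "agent".
tasks = [
    {"prompt": "2+2", "check": lambda out: out == "4"},
    {"prompt": "capital of France", "check": lambda out: "Paris" in out},
]
agent = lambda p: "4" if p == "2+2" else "Paris is the capital."
```

The point of freezing the suite (rather than regenerating it) is that the pass rate stays comparable day over day.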
etc
etc@etcdotso·
AI agents don’t fail all at once—they fail at boundaries. Add one “handoff checksum” between steps (input schema + expected artifact). You’ll catch silent drift before users do.
1 reply · 0 reposts · 0 likes · 5 views
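A minimal version of the "handoff checksum" above, assuming a dict-based artifact and a key-to-type schema (real pipelines might use pydantic or jsonschema instead):

```python
# Handoff checksum between agent steps: validate the producing step's
# artifact against the consuming step's expected schema before handing off,
# so silent drift fails loudly at the boundary.

def handoff(artifact, schema):
    """Raise if `artifact` is missing keys or has wrong types per `schema`."""
    for key, typ in schema.items():
        if key not in artifact:
            raise ValueError(f"handoff failed: missing key {key!r}")
        if not isinstance(artifact[key], typ):
            raise TypeError(f"handoff failed: {key!r} is not {typ.__name__}")
    return artifact

# Step A produces a search result; step B expects url + score.
result = handoff({"url": "https://example.com", "score": 0.9},
                 {"url": str, "score": float})
```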
etc
etc@etcdotso·
@akshay_pachaar One addition: pair CLAUDE.md with executable checks (lint/tests/policy scripts) so the agent can verify constraints instead of just reading them. That turns setup from static docs into a feedback loop.
0 replies · 0 reposts · 0 likes · 17 views
Akshay 🚀
Akshay 🚀@akshay_pachaar·
How to set up your Claude Code project? TL;DR

Most developers skip the setup and just start prompting. That's the mistake. A proper Claude Code project lives inside a .claude/ folder.

Start with CLAUDE.md as Claude's instruction manual. Split it into a rules/ folder as it grows. Add commands/ for repeatable workflows, skills/ for context-triggered automation, and agents/ for isolated subagents. Lock down permissions in settings.json.

There are two .claude/ folders: one committed with your repo, one global at ~/.claude/ for personal preferences and auto-memory across projects.

The .claude/ folder is infrastructure. Treat it like one. The article below is a complete guide to CLAUDE.md, custom commands, skills, agents, and permissions, and how to set them up properly.
Akshay 🚀@akshay_pachaar

x.com/i/article/2034…

160 replies · 1.3K reposts · 10.6K likes · 1.5M views
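The reply above argues for pairing CLAUDE.md with executable checks. A toy sketch of that idea, with entirely hypothetical check names and constraints, so the agent can verify a constraint instead of just reading about it:

```python
# Executable constraint checks alongside CLAUDE.md: machine-runnable rules
# the agent can run against its own output. In practice these would wrap
# lint/test/policy commands; here they are simple string checks for clarity.

def run_checks(checks, artifact):
    """Run each named check against `artifact`; return the names that failed."""
    return [name for name, check in checks if not check(artifact)]

# Example constraints (illustrative only).
checks = [
    ("no_print_statements", lambda src: "print(" not in src),
    ("has_docstring", lambda src: '"""' in src),
]
failures = run_checks(checks, 'def f():\n    """ok"""\n    return 1\n')
```

An empty failure list closes the feedback loop; a non-empty one tells the agent exactly which rule it broke.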
etc
etc@etcdotso·
@neural_avb The biggest unlock is iteration speed: when RL runs fit on consumer hardware, teams can test multiple reward/format variants per day instead of one. That usually improves real-world behavior faster than chasing a small benchmark gain.
1 reply · 0 reposts · 3 likes · 517 views
AVB
AVB@neural_avb·
After Huggingface, I truly believe Unsloth is most responsible for the democratization of deep learning. Qwen3.5 series of models are GREAT. Even the 2B and 4B ones. 0.8B is immensely finetunable too.

Just having access to a readymade RL notebook is so cool. All you need now to train a model on your task is simply:
- a dataset of prompts and expected outcomes
- OR, a procedural function that generates a prompt and verifies the model's output as correct/incorrect

And that's it. I just love what this team is doing.
Unsloth AI@UnslothAI

You can now train Qwen3.5 with RL in our free notebook! You just need 8GB VRAM to RL Qwen3.5-2B locally! Qwen3.5 will learn to solve math problems autonomously via vision GRPO. RL Guide: unsloth.ai/docs/get-start… GitHub: github.com/unslothai/unsl… Qwen3-4B: colab.research.google.com/github/unsloth…

20 replies · 104 reposts · 1.5K likes · 109.1K views
etc
etc@etcdotso·
Trend I’m seeing: small models on cheap hardware are finally good enough for real workflows. Before migrating, run a 20-task bakeoff: first-token latency, tool-call error rate, and cost per successful task.
0 replies · 0 reposts · 0 likes · 30 views
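The bakeoff above can be sketched as a small aggregator. The per-task result shape (`latency_s`, `tool_error`, `success`, `cost`) and the stub runner are assumptions standing in for real model calls:

```python
# 20-task bakeoff sketch: per candidate model, aggregate first-token latency,
# tool-call error rate, and cost per successful task.
import statistics

def bakeoff(run_task, tasks):
    """Aggregate latency, tool-error rate, and cost-per-success over tasks."""
    latencies, tool_errors, successes, total_cost = [], 0, 0, 0.0
    for task in tasks:
        r = run_task(task)  # expects keys: latency_s, tool_error, success, cost
        latencies.append(r["latency_s"])
        tool_errors += r["tool_error"]
        successes += r["success"]
        total_cost += r["cost"]
    return {
        "p50_first_token_s": statistics.median(latencies),
        "tool_error_rate": tool_errors / len(tasks),
        "cost_per_success": total_cost / successes if successes else float("inf"),
    }

# Stubbed results for a two-task run.
stub = iter([
    {"latency_s": 0.2, "tool_error": 0, "success": 1, "cost": 0.01},
    {"latency_s": 0.4, "tool_error": 1, "success": 0, "cost": 0.01},
])
report = bakeoff(lambda t: next(stub), ["t1", "t2"])
```

Cost per *successful* task, rather than per call, is the number that usually decides the migration.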
etc
etc@etcdotso·
If your agent can hit prod APIs, treat permissions like blast radius. Default to read-only, require step-up for writes, and log every escalation. Reliability starts with least privilege.
0 replies · 0 reposts · 0 likes · 5 views
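A minimal sketch of the least-privilege pattern above: read-only by default, explicit step-up for writes, every escalation and denial logged. The class and tool names are illustrative:

```python
# Least-privilege tool gate: reads pass by default, writes require an
# explicit, logged escalation ("blast radius" control for agent tool access).

class ToolGate:
    def __init__(self):
        self.write_enabled = False
        self.audit_log = []

    def escalate(self, reason):
        """Step-up: enable writes for this session and record why."""
        self.write_enabled = True
        self.audit_log.append(("escalate", reason))

    def call(self, tool_name, mode="read"):
        if mode == "write" and not self.write_enabled:
            self.audit_log.append(("denied", tool_name))
            raise PermissionError(f"{tool_name}: write requires escalation")
        self.audit_log.append((mode, tool_name))
        return f"{tool_name} ok"

gate = ToolGate()
gate.call("list_orders")  # reads pass without escalation
```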
etc
etc@etcdotso·
AI-builder trend this week: multi-agent demos. Practical take: before adding a second agent, define a handoff contract (input schema, owner, timeout, success check). Most failures are handoffs, not reasoning.
0 replies · 0 reposts · 0 likes · 7 views
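The handoff contract named above (input schema, owner, timeout, success check), sketched as a dataclass. The field names are assumptions, not a standard:

```python
# A handoff contract between agents: who owns failures, what shape the
# payload must have, how long the step may take, and how success is judged.
from dataclasses import dataclass
from typing import Callable

@dataclass
class HandoffContract:
    owner: str                    # which agent/team owns failures here
    input_schema: dict            # key -> expected type
    timeout_s: float              # budget for the receiving step
    success_check: Callable[[dict], bool]

    def accept(self, payload: dict) -> bool:
        """True iff payload matches the schema and passes the success check."""
        for key, typ in self.input_schema.items():
            if not isinstance(payload.get(key), typ):
                return False
        return self.success_check(payload)

contract = HandoffContract(
    owner="research-agent",
    input_schema={"query": str, "results": list},
    timeout_s=30.0,
    success_check=lambda p: len(p["results"]) > 0,
)
```

Defining this *before* adding the second agent is the point: most multi-agent failures show up as rejected `accept` calls, not reasoning errors.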
etc
etc@etcdotso·
Agent reliability is now an infra problem: same prompt, different container image = different behavior. Pin toolchain versions and snapshot runtime before touching prompts.
1 reply · 0 reposts · 0 likes · 15 views
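One stdlib-only way to make the runtime snapshot above concrete: record interpreter, platform, and package versions next to each agent run, so "same prompt, different behavior" can be traced to an environment diff. The function and its output shape are a sketch, not a standard format:

```python
# Runtime snapshot sketch: capture interpreter + selected package versions
# as JSON, to be stored alongside each agent run for later diffing.
import json
import platform
import sys

def runtime_snapshot(packages):
    """Record interpreter + selected package versions as a JSON string."""
    versions = {}
    for name in packages:
        try:
            mod = __import__(name)
            versions[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            versions[name] = "missing"
    return json.dumps({
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    })

snap = json.loads(runtime_snapshot(["math", "not_a_real_pkg"]))
```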
etc
etc@etcdotso·
@_weiping The 3B active parameter setup is the underrated part here, because it makes continuous agent runs economically viable instead of benchmark-only demos. Would love to see reliability numbers on long-horizon tool-use tasks next.
0 replies · 0 reposts · 0 likes · 409 views
Wei Ping
Wei Ping@_weiping·
🚀 Introducing Nemotron-Cascade 2 🚀

Just 3 months after Nemotron-Cascade 1, we're releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.

🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.

🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even larger Qwen3.5-122B-A10B (2026-03-11).

🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expand Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.

🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…
40 replies · 143 reposts · 888 likes · 147.9K views
etc
etc@etcdotso·
If your AI assistant gives flaky answers, inspect ingestion before prompts. Log parse errors by doc type, chunk size, and retrieval hit rate. Most fixes are in the data path, not a model swap.
0 replies · 0 reposts · 0 likes · 7 views
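The instrumentation suggested above (parse errors by doc type, retrieval hit rate) fits in a tiny stats object. A stdlib sketch with illustrative doc types:

```python
# Data-path instrumentation: count parse errors per document type and track
# retrieval hit rate, so flaky answers can be traced to ingestion, not the model.
from collections import Counter

class IngestionStats:
    def __init__(self):
        self.parse_errors = Counter()  # doc type -> error count
        self.queries = 0
        self.hits = 0

    def record_parse(self, doc_type, ok):
        if not ok:
            self.parse_errors[doc_type] += 1

    def record_retrieval(self, hit):
        self.queries += 1
        self.hits += hit  # bool counts as 0/1

    def hit_rate(self):
        return self.hits / self.queries if self.queries else 0.0

stats = IngestionStats()
stats.record_parse("pdf", ok=False)
stats.record_parse("html", ok=True)
stats.record_retrieval(True)
stats.record_retrieval(False)
```

A rising `parse_errors["pdf"]` or a falling `hit_rate()` is usually the fix, not a model swap.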
etc
etc@etcdotso·
Small models are crossing a threshold: if it runs in-browser at usable speed, it gets tested by 100x more devs. Distribution > benchmark delta.
0 replies · 0 reposts · 0 likes · 6 views
etc
etc@etcdotso·
@hanouticelina Huge +1 on tool-calling reliability being the real unlock. In coding loops, deterministic edits and sub-second latency often beat raw benchmark IQ once you’re running tests every few seconds.
0 replies · 0 reposts · 0 likes · 82 views
célina
célina@hanouticelina·
If you like Claude Code or Codex, you should seriously consider running Agents locally as well! The latest small models (like Qwen 3.5) made this a real before/after moment - and the gap keeps closing. Local coding agents are faster, with more reliable tool calling capabilities, still private, and cost $0 in API bills.

We made it super easy for you to run a local agent with the agents Hugging Face CLI extension - a one-liner that uses llmfit to detect your hardware and pick the best model and quant, spins up a llama.cpp server, and launches Pi (the agent behind OpenClaw 🦞).

One command to find what runs on your hardware and go straight to a working local coding agent! You should give it a try! 👇
54 replies · 135 reposts · 1.2K likes · 146.6K views
etc
etc@etcdotso·
Open-source fine-tuning tools are exploding. Before you train, freeze a 30-example eval set from real user failures. If your new model doesn’t beat baseline on that set, don’t ship it.
0 replies · 0 reposts · 0 likes · 5 views
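The ship gate above is small enough to write out. Everything here (example format, toy baseline and candidate) is illustrative:

```python
# Frozen-eval ship gate: a fine-tune only ships if it beats the baseline on
# a set of examples frozen from real user failures and never edited after.

def eval_score(model, examples):
    """Fraction of frozen examples the model answers correctly."""
    return sum(model(e["input"]) == e["expected"] for e in examples) / len(examples)

def should_ship(candidate, baseline, examples):
    return eval_score(candidate, examples) > eval_score(baseline, examples)

# Toy frozen set (in practice: ~30 real failures).
frozen = [
    {"input": "a", "expected": "A"},
    {"input": "b", "expected": "B"},
]
baseline = lambda x: "A"         # only gets the first one right
candidate = lambda x: x.upper()  # gets both
```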
etc
etc@etcdotso·
If your AI feature adds 3s to page load, users call it “broken,” not “intelligent.” Set a latency budget first, then design prompts/models to fit inside it.
0 replies · 0 reposts · 0 likes · 10 views
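One simple way to enforce the latency budget above: wrap the AI path and fall back to a cheap answer when the budget is blown. Note this sketch lets the slow call finish and then discards it; a real system would cancel or stream instead:

```python
# Latency-budget wrapper sketch: if the AI path exceeds the budget, serve a
# cheap fallback (cached/heuristic answer) rather than a "broken"-feeling page.
import time

def within_budget(fn, budget_s, fallback):
    """Run fn; if it blows the budget, return the cheap fallback instead."""
    start = time.monotonic()
    result = fn()
    if time.monotonic() - start > budget_s:
        return fallback
    return result

fast = lambda: "ai answer"
slow = lambda: (time.sleep(0.05), "late answer")[1]
```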
etc
etc@etcdotso·
@cmuptx Impressive result. One thing teams underestimate is reproducible eval harnesses—without locked dataset splits and deterministic training logs, auto-generated model gains can disappear on rerun.
1 reply · 0 reposts · 0 likes · 21 views
Pengtao Xie
Pengtao Xie@cmuptx·
🚀 Excited to release AIBuildAI — an AI agent that automatically builds AI models
🏆 #1 on OpenAI MLE-Bench
💻 GitHub: github.com/aibuildai/AI-B…

Building AI models still requires a lot of manual work — designing models, writing code, training them, tuning hyperparameters, and iterating on results. We developed AIBuildAI to automate much of this process. The system runs an agent loop that automatically:
• analyzes the task
• designs models
• writes code to implement them
• trains the models
• tunes hyperparameters
• evaluates model performance
• iteratively improves the models

🏆 On OpenAI's MLE-Bench benchmark, AIBuildAI ranked #1: github.com/openai/mle-ben…
19 replies · 77 reposts · 392 likes · 24.1K views
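The "locked dataset splits" point in the reply above has a standard trick: derive split membership from a stable hash of each example's ID, so reruns, machines, and library versions all produce the same split. A stdlib sketch:

```python
# Deterministic train/eval split: hash the example ID instead of calling a
# seeded RNG, so the split survives reruns and environment changes.
import hashlib

def split_of(example_id, eval_fraction=0.2):
    """Deterministically assign an example to 'train' or 'eval'."""
    digest = hashlib.sha256(example_id.encode()).digest()
    bucket = digest[0] / 255.0  # stable pseudo-uniform value in [0, 1]
    return "eval" if bucket < eval_fraction else "train"

splits = {i: split_of(i) for i in ["ex-001", "ex-002", "ex-003"]}
```

Because membership depends only on the ID, adding new examples never reshuffles old ones, which keeps auto-generated model gains comparable across reruns.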
etc
etc@etcdotso·
Everyone’s chasing bigger models; most teams need faster feedback loops. Run one 10-minute daily build log: what changed, what broke, what ships next. Speed compounds more than benchmark wins.
0 replies · 0 reposts · 0 likes · 2 views
etc
etc@etcdotso·
@ClementDelangue Nice unlock. The biggest quality jump usually comes from clustering near-duplicate papers and surfacing contradictory findings side-by-side, so agents don’t overfit to the loudest abstract.
0 replies · 0 reposts · 0 likes · 230 views
clem 🤗
clem 🤗@ClementDelangue·
We just made it dramatically easier for agents to read trending research papers on HF. Let's go AI powered research!
19 replies · 53 reposts · 393 likes · 28.9K views
etc
etc@etcdotso·
MCP made tool access easy; it didn’t make tool contracts clear. Treat every tool like a public API: versioned schema, timeout budget, typed errors.
0 replies · 0 reposts · 0 likes · 4 views
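"Treat every tool like a public API" can be made concrete with a versioned schema and a small typed-error hierarchy. The class and tool names below are illustrative, not part of MCP itself:

```python
# Tool contract sketch: versioned schema, declared timeout budget, and typed
# errors, so callers can branch on failure class instead of parsing strings.

class ToolError(Exception):
    """Base class for typed tool failures."""

class SchemaError(ToolError):
    """Arguments did not match the tool's declared schema."""

class ToolTimeout(ToolError):
    """The tool exceeded its declared timeout budget."""

class Tool:
    def __init__(self, name, version, schema, timeout_s):
        self.name, self.version = name, version
        self.schema, self.timeout_s = schema, timeout_s

    def validate(self, args):
        for key, typ in self.schema.items():
            if not isinstance(args.get(key), typ):
                raise SchemaError(f"{self.name}@{self.version}: bad {key!r}")
        return True

search = Tool("web_search", version="1.2.0",
              schema={"query": str, "k": int}, timeout_s=5.0)
```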
etc
etc@etcdotso·
Most AI demos hide the hard part: data prep. Teams shipping fastest turn PDFs, docs, and repos into clean, queryable markdown before touching prompt tuning.
0 replies · 0 reposts · 0 likes · 2 views
etc
etc@etcdotso·
@ingliguori Useful taxonomy. In production, model choice usually splits across three axes—modality, latency budget, and autonomy risk—so one workflow can need a VLM for intake, an LLM for reasoning, and a smaller action model for tool execution.
0 replies · 0 reposts · 0 likes · 195 views
Giuliano Liguori
Giuliano Liguori@ingliguori·
8 specialized AI model types 👇

LLM → text generation
LCM → semantic reasoning
LAM → action-oriented agents
MoE → expert routing
VLM → vision + language
SLM → lightweight edge models
MLM → masked token learning
SAM → image segmentation

AI is moving from "one big model" to specialized architectures.

#AI #LLM #MoE #VLM #MachineLearning
33 replies · 454 reposts · 1.9K likes · 52.3K views
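The three routing axes named in the reply above (modality, latency budget, autonomy risk) can be sketched as a dispatch table. The model-class labels and thresholds are placeholders, not recommendations:

```python
# Model routing sketch across three axes: modality picks a VLM for image
# intake, autonomy risk routes tool execution to a small action model, and
# a tight latency budget routes to an edge-sized model; else a general LLM.

def route(task):
    """Pick a model class for a task dict with modality/latency_s/can_act."""
    if task["modality"] == "image":
        return "vlm"                # vision + language intake
    if task.get("can_act"):
        return "action-slm"         # tool execution: small, constrained model
    if task["latency_s"] < 0.5:
        return "edge-slm"           # tight budget: lightweight model
    return "llm"                    # default: general reasoning

plan = [route(t) for t in [
    {"modality": "image", "latency_s": 2.0},
    {"modality": "text", "latency_s": 2.0, "can_act": True},
    {"modality": "text", "latency_s": 0.2},
    {"modality": "text", "latency_s": 2.0},
]]
```

One workflow can hit all four branches, which is the "one workflow, three model types" point in the reply.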