Alessandro Benigni

416 posts


@itsbenigni

Co-Founder & CMO at https://t.co/zjkdUzAMrV

New York · Joined May 2025
176 Following · 160 Followers
Alessandro Benigni reposted
Emir Atli@emiratli_·
2026 is a tough year for guys named Claude
[image attached]
159 replies · 737 reposts · 18.4K likes · 581.4K views
Alessandro Benigni@itsbenigni·
Everyone is using AI to write code faster, but it usually ends up as an unstructured mess. If you use Claude Code, you need to check out Arness. It's an open-source tool that forces your AI to act like a Senior Engineering Team. Instead of just guessing, it uses a pipeline:
📝 Specs before code
🗺️ Plans before execution
🔎 Reviews before shipping
It has 3 plugins handling Idea ➡️ Dev ➡️ Deployment. The craziest part? It used its own 134 AI agents to build itself from scratch. 🤯 Check it out (not my project; I tried it and was blown away). Link in the comments:
[image attached]
1 reply · 1 repost · 2 likes · 118 views
Alessandro Benigni@itsbenigni·
@cryptopunk7213 The only thing the West has more of than others at this point is HYPE, the only thing they're good at. And to be honest, even if it were true, spending money on anything aside from cleaning up their streets of homelessness and drug addiction should be seen as shameful.
0 replies · 0 reposts · 1 like · 627 views
Ejaaz@cryptopunk7213·
so this is the most insane use of ai i've ever seen. the CIA rescued a soldier by detecting his heartbeat from 40 MILES AWAY. used AI to drown out other heartbeats, then literal diamonds to detect his heartbeat's electromagnetic fingerprint...

the numbers don't make sense: at just 10 centimeters away, a heart's magnetic field is barely detectable. at 1 meter? 1/1000th of that. this fucking device did this at 40 miles.

lockheed martin created it in their secret advanced development division. AI drowned out the heartbeats of every human, dog, animal to achieve this. the officer named 'dude 44 bravo' evaded capture for 36 hours, was located, and survived.
24 replies · 7 reposts · 128 likes · 26.3K views
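The 10 cm vs. 1 m figure in the tweet is a straight dipole-field calculation: magnetic dipole fields fall off as 1/r³, so 10x the distance means 10³ = 1000x weaker. A tiny Python sketch of that arithmetic (the 40-mile extrapolation below is just the same cube law applied further out, not a claim about any device):

```python
def dipole_falloff(r_near_m, r_far_m):
    """Relative weakening of a dipole (1/r^3) field between two distances."""
    return (r_far_m / r_near_m) ** 3

# 10 cm -> 1 m: 10x the distance, so 1000x weaker -- the tweet's "1/1000th"
print(dipole_falloff(0.1, 1.0))  # 1000.0

# 10 cm -> 40 miles (~64,374 m): roughly 2.7e17 times weaker
print(f"{dipole_falloff(0.1, 40 * 1609.34):.1e}")
```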
Alessandro Benigni reposted
Sudo su@sudoingX·
last time the openclaw founder said open models aren't there yet. now he's saying local models on consumer hardware are the issue.

this is not someone who cares about open source speaking. this is someone with a corporate paycheck channeling you toward their subscription, because every local AI install is a subscription they lose.

he picked a year-old hermes model against a new agent harness and called it not ready. that didn't come from proper testing. it came from watching nousresearch grow exponentially while his project bleeds relevance. a founder who left his project without seeing its full potential is arguing with a founder who is still grinding day and night for open source.

nousresearch is open source head to toe. the models, the harness, the memory system, everything. and it's winning. that's what panics them.

while corporate salesmen say local models on consumer hardware aren't there, i just published an article where a 27B dense model on a $900 consumer GPU one-shotted a task that a 120B model on $70K enterprise hardware could not complete in 3 tries. through hermes agent. every number is in the article.

you decide which side you want to be on: the side that mines every bit of your thinking and profits off it, or the side where you own your compute and your cognition. don't let corporate salesmen disguised as information lead you to their subscription page.
Peter Steinberger 🦞@steipete

@Teknium @jrswab Ya know local models on consumer hardware are the issue, like Hermes doesn’t work with Hermes 🙃

65 replies · 46 reposts · 802 likes · 83.2K views
Alessandro Benigni reposted
Sudo su@sudoingX·
you see this? this is what they're panicking about. hermes agent usage went vertical in 10 days. builders are dropping openclaw bloat and the numbers don't lie. every bar on this chart is someone who tested hermes agent and never went back.

they're losing users. and when you're losing users you attack whatever you can find. a year-old hermes model that was released before the agent harness even existed. that's the best they could pick. the nous team was openly saying hermes models aren't optimized for hermes agent yet and they're working on it. he grabbed it anyway.

some founders choose their project. some choose the paycheck. and the ones who chose the paycheck are now spending their time attacking the ones who stayed. that tells you everything about where the momentum is.

nousresearch ships the models and the harness. fully open source head to toe. the community decides who wins this and the community is already deciding. don't let someone with a corporate paycheck redirect you to a subscription page. test it yourself. the data is yours to verify.
[image attached]
30 replies · 36 reposts · 540 likes · 39.4K views
Alessandro Benigni@itsbenigni·
Just created n8n Workflow Builder. 7 commands. 1,396 nodes. 22 workflow patterns. Zero guesswork. Type what you want in plain English → get a deployed, validated workflow. But that's not even the best part. 🧵

1/ Claude IS your AI brain. Not an API call. Not a token charge. Claude reasons, scores, classifies, writes; @n8n_io just handles the mechanical work. $0 API cost for agent workflows.

2/ Claude-in-the-Middle: one n8n workflow execution. Claude is a processing node INSIDE IT. Workflow runs → pauses → Claude analyzes → POSTs back → workflow resumes with Claude's intelligence merged in. One execution ID. $0.

3/ A 75MB SQLite database ships with the plugin. Every n8n node, pre-tagged by intent. Say "send notification" → instantly finds Slack, Gmail, Telegram, Discord, Teams + 40 more. Zero tokens burned. Zero MCP round-trips. Plus, new custom scripts (nodes) we create are added to the same database (separate table) with tags for future retrieval and use.

4/ Paste your API key in the chat. The plugin creates the credential in n8n via REST API. No browser switching. No manual config. Same for any credential a node requires.

5/ 7 commands: /n8n (build), /n8n-agent (@claudeai as brain), /n8n-test (assertions), /n8n-docs (auto-docs), /n8n-audit (security A-F grade), /n8n-manage (lifecycle), /n8n-browse (explore 1,396 nodes).

Link in the replies:
[image attached]
2 replies · 1 repost · 2 likes · 159 views
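The intent-tagged lookup described in point 3 above can be sketched with plain SQLite. Everything below (table name, columns, tags) is a guess at the shape of such a database, not the plugin's actual schema:

```python
import sqlite3

# Toy stand-in for the plugin's node database: n8n nodes pre-tagged by
# intent, so "send notification" resolves locally with zero tokens spent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, intent TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("Slack", "send notification"),
    ("Gmail", "send notification"),
    ("Telegram", "send notification"),
    ("Postgres", "store data"),
])

def find_nodes(intent):
    """Return every node name tagged with the given intent."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE intent = ? ORDER BY name", (intent,))
    return [name for (name,) in rows]

print(find_nodes("send notification"))  # ['Gmail', 'Slack', 'Telegram']
```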
Alessandro Benigni@itsbenigni·
@jonallie Totally agree. We're shifting from being the brain behind AI orchestration to providing the right tools for AI to orchestrate workflows.
0 replies · 0 reposts · 2 likes · 355 views
jon allie@jonallie·
Personal rule of thumb: don't use an LLM for something that a deterministic program can do. I get it, LLMs are exciting, but they don't mean that software ceases to exist. They are fantastic at dealing with human language and ambiguity, but terrible (by design and for good reason) at repeatability.

To borrow terminology from the book Thinking, Fast and Slow, LLMs are "System 2": slower, more "expensive" (for LLMs, both in time and dollars), but flexible and creative. Traditional programs are "System 1": fast and cheap, but inflexible and dumb.

Instead of trying to put an LLM in the hot loop of your program, it's usually worth asking an agent to write a deterministic program to do the thing you need done. Since code is cheap, this deterministic tool can do exactly what you want it to, and doesn't consume tokens on every execution.

(This applies to agents too. I find myself regularly yelling at Claude to stop repeatedly generating the same 30 lines of Python to inspect a file, and instead telling it to generate a 3-line shell script wrapper around jq that it can check in and call repeatedly.)
86 replies · 110 reposts · 1.1K likes · 97.5K views
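jon allie's closing point (check in a small deterministic inspector once instead of regenerating the same code every run) translates directly to code. A Python stand-in for his jq-wrapper example; the file contents are made up for the demo:

```python
import json
import tempfile

def top_level_keys(path):
    """Deterministic helper: list a JSON file's top-level keys, sorted.
    Costs zero tokens per call, unlike regenerating inspection code."""
    with open(path) as f:
        return sorted(json.load(f))

# demo on a throwaway file standing in for whatever the agent inspects
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"users": [], "version": 2, "config": {}}, f)

print(top_level_keys(f.name))  # ['config', 'users', 'version']
```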
Alessandro Benigni reposted
goodfuture@0xGoodfuture·
every skeleton screen you've ever hand-coded is a waste of time. you're literally measuring padding and guessing widths to build a worse version of a layout that already exists in your DOM. so I made a package that just reads the real one
185 replies · 371 reposts · 6.3K likes · 762.5K views
Alessandro Benigni reposted
Sudo su@sudoingX·
google just dropped gemma 4 while i'm in the middle of testing nvidia and alibaba's flagships on 2x H200. perfect timing.

this 31B thinking model is beating qwen 122B and deepseek v3.2 on elo. the 26B variant has only 4B active. that fits on a phone.

it's almost 2am and i just published nvidia's octopus invaders results. qwen is loading next. but this is going on the queue immediately.

if you already ran it, drop your numbers below: model, quant, hardware, inference engine, tok/s. i want to see what this thing does before i get my hands on it.
Google Gemma@googlegemma

Meet Gemma 4! Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license. We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable open models yet: 👇

36 replies · 5 reposts · 315 likes · 31.5K views
Alessandro Benigni reposted
Chris Tate@ctatedev·
Introducing render-json, the Generative JSON framework.
1. Point it at anything
2. It generates JSON
3. That's it
Apps, games, and more. If it exists, it can be converted into a JSON spec.
npm i @json-render/render-json
64 replies · 186 reposts · 3.1K likes · 192.5K views
Alessandro Benigni reposted
Sharbel@sharbel·
🚨 IMPORTANT: Google quietly open-sourced a time-series AI that predicts anything. Sales trends. Market prices. User traffic. Energy demand. Crypto volatility. It's called TimesFM.

Here's why it's underrated:
→ Pre-trained on 100B real-world data points
→ Zero-shot forecasting, no fine-tuning needed
→ Outperforms supervised models trained on your specific data
→ Runs locally. Free. Apache license.

Most people are focused on language models. The quietly powerful ones are learning to predict the future.
[image attached]
83 replies · 412 reposts · 3.5K likes · 296.5K views
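"Zero-shot forecasting" in the tweet above just means the model takes a context window and emits a horizon of predictions with no fitting on your series. A trivial seasonal-naive baseline makes that interface concrete; this is a stand-in for the shape of the task, not TimesFM's actual API:

```python
def seasonal_naive(context, horizon, season=7):
    """Zero-shot baseline: forecast by repeating the last observed season."""
    last_season = context[-season:]
    return [last_season[i % season] for i in range(horizon)]

# four weeks of daily data, then forecast the next three days
history = [10, 12, 14, 13, 11, 9, 8] * 4
print(seasonal_naive(history, horizon=3))  # [10, 12, 14]
```

Real zero-shot models like TimesFM replace the repeat-last-season rule with a pre-trained network, but the in/out contract (context in, horizon out, no training step on your data) is the same.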
Alessandro Benigni reposted
Noah@NoahKingJr·
Me to Claude: "Make no mistakes. DO NOT HALLUCINATE. YOU ARE AN EXPERT SOFTWARE ENGINEER"
112 replies · 1K reposts · 10K likes · 613.8K views
Alessandro Benigni reposted
le.hl@0xleegenz·
My manager watching me leave work at 5 PM when my shift ends at 5 PM
306 replies · 5.9K reposts · 84.9K likes · 4.2M views
Alessandro Benigni reposted
Sukh Sroay@sukh_saroy·
🚨 Breaking: Stanford researchers built a new prompting technique! By adding ~20 words to a prompt, it:
- boosts LLM creativity by 1.6-2x
- raises human-rated diversity by 25.7%
- beats fine-tuned models without any retraining
- restores 66.8% of an LLM's lost creativity after alignment

Let's understand why and how it works:

Post-training alignment methods like RLHF make LLMs helpful and safe, but they unintentionally cause mode collapse. This is where the model favors a narrow set of predictable responses.

This happens because of typicality bias in human preference data: when annotators rate LLM responses, they naturally prefer answers that are familiar, easy to read, and predictable. The reward model then learns to boost these "safe" responses, aggressively sharpening the probability distribution and killing creative output.

But here's the interesting part: the diverse, creative model isn't gone. After alignment, the LLM still has two personalities: the original pre-trained model with rich possibilities, and the safety-focused aligned model.

Verbalized Sampling (VS) is a training-free prompting strategy that recovers the diverse distribution learned during pre-training. The idea is simple: instead of prompting "Tell me a joke" (which triggers the aligned personality), you prompt: "Generate 5 responses with their corresponding probabilities. Tell me a joke."

By asking for a distribution instead of a single instance, you force the model to tap into its full pre-trained knowledge rather than defaulting to the most reinforced answer.

Results show Verbalized Sampling enhances diversity by 1.6-2.1x over direct prompting while maintaining or improving quality. Variants like VS-based Chain-of-Thought and VS-based Multi push diversity even further.

You can find the paper link in the next tweet. 👉

Over to you: what other methods can be used to improve LLM diversity?
[image attached]
20 replies · 77 reposts · 318 likes · 25.2K views
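The Verbalized Sampling trick described above is literally a prompt wrapper. A minimal sketch: build the VS prompt, then parse a verbalized distribution back out. The "(p=0.4)" output format here is an assumption for illustration; real model output varies and needs more robust parsing:

```python
def vs_prompt(task, k=5):
    """Wrap a task in the Verbalized Sampling instruction from the thread."""
    return (f"Generate {k} responses with their corresponding "
            f"probabilities. {task}")

def parse_verbalized(lines):
    """Parse lines like 'some joke (p=0.4)' into (text, probability) pairs."""
    parsed = []
    for line in lines:
        text, _, prob = line.rpartition("(p=")
        parsed.append((text.strip(), float(prob.rstrip(")"))))
    return parsed

print(vs_prompt("Tell me a joke."))
# Generate 5 responses with their corresponding probabilities. Tell me a joke.
print(parse_verbalized(["knock knock (p=0.6)", "a pun (p=0.4)"]))
# [('knock knock', 0.6), ('a pun', 0.4)]
```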
Alessandro Benigni reposted
Avi Chawla@_avichawla·
Microsoft did it again!

Building with AI agents almost never works on the first try. A dev has to spend days tweaking prompts, adding examples, hoping it gets better. This is exactly what Microsoft's Agent Lightning solves.

It's an open-source framework that trains ANY AI agent with reinforcement learning. Works with LangChain, AutoGen, CrewAI, the OpenAI SDK, or plain Python.

Here's how it works:
> Your agent runs normally with whatever framework you're using. Just add a lightweight agl.emit() helper or let the tracer auto-collect everything.
> Agent Lightning captures every prompt, tool call, and reward, and stores them as structured events.
> You pick an algorithm (RL, prompt optimization, fine-tuning). It reads the events, learns patterns, and generates improved prompts or policy weights.
> The Trainer pushes updates back to your agent. Your agent gets better without you rewriting anything.

In fact, you can also optimize individual agents in a multi-agent system. I have shared the link to the GitHub repo in the replies!
[image attached]
86 replies · 205 reposts · 1.3K likes · 110.9K views
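The capture step in the thread above (prompts, tool calls, and rewards recorded as structured events for a trainer to read) can be sketched in a few lines. The names here only mimic the agl.emit() idea; they are not Agent Lightning's real API:

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Toy stand-in for the structured event store a trainer would consume."""
    events: list = field(default_factory=list)

    def emit(self, kind, **data):
        """Record one structured event (prompt, tool call, reward, ...)."""
        self.events.append({"kind": kind, **data})

log = EventLog()
log.emit("prompt", text="summarize the ticket")
log.emit("tool_call", name="search", args={"q": "ticket 42"})
log.emit("reward", value=1.0)
print([e["kind"] for e in log.events])  # ['prompt', 'tool_call', 'reward']
```

A trainer loop would then read `log.events`, score trajectories by their reward events, and emit improved prompts or weights back to the agent.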