Alessandro Benigni

416 posts


@itsbenigni

Co-Founder & CMO at https://t.co/zjkdUzAMrV

New York · Joined May 2025
176 Following · 160 Followers
Alessandro Benigni reposted
Emir Atli@emiratli_·
2026 is a tough year for guys named Claude
[image attached]
159 replies · 737 reposts · 18.4K likes · 581.4K views
Alessandro Benigni@itsbenigni·
Everyone is using AI to write code faster, but it usually ends up as an unstructured mess. If you use Claude Code, you need to check out Arness. It's an open-source tool that forces your AI to act like a Senior Engineering Team. Instead of just guessing, it uses a pipeline:
📝 Specs before code
🗺️ Plans before execution
🔎 Reviews before shipping
It has 3 plugins handling Idea ➡️ Dev ➡️ Deployment. The craziest part? It used its own 134 AI agents to build itself from scratch. 🤯 Check it out (not my project; I tried it and was blown away). Link in the comments:
[image attached]
1 reply · 1 repost · 2 likes · 118 views
Alessandro Benigni@itsbenigni·
@cryptopunk7213 The only thing the West has more of than others at this point is HYPE, the only thing they're good at. And to be honest, even if it were true, spending money on anything aside from cleaning up their streets of homelessness and drug addiction should be seen as shameful.
0 replies · 0 reposts · 1 like · 627 views
Ejaaz@cryptopunk7213·
so this is the most insane use of ai i've ever seen. the CIA rescued a soldier by detecting his heartbeat from 40 MILES AWAY. used AI to drown out other heartbeats, then literal diamonds to detect his heartbeat's electromagnetic fingerprint...

the numbers don't make sense: at just 10 centimeters away, a heart's magnetic field is barely detectable. at 1 meter? 1/1000th of that. this fucking device did this at 40 miles.

lockheed martin created it in their secret advanced development division. AI drowned out the heartbeats of every human, dog, animal to achieve this. the officer named 'dude 44 bravo' evaded capture for 36 hours, was located, and survived.
24 replies · 7 reposts · 128 likes · 26.3K views
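The 10 cm vs. 1 m figure in the tweet is a straight dipole-field calculation: magnetic dipole fields fall off as 1/r³, so 10x the distance means 10³ = 1000x weaker. A tiny Python sketch of that arithmetic (the 40-mile extrapolation below is just the same cube law applied further out, not a claim about any device):

```python
def dipole_falloff(r_near_m, r_far_m):
    """Relative weakening of a dipole (1/r^3) field between two distances."""
    return (r_far_m / r_near_m) ** 3

# 10 cm -> 1 m: 10x the distance, so 1000x weaker -- the tweet's "1/1000th"
print(dipole_falloff(0.1, 1.0))  # 1000.0

# 10 cm -> 40 miles (~64,374 m): roughly 2.7e17 times weaker
print(f"{dipole_falloff(0.1, 40 * 1609.34):.1e}")
```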
Alessandro Benigni reposted
Sudo su@sudoingX·
last time the openclaw founder said open models aren't there yet. now he's saying local models on consumer hardware are the issue.

this is not someone who cares about open source speaking. this is someone with a corporate paycheck channeling you toward their subscription, because every local AI install is a subscription they lose.

he picked a year-old hermes model against a new agent harness and called it not ready. that didn't come from proper testing. it came from watching nousresearch grow exponentially while his project bleeds relevance. a founder who left his project without seeing its full potential is arguing with a founder who is still grinding day and night for open source.

nousresearch is open source head to toe. the models, the harness, the memory system, everything. and it's winning. that's what panics them.

while corporate salesmen say local models on consumer hardware aren't there, i just published an article where a 27B dense model on a $900 consumer GPU one-shotted a task that a 120B model on $70K enterprise hardware could not complete in 3 tries. through hermes agent. every number is in the article.

you decide which side you want to be on: the side that mines every bit of your thinking and profits off it, or the side where you own your compute and your cognition. don't let corporate salesmen disguised as information lead you to their subscription page.
Peter Steinberger 🦞@steipete

@Teknium @jrswab Ya know local models on consumer hardware are the issue, like Hermes doesn’t work with Hermes 🙃

65 replies · 46 reposts · 802 likes · 83.2K views
Alessandro Benigni reposted
Sudo su@sudoingX·
you see this? this is what they're panicking about. hermes agent usage went vertical in 10 days. builders are dropping openclaw bloat and the numbers don't lie. every bar on this chart is someone who tested hermes agent and never went back.

they're losing users. and when you're losing users you attack whatever you can find. a year-old hermes model that was released before the agent harness even existed. that's the best they could pick. the nous team was openly saying hermes models aren't optimized for hermes agent yet and they're working on it. he grabbed it anyway.

some founders choose their project. some choose the paycheck. and the ones who chose the paycheck are now spending their time attacking the ones who stayed. that tells you everything about where the momentum is.

nousresearch ships the models and the harness. fully open source head to toe. the community decides who wins this and the community is already deciding. don't let someone with a corporate paycheck redirect you to a subscription page. test it yourself. the data is yours to verify.
[image attached]
30 replies · 36 reposts · 540 likes · 39.4K views
Alessandro Benigni@itsbenigni·
Just created n8n Workflow Builder. 7 commands. 1,396 nodes. 22 workflow patterns. Zero guesswork. Type what you want in plain English → get a deployed, validated workflow. But that's not even the best part. 🧵

1/ Claude IS your AI brain. Not an API call. Not a token charge. Claude reasons, scores, classifies, writes; @n8n_io just handles the mechanical work. $0 API cost for agent workflows.

2/ Claude-in-the-Middle: one n8n workflow execution. Claude is a processing node INSIDE IT. Workflow runs → pauses → Claude analyzes → POSTs back → workflow resumes with Claude's intelligence merged in. One execution ID. $0.

3/ A 75MB SQLite database ships with the plugin. Every n8n node, pre-tagged by intent. Say "send notification" → instantly finds Slack, Gmail, Telegram, Discord, Teams + 40 more. Zero tokens burned. Zero MCP round-trips. Plus, new custom scripts (nodes) we create are added to the same database (separate table) with tags for future retrieval and use.

4/ Paste your API key in the chat. The plugin creates the credential in n8n via REST API. No browser switching. No manual config. Same for any credential a node requires.

5/ 7 commands: /n8n (build), /n8n-agent (@claudeai as brain), /n8n-test (assertions), /n8n-docs (auto-docs), /n8n-audit (security A-F grade), /n8n-manage (lifecycle), /n8n-browse (explore 1,396 nodes).

Link in the replies:
[image attached]
2 replies · 1 repost · 2 likes · 159 views
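The intent-tagged lookup described in point 3 above can be sketched with plain SQLite. Everything below (table name, columns, tags) is a guess at the shape of such a database, not the plugin's actual schema:

```python
import sqlite3

# Toy stand-in for the plugin's node database: n8n nodes pre-tagged by
# intent, so "send notification" resolves locally with zero tokens spent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, intent TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("Slack", "send notification"),
    ("Gmail", "send notification"),
    ("Telegram", "send notification"),
    ("Postgres", "store data"),
])

def find_nodes(intent):
    """Return every node name tagged with the given intent."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE intent = ? ORDER BY name", (intent,))
    return [name for (name,) in rows]

print(find_nodes("send notification"))  # ['Gmail', 'Slack', 'Telegram']
```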
Alessandro Benigni@itsbenigni·
@jonallie Totally agree. We're shifting from being the brain behind AI orchestration to providing the right tools for AI to orchestrate workflows.
0 replies · 0 reposts · 2 likes · 355 views
jon allie@jonallie·
Personal rule of thumb: don't use an LLM for something that a deterministic program can do. I get it, LLMs are exciting, but they don't mean that software ceases to exist. They are fantastic at dealing with human language and ambiguity, but terrible (by design and for good reason) at repeatability.

To borrow terminology from the book Thinking, Fast and Slow, LLMs are "System 2": slower, more "expensive" (for LLMs, both in time and dollars), but flexible and creative. Traditional programs are "System 1": fast and cheap, but inflexible and dumb.

Instead of trying to put an LLM in the hot loop of your program, it's usually worth asking an agent to write a deterministic program to do the thing you need done. Since code is cheap, this deterministic tool can do exactly what you want it to, and doesn't consume tokens on every execution.

(This applies to agents too. I find myself regularly yelling at Claude to stop repeatedly generating the same 30 lines of Python to inspect a file, and instead telling it to generate a 3-line shell script wrapper around jq that it can check in and call repeatedly.)
86 replies · 110 reposts · 1.1K likes · 97.5K views
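jon allie's closing point (check in a small deterministic inspector once instead of regenerating the same code every run) translates directly to code. A Python stand-in for his jq-wrapper example; the file contents are made up for the demo:

```python
import json
import tempfile

def top_level_keys(path):
    """Deterministic helper: list a JSON file's top-level keys, sorted.
    Costs zero tokens per call, unlike regenerating inspection code."""
    with open(path) as f:
        return sorted(json.load(f))

# demo on a throwaway file standing in for whatever the agent inspects
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"users": [], "version": 2, "config": {}}, f)

print(top_level_keys(f.name))  # ['config', 'users', 'version']
```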
Alessandro Benigni reposted
goodfuture@0xGoodfuture·
every skeleton screen you've ever hand-coded is a waste of time. you're literally measuring padding and guessing widths to build a worse version of a layout that already exists in your DOM. so I made a package that just reads the real one
185 replies · 371 reposts · 6.3K likes · 762.5K views
Alessandro Benigni reposted
Sudo su@sudoingX·
google just dropped gemma 4 while i'm in the middle of testing nvidia and alibaba's flagships on 2x H200. perfect timing.

this 31B thinking model is beating qwen 122B and deepseek v3.2 on elo. the 26B variant has only 4B active. that fits on a phone.

it's almost 2am and i just published nvidia's octopus invaders results. qwen is loading next. but this is going on the queue immediately.

if you already ran it, drop your numbers below: model, quant, hardware, inference engine, tok/s. i want to see what this thing does before i get my hands on it.
Google Gemma@googlegemma

Meet Gemma 4! Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license. We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable open models yet: 👇

36 replies · 5 reposts · 315 likes · 31.5K views
Alessandro Benigni reposted
Chris Tate@ctatedev·
Introducing render-json, the Generative JSON framework.
1. Point it at anything
2. It generates JSON
3. That's it
Apps, games, and more. If it exists, it can be converted into a JSON spec.
npm i @json-render/render-json
64 replies · 186 reposts · 3.1K likes · 192.5K views
Alessandro Benigni reposted
Sharbel@sharbel·
🚨 IMPORTANT: Google quietly open-sourced a time-series AI that predicts anything. Sales trends. Market prices. User traffic. Energy demand. Crypto volatility. It's called TimesFM.

Here's why it's underrated:
→ Pre-trained on 100B real-world data points
→ Zero-shot forecasting, no fine-tuning needed
→ Outperforms supervised models trained on your specific data
→ Runs locally. Free. Apache license.

Most people are focused on language models. The quietly powerful ones are learning to predict the future.
[image attached]
83 replies · 412 reposts · 3.5K likes · 296.5K views
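"Zero-shot forecasting" in the tweet above just means the model takes a context window and emits a horizon of predictions with no fitting on your series. A trivial seasonal-naive baseline makes that interface concrete; this is a stand-in for the shape of the task, not TimesFM's actual API:

```python
def seasonal_naive(context, horizon, season=7):
    """Zero-shot baseline: forecast by repeating the last observed season."""
    last_season = context[-season:]
    return [last_season[i % season] for i in range(horizon)]

# four weeks of daily data, then forecast the next three days
history = [10, 12, 14, 13, 11, 9, 8] * 4
print(seasonal_naive(history, horizon=3))  # [10, 12, 14]
```

Real zero-shot models like TimesFM replace the repeat-last-season rule with a pre-trained network, but the in/out contract (context in, horizon out, no training step on your data) is the same.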
Alessandro Benigni reposted
Noah@NoahKingJr·
Me to Claude: "Make no mistakes. DO NOT HALLUCINATE. YOU ARE AN EXPERT SOFTWARE ENGINEER"
112 replies · 1K reposts · 10K likes · 613.8K views
Alessandro Benigni reposted
le.hl@0xleegenz·
My manager watching me leave work at 5 PM when my shift ends at 5 PM
306 replies · 5.9K reposts · 84.9K likes · 4.2M views
Alessandro Benigni reposted
Sukh Sroay@sukh_saroy·
🚨 Breaking: Stanford researchers built a new prompting technique! By adding ~20 words to a prompt, it:
- boosts LLM creativity by 1.6-2x
- raises human-rated diversity by 25.7%
- beats fine-tuned models without any retraining
- restores 66.8% of an LLM's lost creativity after alignment

Let's understand why and how it works:

Post-training alignment methods like RLHF make LLMs helpful and safe, but they unintentionally cause mode collapse. This is where the model favors a narrow set of predictable responses.

This happens because of typicality bias in human preference data: when annotators rate LLM responses, they naturally prefer answers that are familiar, easy to read, and predictable. The reward model then learns to boost these "safe" responses, aggressively sharpening the probability distribution and killing creative output.

But here's the interesting part: the diverse, creative model isn't gone. After alignment, the LLM still has two personalities: the original pre-trained model with rich possibilities, and the safety-focused aligned model.

Verbalized Sampling (VS) is a training-free prompting strategy that recovers the diverse distribution learned during pre-training. The idea is simple: instead of prompting "Tell me a joke" (which triggers the aligned personality), you prompt: "Generate 5 responses with their corresponding probabilities. Tell me a joke."

By asking for a distribution instead of a single instance, you force the model to tap into its full pre-trained knowledge rather than defaulting to the most reinforced answer.

Results show Verbalized Sampling enhances diversity by 1.6-2.1x over direct prompting while maintaining or improving quality. Variants like VS-based Chain-of-Thought and VS-based Multi push diversity even further.

You can find the paper link in the next tweet. 👉

Over to you: what other methods can be used to improve LLM diversity?
[image attached]
20 replies · 77 reposts · 318 likes · 25.2K views
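The Verbalized Sampling trick described above is literally a prompt wrapper. A minimal sketch: build the VS prompt, then parse a verbalized distribution back out. The "(p=0.4)" output format here is an assumption for illustration; real model output varies and needs more robust parsing:

```python
def vs_prompt(task, k=5):
    """Wrap a task in the Verbalized Sampling instruction from the thread."""
    return (f"Generate {k} responses with their corresponding "
            f"probabilities. {task}")

def parse_verbalized(lines):
    """Parse lines like 'some joke (p=0.4)' into (text, probability) pairs."""
    parsed = []
    for line in lines:
        text, _, prob = line.rpartition("(p=")
        parsed.append((text.strip(), float(prob.rstrip(")"))))
    return parsed

print(vs_prompt("Tell me a joke."))
# Generate 5 responses with their corresponding probabilities. Tell me a joke.
print(parse_verbalized(["knock knock (p=0.6)", "a pun (p=0.4)"]))
# [('knock knock', 0.6), ('a pun', 0.4)]
```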
Alessandro Benigni reposted
Avi Chawla@_avichawla·
Microsoft did it again!

Building with AI agents almost never works on the first try. A dev has to spend days tweaking prompts, adding examples, hoping it gets better. This is exactly what Microsoft's Agent Lightning solves.

It's an open-source framework that trains ANY AI agent with reinforcement learning. Works with LangChain, AutoGen, CrewAI, the OpenAI SDK, or plain Python.

Here's how it works:
> Your agent runs normally with whatever framework you're using. Just add a lightweight agl.emit() helper or let the tracer auto-collect everything.
> Agent Lightning captures every prompt, tool call, and reward, and stores them as structured events.
> You pick an algorithm (RL, prompt optimization, fine-tuning). It reads the events, learns patterns, and generates improved prompts or policy weights.
> The Trainer pushes updates back to your agent. Your agent gets better without you rewriting anything.

In fact, you can also optimize individual agents in a multi-agent system. I have shared the link to the GitHub repo in the replies!
[image attached]
86 replies · 205 reposts · 1.3K likes · 110.9K views
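The capture step in the thread above (prompts, tool calls, and rewards recorded as structured events for a trainer to read) can be sketched in a few lines. The names here only mimic the agl.emit() idea; they are not Agent Lightning's real API:

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Toy stand-in for the structured event store a trainer would consume."""
    events: list = field(default_factory=list)

    def emit(self, kind, **data):
        """Record one structured event (prompt, tool call, reward, ...)."""
        self.events.append({"kind": kind, **data})

log = EventLog()
log.emit("prompt", text="summarize the ticket")
log.emit("tool_call", name="search", args={"q": "ticket 42"})
log.emit("reward", value=1.0)
print([e["kind"] for e in log.events])  # ['prompt', 'tool_call', 'reward']
```

A trainer loop would then read `log.events`, score trajectories by their reward events, and emit improved prompts or weights back to the agent.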