Ettore Di Giacinto

2.7K posts

@mudler_it

dad, creator of LocalAI (https://t.co/ReVYw5Pf4D) and Kairos (https://t.co/R6M51FYVs7), ex @SUSE/@Rancher, ex-Gentoo Dev.

Italy · Joined January 2016

256 Following · 3K Followers

Pinned Tweet

Ettore Di Giacinto@mudler_it·11 May

LocalAI ( @LocalAI_API ) 4.2.0 is out, just a few numbers and facts:
- +392 commits (we squash these 😄)
- +11 backends: voice and face recognition, vibevoice.cpp (from me), LocalVQE from @jichiep and, among others, @sgl_project, @__tinygrad__, @no_stp_on_snek's Turboquant, ik_llama.cpp, sam.cpp from @el_PA_B
- Many new QoL improvements, increased sglang and vLLM support, and hardening of distributed mode
- 16+ new contributors! Thanks to the community!
LocalAI is all about giving you the flexibility to run the latest from the community, and ds4 support from @antirez is on its way! This is the year of Local AI!

7.1K

Ettore Di Giacinto@mudler_it·1d

@outsource_ That's basically @LocalAI_API!

2.7K

Eric ⚡️ Building...@outsource_·1d

I have this thesis to build a tool that detects your hardware -> select your use case -> auto loads a swarm of agents on your hardware Maximum potential to get 100% utilization To load multiple models across your stack. It’s not about what model you can load, it’s about how MANY!

0xSero@0xSero

I think the play is to have multiple tiers of hardware, each for a different use-case. Instead of focusing on having 1 huge model that can do it all, we can have vision, audio, media gen, agent, recursive problem solving, security etc., all small and focused. Cheaper better

3.4K

Ettore Di Giacinto@mudler_it·2d

@lucasmeijer And grammars as well are pretty common for this

Ettore Di Giacinto@mudler_it

Seems @LocalAI_API functions (OpenAI functions equivalent) are getting closer!🔥 My weekend project: playing with github.com/ggerganov/llam…

825

Ettore Di Giacinto@mudler_it·2d

@lucasmeijer Bah, maybe I'm missing something, but this is exactly how everything started with agents almost 3 years ago

Ettore Di Giacinto@mudler_it

So, I'm playing with something.. and.. almost alive! 🧪Experiment: can you make 100% local (that works on CPU and on GPU) something like Bing or AutoGPT and works as well with OpenAI APIs? Seems you can! a small 🧵👇

1.7K

Lucas Meijer@lucasmeijer·3d

Turns out if your agent and your inference are in the same process you can do some cool tricks. He basically,

antirez@antirez

I finally found *the* solution I wanted to the old/new editing problem. And it is a solution that at the same time works extremely well, is quite elegant I believe, and can't be implemented if you don't build something like DwarfStar. Thread (but check [upto] in the screenshot).

186

49.2K

Ettore Di Giacinto@mudler_it·3d

@rfleury Nothing new, forks and renames / refactors used to be a thing too

398

Ryan Fleury@rfleury·3d

Open source will be greatly diminished due to mass obfuscated license infringement. Your licenses may as well be letters to Santa. The path forward will be paywalls and codesharing within curated communities.

Ray@raysan5

Wow! It seems somebody already vibe-coded a game-engine with an extremely similar API to raylib... unfortunately no mention to raylib, at all... 😓

99.1K

Ettore Di Giacinto@mudler_it·3d

It definitely is. Been working on this since 2023

Sandro@pupposandro

Local AI is still too complex for 99% of people and the fix isn't better kernel tutorials or cleaner docs. it's solving the entire software stack Local LLMs are still waiting for their plug-and-play moment. Exactly what we're building @luceboxai for

5.6K

Ettore Di Giacinto@mudler_it·3d

@ngiocoli Or, with even more foresight, we could give more opportunities to what we have in our own country instead of favoring big American corps, with a monopoly that is very risky because there is currently a crazy imbalance in the AI field.

227

Nicola Giocoli@ngiocoli·4d

A government with even minimal foresight would try to convince Amodei to turn a simple office into a research center. Giving him carte blanche on how to do it. Accompanying the offer with ad hoc tax breaks (and negotiating with the EU so it doesn't make a fuss over state aid).

Seb Johnson@SebJohnsonUK

Anthropic is doubling down on Europe and opening an office in Milan. Europe is Anthropic's fastest growing region with revenue up 9x YoY, while Milan is the centre of Italian tech. Anthropic now has hubs in London, Dublin, Zurich, Paris, Munich and Milan. It's pushing more and more into Europe as its relationship with the US continues to sour. Great stuff for Italy!

593

58.9K

Ettore Di Giacinto@mudler_it·5d

@lu_zero_ @spacemit_riscv Let's see :)

Luca Barbato@lu_zero_·5d

@spacemit_riscv @mudler_it shall we see if the APEX quantization plays nicely with this?

367

SpacemiT@spacemit_riscv·19 May

Just merged: SpacemiT K3 IME2 support is now upstream in llama.cpp (ggml CPU backend). 1. IME2 instructions for SpacemiT K3 2. Native quantized formats: Q2_K → Q8_0, efficient 4-bit ops for Q4 models 3. TCM access: first-time public interface with usage examples This upstream patch adds K3’s IME2-based acceleration in llama.cpp, providing a maintainable path for developers to leverage hardware-optimized kernels. Details: github.com/ggml-org/llama… 🔧 Detailed technical walkthrough and examples are coming soon — stay tuned. #RISC_V #llamacpp #AI #BackendDev #OpenSource

1.2K

Ettore Di Giacinto@mudler_it·6d

@KhazAkar @TheAhmadOsman @LocalAI_API Thanks for the shout out! Would be cool if @TheAhmadOsman had a look at it, but he probably hasn't heard about it, as he joined the local llama bandwagon late

Damian Z. 💛🇪🇺@KhazAkar·6d

@TheAhmadOsman I would swap pure llama.cpp into @LocalAI_API by @mudler_it , unless you have HW like RISC-V/ARM/Other odd setup - why? You get most of that portability of llama.cpp, with a possibility to run orchestration for 2+ PC over the network for distributed inference. Easier scaling etc

105

Ahmad@TheAhmadOsman·6d

DROP EVERYTHING
The bible for running LLMs locally is now available online to read for free
Covers what to use on
- Laptop / edge / odd hardware
- Mac-first workflows
- Single RTX GPUs
- 2-4+ NVIDIA / CUDA GPUs
- General production serving
- Long-context / MoE / routing
- NVIDIA max performance
- Cluster orchestration
Software
- llama.cpp
- MLX / MLX-LM
- ExLlamaV2
- ExLlamaV3
- vLLM
- SGLang
- TensorRT-LLM
- NVIDIA Dynamo
You should read this, and if you cannot now then you most definitely wanna bookmark it for later
Local AI FTW

Ahmad@TheAhmadOsman

x.com/i/article/2057…

238

1.8K

236.8K

Ettore Di Giacinto@mudler_it·20 May

Binance looks so hot in security rn

CZ 🔶 BNB@cz_binance

If you have API keys in your code, even private repos, now is the time to double check and change them...

Ettore Di Giacinto retweeted

Richard Palethorpe@jichiep·20 May

The idea that LLMs produce slop and are therefore useless is as dangerous to creators as the idea that because your vibe coded OS works in some capacity, the code base is therefore a sound foundation. While the unconstrained output of LLMs is dangerous, when you restrict the output so that it can only make errors that you have a method of recovering from, you move far, far quicker. Not as fast as people may naively think from a weekend vibe coding, but still very fast and potentially better quality because covering every edge case is far cheaper (as with fuzzing). What's dangerous is when you don't know what kind of error the system can produce or be subject to. There are a lot of people in AI using FOMO and alarm to get attention and I'm in AI, but I have to call it as I see it and there is very high risk to companies and projects where the failure modes of the product are well known and the agentic coding system can be suitably constrained so that the system identifies errors and can "learn" from them, reverse them or only suffers from harmless errors the user can tolerate. A lot of companies and individuals are failing to identify the ways in which LLMs and in particular LLM coding agents can go wrong and this is a public safety issue. We don't fully understand what happens when software complexity increases and some types of software are just not well understood at all. However you have to look at what you are creating and ask if that's really your present project?

1.1K

Ettore Di Giacinto@mudler_it·20 May

Congratulations on the release, @allen_ai! This is what real OSS AI looks like, and should look like. I hope they can be an inspiration for many. Other companies claiming to do OSS AI can't compare to the openness that @allen_ai constantly shows. Yes, they do indeed publish pre-training datasets. Thank you from the community!

Ai2@allen_ai

Today we’re releasing OlmoEarth v1.1. It’s 3x cheaper to run than v1 while delivering the same state-of-the-art performance—and fully open. 🧵

1.6K

Ettore Di Giacinto@mudler_it·19 May

woah!

Andrej Karpathy@karpathy

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

524

Ettore Di Giacinto retweeted

Richard Palethorpe@jichiep·19 May

OBS now has realtime echo cancellation and noise suppression via the new LocalVQE plugin! As far as I know there isn't another AI solution which combines echo cancellation with noise suppression for OBS?

802

Ettore Di Giacinto@mudler_it·18 May

Go go llama.cpp!

Georgi Gerganov@ggerganov

llama.cpp adds MTP for the Qwen3.6 family This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! github.com/ggml-org/llama…

1.4K

Ettore Di Giacinto@mudler_it·18 May

@oppollo11 You can check benchmarks here github.com/mudler/apex-qu…

Oppollo@oppollo11·18 May

@mudler_it Any benchmarks on how it compares to the base model?

Ettore Di Giacinto@mudler_it·17 May

New APEX-MTP quant drop! Qwen3.6-35B-A3B-APEX-MTP-GGUF 👇HF link

8.1K

Ettore Di Giacinto@mudler_it·17 May

@Italianclownz @no_stp_on_snek Thank you! Really appreciated!

Carlo@Italianclownz·17 May

There are a few people in this AI community I would like to really thank for all they do. The first person is @no_stp_on_snek . He has worked in his free time to make AI more accessible to a majority of people with TurboQuant and his vLLM Swift project. Not only has he done these projects but he has written on his thoughts and ideas. I would like to thank @UnslothAI for their dedication to AI and making great quants. Giving the community the tools to make AI more accessible. @mr_r0b0t for his contributions in the AI space and posting results. @mudler_it for giving us some great options in model choices. I see a lot of great APEX models being released. @morganlinton for being a positive member, contributing to the space, eager and happy to put models through their long paces. Probably had the longest /goal I have ever seen. And @Scobleizer who is always pointing out more amazing people in the AI space who are contributing to projects and building with new ideas. I just want to let you all know I appreciate all of you. There are many others but the Xeet would just go on and on. Sorry for this tweet with all the tags but I hope people that see this and follow me, also follow all of you because of all the great work you just keep doing.

1.1K

Ettore Di Giacinto@mudler_it·17 May

@maxgreco Can't tell as I don't own that hardware, but let us know how it goes!

maxgreco@maxgreco·17 May

@mudler_it Thanks a lot, are there also speed improvements for Pascal-generation Nvidia cards?

253

Ettore Di Giacinto@mudler_it·17 May

@cursedrobot APEX was developed with the help of Claude. It's not vibe science, but Claude helps a lot with automation and with keeping long quant/benchmark runs going.

Id est@cursedrobot·17 May

@mudler_it Gotta ask, how exactly did you develop your APEX method? I see Claude in contributors, does it mean it is vibe-science?

177

Ettore Di Giacinto@mudler_it·17 May

@_Suresh2 @LocalAI_API Quantization is supposed to be stable, I wouldn't expect major differences

Suresh@_Suresh2·17 May

@mudler_it @LocalAI_API hope the re-quant preserves the original blend's perplexity

103

Ettore Di Giacinto@mudler_it·17 May

MTP just merged in llama.cpp, meanwhile:
- currently re-quantizing all APEX quants with MTP, stay tuned as they land in my HF profile
- tagged a new @LocalAI_API release with MTP support: importing models with embedded drafter will automatically configure best settings. Nothing to do. Just click and go at max speed.

Ettore Di Giacinto@mudler_it

Just got merged github.com/ggml-org/llama…
