


Nicolas Pinto
3.2K posts

@npinto
Stealth (Safe/Decentralized AI). Prev: Private AI @ Perceptio (acq by Apple), Scientist/Lecturer @ MIT+Harvard. Music Producer. Engineer. Angel Investor.











LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM pypi release 1.82.8. It has been compromised, it contains litellm_init.pth with base64 encoded instructions to send all the credentials it can find to remote server + self-replicate. link below


Microsoft Threat Intelligence has observed threat actors actively experimenting with techniques to bypass or “jailbreak” AI safety controls. By reframing malicious requests, chaining instructions across multiple interactions, and misusing system‑ or developer‑style prompts, threat actors can coerce models into generating restricted content that bypasses built‑in safeguards. These techniques demonstrate how generative AI models are probed, shaped, and redirected to support reconnaissance, malware development, and social engineering while minimizing friction from moderation. AI guardrails have become dynamic surfaces that attackers test and manipulate to sustain operational advantage. As AI becomes more deeply embedded in enterprise workflows, understanding how attackers test and manipulate these guardrails is critical for defenders. Learn more about securing generative AI models on Azure AI Foundry: msft.it/6013Qs5oX

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: github.com/MoonshotAI/Att…

Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)






