NDIF

114 posts

@ndif_team

The National Deep Inference Fabric, an NSF-funded computational infrastructure to enable research on large-scale Artificial Intelligence. https://t.co/STsQ707an3

Boston, MA · Joined May 2024
0 Following · 490 Followers
NDIF @ndif_team
Submit a clean write-up as a Colab notebook by end of month. All code and training scripts are open source: github.com/andyrdt/puzzles We'll highlight and showcase our favorite submissions!
NDIF @ndif_team
Directly inspired by @calsmcdougall's ARENA challenges, these are meant to be educational and fun — a way to practice mech interp on models small enough to fully understand. It's also a great excuse to try out nnsight (nnsight.net) for poking around model internals!
NDIF @ndif_team
📣 Launching monthly interp puzzles 🧩 Each month: a model trained on a toy task. Your job: reverse-engineer the algorithm it learned. First puzzle: how does a 1-2L attn-only transformer find the max of a list? Starter Colab included. Deadline: April 30 puzzles.baulab.info
NDIF @ndif_team
🚨 You have one week left to submit your red team proposal for our hackathon! Due 3/31: cadenza-labs.github.io/red-team-rfp For inspiration, check out our blog post on what our team plans to implement: nnsight.net/blog/2026/03/2…
David Bau @davidbau

Calling attention to an exciting "deception detection" hackathon we're planning this summer! w @NDIF and @CadenzaLabs. Recruiting red teams now, blue teams later. Red teams, time is short: proposals due Mar 31. $10K stipend + compute, $15K finals prize. nnsight.net/blog/2026/03/2…

NDIF @ndif_team
📣 Try NNsight 0.6 today! Includes 2.4–3.9x speed boost, vLLM improvements, remote source code serialization (greater flexibility!), native Python debugging, vision and diffusion model built-ins, and more! Read the blog post: nnsight.net/blog/2026/02/2…
Jaden Fiotto-Kaufman @jadenfk23

NNsight 0.6 is out now! We directly address your feedback in our biggest release yet. Pain points included cryptic errors, slow traces, no remote execution of custom code, and limited vLLM support. We tackle all of these and more in this new release. 🧵 Here's what changed:

NDIF @ndif_team
Red teams trained a model with a secret objective by exploiting RLHF reward models. Blue teams then audited the model, using techniques such as interpretability with sparse autoencoders, behavioral attacks, and training data analysis to successfully uncover the hidden objective.
NDIF @ndif_team
Watch @saprmarks present his work "Auditing Language Models for Hidden Objectives" in our new YouTube video! Sam's team ran a blind auditing game to assess the efficacy of black-box and white-box techniques for LLM alignment auditing. 🔗 youtu.be/jZiOJTHqB6M
NDIF @ndif_team
Big thanks to the whole organizing team, especially @NeelNanda5 and @andyarditi, for hosting such a great workshop and inviting us to speak!
NDIF @ndif_team
New YouTube video posted! @FeuerBenjamin discusses LLM's annus mirabilis, presenting his work on open questions surrounding LLM judges, benchmark trustworthiness, and maximizing the potential of synthetic data. Watch here: youtube.com/watch?v=pehcEd…
NDIF @ndif_team
Check out Interpreto, a new AI interpretability toolbox! Interpreto provides user-friendly tooling for interpretability methods and evaluation, and uses NNsight under the hood to collect activations for its concept evaluation framework.
Antonin Poché @Antonin_Poche

🔥I am super excited for the official release of an open-source library we've been working on for about a year! 🪄interpreto is an interpretability toolbox for HF language models🤗. In both generation and classification! Why do you need it, and for what? 1/8 (links at the end)

NDIF @ndif_team
New year, new YouTube videos! We are resuming our regular interpretability seminar posts, with a fantastic talk by @deeptigp on interpretability in diffusion models. Watch the video: youtu.be/4eqvABPX5rA
NDIF @ndif_team
nnterp by @Butanium_ is now part of the NDIF ecosystem! nnterp standardizes transformer naming conventions, includes built-in best practices for common interventions, and is perfectly compatible with original HF model implementations. Learn more: ndif-team.github.io/nnterp/