Nimit Kalra

199 posts

@qw3rtman

Research @Columbia with @MicahGoldblum (self-play, RL, reasoning, world models). Prev: @HaizeLabs @Citadel.

Joined October 2011
1.1K Following · 1.4K Followers
Nimit Kalra
Nimit Kalra@qw3rtman·
@YafahEdelman Very large knowledge updates that don't fit in-context (and it's hard/impossible to filter down to a good subset). Also I think it's not clear if/when in-context is actually as first-class as model weights.
Replies 1 · Reposts 0 · Likes 0 · Views 174
Yafah Edelman
Yafah Edelman@YafahEdelman·
I don't get why everyone is talking about continual learning so much. The original GPT-3 paper was all about how models can learn in-context, and they've only gotten better since then.
Replies 23 · Reposts 3 · Likes 86 · Views 9.1K
Alexander Panfilov
Alexander Panfilov@kotekjedi_ml·
New paper: We deploy Claude Code in an autoresearch loop to discover novel jailbreaking algorithms – and it works. It beats 30+ existing GCG-like attacks (with AutoML hyperparameter tuning). This is a strong sign that incremental safety and security research can now be automated.
Alexander Panfilov tweet media
Replies 49 · Reposts 207 · Likes 1.6K · Views 300.9K
Nimit Kalra
Nimit Kalra@qw3rtman·
coding agents that insert breakpoints for themselves
Replies 0 · Reposts 0 · Likes 1 · Views 334
Nimit Kalra retweeted
Leonard Tang
Leonard Tang@leonardtang_·
Hello MJ1: The World's TASTIEST Judge Model
Agent verification is the bottleneck to AI's progress. The field's ability to verify visual output lags far behind that of text, especially in matters of ~taste~. So we built the world's tastiest multimodal judge model, MJ1.
Leonard Tang tweet media
Replies 10 · Reposts 7 · Likes 63 · Views 10.9K
Nimit Kalra retweeted
Manya Wadhwa
Manya Wadhwa@ManyaWadhwa1·
⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇
Manya Wadhwa tweet media
Replies 4 · Reposts 43 · Likes 143 · Views 21.1K
Nimit Kalra retweeted
Greg Durrett
Greg Durrett@gregd_nlp·
Check out Manya's benchmark for LLM creativity! Inspired by work on creativity in graphs (@AdtRaghunathan's "roll the dice" paper), CREATE isolates testing of creative insights for discovery. Future: understand how LLMs derive insights & how they can be better creative partners!
Manya Wadhwa@ManyaWadhwa1

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇

Replies 0 · Reposts 13 · Likes 57 · Views 7.7K
Nimit Kalra retweeted
Ying Wang
Ying Wang@yingwww_·
What is a good latent space for world modeling and planning? 🤔 Inspired by the perceptual straightening hypothesis in human vision, we introduce temporal straightening to improve representation learning for latent planning. 📑: agenticlearning.ai/temporal-strai…
Ying Wang tweet media
Replies 29 · Reposts 129 · Likes 778 · Views 235.9K
Nimit Kalra retweeted
Justus Mattern
Justus Mattern@MatternJustus·
Proximal is hiring research scientist interns! Our research team is working on projects touching coding agents, long-horizon RL, and evals. We share our work publicly through papers and open-source releases (first ones coming soon!). DM me if interested :)
Replies 21 · Reposts 23 · Likes 463 · Views 99.8K
Nimit Kalra retweeted
CASEY
CASEY@caseykcaruso·
we love the AI scientific discovery category
i really believe llms are going to make insane future scientific discoveries
last yr ppl still wondered if llms were stupid next token predictors. glad we are over that.
dm me or @hunterharloff if you're building in the space
Sina@SinaHartung

this VC firm literally mapped out the entire AI space and no one is talking about it 🤯 huge alpha here for anyone looking to get into the space or understand the current sota of tech IMO

Replies 9 · Reposts 7 · Likes 136 · Views 19.1K
Nimit Kalra retweeted
Sina
Sina@SinaHartung·
this VC firm literally mapped out the entire AI space and no one is talking about it 🤯 huge alpha here for anyone looking to get into the space or understand the current sota of tech IMO
Sina tweet media
Replies 15 · Reposts 19 · Likes 415 · Views 68.3K
Nimit Kalra retweeted
Karina Nguyen
Karina Nguyen@karinanguyen·
Excited to release PostTrainBench v1.0! This benchmark evaluates the ability of frontier AI agents to post-train language models in a simplified setting. We believe this is a first step toward tracking progress in recursive self-improvement 🧵:
Replies 45 · Reposts 90 · Likes 677 · Views 148.2K
Nimit Kalra retweeted
Ken Liu
Ken Liu@kenziyuliu·
Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which users, like a "VPN for AI inference"? Yes! Blog post below + we built it into open source infra/chat app and served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it's surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurance to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they're just too good and convenient! (This is aka the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A "privacy wrapper" or "VPN for AI inference", so to speak. Concretely, it's a blind inference middle layer that:
(1) consists of decentralized proxies that anyone can operate;
(2) blindly authenticates requests (via blind signatures / RFC 9474, 9578) so requests are provably sandboxed from each other and from user identity;
(3) relays prompts over randomly chosen proxies that don't see or log traffic (via client-side ephemeral keys or hosting in TEEs); and
(4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor for per-request, anonymous, ephemeral AI access credentials (for users or agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn't a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly by mixing more users and requests. Unlinkability can be applied at any granularity. For an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not who you are or what you do with it. We think unlinkable inference is a first step towards this "intelligence neutrality".

# Try it out! It's quite practical

- Chat app "oa-chat": chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity
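The blind-authentication step can be sketched with textbook RSA blinding; this is a simplified illustration, not the RSABSSA scheme of RFC 9474 (which adds PSS padding and real key sizes), and all names and the demo-size primes are hypothetical:

```python
import secrets
from math import gcd

# Demo-size RSA parameters -- NOT secure, for illustration only.
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # signer's private exponent

def blind(msg: int):
    """Client: hide msg from the signer behind a random blinding factor r."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if gcd(r, n) == 1:
            break
    return (msg * pow(r, e, n)) % n, r

def sign_blinded(blinded: int) -> int:
    """Signer: issues the credential without ever seeing msg itself."""
    return pow(blinded, d, n)

def unblind(s_blinded: int, r: int) -> int:
    """Client: strip r to obtain a valid signature on the original msg."""
    return (s_blinded * pow(r, -1, n)) % n

msg = 1234  # e.g. a hash of an ephemeral access credential, reduced mod n
blinded, r = blind(msg)
sig = unblind(sign_blinded(blinded), r)
assert pow(sig, e, n) == msg  # anyone can verify the signature on msg
```

The point relevant to unlinkability: the signer sees only `blinded`, so when `(msg, sig)` is later presented at inference time, it cannot be correlated with the signing request.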
Ken Liu tweet media
Replies 62 · Reposts 155 · Likes 829 · Views 377.1K
Nimit Kalra
Nimit Kalra@qw3rtman·
@tejasmanohar Game changing workflow 🙌 pulling from a custom content library is a huge unlock
Replies 0 · Reposts 0 · Likes 2 · Views 140
Nimit Kalra retweeted
Tejas Manohar
Tejas Manohar@tejasmanohar·
Content (!!) in Hightouch is happening. I'm really pumped about this new feature 🔥
The core idea is something we call "content assembly": ask for a campaign and we'll make it using all your existing creative across your tools like Figma, Adobe, Dropbox, etc.
Replies 5 · Reposts 11 · Likes 34 · Views 3.4K