Harit Vishwakarma

274 posts

@harit_v

Postdoc@Oxford, LLM Reliability and Data-centric AI, prev. @WisconsinCS, @iiscbangalore, @IBMResearch.

Oxford, UK · Joined December 2012
1.7K Following · 569 Followers
Harit Vishwakarma retweeted
Aniket Rege @wregss:
Hi ML Twitter! My Summer 2026 internship unfortunately fell through last minute 😵‍💫 If your team is looking for interns, I’d love to connect - RTs appreciated 🙏 My website: aniketrege.github.io
Harit Vishwakarma retweeted
Yee Whye Teh @yeewhye:
Looking for a senior postdoc working in the diffusions & flows space @OxfordStats @oxcsml , particularly with interests in reward steering and inverse problems. 15 months in first instance, extendable. Please reach out if you have questions and apply: my.corehr.com/pls/uoxrecruit…
Harit Vishwakarma retweeted
Gabe Orlanski @GOrlanski:
GPT-5.3 Codex and Opus 4.6 are a clear jump in capability. On SCBench, both exceed 50% on core tests. Meaningful improvement over prior models. 5.3 is ~54% faster than its predecessor. 4.6 nearly doubles the runtime over 4.5. The code is still a huge mess when unsupervised.
Harit Vishwakarma retweeted
Fred Sala @fredsala:
We’ve made huge strides in model & agent capability. Now it’s time to scale up measurement. We’re excited to support open benchmarks that capture every aspect of the brave new agentic world: complexity, long horizon, autonomy, and rich outputs. Work with us to make it happen!
Quoting vincent sunn chen @vincentsunnchen:

x.com/i/article/2021…

Harit Vishwakarma @harit_v:
I want to go with this same analogy, but it breaks a bit due to a lack of reliability. I could trust gcc blindly: if there is a bug, it is in my code, not in the compiler's translation. We went from the lowest level of abstraction (semiconductors) to a relatively high-level programming language (say, Python) with no loss of meaning (except maybe for the rarest events), i.e., we always got what we asked for. Maybe we need abstractions higher than Python but lower than natural language, to impose some structure that can help build trust, the way traditional compilers do.
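
To make the "higher than Python, lower than natural language" idea concrete, here is one toy sketch of what such an intermediate artifact could look like: a structured spec whose fields are machine-checkable, so trust comes from verification rather than from the translator. The FunctionSpec shape and its fields are a hypothetical illustration, not an existing format.

from dataclasses import dataclass

@dataclass
class FunctionSpec:
    # A structured spec: more constrained than prose, less detailed than code.
    name: str
    inputs: dict[str, str]    # argument name -> type
    output: str               # return type
    invariants: list[str]     # properties a checker could test on any implementation
    examples: list[tuple]     # (args, expected_output) pairs, runnable as tests

spec = FunctionSpec(
    name="median",
    inputs={"xs": "list[float]"},
    output="float",
    invariants=["min(xs) <= result <= max(xs)"],
    examples=[(([1.0, 3.0, 2.0],), 2.0)],
)
# The examples double as acceptance tests: a generated implementation can be
# checked mechanically instead of trusted blindly, as with a compiler.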
Karthik Narasimhan @karthik_r_n:
Ok here's the corollary. If coding agents are like compilers, then soon the generated code itself won't be the valuable asset. Instead, it will be the prompts, the design specs, the curated examples, etc. - we will version-control these artifacts instead of the generated code, and probably spend a lot more time carefully designing them. And when the next improved coding model drops, simply 'recompile' your spec to get a newly optimized version (or ask it to improve your existing code) - just like how your C program can stay the same while compiler improvements under the hood improve its execution.
Quoting Karthik Narasimhan @karthik_r_n:

AI coding agents are essentially compilers -- for English instead of C.

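
In workflow terms, the "recompile" step Karthik describes might look like the sketch below: the committed asset is the spec file, and code is a regenerable build artifact. This is a minimal sketch under stated assumptions; generate() is a hypothetical stand-in for whatever code-generation API you use, not a real SDK call.

from pathlib import Path

def generate(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real code-generation API; wire in your provider.
    raise NotImplementedError

def recompile(spec_path: str, model: str, out_path: str) -> None:
    # Regenerate code from the version-controlled spec with the current best model.
    spec = Path(spec_path).read_text()
    code = generate(model=model, prompt=f"Implement exactly this spec:\n{spec}")
    Path(out_path).write_text(code)

# The spec (e.g., spec.md) is what you commit and review; the generated module
# is rebuilt whenever a better model ships:
# recompile("spec.md", model="newest-coding-model", out_path="impl.py")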
Harit Vishwakarma retweeted
Murat Kocaoglu @murat_kocaoglu_:
I am excited to announce our Workshop on Causality in the Age of AI Scaling at AISTATS 2026!
- Is scaling sufficient for intelligent systems?
- Can causal abilities emerge from scale?
- What can causal modeling bring that scale cannot?
causcale.github.io RTs appreciated
Harit Vishwakarma retweeted
Snorkel AI @SnorkelAI:
Exciting work from our team, studying data efficiency for RLVR. These kinds of insights inform our dataset creation work for foundation model labs. Kudos to @realjustinbauer @pham_derek for this paper's acceptance to #MLSys 2026!
Quoting Justin Bauer @realjustinbauer:

Our paper “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes” was accepted to #MLSys 2026! We introduce three procedurally generated, verifiable datasets—Counting, Graph, and Spatial Reasoning—to study RLVR under low-data / low-compute constraints. Key result: small, mixed-complexity datasets can be more data-efficient than large, easy ones.

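
For readers unfamiliar with the setup: "procedurally generated, verifiable" means each task is emitted together with its ground truth, so the RLVR reward is a deterministic check rather than a learned judge. A minimal sketch of a Counting-style task in that spirit (the schema and names are illustrative assumptions, not the paper's actual datasets):

import random

def make_counting_task(seed: int) -> dict:
    # Procedurally generate one verifiable counting task; the generator
    # knows the answer by construction.
    rng = random.Random(seed)
    letters = "abcde"
    target = rng.choice(letters)
    text = "".join(rng.choice(letters) for _ in range(rng.randint(20, 60)))
    return {
        "prompt": f"How many times does '{target}' appear in: {text}?",
        "answer": str(text.count(target)),  # ground truth from the generator
    }

def verify(task: dict, model_output: str) -> float:
    # Binary verifiable reward for RLVR: 1.0 iff the final answer matches.
    return 1.0 if model_output.strip() == task["answer"] else 0.0

# Usage: generate a small batch and score a candidate answer.
tasks = [make_counting_task(seed=i) for i in range(8)]
print(verify(tasks[0], tasks[0]["answer"]))  # 1.0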
Harit Vishwakarma @harit_v:
It was super fun doing RLVR with different data mixtures in data- and compute-scarce settings. Excited for this work to appear at MLSys!!!
Quoting Justin Bauer @realjustinbauer:

Our paper “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes” was accepted to #MLSys 2026! We introduce three procedurally generated, verifiable datasets—Counting, Graph, and Spatial Reasoning—to study RLVR under low-data / low-compute constraints. Key result: small, mixed-complexity datasets can be more data-efficient than large, easy ones.

Harit Vishwakarma retweeted
Yee Whye Teh @yeewhye:
@shashankska Junxian He from HKUST giving the keynote on the reliability of virtual employees 🫡
Harit Vishwakarma retweeted
Fred Sala @fredsala:
Had enough AI code slop? Us too. Extremely excited for our new code benchmark explicitly measuring erosion in evolving settings. Check out SlopCodeBench scbench.ai Awesome work from @GOrlanski
Quoting Gabe Orlanski @GOrlanski:

Often, a bug I have had to fix has been tied directly to overly verbose or defensive code from an AI agent that has eroded my project as the specs changed. Why isn't there an eval for this? There is now. Very excited to announce SlopCodeBench scbench.ai

Harit Vishwakarma retweeted
Gabe Orlanski @GOrlanski:
Often, a bug I have had to fix has been tied directly to overly verbose or defensive code from an AI agent that has eroded my project as the specs changed. Why isn't there an eval for this? There is now. Very excited to announce SlopCodeBench scbench.ai
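
For context on what "overly verbose or defensive code" means in practice, here is a constructed before/after of the pattern being benchmarked (my own illustration, not an item from SlopCodeBench):

# Slop: redundant guards, a swallowed exception, and needless generality
# accumulate as specs change, eroding the codebase.
def get_user_email_slop(user):
    try:
        if user is not None and hasattr(user, "email"):
            email = user.email
            if email is not None and isinstance(email, str) and len(email) > 0:
                return email
        return None
    except Exception:
        return None

# Idiomatic: state the one real rule and let genuinely unexpected inputs fail loudly.
def get_user_email(user):
    return user.email or None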
Harit Vishwakarma retweeted
Dyah Adila 🦄 @dyahadila_:
⭐ New blog post! Most people think activation steering ≈ a cheap version of finetuning. But why does it sometimes work, and sometimes fall flat? We dug into this and found a surprisingly clear answer. Full breakdown here 👇 sprocketlab.github.io/posts/2025/11/…
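
For readers outside the area: activation steering adds a fixed direction to a model's hidden states at inference time, which is why it is often pitched as "cheap finetuning". Below is a minimal PyTorch sketch of the generic recipe, not the blog post's specific analysis; the layer index and scale are illustrative assumptions.

import torch

def steering_vector(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    # Difference of mean activations over two contrastive prompt sets,
    # each of shape [num_prompts, hidden_dim], collected from a chosen layer.
    return acts_pos.mean(dim=0) - acts_neg.mean(dim=0)

def add_steering_hook(layer: torch.nn.Module, v: torch.Tensor, scale: float):
    # Add scale * v to the layer's output on every forward pass.
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * v.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage with a hypothetical HuggingFace-style model: steer one residual-stream
# layer during generation, then detach the hook.
# handle = add_steering_hook(model.model.layers[12], v, scale=4.0)
# ...generate...
# handle.remove()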
Harit Vishwakarma retweeted
Fred Sala @fredsala:
I’ll be at #NeurIPS2025 (12/3-12/8) representing SprocketLab at @WisconsinCS and @SnorkelAI. If you’re coming and want to chat about data-centric AI, data development, agents, or foundation models, reach out!
Harit Vishwakarma retweeted
Snorkel AI @SnorkelAI:
Next week at #NeurIPS2025 → the SEA Workshop on scaling environments for agents. If you care about eval, RL, or containerized rollouts — this is the room. Proud to sponsor this workshop. Shoutout to @guohao_li , @lawhy_X , @douglas_ym and @celineee_xie , along with the rest of the organizing team for putting together an incredible lineup. Snorkel’s @fredsala joins the panel on Dec 7. Details: sea-workshop.github.io
Harit Vishwakarma retweeted
Andrew Akbashev @Andrew_Akbashev:
Pressure to publish jumps, and researchers have no time to do science. (New survey from Elsevier.)
Survey of 3,200 researchers:
1. Only 45% of scientists have sufficient time for actual research.
2. For 68%, the pressure to publish today is greater than 2-3 years ago.
3. 29% of researchers are considering relocating to another country (for better funding, work-life balance, or greater research freedom).
4. 58% of researchers use AI tools in their work.
5. Reported benefits from AI: saving time (58%), literature summaries (61%), literature reviews (51%), data analysis (38%), drafting proposals (41%), and drafting papers (38%).
Globally, life in academia is getting worse. For students & postdocs, it's especially hard to commit to an academic career.
❗️ A few days ago, I gave a lecture on this topic: "PhD: Dreams, Reality and Consequences". Watch it here: lnkd.in/dA_GhYwd (I'd appreciate a 'like' on the video - it will GREATLY help it reach more students.)
Harit Vishwakarma retweeted
Justin Bauer @realjustinbauer:
Excited to share a new paper I co-authored with the Snorkel Research team: BeTaL — Benchmark Tuning with an LLM-in-the-loop. We explore how LLMs can reason about and refine benchmarks—creating dynamic evaluations that evolve with model capabilities. 📄 arxiv.org/abs/2510.25039
Quoting Snorkel AI @SnorkelAI:

Static benchmarks can’t keep up with the pace of AI progress. Our latest research introduces BeTaL—Benchmark Tuning with an LLM-in-the-loop—a framework that uses reasoning models to optimize benchmark design dynamically. ✍️ From the Snorkel Research team: @amanda_dsouza , @harit_v , @qi_zhengyang , @realjustinbauer, @pham_derek, Tom Walshe, @ArminParchami, @fredsala, and Paroma Varma

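
The tweet gives only the high-level shape of the framework. As a mental model (and explicitly not BeTaL's actual algorithm, data, or API), an LLM-in-the-loop benchmark-tuning loop can be sketched like this, with toy stand-ins for the evaluate/critique/revise steps:

def evaluate(benchmark: list[dict], model_skill: float) -> float:
    # Toy scoring: a model "solves" the items whose difficulty is below its skill.
    return sum(item["difficulty"] <= model_skill for item in benchmark) / len(benchmark)

def critique(scores: dict[str, float]) -> str:
    # Stand-in for a reasoning model's judgment: does the benchmark still
    # separate models, or has it saturated?
    return "saturated" if min(scores.values()) >= 0.9 else "ok"

def revise(benchmark: list[dict]) -> list[dict]:
    # Stand-in for an LLM edit step; here, just make every item harder.
    return [{**item, "difficulty": item["difficulty"] + 0.1} for item in benchmark]

def tune(benchmark, model_skills: dict[str, float], rounds: int = 5):
    for _ in range(rounds):
        scores = {name: evaluate(benchmark, s) for name, s in model_skills.items()}
        if critique(scores) != "saturated":
            break
        benchmark = revise(benchmark)
    return benchmark

# Usage: two toy "models" nearly saturate the benchmark, so it gets revised once.
bench = [{"difficulty": d / 10} for d in range(1, 11)]
tuned = tune(bench, {"model_a": 0.99, "model_b": 1.0})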