Harit Vishwakarma

274 posts

@harit_v

Postdoc@Oxford, LLM Reliability and Data-centric AI, prev. @WisconsinCS, @iiscbangalore, @IBMResearch.

Oxford, UK · Joined December 2012
1.7K Following · 569 Followers
Harit Vishwakarma retweeted
Aniket Rege @wregss:
Hi ML Twitter! My Summer 2026 internship unfortunately fell through last minute 😵‍💫 If your team is looking for interns, I’d love to connect - RTs appreciated 🙏 My website: aniketrege.github.io
Harit Vishwakarma retweeted
Yee Whye Teh @yeewhye:
Looking for a senior postdoc working in the diffusions & flows space @OxfordStats @oxcsml , particularly with interests in reward steering and inverse problems. 15 months in first instance, extendable. Please reach out if you have questions and apply: my.corehr.com/pls/uoxrecruit…
Harit Vishwakarma retweeted
Gabe Orlanski @GOrlanski:
GPT-5.3 Codex and Opus 4.6 are a clear jump in capability. On SCBench, both exceed 50% on core tests. Meaningful improvement over prior models. 5.3 is ~54% faster than its predecessor. 4.6 nearly doubles the runtime over 4.5. The code is still a huge mess when unsupervised.
Harit Vishwakarma retweeted
Fred Sala @fredsala:
We’ve made huge strides in model & agent capability. Now it’s time to scale up measurement. We’re excited to support open benchmarks that capture every aspect of the brave new agentic world: complexity, long horizon, autonomy, and rich outputs. Work with us to make it happen!
Quoting vincent sunn chen @vincentsunnchen:

x.com/i/article/2021…

Harit Vishwakarma @harit_v:
I want to go with this same analogy, but it breaks a bit due to a lack of reliability. I could trust gcc blindly: if there is a bug, it is in my code, not in the compiler's translation. We went from the lowest level of abstraction (semiconductors) to a relatively high-level programming language (say, Python) with no loss of meaning (except maybe for the rarest events), i.e., we always got what we asked for. Maybe we need abstractions higher than Python but lower than natural language, to impose some structure that can help build trust, the way traditional compilers do.
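
To make the "higher than Python, lower than natural language" idea concrete, here is one toy sketch of what such an intermediate artifact could look like: a structured spec whose fields are machine-checkable, so trust comes from verification rather than from the translator. The FunctionSpec shape and its fields are a hypothetical illustration, not an existing format.

from dataclasses import dataclass

@dataclass
class FunctionSpec:
    # A structured spec: more constrained than prose, less detailed than code.
    name: str
    inputs: dict[str, str]    # argument name -> type
    output: str               # return type
    invariants: list[str]     # properties a checker could test on any implementation
    examples: list[tuple]     # (args, expected_output) pairs, runnable as tests

spec = FunctionSpec(
    name="median",
    inputs={"xs": "list[float]"},
    output="float",
    invariants=["min(xs) <= result <= max(xs)"],
    examples=[(([1.0, 3.0, 2.0],), 2.0)],
)
# The examples double as acceptance tests: a generated implementation can be
# checked mechanically instead of trusted blindly, as with a compiler.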
Karthik Narasimhan @karthik_r_n:
Ok here's the corollary. If coding agents are like compilers, then soon the generated code itself won't be the valuable asset. Instead, it will be the prompts, the design specs, the curated examples, etc. - we will version-control these artifacts instead of the generated code, and probably spend a lot more time carefully designing them. And when the next improved coding model drops, simply 'recompile' your spec to get a newly optimized version (or ask it to improve your existing code) - just like how your C program can stay the same while compiler improvements under the hood improve its execution.
Quoting Karthik Narasimhan @karthik_r_n:

AI coding agents are essentially compilers -- for English instead of C.

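
In workflow terms, the "recompile" step Karthik describes might look like the sketch below: the committed asset is the spec file, and code is a regenerable build artifact. This is a minimal sketch under stated assumptions; generate() is a hypothetical stand-in for whatever code-generation API you use, not a real SDK call.

from pathlib import Path

def generate(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real code-generation API; wire in your provider.
    raise NotImplementedError

def recompile(spec_path: str, model: str, out_path: str) -> None:
    # Regenerate code from the version-controlled spec with the current best model.
    spec = Path(spec_path).read_text()
    code = generate(model=model, prompt=f"Implement exactly this spec:\n{spec}")
    Path(out_path).write_text(code)

# The spec (e.g., spec.md) is what you commit and review; the generated module
# is rebuilt whenever a better model ships:
# recompile("spec.md", model="newest-coding-model", out_path="impl.py")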
Harit Vishwakarma retweeted
Murat Kocaoglu @murat_kocaoglu_:
I am excited to announce our Workshop on Causality in the Age of AI Scaling at AISTATS 2026!
- Is scaling sufficient for intelligent systems?
- Can causal abilities emerge from scale?
- What can causal modeling bring that scale cannot?
causcale.github.io RTs appreciated
Harit Vishwakarma retweeted
Snorkel AI @SnorkelAI:
Exciting work from our team, studying data efficiency for RLVR. These kinds of insights inform our dataset creation work for foundation model labs. Kudos to @realjustinbauer @pham_derek for this paper's acceptance to #MLSys 2026!
Quoting Justin Bauer @realjustinbauer:

Our paper “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes” was accepted to #MLSys 2026! We introduce three procedurally generated, verifiable datasets—Counting, Graph, and Spatial Reasoning—to study RLVR under low-data / low-compute constraints. Key result: small, mixed-complexity datasets can be more data-efficient than large, easy ones.

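
For readers unfamiliar with the setup: "procedurally generated, verifiable" means each task is emitted together with its ground truth, so the RLVR reward is a deterministic check rather than a learned judge. A minimal sketch of a Counting-style task in that spirit (the schema and names are illustrative assumptions, not the paper's actual datasets):

import random

def make_counting_task(seed: int) -> dict:
    # Procedurally generate one verifiable counting task; the generator
    # knows the answer by construction.
    rng = random.Random(seed)
    letters = "abcde"
    target = rng.choice(letters)
    text = "".join(rng.choice(letters) for _ in range(rng.randint(20, 60)))
    return {
        "prompt": f"How many times does '{target}' appear in: {text}?",
        "answer": str(text.count(target)),  # ground truth from the generator
    }

def verify(task: dict, model_output: str) -> float:
    # Binary verifiable reward for RLVR: 1.0 iff the final answer matches.
    return 1.0 if model_output.strip() == task["answer"] else 0.0

# Usage: generate a small batch and score a candidate answer.
tasks = [make_counting_task(seed=i) for i in range(8)]
print(verify(tasks[0], tasks[0]["answer"]))  # 1.0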
Harit Vishwakarma @harit_v:
It was super fun doing RLVR with different data mixtures in data- and compute-scarce settings. Excited for this work to appear at MLSys!!!
Quoting Justin Bauer @realjustinbauer:

Our paper “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes” was accepted to #MLSys 2026! We introduce three procedurally generated, verifiable datasets—Counting, Graph, and Spatial Reasoning—to study RLVR under low-data / low-compute constraints. Key result: small, mixed-complexity datasets can be more data-efficient than large, easy ones.

Harit Vishwakarma retweeted
Yee Whye Teh @yeewhye:
@shashankska Junxian He from HKUST giving the keynote on the reliability of virtual employees 🫡
Harit Vishwakarma retweeted
Fred Sala @fredsala:
Had enough AI code slop? Us too. Extremely excited for our new code benchmark explicitly measuring erosion in evolving settings. Check out SlopCodeBench scbench.ai Awesome work from @GOrlanski
Quoting Gabe Orlanski @GOrlanski:

Often, a bug I have had to fix has been tied directly to overly verbose or defensive code from an AI agent that has eroded my project as the specs changed. Why isn't there an eval for this? There is now. Very excited to announce SlopCodeBench scbench.ai

Harit Vishwakarma retweeted
Gabe Orlanski @GOrlanski:
Often, a bug I have had to fix has been tied directly to overly verbose or defensive code from an AI agent that has eroded my project as the specs changed. Why isn't there an eval for this? There is now. Very excited to announce SlopCodeBench scbench.ai
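
For context on what "overly verbose or defensive code" means in practice, here is a constructed before/after of the pattern being benchmarked (my own illustration, not an item from SlopCodeBench):

# Slop: redundant guards, a swallowed exception, and needless generality
# accumulate as specs change, eroding the codebase.
def get_user_email_slop(user):
    try:
        if user is not None and hasattr(user, "email"):
            email = user.email
            if email is not None and isinstance(email, str) and len(email) > 0:
                return email
        return None
    except Exception:
        return None

# Idiomatic: state the one real rule and let genuinely unexpected inputs fail loudly.
def get_user_email(user):
    return user.email or None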
Harit Vishwakarma retweeted
Dyah Adila 🦄 @dyahadila_:
⭐ New blog post! Most people think activation steering ≈ a cheap version of finetuning. But why does it sometimes work, and sometimes fall flat? We dug into this and found a surprisingly clear answer. Full breakdown here 👇 sprocketlab.github.io/posts/2025/11/…
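
For readers outside the area: activation steering adds a fixed direction to a model's hidden states at inference time, which is why it is often pitched as "cheap finetuning". Below is a minimal PyTorch sketch of the generic recipe, not the blog post's specific analysis; the layer index and scale are illustrative assumptions.

import torch

def steering_vector(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    # Difference of mean activations over two contrastive prompt sets,
    # each of shape [num_prompts, hidden_dim], collected from a chosen layer.
    return acts_pos.mean(dim=0) - acts_neg.mean(dim=0)

def add_steering_hook(layer: torch.nn.Module, v: torch.Tensor, scale: float):
    # Add scale * v to the layer's output on every forward pass.
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * v.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage with a hypothetical HuggingFace-style model: steer one residual-stream
# layer during generation, then detach the hook.
# handle = add_steering_hook(model.model.layers[12], v, scale=4.0)
# ...generate...
# handle.remove()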
Harit Vishwakarma retweeted
Fred Sala @fredsala:
I’ll be at #NeurIPS2025 (12/3-12/8) representing SprocketLab at @WisconsinCS and @SnorkelAI. If you’re coming and want to chat about data-centric AI, data development, agents, or foundation models, reach out!
Harit Vishwakarma retweeted
Snorkel AI @SnorkelAI:
Next week at #NeurIPS2025 → the SEA Workshop on scaling environments for agents. If you care about eval, RL, or containerized rollouts — this is the room. Proud to sponsor this workshop. Shoutout to @guohao_li , @lawhy_X , @douglas_ym and @celineee_xie , along with the rest of the organizing team for putting together an incredible lineup. Snorkel’s @fredsala joins the panel on Dec 7. Details: sea-workshop.github.io
Harit Vishwakarma retweeted
Andrew Akbashev @Andrew_Akbashev:
Pressure to publish jumps, and researchers have no time to do science. (New survey from Elsevier.)
Survey of 3,200 researchers:
1. Only 45% of scientists have sufficient time for actual research.
2. For 68%, the pressure to publish today is greater than 2-3 years ago.
3. 29% of researchers are considering relocating to another country (for better funding, work-life balance, or greater research freedom).
4. 58% of researchers use AI tools in their work.
5. Reported benefits from AI: saving time (58%), literature summaries (61%), literature reviews (51%), data analysis (38%), drafting proposals (41%), and drafting papers (38%).
Globally, life in academia is getting worse. For students & postdocs, it's especially hard to commit to an academic career.
❗️ A few days ago, I gave a lecture on this topic: "PhD: Dreams, Reality and Consequences". Watch it here: lnkd.in/dA_GhYwd (I'd appreciate a 'like' on the video - it will GREATLY help it reach more students.)
Harit Vishwakarma retweeted
Justin Bauer @realjustinbauer:
Excited to share a new paper I co-authored with the Snorkel Research team: BeTaL — Benchmark Tuning with an LLM-in-the-loop. We explore how LLMs can reason about and refine benchmarks—creating dynamic evaluations that evolve with model capabilities. 📄 arxiv.org/abs/2510.25039
Quoting Snorkel AI @SnorkelAI:

Static benchmarks can’t keep up with the pace of AI progress. Our latest research introduces BeTaL—Benchmark Tuning with an LLM-in-the-loop—a framework that uses reasoning models to optimize benchmark design dynamically. ✍️ From the Snorkel Research team: @amanda_dsouza , @harit_v , @qi_zhengyang , @realjustinbauer, @pham_derek, Tom Walshe, @ArminParchami, @fredsala, and Paroma Varma

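
The tweet gives only the high-level shape of the framework. As a mental model (and explicitly not BeTaL's actual algorithm, data, or API), an LLM-in-the-loop benchmark-tuning loop can be sketched like this, with toy stand-ins for the evaluate/critique/revise steps:

def evaluate(benchmark: list[dict], model_skill: float) -> float:
    # Toy scoring: a model "solves" the items whose difficulty is below its skill.
    return sum(item["difficulty"] <= model_skill for item in benchmark) / len(benchmark)

def critique(scores: dict[str, float]) -> str:
    # Stand-in for a reasoning model's judgment: does the benchmark still
    # separate models, or has it saturated?
    return "saturated" if min(scores.values()) >= 0.9 else "ok"

def revise(benchmark: list[dict]) -> list[dict]:
    # Stand-in for an LLM edit step; here, just make every item harder.
    return [{**item, "difficulty": item["difficulty"] + 0.1} for item in benchmark]

def tune(benchmark, model_skills: dict[str, float], rounds: int = 5):
    for _ in range(rounds):
        scores = {name: evaluate(benchmark, s) for name, s in model_skills.items()}
        if critique(scores) != "saturated":
            break
        benchmark = revise(benchmark)
    return benchmark

# Usage: two toy "models" nearly saturate the benchmark, so it gets revised once.
bench = [{"difficulty": d / 10} for d in range(1, 11)]
tuned = tune(bench, {"model_a": 0.99, "model_b": 1.0})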