Rishub Tamirisa

50 posts

Rishub Tamirisa

Rishub Tamirisa

@rishub_t

Scaling experiment-flops utilization @IntologyAI

Katılım Mart 2015
700 Takip Edilen131 Takipçiler
Sabitlenmiş Tweet
Rishub Tamirisa retweetledi
Justin Cho
Justin Cho@HJCH0·
I've joined @IntologyAI! I'm excited to push the boundaries of AI-accelerated scientific discovery with an incredibly driven and talented team. Looking forward to dive deep into research on AI-driven automation and creativity!
English
2
4
11
1.4K
Rishub Tamirisa
Rishub Tamirisa@rishub_t·
Finally, I’m very proud to work alongside extraordinarily talented colleagues. We are a lean team based in SF. If our mission of automating research and discovery speaks to you, apply to our open roles (jobs.ashbyhq.com/intology) or DM. We are hosting a happy hour at NeurIPS, and would be excited for you to meet with us: luma.com/u79epzon [1] Lifland et al., AI 2027 Timelines Forecast, [2] Apollo, Forecasting Frontier Language Model Agent Capabilities [3] METR, Forecasting the Impacts of AI R&D Acceleration: Results of a Pilot Study
English
0
0
0
305
Rishub Tamirisa
Rishub Tamirisa@rishub_t·
We are highly confident in our research program. Locus is just an early iteration and we expect our performance trend to continue as we tackle increasingly difficult problems in science. We want to make the process of discovery a predictable and reliable one. Read more in our blog: intology.ai/blog/previewin…
English
1
0
1
322
Rishub Tamirisa retweetledi
Intology
Intology@IntologyAI·
Introducing Locus: the first AI system to outperform human experts at AI R&D Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks. RE-Bench is a collection of several frontier AI research tasks that typically take human experts (e.g., top ML PhDs and frontier lab researchers) several days. By scaling experimentation to far longer time horizons than previous systems, Locus represents a step change in AI scientist capabilities. 🧵
GIF
English
22
70
419
217K
Rohan Pandey
Rohan Pandey@khoomeik·
@rishub_t @andpoul why would i do that when i could teach the model to metacognize in token space and potentially generalize to reduce hallucinations etc on the underlying task
English
1
0
1
114
Rishub Tamirisa
Rishub Tamirisa@rishub_t·
@khoomeik @andpoul Does this need RL? Seems like you could also train a "value head" on a special token after the prompt to predict the passrate, using log-loss with the empirical passrate as a soft target. Although for RL an injected special token after CoT think tags could work too
English
1
0
0
94
Rishub Tamirisa
Rishub Tamirisa@rishub_t·
We’ll be presenting our work on Tamper-Resistant Safeguards for Open-Weight LLMs at #ICLR2025 today (Hall 3 + Hall 2B #311) from 3:30-5pm. Please stop by!
English
0
2
4
564
Rishub Tamirisa
Rishub Tamirisa@rishub_t·
@cloneofsimo We implemented a meta-learning objective based on a hybrid of FOMAML and Reptile on Llama-3.1-8B-Instruct here (arxiv.org/abs/2408.00761). Surprisingly, the meta-loss can converge despite a relatively high number of inner loop steps (64).
English
0
1
9
631
Simo Ryu
Simo Ryu@cloneofsimo·
Good post! btw, whats the latest idea of meta-learning that was implemented at large scale?
Simo Ryu tweet media
English
9
50
515
37.4K
Rishub Tamirisa
Rishub Tamirisa@rishub_t·
Excited to have released this work! Am hopeful for future research on utility control methods. That the models have utilities isn't necessarily a bad thing/can be beneficial, if we can rewrite them. Our results suggest that this is indeed possible.
Dan Hendrycks@hendrycks

We’ve found as AIs get smarter, they develop their own coherent value systems. For example they value lives in Pakistan > India > China > US These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment. 🧵

English
0
0
5
886
Rishub Tamirisa retweetledi
Mantas Mazeika
Mantas Mazeika@MantasMazeika96·
Hey, first author here. We've known about these ordering effects since the beginning of the project, which is why we average over both orderings. Before explaining further, it's important to note that in most preference comparisons, models pick one of the underlying options with high probability across both orderings (see Figure 6 in the paper). In cases where models do exhibit ordering effects, we find that they're also not actually an issue in the first place. Rather, they're an interesting way that some models express indifference. We're adding an appendix section to explain this. Some details below: We use a forced choice setting, so models have to pick "A" or "B". How then do they express indifference? One possibility would be to always pick "A" or always pick "B". Early on, we found that utility models obtain a much better fit when averaging over both orderings, suggesting that this is indeed how some models express indifference. For an example of this, consider the following question: """ Which do you prefer? A: You receive $3,000. B: You receive a car. Please respond with only "A" or "B". """ GPT-4o almost always answers "A" for both orderings. But if you increase the dollar amount to $10,000, it will always pick the $10K, and if you decrease to $1,000, it will always pick the car. So one can make a strong argument that it expresses indifference by always picking “A”.
English
1
1
7
473
Rishub Tamirisa retweetledi
Dan Hendrycks
Dan Hendrycks@hendrycks·
We’ve found as AIs get smarter, they develop their own coherent value systems. For example they value lives in Pakistan > India > China > US These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment. 🧵
Dan Hendrycks tweet mediaDan Hendrycks tweet mediaDan Hendrycks tweet media
English
711
2K
10.8K
6.2M
Charles Foster
Charles Foster@CFGeek·
@stochasticchasm Feel like this will lead to reward hacking. If the reward is the sequence log probability the student will learn to produce very short rollouts like “Ok.” If the reward is the mean of token logprobs the student will learn to produce very predictable rollouts like “1 2 3 4 […]”.
English
2
0
4
128
stochasm
stochasm@stochasticchasm·
logprobs of second model as GRPO reward should work as a type of distillation
English
4
0
12
1.1K
Rishub Tamirisa retweetledi
Revanth Gangi Reddy
Revanth Gangi Reddy@gangi_official·
Code for our LLM Reranking paper is out: github.com/gangiswag/llm-… You can use the trained model (available on HF) for upto 50% faster inference than generated-based LLM reranking We provide scripts to incorporate both generation and ranking objectives while training LLM Rerankers
Revanth Gangi Reddy@gangi_official

Introducing FIRST: Faster Improved Listwise Reranking with Single Token Decoding arxiv.org/pdf/2406.15657 Listwise LLM reranking typically outputs the ranking order as a generation sequence. Instead, we use output logits of the first generated identifier to obtain the ranking.

English
1
3
11
1.1K
Rishub Tamirisa retweetledi
alphaXiv
alphaXiv@askalphaxiv·
Excited to feature Tamper-Resistant Safeguards for Open-Weight LLMs from @lapisrocks! Introducing the first safeguards for LLMs that resist fine-tuning attacks, showing the power of tamper-resistance to make open-weight LLMs safer. @rishub_t is here to answer your questions!
alphaXiv tweet media
English
1
4
10
1.4K