Armaan

1.7K posts

@apkinverse

alignment research @Umass | prev-Google | built @CurationsClub✨

Joined May 2017

923 Following · 608 Followers

Pinned Tweet

Armaan@apkinverse·15 Apr

after a year of procrastinating, my personal website is finaallllyyy live✨ it only took me 3 evenings after work to put it all together... should’ve done this way sooner!

Armaan@apkinverse·6 Apr

been reading a lot about RL and policy collapse lately and i can't stop thinking about the analogy in real life: what happens when we all optimize for the same reward function? wrote some thoughts here: medium.com/@sandhuapk/wha…

Armaan@apkinverse·7 Mar

@demonshadow007 started with this: youtu.be/zduSFxRajkE?si… and then read up more on the mentioned references here

Dipak Patil@demonshadow007·6 Mar

@apkinverse From where are you reading? Any suggestions?

Armaan@apkinverse·6 Mar

reading more about tokenizers has made me such an empath. no one should have to deal with my spelling crimes.

Armaan@apkinverse·6 Mar

@__HorizonX__ agree, i think it comes more easily with practice, but requires deliberate effort in the beginning

HorizonX@__HorizonX__·5 Mar

@apkinverse "figuring out what i actually wanted from a paper" is such an underrated skill. most people sit down with a paper with no real question in mind, just hoping the paper will tell them what matters. that's where hours disappear.

Armaan@apkinverse·4 Mar

feb × papers wrapped - i tried reading research papers every day in feb. the goal was: compress exposure, get better at reading papers, and stay closer to what’s happening in current research. what i gained/what was hard 👇

Armaan@apkinverse·4 Mar

some more papers i read but didn’t post about:
- Generative Adversarial Imitation Learning
- Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
- Maximum Entropy Inverse Reinforcement Learning
- Prompt Optimization Makes Misalignment Legible

Armaan@apkinverse·4 Mar

what was hard:
- resisting the urge to treat it like a checklist
- figuring out what I actually wanted from a paper: intuition, math, implementation, or just the big idea
- accepting that some papers need multiple passes
- being consistent

Armaan retweeted

Lee Robinson@leerob·24 Feb

Imagine being this early

Armaan@apkinverse·24 Feb

@tensorcruncher @Cohere_Labs hi yes, thanks. i am already on it <3

Tensor Cruncher@tensorcruncher·24 Feb

@apkinverse Check out the expedition tiny aya by @Cohere_Labs starting today

Armaan@apkinverse·24 Feb

internship search update:

Armaan retweeted

sudox@kmcnam1·23 Feb

Armaan retweeted

mert@mert·23 Feb

silicon valley was a documentary damn it jian yang

Anthropic@AnthropicAI

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.

Armaan@apkinverse·23 Feb

4/ Both need some form of human data: IRL assumes demonstrations encode the true objective. TAMER assumes feedback encodes the true objective.

Armaan@apkinverse·23 Feb

3/ they also differ in temporal structure.
IRL preserves long-horizon reasoning: infer reward -> run RL -> optimize expected return.
TAMER removes credit assignment: predict immediate approval -> act.
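
a minimal sketch of that temporal difference, assuming a made-up 3-state chain. the reward table, the approval table, and every function name here are hypothetical and only for illustration: the IRL-style agent plans on a recovered reward with value iteration, while the TAMER-style agent just takes the action with the highest predicted approval.

```python
# Toy 3-state chain (hypothetical): action 0 = stay, action 1 = move right.
# State 2 is absorbing. All numbers are illustrative assumptions.
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

def step(state, action):
    """Deterministic transition; the last state absorbs."""
    return min(state + action, N_STATES - 1)

# A reward an IRL method might have recovered: a small payoff for staying
# early on, a large payoff only once the final state is reached.
REWARD = [[0.1, 0.0],
          [0.1, 0.0],
          [1.0, 1.0]]

# An approval model a TAMER-style learner might have fit: predicted
# immediate human feedback for each (state, action) pair.
APPROVAL = [[0.2, 0.6],
            [0.3, 0.7],
            [0.5, 0.5]]

def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])

def plan_on_reward(reward, n_iters=300):
    """IRL-style second stage: value iteration on the recovered reward."""
    v = [0.0] * N_STATES
    for _ in range(n_iters):
        v = [max(reward[s][a] + GAMMA * v[step(s, a)] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    q = [[reward[s][a] + GAMMA * v[step(s, a)] for a in range(N_ACTIONS)]
         for s in range(N_STATES)]
    return [argmax(row) for row in q]

def act_on_approval(approval):
    """TAMER-style choice: greedy on predicted approval, no lookahead."""
    return [argmax(row) for row in approval]

planned = plan_on_reward(REWARD)                     # long-horizon planning
myopic_on_reward = [argmax(row) for row in REWARD]   # same reward, no planning
tamer = act_on_approval(APPROVAL)                    # immediate approval only
print(planned, myopic_on_reward, tamer)
```

with these numbers, planning on the reward moves right in states 0 and 1 even though staying pays more immediately, and acting greedily on the same reward would never leave. the TAMER-style agent gets to the same behavior without planning only because the assumed approval signal already favors moving right.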

Armaan@apkinverse·23 Feb

1/ i read about TAMER and IRL this week - both try to handle reward misspecification, which can make them seem similar. but the core difference: IRL tries to recover the "goal" behind behavior. TAMER models "human approval" signals directly.

Armaan@apkinverse·23 Feb

6/ but inferring the “true” objective from behavior is ambiguous, as multiple reward functions can explain the same behavior. also, the IRL problem needs assumptions (e.g. linearity) and good features. without them, the recovered reward might just be an artifact. this makes it tricky
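
a small sketch of that ambiguity, assuming a made-up 3-state chain. the rewards, the potential values, and the function names are all hypothetical: a base reward, a positively scaled copy, and a potential-shaped copy (r + γΦ(s') − Φ(s)) all produce the same greedy policy, so behavior alone cannot tell them apart.

```python
# Toy 3-state chain (hypothetical): action 0 = stay, action 1 = move right.
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

def step(s, a):
    """Deterministic transition; state 2 is absorbing."""
    return min(s + a, N_STATES - 1)

def base_reward(s, a):
    """Assumed 'true' reward: 1 whenever the step lands in the last state."""
    return 1.0 if step(s, a) == N_STATES - 1 else 0.0

def scaled_reward(s, a):
    """Positive rescaling of the base reward."""
    return 3.0 * base_reward(s, a)

POTENTIAL = [5.0, -2.0, 7.0]  # arbitrary state potentials, chosen for illustration

def shaped_reward(s, a):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return base_reward(s, a) + GAMMA * POTENTIAL[step(s, a)] - POTENTIAL[s]

def greedy_policy(reward_fn, n_iters=500):
    """Value iteration, then the greedy action in each state (ties -> lowest index)."""
    v = [0.0] * N_STATES
    for _ in range(n_iters):
        v = [max(reward_fn(s, a) + GAMMA * v[step(s, a)] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    policy = []
    for s in range(N_STATES):
        q = [reward_fn(s, a) + GAMMA * v[step(s, a)] for a in range(N_ACTIONS)]
        policy.append(max(range(N_ACTIONS), key=lambda a: q[a]))
    return policy

policies = {
    "base": greedy_policy(base_reward),
    "scaled": greedy_policy(scaled_reward),
    "shaped": greedy_policy(shaped_reward),
}
print(policies)
```

all three rewards lead to the same policy here. the degenerate case is worse still: an all-zero reward makes every policy optimal, which is one reason IRL methods add assumptions such as linear features or a maximum-entropy criterion.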

Armaan@apkinverse·23 Feb

5/ compared to behavior cloning:
behavior cloning -> supervised learning (state -> action), fragile to distribution shift
IRL -> learns why the expert chooses actions, then plans accordingly
this can avoid the cascading error problem in BC.
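
a rough sketch of why errors cascade in BC, under a deliberately pessimistic assumption that is not from the thread: the cloned policy errs with probability eps per step while it is still on the expert's state distribution, and once it errs it never recovers and pays cost 1 on every remaining step. the function names are hypothetical.

```python
def expected_cost_compounding(eps, horizon):
    """Expected total cost when the first mistake is never recovered from."""
    total = 0.0
    on_track = 1.0  # probability of having made no mistake so far
    for _ in range(horizon):
        on_track *= 1.0 - eps    # survive this step without a mistake
        total += 1.0 - on_track  # off-track by the end of this step costs 1
    return total

def expected_cost_independent(eps, horizon):
    """Baseline: each step errs independently and is corrected right away."""
    return eps * horizon

if __name__ == "__main__":
    for horizon in (10, 100, 1000):
        print(horizon,
              round(expected_cost_independent(0.01, horizon), 3),
              round(expected_cost_compounding(0.01, horizon), 3))
```

for small eps the compounding cost grows roughly like eps * T^2 / 2 rather than eps * T, which matches the quadratic-in-horizon worst case usually cited for behavior cloning. with eps = 0.01 and T = 100 this toy model gives about 37 expected bad steps versus 1.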

Armaan@apkinverse·23 Feb

days 14 and 15 of feb × papers read about Apprenticeship Learning / Inverse Reinforcement Learning (IRL) - learning the objective from expert behavior rather than hand-crafting reward functions. what i read 👇
