Armaan

1.7K posts

@apkinverse

alignment research @Umass | prev-Google | built @CurationsClub✨

Joined May 2017
923 Following · 608 Followers
Pinned Tweet
Armaan
Armaan@apkinverse·
after a year of procrastinating, my personal website is finaallllyyy live✨ it only took me 3 evenings after work to put it all together... should’ve done this way sooner!
Armaan tweet media
English
16
1
235
18.6K
Armaan
Armaan@apkinverse·
been reading a lot about RL and policy collapse lately and i can't stop thinking about the analogy in real life - what happens when we all optimize for the same reward function? wrote some thoughts here: medium.com/@sandhuapk/what-are-we-leaving-behind-for-museums-6615bc536bc7
English
0
0
2
63
Armaan
Armaan@apkinverse·
reading more about tokenizers has made me such an empath. no one should have to deal with my spelling crimes.
English
1
0
1
68
Armaan
Armaan@apkinverse·
@__HorizonX__ agree, i think it comes more easily with practice, but requires deliberate effort in the beginning
English
0
0
0
13
HorizonX
HorizonX@__HorizonX__·
@apkinverse "figuring out what i actually wanted from a paper" is such an underrated skill. most people sit down with a paper with no real question in mind, just hoping the paper will tell them what matters. that's where hours disappear.
English
1
0
1
21
Armaan
Armaan@apkinverse·
feb × papers wrapped - i tried reading research papers every day in feb. the goal was: compress exposure, get better at reading papers, and stay closer to what’s happening in current research. what i gained/what was hard 👇
English
3
0
2
83
Armaan
Armaan@apkinverse·
some more papers i read but didn’t post about:
- Generative Adversarial Imitation Learning
- Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
- Maximum Entropy Inverse Reinforcement Learning
- Prompt Optimization Makes Misalignment Legible
English
0
0
0
44
Armaan
Armaan@apkinverse·
what was hard:
- resisting the urge to treat it like a checklist
- figuring out what I actually wanted from a paper: intuition, math, implementation, or just the big idea
- accepting that some papers need multiple passes
- being consistent
English
1
0
1
58
Armaan retweeted
Lee Robinson
Lee Robinson@leerob·
Imagine being this early
Lee Robinson tweet media
English
122
512
13.5K
547.5K
Armaan
Armaan@apkinverse·
internship search update:
Armaan tweet media
English
1
0
3
151
Armaan retweeted
sudox
sudox@kmcnam1·
sudox tweet media
ZXX
428
2.2K
26.6K
32.4M
Armaan
Armaan@apkinverse·
4/ Both need some form of human data: IRL assumes demonstrations encode the true objective. TAMER assumes feedback encodes the true objective.
English
0
0
3
37
Armaan
Armaan@apkinverse·
3/ they also differ in temporal structure. IRL preserves long-horizon reasoning: infer reward -> run RL -> optimize expected return. TAMER removes credit assignment: predict immediate approval -> act.
English
1
0
3
43
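A minimal sketch of the contrast in 3/, assuming a made-up 4-state chain. The names `reward`, `approval`, `plan_with_reward`, and `act_on_approval` are hypothetical stand-ins, not either paper's algorithm: `reward` plays the role of a reward recovered by IRL, and `approval` plays the role of a learned human-feedback model.

```python
# Toy contrast only: the chain, rewards, and approval values are hypothetical.
GAMMA = 0.9
N_STATES = 4                      # chain 0 -> 1 -> 2 -> 3, state 3 is terminal
ACTIONS = ("stay", "forward")


def step(state, action):
    """Deterministic chain dynamics."""
    if action == "forward":
        return min(state + 1, N_STATES - 1)
    return state


def reward(state, action):
    """Stand-in for an IRL-recovered reward: only entering the goal pays."""
    entered_goal = state != N_STATES - 1 and step(state, action) == N_STATES - 1
    return 1.0 if entered_goal else 0.0


def plan_with_reward(n_sweeps=50):
    """IRL-style second stage: value iteration propagates the delayed reward backwards."""
    values = [0.0] * N_STATES
    for _ in range(n_sweeps):
        for s in range(N_STATES - 1):
            values[s] = max(reward(s, a) + GAMMA * values[step(s, a)] for a in ACTIONS)
    return {
        s: max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * values[step(s, a)])
        for s in range(N_STATES - 1)
    }


def approval(state, action):
    """Stand-in for a learned model of immediate human approval."""
    return 1.0 if action == "forward" else -1.0


def act_on_approval():
    """TAMER-style selection: greedy on predicted approval, no return estimate."""
    return {s: max(ACTIONS, key=lambda a: approval(s, a)) for s in range(N_STATES - 1)}
```

Both functions end up choosing `forward` everywhere here, but for different reasons: the planner has to carry the goal reward back through the chain, while the approval-greedy policy only works because the (hypothetical) human already approves of each step toward the goal.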
Armaan
Armaan@apkinverse·
1/ i read about TAMER and IRL this week - both try to handle reward misspecification, which can make them seem similar. but the core difference: IRL tries to recover the "goal" behind behavior. TAMER models "human approval" signals directly.
English
1
0
4
53
Armaan
Armaan@apkinverse·
6/ but inferring the “true” objective from behavior is ambiguous, since multiple reward functions can explain the same behavior. also, the IRL problem needs assumptions (e.g. linearity) and good features. without those, the recovered reward might just be an artifact. this makes it tricky.
English
0
0
1
30
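A small sketch of the ambiguity in 6/, assuming a made-up 3-state MDP. The `expert` policy and the candidate reward tables are hypothetical; the check simply asks whether the expert's actions are optimal under each candidate reward.

```python
# Toy illustration only: the MDP, expert policy, and reward tables are hypothetical.
GAMMA = 0.9
STATES = (0, 1, 2)                 # state 2 is absorbing
ACTIONS = ("left", "right")


def step(state, action):
    """Deterministic line world; state 2 never changes."""
    if state == 2:
        return 2
    return min(state + 1, 2) if action == "right" else max(state - 1, 0)


def q_values(reward, n_sweeps=200):
    """Value iteration, where reward[s] is paid on arriving in state s."""
    values = {s: 0.0 for s in STATES}
    for _ in range(n_sweeps):
        for s in STATES:
            values[s] = max(reward[step(s, a)] + GAMMA * values[step(s, a)] for a in ACTIONS)
    return {
        (s, a): reward[step(s, a)] + GAMMA * values[step(s, a)]
        for s in STATES
        for a in ACTIONS
    }


def is_optimal(policy, reward, tol=1e-9):
    """True if the policy's action is (near-)best in every state under this reward."""
    q = q_values(reward)
    return all(q[(s, policy[s])] >= max(q[(s, a)] for a in ACTIONS) - tol for s in STATES)


expert = {0: "right", 1: "right", 2: "right"}

candidates = {
    "goal only": {0: 0.0, 1: 0.0, 2: 1.0},
    "scaled": {0: 0.0, 1: 0.0, 2: 5.0},
    "with bonus": {0: 0.0, 1: 0.5, 2: 1.0},
    "all zero": {0: 0.0, 1: 0.0, 2: 0.0},    # degenerate: every policy is optimal
}

explains_expert = {name: is_optimal(expert, r) for name, r in candidates.items()}
```

All four candidates make the expert look optimal, including the all-zero reward, so the demonstrations alone cannot pick one. A reward such as `{0: 1.0, 1: 0.0, 2: 0.0}` does fail the check, so behavior rules some rewards out; it just doesn't single one out.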
Armaan
Armaan@apkinverse·
5/ compared to behavior cloning:
behavior cloning -> supervised learning (state -> action), fragile to distribution shift
IRL -> learns why the expert chooses actions, then plans accordingly
this can avoid the cascading error problem in BC.
English
1
0
1
30
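A toy sketch of the cascading-error point in 5/, under loud assumptions: the two-row grid, the single "slip", the lookup-table clone with its arbitrary fallback action, and the greedy goal-distance planner (a stand-in for acting on a recovered reward, not IRL itself) are all hypothetical.

```python
# Toy illustration only: grid, slip, fallback action, and planner are hypothetical.
GOAL = (0, 4)                       # (row, col); row 0 is the expert's corridor
MOVES = {"right": (0, 1), "up": (-1, 0), "down": (1, 0)}


def step(state, action):
    """Move on a 2 x 5 grid, clipping at the walls."""
    row, col = state
    d_row, d_col = MOVES[action]
    return (min(max(row + d_row, 0), 1), min(max(col + d_col, 0), 4))


# Expert demonstrations only ever visit row 0, always moving right.
demos = [((0, col), "right") for col in range(4)]
bc_policy = dict(demos)             # behavior cloning as a state -> action lookup


def bc_act(state):
    """Cloned policy; off the demonstrated states it has no signal ('down' is arbitrary)."""
    return bc_policy.get(state, "down")


def reward_act(state):
    """Greedy one-step planner on a goal-distance reward (stand-in for a recovered reward)."""
    def score(action):
        row, col = step(state, action)
        return -(abs(row - GOAL[0]) + abs(col - GOAL[1]))
    return max(MOVES, key=score)


def rollout(policy, slip_at=1, horizon=10):
    """Run a policy; at time slip_at a perturbation pushes the agent into row 1."""
    state = (0, 0)
    for t in range(horizon):
        if state == GOAL:
            return True
        state = step(state, policy(state))
        if t == slip_at:
            state = (1, state[1])
    return state == GOAL
```

With `slip_at=None` both policies reach the goal. With the default slip, `rollout(bc_act)` gets stuck in a state the demonstrations never covered, while `rollout(reward_act)` still reaches the goal because it knows what it is trying to achieve.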
Armaan
Armaan@apkinverse·
days 14 and 15 of feb × papers: read about Apprenticeship Learning / Inverse Reinforcement Learning (IRL) - learning the objective from expert behavior rather than hand-crafting reward functions. what i read 👇
Armaan tweet media
English
1
0
2
72