Sabitlenmiş Tweet
Laura Ruis
1.3K posts

Laura Ruis
@LauraRuis
Postdoc with @jacobandreas @MIT_CSAIL. PhD from @ucl_dark with @_rockt and @egrefen. Anon feedback: https://t.co/sbebAl53tU
London Katılım Ekim 2019
802 Takip Edilen6.9K Takipçiler

@AdriGarriga I used to appreciate reviewing but this icml was rough
English

@LauraRuis @tallinzen ... and another thing are possible explanations for that phenomenon, which I am now excited to read more about as they do potentially seem new and interesting :)
English

@glnmario @tallinzen Something being covered prior doesn’t mean it can’t be useful to refine. Oocr has led to many novel findings even though reasoning outside of the context existed before. I don’t think it costs us much to make new definitions even if related concepts existed before
English

@LauraRuis @tallinzen That seems like a type error then - one thing is the phenomenon, which is what the definition seems to be about: "when an LLM reaches a conclusion that requires non-trivial reasoning but the reasoning is not present in the context window". This I'd argue is already covered...
English

@glnmario @tallinzen I’m just not sure if multi hop reasoning really covers the concept here, it’s fair to want attributions to the concept that existed but oocr feels like a much more general phenomenon of learning during training that has emerged with better llms
English

@LauraRuis @tallinzen I guess I have an allergy for "we have discovered/introduced X, which is Y", where X is a misleading way to call Y, and Y is something that has been known for 5-10 years.
One could just say "we're studying Y and we learned many new interesting things"
English

got it! I haven't read all of the many references from Owain's group in that website, certainly sounds like there are cool empirical findings in this line of work! but this is still framed in an odd way ("I have discovered that LLMs can correctly answer multi-hop questions even when the hops are not provided in context")
English

@glnmario @tallinzen I don’t know if it really matters whether the concept is new, maybe neel phrasing it as a discovery is too strong but I think it’s pretty clear that it’s redefinition or refining in oocr has helped understanding llms
English

@tallinzen @LauraRuis To be clear, I’m not saying that it wouldn’t be impressive if models did multi-hop reasoning in the forward pass accurately, for non trivial numbers of hops, and in a way that generalises. But surely the concept isn’t new?
English

@tallinzen @glnmario Or when a function or high-level strategy can be induced from many separate examples, etc. each of these had some kind of name before (eg program induction) but redefining the broader phenomena has illuminated/predicts a bunch of generalizations llms make
English

@tallinzen @glnmario Even if one is a kind of the other (that is multi-hop of OOCR), oocr lit showed its emergence in LLMs in examples beyond a -> b and b -> c therefore a -> c, like when the b’s are only implicitly related, or when the reasoning pattern is described instead of demonstrated
English

@tallinzen @glnmario OOCR is the kind of thing that seems obvious because it’s so natural but the extent to which owains definition of it has demonstrated surprising generalizations (as well as its limitations wrt in context reasoning) far beyond 2-hop reasoning shows its usefulness
English

@glnmario I think in this community there's a lot of alpha for coming up with a new term for an obvious thing, or an unusually scary term for a not-actually-scary thing
English

@LauraRuis Ahah I saw codex just write a python snippet and call shutil.rmtree
English

when you blocklist Bash(rm) like a responsible adult but Claude calls subprocess.run(["rm", "-rf", "/"])
English

@LauraRuis parenting _is_ hard :)
x.com/tkukurin/statu…
Toni Kukurin@tkukurin
@deliprao @AdityaPonnada as it should. ironically, "optimism in the face of uncertainty" _is_ the optimal policy. human baby explores and falls off a couch. agent baby explores and drops CouchDB. both learned the proud parent keeps them safe. :)
English

@LauraRuis Cursor once used "find" to remove *.o *.cpp and then complain about missing code
English
Laura Ruis retweetledi

@NikolasGoebel @MinqiJiang until we give that away as well 👀 i remember a time when we said we were gonna sandbox the ai, that didnt last long
English

@LauraRuis @MinqiJiang Agreed. A lot of work is only "better" or "worse" because a human judges it so, based on their value system. While that is the case, there is always an opportunity for humans to better leverage and align the available raw intelligence.
We're not off the hook yet, Laura!
English

@LauraRuis I'd bet it will stay true so long as humans continue to make and be responsible for deciding what matters at the high level. In the alternative, we would have lost the plot.
English
Laura Ruis retweetledi





