Chu-Cheng Lin

38 posts

@kitsing_l

#NLProc and #ML @jhuclsp. Pronouns: i/ta.

Baltimore · Joined April 2020
305 Following · 64 Followers
Pinned Tweet
Chu-Cheng Lin (@kitsing_l)
@nouhadziri This is interesting work! Wondering if you have thought of connecting this to our previous work on limitations of LMs in general (arxiv.org/abs/2010.11939), to justify the assumptions of your theoretical results (Prop. D.1, for example)?
Nouha Dziri (@nouhadziri)
🚀📢 GPT models have blown our minds with their astonishing capabilities. But, do they truly acquire the ability to perform reasoning tasks that humans find easy to execute? NO⛔️ We investigate the limits of Transformers *empirically* and *theoretically* on compositional tasks🔥
Chu-Cheng Lin (@kitsing_l)
Even though many parameter estimation methods have been proposed, they stop working for model families that are expressive enough to parametrize our pathological EBM.
Chu-Cheng Lin (@kitsing_l)
I'm presenting our spotlight paper "On the Uncomputability of Partition Functions in Energy-Based Sequence Models" at ICLR poster session 5 until 3:30pm EDT! #ICLR2022
Chu-Cheng Lin retweeted
JHU CLSP (@jhuclsp)
Capabilities of autoregressive AI models will always be limited by their inability to reason like humans, says @JHUCompSci PhD candidate and CLSP member Chu-Cheng Lin (@kitsing_l). Read more about Lin's research in the latest from the @HubJHU! hub.jhu.edu/2021/11/22/lim…
Jason Eisner (@adveisner)
Everyone's using big autoregressive language models. But ... they predict the next word with a polysized circuit (computation graph). So they can't accurately model settings where that prediction is NP-hard. 😢
Chu-Cheng Lin (@kitsing_l)
@jacobmbuckman @adveisner A bit more on ‘tell whether a prefix is valid’: assuming we could model this language using an AR, this AR would assign nonzero prob to a prefix iff the prefix is a TM that can halt. Does this make sense?
Chu-Cheng Lin (@kitsing_l)
@jacobmbuckman @adveisner ...which is already undecidable, let alone the polytime requirement. On the other hand *verifying* a TM + an exec trace is doable in polytime.
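The claim that *verifying* a Turing machine together with an execution trace is doable in polynomial time can be illustrated with a small sketch. Everything named here is an assumption made for illustration and does not come from the paper: the machine encoding (a transition dictionary), the one-way tape, the empty starting tape, and the state names "q0" and "halt". The check makes one pass over the trace and does a bounded amount of work per step, so its cost is polynomial in the size of the trace.

```python
from typing import Dict, List, Optional, Tuple

BLANK = "_"
# (state, symbol read) -> (next state, symbol written, head move of -1 or +1)
Transitions = Dict[Tuple[str, str], Tuple[str, str, int]]
# (state, tape contents with trailing blanks stripped, head position)
Config = Tuple[str, str, int]


def step(delta: Transitions, config: Config) -> Optional[Config]:
    """Apply one transition; return None if no rule applies."""
    state, tape, head = config
    symbol = tape[head] if head < len(tape) else BLANK
    if (state, symbol) not in delta:
        return None
    new_state, write, move = delta[(state, symbol)]
    # Pad the tape with blanks so the cell under the head exists.
    cells = list(tape) + [BLANK] * (head + 1 - len(tape))
    cells[head] = write
    new_head = max(0, head + move)  # one-way tape: the head stays at cell 0
    return (new_state, "".join(cells).rstrip(BLANK), new_head)


def verify_halting_trace(
    delta: Transitions,
    trace: List[Config],
    start: str = "q0",
    halt: str = "halt",
) -> bool:
    """Check that `trace` is a complete halting run from an empty tape."""
    if not trace or trace[0] != (start, "", 0):
        return False
    for prev, nxt in zip(trace, trace[1:]):
        # No step may follow the halting state, and each step must match delta.
        if prev[0] == halt or step(delta, prev) != nxt:
            return False
    return trace[-1][0] == halt
```

Note what the sketch does not do: it never decides whether a machine halts when no trace is supplied. That is the undecidable question the thread attributes to the prefix case.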
Chu-Cheng Lin (@kitsing_l)
@jacobmbuckman @adveisner In our construction in the paper, we simply don’t have non-halting-machine prefixes in our language. Having a no halt special token would probably make it impossible for an EBM to score the string in polytime.
Chu-Cheng Lin (@kitsing_l)
@jacobmbuckman @adveisner Sorry I wasn’t clear: we can weight strings in time polynomial in string length, regardless of their validity. This can be done in polytime for some Turing machines (you just need to simulate their execution trace over time).
Jacob Buckman (@jacobmbuckman)
@kitsing_l @adveisner Can you elaborate on this? I feel like there must be some catch here. What is "polynomial time" with respect to? Will it assign positive weight to *all* valid strings, or *only* to valid strings?
Chu-Cheng Lin (@kitsing_l)
@jacobmbuckman @adveisner (I wrote "assign positive probability to p(x'#)" a few messages ago but it was not accurate. I meant that x'# is a good prefix with positive-weight continuations. I should have written Z(x'#) )
Chu-Cheng Lin (@kitsing_l)
@jacobmbuckman @adveisner Note that in both cases, an energy-based model can assign positive weights to (and only to) valid strings, in polynomial time.
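The thread distinguishes between weighting a complete string and computing the total weight of a prefix's continuations, written Z(x'#) above. A toy sketch may make that distinction concrete. It is not the paper's construction: the validity predicate (non-empty palindromes over a two-letter alphabet), the weight 4^(-length), and the length cutoff `max_extra` are all hypothetical stand-ins. Weighting one string takes a single linear-time check. The prefix mass is a sum over every continuation, and the sketch can only approximate it from below by enumerating up to a cutoff.

```python
from itertools import product

ALPHABET = "ab"  # hypothetical two-letter alphabet


def is_valid(x: str) -> bool:
    """Toy stand-in for 'valid string': a non-empty palindrome."""
    return len(x) > 0 and x == x[::-1]


def weight(x: str) -> float:
    """Unnormalized weight: positive iff x is valid, computed in O(len(x))."""
    return 4.0 ** (-len(x)) if is_valid(x) else 0.0


def truncated_prefix_mass(prefix: str, max_extra: int) -> float:
    """Lower bound on Z(prefix): sum the weights of all continuations
    that add at most `max_extra` symbols. The cost grows exponentially
    in `max_extra`, and the true Z(prefix) has no such cutoff."""
    total = 0.0
    for n in range(max_extra + 1):
        for tail in product(ALPHABET, repeat=n):
            total += weight(prefix + "".join(tail))
    return total
```

For example, `weight("aba")` is positive and `weight("ab")` is zero, yet `truncated_prefix_mass("ab", 1)` is positive because the continuation "aba" is valid. In this toy language a good prefix is easy to recognize. In the halting-machine language discussed in the thread, deciding whether Z of a prefix is nonzero amounts to deciding whether a machine halts.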