Chu-Cheng Lin

38 posts

@kitsing_l

#NLProc and #ML @jhuclsp. Pronouns: i/ta.

Baltimore · Joined April 2020

305 Following · 64 Followers

Pinned Tweet

Chu-Cheng Lin@kitsing_l·5 Jun

Our work "Limitations of Autoregressive Models and Their Alternatives" is to appear at #NAACL2021! w/ @AaronJaech, Xin Li, Matt Gormley, @adveisner. arxiv.org/abs/2010.11939 Session 14D (5PM PDT Wed Jun 9 2021)

Jason Eisner@adveisner

Everyone's using big autoregressive language models. But ... they predict the next word with a polysized circuit (computation graph). So they can't accurately model settings where that prediction is NP-hard. 😢

Chu-Cheng Lin@kitsing_l·2 Jun

@nouhadziri This is interesting work! Wondering if you have thought of connecting this to our previous work on limitations of LMs in general (arxiv.org/abs/2010.11939), to justify the assumptions of your theoretical results (Prop. D.1, for example)?

Nouha Dziri@nouhadziri·31 May

🚀📢 GPT models have blown our minds with their astonishing capabilities. But, do they truly acquire the ability to perform reasoning tasks that humans find easy to execute? NO⛔️ We investigate the limits of Transformers *empirically* and *theoretically* on compositional tasks🔥

Chu-Cheng Lin@kitsing_l·26 Apr

Paper link: openreview.net/forum?id=SsPCt… Joint work with @aryamccarthy

Chu-Cheng Lin@kitsing_l·26 Apr

Even though many parameter estimation methods have been proposed, they stop working for model families that are expressive enough to parametrize our pathological EBM.

Chu-Cheng Lin@kitsing_l·26 Apr

I'm presenting our spotlight paper "On the Uncomputability of Partition Functions in Energy-Based Sequence Models" at ICLR poster session 5 until 3:30pm EDT! #ICLR2022

Chu-Cheng Lin retweeted

JHU CLSP@jhuclsp·22 Nov

Capabilities of autoregressive AI models will always be limited by their inability to reason like humans, says @JHUCompSci PhD candidate and CLSP member Chu-Cheng Lin (@kitsing_l). Read more about Lin's research in the latest from the @HubJHU! hub.jhu.edu/2021/11/22/lim…

Chu-Cheng Lin retweeted

Austin Blodgett@austinblodgett5·14 Jun

Take a look at our ACL camera-ready presenting our structurally-comprehensive AMR-to-text alignments. @complingy arxiv.org/abs/2106.06002

Chu-Cheng Lin@kitsing_l·11 Jun

@jacobmbuckman @adveisner Sure! Let me know if you have any other questions / comments 😀

Jacob Buckman@jacobmbuckman·10 Jun

@kitsing_l @adveisner Ah, it is clear now! Thanks Chu-Cheng!

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner A bit more on ‘tell whether a prefix is valid’: assuming we could model this language using an AR, this AR would assign nonzero prob to a prefix iff the prefix is a TM that can halt. Does this make sense?

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner ...which is already undecidable, let alone the polytime requirement. On the other hand *verifying* a TM + an exec trace is doable in polytime.
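
The verification half of that claim can be sketched in a few lines. This is only an illustration, not the construction from the paper: the configuration encoding, the one-way tape, and the names `step` and `verify_trace` are all assumptions made for the example. Each trace entry is checked against a single deterministic transition, so the total cost is polynomial in the length of the trace.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding, chosen only for this sketch:
# transition table maps (state, symbol) -> (new_state, written_symbol, head_move)
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]
# a configuration is (state, tape contents, head position)
Config = Tuple[str, str, int]

BLANK = "_"


def step(delta: Delta, config: Config) -> Optional[Config]:
    """Apply one deterministic transition; return None if none applies."""
    state, tape, head = config
    key = (state, tape[head])
    if key not in delta:
        return None
    new_state, write, move = delta[key]
    tape = tape[:head] + write + tape[head + 1:]
    head += move
    if head < 0:
        return None  # moving off the left end is treated as invalid here
    if head == len(tape):
        tape += BLANK  # extend the tape to the right on demand
    return (new_state, tape, head)


def verify_trace(delta: Delta, trace: List[Config], start: Config,
                 halt_state: str) -> bool:
    """Check that trace is a genuine halting run of the machine from start.

    One transition is re-computed per trace entry, so the work is
    polynomial in the size of the trace.
    """
    if not trace or trace[0] != start:
        return False
    for prev, nxt in zip(trace, trace[1:]):
        if step(delta, prev) != nxt:
            return False
    return trace[-1][0] == halt_state
```

Deciding whether such a trace exists at all is the halting problem; the sketch only covers the easy direction, checking a trace that is already given.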

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner (I was referring to our proof of thm 5 in the paper: arxiv.org/pdf/2010.11939… )

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner In our construction in the paper, we simply don’t have non-halting-machine prefixes in our language. Having a special ‘no halt’ token would probably make it impossible for an EBM to score the string in polytime.

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner Oh, sorry 😞 I had a typo above. Meant to write ‘time polynomial in string *length*’

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner By weights I meant unnormalized probabilities (in the context of EBMs)

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner Sorry I wasn’t clear: we can weight strings in time polynomial in string weight, regardless of their validity. This can be done in polytime for some Turing machines (you just need to simulate their execution trace over time).

Jacob Buckman@jacobmbuckman·10 Jun

@kitsing_l @adveisner Can you elaborate on this? I feel like there must be some catch here. What is "polynomial time" wrt? Will it assign positive weight to *all* valid strings, or just to *only* valid strings?

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner (I wrote "assign positive probability to p(x'#)" a few messages ago but it was not accurate. I meant that x'# is a good prefix with positive-weight continuations. I should have written Z(x'#) )

Chu-Cheng Lin@kitsing_l·10 Jun

@jacobmbuckman @adveisner Note that in both cases, an energy-based model can assign positive weights to (and only to) valid strings, in polynomial time.
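
A toy sketch of the contrast the thread keeps returning to: scoring one complete string is cheap, while the mass of all continuations of a prefix (written Z(x'#) elsewhere in the thread) is not obviously cheap. Everything here is an assumption made for illustration: the binary alphabet, the stand-in validity test (an even number of 1s), and the names `weight` and `prefix_mass` are hypothetical and unrelated to the paper's construction.

```python
from itertools import product

ALPHABET = "01"  # toy alphabet, an assumption for this sketch


def weight(x: str) -> float:
    """Hypothetical unnormalized weight: positive iff x is 'valid'.

    Validity is a toy stand-in (an even number of 1s), checked in
    time linear in the length of x.
    """
    return 1.0 if x.count("1") % 2 == 0 else 0.0


def prefix_mass(prefix: str, max_extra: int) -> float:
    """Brute-force sum of weight(prefix + suffix) over all suffixes
    of length 0..max_extra.

    The loop visits every suffix, so the cost grows exponentially
    in max_extra even though each weight() call is cheap.
    """
    total = 0.0
    for n in range(max_extra + 1):
        for suffix in product(ALPHABET, repeat=n):
            total += weight(prefix + "".join(suffix))
    return total
```

An autoregressive model has to commit to whether a prefix has any positive-weight continuation at all, which is what this sum tests; for the toy language a closed form exists, but the brute-force loop shows the general shape of the computation.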
