kstechly

12 posts

@kayastechly

Linguistics M.A. at ASU working in the Yochan lab. Starting a Comp Sci PhD at Yale advised by Tom McCoy and Tyler Brooke-Wilson in Fall 2025.

Tempe, Arizona. Joined September 2023
97 Following, 194 Followers
kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
PSA for #ICLR2025 authors frantically making posters: Stop worrying! Just prompt o3 and GPT-4o and you will have an AGI-ready poster in seconds! Here is one @kayastechly and @karthikv792 cooked up--and it looks fully legit from poster distance!
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

I will be at #ICLR2025 and look forward to chatting about LRMs/reasoning/planning etc. Easiest way to run into me would likely be at our poster 👇 on Thursday (10AM poster #219). DM/Whova if you would like to meet. (Fwiw, this is my first time attending ICLR.. 😳)

kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
📢Delighted to share that our analysis of the planning abilities of 🍓 o1 has now been accepted to #TMLR @TmlrOrg . Joint work with @karthikv792 @kayastechly and @21stwarlock. Final version, including DeepSeek R1 results, appearing soon..
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

All this and more is discussed in our latest report on Planning in 🍓 Fields (👉arxiv.org/abs/2410.02162). This report also extends the evaluation of o1 models to scheduling benchmarks (since much of what goes under "planning" in LLM benchmarks--such as Travel Planning, Trip Planning, Meeting Planning etc. are really scheduling problems reducible to canonical CSP instances, rather than the more general planning problems reducible to graph search). (It also includes our speculations on o1's internal operations as an added bonus appendix.. 😋 x.com/rao2z/status/1…) (Work led by @karthikv792 & @kayastechly -- with help from @21stwarlock) 2/

kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
📢"On the Self-Verification Limitations of LLMs in Reasoning and Planning Tasks" arxiv.org/abs/2402.08115 with @kayastechly and @karthikv792 apparently made it to #ICLR2025.. Swimming🏊 to Singapore.. 😎 [The 🧵s below give the details.. ]
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

📢 "On the self-verification limitations of LLMs in Reasoning and Planning Tasks" arxiv.org/abs/2402.08115 (led by @karthikv792 and @kayastechly) 👇 Investigates LLM self-verification in three formal benchmarks--Game of 24, Graph Coloring and Planning, and shows that accuracy consistently degrades when LLMs self-verify, but improves when external verifiers are used in a backprompting architecture. This combines and extends our studies from last October--arxiv.org/abs/2310.12397 and arxiv.org/abs/2310.08118… at #FMDM workshop at #NeurIPS2023). 1/

kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
👉 Our #NeurIPS2024 paper on Chain of Thoughtlessness @ 11AM poster session today (East Hall #3010, 11AM-2pm...). All three of us are here and looking forward to chatting/answering qns.. 🙏
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

So @karthikv792, @kayastechly, @SaldytLucas and I will be at #NeurIPS2024 starting Tuesday--to present a bunch of things👇. Easiest to catch us at 11th 11AM poster session at our "Chain of thoughtlessness" poster (East hall #3010). (I am around 10th-13th--and am happy to chat about anything #AI including "Planning: Will they? Won't they?"😋) 👉docs.google.com/document/d/1Mo…

kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
So @karthikv792, @kayastechly, @SaldytLucas and I will be at #NeurIPS2024 starting Tuesday--to present a bunch of things👇. Easiest to catch us at 11th 11AM poster session at our "Chain of thoughtlessness" poster (East hall #3010). (I am around 10th-13th--and am happy to chat about anything #AI including "Planning: Will they? Won't they?"😋) 👉docs.google.com/document/d/1Mo…
kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
Woo hoo.. Chain of Thoughtlessness paper will be showing up at #NeurIPS2024 🤗 Congrats to @karthikv792 @kayastechly [details below 👇]
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

📢Thanks to @karthikv792 and @kayastechly's tireless efforts, here is the paper analyzing the (in)effectiveness of Chain of Thought prompting. The good news is that everything I said here and in my talks about CoT delusions still holds. The better news is that Karthik and Kaya have done more extensive experiments both with GPT4 and Claude 3 Opus. 👉 arxiv.org/abs/2405.04776 tldr; LLMs may well be smarter than that dog in the Far Side cartoon (although I am sure @ylecun will push back vociferously😋), but there is little reason to believe that we can advise them the way we advise our friends--and expect them to operationalize that advice..

kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
A research note describing our evaluation of the planning capabilities of o1 🍓 is now on @arxiv arxiv.org/abs/2409.13373 (thanks to @karthikv792 & @kayastechly). As promised, here is a summary (..although you should read the whole thing..) 🧵 1/
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

So @karthikv792 and @kayastechly stayed up until wee hours and submitted this 18-page note to arXiv; will summarize once it is made public..

kstechly retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
📢 "On the self-verification limitations of LLMs in Reasoning and Planning Tasks" arxiv.org/abs/2402.08115 (led by @karthikv792 and @kayastechly) 👇 Investigates LLM self-verification in three formal benchmarks--Game of 24, Graph Coloring and Planning, and shows that accuracy consistently degrades when LLMs self-verify, but improves when external verifiers are used in a backprompting architecture. This combines and extends our studies from last October--arxiv.org/abs/2310.12397 and arxiv.org/abs/2310.08118… at #FMDM workshop at #NeurIPS2023). 1/
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

Can LLMs really self-critique (and iteratively improve) their solutions, as claimed in the literature?🤔 Two new papers from our group investigate (and call into question) these claims in reasoning (arxiv.org/abs/2310.12397) and planning (arxiv.org/abs/2310.08118) tasks.🧵 1/

kstechly retweeted