Ankit Shah
@ankitjs

155 posts

Research Scientist at The Boston Dynamics AI Institute. Prev at Brown CS, Ph.D. from MIT. Making robots easy to program and deploy. https://t.co/MjcDjhu9pd

Cambridge, MA · Joined August 2013
288 Following · 220 Followers
Ankit Shah retweeted
MIT CSAIL @MIT_CSAIL
A new handheld interface from MIT gives anyone the ability to train a robot for tasks in fields like manufacturing. The versatile tool can teach a robot new skills using one of three approaches: natural teaching, kinesthetic training, & teleoperation: bit.ly/4nTAw6F
7 replies · 28 reposts · 97 likes · 13.7K views
Ankit Shah retweeted
Experimental Philosophy @xphilosopher
Yale Philosophy offers a course on “Formal Philosophical Methods” — a broad introduction to probability, logic, formal semantics, etc. Instructor Calum McNamara has now made all materials for the course (78 pages) freely available static1.squarespace.com/static/6255ffe…
14 replies · 131 reposts · 590 likes · 42.2K views
Ankit Shah retweeted
Gary Marcus @GaryMarcus
Wow! The core finding in the much-maligned Apple paper from @ParshinShojaee et al – that reasoning models generalize poorly in the face of complexity – has been conceptually replicated three times in three weeks. C. Opus sure didn’t see that coming. And a lot of people owe Ms. Shojaee an apology.
Laura Ruis @LauraRuis

@GaryMarcus @JonnyCoook @silviasapora @aahmadian_ @akbirkhan @_rockt @j_foerst @ParshinShojaee @i_mirzadeh @MFarajtabar @nouhadziri The programs we look at are quite simple and all represent novel combinations of familiar operations. We also find lower performance for more complex programs, especially for the compositions. Also, I have a sense that LLMs can handle OOD problems more easily when represented in code.

3 replies · 15 reposts · 59 likes · 17.9K views
Ankit Shah retweeted
Palash @ABiggerSpalash
Friends, need your help. @antarikshB, a senior from IIT B, has launched an incredible project of organizing all Sanskrit literature in one place, in a user-friendly manner. The service is free, not-for-profit, created purely out of passion. Media coverage will go a long way in ensuring the service reaches the right people. Could you help by RT-ing and perhaps tagging the right people? (link below)
178 replies · 2.5K reposts · 5K likes · 243.2K views
Ankit Shah @ankitjs
@sama Marginal improvements for exponential cost. Welcome to the world of computational complexity. Plenty of research directions have been abandoned because they would never scale.
0 replies · 0 reposts · 0 likes · 58 views
Sam Altman @sama
seemingly somewhat lost in the noise of today: on many coding tasks, o3-mini will outperform o1 at a massive cost reduction! i expect this trend to continue, but also that the ability to get marginally more performance for exponentially more money will be really strange.
547 replies · 615 reposts · 12K likes · 1.2M views
Ankit Shah @ankitjs
@deedydas I see in a later tweet you mention that the trends hold up over the years, so it would be interesting if you could try to set up a null-hypothesis test.
0 replies · 0 reposts · 0 likes · 39 views
Ankit Shah @ankitjs
@deedydas In expectation, the country should not change the odds, but a lottery is a single sample from what should be a Dirichlet distribution. What you have posted are raw numbers; what I am missing is: is this sample truly unlikely under the null (a truly random lottery)?
1 reply · 0 reposts · 1 like · 390 views
Deedy @deedydas
HUGE Immigration News! We have the first EVER look at H-1B lottery data. Did you also suspect the lottery wasn't truly random? It's not. Certain companies like TikTok and ByteDance have 50% higher odds than average. I broke it down by nationality, company and age... 1/5
127 replies · 280 reposts · 2.3K likes · 660.6K views
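For concreteness, a minimal sketch of the null-hypothesis test being suggested in this exchange, assuming per-company registration and selection counts are available. All numbers below are made-up placeholders, not the real H-1B data:

```python
import numpy as np

# Hypothetical per-company counts -- placeholders, not the real H-1B data.
entries = np.array([12000, 8000, 30000, 5000])  # registrations per company
wins    = np.array([ 3400, 2100,  7600, 1900])  # selections per company

p_hat = wins.sum() / entries.sum()  # overall selection rate

# Null hypothesis: a truly random lottery, i.e. every registration is
# selected independently with probability p_hat.
rng = np.random.default_rng(0)
sim_wins = rng.binomial(entries, p_hat, size=(100_000, len(entries)))

# Chi-square-style discrepancy between observed and expected wins.
expected = entries * p_hat

def chi2(w):
    return (((w - expected) ** 2) / expected).sum(axis=-1)

p_value = (chi2(sim_wins) >= chi2(wins)).mean()
print(f"overall rate {p_hat:.3f}, p-value under a random lottery: {p_value:.4f}")
```

A small p-value would say the observed spread in per-company odds is unlikely under a truly random lottery; raw win rates alone cannot establish that.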
Ankit Shah @ankitjs
@chris_j_paxton Lack of novelty manifests in four ways: 1) reinventing things we know failed, 2) things tabled to be looked at later given better hardware/manufacturing or compute, 3) multiple parallel efforts, 4) promising efforts that died because it just wasn't the time. Optimus is the first three?
0 replies · 0 reposts · 0 likes · 97 views
Chris Paxton @chris_j_paxton
A lot of roboticists get too hung up on the lack of novelty in humanoid efforts like Tesla. ChatGPT was nothing new either
3 replies · 1 repost · 33 likes · 2.6K views
Ankit Shah @ankitjs
@chris_j_paxton Agree with you by and large, but with the enshittification of products (ads everywhere, abandoned great products), the early (but unsustainable) preview felt better than what was delivered
1 reply · 0 reposts · 1 like · 120 views
Ankit Shah @ankitjs
@chris_j_paxton I think that's what I was trying to get at with "where else does it work": systems-based approaches and e2e generalize qualitatively differently. My research bet is along the lines of a systems-based backbone with learned features.
1 reply · 0 reposts · 1 like · 295 views
Chris Paxton @chris_j_paxton
@ankitjs It needs to scale. The reason people get excited about e2e is that current robotics mostly does not - you're investing a substantial amount of human effort in every solution
1 reply · 0 reposts · 26 likes · 1.2K views
Chris Paxton @chris_j_paxton
As usual Boston Dynamics blows everyone else away when it comes to autonomous demos. Using a mixture of detection models and specialized grasping policies; no end-to-end stuff. From BD:
- The robot receives as input a list of bin locations to move parts between.
- Atlas uses a machine learning vision model to detect and localize the environment fixtures and individual bins [0:36].
- The robot uses a specialized grasping policy and continuously estimates the state of manipulated objects to achieve the task. There are no prescribed or teleoperated movements; all motions are generated autonomously online.
- The robot is able to detect and react to changes in the environment (e.g., moving fixtures) and action failures (e.g., failure to insert the cover, tripping, environment collisions [1:24]) using a combination of vision, force, and proprioceptive sensors.
22 replies · 66 reposts · 479 likes · 79.2K views
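A minimal sketch of the perceive-plan-act loop that description implies. Every object and method name here is a hypothetical stand-in (this is not Boston Dynamics' API), but it shows where the detection model, the specialized grasping policy, and the sensor-fused failure checks slot in:

```python
# Sketch of a systems-style pick-and-place loop: a detection model for
# localization, a specialized grasp policy, and failure checks fusing
# vision, force, and proprioception. All objects are hypothetical
# duck-typed stand-ins, not Boston Dynamics' API.
def move_parts(bin_pairs, detector, grasp_policy, robot, max_retries=3):
    for src, dst in bin_pairs:  # input: list of bin locations
        for _ in range(max_retries):
            # Fixtures and bins are re-detected online every attempt,
            # so moved fixtures are handled by the next perception pass.
            scene = detector.localize(robot.camera_image())
            robot.execute(grasp_policy.plan_grasp(scene.bin(src)))
            # Failure detection: did we actually acquire the part?
            if robot.holding_part():
                break  # success; no prescribed or teleoperated motions
        robot.execute(grasp_policy.plan_place(scene.bin(dst)))
```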
Ankit Shah retweeted
Chris Paxton @chris_j_paxton
I like how they use the infinite rotation to make planning easier. Taking advantage of how your humanoid doesn't need to be human
3 replies · 6 reposts · 67 likes · 4.1K views
Ankit Shah retweeted
Jason Liu @HRI @jasonxyliu
How can robots understand spatiotemporal language in novel environments without retraining? 🗣️🤖 In our #IROS2024 paper, we present a modular system that uses LLMs and a VLM to ground spatiotemporal navigation commands in unseen environments described by multimodal semantic maps
1 reply · 11 reposts · 22 likes · 3.6K views
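A rough sketch of the modular pattern the abstract describes, with the LLM and VLM as separate stages. All names here are hypothetical stand-ins, not the paper's actual interfaces:

```python
# Sketch of a modular grounding pipeline: an LLM parses language into
# structured spatiotemporal sub-goals, a VLM grounds each referent in a
# multimodal semantic map, and the grounded goals feed a planner.
# `llm`, `vlm`, and the map interface are hypothetical stand-ins.
def ground_command(command, semantic_map, llm, vlm):
    # e.g. "go to the kitchen after passing the red couch" ->
    # [(EVENTUALLY, "red couch"), (THEN, "kitchen")]
    subgoals = llm.parse(command)
    grounded = []
    for temporal_op, referent in subgoals:
        # The VLM scores each mapped region against the textual referent;
        # a new environment needs only a new map, not retraining.
        region = max(semantic_map.regions,
                     key=lambda r: vlm.similarity(referent, r.image))
        grounded.append((temporal_op, region))
    return grounded  # consumed by a planner over the map
```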
Ankit Shah retweeted
Kenneth Stanley @kenneth0stanley
Recent results like Apple’s show that LLMs (even o1) flub on reasoning with simple changes to problems that shouldn’t matter. A consensus is building that it shows they are “just pattern matching.” But that metaphor is misleading: good reasoning itself can also be framed as “just pattern matching” at each step. The issue is not that we are merely seeing pattern matching, but that we are seeing *bad* pattern matching, at the wrong level of abstraction. If you think about it, that is a more serious pathology, because it doesn’t separate when it works vs. when it doesn’t work into conveniently distinct buckets of computational tasks. In a sense, calling it “just pattern matching” implies an easier fix than there really is, as if all it will take is a better o1.
28 replies · 24 reposts · 210 likes · 36.7K views
Ankit Shah retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
My (pure) speculation about what OpenAI o1 might be doing. [Caveat: I don't know anything more about the internal workings of o1 than the handful of lines about what they are actually doing in that blog post--and on the face of it, it is not more informative than "It uses Python er.. RL". But here is what I told my students as one possible way it might be working.]

There are two things--RL and "private CoT"--that are mentioned in the write-up. So imagine you are trying to transplant a "generalized AlphaGo"--let's call it GPTGo--onto the underlying LLM token-prediction substrate. To do this, you need to know: (1) What are the GPTGo moves? (For AlphaGo, we had Go moves.) What would be the right moves when the task is just "expand the prompt"? (2) Where is it getting its external success/failure signal from? (For AlphaGo, we had simulators/verifiers giving the success/failure signal.) The most interesting question in glomming the self-play idea onto a general AI agent is where it is getting this signal (see e.g. x.com/rao2z/status/1…).

My guess is that the moves are auto-generated CoTs (thus the moves have a very high branching factor). Let's assume--for simplification--that we have a CoT-generating LLM that generates these CoTs conditioned on the prompt. The success signal is from training data with correct answers. When the expanded prompt seems to contain the correct answer (presumably LLM-judged?), then it is success. If not, failure.

The RL task is: given the original problem prompt, generate and select a CoT, and use it to continue to extend the prompt (possibly generating subgoal CoTs after every few stages). Get the final success/failure signal for the example (for which you do have the answer). Loop on a gazillion training examples with answers, and multiple times per example. [The training examples with answers can either be coming from benchmarks, or from synthetic data with problems and their solutions--using external solvers; see x.com/rao2z/status/1…] Let RL do its thing to figure out credit/blame assignment for the CoTs that were used in that example, and incorporate this RL backup signal into the CoT generator's weights (?).

During the inference stage, you can basically do rollouts (a la the original AlphaGo) to further improve the effectiveness of the moves ("internal CoTs"). The larger the rollout, the longer the time. My guess is that what o1 is printing as a summary is just a summary of the "winning path" (according to it)--rather than the full rollout tree.

===

Assuming I am on the right path here in guessing what o1 is doing, a couple of corollaries:

1. This can at least be better than just fine-tuning on the synthetic data (again see x.com/rao2z/status/1…)--we are getting more leverage out of the data by learning move (auto-CoT) generators. [Think behavior cloning vs. RL.]

2. There will still not be any guarantees that the answers provided are "correct"--they may be probabilistically a little more correct (subject to the training data). If you want guarantees, you will still need some sort of LLM-Modulo approach even on top of this (c.f. arxiv.org/abs/2402.01817).

3. It is certainly not clear that anyone will be willing to really wait for long periods of time during inference (it is already painful to wait for 10 sec for a 10-word last-letter concatenation!). See x.com/rao2z/status/1… The kind of people who will wait for longer periods would certainly want guarantees--and there are deep and narrow System 2's aplenty that can be used for many such cases.

4. There is a bit of a Ship of Theseus feel to calling o1 an LLM--considering how far it is from the other LLM models (all of which essentially have teacher-forced training and sub-real-time next-token prediction). That said, this is certainly an interesting way to build a generalized System 2-ish component on top of LLM substrates--but without guarantees. I think we will need to understand how this would combine with other efforts to get System 2 behavior--including LLM-Modulo (arxiv.org/abs/2402.01817), which gives guarantees for specific classes.

to be contd..
24 replies · 112 reposts · 564 likes · 147K views
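Read as pseudocode, the inference-time rollout idea sketched in the post might look something like this. `cot_model` and `judge` are hypothetical stand-ins, and this is a toy beam search, not a claim about o1's actual implementation:

```python
# Toy beam-style rollout over sampled chains of thought. `cot_model.sample`
# yields candidate CoT continuations for a prompt; `judge.score` rates a
# partial path (in training, this is where the success/failure RL signal
# would come from). Both are hypothetical stand-ins.
def rollout_answer(prompt, cot_model, judge, width=8, depth=3):
    paths = [prompt]
    for _ in range(depth):
        # "Moves" = auto-generated CoTs conditioned on the prompt so far,
        # hence the very high branching factor.
        candidates = [
            path + "\n" + cot
            for path in paths
            for cot in cot_model.sample(path, n=width)
        ]
        # Keep the most promising partial paths (beam pruning).
        paths = sorted(candidates, key=judge.score, reverse=True)[:width]
    # What the user sees would be a summary of this winning path,
    # not the full rollout tree.
    return paths[0]
```

Deeper or wider rollouts trade inference time for move quality, which matches the observation that the larger the rollout, the longer the wait.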
Ankit Shah retweeted
bigAI @BrownBigAI
@jasonxyliu will present their @IJCAIconf survey paper on robotic language grounding. Please check out his talk (8/8 11:30) if you are at #IJCAI2024. In collab w/ @VanyaCohen, Raymond Mooney from @UTAustin, @StefanieTellex from @BrownCSDept, @drdavidjwatkins from The AI Institute
Jason Liu @HRI @jasonxyliu

How do robots understand natural language? #IJCAI2024 survey paper on robotic language grounding. We situated papers on a spectrum with two poles: grounding language to symbols and grounding it to high-dimensional embeddings. We discussed tradeoffs, open problems & exciting future directions!

0 replies · 3 reposts · 10 likes · 990 views
Ankit Shah @ankitjs
@kenneth0stanley @khademinori To expand: to me, reasoning involves grounding a linguistic concept reliably to a reusable primitive, either a computational one (such as addition, multiplication, etc.) or a physical one (pick up an object). LLMs seem to fail at following recipes they generate
0 replies · 0 reposts · 1 like · 55 views
Ankit Shah @ankitjs
@kenneth0stanley @khademinori To rephrase: it can generate an accurate textual description of how to compute a Fourier series yet fail to compute it correctly?
1 reply · 0 reposts · 1 like · 69 views
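For concreteness, the contrast being drawn: the describable recipe is just one integral per coefficient, and a few lines of NumPy execute it reliably, e.g. for a square wave:

```python
import numpy as np

# Fourier sine coefficients of an odd square wave on [0, 2*pi):
# b_n = (1/pi) * integral_0^{2pi} f(t) sin(n t) dt, with analytic
# value 4/(pi*n) for odd n and 0 for even n.
t = np.linspace(0.0, 2.0 * np.pi, 200_001)
f = np.sign(np.sin(t))

for n in range(1, 8, 2):  # odd harmonics only
    b_n = np.trapz(f * np.sin(n * t), t) / np.pi
    print(f"b_{n}: numeric {b_n:.4f}, analytic {4 / (np.pi * n):.4f}")
```

Reliably executing that recipe, rather than merely stating it, is the grounding-to-a-computational-primitive step the thread is about.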
Kenneth Stanley @kenneth0stanley
I'm curious, if you think "LLMs can't reason," what do you precisely mean by that and why do you think that is? In particular I'm curious about insights beyond "they are just doing statistics" or "they just make predictions." What beyond that do you think supports the claim? Note that I'm not trying to imply this statement is wrong or right, just curious what it reveals when people dig into the issue more deeply.
232 replies · 15 reposts · 294 likes · 135.3K views