Aditya Bansal

7 posts

Aditya Bansal

@aditya5109

Options Trader at Da Vinci. Love all things tech and entrepreneurship. EE @iitdelhi

Mumbai, India · Joined February 2010

271 Following · 18 Followers

Aditya Bansal@aditya5109·1d

@aditjain1980 How is the data structured for CUA tasks? Is it using GPT-5.4's steps and feeding those in?

Adit Jain@aditjain1980·1d

We're hillclimbing open source models on real-world computer use tasks - come join us 🚀

Collinear AI@CollinearAI

We discovered significant gaps between open- and closed-source models on our realistic computer-use-agent tasks, and it is a data problem. Although open models have nearly saturated OSWorld, we found that Kimi K2.6 cannot do tasks that GPT-5.4 solves in 50 steps. Our 30 tasks are realistic: the agent works with an open-source office suite on a Linux OS and compiles Excel sheets. GPT-5.4-high solves 2/3 in 25 steps, and 1/3 in 50 steps. Kimi K2.6, the strongest open model on OSWorld, fails almost all of them. We understand the problem to be very simple: open models are not trained on enough realistic CUA data. To test this hypothesis, we RL-ed Kimi K2.6 on 10 in-domain CUA office tasks with LoRA. The result of this simple RL is a significant increase of +30% in the capacity to do office tasks. Moreover, the improvement carries over gracefully to OSWorld itself: on a stratified subset of 30 tasks, the RL-ed model sees another +10% lift. The takeaway from our initial results is that CUA models suffer from unrealistic, low-quality data. As a result, we are continually building realistic apps / RL environments to bridge the gap. More to come. Solid work done by @alckasoc

Aditya Bansal@aditya5109·Nov 20

@__drishtea @aditjain1980 It helps that they know their name is attached to it, which adds accountability. But that leaves a lot of room for politicisation of the whole process. Probably best to have a rating system on the reviews themselves and then have a committee review the bottom quartile of the reviewers?

Drishti Chouhan@__drishtea·Nov 20

@aditjain1980 I am curious what you'd do with the identities if you did get them

Adit Jain@aditjain1980·Nov 20

Unpopular opinion: Identities of reviewers should be revealed once the decision has been made following double-blind review. Agreed, there might be more politics than there is currently, but it would at least prevent them from posting LLM-generated/low-effort reviews in the first place. Thoughts?

Aditya Bansal@aditya5109·Oct 24

@shrutim29045 @amazonIN @AmazonHelp @AmitAgarwal @ajassy @dougherrington @JeffBezos Supposed to be the most customer-friendly company and this is how they treat loyal customers.

Shruti Mittal@shrutim29045·Oct 24

@amazonIN I ordered a MacBook Air M3 for ₹1,04,990 and got an EMPTY BOX! Despite video proof, @AmazonHelp says I’m falsely claiming. @AmitAgarwal @ajassy @DougHerrington @JeffBezos, fix this or I’ll escalate! #AmazonScam #CustomerServiceFail

Aditya Bansal@aditya5109·Mar 2

@chiruchat Time for me to have a rematch with him? :)

Aditya Bansal retweeted

TOI Ahmedabad@TOIAhmedabad·Jul 31

Ahmedabad: If opened, 10 can infect 78% on campus in 10 weeks toi.in/Lir44b

Aditya Bansal retweeted

HMPI_Journal@HMPI_Journal·Jul 29

A simulation study by @chiruchat & Aditya Bansal shows that if colleges in India hold in-person classes, #COVID19 could sweep campuses within 10 weeks: bit.ly/3f8qcTO @IIMAhmedabad @ICICIBank @HooverInst @stanford @IIT_Indiaa

Aditya Bansal@aditya5109·Jun 9

@chiruchat There's a platform named mfine (mfine.co) providing online doctor consultation and at-home lab tests.
