Arijit Ray
@ARRay693
79 posts

AI PhD Student, w/ Profs @kate_saenko_, @RanjayKrishna | Teaching machines to help humans in the digital and physical world | Prev @Google, @AIatMeta

Cambridge, MA · Joined November 2015
710 Following · 188 Followers
Pinned Tweet
Arijit Ray@ARRay693·
"It is by logic that we prove, but by [abstract] intuition that we discover." - Henri Poincaré. When faced with a complex problem, we pause, we think. Not exactly in words, not exactly in images — in something more abstract, something harder to name. So, for truly intelligent agents, should we not ask that they do the same? Introducing Mull-Tokens — a modality-agnostic latent thinking paradigm. Now, the model can think in space, in time, in words, in affordances — in all the things that language alone cannot easily convey. arijitray.com/multimodal_thi…
Kiana Ehsani@ehsanik·
This is a long post, mainly because I have a lot to say, but in case you are too busy:

TL;DR: @Vercept_ai is joining @AnthropicAI! We shared a mission, so we joined forces to accelerate it into reality. Couldn't be more excited!

Why Vercept was started
In 2024, AI coding tools were already becoming magical for developers, but other industries were ages behind. It felt insane that when my mom had IT issues, I still had to hop on a call and walk her through it step by step. Insane that sending a simple email took so many clicks. That's why we started @Vercept_ai : build something that acts for users instead of telling them how to do it. Two goals: 1) help people do tasks they didn't know how to do, and 2) handle the zero-brainpower tasks so people spend more time on creative work. As simple as scheduling meetings, as complex as reconciling messy financials before tax season. The ultimate goal was to have people spend less time behind screens and more time walking in nature. (Very Pacific Northwest mission 😁)

The ride
The journey of building an AI-native company in this day and age was wild. Going from researcher to founder meant trading "reviewer number 2" for business partners and users, but surprisingly a lot of the same paradigms applied: come up with a hypothesis, design an experiment, analyze user behavior, change the model and product based on the findings, wash, rinse, and repeat. There are some differences, though: the pace and the adrenaline. Lows are low, highs are high. We were constantly being challenged and learned at a pace we had never learned before. NEVER! If you are an adrenaline junkie like we are, it's a blast. The joy of the startup adrenaline rush is truly underrated.

Why Anthropic
We raised more than $50M, had a comfortable runway and a successful product, were building full steam with a small team, and were truly enjoying every minute of it. But that's when the opportunity came to join forces with Anthropic.
We already knew how great Anthropic was at building models and we admired their mission, but then we learned more about the vision. We went on hours of walks, had long conversations, talked to members across different orgs, and learned more about Anthropic's vision and commitment to core beliefs that were very similar to ours. The more we talked, the more we realized we had been working on the same mission but from complementary perspectives. We realized that joining forces meant we could build something much, much bigger together.

And beyond the mission, I am now a big believer that Anthropic's real moat isn't its best model. It's the people: incredibly talented folks who genuinely care about mission and real impact over hype. A zero-ego culture obsessed with building something meaningful. The choice was clear: we could build independently and work toward the same vision as two separate versions of it, or join forces with an incredible team and accelerate that vision into reality. The decision was easy.

What's next for our mission
The mission continues; it just got a bigger stage and an expanded team. The goal is still to expand AI beyond just a chatbot, to enable non-technical users to leverage it just as much as technical ones. We're just getting started.

It takes a village
This journey wouldn't have happened without the people who made it what it was. First and foremost, my cofounders @LucaWeihs and @inkynumbers : the best people I could've wished for as cofounders. We never once got into an argument, always had communicative discussions, and as a cherry on top shared the same sense of humor! I feel blessed and grateful to have these two in my life. Thankful to our team for trusting in the three of us and showing up day and night. Grateful for @sethbannon , our board member, lead investor, and great mentor, the person whose energy is so infectious that whenever we were having a down moment we would say "channel your inner @fiftyyears energy!"
And to our wonderful investors and supporters: @chrija and @PointNineCap , Yifan and Jacob and @ai2incubator , and @mattmcilwain and Ted Kummert from @MadronaVentures . Couldn't have done this without you. Onward 🐜
Arijit Ray@ARRay693·
This work would not be possible without all my amazing collaborators: Ahmed Abdelkader, Chengzhi Mao, Bryan Plummer, @kate_saenko_ , @RanjayKrishna , Leonidas Guibas, and Vincent Chu!
Arijit Ray@ARRay693·
As conversations continue around grounding visual & textual reasoning, we believe latent, modality-agnostic thinking could be a promising direction. The latents can be extended to anything - trajectories, 3D point-cloud features, audio! Paper, code, and models posted. Dive in and let us know what you build! 🚀
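As a rough illustration of the general idea (this is not the paper's code; the class name, shapes, and architecture below are all hypothetical), latent "thinking" tokens can be sketched as learnable vectors spliced into a transformer's input sequence alongside already-embedded tokens from any modality:

```python
import torch
import torch.nn as nn

class MullTokenSketch(nn.Module):
    """Hypothetical sketch: prepend learnable, modality-agnostic latent
    tokens to a transformer input so the model can 'think' in a shared
    latent space before producing an answer. Illustrative only."""

    def __init__(self, d_model=64, n_mull_tokens=8, n_heads=4, n_layers=2):
        super().__init__()
        # Learnable latent thinking tokens, shared across all modalities.
        self.mull_tokens = nn.Parameter(torch.randn(n_mull_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, embeds):
        # embeds: (batch, seq, d_model), e.g. embedded text, image patches,
        # trajectories, or audio frames -- the latents don't care which.
        b = embeds.size(0)
        mull = self.mull_tokens.unsqueeze(0).expand(b, -1, -1)
        # Latent tokens attend jointly with the input tokens.
        return self.encoder(torch.cat([mull, embeds], dim=1))

model = MullTokenSketch()
out = model(torch.randn(2, 10, 64))  # 10 input tokens per example
print(out.shape)  # 8 mull tokens + 10 inputs -> (2, 18, 64)
```

Because the latents live in embedding space rather than any one vocabulary, the same mechanism could in principle sit in front of trajectory, point-cloud, or audio encoders; the actual training objective and architecture are in the paper.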
Arijit Ray retweeted
Kate Saenko@kate_saenko_·
🚀 Excited to share that my team at Meta just launched Segment Anything 3! SAM 3 doubles the performance of existing models on open-vocabulary instance segmentation on our new SA-Co benchmark, with 207K unique object labels. Huge congrats to the team, so proud of this work!
AI at Meta@AIatMeta

Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: go.meta.me/591040 2️⃣ SAM 3D brings the model collection into the 3rd dimension to enable precise reconstruction of 3D objects and people from a single 2D image. 🔗 Learn more about SAM 3D: go.meta.me/305985 These models offer innovative capabilities and unique tools for developers and researchers to create, experiment and uplevel media workflows.

Arijit Ray@ARRay693·
@DJiafei Amazing to work with and has impeccable Twitter game. Hire him!
Jiafei Duan@DJiafei·
I'm on the academic market this year and am actively seeking faculty positions in robot learning. My work focuses on developing efficient robotics foundation models with strong priors for reasoning and generalization. Please ping me if there are any opportunities!
Arijit Ray@ARRay693·
SIMS-V offers free (simulated) rich accurate video annotations for object relationships, distances, and temporal tracking—capabilities often lacking in existing video training datasets. 🎞️💫 Mix it into your data and boost your model's performance on video reasoning tasks! Code and data are open! ellisbrown.github.io/sims-v/
Ellis Brown@_ellisbrown

MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time. the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone. introducing SIMS-V 🤖 [1/n]

Arijit Ray retweeted
Ellis Brown@_ellisbrown·
MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time. the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone. introducing SIMS-V 🤖 [1/n]
Arijit Ray@ARRay693·
We live, feel, and create by perceiving the world as visual spaces unfolding through time — videos. Our memories and even our language are spatial: mind-palaces, mind-maps, "taking steps in the right direction..." Super excited to see Cambrian-S pushing this frontier! And, happy to also see simulations — building on our SIMS-V and SAT — helping bootstrap the first wave of spatial supersensing in models ⚡️ Code & data are open: 🔹 SIMS-V (Videos): ellisbrown.github.io/sims-v/ 🔹 SAT (Multi-images): arijitray.com/SAT/ Dive in, explore, and let us know what you build toward video super-intelligence!! 🚀
Saining Xie@sainingxie

Introducing Cambrian-S: it's a position, a dataset, a benchmark, and a model, but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶
