Navjot

37 posts

@_navjotts_

Principal ML Scientist @ Adobe. Past: Founding ML & LLM R&D at @cresta. Co-founded Doculus, acquired by Box. Alum Maths & Comp @IITKgp.

San Francisco, CA · Joined November 2014
241 Following · 105 Followers
Pinned Tweet
Navjot@_navjotts_·
The future: Domain Foundation Models. Especially if you care about both 1) the depth of capabilities (long-tail accuracy) and 2) the breadth of capabilities (features). More updates (detailed benchmarks, impact of self-instruct vs. RLHF) coming soon! @Kuan_Liu_ @plusepsilon @timshi_ai
Tim Shi@timshi_ai

Today, we introduced Ocean-1, a foundation model for the contact center. It's the culmination of our experience in generative AI for large enterprises and our latest milestone in advancing the cutting edge for customer-facing conversations. cresta.com/blog/introduci…

Navjot retweeted
Ross Taylor@rosstaylor90·
Most takes on RL environments are bad.
1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis: and no one is talking about it because they’re being hoodwinked by labs marketing their models on flawed evals.
2. Even the best public RL environments and agentic evals suck, and usually can’t be used by labs without modification. Academics often publish-and-forget instead of doing the necessary follow-up work to make the envs/evals useful for labs.
3. The best person to make an environment is someone deeply knowledgeable about a field, not a high-level generalist or newbie - 🦔 not 🦊 - but most envs are being made by generalists or low-skill contractors.
4. People are too focused on whether a problem is verifiable or not, not what kind of capabilities they want to bring into being. We don’t need more math and puzzle environments. The usefulness of an environment is proportional to its difficulty of construction.
5. Saying you want to “scale RL environments” is as meaningless as “scale is all you need” in that it says nothing about your choice of what to scale.
6. People are treating RL environment scaling as a new type of pretraining (creating a new internet), but pretraining has extremely high diversity, and expecting a single company (or collection of companies) to replicate this diversity is unrealistic. That means generalisation will be slower to emerge than the previous paradigm - and so there is more leverage in choosing which environments to build first.
If you’d like to help answer the right questions in this new space, join us at @GenReasoning.
Navjot@_navjotts_·
Worth re-reading The Bitter Lesson every few months. Each time, a different part hits you, usually "exposing" your latest attempts to dodge the very mistake it warns against. > "We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done." incompleteideas.net/IncIdeas/Bitte…
Navjot@_navjotts_·
The 2nd bucket of highly-paid AI talent is emerging: the ones who are deep (enough) into not only pre-training and post-training of LLMs – but who have the complementary skills to make LLMs actually work in messy real-world use cases (I don't mean SWE skills here; that's table stakes). Are they special? Not sure – just very rare right now: it takes ~10k hours of focused IC practice in real-world scenarios, when “GenAI” itself is barely 3 years old. (Related to the recent “95%” Fortune article, the underlying MIT report, and all the chatter about the GenAI bubble)
Navjot@_navjotts_·
My top 5 most memorable LLM launches:
1. text-davinci-002 (first one that really "got it"/worked)
2. GPT-4 (biggest step-function jump seen till now)
3. Claude 3.5 Sonnet (first true dethroning)
4. o1-pro (clear glimpse of robust human-like reasoning)
5. DeepSeek-R1 (proof open can beat closed)
Nathan Lambert@natolambert

My top 5 most memorable models from using them at/soon after launch:
1. Claude 3.5 Sonnet (personality, all-round perf)
2. o3 (search behavior + perf)
3. o1 pro (robustness)
4. Gemini 2.5 pro (long context + perf)
5. GPT-4.5 (personality)

Navjot retweeted
Iwona Bialynicka-Birula ⏩
Heading to @kdd_news where we'll be presenting our work on evaluating LLMs for factuality when analyzing conversation transcripts (github.com/cresta/fect) and @cresta is sponsoring AI Reasoning Day! @_navjotts_, @DalalBinoy, and many other wonderful Crestans will also be there. Come talk to us to hear about the cutting-edge work Cresta is doing in enterprise generative AI!
Navjot retweeted
Kyle Corbitt@corbtt·
Very excited to have Cresta as a user of agent reinforcement trainer (ART)! Cresta is AI-native and has a large, sophisticated ML team. ART isn't just easy to get started with, but also very powerful!
Abhijnan Nath@AbhijnanN

@corbtt @bradhilton Fantastic work, y’all. We’ve been using your platform extensively at Cresta, especially for the email RL project, and it’s been fun!

Navjot@_navjotts_·
Why it's hard to make LLMs work for real-life use cases (not toy benchmarks / demos): if you're going to push a piece of "machinery" to the limit, and expect it to hold together – you have to have some sense of where that limit is. (That limit can't be read on Twitter and can't be logically deduced – it has to be "felt" by actually pushing the machinery to the limit.)
Navjot retweeted
Noam Brown@polynoamial·
You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.
clem 🤗@ClementDelangue·
Who’s at #neurips2024 and wants to meet HF team members?
Navjot@_navjotts_·
I am at #NeurIPS2024 this week! Key ML areas our group under @timshi_ai at @cresta is working on:
- AI agents that can reason and troubleshoot effectively in complex enterprise domains
- Multimodal Knowledge Grounding
- An LLM-as-a-judge framework that actually works
We are hiring!
Navjot@_navjotts_·
More pragmatically w.r.t. reasoning: is this (deliberate “Jumping out of the System”) what's happening with the current approaches of Inference-Time Scaling / Test-Time Compute? That's TBD, but I highly doubt it.
Navjot@_navjotts_·
"Before settling on any answer, it turns inward, questioning its own assumptions, exploring different paths of thought, always seeking deeper truth." Impressive update from @Alibaba_Qwen, but this first release of QwQ-32B really took the above to heart – "excessively cautious"!
Navjot@_navjotts_·
@natolambert Would be really impactful, happy to add $100 to the effort.
Nathan Lambert@natolambert·
I'm offering a paid bounty to successfully convert nvidia/Nemotron-4-340B-Instruct to HuggingFace / related libraries. Starting reward $75.
We really need this to unlock synthetic permissive data + open distillation projects.
Conditions to satisfy this:
1. Useful FP8 quantization + single-node HF implementation.
2. Multi-node HF implementation.
I want to create new non-OpenAI-output permissive datasets, try @billyuchenlin's Magpie method, try distillation to smaller models, and much more.
Initial donors:
me: $50
@soldni: $25
Calling on more people from the synthetic data community to contribute $: @NousResearch / @teknium, @huggingface (stands to gain the most) / @osanseviero, @synth_labs / @lcastricato @AlbalakAlon
Navjot@_navjotts_·
I am at #NeurIPS2023 this week! Some ML areas our group under @timshi_ai at @cresta is working on:
- Domain-specific instruction finetuning
- Retrieval Augmentation and Knowledge Grounding
- Reward modeling and conversation-level outcomes
Hit me up for a chat. We are hiring!
Navjot retweeted
Tim Shi@timshi_ai·
Cresta's been deploying finetuned GPT in production since 2019. cresta.com/blog/action-di…
We are ramping up our LLM effort to build the most advanced conversational agent. Join us at: linkedin.com/jobs/view/3701…
Some ML areas we are excited about:
- domain foundation models leveraging instruction finetuning and RLHF
- zero-shot and few-shot learning for complex semantic concepts
- retrieval augmented generation
- low-latency LLMs for real-time copilot
- reward modeling and conversation-level outcomes
Navjot@_navjotts_·
I'll be at #ICML2023 next week! If you work on LLM training + infra, Retrieval Augmentation and Knowledge Grounding, I'd love to chat, and share some of the interesting challenges our group under @timshi_ai is trying to solve at Cresta.