Navjot

37 posts

@_navjotts_

Principal ML Scientist @ Adobe. Past: Founding ML & LLM R&D at @cresta. Co-founded Doculus, acquired by Box. Alum Maths & Comp @IITKgp.

San Francisco, CA · Joined November 2014
241 Following · 105 Followers
Pinned Tweet
Navjot@_navjotts_·
The future: Domain Foundation Models. Especially if you care about both 1) the depth of capabilities (long-tail accuracy) and 2) the breadth of capabilities (features). More updates (detailed benchmarks, impact of self-instruct vs. RLHF) coming soon! @Kuan_Liu_ @plusepsilon @timshi_ai
Tim Shi@timshi_ai

Today, we introduced Ocean-1, a foundation model for the contact center. It's the culmination of our experience in generative AI for large enterprises and our latest milestone in advancing the cutting edge for customer-facing conversations. cresta.com/blog/introduci…

Navjot retweeted
Ross Taylor@rosstaylor90·
Most takes on RL environments are bad.
1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis: and no one is talking about it because they’re being hoodwinked by labs marketing their models on flawed evals.
2. Even the best public RL environments and agentic evals suck, and usually can’t be used by labs without modification. Academics often publish-and-forget instead of doing the necessary follow-up work to make the envs/evals useful for labs.
3. The best person to make an environment is someone deeply knowledgeable about a field, not a high-level generalist or newbie - 🦔 not 🦊 - but most envs are being made by generalists or low-skill contractors.
4. People are too focused on whether a problem is verifiable or not, not what kind of capabilities they want to bring into being. We don’t need more math and puzzle environments. The usefulness of an environment is proportional to its difficulty of construction.
5. Saying you want to “scale RL environments” is as meaningless as “scale is all you need” in that it says nothing about your choice of what to scale.
6. People are treating RL environment scaling as a new type of pretraining (creating a new internet), but pretraining has extremely high diversity, and expecting a single company (or collection of companies) to replicate this diversity is unrealistic. That means generalisation will be slower to emerge than the previous paradigm - and so there is more leverage in choosing which environments to build first.
If you’d like to help answer the right questions in this new space, join us at @GenReasoning.
Navjot@_navjotts_·
Worth re-reading The Bitter Lesson every few months. Each time, a different part hits you, usually "exposing" your latest attempts to dodge the very mistake it warns against. > "We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done." incompleteideas.net/IncIdeas/Bitte…
Navjot@_navjotts_·
The 2nd bucket of highly-paid AI talent is emerging: the ones who are deep (enough) into not only pre-training and post-training of LLMs – but who have the complementary skills to make LLMs actually work in messy real-world use cases (I don't mean SWE skills here; that's table stakes). Are they special? Not sure – just very rare right now: it takes ~10k hours of focused IC practice in real-world scenarios, when “GenAI” itself is barely 3 years old. (Related to the recent “95%” Fortune article, the underlying MIT report, and all the chatter about the GenAI bubble)
Navjot@_navjotts_·
My top 5 most memorable LLM launches:
1. text-davinci-002 (first one that really "got it"/worked)
2. GPT-4 (biggest step-function jump seen till now)
3. Claude 3.5 Sonnet (first true dethroning)
4. o1-pro (clear glimpse of robust human-like reasoning)
5. DeepSeek-R1 (proof open can beat closed)
Nathan Lambert@natolambert

My top 5 most memorable models from using them at/soon after launch:
1. Claude 3.5 Sonnet (personality, all-round perf)
2. o3 (search behavior + perf)
3. o1 pro (robustness)
4. Gemini 2.5 pro (long context + perf)
5. GPT-4.5 (personality)

Navjot retweeted
Iwona Bialynicka-Birula ⏩
Heading to @kdd_news where we'll be presenting our work on evaluating LLMs for factuality when analyzing conversation transcripts (github.com/cresta/fect) and @cresta is sponsoring AI Reasoning Day! @_navjotts_, @DalalBinoy, and many other wonderful Crestans will also be there. Come talk to us to hear about the cutting-edge work Cresta is doing in enterprise generative AI!
Navjot retweeted
Kyle Corbitt@corbtt·
Very excited to have Cresta as a user of agent reinforcement trainer (ART)! Cresta is AI-native and has a large, sophisticated ML team. ART isn't just easy to get started with, but also very powerful!
Abhijnan Nath@AbhijnanN

@corbtt @bradhilton Fantastic work, y’all. We’ve been using your platform extensively at Cresta, especially for the email RL project, and it’s been fun!

Navjot@_navjotts_·
Why it's hard to make LLMs work for real-life use cases (not toy benchmarks / demos): if you're going to push a piece of "machinery" to the limit, and expect it to hold together – you have to have some sense of where that limit is. (That limit can't be read on Twitter and can't be logically deduced – it has to be "felt" by actually pushing the machinery to the limit.)
Navjot retweeted
Noam Brown@polynoamial·
You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.
clem 🤗@ClementDelangue·
Who’s at #neurips2024 and wants to meet HF team members?
Navjot@_navjotts_·
I am at #NeurIPS2024 this week! Key ML areas our group under @timshi_ai at @cresta is working on:
- AI agents that can reason and troubleshoot effectively in complex enterprise domains
- Multimodal Knowledge Grounding
- An LLM-as-a-judge framework that actually works
We are hiring!
Navjot@_navjotts_·
More pragmatically w.r.t. reasoning: is this (deliberate “Jumping out of the System”) what's happening with the current approaches of Inference-Time Scaling / Test-Time Compute? That's TBD, but I highly doubt it.
Navjot@_navjotts_·
"Before settling on any answer, it turns inward, questioning its own assumptions, exploring different paths of thought, always seeking deeper truth." Impressive update from @Alibaba_Qwen, but this first release of QwQ-32B really took the above to heart – "excessively cautious"!
Navjot@_navjotts_·
@natolambert Would be really impactful, happy to add $100 to the effort.
Nathan Lambert@natolambert·
I'm offering a paid bounty to successfully convert nvidia/Nemotron-4-340B-Instruct to HuggingFace / related libraries. Starting reward $75.
We really need this to unlock synthetic permissive data + open distillation projects.
Conditions to satisfy this:
1. Useful FP8 quantization + single-node HF implementation.
2. Multi-node HF implementation.
I want to create new non-OpenAI-output permissive datasets, try @billyuchenlin's Magpie method, try distillation to smaller models, and much more.
Initial donors:
me: $50
@soldni: $25
Calling on more people from the synthetic data community to contribute $: @NousResearch / @teknium, @huggingface (stands to gain the most) / @osanseviero, @synth_labs / @lcastricato @AlbalakAlon
Navjot@_navjotts_·
I am at #NeurIPS2023 this week! Some ML areas our group under @timshi_ai at @cresta is working on:
- Domain-specific instruction finetuning
- Retrieval Augmentation and Knowledge Grounding
- Reward modeling and conversation-level outcomes
Hit me up for a chat. We are hiring!
Navjot retweeted
Tim Shi@timshi_ai·
Cresta's been deploying finetuned GPT in production since 2019. cresta.com/blog/action-di…
We are ramping up our LLM effort to build the most advanced conversational agent. Join us at: linkedin.com/jobs/view/3701…
Some ML areas we are excited about:
- domain foundation models leveraging instruction finetuning and RLHF
- zero-shot and few-shot learning for complex semantic concepts
- retrieval augmented generation
- low-latency LLMs for real-time copilot
- reward modeling and conversation-level outcomes
Navjot@_navjotts_·
I'll be at #ICML2023 next week! If you work on LLM training + infra, Retrieval Augmentation and Knowledge Grounding, I'd love to chat, and share some of the interesting challenges our group under @timshi_ai is trying to solve at Cresta.