yalishandi

23 posts

yalishandi
@yalishandi

chief executive ic

Joined September 2024
202 Following · 1 Follower
yalishandi (@yalishandi):
@MoonL88537 Looool at the 9.1 = 9.10 > 9.11. These model explanations remind me of the demons in Frieren (animals that evolved the ability to speak purely as a means to deceive humans, with no moral implications)
0 replies · 0 reposts · 0 likes · 28 views
Moon (@MoonL88537):
did i mention that this is totally nuts?
[image]
184 replies · 423 reposts · 6.1K likes · 674.3K views
Jamon (@jamonholmgren):
Do you really think precise location will matter for this query, Google?
[image]
156 replies · 377 reposts · 13K likes · 279.7K views
yalishandi (@yalishandi):
Subway Surfers is an adversarial example for humans, discovered with black-box optimization
0 replies · 0 reposts · 0 likes · 33 views
yalishandi (@yalishandi):
@kalomaze "Human beings fight not because they are different, but because they are the same, and in their attempts to distinguish themselves have made themselves into enemy twins, human doubles in reciprocal violence." -- Girard
0 replies · 0 reposts · 2 likes · 373 views
kalomaze (@kalomaze):
the Cold War-esque dynamic between vllm and sglang needs to be studied
4 replies · 1 repost · 116 likes · 23.5K views
yalishandi (@yalishandi):
@jadechoghari @huggingface Build our product to "get on the radar for Hugging Face internships." The job market is what it is, respect the grift 🤣🤷
0 replies · 0 reposts · 0 likes · 142 views
jade (@jadechoghari):
🚨 Announcing: @huggingface @ Waterloo 🤗
A direct path for Waterloo students to go from learning ML → shipping code with the HF team → contributing to state-of-the-art open-source models → path to internship and Fellowship, all while building a standout resume.
📝 Applications open now, only 5 to 10 spots per term! 👉 tally.so/r/wdgG5K
🔍 Learn more: hfwaterloo.notion.site
👩‍💻 Two tracks:
1️⃣ Junior ML Scholar (1st/2nd yr): build @Gradio demos, explore models, and engage with the HF community.
2️⃣ Senior ML Scholar (3rd/4th yr): work with HF engineers to contribute models, test SOTA, and improve core libraries.
[image]
12 replies · 21 reposts · 162 likes · 36.1K views
Ashwinee Panda @ICLR2026 (@PandaAshwinee):
@ZainHasan6 cool. so how is it made exactly? it's typically the case that anything at the end of the CoT has much higher confidence because the model has conditioned on so much stuff.
2 replies · 0 reposts · 1 like · 99 views
Zain (@ZainHasan6):
mildly interesting: average <think> token probabilities are always lower than answer token probs for DeepSeek-R1. Sample set: AIME 2024, 30 problems.
[image]
1 reply · 0 reposts · 5 likes · 752 views
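Zain's measurement can be sketched in a few lines: split a sampled reasoning trace at the `</think>` marker and compare the mean model-assigned probability on each side. Everything below (the tokenization, the toy probability values) is invented for illustration; the real measurement would use DeepSeek-R1's actual per-token probabilities.

```python
# Toy sketch: compare mean token probability inside <think>...</think>
# against the answer tokens that follow. Values are made up.

def mean_segment_probs(tokens, probs, close_tag="</think>"):
    """Return (mean prob of think tokens, mean prob of answer tokens)."""
    split = tokens.index(close_tag)           # position of the closing tag
    think, answer = probs[:split], probs[split + 1:]
    return sum(think) / len(think), sum(answer) / len(answer)

tokens = ["<think>", "let", "me", "check", "</think>", "the", "answer", "is", "42"]
probs  = [0.30, 0.42, 0.55, 0.38, 0.61, 0.83, 0.91, 0.88, 0.95]
think_p, ans_p = mean_segment_probs(tokens, probs)
assert think_p < ans_p  # the pattern reported for R1 on AIME
```

With these toy numbers the think tokens average 0.4125 and the answer tokens 0.8925, matching the direction of the reported gap.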
Tianle Cai (@tianle_cai):
As an RL newbie I came across a very similar idea recently and was shocked to see that this natural idea (using the loss reduction over the ground-truth answer when adding CoT before it) was (to my limited RL knowledge) only covered by the following and another very recent paper (arxiv.org/abs/2503.19618) lol. Did I miss any literature on this, or does it simply not work well?
↪ Quoting Nathan Lambert (@natolambert):
Underrated paper and idea on using RL losses on non-verifiable domains, in this case the perplexity of the next chapter of a book.
12 replies · 11 reposts · 149 likes · 35.7K views
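The idea Tianle describes (scoring a chain of thought by how much loss reduction it buys on the ground-truth answer) can be sketched as follows. The `nll` function here is a hypothetical stand-in for a real language model's negative log-likelihood; only the reward arithmetic is the point.

```python
# Sketch: reward a CoT by the NLL reduction it yields on the gold answer.
# `nll` is a toy stand-in for a real LM's negative log-likelihood.

def nll(answer, context):
    # Pretend the model is more confident when the context contains
    # the key fact the answer depends on.
    return 5.0 - (2.5 if "key fact" in context else 0.0)

def cot_reward(question, cot, answer):
    """Loss reduction on the answer from prepending the CoT (higher = better)."""
    return nll(answer, question) - nll(answer, question + " " + cot)

good = cot_reward("Q?", "reasoning that surfaces the key fact", "A")
bad = cot_reward("Q?", "irrelevant rambling", "A")
assert good > bad
```

A CoT that actually surfaces useful information earns positive reward (2.5 here) while an unhelpful one earns zero, which is the signal an RL loop would then optimize.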
yalishandi (@yalishandi):
And then they came for the S&P 500 -- and there was no one left to speak out
0 replies · 0 reposts · 0 likes · 41 views
yalishandi (@yalishandi):
High schooler first time reviewing for #icml2025. I made sure to hit "acknowledge rebuttal" the instant it hit my inbox to make sure authors know I'm engaged with the process 😊
0 replies · 0 reposts · 1 like · 1.2K views
yalishandi (@yalishandi):
@StartupArchive_ The answer here -- as machine learners will tell you -- is not *no evals*, but a train-test split. Find several metrics that correlate with downstream goodness. Hold some of them out, so that teams must try to improve the underlying "g factor" of these metrics.
0 replies · 0 reposts · 0 likes · 95 views
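The train-test-split proposal above can be made concrete. All metric names and deltas below are invented for illustration; the point is the holdout check, not the specifics.

```python
# Sketch: treat proxy metrics like a dataset. Teams optimize against a
# visible "train" subset; a held-out subset detects Goodharting.
import random

metrics = ["activation", "retention", "nps", "support_tickets", "latency", "revenue"]
rng = random.Random(0)                 # fixed seed for a reproducible split
rng.shuffle(metrics)
train_metrics, holdout_metrics = metrics[:4], metrics[4:]

def goodharted(train_delta, holdout_delta, tol=0.0):
    """Flag a change that moves the visible metrics but not the held-out ones."""
    return train_delta > 0 and holdout_delta <= tol

# A change that only games the visible metrics fails the holdout check.
assert goodharted(train_delta=0.3, holdout_delta=-0.1)
assert not goodharted(train_delta=0.3, holdout_delta=0.2)
```

A change that lifts both subsets plausibly improved the underlying "g factor"; one that lifts only the visible subset was likely optimizing the measure rather than the goal.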
Startup Archive (@StartupArchive_):
Shopify CEO Tobi Lutke explains Goodhart's law and why he doesn't like KPIs or OKRs.

"Goodhart's law is real. The moment a metric becomes a goal, it's no longer a useful metric… No metric by itself is a complete heuristic for a complex business. There's a million different tensions in a company, and you can't keep all of them in harmony by optimizing for one thing."

For this reason, Shopify doesn't use KPIs or OKRs. But as Tobi explains, this doesn't mean they don't value data and metrics.

"We are extremely data informed. We have invested enormous amounts of money and time into systems that give us basically everything at our fingertips… But what Shopify attempts to do is just not over-fit for what's quantifiable."

People love optimizing for highly quantifiable things because there's immediate gratification that comes from seeing a number go up. But Tobi thinks that the most important aspects of a product are rarely quantifiable:

"The overlap of the most valuable things you can do with a product and the things that happen to be fully quantifiable are like maybe 20%. Which leaves 80% of a value space unaddressable by the people who only look at quantifiable things."

He continues: "Shopify is comfortable with unquantifiable things like taste, quality, passion, love, hate… The sort of deep satisfaction that a craftsperson feels when they've done a job well is actually a better proxy if you allow it to be."

They then have robust analytics systems that tell the company if something's wrong or a new rollout breaks something:

"We think about it as a cockpit for a pilot. The decisions are still made by pilots, and we think this leads to better results… I think there needs to be more acceptance in business of unquantifiable things… And then metrics take a support function."

Video source: @lennysan (Feb 2025)
23 replies · 171 reposts · 1.4K likes · 171.1K views
yalishandi (@yalishandi):
@g_k_swamy Hi Gokul, great paper! I'm studying it carefully. My question: under the generator-verifier gap explanation, should we expect training with a verifier (when there is such a gap) on off-policy data to match online methods? What is the natural way to do so?
0 replies · 0 reposts · 0 likes · 181 views
Gokul Swamy (@g_k_swamy):
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to 🤿!
[image]
24 replies · 227 reposts · 1.8K likes · 290.3K views
yalishandi (@yalishandi):
@nrehiew_ If you have a decent prior, you need much stronger evidence than this to believe the guy is "in the trenches"
0 replies · 0 reposts · 1 like · 608 views
wh (@nrehiew_):
The CEO is quite literally credited as one of three people who came up with DualPipe. This isn't just some overseeing/managerial role. He is quite literally in the trenches.
[image]
↪ Quoting DeepSeek (@deepseek_ai):
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies
✅ DualPipe: a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. 🔗 github.com/deepseek-ai/Du…
✅ EPLB: an expert-parallel load balancer for V3/R1. 🔗 github.com/deepseek-ai/ep…
📊 Analyze computation-communication overlap in V3/R1. 🔗 github.com/deepseek-ai/pr…
15 replies · 34 reposts · 760 likes · 47.1K views
Josh Clymer (@joshua_clymer):
some people tell me: "i'm an ai risk skeptic, there just isn't much evidence that there's a real risk" well I'm an ai risk skeptic too then. when will superintelligence exist? will it be easy to align? no one knows. that's the problem
10 replies · 4 reposts · 107 likes · 5.3K views
yalishandi (@yalishandi):
@thinkymachines Dug up the grave of Character.ai, ransacked it, and buried it again
0 replies · 0 reposts · 0 likes · 98 views
Thinking Machines (@thinkymachines):
Today, we are excited to announce Thinking Machines Lab (thinkingmachines.ai), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT, Character.ai, PyTorch, and Mistral.

Our mission is to make artificial intelligence work for you by building a future where everyone has access to the knowledge and tools to make AI serve their unique needs. We are committed to open science through publications and code releases, while focusing on human-AI collaboration that serves diverse domains. Our approach embraces co-design of research and products to enable learning from real-world deployment and rapid iteration.

This work requires three core foundations: state-of-the-art model intelligence, high-quality infrastructure, and advanced multimodal capabilities. We are committed to building models at the frontier of capabilities to deliver on this promise.

If you're interested in joining our team, consider applying here: 6wajk07p.paperform.co
296 replies · 516 reposts · 5K likes · 2.1M views
Yong Lin (@Yong18850571):
🚀 Introducing Goedel-Prover: a 7B LLM achieving SOTA open-source performance in automated theorem proving! 🔥
✅ Improving +7% over previous open-source SOTA on miniF2F
🏆 Ranking 1st on the PutnamBench leaderboard
🤖 Solving 1.9x as many total problems as prior works on Lean Workbook
[1/n]
Website: goedel-lm.github.io
Hugging Face: huggingface.co/Goedel-LM/Goed…
GitHub: github.com/Goedel-LM/Goed…
Amazing collaborators: @sangertang1999 (co-first author) @Lyubh22 @wujiayun12 @hongzhou__lin @KaiyuYang4 @JiaLi52524397 @xiamengzhou @danqi_chen @prfsanjeevarora @chijinML
[image]
13 replies · 65 reposts · 272 likes · 87.2K views
yalishandi (@yalishandi):
above the Chinese restaurant is cute and all, but imagine the roaches
0 replies · 0 reposts · 0 likes · 92 views
yalishandi (@yalishandi):
Are markets efficient when they take 4 years to price in Switch Transformers (2021)?
0 replies · 0 reposts · 0 likes · 85 views
yalishandi (@yalishandi):
Intelligence is simply compound intuition
0 replies · 0 reposts · 0 likes · 72 views