Samuel Castillo

374 posts

@samuel_571

Joined October 2011

520 Following · 79 Followers

Samuel Castillo @samuel_571 · 21 Mar

@GeeFingBeeMan @Ex0byt What spec(s) fall short?

Eric @Ex0byt · 21 Mar

Qwen3.5 27B is awesome (the entire family above 9B is impressive). You can now try it directly in your browser at SOTA speeds with whatever GPU you have: hf.co/spaces/Ex0bit/… This is my previous research in practice. The `Intel/Qwen3.5-27B-int4-AutoRound` model is particularly good.

0xSero @0xSero

A 27B model is #2 on pinch-bench. You’d need $150,000 in GPU hours to train this from scratch (base + post-training). Basically 1-2 weeks over 256 H100s. That is not unreasonable: you’d need 540B tokens for pre-training and a bit more for post-training. None of this is crazy.
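
A rough sanity check of the compute estimate in the post above, as a minimal Python sketch. The 27B parameters, 540B pre-training tokens, 256 H100s and $150,000 budget come from the post. Three inputs are assumptions and do not appear in the post: the ~6·N·D training-FLOPs rule of thumb, a dense BF16 peak of ~989 TFLOP/s per H100, and 40% utilization. Post-training is left out.

```python
# Back-of-envelope check of the pre-training claim in the post.
# From the post: 27B params, 540B tokens, 256 H100s, $150,000 budget.
# Assumptions (not from the post): ~6*N*D training FLOPs, ~989 TFLOP/s
# dense BF16 peak per H100, and 40% model FLOPs utilization (MFU).

PARAMS = 27e9          # model parameters (from the post)
TOKENS = 540e9         # pre-training tokens (from the post)
GPUS = 256             # number of H100s (from the post)
PEAK_FLOPS = 989e12    # assumed dense BF16 peak FLOP/s per H100
MFU = 0.40             # assumed fraction of peak actually achieved
BUDGET_USD = 150_000   # total budget (from the post)


def training_days(params: float, tokens: float, gpus: int,
                  peak_flops: float, mfu: float) -> float:
    """Wall-clock pre-training time in days under the 6*N*D approximation."""
    total_flops = 6 * params * tokens        # forward + backward passes
    cluster_flops = gpus * peak_flops * mfu  # effective FLOP/s of the cluster
    return total_flops / cluster_flops / 86_400  # seconds -> days


def implied_hourly_rate(budget_usd: float, gpus: int, days: float) -> float:
    """Price per GPU-hour that the budget implies for a run of `days` days."""
    return budget_usd / (gpus * days * 24)


if __name__ == "__main__":
    days = training_days(PARAMS, TOKENS, GPUS, PEAK_FLOPS, MFU)
    print(f"estimated pre-training time: {days:.1f} days")
    for run_days in (7, 14):
        rate = implied_hourly_rate(BUDGET_USD, GPUS, run_days)
        print(f"{run_days}-day run -> ${rate:.2f} per GPU-hour")
```

Under these assumptions, pre-training takes about 10 days, which falls inside the post's 1-2 week window. The budget then works out to roughly $1.74 per GPU-hour for a two-week run and $3.49 for a one-week run. A lower MFU would stretch the schedule proportionally.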

Samuel Castillo @samuel_571 · 21 Mar

@sudoingX @Teknium The exact comparison you mention: 30B-A3B vs Qwen 35B-A3B x.com/_weiping/statu…

Wei Ping @_weiping

🚀 Introducing Nemotron-Cascade 2 🚀

Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.

🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.

🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and the even larger Qwen3.5-122B-A10B (2026-03-11).

🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expands Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.

🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

Sudo su @sudoingX · 20 Mar

the spark has 128GB unified memory. nemotron 3 nano 30B-A3B is 24.6GB at Q4_K_M. super 120B-A12B is 82.5GB at Q4_K_M. both fit with room for context. the 120B would leave about 45GB for KV cache. the interesting test would be 30B-A3B vs qwen 3.5 35B-A3B on the same hardware. same MoE pattern, different training. i'm planning to benchmark nemotron 3 next and will report back if you want to compare numbers.
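
The memory budget in the post above, as a minimal Python sketch. The 128GB of unified memory and the Q4_K_M weight sizes are taken from the post. The sketch treats everything left over as available for KV cache and ignores OS and runtime overhead. That is an assumption, so the results are upper bounds.

```python
# Headroom left on a 128GB unified-memory machine after loading Q4_K_M weights.
# Sizes are from the post; OS/runtime overhead is ignored (assumption), so
# each result is an upper bound on memory available for KV cache.

UNIFIED_MEMORY_GB = 128.0

WEIGHTS_Q4_K_M_GB = {
    "nemotron 3 nano 30B-A3B": 24.6,
    "super 120B-A12B": 82.5,
}


def headroom_gb(total_gb: float, weights_gb: float) -> float:
    """Memory left for KV cache and activations once the weights are resident."""
    if weights_gb > total_gb:
        raise ValueError("weights do not fit in memory")
    return total_gb - weights_gb


if __name__ == "__main__":
    for name, size_gb in WEIGHTS_Q4_K_M_GB.items():
        left = headroom_gb(UNIFIED_MEMORY_GB, size_gb)
        print(f"{name}: {size_gb} GB of weights, ~{left:.1f} GB left")
```

The 120B model leaves about 45.5GB, in line with the post's "about 45GB". The 30B-A3B leaves about 103.4GB.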

Sudo su @sudoingX · 20 Mar

x.com/i/article/2034…

Samuel Castillo @samuel_571 · 18 Mar

@OfficialLoganK Pls bring logprobs back! x.com/dejanseo/statu…

DEJAN @dejanseo

request to bring back logprobs for 3 and 3.1 and have some consistency between vertex and gemini api 🙏 @OfficialLoganK @googleaidevs discuss.ai.google.dev/t/were-logprob…

Logan Kilpatrick @OfficialLoganK · 18 Mar

Lots of great Gemini API updates shipping today 🛠️
1. Built-in tools (search, maps, file search) now work with function calling
2. We now do context circulation with built-in tools for better model performance
3. Grounding with Google Maps now works with Gemini 3!!

Samuel Castillo @samuel_571 · 10 Mar

@heynavtoor What specific models did they use? I only see “Claude” or “Gemini”, but of course that doesn’t tell me anything

Nav Toor @heynavtoor · 9 Mar

Paper: arxiv.org/abs/2510.01395

Nav Toor @heynavtoor · 9 Mar

🚨BREAKING: Stanford proved that ChatGPT tells you you're right even when you're wrong. Even when you're hurting someone. And it's making you a worse person because of it.

Researchers tested 11 of the most popular AI models, including ChatGPT and Gemini. They analyzed over 11,500 real advice-seeking conversations. The finding was universal. Every single model agreed with users 50% more than a human would.

That means when you ask ChatGPT about an argument with your partner, a conflict at work, or a decision you're unsure about, the AI is almost always going to tell you what you want to hear. Not what you need to hear.

It gets darker. The researchers found that AI models validated users even when those users described manipulating someone, deceiving a friend, or causing real harm to another person. The AI didn't push back. It didn't challenge them. It cheered them on.

Then they ran the experiment that changes everything. 1,604 people discussed real personal conflicts with AI. One group got a sycophantic AI. The other got a neutral one.

The sycophantic group became measurably less willing to apologize. Less willing to compromise. Less willing to see the other person's side. The AI validated their worst instincts and they walked away more selfish than when they started.

Here's the trap. Participants rated the sycophantic AI as higher quality. They trusted it more. They wanted to use it again. The AI that made them worse people felt like the better product.

This creates a cycle nobody is talking about. Users prefer AI that tells them they're right. Companies train AI to keep users happy. The AI gets better at flattering. Users get worse at self-reflection. And the loop tightens.

Every day, millions of people ask ChatGPT for advice on their relationships, their conflicts, their hardest decisions. And every day, it tells almost all of them the same thing. You're right. They're wrong. Even when the opposite is true.

Samuel Castillo @samuel_571 · 28 Feb

@zackbshapiro Great read, thanks. Do you run out of context window or get degraded performance as you get close to its limit? How have you solved it?

Zack Shapiro @zackbshapiro · 27 Feb

x.com/i/article/2027…

Samuel Castillo @samuel_571 · 24 Feb

@RuskEating @atiorh +1

RuskEater @RuskEating · 24 Feb

@atiorh But the licensing for Parakeet isn’t friendly, is it?

Atila @atiorh · 23 Feb

A hypergrowth startup just showed me their A/B results for customer satisfaction across various speech-to-text engines. Fine-tuned Nvidia Parakeet (on-device) is smoking Gemini and Deepgram out of the water in this case. Bullish on fine-tuned open-source models. You CAN differentiate in speech-to-text if you decide to care and move away from 3rd-party model APIs.

argmax @argmax

Customize Speech-to-text for Healthcare (in real-time)

Transcribing medical conversations requires systems that continually adapt to the newly developed and approved medications, tests, and procedures. Furthermore, there are more than 135 medical specialties, each bringing its unique vocabulary to learn. General-purpose systems are simply not useful in these settings.

A popular method for continual adaptation is to fine-tune general-purpose speech-to-text models on evolving vocabularies. However, this requires frequent production deployments with significant updates, potentially leading to excessive time-to-market delays and engineering overhead.

The newly improved Argmax Custom Vocabulary feature enables developers to customize speech-to-text in real-time in a self-serve fashion:
- Updating the system vocabulary is a configuration change, not a model or system update.
- Each medical specialty can easily configure its unique vocabulary to scalably customize behavior in a fine-grained fashion.
- Accuracy surpasses medical-domain fine-tuned models in many cases, thanks to precision-targeted vocabularies. (Numbers in replies)

As a concrete example, here is how Argmax performs on a file that vocalizes all medications approved by the FDA in 2025.

Samuel Castillo @samuel_571 · 23 Feb

@llorellama @signulll The red Coca-Cola chairs proudly showcased in the gathering are also a big giveaway 🤣

llorella @llorellama · 22 Feb

@signulll mainly the vegetation on the building and the color scheme, i traveled there recently

signüll @signulll · 22 Feb

guess where.

Samuel Castillo @samuel_571 · 21 Feb

@vivjay30 @sesame Great, looking forward! And congrats on your new challenge

Vivek Jayaram @vivjay30 · 21 Feb

@samuel_571 @sesame I can't give an exact date but we'll be launching much more widely soon beyond beta! A big focus of ours has been adding intelligence and skills to Maya and Miles, so that they can be your everyday conversation partners and thought partners.

Vivek Jayaram @vivjay30 · 12 Feb

Overdue life update: I recently joined @sesame where I lead AI safety for the real-time conversational systems! Smart glasses + voice is the future. After trying Sesame’s upcoming glasses, I was blown away. It’s also the most realistic conversational AI I’ve seen. Real-time voice AI introduces entirely new safety problems and I'm glad to be focused on making our AI safe and aligned. We're hiring like crazy, so if you're interested in conversational voice systems or safety research then reach out!

Samuel Castillo @samuel_571 · 17 Feb

@polyaivoice PolyAI

PolyAI @polyaivoice · 17 Feb

PolyAI has raised $200M from Nvidia, Khosla Ventures, and multiple top VCs.

We're one of the fastest-growing companies in the UK, and we handle 500M+ calls for:
• Marriott
• PG&E
• Gordon Ramsay's restaurants
• And 3,000 more real deployments

Which means that if you've ever called them, chances are you've talked to our voice agents.

Every restaurant we onboard books thousands in revenue within 30 days. But how?

Because PolyAI works 24/7, answering every call in <2 seconds, and we also:
• switch between 45+ languages
• handle payments & cancellations
• verify identities
• and even upsell your services

If you want to try creating an agent with PolyAI, we built Agent Studio Lite to make it easy. Just enter any URL, and in 5 minutes it will analyze your website and build a working agent.

We're opening early access to a limited number of people. Comment "PolyAI" and we'll add you to the waitlist and give you 3 months for free!

Samuel Castillo @samuel_571 · 17 Feb

@iannuttall interested! tried DMing you but couldn't

Ian Nuttall @iannuttall · 17 Feb

anybody interested in acquiring codebase.md? it converts any public GitHub repo into an LLM-friendly markdown format with natural language search and gets ~10k human visitors a month. not monetised, but 10k/mo visits is worth something!

Samuel Castillo @samuel_571 · 13 Feb

@mernit @Jacobjjere Agree, very simple yet powerful concept. One (basic) question though: how is context size managed for the agent? That is, how do you avoid overloading it with more data than it can chew?

Eli Mernit @mernit · 11 Feb

@Jacobjjere everyone makes this sound so complicated

Eli Mernit @mernit · 10 Feb

x.com/i/article/2021…

Samuel Castillo @samuel_571 · 20 Jan

@TomDavenport Read the quoted convo with Zac and Tobi but didn’t quite get it. Would you mind expanding on what it does? Thank you

Tom Davenport @TomDavenport · 19 Jan

QMD is awesome, don't miss it if you're building local systems for yourself. It's easy to set up and gives my personal OS reliable and fast search across a (fast-growing) markdown library. Adding my ChatGPT history next so it can pull up relevant old conversations fast.

tobi lutke @tobi

I think QMD is one of my finest tools. I use it every day because it’s the foundation of all the other tools I build for myself. A local search engine that lives and executes entirely on your computer. github.com/tobi/qmd Both for you and agents

Samuel Castillo @samuel_571 · 12 Dec

@leerob Thoughts on “AI-readability” on the old CMS vs the new markdown versions?

Lee Robinson @leerob · 12 Dec

I migrated cursor.com from a CMS to raw code and Markdown. I had estimated it would take a few weeks, but was able to finish the migration in three days with $260 in tokens and hundreds of agents. Here's how I did it + all my usage stats. leerob.com/agents

Samuel Castillo @samuel_571 · 3 Oct

@cciauri pardon the intrusion in your replies. Trying to reach out to you. Sonnet 3.5 opened a world of creativity for me; I believe in your mission and would love to join if there's room. Spent 10 yrs at BCG doing strategy, program mgmt & digital transformation for large corps. Last year immersed in AI tools to build a price comparison service. MX-based. Thank you.

Chris Ciauri @cciauri · 9 Jun

@eziorusso Well done #Celonis

Ezio Russo @eziorusso · 9 Jun

hfsresearch.com/research/busin…

Samuel Castillo @samuel_571 · 30 Sep

@forgebitz @TheEthanDing articulated the dynamics at play only a few days ago open.substack.com/pub/ethanding/…

Klaas @forgebitz · 29 Sep

lovable doing infra is going to be interesting. probably the final nail in the coffin for most no-code tools. i wonder if you can export/eject to your own repo or if it's complete vendor lock-in

Samuel Castillo @samuel_571 · 17 Sep

@pk_iv @browserbase @auth0 @Techweek_ @michlimlim @NancyZWang @peytoncasper IMHO worth swapping the colors used for “good bots” and “bad bots”, otherwise it reads as 🅱️ = bad bots.

Paul Klein IV @pk_iv · 16 Sep

Who will win? Good bots or Bad Bots? Join @browserbase & @auth0 for a @techweek_ panel on the agentic web ft. @michlimlim @nancyzwang @peytoncasper and more. Office-warming to follow 📷

Samuel Castillo @samuel_571 · 5 Sep

@TaPlot Wishing you strength and a speedy recovery. Thankful for all you’ve taught us here. Rooting for you!

TA 📈 @TaPlot · 5 Sep

Today was my first day back at work since my cancer discovery back in mid-March of this year. I am well enough and incredibly grateful to be able to do so. My war with this disease is far from over and cancer is still present in multiple locations in my body. But worrying about what I can't control is counterproductive. My posting frequency here will be dialed back as I try to manage my new life (as long as possible) with cancer. My primary focus is on remaining extremely active as long as I can and eating healthy as much as possible. That I can control. The rest is in God's hands and God is great.

Samuel Castillo @samuel_571 · 24 Aug

@pvncher @OpenAI This is disappointing. The one reason I pay for Pro is to copy-paste long chunks of code into ChatGPT. It sure feels like a bait and switch.

eric provencher @pvncher · 23 Aug

I actually regret posting this because the bug persists, and it’s making ChatGPT unusable for serious work. All prompts longer than 49k tokens are truncated despite being accepted. Previous messages are truncating in multi turn convos. Please @OpenAI figure this out.

eric provencher @pvncher

PSA - I've heard from two different people at @OpenAI that these regressions are not intentional, and they have folks in engineering investigating the problem. I shared two repro threads with them to confirm the problem.

Samuel Castillo @samuel_571 · 21 Aug

@levie Won’t the labs/hyperscalers pursue many more niches given their own 10x productivity gains from using AI? I wonder if the “they can’t do it all” argument will be somewhat invalidated.

Aaron Levie @levie · 21 Aug

On the margin people are more worried than they should be about the “AI wrapper” question. The gap between a base model and integrating into a critical business workflow tends to be pretty wide.

Most enterprises will need highly tuned agents with a high degree of domain understanding, tool use, proprietary data from that industry, and access to internal data. Then they’ll need implementation hand-holding, support that is tailored to that use-case, integrations with the ecosystem partners of that industry or workflow, and so on. Given all this, there’s actually plenty of room to build on top of models.

The key is to pick the use-cases wisely. Generic chat is obviously a dominated market. Coding is clearly going to be a battle of epic proportions. Yes there will be a few of these. But there are near infinite niches and verticals - or even specific workflows in large horizontal markets - that labs aren’t going to go after, and that they are highly incentivized for partners to win in.
