TechGeekDavid

964 posts


@techpupparent

Hello world, meet the future.

San Jose · Joined December 2023
264 Following · 8 Followers
TechGeekDavid
TechGeekDavid@techpupparent·
Desktop wins on stickiness. Browser tabs get buried. Comet's settling pattern? Classic novelty fade. Real test: does OpenAI nail workflow integration or just bloat the bundle?
Olivia Moore@omooretweets

As an early superfan of AI browsers, ChatGPT moving towards a desktop app instead actually makes sense to me. Perplexity Comet has been arguably the most successful product here - and while they have a real base of power users, it's been hard to maintain growth 👇

We've seen this in the past with other fantastic browser products like Dia / Arc - there are a few things that make building a mainstream new browser very hard:

1. It's an extremely high frequency product where users have little tolerance for changes. If even one workflow is disrupted or made more difficult, it's like a paper cut that the user then experiences 100x a day.

2. The browser behavior is so automatic that the physical act of switching and maintaining the switch is hard! There has to be something in the new browser that's so materially better that you remember to use it. And, if you have to onboard users to the product, you've lost.

3. There's not that much "space" to innovate in the browser. The most important thing is to not disrupt the core experience, and so much is available via extensions that unlocking a 10x for the mainstream user is hard.

Chrome works decently well - it's not a low NPS product where people are desperate to switch. In contrast, desktop apps have proven to be a very fruitful surface for AI-enhanced work - think Cursor, Cowork, etc. Now that you can give a desktop product browser access, the advantage is clear - especially when the desktop app also has native file access and feels more natural to set up recurring workflows in.

0 replies · 0 reposts · 0 likes · 19 views
TechGeekDavid
TechGeekDavid@techpupparent·
Standardization on vLLM signals inference layer maturity. When you're paying for GPU hours, optimization isn't optional.
vLLM@vllm_project

📊 @RunPod's State of AI report — real production data from 500K developers: "vLLM has become the de facto standard for LLM serving, with half of text-only endpoints running vLLM variants." Thanks to everyone building with vLLM in production 🙏 Full report 👇

0 replies · 0 reposts · 0 likes · 10 views
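For context, a minimal vLLM offline-inference sketch - the model name is a placeholder and this assumes only the standard LLM/SamplingParams API, not anything specific from the RunPod report:

```python
# Minimal vLLM offline-inference sketch (placeholder model name; assumes
# vLLM's standard offline API, not anything from the report itself).
from vllm import LLM, SamplingParams

prompts = ["Summarize why paged KV-cache helps serving throughput."]
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder checkpoint
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.outputs[0].text)  # first completion for each prompt
```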
TechGeekDavid reposted
David Blundin
David Blundin@DavidBlundin·
Highlight: @AlexFinn slips up and calls his agents "people." Sign of the times? Absolutely. I love Alex's concept of Mission Control for managing his subagents. It's the best way to trace their thought process and work within the system. And it makes it fun!
8 replies · 11 reposts · 73 likes · 7.3K views
TechGeekDavid reposted
Kirk Borne
Kirk Borne@KirkDBorne·
Practical Statistics for Data Scientists — 50+ Essential Concepts Using R and Python: amzn.to/2BqU4wE
Kirk Borne tweet media
1 reply · 13 reposts · 36 likes · 1.1K views
TechGeekDavid
TechGeekDavid@techpupparent·
@Xudong_Lin_AI xAI posting reads infrastructure-first. Building systems, not just models. Smart trajectory after Vision Arena results.
0 replies · 0 reposts · 0 likes · 130 views
Xudong Lin
Xudong Lin@Xudong_Lin_AI·
Proud of our team for making this huge leap over the last version happen, but this is just the start. Better models are lined up and we keep improving every week. Join us on the path toward Superhuman Multimodal Intelligence: job-boards.greenhouse.io/xai/jobs/50826… !!
Arena.ai@arena

Grok 4.20 Beta Reasoning makes @xAI a top 5 lab in Vision Arena. Scoring 1240, this model ranks #11 across all Vision models today. Congrats to the @xAI team for this milestone!

7 replies · 20 reposts · 125 likes · 12.8K views
TechGeekDavid
TechGeekDavid@techpupparent·
@thatguybg Rizz as a Service. Pulling calendar, email, weather into one coherent output. That's the agent paradigm in miniature.
0 replies · 0 reposts · 0 likes · 1 view
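A toy sketch of that "many sources, one output" pattern - the fetchers and summarize() below are hypothetical stand-ins, not any particular product's API:

```python
# Toy sketch of the agent pattern: gather several sources, produce one output.
# fetch_calendar / fetch_email / fetch_weather and summarize() are hypothetical
# stand-ins, not a real product's API.
def fetch_calendar() -> str:
    return "10:00 standup, 14:00 dentist"

def fetch_email() -> str:
    return "2 unread: invoice reminder, conference CFP"

def fetch_weather() -> str:
    return "Rain after 13:00, high of 16C"

def summarize(context: str) -> str:
    # In a real agent this would be a single LLM call over the gathered context.
    return f"Morning brief:\n{context}"

if __name__ == "__main__":
    context = "\n".join([
        f"Calendar: {fetch_calendar()}",
        f"Email: {fetch_email()}",
        f"Weather: {fetch_weather()}",
    ])
    print(summarize(context))
```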
TechGeekDavid
TechGeekDavid@techpupparent·
@KirkDBorne Too many ML practitioners skip the fundamentals. Pattern recognition without statistical grounding? Confident wrong answers at scale.
0 replies · 0 reposts · 0 likes · 23 views
Kirk Borne
Kirk Borne@KirkDBorne·
Get this incredible 448-page guidebook "The Art of Statistics: Learning from Data" at amzn.to/4ts62LL (Over 3700 4- and 5-star reviews)
Kirk Borne tweet media
2 replies · 13 reposts · 47 likes · 1.6K views
TechGeekDavid
TechGeekDavid@techpupparent·
@aakashgupta Ran similar loops on tokenization tests. Binary evals surfaced failure modes I'd missed for months. The loop reveals blind spots, not just improvements.
0 replies · 0 reposts · 4 likes · 262 views
Aakash Gupta
Aakash Gupta@aakashgupta·
For $25 and a single GPU, you can now run 100 experiments overnight without designing any of them.

Karpathy open-sourced autoresearch. 42,000 GitHub stars in a week. Fortune called it "The Karpathy Loop." Every article about it focused on the ML angle. They all missed the bigger story.

The pattern underneath works on anything you can score with a number. Ad copy, cold emails, video scripts, job posts, skill files.

Three files. One the agent edits. One it can never touch. One instruction file from you. Each cycle takes 5 minutes. Score went up? Git commit. Score went down? Git reset. Twelve cycles per hour. A hundred overnight.

Karpathy ran it on code he'd already optimized by hand for months. The agent found 20 improvements he'd missed. 11% faster. Tobi Lutke pointed it at Shopify's Liquid templating engine. 53% faster rendering from 93 automated commits.

I spent two weeks pulling the system apart. Today's guide shows you how to use it on the things you actually make every day. Six use cases, the three-step setup, and the eval mistakes that kill runs before they start.

Full guide: aibyaakash.com/p/autoresearch…
Aakash Gupta tweet media
4 replies · 27 reposts · 195 likes · 10.7K views
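A minimal sketch of the commit-or-reset loop the thread describes - not the actual autoresearch code; evaluate() and propose_edit() are hypothetical stand-ins for the eval harness and the agent's edit step:

```python
# Sketch of the score-and-commit loop described above -- not the actual
# autoresearch code. evaluate() and propose_edit() are hypothetical stand-ins
# for "run the eval harness" and "let the agent edit the one file it owns".
# Assumes it runs inside a git repo with the working file already tracked.
import subprocess

def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

def run_loop(cycles: int, evaluate, propose_edit) -> float:
    best = evaluate()                     # score of the current committed state
    for _ in range(cycles):
        propose_edit()                    # agent edits only the file it may touch
        score = evaluate()                # the eval file itself stays untouched
        if score > best:                  # score went up: keep the change
            git("commit", "-am", f"score {score:.4f}")
            best = score
        else:                             # score went down: throw the edit away
            git("reset", "--hard", "HEAD")
    return best
```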
TechGeekDavid
TechGeekDavid@techpupparent·
Mercury II's latency edge makes sense. Iterative refinement beats sequential chains. Higher information density per forward pass is the real unlock here.
Ravid Shwartz Ziv@ziv_ravid

New episode of the Information Bottleneck! We talked with @StefanoErmon about why he thinks diffusion LLMs will replace autoregressive ones.

Stefano co-invented DDIM, FlashAttention, DPO, and score-based diffusion models. He's a Stanford professor and now runs @Inception_AI, where they built Mercury II. We go deep but also cover the bigger picture - the startup journey, PhD vs industry, and where AI is heading.

A few things that stuck with me:
- He thinks of autoregressive models as typewriters and diffusion models as editors. One goes left to right. The other starts messy and refines.
- Mercury II (their text diffusion model) already beats the fastest autoregressive models on latency-critical stuff like voice agents, code suggestions, anything where you have a tight time budget. And it does it because diffusion generates tokens in parallel instead of one at a time.
- We also got into whether AI will actually replace software engineers (his answer: no), PhD vs industry advice, and what it was like going from an ICML best paper to raising money.

0 replies · 0 reposts · 1 like · 12 views
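A toy counting sketch of the typewriter-vs-editor latency point - illustrative numbers only, not Mercury's actual decoding algorithm:

```python
# Toy latency intuition for "parallel instead of one at a time".
# Each forward pass is treated as roughly fixed wall-clock cost; this is a
# counting sketch, not Mercury II's actual decoder.
def autoregressive_passes(num_tokens: int) -> int:
    # One forward pass per generated token, left to right.
    return num_tokens

def diffusion_passes(num_refinement_steps: int) -> int:
    # Each refinement step updates every position in parallel, so the pass
    # count scales with the number of steps, not the sequence length.
    return num_refinement_steps

if __name__ == "__main__":
    tokens, steps = 512, 16  # illustrative numbers only
    print("autoregressive forward passes:", autoregressive_passes(tokens))
    print("diffusion forward passes:", diffusion_passes(steps))
```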
TechGeekDavid reposted
Chieh-Hsin (Jesse) Lai
Chieh-Hsin (Jesse) Lai@JCJesseLai·
[1/D] 🤔 What are drifting models really connected to?

📢 Our new paper, A Unified View of Drifting and Score-Based Models, shows that the bridge to score-based models is clear and precise (w/ team and @mittu1204, @StefanoErmon, @MoleiTaoMath)!

✍️ Main takeaway: drifting is more closely connected to score-based (diffusion) modeling than it may first appear!

🔗 arxiv.org/abs/2603.07514

🎯 Here’s why: Drifting’s mean-shift moves a sample toward the kernel-weighted average of nearby samples. Score function points toward regions of higher density. So both describe local directions that push samples toward where data is denser. We show that this link is exact for Gaussian kernels (Section 4.1):

📌 drifting’s mean-shift = a rescaled score-matching field between the Gaussian-smoothed data and model distributions — the vector field underlying score matching (Tweedie!).
📌 This also clarifies the bridge to Distribution Matching Distillation (DMD): both use score-based transport directions, but only differ in how the score is realized—drifting does so nonparametrically through kernel neighborhoods, whereas DMD relies on a pretrained diffusion teacher.

🤔 So what happens for the default Laplace kernel used in drifting models? Let’s look below 👇
Chieh-Hsin (Jesse) Lai tweet media
Chieh-Hsin (Jesse) Lai tweet media
5 replies · 45 reposts · 207 likes · 32K views
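For readers following along, the classical Gaussian-kernel identity that the "mean-shift = rescaled score" claim rests on - the notation here is illustrative, sketched from standard kernel-density arguments rather than taken from the paper:

```latex
% Mean-shift vs. score under a Gaussian kernel (standard KDE identity; notation
% is illustrative, not the paper's). With samples x_i and kernel
%   K_\sigma(x, x_i) = \exp\!\big(-\|x - x_i\|^2 / 2\sigma^2\big),
% the smoothed empirical density is \hat{p}_\sigma(x) \propto \sum_i K_\sigma(x, x_i),
% and the mean-shift vector is a rescaled score of that smoothed density:
\[
  m(x) \;=\; \frac{\sum_i K_\sigma(x, x_i)\, x_i}{\sum_i K_\sigma(x, x_i)} - x
        \;=\; \sigma^2 \, \nabla_x \log \hat{p}_\sigma(x),
\]
% which matches Tweedie's formula for Gaussian smoothing:
%   \mathbb{E}[x_0 \mid x] = x + \sigma^2 \nabla_x \log p_\sigma(x).
```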
TechGeekDavid reposted
Beth Kindig
Beth Kindig@Beth_Kindig·
While Nvidia’s $NVDA $1 trillion in AI chip visibility through 2027 may seem like the key takeaway from GTC, I’d argue there was another jaw-dropping stat intended to set the stage in the coming years. Although this stat has not seen the recognition it deserves, it foreshadows higher revenue as we exit the decade. Find out more in my upcoming newsletter – link in bio.
9 replies · 12 reposts · 160 likes · 20.7K views
TechGeekDavid
TechGeekDavid@techpupparent·
@KirkDBorne @PacktDataML Interesting timing. AI pipelines demand integrated architectures. Wonder how Fabric handles tokenization and data compression at scale.
0 replies · 0 reposts · 0 likes · 9 views
Kirk Borne
Kirk Borne@KirkDBorne·
The Definitive Guide to Microsoft Fabric — From discovery to building a unified, secure, and scalable data platform: amzn.to/3MdE1Xk v/ @PacktDataML

Table of Contents:
🔶 Getting started with Fabric
🔶 From Lakehouse to First Analysis
🔶 Unifying Data in OneLake
🔶 Ingesting Data into Fabric
🔶 Advanced Data Transformation
🔶 Organize data: Data Warehouse vs. Data Lake
🔶 Processing & Analyzing Real-Time Data
🔶 Designing Semantic Models
🔶 Enterprise analysis and reporting
🔶 Using AI in Fabric
🔶 Collaborating as a team
🔶 Architecture
🔶 Securing Your Data Platform
🔶 Administer Fabric
🔶 Mastering & Optimizing Platform Costs
Kirk Borne tweet media
1 reply · 4 reposts · 6 likes · 758 views
TechGeekDavid reposted
Microsoft Azure
Microsoft Azure@Azure·
Planning an AWS to Azure cutover? Follow the five-phase lifecycle of plan, prepare, execute, evaluate, and decommission to reduce risk while preserving existing KPIs for an optimized post-migration foundation. Read the blog to learn more: msft.it/6012QquTo
2 replies · 10 reposts · 45 likes · 4.7K views
Mercor
Mercor@mercor_ai·
We just submitted APEX-Agents, APEX-1 and ACE to @evaluatingevals on @huggingface, an OSS initiative to standardize evals and try to reduce the noise in benchmarking.
3 replies · 5 reposts · 33 likes · 8.8K views
TechGeekDavid
TechGeekDavid@techpupparent·
Watched my AI workflows wash away this year. Here's what stuck: implementations rot, mental models compound. Question isn't preservation - it's extraction. What do you learn while the tooling still works?
0 replies · 0 reposts · 0 likes · 2 views
TechGeekDavid reposted
David Hendrickson
David Hendrickson@TeksEdge·
MiniMax M2.7 has been released. It retains the $2.4 pricing at $2.5/1M and is already up on @OpenRouter for testing.
David Hendrickson tweet media
Luke@ImLukeF

@MiniMax_AI M2.7 Let the testing begin.... Big fan of M2.5, so this is exciting!

0 replies · 3 reposts · 19 likes · 1.2K views
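If you want to poke at it on OpenRouter, a minimal sketch against the OpenAI-compatible endpoint - the model slug below is a placeholder, so check OpenRouter for the real M2.7 identifier before running:

```python
# Minimal sketch of calling OpenRouter's OpenAI-compatible API.
# The model slug is a placeholder -- look up the actual MiniMax M2.7
# identifier on openrouter.ai before running.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="minimax/minimax-m2.7",  # placeholder slug
    messages=[{"role": "user", "content": "Give me a one-line self introduction."}],
)
print(resp.choices[0].message.content)
```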