Mark Huang

567 posts

@markatgradient

@Gradient_AI_ Democratizing AI. Former Quant. 🧑‍🍳 models and build agents. https://t.co/ZC0c6oBk3S

Joined October 2013

231 Following 922 Followers

Pinned Tweet

Mark Huang@markatgradient·14 Oct

@mdgarratt @chriszeoli @preet1rathi x.com/DeepSkyAI/stat…

Superagent@superagent

Excited to share that we’re officially joining @airtable - bringing along our entire core team with us! 🥳 Our mission remains the same - building DeepSky to help individuals power their entire business with AI. Stay tuned for updates and what’s next! 🔗deepskyai.substack.com/p/deepsky-is-j…

707

Mark Huang@markatgradient·5d

@MillionInt Very true. Conviction, relentlessness, and fearlessness breed unshakable focus. Often diametrically opposed to point-in-time commercial concerns.

766

Jerry Tworek@MillionInt·5d

Being brave can be a moat. Especially in a world where top management of most companies isn’t.

447

38.4K

Mark Huang@markatgradient·6d

@fendien PE + rollout up into AI native?

Jonathan Lehr@fendien·13 Mar

This is wild 🤯 PagerDuty net of cash is trading at a 0.2X ARR multiple. I don't even know how they exit at this point. It seems like a falling knife. Would a strategic buy them not for the revenue, but for the MSAs in their customer base for perhaps a quicker sell-through motion if they sell in an adjacent category?

Jason ✨👾SaaStr.Ai✨ Lemkin@jasonlk

PagerDuty now at $667m market cap on $500m ARR, so just over 1x ARR. But it’s worse than that, as they have $550m in cash. So enterprise value closer to $120m on $500m ARR. Growth is 1%, customer count has not grown. You MUST accelerate today. This is ALL the markets care about.

2.5K

Mark Huang@markatgradient·13 Mar

@clarejtbirch @PrimeIntellect An additional reason: because environment observations (tool call execution) are a user concern, batching gives the user a better workload pattern for controlling their sandbox service, e.g. consider an environment with a slow cold start but fast subsequent calls.

Mark Huang@markatgradient·13 Mar

@clarejtbirch @PrimeIntellect Does Tinker support batched inference, with higher latency but lower cost, similar to model providers? The use case: I’ve trained my model checkpoint and want to launch a suite of benchmarks on it. This would let Tinker smooth consumption while offering users more throughput.

clare ❤️‍🔥@clarejtbirch·13 Mar

vv excited for @primeintellect day on Saturday! did you know you can use any of Prime's RL environments with Tinker?

Tinker@tinkerapi

@PrimeIntellect shares our mission of democratizing AI-training and has built a stack for RL including the Verifiers library. Verifiers’ integration with Tinker lets builders tap into Prime’s growing Environments Hub ecosystem, or create their own. x.com/willccbb/statu…

5.7K

Mark Huang@markatgradient·11 Mar

@karinanguyen I love the idea of PostTrainBench! Code generation is nearly costless, so navigating uncertainty is the next state-action to hill climb.

595

Karina Nguyen@karinanguyen·11 Mar

Excited to release PostTrainBench v1.0! This benchmark evaluates the ability of frontier AI agents to post-train language models in a simplified setting. We believe this is a first step toward tracking progress in recursive self-improvement 🧵:

660

137K

Mark Huang@markatgradient·9 Mar

@Mascobot @harborframework @thinkymachines @daytonaio @tinkerapi Wait this is cool. How easy is time travel within a rollout with forking logic?

163

Marco Mascorro@Mascobot·9 Mar

🚨 New: Integrating Harbor (@harborframework) for end-to-end Computer-Use evaluation(for Windows and Linux) at scale with @thinkymachines' Tinker, OSWorld, @daytonaio, and bare-metal servers. We just added support for Computer Use, @tinkerapi, and OSWorld to Harbor - a framework for evaluating agents and generating RL training data by running large-scale rollouts across parallel sandboxed environments and collecting trajectories for SFT and RL. Repo and blogpost below 👇

130

18.9K

Mark Huang@markatgradient·7 Mar

@hewliyang Doing the lord’s work

Li Yang@hewliyang·7 Mar

added chatgpt to excel to reversing. 19 tools (ostensibly) but nothing new here. their sandbox is pretty involved and bundles a WASM build of QuickJS for the "sandboxed" code exec. data flows thru 3 realms (host <-> iframe <-> qjs). reimplementation of the sandbox + tools in the repo below

1.9K

Mark Huang@markatgradient·7 Mar

@hewliyang Fair enough 🙂

Li Yang@hewliyang·7 Mar

@markatgradient sounds like you are speaking on the UX side of things? i view observability as part of UX/app layer rather than the agent layer haha

Li Yang@hewliyang·6 Mar

OpenAI is training finetunes of Codex for financial tasks (probably excel) under the codename "basispoints" 👀

7.8K

Mark Huang@markatgradient·6 Mar

@hewliyang Maybe we’re aligned in that the interesting layer is the interplay on reward shaping for the underlying behavior or risk or opaque actions. I often get workbooks with varying degrees of values and traceable formulas and would prefer bias for formulas.

Mark Huang@markatgradient·6 Mar

@hewliyang I was thinking more from the fine-grained workbook control perspective. Most lean heavily on coarse actions and use the code_execution + exec_officejs escape hatch. But if you’re doing multiturn with a human, referential treatment of formulas is a must.

Mark Huang@markatgradient·6 Mar

@hewliyang You think so? I was hoping it would have a more interesting state-action space.

Li Yang@hewliyang·6 Mar

@markatgradient yes sir. nothing too interesting though except for their sandboxing method. i think excel harnesses have more or less converged. all depends on the model now

212

Mark Huang@markatgradient·6 Mar

@swyx github.com/eigent-ai/eige… perhaps?

swyx@swyx·5 Mar

ok are there any open source Claude Cowork clones because I can no longer function without a cowork pls recommend or i will build

116

64.9K

Mark Huang@markatgradient·4 Mar

@adocomplete Interesting. Any context as to what prompted reverting back? Is "Ultrathink" now the only trigger word (previously there was also "think", etc.)?

133

Ado@adocomplete·4 Mar

Ultrathink is back!

156

120

2.3K

228.8K

Mark Huang@markatgradient·4 Mar

@RLanceMartin QQ so does the new pre-filtering for web_search leverage PTC?

Lance Martin@RLanceMartin·27 Feb

x.com/i/article/2027…

153

1.3K

261.4K

Mark Huang@markatgradient·4 Mar

@swyx lol Gartner needs an inverse Gartner ETF @joinautopilot

swyx@swyx·4 Mar

hows that prediction looking, gartner

swyx@swyx

TIL gartner, having missed the entire wave in ai engineering, is now calling the top in ai engineering pack it up folks its entirely over only downhill from here

116

31.3K

Mark Huang@markatgradient·3 Mar

@OfficialLoganK @GoogleDeepMind Impressive bump in multimodal. Any insight that you’re allowed to share about the weaker (vs Flash) behavior on FACTs and longer ctx?

143

Logan Kilpatrick@OfficialLoganK·3 Mar

Introducing Gemini 3.1 Flash-Lite 🔦, a huge step forward on the boundary of intelligence, beating 2.5 Flash on many tasks.

362

222

3.3K

298.7K

Mark Huang@markatgradient·3 Mar

@JustinLin610 😢

158

Junyang Lin@JustinLin610·3 Mar

me stepping down. bye my beloved qwen.

1.7K

740

13.6K

6.5M

Mark Huang@markatgradient·28 Feb

@bcherny Is this basically the next iteration of the code-simplifier plugin?

662

Boris Cherny@bcherny·28 Feb

/simplify Use parallel agents to improve code quality, tune code efficiency, and ensure CLAUDE.md compliance. Usage: "hey claude make this code change then run /simplify"

1.1K

131.6K

Boris Cherny@bcherny·28 Feb

In the next version of Claude Code.. We're introducing two new Skills: /simplify and /batch. I have been using both daily, and am excited to share them with everyone. Combined, these Skills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.