Mark Huang

567 posts

@markatgradient

@Gradient_AI_ Democratizing AI. Former Quant. 🧑‍🍳 models and build agents. https://t.co/ZC0c6oBk3S

Joined October 2013
231 Following 922 Followers
Mark Huang@markatgradient·
@MillionInt Very true. Conviction, relentlessness, and fearlessness breed unshakable focus. Often diametrically opposed to point-in-time commercial concerns.
0
0
1
766
Jerry Tworek@MillionInt·
Being brave can be a moat. Especially in a world where top management of most companies isn’t.
17
15
447
38.4K
Jonathan Lehr@fendien·
This is wild 🤯 PagerDuty net of cash is trading at a 0.2X ARR multiple. I don't even know how they exit at this point. It seems like a falling knife. Would a strategic buy them not for the revenue, but for the MSAs in their customer base for perhaps a quicker sell-through motion if they sell in an adjacent category?
Jason ✨👾SaaStr.Ai✨ Lemkin@jasonlk

PagerDuty now at $667m market cap on $500m ARR, so just over 1x ARR. But it’s worse than that, as they have $550m in cash. So enterprise value closer to $120m on $500m ARR. Growth is 1%, customer count has not grown. You MUST accelerate today. This is ALL the markets care about.

3
0
5
2.5K
Mark Huang@markatgradient·
@clarejtbirch @PrimeIntellect An additional reason: because environment observations (tool call execution) are a user concern, batching gives the user a better workload pattern for controlling their sandbox service, e.g. when the environment has a slow cold start but fast subsequent calls.
0
0
0
34
Mark Huang@markatgradient·
@clarejtbirch @PrimeIntellect Does Tinker support batched inference, with higher latency but lower cost, similar to model providers? The use case: I’ve trained my model checkpoint and want to launch a suite of benchmarks on it. It lets Tinker smooth consumption while offering users more throughput.
1
0
0
44
clare ❤️‍🔥@clarejtbirch·
vv excited for @primeintellect day on Saturday! did you know you can use any of Prime's RL environments with Tinker?
Tinker@tinkerapi

@PrimeIntellect shares our mission of democratizing AI-training and has built a stack for RL including the Verifiers library. Verifiers’ integration with Tinker lets builders tap into Prime’s growing Environments Hub ecosystem, or create their own. x.com/willccbb/statu…

1
1
65
5.7K
Mark Huang@markatgradient·
@karinanguyen I love the idea of PostTrainBench! Code generation is near costless, so navigating uncertainty is the next state-action to hill climb.
0
0
3
595
Karina Nguyen@karinanguyen·
Excited to release PostTrainBench v1.0! This benchmark evaluates the ability of frontier AI agents to post-train language models in a simplified setting. We believe this is a first step toward tracking progress in recursive self-improvement 🧵:
43
90
660
137K
Marco Mascorro@Mascobot·
🚨 New: Integrating Harbor (@harborframework) for end-to-end Computer-Use evaluation (for Windows and Linux) at scale with @thinkymachines' Tinker, OSWorld, @daytonaio, and bare-metal servers. We just added support for Computer Use, @tinkerapi, and OSWorld to Harbor - a framework for evaluating agents and generating RL training data by running large-scale rollouts across parallel sandboxed environments and collecting trajectories for SFT and RL. Repo and blogpost below 👇
11
19
130
18.9K
Li Yang@hewliyang·
added chatgpt to excel to reversing 19 tools (ostensibly) but nothing new here. their sandbox is pretty involved and bundles a WASM build of QuickJS for the "sandboxed" code exec. data flows thru 3 realms (host <-> iframe <-> qjs). reimplementation of the sandbox + tools in the repo below
3
0
17
1.9K
Li Yang@hewliyang·
@markatgradient sounds like you are speaking on the UX side of things? i view observability as part of UX/app layer rather than the agent layer haha
1
0
0
31
Li Yang@hewliyang·
OpenAI is training finetunes of Codex for financial tasks (probably excel) under the codename "basispoints" 👀
3
2
66
7.8K
Mark Huang@markatgradient·
@hewliyang Maybe we’re aligned in that the interesting layer is the interplay on reward shaping for the underlying behavior or risk or opaque actions. I often get workbooks with varying degrees of values and traceable formulas, and would prefer a bias toward formulas.
1
0
0
38
Mark Huang@markatgradient·
@hewliyang I was thinking more from the fine-grained workbook control perspective. Most lean heavily on coarse actions and use the code_execution + exec_officejs escape hatch. But if you’re doing multiturn with a human, referential treatment of formulas is a must.
1
0
0
40
Mark Huang@markatgradient·
@hewliyang You think so? I was hoping it would have a more interesting state-action space.
1
0
0
68
Li Yang@hewliyang·
@markatgradient yes sir. nothing too interesting though except for their sandboxing method. i think excel harnesses have more or less converged. all depends on the model now
1
0
0
212
swyx@swyx·
ok are there any open source Claude Cowork clones because I can no longer function without a cowork pls recommend or i will build
59
3
116
64.9K
Mark Huang@markatgradient·
@adocomplete Interesting. Any context as to what prompted reverting back? Is "Ultrathink" now the only trigger word (previously there was also "think", etc.)?
0
0
0
133
Ado@adocomplete·
Ultrathink is back!
156
120
2.3K
228.8K
Mark Huang@markatgradient·
@RLanceMartin QQ so does the new pre-filtering for web_search leverage PTC?
0
0
0
44
Mark Huang@markatgradient·
@OfficialLoganK @GoogleDeepMind Impressive bump in multimodal. Any insight that you’re allowed to share about the weaker (vs Flash) behavior on FACTS and longer ctx?
0
0
2
143
Logan Kilpatrick@OfficialLoganK·
Introducing Gemini 3.1 Flash-Lite 🔦, a huge step forward on the boundary of intelligence, beating 2.5 Flash on many tasks.
362
222
3.3K
298.7K
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K
740
13.6K
6.5M
Mark Huang@markatgradient·
@bcherny Is this basically the next iteration of the code-simplifier plugin?
0
0
4
662
Boris Cherny@bcherny·
/simplify Use parallel agents to improve code quality, tune code efficiency, and ensure CLAUDE.md compliance. Usage: "hey claude make this code change then run /simplify"
32
26
1.1K
131.6K
Boris Cherny@bcherny·
In the next version of Claude Code.. We're introducing two new Skills: /simplify and /batch. I have been using both daily, and am excited to share them with everyone. Combined, these skills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.
432
842
12.9K
2.5M