Chris Clark

1.3K posts

@cclark

Co-founder & COO @OpenRouterAI

Charleston, SC · Joined April 2007
717 Following · 1K Followers
adam thurlow@5rb6jj7wtx·
@deedydas @cclark Right, but surely it read existing engines, I’d question the use of the word novel here
Deedy@deedydas·
I just "vibecoded" a Chess master (~2250 Elo) from scratch that runs locally on a Mac in Rust. I used to play chess semi-competitively, and I'm flabbergasted that you can just speak a 98th-percentile chess engine into existence.
Chris Clark@cclark·
thanks to coding agents it's never been easier to get started, and never been harder to get finished.
Chris Clark@cclark·
In a moment of frustration, I banned my 8-year-old from saying “I’m bored” and he now has to say “time to figure out a new activity” and it’s been weirdly effective. Also I’ve threatened to take away dessert if he says it. That also is def part of the success recipe.
Chris Clark@cclark·
Looks great! I have not read the Chinmayananda version, but I have the Easwaran translation of the Gita and it seems more approachable. Not sure if it's public domain though. Chinmayananda: What did the sons of Pandu and also my people do when, desirous to fight, they assembled together on the holy plain of Kurukshetra, O Sanjaya? Easwaran: O Sanjaya, tell me what happened at Kurukshetra, the field of dharma, where my family and the Pandavas gathered to fight.
Deedy@deedydas·
These three books outsold every novel, outlasted every empire, and are the calling for 70% of the world. But you can't find a good copy on the internet. And you can't take them offline with you. And you can't read them on a plane. So I made a little thing:
[image attached]
Chris Clark@cclark·
If the east wing ballroom had been constructed by Obama, what kind of impact would that have had on the plot of White House Down?
Chris Clark@cclark·
Bullish on the Workdays of the world. Well-structured line-of-business software and effective systems of record are not going anywhere. Good data structures, with mature APIs, are the perfect systems for agents to interact with, and not create a mess in their wake. AI doesn't need to live inside the tool, and building properly governed enterprise software is not trivial.
Chris Clark@cclark·
With models training on other model outputs — feels like only a matter of time — before model outputs — are primarily — dominated — by ———
Chris Clark@cclark·
@thdxr @pingToven @charlesdotai @alexatallah Credit where credit is due - I think you noticed and did something about tool call variability between providers before anyone (including us) understood it well. We wouldn’t be at this point without that work. Thank you!
Chris Clark@cclark·
need a word for 'mindshare' but for llms & agents. 'weightshare'?
Chris Clark@cclark·
me: *replaces em-dash with semicolon* ...they'll never know
Chris Clark@cclark·
You need an insane level of precision to responsibly scale digital ad spend. But it's just not necessary if you're not spending in that channel. As a result, I'd posit that businesses that rely on direct-response ads have wildly better internal analytics than those that don't.
Chris Clark@cclark·
sorry buddy :( that does sound tricky

> Migrate from Next.js to vinext
⏺ This is a large, complex monorepo with a heavily customized Next.js setup ...
⏺ The project has heavy Next.js customization ...
⏺ This is a highly complex migration
Chris Clark@cclark·
Yes open weight models eventually catch today's frontier, but with each frontier improvement there is massive TAM expansion. Open weight will take the tokens of yesterday's agents, but b/c of the TAM expansion it's not market limiting for the frontier labs; they don't need to 'stack the cohorts' to continue growing exponentially.
Toven@pingToven·
@thdxr would've been a banger
[image attached]
dax@thdxr·
we have a channel for posts we stopped ourselves from posting on x it's so good
[image attached]
Chris Clark@cclark·
Two trends at the tipping point, driving massive @OpenRouter growth:

1. The fundamental unit of AI work has shifted from "text completion" to "agentic loop". More requests, with more context, to smarter models.
2. Everyone is using agents (not just developers). As recently as the fall, only devs had agentic tools. Now analysts, marketers, consumers, etc. are all interacting with agentic systems.

Higher average unit price + expanding TAM = outsized growth.
Chris Clark@cclark·
Sometimes I think "How hard could it be to find a diamond? Go to a place they might exist and look around." but then I lose my cell phone in my 200sqft hotel room.
Chris Clark@cclark·
The knowledge and attention to detail of the @OpenRouter team
Toven@pingToven

re: what “OpenAI compatible” actually means in 2025, through the lens of gpt-oss reasoning_effort.

the term gets used a lot across the industry, and it carries way more implied guarantees than it should. historically, “OpenAI compatible” has meant support for an OpenAI-style chat completions API: same high-level schema, same messages array, same basic parameters. this shape became the default largely due to OpenAI's first-mover advantage.

that API worked well enough to start. it was built quickly, for the moment, before tool calling, hybrid reasoning, structured outputs, or multimodal inputs were common. it's hard to fault the original design given the capabilities at the time. but once you try to apply that same API shape across 60+ providers, inference engines, and model families, the cracks show up very fast. the practical result is that the same request can succeed, fail, or subtly change model behavior depending on where it runs.

let's start with the messages array. on paper it's simple: ordered turns, each with a role and some content. in practice, this array is handled wildly differently. some providers support arrays of content parts per turn, typically used for text+image in a single turn, but it could just be multiple text strings. others throw errors when you try to do this; others silently concatenate. some models were trained for it, others weren't, which means you may get degraded performance even when the request “works.”

role ordering is another source of variance. for example, some providers accept a messages array with a single system-role turn, while others require at least one user turn. some allow assistant prefill and correctly continue generation; others allow prefill only with a specific parameter, and that parameter differs. others ignore it or throw errors.
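a minimal sketch of the kind of content-shape normalization this variance forces on a router, assuming a hypothetical provider that rejects list-form content (the function and field names here are illustrative, not OpenRouter's actual implementation):

```python
def flatten_content(messages):
    """Collapse OpenAI-style list-of-parts content into a single string,
    the way some providers silently do, for providers that reject arrays."""
    out = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            # Keep only the text parts and join them. Image parts would
            # need provider-specific handling and are dropped here.
            texts = [p["text"] for p in content if p.get("type") == "text"]
            content = "\n".join(texts)
        out.append({"role": msg["role"], "content": content})
    return out
```

even a tiny adapter like this changes semantics: a model trained on multi-part turns may behave differently on the concatenated form, which is exactly the "works but degraded" failure mode described above.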
all of this can happen on the same model depending on where it's hosted. and that's before you touch sampling parameters. even temperature ranges differ: some cap at 1, some allow higher. logprobs can come back in different shapes. newer OpenAI models don't allow modifying temperature or top-p at all, while open-source models still rely on them heavily. compatibility here often means 'best effort'.

structured outputs add another layer: json object vs json schema, partial json schema support, streaming sometimes supported, sometimes not. some providers support reasoning plus structured outputs, others don't.

tool calling is where layers of variance really add up: tool calling is structured output plus special tokens plus a parser plus chat templating plus finish reasons. tool parsers are frequently incorrect, and that's not always the provider's fault; even when a model lab works with popular engines like vllm and sglang, we see tool call parser issues well after launch. the kimi k2 vendor verifier project uncovered various problems in the inference engine implementations weeks after model launch.

tool_choice support varies by model and provider. auto usually works. none often breaks in subtle ways. forced is rare. function-by-name works on some stacks and not others. finish_reason=tool_call is not guaranteed. even tool call IDs are inconsistent: regex expectations differ, length limits differ. reuse the same ID across providers and you will eventually hit a hard error somewhere. at this point, "compatible" describes the shape of the request, not the semantics.

reasoning_effort is a newer example of the same pattern, brought to the limelight by @xeophon's work with gpt-oss benchmarks. he surfaced a measurable variance caused by provider-level incompatibilities. OpenAI introduced a new enum parameter, reasoning_effort. initially the enum was low, medium, high; then minimal, none, and extra high were added. gpt-oss only supports a subset (low, medium, high).
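the sampling and tool_choice variance above could be papered over with a per-provider capability map. a hedged sketch, where the `caps` dictionary and its keys are made-up names for illustration, not any real provider's API:

```python
def adapt_request(params, caps):
    """Clamp a request to a hypothetical provider's advertised limits."""
    adapted = dict(params)
    # Some providers cap temperature at 1.0 while others allow up to 2.0.
    if "temperature" in adapted:
        adapted["temperature"] = min(
            adapted["temperature"], caps.get("max_temperature", 2.0)
        )
    # 'auto' is the tool_choice mode that usually works everywhere; fall
    # back to it when the requested mode isn't supported upstream.
    choice = adapted.get("tool_choice")
    if choice is not None and choice not in caps.get("tool_choice_modes", {"auto"}):
        adapted["tool_choice"] = "auto"
    return adapted
```

the catch, as the thread notes, is that clamping is itself a semantic change: a request that ran at temperature 1.7 on one provider silently runs at 1.0 on another, so "compatible" requests stop meaning compatible behavior.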
meanwhile, most open source reasoning models only support reasoning enabled or disabled (think GLM family, DeepSeek after v3.1). when gpt-oss released, none of the inference providers had support for the reasoning_effort parameter, as it was mostly used on OpenAI's proprietary models. everyone rushed to launch gpt-oss, and the parameter and its impact on the amount of reasoning was swept under the rug for months.

eventually, @xeophon published a benchmark he ran through OpenRouter showing a ton of providers not changing the amount of reasoning based on the effort value sent to the OpenRouter API. Xeophon originally blamed the providers, but as soon as I saw it I realized the issue was largely our fault: we hadn't implemented support for it for each provider. this was a miss by both the OpenRouter team and the providers, given how new and underspecified the parameter was. to this day, most providers do not have the parameter or its supported values documented anywhere, and most providers did not communicate with us when support was added. we often have to chase teams down to tell us about their APIs, since we need a deep understanding of implementations to properly transform user intent into upstream-acceptable values.

the fix could have been piecemeal support for each provider, but i wanted to avoid this problem entirely in the future, so instead i spent a few weeks refactoring a ton of code to implement model-level reasoning configs.
this means that in our database i can now specify a few things:

- whether a model supports reasoning effort
- which values of the enum it supports (out of none/minimal/low/medium/high/xhigh)
- and what the default value should be

this solves multiple issues for us in a scalable way:

- it prevents upstream APIs from throwing errors if we pass the reasoning_effort param to models that don't support it
- it prevents upstream APIs from throwing errors if we pass a value that is not supported
- and it normalizes the default effort value across providers, ensuring that if the user doesn't specify, the behavior is consistent

once this model-level config implementation was done, we still needed to ensure we were plumbing the values into the right fields for the providers that expected them in different places. some expect it in chat template kwargs, others in a top-level reasoning object. once all this was done, @xeophon was able to get consistent reasoning effort behavior across all providers (minor caveat for the Bedrock API, which we fixed after communicating with their team about their unique param implementation).

so now, for reasoning_effort specifically, the OpenRouter experience should be much better. and in the future, any new models with effort support will be much easier for us to support, and we will be able to move more quickly with them.

and all of that's just the request side. here's a quick, non-exhaustive sample: token accounting differs, with counts for cached tokens, reasoning tokens, image tokens, etc. all varying. sometimes reasoning is returned separately, sometimes wrapped in tags. finish reasons are not consistent even across proprietary APIs.

what we do at OpenRouter all day is deal with this reality. we benchmark constantly and try hard not to change model behavior.
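the model-level reasoning config described above can be sketched roughly like this. the config keys and example entries are illustrative assumptions, not OpenRouter's actual schema:

```python
# Hypothetical per-model config: does the model take reasoning_effort,
# which enum values it accepts, and what to send when the user is silent
# or asks for an unsupported value.
MODEL_REASONING = {
    "gpt-oss": {"supports_effort": True,
                "allowed": {"low", "medium", "high"},
                "default": "medium"},
    "glm-4":   {"supports_effort": False,  # reasoning is only on/off
                "allowed": set(),
                "default": None},
}

def normalize_effort(model, requested):
    """Return the effort value to send upstream, or None to omit the param."""
    cfg = MODEL_REASONING.get(model)
    if cfg is None or not cfg["supports_effort"]:
        return None  # drop the param so the upstream API doesn't error
    if requested in cfg["allowed"]:
        return requested
    return cfg["default"]  # normalize unsupported or missing values
```

this captures the three wins listed above: unsupported params get dropped instead of erroring, out-of-range values get mapped to something valid, and defaults are consistent across providers hosting the same model.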
and we recently added debug tooling so you can see the exact upstream request if something looks wrong (check that out here: openrouter.ai/docs/api/refer…).

special thanks to @xeophon for digging into this, working super hard on it, helping our team fix the issues on our API, and even helping upstream providers realize how important it is to document these nuances.

the takeaway isn't that anyone is doing things badly. this space is just evolving faster than the original API shape ever anticipated. most of this complexity is invisible when things work, and painfully obvious when they don't. if you find this stuff interesting, or if you enjoy untangling invisible incompatibilities that break real systems, we're hiring!
