chris

4.3K posts

@hingeloss

optimism of the will, pessimism of the intellect. words at https://t.co/qJiOeUmgte

NYC · Joined December 2016
1.2K Following · 3.8K Followers
Pinned Tweet
chris @hingeloss
Presenting: the world's fastest AI voice chat - 500ms latency, running locally, 2x faster than anyone else. How is this possible? 👇
107 replies · 228 reposts · 2.4K likes · 664K views
chris @hingeloss
@negroprogrammer Who's monitoring the situation for the monitoring the situation bar
0 replies · 0 reposts · 1 like · 47 views
Regynald @negroprogrammer
Any bad bitches at the polymarket bar rn? Trying to figure out the next move tn
1 reply · 0 reposts · 11 likes · 593 views
chris retweeted
zach @zachleft
[image]
270 replies · 19.4K reposts · 154.8K likes · 2.2M views
You Jiacheng @YouJiacheng
By "not inference efficient", I mean both compute efficiency AND memory efficiency. A submission can somewhat "expand" its weights during evaluation and use much, much more memory than its artifact size. In other words, a 16MB artifact ≠ something that can run on a 16MB-memory edge device.
Quoting You Jiacheng @YouJiacheng:

Will GPT-5.5 surpass top1? A "16MB artifact size" limit will induce some parameter-efficient but not inference-efficient methods. This is also interesting, cuz it targets Kolmogorov complexity. (I'm not very interested in this setting tho)

3 replies · 1 repost · 24 likes · 3.1K views
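You Jiacheng's point, that a small artifact can expand into far more memory at evaluation time, can be illustrated with a deliberately toy sketch. The functions `pack_artifact` and `expand_weights` are hypothetical, and seeded random weights stand in for whatever real compression or generation scheme a submission might use; this is not the method of any actual entry.

```python
import random
import struct
from array import array


def pack_artifact(seed: int, n_params: int) -> bytes:
    # Hypothetical artifact: stores only a seed and a parameter count,
    # as two little-endian unsigned 64-bit ints (16 bytes total).
    return struct.pack("<QQ", seed, n_params)


def expand_weights(artifact: bytes) -> array:
    # At evaluation time the seed is "expanded" back into every weight.
    seed, n_params = struct.unpack("<QQ", artifact)
    rng = random.Random(seed)
    # array("d") stores each weight as a C double (8 bytes on common platforms).
    return array("d", (rng.gauss(0.0, 0.02) for _ in range(n_params)))


artifact = pack_artifact(seed=1234, n_params=1_000_000)
weights = expand_weights(artifact)

print(len(artifact))                    # bytes in the artifact
print(weights.itemsize * len(weights))  # bytes held in memory after expansion
```

The artifact stays 16 bytes whatever `n_params` is, while runtime memory grows linearly with it: the gap between parameter efficiency and inference efficiency the tweet describes.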
chris @hingeloss
@yungmetronome @ghosttyped soon you will learn which 737 exit rows don't recline, whether you can feel the 787 air pressure difference, and when the cute plushy store has a newly stocked model. Turn back now before it's too late
0 replies · 0 reposts · 1 like · 18 views
knight of cups @yungmetronome
@ghosttyped I personally think so… best international coverage for vacay, and EWR and SFO are both hubs. United status for free premium economy is actually useful
2 replies · 0 reposts · 6 likes · 189 views
Dylan @textqldylan
@TheEthanDing The world is going to look back and laugh at the times when people paid for seats to create a dashboard, and separately to view a dashboard.
1 reply · 0 reposts · 5 likes · 515 views
ethan ding 📊 @TheEthanDing
Seat-based SaaS stocks are tanking. Wall Street is scared.
We decided to speed it up.
Introducing $0/Seat Dashboards by TextQL:
$0/viewer seats
$0/editor seats
$0/admin seats
Unlimited seats. Forever.
Our agents build dashboards directly on your data warehouses, APIs, and MCPs. Even on top of Tableau and Power BI.
Dashboards have always been built. From now on, they're generated.
Build a dashboard and get $100 in credits. Link below:
9 replies · 9 reposts · 102 likes · 44.4K views
chris @hingeloss
It's telling that Block's share price is back to 2019 levels after laying off 4000 people, even though it still has 50% more headcount today than it did then.
SWE dev velocity used to be a corporate bottleneck: more hands write code faster, so you can ship features on time. LLMs remove that bottleneck.
What are the limiting factors now?
- Product ideation: can you come up with good product ideas consistently? Just because you can write infinite slop doesn't mean you should ship it.
- Experimentation: if you're shipping more ideas, you need to A/B test effectively and understand which ones are doing well.
- Legal/regulatory/compliance review: 'moving fast' is particularly frowned upon by fintech regulators.
- TAM: if your customers can't spend, building more features can't unlock more value.
Jack is right that smaller teams leveraging AI can match the velocity of larger teams. But large teams with AI should be able to expand **even more**.
Drastic staff cuts (or CRM's $50B buybacks) are an admission that you're bottlenecked somewhere else now and don't know how to fix it. Highly ambitious companies with lots of surface area to tread will continue to find ways to utilize their employees.
[2 images]
0 replies · 0 reposts · 12 likes · 1.4K views
chris @hingeloss
@shahahmed silver! but it's permanently covered by one of these pot lid holders :)
[image]
0 replies · 0 reposts · 1 like · 45 views
chris @hingeloss
Only stocks I'm worried about on this blizzard day
[2 images]
1 reply · 0 reposts · 16 likes · 741 views
chris @hingeloss
@canzhi the snow is getting temperaturemogged
2 replies · 0 reposts · 200 likes · 13.2K views
Canzhi @canzhi
NOTHING STICKING TO THE GROUND METEOROLOGISTS ONLY KNOW SKY THEY DONT KNOW GROUND
25 replies · 76 reposts · 3K likes · 224.9K views
chris @hingeloss
@zephyr_z9 Now we just need a good voice model
0 replies · 0 reposts · 3 likes · 440 views
chris @hingeloss
@canzhi Not enough N to smooth over?
1 reply · 0 reposts · 0 likes · 360 views
Canzhi @canzhi
why do we always talk about hedging as if it's free? are team owners really this poor that they need to lock into these -ev trades?
Quoting Tarek Mansour @mansourtarek_:

On sports hedging.
The sports insurance and re-insurance industry is big: the annual market is around $9 billion and is projected to double by 2030. There are a variety of insurance products, including brand sponsorships, game cancellations, team/player performance, off player compensation, and more.
We just announced a partnership with sports insurance broker Game Point Capital. Game Point Capital issues hundreds of millions in sports insurance per year. Their most popular product is team and player performance bonus insurance: sports teams often structure large payouts to coaches and players that get triggered if they achieve certain milestones (winning a championship, making the playoffs, scoring records, etc.). The bill is often large, and teams smooth out their finances by hedging it.
Game Point hedged for two different teams against basketball performance bonuses with Kalshi last week:
1. One is a hedge for a bonus if the team makes the post-season (Kalshi price = 6%, OTC price ~12-13%).
2. One is a hedge for a bonus if the team advances to the second round (Kalshi price = 2%, OTC price ~7-8%).
Why did they do this on Kalshi?
Insurers like Game Point need to offload the risk they take on somewhere else. Typically, they go to traditional re-insurance companies like Lloyd's of London, which is over-the-counter (OTC): you negotiate price and terms 1:1 with them (instead of in an open, competitive market). As in all OTC markets, the issue is that the re-insurers are restrictive in what risks they take: they like to avoid volatile, higher-risk contracts, so they offer prices that are opaque and prohibitively high.
Exchanges are a better alternative because they expand liquidity and bring competition: multiple counterparties compete in an open marketplace to improve the price. Exchanges are harder to build than OTC markets because they need enough liquidity. Over the past year, we've massively increased the liquidity in our sports markets.
During the Super Bowl, Kalshi could have processed a $22 million trade without moving the price meaningfully. At this level of liquidity, Kalshi is now very attractive for Game Point and other similar companies: there's more liquidity available, it's cheaper, and the price is more transparent.
We expect to process tens of millions in similar hedges from Game Point alone in the coming months. Onwards.

7 replies · 1 repost · 34 likes · 10.8K views
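Canzhi's "-ev" objection and the prices quoted in Tarek Mansour's thread can be put side by side. This is a minimal sketch, assuming a binary contract that pays $1 if the event occurs, ignoring exchange fees and slippage, and using a hypothetical $10M bonus; `hedge_cost` and `expected_hedge_loss` are illustrative names, not any Kalshi API.

```python
# Assumptions (not from the thread): each contract pays $1 if the event
# happens, fees and slippage are ignored, and the bonus size is hypothetical.


def hedge_cost(bonus: float, price: float) -> float:
    """Upfront premium to fully hedge `bonus` with contracts trading at `price`."""
    return bonus * price


def expected_hedge_loss(bonus: float, price: float, true_prob: float) -> float:
    """Expected cost of hedging versus staying unhedged.

    Positive means the hedge is -EV at that probability: you pay `price`
    per $1 of payout but only expect to receive `true_prob` per $1.
    """
    return bonus * (price - true_prob)


BONUS = 10_000_000  # hypothetical performance bonus, in dollars

# Prices quoted in the thread: (label, Kalshi price, OTC price range).
quotes = [
    ("makes post-season", 0.06, (0.12, 0.13)),
    ("advances to second round", 0.02, (0.07, 0.08)),
]

for label, kalshi, (otc_lo, otc_hi) in quotes:
    print(
        f"{label}: Kalshi ${hedge_cost(BONUS, kalshi):,.0f} vs "
        f"OTC ${hedge_cost(BONUS, otc_lo):,.0f}-${hedge_cost(BONUS, otc_hi):,.0f}"
    )

# If the team's true post-season probability were 5% (a made-up number),
# buying the hedge at 6% would cost about $100,000 in expectation.
print(f"expected loss: ${expected_hedge_loss(BONUS, 0.06, 0.05):,.0f}")
```

The exchange price lowers the premium, but whether the hedge is -EV still depends on how the price compares with the true probability, which is Canzhi's point; a team might accept that expected cost to smooth out its finances.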
chris @hingeloss
lucidrains github gone? RIP sweet prince
[image]
13 replies · 11 reposts · 126 likes · 22K views
chris @hingeloss
@maxsloef There is a cleaner trade you can do 😉
1 reply · 0 reposts · 3 likes · 360 views
max! @maxsloef
is anyone running a fund that manufactures synthetic anthropic exposure through public markets? the trade: long ZM, short a SaaS basket, lever up the spread. zoom owns ~0.8% of anthropic ($3B embedded in a $27B company). hedge out the core business and you're left with pure anthropic + cash. i would buy in, secondary market is a PITA
2 replies · 0 reposts · 9 likes · 710 views
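The numbers in max!'s tweet can be sanity-checked with a back-of-the-envelope sketch. The inputs are the tweet's own figures; the sketch ignores Zoom's cash and any hedge-ratio error in the SaaS basket, so the leverage figure is a rough illustration, not a trade recommendation.

```python
# Inputs taken from max!'s tweet.
ANTHROPIC_STAKE_VALUE = 3e9  # "$3B embedded"
ANTHROPIC_STAKE_PCT = 0.008  # "~0.8% of anthropic"
ZOOM_MARKET_CAP = 27e9       # "a $27B company"

# Valuation of Anthropic implied by the stake: $3B / 0.8%.
implied_anthropic_valuation = ANTHROPIC_STAKE_VALUE / ANTHROPIC_STAKE_PCT

# Fraction of each $1 of ZM market value that is the Anthropic stake.
anthropic_share_of_zm = ANTHROPIC_STAKE_VALUE / ZOOM_MARKET_CAP

# Gross ZM position needed for $1 of synthetic Anthropic exposure,
# assuming the rest of ZM is perfectly hedged away (a simplification).
zm_notional_per_anthropic_dollar = 1 / anthropic_share_of_zm

print(f"implied Anthropic valuation: ${implied_anthropic_valuation / 1e9:.0f}B")
print(f"Anthropic share of ZM: {anthropic_share_of_zm:.1%}")
print(f"ZM notional per $1 of Anthropic exposure: ${zm_notional_per_anthropic_dollar:.1f}")
```

Because the stake is only about a ninth of Zoom's value, roughly $9 of ZM (plus a matching short basket) is needed per $1 of Anthropic exposure, which is why the tweet says to "lever up the spread".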
chris @hingeloss
@TheEthanDing Groundhog Day: Benioff seen cope tweeting, another 6 weeks of shorting CRM
0 replies · 0 reposts · 0 likes · 153 views
chris @hingeloss
@leerob cool, pretty close! and yeah I think being able to compare Composer models vs the rest is useful. congrats on the release!
0 replies · 0 reposts · 0 likes · 231 views
chris @hingeloss
Cursor's Composer-1.5 claims to be the best coding model, without any third-party benchmarks. Is there another player in town??
On TerminalBench 2, I see 48.3% +/- 6% (*).
Comparisons on the high end: GPT 5.2, Opus 4.5, and Gemini 3 Pro. On the low end: Sonnet 4.5 and Kimi 2.5.
Pretty solid showing; this would support the view that Cursor is on par with the Chinese frontier and about 3-4 months behind the US frontier. More benchmarking necessary tho!
(*) Higher variance because each task was only run once. I ran this with the Cursor CLI harness, which also affects output quality, and we have no comparisons otherwise.
This took 150M tokens (144M cache read, 2.5M output), which at API pricing comes to $25, or ~40% of my $20 Cursor sub (implying the sub is ~3x cheaper than the API). Composer 1.5 is also priced between the Sonnet and Opus tiers, which suggests that Cursor ranks it similarly internally.
[2 images]
9 replies · 1 repost · 81 likes · 14.6K views
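The "~3x cheaper" figure in the Composer-1.5 tweet follows from its own numbers. This short sketch assumes the "~40% of my $20 Cursor sub" means the run used about 40% of the monthly dollar allowance; the variable names are illustrative.

```python
# All inputs are the tweet's own figures.
API_COST = 25.0      # dollars at API pricing for the TerminalBench run
SUB_PRICE = 20.0     # dollars per month for the Cursor subscription
SUB_FRACTION = 0.40  # share of the monthly allowance the run consumed (assumed linear)

# Dollars of subscription the run effectively used.
sub_cost = SUB_PRICE * SUB_FRACTION

# How many times more the same usage would cost at API pricing.
api_to_sub_ratio = API_COST / sub_cost

print(f"subscription cost of the run: ${sub_cost:.2f}")
print(f"API price / subscription price: {api_to_sub_ratio:.3f}x")
```

$25 against $8 of subscription usage gives 3.125x, which rounds to the tweet's "~3x".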
chris @hingeloss
@F2aldi i don't think the claude models are particularly benchmaxed (i.e. in my vibe tests they perform better than these evals would imply). this is also a single eval. my guess is that it's in the ballpark of sonnet, which is very impressive!
0 replies · 0 reposts · 0 likes · 256 views
λL-D1 | AI for Buzzer 🍉
@hingeloss Interesting, does this mean it's slightly better than Sonnet, based on TerminalBench 2? But it's also interesting that Sonnet 4.5 has a low score compared to Gemini 3 Flash or even GPT-5.
1 reply · 0 reposts · 0 likes · 652 views
chris @hingeloss
@kylejeong I used their own agent harness. And yes, internal benchmarks are fine, but not very helpful when they only compare against Composer 1 without any other datapoints?
1 reply · 0 reposts · 2 likes · 787 views
Kyle Jeong @kylejeong
@hingeloss the model is prob RL'd in the cursor agent, so terminal bench is maybe not the best benchmark? they have their own internal benchmark but that's not really a good evaluator either
2 replies · 0 reposts · 2 likes · 972 views