
I’m getting more and more interested in prediction markets.
This MM wallet is all over my feed: polymarket.com/@k9Q2mX4L8A7ZP…. On top of its impressive PnL (~$1.5mn on ~$175mn volume since Dec ’25 at time of writing) and high Sharpe, it’s also farmed ~$75k of maker rebates.
It’s a sophisticated strat which I don’t think is replicable with vibecode. I think there are a couple main ways to have edge here.
1) Better latency. Are you the fastest at picking up crypto moves on the leading venue and transmitting them to Polymarket’s AWS regions in EU-West-1/2? Part of this is literally just buying the fastest lines from providers and sending the data down them, but at this stage of the game, I’m pretty sure that’s not enough. I have to redact a lot of the specific techniques here because not every firm is doing all of them (and we might be missing a couple too), but the sophisticated firms will have invested a lot of time and resources into optimizing the data feeds from various exchanges, anywhere from the main crypto exchanges in Asia to CME futures in Illinois, to get information about that tick first.
2) Better modeling. What if you don’t have the fastest line, but with the data you do have, you’re better at generating alpha? I think this is secondary to latency, but the general thesis is that retail traders are speculating on contract prices at some future point in time, while these contracts are basically digital options that trade in much greater size on Deribit: a tight put/call spread divided by the strike difference converges, in the limit, to the digital price (the strike-derivative of the call surface). Piping that into your pricing model should generate a fair value for these bands, which can be pretty dislocated from where they’re trading. The problem is the massive noise in backing out the implied 10-minute vol from a 1-day expiry option. I tried modeling this but didn’t put it into production; blending in features from a model like HAR-RV could work pretty well (it weights the diurnal fluctuations more, so you can price things like the US open/close better). This prices the ATMs pretty well but underweights the wings: empirical return fits alone miss some combination of tail-insurance demand, spot/vol correlation, and vol-convexity premia. There could also be lower-hanging fruit in trades people aren’t really looking at. What if there’s a Kalshi <> Polymarket arb? (I sanity-checked this too, and after fees and liquidity on the different Kalshi buckets, it’s not worth my time.)
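To make the call-spread-to-digital relationship concrete, here’s a minimal sketch under flat-vol Black-Scholes with made-up parameters (zero rates, a hypothetical 10-minute expiry); it is an illustration of the limit argument, not Deribit’s actual surface:

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, sigma, r=0.0):
    """Black-Scholes call price under a flat vol and continuous rate r."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def digital_from_spread(S, K, T, sigma, dK=0.05):
    """Digital call ~ tight call spread divided by strike width, i.e. -dC/dK."""
    return (bs_call(S, K - dK, T, sigma) - bs_call(S, K + dK, T, sigma)) / (2 * dK)

# Hypothetical 10-minute expiry at 60% vol: implied probability of finishing above K
T = 10 / (60 * 24 * 365)
print(digital_from_spread(100.0, 100.2, T, 0.6))
```

In the flat-vol limit this converges to N(d2); on a real smile the spread also picks up the skew term, which is exactly why listed quotes carry more information than a single implied vol.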
This isn’t the area of quant research that I focus on, but for those interested, the HAR-RV model is basically doing 3 things:
1) Looking at intraday realized variance. I pulled a week of 1-min closes, took log returns, and aggregated them into 5-minute blocks of realized variance (288 per day; the RV of a block is just the sum of squared returns within it).
2) Intraday vol is basically a U-shape, and we can quantify this: average the realized variance (RV) for each 5-minute bucket across the 7 days, and normalize these averages so they average to 1. You can then smooth this U curve with something like a 3-point rolling kernel. If a bucket has a value of 4.5, it has 4.5x the average variance (e.g. the open); if it’s 0.5, it’s a quiet APAC morning with less vol. Dividing each block’s RV by its bucket’s factor then gives the deseasonalized block RVs (DRVs).
3) We can then aggregate these DRVs into daily and weekly averages and fit a linear model to predict the next day’s DRV. The coefficient on the daily term intuitively says “yesterday was volatile, so today should be too” (vol is autocorrelated), while the coefficient on the weekly term captures mean reversion. For my fit, the weekly coefficient was -0.378: the weekly average is elevated, but we expect reversion tomorrow.
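The three steps above can be sketched end to end. Everything here is hypothetical and simulated (Gaussian 1-minute returns, and a 60-day history rather than one week so the regression in step 3 has enough points); the bucket counts and 3-point smoother follow the description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: realized variance in 5-minute blocks (simulated 1-min log prices)
n_days = 60                                   # longer than a week so step 3 has data
log_px = np.cumsum(rng.normal(0.0, 2e-4, n_days * 1440))
rets = np.append(np.diff(log_px), 0.0)        # pad to a multiple of 5
rv = (rets.reshape(-1, 5) ** 2).sum(axis=1)   # sum of squared returns per block
rv = rv.reshape(n_days, 288)                  # 288 five-minute blocks per day

# Step 2: diurnal seasonality factors, normalized to average 1, then smoothed
factor = rv.mean(axis=0)
factor /= factor.mean()
kernel = (0.25, 0.5, 0.25)                    # 3-point rolling kernel (circular)
factor = sum(w * np.roll(factor, s) for w, s in zip(kernel, (-1, 0, 1)))
drv = rv / factor                             # deseasonalized block RVs (DRVs)

# Step 3: HAR-style regression of tomorrow's daily DRV on yesterday's level
# and the trailing weekly (5-day) average
daily = drv.mean(axis=1)
y = daily[5:]
x_d = daily[4:-1]
x_w = np.array([daily[t - 5:t].mean() for t in range(5, n_days)])
X = np.column_stack([np.ones_like(y), x_d, x_w])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [intercept, daily coefficient, weekly coefficient]
```

On simulated Gaussian data the coefficients carry no real signal; the point is the shape of the pipeline, not the fitted numbers.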
Putting these together, you can “deaggregate” the predicted DRV by dividing by 288 to get back to a per-5-minute variance, reapply the seasonal factors from 2), and sum the variances over the contract’s expiry window. Annualizing this then gives a decent-ish forecast for the vol in that window.
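The deaggregation step looks roughly like this; the predicted daily DRV and the factor curve here are made-up stand-ins for the fitted values described above:

```python
import numpy as np

# Hypothetical stand-ins: a predicted daily DRV and the diurnal factor curve
drv_pred = 1.8e-4                  # tomorrow's predicted deseasonalized variance
factors = np.ones(288)             # diurnal factors (flat here for brevity...)
factors[:12] = 4.0                 # ...except an elevated first hour
factors /= factors.mean()          # keep them averaging to 1

# Deaggregate: daily DRV -> per-bucket variance, reapply seasonality,
# then sum over the contract's expiry window (say the first 12 buckets)
per_bucket = (drv_pred / 288) * factors
window_var = per_bucket[:12].sum()

# Annualize the window variance and take the square root to get a vol
window_minutes = 12 * 5
ann_vol = np.sqrt(window_var * (365 * 24 * 60) / window_minutes)
print(window_var, ann_vol)
```

Note the window variance is larger than a naive 12/288 share of the daily DRV precisely because the window sits on elevated seasonal factors; that is the whole point of step 2).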
To caveat all of this: the modeling forecasts realized variance under the real-world (P) measure and then maps that into a price. This will differ from market prices because of risk premia/convexity, since we aren’t modeling under the risk-neutral (Q) measure, though I did adjust the model to fit closer to it.
You can inject skew into this model too. If you took those 5-minute buckets and computed an empirical CDF, that would be an improvement, but over a week there are only ~2,016 buckets, so at 3-sigma you’d expect only around 5 observations, and around 0.13 at 4-sigma: way too noisy to fit the tails. A Student-t replacement for the normal would be better, with the degrees-of-freedom parameter fitted from the data by MLE. Roughly, an excess kurtosis of 5 puts the dof at 5.2, which is fatter-tailed. The problem with this is that the t’s tails are symmetric, which is not what we observe: downward moves tend to be sharper than upward ones. The market tends to be structurally net long on leverage with positive funding rates, so we see more liquidation cascades to the downside. Bad news (hacks, new regulation, macro) tends to cause instant, sharper reactions, whereas good news (more adoption, ETF approvals, etc.) tends to play out more slowly. The downside also tends to be structurally less liquid than the upside, since more people sell into strength when the market gaps up. This is all part of the risk-neutral-measure world we need to add adjustments for. Using something like Hansen’s skew-t can improve the fit to return asymmetry, but it still won’t fully capture the dynamic surface effects that drive listed skew in practice.
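The kurtosis-to-dof mapping and the MLE fit can be sketched as follows. The data are simulated, and the mapping uses the standard Student-t identity (excess kurtosis = 6/(ν − 4) for ν > 4):

```python
import numpy as np
from scipy import stats

def dof_from_excess_kurtosis(ek):
    """Invert excess kurtosis = 6 / (nu - 4) for a Student-t (valid for ek > 0)."""
    return 4.0 + 6.0 / ek

print(dof_from_excess_kurtosis(5.0))  # -> 5.2

# MLE fit of the dof on hypothetical simulated "returns" (true dof = 5.2);
# fixing the location at 0 leaves scipy fitting dof and scale only
rets = stats.t.rvs(df=5.2, size=5000, random_state=np.random.default_rng(3))
nu_hat, loc_hat, scale_hat = stats.t.fit(rets, floc=0.0)
print(nu_hat)  # roughly recovers the true dof
```

Hansen’s skew-t adds an asymmetry parameter on top of this density; it isn’t in scipy to my knowledge, so fitting it means writing out the likelihood yourself.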