jon becker

3.5K posts


@beckerrjon

any sufficiently advanced technology is indistinguishable from magic. senior software engineer @coinbase

New York, NY · Joined December 2019
829 Following · 5.3K Followers
Pinned Tweet
jon becker@beckerrjon·
0/ i analyzed every single trade on kalshi from 2021 to 2025. i found a systematic wealth transfer where "takers" pay a massive premium for affirmative outcomes, and "makers" harvest the edge without needing to predict the future. here is the data
[image]
11 · 6 · 139 · 31.9K
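The "takers pay a premium for affirmative outcomes" claim can be made concrete with a small sketch. This is not jon becker's actual methodology, and the trade-record fields (`side`, `price`, `resolved_yes`) are invented for illustration: compare the average price takers pay for YES against how often YES actually resolves.

```python
# Illustrative sketch (not the paper's methodology): estimate the taker
# premium on YES outcomes from a list of trade records. Field names are
# hypothetical, prices are in probability units (0..1).

def taker_yes_premium(trades):
    """Average taker-paid YES price minus the empirical YES resolution rate.

    A positive value means takers systematically overpay for affirmative
    outcomes; makers on the other side of those fills capture the gap.
    """
    yes_buys = [t for t in trades if t["side"] == "buy_yes"]
    if not yes_buys:
        return 0.0
    avg_price = sum(t["price"] for t in yes_buys) / len(yes_buys)
    resolution_rate = sum(t["resolved_yes"] for t in yes_buys) / len(yes_buys)
    return avg_price - resolution_rate

trades = [
    {"side": "buy_yes", "price": 0.60, "resolved_yes": 1},
    {"side": "buy_yes", "price": 0.55, "resolved_yes": 0},
    {"side": "buy_yes", "price": 0.65, "resolved_yes": 0},
    {"side": "sell_yes", "price": 0.40, "resolved_yes": 1},
]
# Takers paid 0.60 on average for outcomes that resolved YES only 1/3 of
# the time: a ~27pp premium in this toy sample.
print(round(taker_yes_premium(trades), 4))
```

A per-category version of this (finance vs. world events, etc.) is essentially what the maker-taker gap numbers discussed later in the thread measure.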
Archive@ArchiveExplorer·
This guy works at coinbase. in his spare time he:
- built the largest open dataset for polymarket and kalshi: 72,100,000 trades, 7,680,000 markets, open source
- figured out who actually makes money on prediction markets
- wrote a research paper with the math to prove it: 3,100 stars on github just for that
- also built heimdall-rs, a rust toolkit for evm bytecode analysis: 1,500 stars

serious dev. 3,008 contributions in the last year. while everyone's posting takes about polymarket, he's building the infrastructure underneath it → github.com/Jon-Becker

like + bookmark. you'll need this when you build your first polymarket bot
self.dll@seelffff

x.com/i/article/2049…

29 · 91 · 1K · 116.2K
Alter Ego@AlterEgo_eth·
I found the best GitHub dataset for trading on Polymarket

36GB of real trade history from Polymarket and Kalshi - open source and free to use. If you want to build strategies on real data rather than guessing, this is where to start.

Jon-Becker/prediction-market-analysis - the largest public dataset of Polymarket and Kalshi trade history. Researchers are already publishing papers on top of it - and you can use the same data to build your own strategies.

Repository: github.com/Jon-Becker/pre…

What's inside:
> Full trade history from Polymarket and Kalshi
> Market metadata across all markets
> Tools for collecting fresh data via API and blockchain
> Framework for running custom analysis scripts

How to get started:
> make setup - downloads and extracts the dataset
> make index - collects fresh data from the API
> make analyze - runs analytics, results saved to output/

Output includes PNG, PDF, CSV and JSON files - ready to use for strategy building
[3 images]
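For a dump this size, streaming beats loading. A minimal stdlib sketch, assuming a hypothetical CSV export with `market_id` and `size` columns (not the repo's documented schema), that aggregates volume per market without holding 36GB in memory:

```python
# Sketch: stream a (hypothetical) CSV export of the trade dump row by row
# and aggregate volume per market. Column names are assumptions, not the
# dataset's actual schema.
import csv
import io
from collections import defaultdict

def volume_by_market(lines):
    """Sum trade size per market_id from an iterable of CSV lines."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["market_id"]] += float(row["size"])
    return dict(totals)

# Tiny in-memory stand-in for a file handle over the real dump.
sample = io.StringIO(
    "market_id,size\n"
    "fed-cut-june,100\n"
    "fed-cut-june,50\n"
    "spx-above-5000,25\n"
)
print(volume_by_market(sample))  # {'fed-cut-june': 150.0, 'spx-above-5000': 25.0}
```

The same pattern works with `open("trades.csv")` in place of the `StringIO`, so memory use stays flat regardless of file size.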
Alter Ego@AlterEgo_eth

Best GitHub repository for trading on Polymarket right now

TauricResearch/TradingAgents lets you build a complete trading system - from market analysis to the final position decision. It's an entire trading firm made of LLM agents, each playing their own role:
• Fundamentals analyst evaluates data and metrics
• Sentiment analyst monitors market mood
• Technical analyst looks at patterns and indicators
• Bull and bear debate each other before every trade
• Risk manager evaluates the position and gives final approval

A position only opens after all agents have reached a decision.

How to use it on Polymarket:
• Pick an open market: "Will the Fed cut rates in June?"
• Feed the question to the agents instead of researching it yourself
• Fundamentalist pulls macro data, sentiment scans what's being said online
• Bull and bear argue the position from both sides
• Risk manager sizes the position based on the debate
• You get a reasoned decision with full context, not a gut feeling

Supports GPT-5, Claude, Grok, Gemini - you can mix models for different tasks

Repository: github.com/TauricResearch…

25 · 75 · 670 · 129.5K
jon becker@beckerrjon·
@jdietztweets hey! yeah, that could definitely be a factor. financial markets tend to be drier topics with clearer resolution rules, and participants can usually make more informed trades since they’re tied to concrete, quantifiable outcomes (e.g. spx closing above x)
0 · 0 · 0 · 51
Jeremy Dietz@jdietztweets·
@beckerrjon Your Kalshi microstructure research showed Finance markets are nearly efficient (0.17pp maker-taker gap) while World Events hit 7.32pp. You attributed it to participant selection, but could resolution complexity also be a factor? For instance, finance markets have unambiguous triggers (S&P closes above X) vs. geopolitics markets with layered exclusions, edge cases, etc. I'm focused on building enrichment data so that resolution data is better structured.
1 · 0 · 1 · 57
Jeremy Dietz@jdietztweets·
Something I've been working on re: prediction markets...🧵
1 · 0 · 2 · 68
androolloyd.hl@androolloyd·
@Yaugourt You paved the way champ. Also shoutout to the heimdall-rs devs.
6 · 0 · 9 · 590
Yaugourt.hl@Yaugourt·
Update on HIP-4, V2 contract found.

After posting the V1 research, @androolloyd and I kept digging. He reverse-engineered a second contract deployed at genesis by the same team wallet.

V2: 0x6d86b21e853758F5719408633e6BcB2cfd50cf07
Team wallet: 0xe21c78037329d06fe0d6fefc4221aaa67cb0d135

Full bytecode decompiled, Solidity reconstructed. 24/24 function selectors verified against on-chain bytecode.

Important: All live prediction markets currently trading on HyperCore are linked to the V1 contract, NOT V2. V2 has no active markets yet. It exists on-chain at genesis but appears unused so far.

What changed:

Security: V1 had no protection. V2 adds the full OpenZeppelin stack: Ownable2Step (2-phase ownership transfer), ReentrancyGuard on every financial function, Pausable as a circuit breaker.

Claim system reworked: V1 leaf: hash(contestId, sideId, address). V2 leaf: hash(index, recipient, amount). Payout amounts are now inside the Merkle proof, enabling ranked payouts and weighted rewards, not just proportional splits. Bitmap tracking instead of mappings: cheaper gas.

Fee model: V1 had a hardcoded 0.9% and sweepUnclaimed takes everything. In V2 the admin publishes a rewardPool with the Merkle root and withdraws precise amounts via withdrawPlatformFee. Fee can vary per contest.

Deposit now takes a deadline param: protection against stale mempool txs.

Same: same owner, same genesis, HYPE only, renounce disabled, zero interaction with CoreWriter or precompiles.

Also cracked V1's mystery selector 0xb2447e34: it was withdrawPlatformFee all along.

V2 research → liquidterminal.xyz/hip4/home
Credit @androolloyd for the V2 decompilation 🤝 Hyperliquid
Yaugourt.hl@Yaugourt

Yesterday I posted about HIP4 being the first HIP to use HyperEVM. Full research → liquidterminal.xyz/hip4/home

HIP4 has no official documentation. No verified source. No ABI. So we reverse-engineered the contract from bytecode and calldata on testnet.

What we mapped:
→ Full reconstructed ABI (selectors, signatures, access control)
→ Every event (DepositReceived, Claimed, ContestCreated, ContestFinalized, MerkleRootPublished)
→ All revert strings mined from bytecode
→ Storage layout (owner, mappings, initialization flags)
→ Complete contest lifecycle: createContest → deposit → publishMerkleRoot → claim → sweepUnclaimed
→ Bridge architecture L1↔EVM (asset index formula, outcome token mapping)
→ Real decoded testnet transactions
→ JS + Python code examples

Some findings:
- Pre-deployed at genesis, not a standard deployment
- renounceOwnership always reverts: admin is permanent by design
- Merkle-based claims, 0.9% platform fee on reward pool
- Three market types: custom, priceBinary, recurring

liquidterminal.xyz/hip4/home

Testnet only. This is v1, an early test from the team with a raw design, and some things might be off. Nothing is final. If you spot errors or have insights, feedback is very much appreciated. Hyperliquid.

11 · 18 · 142 · 23.6K
WhiteHatMage@WhiteHatMage·
Here are some thoughts after spending many long sessions reading bytecode, decompiled Yul, and decompiled Solidity:

- EVM programs are simple, and so is the generated bytecode. Security by obscurity doesn't really work.
- Current decompilers work quite well. I'd pick Heimdall for Solidity and sevm for Yul.
- Decompilers aren't perfect, though. I also ran into bugs that produced incorrect outputs.
- Reading decompiled code or raw bytecode takes far more effort than high-level source code, and it gets exhausting quickly.
- There are many unnecessary checks and conversions that could be stripped out to make the logic clearer when hunting for business logic bugs.

---

- Most serious projects verify their contracts. Still, I believe checking the deployed bytecode is worth the effort for contracts holding really big bags.
- Any bugs in verified contracts would most likely only come from compiler issues.
- Compilers keep evolving, and newer versions may fix previously unknown bugs. However, any vulnerable bytecode that's already deployed on the blockchain stays exactly the same.
- For older contracts, I'd cross-check their deployed bytecode against the verified source code.

---

- There are still plenty of unverified contracts out there.
- Some publish their code on GitHub. Others choose not to, like certain CEX-related contracts.
- The rest tend to be on small side-chains or from smaller projects. Most of them don't offer any bug bounties.

---

- Detecting flawed access control is trivial once you decompile the bytecode.
- I believe you could build a robust static analyzer on top of the decompiled code without much effort, or even an AI-powered one.
- There are no strong incentives for good actors to do so, though. Projects with bounties mostly have verified code. Only blackhats would be motivated to build such tools.
- Building something like this could be a good candidate for a grant to secure a chain, although operating it might be complicated.

---

- Vyper produces much cleaner bytecode than Solidity.

---

Overall, I learned some tricks even though it wasn't the first time I've analyzed decompiled code, and I gained a deeper understanding of where certain specific bugs might appear. I'd recommend it to everyone interested in understanding EVM programs better. I'd also advise developers working on projects with millions at stake to do a manual review of their old deployed codebases. There's always more than meets the eye when checking the actual bytecode.
WhiteHatMage@WhiteHatMage

I'll take a week to perform an interesting and probably stupid experiment: Hunting for live EVM bugs by checking the deployed bytecode. I'm allowing myself to cheat a little bit by checking the verified code to quickly understand what's going on. I'll also use a Yul decompiler for complex contracts and try a disassembler for simpler ones. There are critical contracts out there holding really big bags that are worth the effort. My main goal though is just to understand what's going on under the hood, and maybe get some inspiration for any potential unknown vectors. Also for understanding what's needed to get a clean input for any automated tools to perform further analysis. I don't expect to find any bugs honestly. It will be painful, but fun at the same time. I just love having the freedom to navigate any crazy paths I choose 🧙‍♂️

7 · 5 · 96 · 9.9K
Martin@martkiro·
I just published a data dump of full order book data from @Polymarket

The data is maximally granular. There is no filtering whatsoever. Every order book change and trade is saved, across all markets.

Updates are hourly. Each snapshot contains ~30M rows. Snapshots are downloaded as parquet files; each file is approx. 500MB-1GB.

The data dump is already 2B+ rows and growing fast. But this is just part 1/3. Coming soon is a much bigger dump that also includes @Kalshi / @opinionlabsxyz / @trylimitless etc.

I started collecting this data because I noticed I couldn't get it from Dome API. Their historical order book data was filtered, limiting its usefulness. Also, now with the acquisition, there's a lot of uncertainty about whether they will continue operating.
[image]
110 · 104 · 1.3K · 163.8K
jon becker@beckerrjon·
added polymarket data to the public dataset. 400m+ trades going back to 2020. 36gb compressed. MIT licensed, free to download via @Cloudflare R2.
[image]
129 · 242 · 4K · 759.8K
Alex@adf_energy_twt·
@beckerrjon @Cloudflare yeah so I see you use polygon-rpc but that's got a fairly strict rate limit too. Did you use a dedicated RPC provider?
1 · 1 · 1 · 275
i love models@_ilovemodels·
@beckerrjon @Cloudflare Cooking smthng so that anyone can query and analyze the data in natural language. Will open source it tmrw!
[image]
2 · 0 · 9 · 696
jon becker@beckerrjon·
@aiden0x4 @Cloudflare gemini is claiming 39 cents for the month but that surely can’t be right. im well under free tier limits right now according to the dash
[image]
1 · 0 · 1 · 319
aiden@aiden0x4·
@beckerrjon @Cloudflare 🙏 lmk! i've been thinking of open sourcing large datasets of labels but didn't find a good (economical) way
1 · 0 · 1 · 302
jon becker@beckerrjon·
@aiden0x4 @Cloudflare we’re gonna find out when the r2 bill hits. napkin math says not much (i’m praying)
1 · 0 · 3 · 1.6K
jon becker@beckerrjon·
@jgwtt too large for LFS, had to host in r2
0 · 0 · 0 · 2.4K
i love models@_ilovemodels·
@beckerrjon @Cloudflare For anyone trying to download the dataset, make sure u have aria2c installed or it's gonna take forever.
1 · 2 · 36 · 5.4K
johndoe@crymore_johndoe·
@beckerrjon Amazing share thank you!!! Does the data have order book feed so one could construct order book, trade ticks etc?
1 · 0 · 1 · 251
The Workshop@ForgeOfAgents·
@beckerrjon So, no full orderbooks for Polymarket? That's an area for growth
1 · 0 · 2 · 2.3K