jon becker

3.5K posts


@beckerrjon

any sufficiently advanced technology is indistinguishable from magic. senior software engineer @coinbase

New York, NY · Joined December 2019
829 Following · 5.3K Followers
Pinned Tweet
jon becker@beckerrjon·
0/ i analyzed every single trade on kalshi from 2021 to 2025. i found a systematic wealth transfer where "takers" pay a massive premium for affirmative outcomes, and "makers" harvest the edge without needing to predict the future. here is the data
[image]
11 · 6 · 139 · 31.9K
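The "takers pay a premium for affirmative outcomes" claim can be made concrete with a small sketch. This is not jon becker's actual methodology, and the trade-record fields (`side`, `price`, `resolved_yes`) are invented for illustration: compare the average price takers pay for YES against how often YES actually resolves.

```python
# Illustrative sketch (not the paper's methodology): estimate the taker
# premium on YES outcomes from a list of trade records. Field names are
# hypothetical, prices are in probability units (0..1).

def taker_yes_premium(trades):
    """Average taker-paid YES price minus the empirical YES resolution rate.

    A positive value means takers systematically overpay for affirmative
    outcomes; makers on the other side of those fills capture the gap.
    """
    yes_buys = [t for t in trades if t["side"] == "buy_yes"]
    if not yes_buys:
        return 0.0
    avg_price = sum(t["price"] for t in yes_buys) / len(yes_buys)
    resolution_rate = sum(t["resolved_yes"] for t in yes_buys) / len(yes_buys)
    return avg_price - resolution_rate

trades = [
    {"side": "buy_yes", "price": 0.60, "resolved_yes": 1},
    {"side": "buy_yes", "price": 0.55, "resolved_yes": 0},
    {"side": "buy_yes", "price": 0.65, "resolved_yes": 0},
    {"side": "sell_yes", "price": 0.40, "resolved_yes": 1},
]
# Takers paid 0.60 on average for outcomes that resolved YES only 1/3 of
# the time: a ~27pp premium in this toy sample.
print(round(taker_yes_premium(trades), 4))
```

A per-category version of this (finance vs. world events, etc.) is essentially what the maker-taker gap numbers discussed later in the thread measure.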
Archive@ArchiveExplorer·
This guy works at coinbase. in his spare time he:
- built the largest open dataset for polymarket and kalshi: 72,100,000 trades, 7,680,000 markets, open source
- figured out who actually makes money on prediction markets
- wrote a research paper with the math to prove it: 3,100 stars on github just for that
- also built heimdall-rs, a rust toolkit for evm bytecode analysis: 1,500 stars

serious dev. 3,008 contributions in the last year. while everyone's posting takes about polymarket, he's building the infrastructure underneath it → github.com/Jon-Becker

like + bookmark. you'll need this when you build your first polymarket bot
self.dll@seelffff

x.com/i/article/2049…

29 · 91 · 1K · 116.2K
Alter Ego@AlterEgo_eth·
I found the best GitHub dataset for trading on Polymarket

36GB of real trade history from Polymarket and Kalshi - open source and free to use. If you want to build strategies on real data rather than guessing, this is where to start.

Jon-Becker/prediction-market-analysis - the largest public dataset of Polymarket and Kalshi trade history. Researchers are already publishing papers on top of it - and you can use the same data to build your own strategies.

Repository: github.com/Jon-Becker/pre…

What's inside:
> Full trade history from Polymarket and Kalshi
> Market metadata across all markets
> Tools for collecting fresh data via API and blockchain
> Framework for running custom analysis scripts

How to get started:
> make setup - downloads and extracts the dataset
> make index - collects fresh data from the API
> make analyze - runs analytics, results saved to output/

Output includes PNG, PDF, CSV and JSON files - ready to use for strategy building
[3 images]
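For a dump this size, streaming beats loading. A minimal stdlib sketch, assuming a hypothetical CSV export with `market_id` and `size` columns (not the repo's documented schema), that aggregates volume per market without holding 36GB in memory:

```python
# Sketch: stream a (hypothetical) CSV export of the trade dump row by row
# and aggregate volume per market. Column names are assumptions, not the
# dataset's actual schema.
import csv
import io
from collections import defaultdict

def volume_by_market(lines):
    """Sum trade size per market_id from an iterable of CSV lines."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["market_id"]] += float(row["size"])
    return dict(totals)

# Tiny in-memory stand-in for a file handle over the real dump.
sample = io.StringIO(
    "market_id,size\n"
    "fed-cut-june,100\n"
    "fed-cut-june,50\n"
    "spx-above-5000,25\n"
)
print(volume_by_market(sample))  # {'fed-cut-june': 150.0, 'spx-above-5000': 25.0}
```

The same pattern works with `open("trades.csv")` in place of the `StringIO`, so memory use stays flat regardless of file size.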
Alter Ego@AlterEgo_eth

Best GitHub repository for trading on Polymarket right now

TauricResearch/TradingAgents lets you build a complete trading system - from market analysis to the final position decision. It's an entire trading firm made of LLM agents, each playing their own role:
• Fundamentals analyst evaluates data and metrics
• Sentiment analyst monitors market mood
• Technical analyst looks at patterns and indicators
• Bull and bear debate each other before every trade
• Risk manager evaluates the position and gives final approval

A position only opens after all agents have reached a decision.

How to use it on Polymarket:
• Pick an open market: "Will the Fed cut rates in June?"
• Feed the question to the agents instead of researching it yourself
• Fundamentalist pulls macro data, sentiment scans what's being said online
• Bull and bear argue the position from both sides
• Risk manager sizes the position based on the debate
• You get a reasoned decision with full context, not a gut feeling

Supports GPT-5, Claude, Grok, Gemini - you can mix models for different tasks

Repository: github.com/TauricResearch…

25 · 75 · 670 · 129.5K
jon becker@beckerrjon·
@jdietztweets hey! yeah, that could definitely be a factor. financial markets tend to be drier topics with clearer resolution rules, and participants can usually make more informed trades since they’re tied to concrete, quantifiable outcomes (e.g. spx closing above x)
0 · 0 · 0 · 51
Jeremy Dietz@jdietztweets·
@beckerrjon Your Kalshi microstructure research showed Finance markets are nearly efficient (0.17pp maker-taker gap) while World Events hit 7.32pp. You attributed it to participant selection, but could resolution complexity also be a factor? For instance, finance markets have unambiguous triggers (S&P closes above X) vs. geopolitics markets with layered exclusions, edge cases, etc. I'm focused on building enrichment data so that resolution data is better structured.
1 · 0 · 1 · 57
Jeremy Dietz@jdietztweets·
Something I've been working on re: prediction markets...🧵
1 · 0 · 2 · 68
androolloyd.hl@androolloyd·
@Yaugourt You paved the way champ. Also shoutout to the heimdall-rs devs.
6 · 0 · 9 · 590
Yaugourt.hl@Yaugourt·
Update on HIP-4, V2 contract found.

After posting the V1 research, @androolloyd and I kept digging. He reverse-engineered a second contract deployed at genesis by the same team wallet.

V2: 0x6d86b21e853758F5719408633e6BcB2cfd50cf07
Team wallet: 0xe21c78037329d06fe0d6fefc4221aaa67cb0d135

Full bytecode decompiled, Solidity reconstructed. 24/24 function selectors verified against on-chain bytecode.

Important: All live prediction markets currently trading on HyperCore are linked to the V1 contract, NOT V2. V2 has no active markets yet. It exists on-chain at genesis but appears unused so far.

What changed:

Security: V1 had no protection. V2 adds the full OpenZeppelin stack: Ownable2Step (2-phase ownership transfer), ReentrancyGuard on every financial function, Pausable as a circuit breaker.

Claim system reworked: V1 leaf: hash(contestId, sideId, address). V2 leaf: hash(index, recipient, amount). Payout amounts are now inside the Merkle proof, enabling ranked payouts and weighted rewards, not just proportional splits. Bitmap tracking instead of mappings: cheaper gas.

Fee model: V1 had a hardcoded 0.9% and sweepUnclaimed takes everything. In V2 the admin publishes a rewardPool with the Merkle root and withdraws precise amounts via withdrawPlatformFee. Fee can vary per contest.

Deposit now takes a deadline param: protection against stale mempool txs.

Same: same owner, same genesis, HYPE only, renounce disabled, zero interaction with CoreWriter or precompiles.

Also cracked V1's mystery selector 0xb2447e34: it was withdrawPlatformFee all along.

V2 research → liquidterminal.xyz/hip4/home
Credit @androolloyd for the V2 decompilation 🤝 Hyperliquid
Yaugourt.hl@Yaugourt

Yesterday I posted about HIP4 being the first HIP to use HyperEVM. Full research → liquidterminal.xyz/hip4/home

HIP4 has no official documentation. No verified source. No ABI. So we reverse-engineered the contract from bytecode and calldata on testnet.

What we mapped:
→ Full reconstructed ABI (selectors, signatures, access control)
→ Every event (DepositReceived, Claimed, ContestCreated, ContestFinalized, MerkleRootPublished)
→ All revert strings mined from bytecode
→ Storage layout (owner, mappings, initialization flags)
→ Complete contest lifecycle: createContest → deposit → publishMerkleRoot → claim → sweepUnclaimed
→ Bridge architecture L1↔EVM (asset index formula, outcome token mapping)
→ Real decoded testnet transactions
→ JS + Python code examples

Some findings:
- Pre-deployed at genesis, not a standard deployment
- renounceOwnership always reverts: admin is permanent by design
- Merkle-based claims, 0.9% platform fee on reward pool
- Three market types: custom, priceBinary, recurring

liquidterminal.xyz/hip4/home

Testnet only. This is v1, an early test from the team with a raw design, and some things might be off. Nothing is final. If you spot errors or have insights, feedback is very much appreciated. Hyperliquid.

11 · 18 · 142 · 23.6K
WhiteHatMage@WhiteHatMage·
Here are some thoughts after spending many long sessions reading bytecode, decompiled Yul, and decompiled Solidity:

- EVM programs are simple, and so is the generated bytecode. Security by obscurity doesn't really work.
- Current decompilers work quite well. I'd pick Heimdall for Solidity and sevm for Yul.
- Decompilers aren't perfect, though. I also ran into bugs that produced incorrect outputs.
- Reading decompiled code or raw bytecode takes far more effort than high-level source code, and it gets exhausting quickly.
- There are many unnecessary checks and conversions that could be stripped out to make the logic clearer when hunting for business logic bugs.

---

- Most serious projects verify their contracts. Still, I believe checking the deployed bytecode is worth the effort for contracts holding really big bags.
- Any bugs in verified contracts would most likely only come from compiler issues.
- Compilers keep evolving, and newer versions may fix previously unknown bugs. However, any vulnerable bytecode that's already deployed on the blockchain stays exactly the same.
- For older contracts, I'd cross-check their deployed bytecode against the verified source code.

---

- There are still plenty of unverified contracts out there.
- Some publish their code on GitHub. Others choose not to, like certain CEX-related contracts.
- The rest tend to be on small side-chains or from smaller projects. Most of them don't offer any bug bounties.

---

- Detecting flawed access control is trivial once you decompile the bytecode.
- I believe you could build a robust static analyzer on top of the decompiled code without much effort, or even an AI-powered one.
- There are no strong incentives for good actors to do so, though. Projects with bounties mostly have verified code. Only blackhats would be motivated to build such tools.
- Building something like this could be a good candidate for a grant to secure a chain, although operating it might be complicated.

---

- Vyper produces much cleaner bytecode than Solidity.

---

Overall, I learned some tricks even though it wasn't the first time I've analyzed decompiled code, and I gained a deeper understanding of where certain specific bugs might appear. I'd recommend it to everyone interested in understanding EVM programs better. I'd also advise developers working on projects with millions at stake to do a manual review of their old deployed codebases. There's always more than meets the eye when checking the actual bytecode.
WhiteHatMage@WhiteHatMage

I'll take a week to perform an interesting and probably stupid experiment: Hunting for live EVM bugs by checking the deployed bytecode. I'm allowing myself to cheat a little bit by checking the verified code to quickly understand what's going on. I'll also use a Yul decompiler for complex contracts and try a disassembler for simpler ones. There are critical contracts out there holding really big bags that are worth the effort. My main goal though is just to understand what's going on under the hood, and maybe get some inspiration for any potential unknown vectors. Also for understanding what's needed to get a clean input for any automated tools to perform further analysis. I don't expect to find any bugs honestly. It will be painful, but fun at the same time. I just love having the freedom to navigate any crazy paths I choose 🧙‍♂️

7 · 5 · 96 · 9.9K
Martin@martkiro·
I just published a data dump of full order book data from @Polymarket

The data is maximally granular. There is no filtering whatsoever. Every order book change and trade is saved, across all markets.

Updates are hourly. Each snapshot contains ~30M rows. Snapshots are downloaded as parquet files; each file is approx. 500MB-1GB.

The data dump is already 2B+ rows and growing fast. But this is just part 1/3. Coming soon is a much bigger dump that also includes @Kalshi / @opinionlabsxyz / @trylimitless etc.

I started collecting this data because I noticed I couldn't get it from Dome API. Their historical order book data was filtered, limiting its usefulness. Also, now with the acquisition, there's a lot of uncertainty about whether they will continue operating.
[image]
110 · 104 · 1.3K · 163.8K
jon becker@beckerrjon·
added polymarket data to the public dataset. 400m+ trades going back to 2020. 36gb compressed. MIT licensed, free to download via @Cloudflare R2.
[image]
129 · 242 · 4K · 759.8K
Alex@adf_energy_twt·
@beckerrjon @Cloudflare yeah so I see you use polygon-rpc but that's got a fairly strict rate limit too. Did you use a dedicated RPC provider?
1 · 1 · 1 · 275
i love models@_ilovemodels·
@beckerrjon @Cloudflare Cooking smthng so that anyone can query and analyze the data in natural language. Will open source it tmrw!
[image]
2 · 0 · 9 · 696
jon becker@beckerrjon·
@aiden0x4 @Cloudflare gemini is claiming 39 cents for the month but that surely can’t be right. im well under free tier limits right now according to the dash
[image]
1 · 0 · 1 · 319
aiden@aiden0x4·
@beckerrjon @Cloudflare 🙏 lmk! i've been thinking of open sourcing large datasets of labels but didn't find a good (economical) way
1 · 0 · 1 · 302
jon becker@beckerrjon·
@aiden0x4 @Cloudflare we’re gonna find out when the r2 bill hits. napkin math says not much (i’m praying)
1 · 0 · 3 · 1.6K
jon becker@beckerrjon·
@jgwtt too large for LFS, had to host in r2
0 · 0 · 0 · 2.4K
i love models@_ilovemodels·
@beckerrjon @Cloudflare For anyone trying to download the dataset, make sure u have aria2c installed or it's gonna take forever.
1 · 2 · 36 · 5.4K
johndoe@crymore_johndoe·
@beckerrjon Amazing share thank you!!! Does the data have order book feed so one could construct order book, trade ticks etc?
1 · 0 · 1 · 251
The Workshop@ForgeOfAgents·
@beckerrjon So, no full orderbooks for Polymarket? That's an area for growth
1 · 0 · 2 · 2.3K