Ivan Parfenchuk

2.7K posts

@parfenchuk

Me, on the internet. Check out my HOMM3 LLM leaderboard: https://t.co/0opASBTxKJ

Joined January 2010

1K Following · 195 Followers

Ivan Parfenchuk @parfenchuk · 2d

@OnlyGerier Check out my humble homm3 LLM arena leaderboard: homm3arena.com

Gérier | Only Indies @OnlyGerier · 3d

I played the new Heroes of Might and Magic Olden Era and it was like going back to my childhood. Pure nostalgia. Do you remember this fantasy saga? It's the new installment in the franchise that recovers the lost essence of the '90s, more specifically of Heroes III, the one most praised by players. It brings back tactical combat on a hexagonal grid, map exploration with fog of war, and traditional resource management. It drops the confusing mechanics of the latest installments (Heroes VI and VII) to focus on what worked: building your castle, recruiting troops, and leveling up your hero. It's a return to the roots because it prioritizes tactical gameplay and the visual charm of the 1999 title.

Ivan Parfenchuk @parfenchuk · 6d

without randommaxxing it talks about octopuses, but with it, goblins take the scene

Ivan Parfenchuk @parfenchuk · 6d

second day in a row gpt-5.5 IQmogging opus-4.7

yesterday it was proposing very nice and compact changes based on PR reviews, better than what I saw recently with opus-4.7

today gpt-5.5 xhigh re-implemented a PR from scratch, and comparing it to the previous opus-4.7 xhigh implementation, the gpt-5.5 output is just better

Ivan Parfenchuk @parfenchuk · Apr 28

@mishkaggwp LFG!!!

Mishka @mishkaggwp · Apr 28

That sound after a long winter.

Ivan Parfenchuk @parfenchuk · Apr 27

You know who doesn’t read manuals, READMEs, and instructions? Renovation contractors.

Just dumped ~70% of apartment floor tiling

Ivan Parfenchuk @parfenchuk · Apr 23

@i_Kisliy safe travels! on a positive note, when you land, X will be full of vibe-bench results 😉

Iliya Kisliy @i_Kisliy · Apr 23

Yeah, just as I'm omw to the airport for my vacation. Thx guys

OpenAI @OpenAI

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

Ivan Parfenchuk @parfenchuk · Apr 23

Let’s goooo

OpenAI @OpenAI

Ivan Parfenchuk @parfenchuk · Apr 21

I need to know, is it image gen-2?

Ivan Parfenchuk @parfenchuk · Apr 20

Claude Design: "help me improve design of my homm3arena.com webpage, make text more legible, use homm3-inspired style, but not too heavily"

before / after

Ivan Parfenchuk @parfenchuk · Apr 19

I am sorry, people, for I have sinned: I used an LLM to write a LinkedIn post today.

But I would never have posted it otherwise, and I edited it before posting.

So there is that.

Ivan Parfenchuk @parfenchuk · Apr 19

Built a public leaderboard where frontier LLMs play Heroes of Might and Magic III against each other.

homm3arena.com

I didn't write a single line of code. Codex did all of it, including patches to the VCMI C++ engine (an OSS reimplementation of HoMM III that I've never opened).

How it works:
- battles run on VCMI via vcmi-gym. Each model gets a provider adapter (OpenRouter, OpenAI) and outputs legal moves against the real game state
- whitelisted seeds, chosen to be balanced enough that the match is fair
- one sample = a mirrored pair (same seed, sides reversed). Bradley-Terry ranking with bootstrap 95% CIs over mirrors
- bad batches (too many fallbacks or provider errors) don't count

Current top 3:
1) GPT-5.2 (medium)
2) Claude Sonnet 4.6
3) GPT-5.4-mini

HoMM III was the game of my childhood. I wouldn't have built this by hand; the VCMI integration alone would've eaten weeks I don't have between a day job and two small kids.

homm3arena.com

Ivan Parfenchuk @parfenchuk

Pushed latest standings to homm3arena.com
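The ranking step described in the post above (Bradley-Terry over mirrored pairs, bootstrap 95% CIs) could look roughly like the following minimal Python sketch. It is an illustration under assumptions, not the leaderboard's actual code: it fits Bradley-Terry log-strengths with the standard MM (minorize-maximize) updates, adds a small pseudo-count prior (an assumption) so a winless model still gets a finite strength, and computes percentile CIs by resampling whole mirrored pairs. The function names, the `(winner, loser)` data layout, and the toy results are all hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_bradley_terry(games, models, iters=200, prior=0.5):
    """Fit Bradley-Terry log-strengths with the classic MM updates.

    games:  list of (winner, loser) tuples.
    models: list of model names to rank.
    prior:  hypothetical pseudo-wins given to each model against every
            other model, so zero-win models stay finite; must be > 0.
    """
    wins = defaultdict(float)  # wins[i]: real + pseudo wins of model i
    n = defaultdict(float)     # n[(i, j)]: games between i and j (symmetric)
    for w, l in games:
        wins[w] += 1
        n[(w, l)] += 1
        n[(l, w)] += 1
    for i in models:
        for j in models:
            if i != j:
                wins[i] += prior        # i gets `prior` pseudo-wins over j
                n[(i, j)] += 2 * prior  # those plus j's pseudo-wins over i

    p = {m: 1.0 for m in models}  # strengths, start uniform
    for _ in range(iters):
        new_p = {}
        for i in models:
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(n[(i, j)] / (p[i] + p[j]) for j in models if j != i)
            new_p[i] = wins[i] / denom
        # Strengths are only defined up to scale: fix the geometric mean at 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return {m: math.log(v) for m, v in p.items()}


def bootstrap_ci(pairs, models, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs that resample whole mirrored pairs.

    pairs: list of mirrored pairs; each pair is a list of two
           (winner, loser) games on the same seed with sides reversed.
    Returns {model: (low, high)} on the log-strength scale.
    """
    rng = random.Random(seed)
    draws = defaultdict(list)
    for _ in range(n_boot):
        # Resample pairs, not single games, so mirrored games stay together.
        sample = [rng.choice(pairs) for _ in range(len(pairs))]
        games = [g for pair in sample for g in pair]
        for m, s in fit_bradley_terry(games, models).items():
            draws[m].append(s)
    ci = {}
    for m in models:
        vals = sorted(draws[m])
        low = vals[int(n_boot * alpha / 2)]
        high = vals[int(n_boot * (1 - alpha / 2)) - 1]
        ci[m] = (low, high)
    return ci


# Hypothetical toy data, not real homm3arena.com results.
MODELS = ["model-a", "model-b", "model-c"]
PAIRS = [
    [("model-a", "model-b"), ("model-a", "model-b")],  # a wins both sides
    [("model-a", "model-c"), ("model-c", "model-a")],  # split
    [("model-b", "model-c"), ("model-b", "model-c")],
    [("model-a", "model-b"), ("model-b", "model-a")],
]

strengths = fit_bradley_terry([g for pair in PAIRS for g in pair], MODELS)
intervals = bootstrap_ci(PAIRS, MODELS, n_boot=200)
for m in sorted(MODELS, key=strengths.get, reverse=True):
    lo, hi = intervals[m]
    print(f"{m}: {strengths[m]:+.2f}  95% CI [{lo:+.2f}, {hi:+.2f}]")
```

Resampling at the pair level rather than the game level is presumably what "CIs over mirrors" refers to: the two games on one seed share the same map and armies, so treating them as independent draws would make the intervals look tighter than they are.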

Ivan Parfenchuk @parfenchuk · Apr 18

Me prepping chili con carne

Ivan Parfenchuk @parfenchuk · Apr 15

plz release gpt-5.5 and opus 4.7 today, so I can decide whether to buy one or the other for next month

Ivan Parfenchuk @parfenchuk · Apr 15

@charliebcurran I want a longer version of it

Charles Curran @charliebcurran · Apr 14

AI discourse in 2026.

Ivan Parfenchuk @parfenchuk · Apr 14

Pushed latest standings to homm3arena.com

Ivan Parfenchuk @parfenchuk

Now updated with gpt-5.4-mini and gpt-5.4-nano

Ivan Parfenchuk @parfenchuk · Apr 10

@thsottiaux @ai_for_success PLAN.md

Tibo @thsottiaux · Apr 9

@ai_for_success that was the small plan, big plan is still coming

AshutoshShrivastava @ai_for_success · Apr 9

Vague posting, big plans. Then they drop a $100 subscription plan. Classic. 😂

Ivan Parfenchuk @parfenchuk · Apr 8

Setting up the harness based on the "harness engineering" blog post by OpenAI right now.

What I'm struggling to fix is that codex (gpt-5.4 xhigh) likes to write unnecessary React component props. Things like props set to their default values, or not merging props together (paddingX="1" + paddingY="1" instead of just a single padding="1").

So far, fixing it with prompts (AGENTS.md / golden-principles.md / etc.) was not very successful.

How do you guys fix this? A "Stop" hook with an anti-slop pass? `claude -p "/simplify"` post-processing?

Ivan Parfenchuk @parfenchuk · Apr 6

Seeing a net positive improvement from following OpenAI's "harness engineering" blog post:
- document your codebase best practices under ./docs/ai/*.md. I used golden-principles.md, testing-rubric.md, and ui-patterns.md. Link to these docs in the main AGENTS.md/CLAUDE.md file
- I'm doing a big code migration right now and documenting everything under ./docs/ai/migrations//*, including strategy.md, phase-*.md, and log.md
- after each PR, I ask codex to read through all the docs, update everything that changed, and integrate everything we learned during the sessions. Surprisingly, things changed quite often; documentation drift is real
- also maintaining a log.md of the corrections, so hopefully next time codex follows the right pattern from the start. This one is difficult to write in a way that isn't harmful for future PRs; often it's too specific to the work already done
- another thing is chrome/playwright MCP browser verification: codex is quite happy to use it to verify it did the right thing. I think it helped reduce back-and-forth too

I've sent ~4 PRs so far, and now I feel it's starting to pay off: fewer incorrect assumptions in the code.
