Jan Tomášek

1.1K posts

@majnos64

Joined February 2012

991 Following · 47 Followers

@uslechtila @Chmee2 Well, the systematic bias is about 1-2 °C, so compared with earlier measurements it comes to 31.7-32.7 °C. In other words, the value is an extreme either way

Ušlechtilá Plíseň@uslechtila·14h

@Chmee2 Can we trust that 30° measured 200 years ago is the same 30° as today?

Dr. Petr Brož@Chmee2·16h

The weather station in Oxford has been measuring continuously for 211 years. And in the entire existence of this station, the temperature has never exceeded 31 °C in May. Until now, when it looks like a "nice" 33.7 °C. A jump in the maximum temperature of more than 3 degrees. Quite a leap...

MetJam@MetJam_

Oxford, the longest running continuous weather station in UK history, with temperature observations stretching back to 1815, has preliminarily broken its maximum temperature record for May yesterday by OVER 3ºC with a temperature of 33.7ºC. Unprecedented in its 211-year history.

Jan Tomášek@majnos64·15h

@systemdesignone I would not let AI do anything without supervision on a real project.

Neo Kim@systemdesignone·21h

AI ENGINEERS ONLY Which part of the software development lifecycle would you never trust an AI agent to handle alone?

Jan Tomášek@majnos64·1d

@rohanpaul_ai Sure, but we lack the metrics to verify the claim. I seriously doubt the CEO can tell.

Rohan Paul@rohanpaul_ai·1d

Uber CEO Dara Khosrowshahi said earlier that currently, 90% of Uber’s engineers use AI, but the top 30% (power users) are seeing unprecedented productivity gains. These power-users of AI are pushing the maximum number of "diffs" to the codebase. He predicts in 5 Years the ROI of a human engineer is surpassed by the ROI of adding more AI agents and GPU power. So at that time he will just hire more AI agents and pay for NVIDIA GPUs instead of human software engineers. --- From 'The Diary Of A CEO' YT Channel (link in comment)

Jan Tomášek@majnos64·3d

@badlogicgames Sure, you can specify behaviour, but not the coding rules or architecture decisions... I gave up letting the LM decide this. Better to use it for search and tiny tasks with a human in the loop. Even xhigh thinking is worse than human-guided low effort.

Mario Zechner@badlogicgames·3d

recommended reading. (haven't read it yet) arxiv.org/abs/2605.06445

Jan Tomášek@majnos64·3d

@TheFitnessJurk @colejaczko Sure, do not go to bed drunk

TJ Jurkiewicz | Fat Loss Coach@TheFitnessJurk·3d

@colejaczko i implore everyone to check their sleep scores after they've executed a good chunk of their day already. Many days I had a near perfect sleep score I was dragging all day and vice versa. had some amazingly energetic and productive days where my sleep score was 60 something

Cole Jaczko@colejaczko·3d

Ever notice how the people with the best lives have infinite energy? They do it all & don’t miss a beat. Late night sends. Early morning workouts. Packed itineraries. 100s of best friends. Crushing all categories of life. It’s cause their life fuels them. They created a life that gives them energy instead of takes it Which says something about the sleep score types

Mikli@CryptoMikli

Steven Bartlett says a few glasses of wine ruined the next 3 days of his life “It's one of those areas where you don't understand the hidden cost until you really give it up for a while. I stopped drinking at 30 years old. I'm now 33. When I was 31, I thought, I'll have a drink again because now I could really A/B test it. I had a year of not drinking, decided to have a drink again” “It ruined three days of my life. I had a couple of glasses of wine, didn't get drunk. It ruined three days of my life because of the domino effect it caused” “I got worse sleep that night, and then because I got worse sleep that night, I ate more poorly the next day because my dopamine system or whatever, the cortisol system was all messed up. I podcasted worse. I didn't go to the gym that day or the day after because I felt really bad. I then slept worse, and I could track all of this on my Whoop”

Jan Tomášek@majnos64·4d

@kebabscibuli Going barefoot on concrete does not have much to do with health

luca🖇️@kebabscibuli·5d

I would SO love to wear barefoot shoes and do something for my health, but those shoes are SO ugly that I really can't stand looking at them

Jan Tomášek@majnos64·4d

@badlogicgames Thinking works in planning, not in execution, imho. The best executor is a non-thinking model. I am now mostly using low thinking effort. High only for a plan

Mario Zechner@badlogicgames·5d

me: do it
gpt: totally did it
me: dude
gpt: totally did it now
me: wtf
gpt: i so did it, you won't believe how hard i did it

gpt 5.5, thinking off.

Jan Tomášek@majnos64·6d

@HedgieMarkets Because Claude Code is a super inefficient agent. There are better options

Hedgie@HedgieMarkets·6d

🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products.

My Take

The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested.

This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months.

Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown.

Hedgie🤗

Jan Tomášek@majnos64·6d

@morganlinton Even Flash 3.0 is okay-ish as a coding executor, with Pro for planning. It is not the best combo, though.

Morgan@morganlinton·6d

Now I'm getting confused. Does Gemini 3.5 Flash suck at coding, or is it amazing? 😵‍💫

Logan Kilpatrick@OfficialLoganK

Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.

Jan Tomášek@majnos64·20 May

@asmah2107 Depends on what you expect. If you use it as a fancy autocomplete, you save a ton of time. If you want it to do your job, it will not save much, since it stops working at around 5k LOC

Ashutosh Maheshwari@asmah2107·19 May

Hot take: AI code generation doesn't actually save you that much time. If you have to painstakingly review and debug every line of AI-generated code, you're just trading writing time for reading time. The real holy grail? Verification. When AI can mathematically prove its code is 100% correct, you can confidently deploy it without ever looking at the source file.

Jan Tomášek@majnos64·20 May

@iamsahaj_xyz @mattpocockuk I think splitting tasks into smaller chunks works better. And for more complex stuff you just do wispr.

Jan Tomášek@majnos64·18 May

@DeRonin_ This is a bit of BS, no? The harness should cover this. Maybe previously it did not.

Ronin@DeRonin_·18 May

Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model."

41% mistake rate without a CLAUDE.md.
11% with the 4-rule baseline.
3% with the 12-rule version below

here are the 12 rules senior engineers settled on:

1. think before coding: state assumptions, don't guess. the model can't read your mind, stop hoping it will
2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add "for future flexibility," you've added 200 lines you'll delete next quarter
3. surgical changes: touch only what you must. don't let it improve adjacent code, that's how PRs blow up
4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early
5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers
6. token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5
7. surface conflicts, don't average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice
8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read
9. tests verify intent, not just behavior: a test that can't fail when business logic changes is wrong. all 12 of Claude's tests can pass while the function returns a constant
10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour
11. match the codebase conventions: class components? don't fork to hooks silently. testing patterns assumed componentDidMount, hooks broke them without surfacing
12. fail loud: "completed successfully" with 14% of records silently skipped is the worst class of bug. surface uncertainty, don't hide it

what actually compounds instead of the next framework:
- the CLAUDE.md file as institutional memory across sessions
- eval-driven changes, not vibe-driven
- checkpoints over speed
- explicit conflicts over silent blending
- discipline over framework, every time
- one repo, one rules file, no exceptions

be a few rules ahead of AI twitter before this becomes mass-opinion

study this
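
Rule 6 in the post ("token budgets are not advisory: per-task 4000, per-session 30000") could be enforced in code rather than left to a rules file. The following is only a minimal sketch under that assumption: the class and method names (`TokenBudget`, `BudgetExceeded`, `charge`, `start_task`) are hypothetical, and how tokens get counted is left to the caller.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push usage past a hard cap."""


@dataclass
class TokenBudget:
    # The two caps come from rule 6 in the post; all names here are hypothetical.
    per_task: int = 4000
    per_session: int = 30000
    task_used: int = 0
    session_used: int = 0

    def start_task(self) -> None:
        """Reset the per-task counter; session usage keeps accumulating."""
        self.task_used = 0

    def charge(self, tokens: int) -> None:
        """Record usage, refusing the call if either cap would be exceeded."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.task_used + tokens > self.per_task:
            raise BudgetExceeded("per-task budget exceeded")
        if self.session_used + tokens > self.per_session:
            raise BudgetExceeded("per-session budget exceeded")
        self.task_used += tokens
        self.session_used += tokens
```

Calling `charge` before each model request turns the budget into a hard stop: the request is refused up front instead of being noticed on the bill afterwards.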

Ronin@DeRonin_

anybody who uses or learns agentic systems, SHOULD READ THIS

the install order I run before any new agentic project:

1. PRIVACY: direnv + a real secrets manager

install direnv, then plug it into your team's password manager (1Password CLI via op run, doppler, infisical, vault, pick one)

what direnv does: loads per-folder environment variables when you cd in, unloads when you cd out. the real move is wiring it into your secrets manager so credentials NEVER live in plain text on disk

what this stops:
- API keys accidentally committed to git history, the most common AI agent breach pattern in 2026
- credentials leaking from one project into another through your shell history
- shared .env files that one teammate quietly backs up to Dropbox
- secrets that survive a laptop theft because they were sitting in /Users/you/projects

the part nobody mentions: most "my agent got jailbroken" stories actually trace back to one credential the agent had access to that it shouldn't have. scope keys to projects, scope projects to folders, and the blast radius of any single compromise drops dramatically

I shipped 2 agents with keys in .env files before switching. the day I plugged direnv into op run I stopped having that whole class of nightmare

2. TOKENS: litellm or portkey as your model proxy

one URL that fronts every AI provider (Anthropic, OpenAI, Google, Mistral, local models). all your spend flows through one place

what it saves you:
- response caching keyed by prompt hash, cuts your bill 30-60% on repeat tasks
- automatic fallback on rate limits (Sonnet hits a 429? falls to Opus, then GPT, then your local backup, no broken users)
- per-feature and per-user budget caps, block the call before it costs $200 instead of auditing it after
- model routing rules, cheap tasks to Haiku, expensive ones to Opus, never the wrong way
- PII redaction before requests leave your network, security side benefit

the part nobody mentions: every "$4k AI bill" story I've heard ends with "we didn't have a proxy in front." this is where you put guardrails around spend BEFORE the spend happens

I built my own router for 2 weeks. it took 20 minutes to replace with litellm. I will be embarrassed about this forever

3. CONTEXT: uv + git commit on every passing eval

install uv (the new Python package manager, 10-100x faster than pip+venv, by the Astral team behind ruff). then commit every time an eval suite PASSES, with the model version and pass rate in the commit message

what this preserves:
- exact dependency set via uv.lock, you always know which packages your agent was using, no nasty surprises from a quiet update
- exact prompt + code state, you can reproduce any past run from a single git hash
- exact model version paired to exact pass rate, a paper trail when prod breaks weeks later
- one-command rollback to a known-working state when a refactor goes sideways
- a compliance story, every prompt version tied to a model version in your commit log

the security side: when something blows up in prod, you want to say "the prompt was version X, model was Sonnet 4.6.1, last eval pass rate was 94%." not "I think we deployed on Tuesday?" the first is an incident report. the second is a resignation letter

I've lost more agents to "I changed 3 prompts in one session and broke something" than to any actual bug

4. VISIBILITY: mitmproxy in front of every LLM call

it's basically a wiretap for your agent. install it, point your agent through it, and now you see every conversation your agent has with the model in real time

what actually shows up:
- every silent retry your SDK sneaks in when a call fails
- the full prompt being sent (including any creds you accidentally embedded)
- what the model returns BEFORE your code reacts to it
- exact token cost per call, per tool, per loop iteration
- responses that quietly trigger your code into doing something you didn't intend, this is where prompt injection lives

the part nobody talks about: if a website your agent scraped slipped instructions into its data, mitmproxy is how you SEE the moment your agent decides to follow them. without this layer, you're trusting your agent did the right thing, not verifying

I shipped 3 agents before adding this. I have no honest idea what they were doing in production

5. EVALS: inspect-ai (the framework the labs actually use)

an eval framework is what tells you "this agent works" with numbers instead of vibes. inspect-ai is the one Anthropic, DeepMind, and the UK AI Safety Institute use for the eval reports you read in their papers. open source, MIT licensed

what your homegrown version won't have:
- run the same task across 5 different models and compare scores side by side
- pre-built tests for risky agent behavior (lying, manipulating, misusing tools)
- proper structure for evaluating tool-using agents, not just chat
- repeatable scoring, the same input always gets graded the same way
- reproducible eval seeds, so a flaky test is actually flaky and not just unlucky

I wrote my own eval harness 4 times across 4 projects. threw it out 4 times

if you ever want to say "my agent passes safety checks" out loud, the check has to come from a framework someone else can re-run. this is that framework

the move that ties this together: keep a /lessons.md in every repo. every weird agent behavior, every edge case, every config change you find at 2am, write it down

you will not remember it. you'll come back in 3 weeks and the lessons file is the only reason you still know what's going on

lock these 5, keep the lessons file, your next agentic system takes 2 days instead of 2 months

p.s. half of "AI agent" content online is people who've never run mitmproxy on their own loop. they don't actually know what their agent is doing. they're shipping demo videos. don't be that guy
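
The "response caching keyed by prompt hash" idea from item 2 can be illustrated with a few lines of standard-library Python. This is only a sketch of the general technique, not the API of litellm or portkey: `CachedClient`, `prompt_key`, and the `call_model` callback are hypothetical names, and the cache here is a plain in-memory dict.

```python
import hashlib
import json
from typing import Callable, Dict


def prompt_key(model: str, prompt: str) -> str:
    """Build a stable SHA-256 key from the model name and the prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedClient:
    """Wraps any call_model(model, prompt) -> str function (hypothetical)."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        key = prompt_key(model, prompt)
        if key in self._cache:
            # Identical model + prompt seen before: no provider call is made.
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._cache[key] = response
        return response
```

Keying on the model name as well as the prompt keeps answers from different models apart. An exact-match cache like this only helps on repeated, identical requests; any change to the prompt text produces a new key.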

Jan Tomášek@majnos64·17 May

@rohanpaul_ai Or Anthropic will not exist and China will take over, since they are the only ones with enough electricity to power that amount of compute.

Rohan Paul@rohanpaul_ai·17 May

Anthropic CEO Dario Amodei : "Software is going to become cheap, maybe essentially free. The premise that you need to amortize a piece of software you build across millions of users, that may start to be false. But at the same time, there are whole jobs, whole careers that we've built for decades that may not be present. And, you know, I think we can deal with it. I think we can adjust to it. But I don't, I don't think there's an awareness at all of what, of what is coming here and the magnitude of it." --- From "The Wall Street Journal" YT channel (link in comment)

Jan Tomášek@majnos64·17 May

@Vojtech2022 @sazkarik Robots are a few years behind the cloud

Vojtěch@Vojtech2022·17 May

@sazkarik AI in the Czech Republic will "reshuffle" work for about a third of people, but it will actually push people out of their original position only in the range of a few percent to the low tens of percent. Office routine will be hit hardest. Physical trades, construction, servicing, crafts, and care for people will, by contrast, be relatively safe; there AI will help with the paperwork..

Sazkarik@sazkarik·16 May

I am becoming more and more of an AI skeptic. I weigh all those billions of dollars against the fact that when I use it, the result is quite often nothing special. I am not saying it is useless, but I do not share, at least in part, the enormous enthusiasm that prevails here.

Jan Tomášek@majnos64·17 May

@SlavoTomascik USB-A to USB-C with USB 2.0 wiring. PD is not supposed to be there; this is just a cut-down ("lowened") wiring

Slavomir Tomascik@SlavoTomascik·16 May

USB-C is a standard, they said. LIDL has its own "standard". I found it odd that they warn about use only with specific devices. Here is the reason. The cable does not comply with the USB specification. And it has a lovely almost 1 ohm.

Jan Tomášek@majnos64·16 May

@krzyzanowskim Sure. The only generic way to work with AI is to create a plan with a larger model and implement it with a smaller model. The issue is that the plan has to be small enough to use less than 30-50k tokens of context, which is hard for real work.

Marcin Krzyzanowski@krzyzanowskim·16 May

"skill issue"
- a whole day of writing PRD/specification
- half day grilling the PRD/specification
- 16h /goal implementing PRD

it doesn't work. IT DOESN'T WORK. it doesn't work at all, but also it doesn't work as specified.

why broken? "So implementation drift"

I'm done with this shit!

Jan Tomášek@majnos64·16 May

@FU1151959 Well, the message was probably that in Berlin it is possible, and in Warsaw it costs almost half as much

Frank Underwood@FU1151959·15 May

On minimum wage, a single person really is not going to live in a three-room flat (3+kk) in the capital. If anyone thinks that is wrong or unfair, they should see a psychiatrist to help treat the delusion.

Neslušný Čech@NeslusnyCech

We should stop overdoing it with the whining...

Jan Tomášek@majnos64·16 May

@imatrix Well, Cherny has that too. That is the price of agentic development today, and that is also why nobody else uses these methods.

Karel Javůrek@imatrix·16 May

Did you think you were spending a lot on AI? Think again. This is Peter, the creator of OpenClaw, who constantly runs 100 Codex agents and burns almost 30 million Czech crowns a month on tokens. Probably nobody in the world will catch up with that, at least as far as a personal agent goes. Hermes can be as high-quality as it likes (fewer features = fewer bugs), but it will hardly catch up with that kind of pace. By the way, if I assume correctly, he is not paying for it; he gets it for free from @sama because they recently joined forces in business. The product itself is still open source, though.

Peter Steinberger 🦞@steipete

The latest CodexBar update renders API costs wayyyy nicer. codex.bar

Jan Tomášek@majnos64·15 May

@AishwaryaDevv Sure, someone needs to build a benchmark for that; then no one would use AI anymore

Aish@AishwaryaDevv·15 May

Am I the only one getting vibe coding fatigue? Building landing pages in 30 seconds was fun, but maintaining a complex codebase where half the logic was “vibed” into existence is an absolute headache. Feels like we traded 1 hour of typing for 5 hours of architectural debugging later. I’ve started manually writing core logic again so I actually know where the technical debt is hiding. Is anyone successfully managing large production projects with AI agents, or are we all just building disposable software?

Jan Tomášek@majnos64·15 May

@daniel_mac8 @subquadratic Context retrieval is not context reasoning

Dan McAteer@daniel_mac8·15 May

.@subquadratic looks more like the biggest breakthrough since the Transformer than it does AI Theranos.

Independent benchmark results are in:
> 56x latency gain vs. Flash Attention at 1 mil toks
> 95.6% on RULER at 128K toks
> 86.2% on MRCR 8-needle
> 81.8% SWE-Bench

Still don't have access after requesting last week though. Would love to get my hands on it.

Dan McAteer@daniel_mac8

SubQ is either the biggest breakthrough since the Transformer...
> 52x faster than FlashAttention at 1mm tok context
> 20x cheaper than Opus
...or it's AI Theranos.

Requested early access so hopefully can investigate soon.
