Jan Tomášek

1.1K posts

@majnos64

Joined February 2012
991 Following · 47 Followers
Jan Tomášek
Jan Tomášek@majnos64·
@uslechtila @Chmee2 Well, the system bias is roughly 1-2 °C, so compared with earlier measurements that works out to 31.7-32.7 °C. In other words, it is an extreme value either way.
Czech
0
0
0
104
Dr. Petr Brož
Dr. Petr Brož@Chmee2·
The weather station in Oxford has been measuring continuously for 211 years. In the whole existence of this station, the temperature has never exceeded 31 °C in May. Until now, when it looks like a "nice" 33.7 °C. A jump in the maximum temperature of more than 3 degrees. Quite a leap...
MetJam@MetJam_

Oxford, the longest running continuous weather station in UK history, with temperature observations stretching back to 1815, has preliminarily broken its maximum temperature record for May yesterday by OVER 3°C with a temperature of 33.7°C. Unprecedented in its 211-year history.

Czech
44
27
374
25.7K
Neo Kim
Neo Kim@systemdesignone·
AI ENGINEERS ONLY Which part of the software development lifecycle would you never trust an AI agent to handle alone?
English
35
2
40
15.8K
Jan Tomášek
Jan Tomášek@majnos64·
@rohanpaul_ai Sure, but we lack the metrics to verify the claim. I seriously doubt the CEO can tell.
English
0
0
0
6
Rohan Paul
Rohan Paul@rohanpaul_ai·
Uber CEO Dara Khosrowshahi said earlier that currently, 90% of Uber’s engineers use AI, but the top 30% (power users) are seeing unprecedented productivity gains. These power-users of AI are pushing the maximum number of "diffs" to the codebase. He predicts in 5 Years the ROI of a human engineer is surpassed by the ROI of adding more AI agents and GPU power. So at that time he will just hire more AI agents and pay for NVIDIA GPUs instead of human software engineers. --- From 'The Diary Of A CEO' YT Channel (link in comment)
English
101
43
273
123.9K
Jan Tomášek
Jan Tomášek@majnos64·
@badlogicgames Sure, you can specify behaviour, but not the coding rules or the architecture decisions. I gave up letting the LM decide those. Better to use it for search and tiny tasks with a human in the loop. Even xhigh thinking is worse than human-guided low effort.
English
0
0
0
79
TJ Jurkiewicz | Fat Loss Coach
@colejaczko I implore everyone to check their sleep scores only after they've gotten through a good chunk of their day. Many days I had a near-perfect sleep score yet was dragging all day, and vice versa: I had some amazingly energetic and productive days when my sleep score was 60-something.
English
1
0
0
526
luca🖇️
luca🖇️@kebabscibuli·
I would SO love to wear barefoot shoes and do something for my health, but those shoes are SO ugly that I really can't bear to look at them
Czech
53
3
572
35.2K
Jan Tomášek
Jan Tomášek@majnos64·
@badlogicgames Thinking works in planning, not in execution, imho. The best executor is a non-thinking model. I now mostly run low thinking effort and use high only for the plan.
English
0
0
0
165
Mario Zechner
Mario Zechner@badlogicgames·
me: do it
gpt: totally did it
me: dude
gpt: totally did it now
me: wtf
gpt: i so did it, you won't believe how hard i did it

gpt 5.5, thinking off.
English
18
2
168
11.9K
Hedgie
Hedgie@HedgieMarkets·
🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products.

My Take

The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested.

This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months.

Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown.

Hedgie🤗
English
1.1K
4K
20K
8.3M
Jan Tomášek
Jan Tomášek@majnos64·
@morganlinton Even Flash 3.0 is okay-ish as a coding executor, with Pro for planning. It is not the best combo, though.
English
0
0
0
9
Jan Tomášek
Jan Tomášek@majnos64·
@asmah2107 Depends on what you expect. If you use it as a fancy autocomplete, you save a ton of time. If you want it to do your job, it will not save much, since it stops working at around 5k LOC.
English
0
0
0
6
Ashutosh Maheshwari
Ashutosh Maheshwari@asmah2107·
Hot take: AI code generation doesn't actually save you that much time. If you have to painstakingly review and debug every line of AI-generated code, you're just trading writing time for reading time. The real holy grail? Verification. When AI can mathematically prove its code is 100% correct, you can confidently deploy it without ever looking at the source file.
English
132
16
300
32.9K
Jan Tomášek
Jan Tomášek@majnos64·
@DeRonin_ This is a bit of BS, no? The harness should cover this. Maybe previously it did not.
English
0
0
1
288
Ronin
Ronin@DeRonin_·
Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model."

41% mistake rate without a CLAUDE.md.
11% with the 4-rule baseline.
3% with the 12-rule version below

here are the 12 rules senior engineers settled on:

1. think before coding: state assumptions, don't guess. the model can't read your mind, stop hoping it will
2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add "for future flexibility," you've added 200 lines you'll delete next quarter
3. surgical changes: touch only what you must. don't let it improve adjacent code, that's how PRs blow up
4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early
5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers
6. token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5
7. surface conflicts, don't average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice
8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read
9. tests verify intent, not just behavior: a test that can't fail when business logic changes is wrong. all 12 of Claude's tests can pass while the function returns a constant
10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour
11. match the codebase conventions: class components? don't fork to hooks silently. testing patterns assumed componentDidMount, hooks broke them without surfacing
12. fail loud: "completed successfully" with 14% of records silently skipped is the worst class of bug. surface uncertainty, don't hide it

what actually compounds instead of the next framework:
- the CLAUDE.md file as institutional memory across sessions
- eval-driven changes, not vibe-driven
- checkpoints over speed
- explicit conflicts over silent blending
- discipline over framework, every time
- one repo, one rules file, no exceptions

be a few rules ahead of AI twitter before this becomes mass-opinion

study this
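Rule 12 ("fail loud") can be illustrated with a short Python sketch. It is only one possible reading of the rule, not code from the thread: `process_all`, `BatchResult`, `SkippedRecordsError`, and the `max_skip_ratio` parameter are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


class SkippedRecordsError(RuntimeError):
    """Raised when more records were skipped than the caller allows."""


@dataclass
class BatchResult:
    processed: int = 0
    skipped: List[str] = field(default_factory=list)


def process_all(
    records: Iterable[str],
    handle: Callable[[str], object],
    max_skip_ratio: float = 0.0,
) -> BatchResult:
    """Run `handle` on every record and refuse to report success silently.

    Records whose handler raises ValueError are collected, not dropped.
    If the skipped share exceeds `max_skip_ratio`, the whole run fails.
    """
    result = BatchResult()
    for record in records:
        try:
            handle(record)
            result.processed += 1
        except ValueError as exc:
            # Keep the reason so the failure report is actionable.
            result.skipped.append(f"{record!r}: {exc}")

    total = result.processed + len(result.skipped)
    if total and len(result.skipped) / total > max_skip_ratio:
        raise SkippedRecordsError(
            f"{len(result.skipped)} of {total} records skipped; "
            f"first: {result.skipped[0]}"
        )
    return result
```

With the default `max_skip_ratio=0.0`, a single bad record turns "completed successfully" into an explicit error; raising the threshold is then a deliberate, visible choice.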
Ronin@DeRonin_

anybody who uses or learns agentic systems, SHOULD READ THIS

the install order I run before any new agentic project:

1. PRIVACY: direnv + a real secrets manager

install direnv, then plug it into your team's password manager (1Password CLI via op run, doppler, infisical, vault, pick one)

what direnv does: loads per-folder environment variables when you cd in, unloads when you cd out. the real move is wiring it into your secrets manager so credentials NEVER live in plain text on disk

what this stops:
- API keys accidentally committed to git history, the most common AI agent breach pattern in 2026
- credentials leaking from one project into another through your shell history
- shared .env files that one teammate quietly backs up to Dropbox
- secrets that survive a laptop theft because they were sitting in /Users/you/projects

the part nobody mentions: most "my agent got jailbroken" stories actually trace back to one credential the agent had access to that it shouldn't have. scope keys to projects, scope projects to folders, and the blast radius of any single compromise drops dramatically

I shipped 2 agents with keys in .env files before switching. the day I plugged direnv into op run I stopped having that whole class of nightmare

2. TOKENS: litellm or portkey as your model proxy

one URL that fronts every AI provider (Anthropic, OpenAI, Google, Mistral, local models). all your spend flows through one place

what it saves you:
- response caching keyed by prompt hash, cuts your bill 30-60% on repeat tasks
- automatic fallback on rate limits (Sonnet hits a 429? falls to Opus, then GPT, then your local backup, no broken users)
- per-feature and per-user budget caps, block the call before it costs $200 instead of auditing it after
- model routing rules, cheap tasks to Haiku, expensive ones to Opus, never the wrong way
- PII redaction before requests leave your network, security side benefit

the part nobody mentions: every "$4k AI bill" story I've heard ends with "we didn't have a proxy in front." this is where you put guardrails around spend BEFORE the spend happens

I built my own router for 2 weeks. it took 20 minutes to replace with litellm. I will be embarrassed about this forever

3. CONTEXT: uv + git commit on every passing eval

install uv (the new Python package manager, 10-100x faster than pip+venv, by the Astral team behind ruff). then commit every time an eval suite PASSES, with the model version and pass rate in the commit message

what this preserves:
- exact dependency set via uv.lock, you always know which packages your agent was using, no nasty surprises from a quiet update
- exact prompt + code state, you can reproduce any past run from a single git hash
- exact model version paired to exact pass rate, a paper trail when prod breaks weeks later
- one-command rollback to a known-working state when a refactor goes sideways
- a compliance story, every prompt version tied to a model version in your commit log

the security side: when something blows up in prod, you want to say "the prompt was version X, model was Sonnet 4.6.1, last eval pass rate was 94%." not "I think we deployed on Tuesday?" the first is an incident report. the second is a resignation letter

I've lost more agents to "I changed 3 prompts in one session and broke something" than to any actual bug

4. VISIBILITY: mitmproxy in front of every LLM call

it's basically a wiretap for your agent. install it, point your agent through it, and now you see every conversation your agent has with the model in real time

what actually shows up:
- every silent retry your SDK sneaks in when a call fails
- the full prompt being sent (including any creds you accidentally embedded)
- what the model returns BEFORE your code reacts to it
- exact token cost per call, per tool, per loop iteration
- responses that quietly trigger your code into doing something you didn't intend, this is where prompt injection lives

the part nobody talks about: if a website your agent scraped slipped instructions into its data, mitmproxy is how you SEE the moment your agent decides to follow them. without this layer, you're trusting your agent did the right thing, not verifying

I shipped 3 agents before adding this. I have no honest idea what they were doing in production

5. EVALS: inspect-ai (the framework the labs actually use)

an eval framework is what tells you "this agent works" with numbers instead of vibes. inspect-ai is the one Anthropic, DeepMind, and the UK AI Safety Institute use for the eval reports you read in their papers. open source, MIT licensed

what your homegrown version won't have:
- run the same task across 5 different models and compare scores side by side
- pre-built tests for risky agent behavior (lying, manipulating, misusing tools)
- proper structure for evaluating tool-using agents, not just chat
- repeatable scoring, the same input always gets graded the same way
- reproducible eval seeds, so a flaky test is actually flaky and not just unlucky

I wrote my own eval harness 4 times across 4 projects. threw it out 4 times

if you ever want to say "my agent passes safety checks" out loud, the check has to come from a framework someone else can re-run. this is that framework

the move that ties this together: keep a /lessons.md in every repo. every weird agent behavior, every edge case, every config change you find at 2am, write it down

you will not remember it. you'll come back in 3 weeks and the lessons file is the only reason you still know what's going on

lock these 5, keep the lessons file, your next agentic system takes 2 days instead of 2 months

p.s. half of "AI agent" content online is people who've never run mitmproxy on their own loop. they don't actually know what their agent is doing. they're shipping demo videos. don't be that guy
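The "response caching keyed by prompt hash" idea from the TOKENS step can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not how litellm or portkey implement caching: `PromptCache`, `cache_key`, and the injected `call_model` callable (assumed to take a model name and a prompt and return text) are hypothetical names introduced here.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(model: str, prompt: str) -> str:
    """Build a stable key from the model name and the exact prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PromptCache:
    """In-memory cache; a hypothetical stand-in for a proxy's response cache."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        # Assumed signature: call_model(model, prompt) -> response text.
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        key = cache_key(model, prompt)
        if key in self._store:
            # Identical model + prompt: serve the stored response, no new spend.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response
```

Because the key covers both the model name and the full prompt, any change to either one is a cache miss; whether exact-match caching reaches the savings quoted above depends on how repetitive the workload really is.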

English
62
359
2.9K
445.2K
Jan Tomášek
Jan Tomášek@majnos64·
@rohanpaul_ai Or Anthropic will not exist and China will take over, since they are the only ones with enough electricity to power that amount of compute.
English
0
0
0
48
Rohan Paul
Rohan Paul@rohanpaul_ai·
Anthropic CEO Dario Amodei : "Software is going to become cheap, maybe essentially free. The premise that you need to amortize a piece of software you build across millions of users, that may start to be false. But at the same time, there are whole jobs, whole careers that we've built for decades that may not be present. And, you know, I think we can deal with it. I think we can adjust to it. But I don't, I don't think there's an awareness at all of what, of what is coming here and the magnitude of it." --- From "The Wall Street Journal" YT channel (link in comment)
English
382
151
1.6K
795.8K
Vojtěch
Vojtěch@Vojtech2022·
@sazkarik AI in the Czech Republic will "reshuffle" work for about a third of people, but it will actually push out of their original positions somewhere between single digits and the low tens of percent. It will hit routine office work the hardest. Physical trades, construction, servicing, crafts, and caring for people will, by contrast, be relatively safe; there AI will help with the paperwork..
Czech
4
0
2
479
Sazkarik
Sazkarik@sazkarik·
I am becoming more and more of an AI skeptic. I weigh all those billions of dollars against the fact that when I use it, the result is quite often nothing special. I'm not saying it's useless, but I at least partly don't share the enormous enthusiasm that prevails here.
Czech
28
0
64
13.8K
Jan Tomášek
Jan Tomášek@majnos64·
@SlavoTomascik USB-A to USB-C with USB 2.0 wiring. There isn't supposed to be PD there; this is just low-end wiring.
Czech
0
0
1
752
Slavomir Tomascik
Slavomir Tomascik@SlavoTomascik·
USB-C is a standard, they said. LIDL has its own "standard". I found it odd that they warn about using it only with specific devices. Here is the reason. The cable does not comply with the USB specification. And it has a lovely resistance of almost 1 ohm.
15
1
106
18.9K
Jan Tomášek
Jan Tomášek@majnos64·
@krzyzanowskim Sure. The only generic way to work with AI is to create a plan with a larger model and implement it with a smaller model. The issue is that the plan has to be small enough to use less than 30-50k tokens of context, which is hard for real work.
English
0
0
0
400
Marcin Krzyzanowski
Marcin Krzyzanowski@krzyzanowskim·
"skill issue"
- a whole day of writing PRD/specification
- half day grilling the PRD/specification
- 16h /goal implementing PRD

it doesn't work. IT DOESN'T WORK. it doesn't work at all, but also it doesn't work as specified.

why broken? "So implementation drift"

I'm done with this shit!
English
69
7
321
96.1K
Jan Tomášek
Jan Tomášek@majnos64·
@FU1151959 Well, the message was probably that it can be done in Berlin, and in Warsaw it costs almost half as much
Czech
0
0
0
8
Jan Tomášek
Jan Tomášek@majnos64·
@imatrix Well, Cherny has that too. That is the price of agentic development today, and it is also why nobody else uses these methods.
Czech
0
0
0
764
Karel Javůrek
Karel Javůrek@imatrix·
Did you think you were spending a lot on AI? Think again. This is Peter, the creator of OpenClaw, who constantly runs 100 Codex agents and burns almost 30 million Czech crowns a month on tokens. Probably nobody in the world will catch up with that, at least when it comes to a personal agent. Hermes can be as much better in quality as it likes (fewer features = fewer bugs), but it will hardly match a pace like that. By the way, if I assume correctly, he doesn't pay for it; he gets it for free from @sama because they recently joined forces on the business side. The product itself is still open source, though.
Peter Steinberger 🦞@steipete

The latest CodexBar update renders API costs wayyyy nicer. codex.bar

Czech
7
1
26
16.8K
Jan Tomášek
Jan Tomášek@majnos64·
@AishwaryaDevv Sure, someone needs to build a benchmark for that; then no one would use AI anymore.
English
0
0
0
9
Aish
Aish@AishwaryaDevv·
Am I the only one getting vibe coding fatigue? Building landing pages in 30 seconds was fun, but maintaining a complex codebase where half the logic was “vibed” into existence is an absolute headache. Feels like we traded 1 hour of typing for 5 hours of architectural debugging later. I’ve started manually writing core logic again so I actually know where the technical debt is hiding. Is anyone successfully managing large production projects with AI agents, or are we all just building disposable software?
English
360
74
1.5K
225.4K
Dan McAteer
Dan McAteer@daniel_mac8·
.@subquadratic looks more like the biggest breakthrough since the Transformer than it does AI Theranos.

Independent benchmark results are in:
> 56x latency gain vs. Flash Attention at 1 mil toks
> 95.6% on RULER at 128K toks
> 86.2% on MRCR 8-needle
> 81.8% SWE-Bench

Still don't have access after requesting last week though. Would love to get my hands on it.
Dan McAteer@daniel_mac8

SubQ is either the biggest breakthrough since the Transformer...
> 52x faster than FlashAttention at 1mm tok context
> 20x cheaper than Opus
...or it's AI Theranos.

Requested early access so hopefully can investigate soon.

English
18
13
190
39.8K