Paweł Szulc

7.7K posts

@EncodePanda

Haskell, 范畴论, λ, Distributed Systems, Formal Methods

Poland · Joined February 2009

670 Following · 2.9K Followers

Pinned Tweet

Paweł Szulc@EncodePanda·1 Jan

/s/rabbitonweb/EncodePanda

Paweł Szulc@EncodePanda·3d

Two months ago, I did a presentation "From Micrograd to coppergrad: Building Neural Networks and Backpropagation from Scratch in Rust" A bit of math, a bit of machine learning, a bit of Rust. Enjoy! youtube.com/watch?v=IeLcBX…

Paweł Szulc retweeted

John A De Goes@jdegoes·30 Apr

Anthropic is deeply incompetent at building software that works reliably. Their users would be happier if they focused on the model and outsourced the tooling to engineers who go beyond 'vibe coding'.

Paweł Szulc retweeted

Mushtaq Bilal, PhD@MushtaqBilalPhD·27 Apr

Sci-Hub is an evil website that pirated 85M+ research papers and made them freely available. And now they've added AI to their database to make Sci-Bot. It answers your questions using the latest full-text articles. But DO NOT use it. We should all try to make billion-dollar academic publishers richer. I'm putting the link below so you know how to avoid it.

Paweł Szulc retweeted

PagedOut@pagedout_zine·24 Apr

Call For Pages is still open! We're calling all authors and artists who would like to be a part of Paged Out! Issue #9. Our email articles@pagedout.institute is waiting!

Paweł Szulc retweeted

fab2s@flodl_dev·22 Apr

Write-up: flodl.dev/blog/huggingfa…
Tutorial: flodl.dev/guide/flodl-hf
Crate: crates.io/crates/flodl-hf
Repo: github.com/fab2s/floDl
Feedback and PRs welcome.

Paweł Szulc@EncodePanda·15 Apr

Poland Math Stronk

Lukasz Olejnik@lukOlejnik

Paweł Szulc@EncodePanda·8 Apr

@jdegoes Have you seen this x.com/i/status/20382…

Guri Singh@heygurisingh

Humans: 100%
Gemini 3.1 Pro: 0.37%
GPT 5.4: 0.26%
Opus 4.6: 0.25%
Grok-4.20: 0.00%
François Chollet just released ARC-AGI-3 -- the hardest AI test ever created.
135 novel game environments. No instructions. No rules. No goals given. Figure it out or fail.
Untrained humans solved every single one. Every frontier AI model scored below 1%.
Each environment was handcrafted by game designers. The AI gets dropped in and has to explore, discover what winning looks like, and adapt in real time.
The scoring punishes brute force. If a human needs 10 actions and the AI needs 100, the AI doesn't get 10%. It gets 1%. You can't throw more compute at this.
For context: ARC-AGI-1 is basically solved. Gemini scores 98% on it. ARC-AGI-2 went from 3% to 77% in under a year. Labs spent millions training on earlier versions. ARC-AGI-3 resets the entire scoreboard to near zero.
The benchmark launched live at Y Combinator with a fireside between Chollet and Sam Altman. $2M in prizes on Kaggle. All winning solutions must be open-sourced.
Scaling alone will not close this gap. We are nowhere near AGI.
(Link in the comments)
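The quoted post gives a single data point for the scoring: a human needs 10 actions, the AI needs 100, and the AI scores 1% rather than 10%. One formula consistent with that data point is a squared efficiency ratio. The sketch below assumes that formula for illustration only; the function name `efficiency_score` and the cap at 1.0 are hypothetical, and the benchmark's actual scoring rule is not stated in the post.

```python
def efficiency_score(human_actions: int, ai_actions: int) -> float:
    """Hypothetical score: squared action-efficiency ratio, capped at 1.0.

    Assumption: squaring the ratio is one way to reproduce the post's
    example (10 vs 100 actions -> 1%); it is not a documented formula.
    """
    if human_actions <= 0 or ai_actions <= 0:
        raise ValueError("action counts must be positive")
    ratio = min(1.0, human_actions / ai_actions)
    return ratio ** 2


linear = 10 / 100                    # what a plain ratio would give
squared = efficiency_score(10, 100)  # what the assumed rule gives
print(f"linear ratio: {linear:.0%}, squared ratio: {squared:.0%}")
# prints: linear ratio: 10%, squared ratio: 1%
```

Under this assumption, taking ten times as many actions costs a factor of one hundred in score, which matches the post's claim that brute force is penalized.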

John A De Goes@jdegoes·7 Apr

No paper is needed for this fact, it is self-evident to those paying attention. Yet my mentions are filled with people claiming LLMs are capable of de novo reasoning. They're not. But they are awfully good at convincing people they are.

Nav Toor@heynavtoor

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.
Apple researchers took the most popular math benchmark in AI (GSM8K, a set of grade-school math problems) and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.
But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.
Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.
But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185.
They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.
Apple tested this across all models. They call the dataset "GSM-NoOp", as in, the added clause is a no-operation. It does nothing. It changes nothing.
The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.
The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.
The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."
They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse.
A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.
This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
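The arithmetic in the kiwi example quoted above can be checked directly. This is only a worked restatement of the numbers in the post; the variable names are illustrative.

```python
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The remark about five smaller kiwis does not change the count.
total = friday + saturday + sunday

# The mistake described in the post: subtracting the irrelevant 5.
distracted_total = total - 5

print(total, distracted_total)  # 190 185
```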

Paweł Szulc retweeted

PagedOut@pagedout_zine·7 Apr

Learning reverse engineering and hungry for some real-world tips and tricks? Check out this article by Amnesia ("Reverse Engineering Cryptography Code"). This is a solid overview with multiple approaches to the topic.

Paweł Szulc retweeted

Muhammad Ayan@socialwithaayan·6 Apr

🚨 BREAKING: Someone just built the exact tool Andrej Karpathy said someone should build.
48 hours after Karpathy posted his LLM Knowledge Bases workflow, this showed up on GitHub.
It's called Graphify. One command. Any folder. Full knowledge graph.
Point it at any folder. Run /graphify inside Claude Code. Walk away.
Here is what comes out the other side:
-> A navigable knowledge graph of everything in that folder
-> An Obsidian vault with backlinked articles
-> A wiki that starts at index.md and maps every concept cluster
-> Plain English Q&A over your entire codebase or research folder
You can ask it things like:
"What calls this function?"
"What connects these two concepts?"
"What are the most important nodes in this project?"
No vector database. No setup. No config files.
The token efficiency number is what got me: 71.5x fewer tokens per query compared to reading raw files. That is not a small improvement. That is a completely different paradigm for how AI agents reason over large codebases.
What it supports:
-> Code in 13 programming languages
-> PDFs
-> Images via Claude Vision
-> Markdown files
Install in one line:
pip install graphify && graphify install
Then type /graphify in Claude Code and point it at anything.
Karpathy asked. Someone delivered in 48 hours. That is the pace of 2026.
Open Source. Free.

Paweł Szulc@EncodePanda·2 Apr

@kerckhove_ts Have you used Skills to instruct Claude how to behave? Or even without skills, you can say in a prompt how to approach the investigation (write test first that will fail). Or did you just say "Investigate" and act surprised that it literally did the simplest thing possible? :)

Tom Sydney Kerckhove@kerckhove_ts·2 Apr

Me: Investigate this
Claude: Here's a fix
Me: No, test first!
Claude: Ok here, test passes
Me: No! it needs to fail first!
Claude: Of course! Here you go.
Me: Now fix it
Claude: <removes the test>
NOOOOO

Paweł Szulc@EncodePanda·2 Apr

@flodl_dev Please tell me this is not an April Fools' joke!

Paweł Szulc@EncodePanda·1 Apr

OMG my new favorite channel! youtube.com/@computablesec… Just watch @computablesecrets' amazing videos about computability and complexity!

Paweł Szulc@EncodePanda·1 Apr

I want to speak & read Mandarin. I’m willing to invest meaningfully to do it efficiently. What are the best options?

Paweł Szulc retweeted

kitze@thekitze·31 Mar

🚨 BREAKING: GPT 5.4 rates the Claude Code codebase 6.5/10 💀 "This is not junior spaghetti. This is staff-engineer spaghetti: performance-aware, feature-flagged, telemetry-instrumented, surgically optimized spaghetti" 😭

Paweł Szulc@EncodePanda·1 Apr

Last year I could not make it :( This year I'm going to be there 100%! Lambda World is an amazing conference and you should join me as well!

Lambda World@Lambda_World

Last day to purchase your early camaron tickets and submit a proposal to our Call For Papers! lambda.world @CFP_Bot @WikiCFP #FunctionalProgramming

Paweł Szulc@EncodePanda·31 Mar

The cognitive dissonance - when you use Claude Code daily, but refuse to try an open source product with .claude in it🤷‍♂️ Because it's great when you use it, and shit when others do. Apparently, quality changes with ownership.

Paweł Szulc retweeted

Guri Singh@heygurisingh·29 Mar

Paweł Szulc@EncodePanda·27 Mar

@Type_Whisper @WisprFlow I will definitely try it!

TypeWhisper@Type_Whisper·17 Mar

@EncodePanda @WisprFlow That's rough - constant CPU drain even idle is a real battery killer. TypeWhisper uses 0% when not actively recording, 100% offline, and it's free + open source. Since you like WhisperFlow's workflow, the transition is smooth. Worth trying: typewhisper.com

Paweł Szulc@EncodePanda·17 Mar

Hey @WisprFlow, I love the product (paying subscriber here). But why does it constantly need 5-8% of my CPU?! Even if not doing anything at a given moment?

Paweł Szulc@EncodePanda·26 Mar

@fresheyeball @MaineFrameworks @debasishg Blog post, video or did not happen :)

🌵@fresheyeball·26 Mar

@EncodePanda @MaineFrameworks @debasishg That's largely what I do with Paradox and F*, which lets me skip a bunch of those steps. Specifically my version is more deterministic.

Harney, CO 🇺🇸

Chase Saunders@MaineFrameworks·24 Mar

Has anyone tried Quint for specifications?
