Sebastian
@Danmoreng
https://t.co/GGiy86lGS7
1.5K posts · Joined January 2013
95 Following · 25 Followers

Sebastian @Danmoreng ·
@VictorTaelin Gemini 3.1 Pro being number 2 doesn't sit well with me. I have exclusively used gemini-cli & codex-cli, and codex-cli just solved way more issues.

Taelin @VictorTaelin ·
Introducing LamBench . . . You asked me to make a benchmark, so I made it. It is a simple, old-style Q&A consisting of 120 fresh λ-calculus programming questions. Some are easy, like "implement add for λ-encoded nats". Some are harder, like "derive a generic fold for arbitrary λ-encodings". It measures:
- intelligence (% tasks completed)
- elegance (BLC-length of solutions)
- speed (completion time)
Basically what I care about, other than long context. I made it today because I was excited about GPT 5.5. It didn't do too well ): (My first-day impression is that I can't tell the difference between GPT 5.5 and GPT 5.4. I would be lying if I said otherwise. I'd not be able to distinguish them in a blind test. I need more time. It is much faster though.) This is a new, simple bench, so expect bugs. Especially on OpenRouter models. I'll retest soon. Also, it was born saturated. V2 will be harder... ↓ Link and more charts below ↓
[image]
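Taelin's example task, "implement add for λ-encoded nats", refers to Church-encoded naturals. A minimal sketch in Python, assuming the standard Church encoding (LamBench itself scores pure λ-terms, not Python):

```python
# Church numerals: n is encoded as a function applying f to x, n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# add m n applies f m times on top of n's applications.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # Decode a Church numeral by counting applications.
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```

The "elegance" axis (BLC-length) would score the raw binary-λ-calculus size of such a term, so shorter encodings rank higher.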

Tibo @thsottiaux ·
Rollout will be complete for 100% of paid users in the next 5 mins.

Tibo @thsottiaux ·
Stop tweeting for a hot minute and update your Codex App to find full browser use, global dictation, non-dev mode, a new auto-review mode that is much safer than yolo, in-app docs and PDF viewer, and ... GPT-5.5.
OpenAI Developers @OpenAIDevs
With GPT-5.5, Codex now gets more of the job done across the browser, files, docs, and your computer. We've expanded browser use so Codex can interact with web apps: test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the task.

Sebastian @Danmoreng ·
@davideciffa I actually just linked the benchmark results. In short: 267 t/s vs llama.cpp 222 t/s.

Sandro @pupposandro ·
The new Qwen3.6-27B now runs on Luce DFlash. Up to 2x throughput on a single RTX 3090. Qwen3.6-27B ships the same Qwen35 architecture string and identical layer/head dims as 3.5, so the existing DFlash draft + DDTree stack loads it as-is. Throughput is lower than on 3.5. Looking forward to the updated version from the DFlash team to implement it as well! Repo in the first comment ⬇️
[image]
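The draft-model stack Sandro mentions is a speculative decoding setup. A toy sketch of the generic draft-and-verify loop behind such stacks; the two "models" here are stand-in functions, not DFlash or any real network:

```python
def target_next(seq):
    # Toy "large model": next token is last + 1 (mod 10).
    return (seq[-1] + 1) % 10

def draft_next(seq):
    # Toy "draft model": agrees with the target except when the next
    # token would be 5, where it guesses wrong.
    t = target_next(seq)
    return 0 if t == 5 else t

def speculative_step(seq, k=4):
    """One round: the draft proposes k tokens autoregressively, the target
    verifies them in order, keeping the longest agreeing prefix plus one
    corrected token. Returns the tokens appended this round."""
    s = list(seq)
    proposal = []
    for _ in range(k):
        tok = draft_next(s)
        s.append(tok)
        proposal.append(tok)
    accepted = []
    s = list(seq)
    for tok in proposal:
        t = target_next(s)
        if tok == t:
            accepted.append(tok)  # draft token verified, keep it
            s.append(tok)
        else:
            accepted.append(t)    # target's correction ends the round
            break
    return accepted

print(speculative_step([0, 1, 2]))  # [3, 4, 5]
```

The throughput win comes from verifying several draft tokens per target-model pass instead of generating one token at a time; an architecture change that confuses the draft lowers the acceptance rate, which matches the "throughput is lower than on 3.5" observation.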

Sebastian @Danmoreng ·
@__tinygrad__ Really interesting. Tried the last two evenings to get it running fast on CUDA by letting Codex implement inference from scratch. Currently I am at ~180 t/s while llama.cpp gives me 215 t/s on my RTX 5080 mobile. I should probably try tinygrad as well.

the tiny corp @__tinygrad__ ·
We set out to replicate Kimi's 193 tok/s Qwen3.5-0.8B on M3 Max. Our baseline is already 178 tok/s, beating LMStudio (160) and llama.cpp (140) out of the box, but with tinygrad's custom kernel feature Claude cranked it to 195.7!
[image]

Sebastian @Danmoreng ·
@antirez I agree on React and Angular being cumbersome. I love Vue.js as a framework though, but only for complex applications. Websites that don't need lots of reusable components or centralised state management do just fine with vanilla JavaScript.

antirez @antirez ·
My POV on front-end of 2026
[image]

Sebastian @Danmoreng ·
@ngxson I don’t have the comparison with Claude Code, but I’ve worked quite a bit with gemini-cli and codex-cli, and sadly I must say that Codex 5.3 is superior to Gemini 3.1 Pro at the moment. It simply gets more things right in C++ & Kotlin development.

Xuan-Son Nguyen @ngxson ·
I stopped using Claude Code on all of my llama.cpp workflows for the past few days. The quality degradation is just too significant. Experimenting with a mixed usage of Gemma 4 26B-A4B and Gemini 3.1 Pro; so far it is much better than what Anthropic can offer.
Simon Willison @simonw
Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right

Xuan-Son Nguyen @ngxson ·
llama.cpp now supports Qwen3-ASR, Qwen3-Omni and Gemma 4 audio/vision input 🔥 Mixed modalities is the future 😼😼
[image]

Sebastian @Danmoreng ·
@zeeg The biggest gap is the included knowledge, simply because of model size; other than that I'd argue yes, GPT-4 level. Conversational and reasoning capabilities are really, really good. For example Qwen3.5 35B-A3B (22GB) or Gemma4 26B-A4B (17GB).

David Cramer @zeeg ·
@Danmoreng i don't think that's actually true? you might be able to do some fun children's stories, but i don't think local hardware runs an equiv to gpt.. 4?

David Cramer @zeeg ·
an awful lot of people promote local models when they're unusable (hardware-wise, perf-wise, or simply in outcomes). one of the many small litmus tests of "does this person have anything to contribute to the conversation"

Christian P. @Bassmaster187 ·
When I look at what free AI LLMs that can be used locally are capable of (a decent graphics card assumed), it becomes really hard to see how OpenAI, xAI and Anthropic will ever recoup the hundreds of billions. The first steps I took were no worse than Codex or Opus. I will report back in the coming days. Qwen 2.5-coder 14b is pretty fast on my setup.
[images]

Sebastian @Danmoreng ·
@TheEthanDing As someone who has to use Workday, this worries me about Anthropic's software quality. 🙃

Sebastian @Danmoreng ·
@karpathy I count myself in the second group, as a daily Codex user for lots of programming. While current coding agents are crazy good already, they still lack A LOT of good software engineering. You need to steer closely to avoid rookie mistakes, and without SWE knowledge you cannot do that.

Andrej Karpathy @karpathy ·
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.
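Karpathy's point about verifiable rewards can be made concrete: for code, a reward function can literally run the tests. A minimal sketch of that idea; `verifiable_reward` is a hypothetical helper, not any lab's actual training code:

```python
def verifiable_reward(candidate_fn, test_cases):
    """Reward 1.0 only if the candidate passes every test case.

    Unlike judging prose, this yields an unambiguous binary signal,
    which is what makes coding tasks so amenable to RL training.
    """
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # crashes count as failure too
    return 1.0

# Example: grading two candidate implementations of absolute value.
tests = [((3,), 3), ((-4,), 4), ((0,), 0)]
good = lambda x: x if x >= 0 else -x
bad = lambda x: x  # fails on negatives
print(verifiable_reward(good, tests), verifiable_reward(bad, tests))  # 1.0 0.0
```

Writing quality has no such oracle, which is one reason those capabilities have climbed more slowly.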

Sebastian @Danmoreng ·
@wesbos Remind me in ~3 years, when all major browsers have this in stable release...

Taelin @VictorTaelin ·
Any local LLM nerds around? I'm trying to run speculative decoding on Gemma 26B A4B. I'm a newbie at running that stuff locally; got it to 200 B tokens/s on B200. I wonder if I could make it much faster?

Sebastian @Danmoreng ·
@LyalinDotCom Until a salesman presents a customer with a solution that he vibe-coded, and the customer wants exactly that, right up until he hears from the engineers what the actual timeline and costs for a proper implementation would be. 🥲

Dmitry Lyalin @LyalinDotCom ·
Yes. This.
Ivan Burazin @ivanburazin
The founder of Postman says you have to kill your existing org chart, especially if you're still operating with a pre-AI hierarchy. The modern org chart, according to @a85:
- wide span of control (even within the exec team)
- work directly with ICs, not through layers
- either you're building, or you're selling
Projects are led by staff/principal engineers with high agency. They see across the board as well as deep in the stack. Product managers are building APIs and prototyping in Claude instead of writing PRDs. Designers are shipping PRs through Cursor directly instead of relying solely on Figma. Everyone is building. And management's job is to develop better judgment.

Sebastian @Danmoreng ·
@zeeg What do you need WSL for though? Just use codex-cli in a normal PowerShell; it works fine. The Codex app didn't work for me when I tried it, though.

David Cramer @zeeg ·
can someone besides Microsoft please make a coding harness (UI) that works with WSL? I realize I'm a unicorn over here running Windows, but you too one day will get fed up with iOS as a desktop OS.

Sebastian @Danmoreng ·
@antirez @BlackOpsREPL For one-shot it is, but it still works okayish. Just tried that this weekend: extract how Google's Android app Edge Gallery uses Gemma 4, to implement that in my own app. Worked pretty well, until I tried GPU instead of CPU. Had Codex look again: it had forgotten the GPU libraries.

antirez @antirez ·
@BlackOpsREPL I believe "extract xyz" -> MD file is too lossy.

antirez @antirez ·
One of the most powerful automatic coding (autocoding) tricks that almost nobody uses: "Create this new project as specified in SPEC.md, using the code at /foo/bar/ as a guide for coding style, design sensibility, comments, ...". Style transfer is very powerful.

Ivan Fioravanti ᯅ @ivanfioravanti ·
Is anyone able to tell me the llama-server arguments used to create this? I tried many possible combinations without success. Thanks to anyone willing to help 🙏 I love this idea and I'd like to dig deeper.
Georgi Gerganov @ggerganov
Let me demonstrate the true power of llama.cpp:
- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding
The result: 300t/s (realtime video)
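For Ivan's question, a plausible starting point rather than the actual command from Gerganov's video; the model filenames and flag values below are placeholders, and the exact speculative-decoding setup shown there is unknown:

```shell
# Hypothetical llama-server invocation (guessed values, placeholder paths).
llama-server \
  -m gemma-4-26b-a4b-Q8_0.gguf \
  -md gemma-draft.gguf \
  --ctx-size 16384 \
  --n-gpu-layers 99 \
  --draft-max 16 \
  --draft-min 1 \
  --host 127.0.0.1 --port 8080
# -md supplies a draft model for speculative decoding; the built-in
# WebUI is then served at http://127.0.0.1:8080 out of the box.
```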