Sebastian
@Danmoreng
https://t.co/GGiy86lGS7
1.5K posts · Joined January 2013
95 Following · 25 Followers

Sebastian @Danmoreng ·
@VictorTaelin Gemini 3.1 Pro being number 2 doesn't sit well with me. I have exclusively used gemini-cli & codex-cli, and codex-cli just solved way more issues.

Taelin @VictorTaelin ·
Introducing LamBench . . . You asked me to make a benchmark, so I made it. It is a simple, old-style Q&A consisting of 120 fresh λ-calculus programming questions. Some are easy, like "implement add for λ-encoded nats". Some are harder, like "derive a generic fold for arbitrary λ-encodings". It measures:
- intelligence (% tasks completed)
- elegance (BLC-length of solutions)
- speed (completion time)
Basically what I care about, other than long context. I made it today because I was excited about GPT 5.5. It didn't do too well ): (My first-day impression is that I can't tell the difference between GPT 5.5 and GPT 5.4. I would be lying if I said otherwise. I'd not be able to distinguish them in a blind test. I need more time. It is much faster though.) This is a new, simple bench, so expect bugs. Especially on OpenRouter models. I'll retest soon. Also, it was born saturated. V2 will be harder... ↓ Link and more charts below ↓
[image]
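Taelin's example task, "implement add for λ-encoded nats", refers to Church-encoded naturals. A minimal sketch in Python, assuming the standard Church encoding (LamBench itself scores pure λ-terms, not Python):

```python
# Church numerals: n is encoded as a function applying f to x, n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# add m n applies f m times on top of n's applications.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # Decode a Church numeral by counting applications.
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```

The "elegance" axis (BLC-length) would score the raw binary-λ-calculus size of such a term, so shorter encodings rank higher.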

Tibo @thsottiaux ·
Rollout will be complete for 100% of paid users in the next 5 mins.

Tibo @thsottiaux ·
Stop tweeting for a hot minute and update your Codex App to find full browser use, global dictation, non-dev mode, a new auto-review mode that is much safer than yolo, in-app docs and PDF viewer, and ... GPT-5.5.
OpenAI Developers @OpenAIDevs
With GPT-5.5, Codex now gets more of the job done across the browser, files, docs, and your computer. We've expanded browser use so Codex can interact with web apps: test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the task.

Sebastian @Danmoreng ·
@davideciffa I actually just linked the benchmark results. In short: 267 t/s vs llama.cpp 222 t/s.

Sandro @pupposandro ·
The new Qwen3.6-27B now runs on Luce DFlash. Up to 2x throughput on a single RTX 3090. Qwen3.6-27B ships the same Qwen35 architecture string and identical layer/head dims as 3.5, so the existing DFlash draft + DDTree stack loads it as-is. Throughput is lower than on 3.5. Looking forward to the updated version from the DFlash team to implement it as well! Repo in the first comment ⬇️
[image]
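The draft-model stack Sandro mentions is a speculative decoding setup. A toy sketch of the generic draft-and-verify loop behind such stacks; the two "models" here are stand-in functions, not DFlash or any real network:

```python
def target_next(seq):
    # Toy "large model": next token is last + 1 (mod 10).
    return (seq[-1] + 1) % 10

def draft_next(seq):
    # Toy "draft model": agrees with the target except when the next
    # token would be 5, where it guesses wrong.
    t = target_next(seq)
    return 0 if t == 5 else t

def speculative_step(seq, k=4):
    """One round: the draft proposes k tokens autoregressively, the target
    verifies them in order, keeping the longest agreeing prefix plus one
    corrected token. Returns the tokens appended this round."""
    s = list(seq)
    proposal = []
    for _ in range(k):
        tok = draft_next(s)
        s.append(tok)
        proposal.append(tok)
    accepted = []
    s = list(seq)
    for tok in proposal:
        t = target_next(s)
        if tok == t:
            accepted.append(tok)  # draft token verified, keep it
            s.append(tok)
        else:
            accepted.append(t)    # target's correction ends the round
            break
    return accepted

print(speculative_step([0, 1, 2]))  # [3, 4, 5]
```

The throughput win comes from verifying several draft tokens per target-model pass instead of generating one token at a time; an architecture change that confuses the draft lowers the acceptance rate, which matches the "throughput is lower than on 3.5" observation.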

Sebastian @Danmoreng ·
@__tinygrad__ Really interesting. Tried the last two evenings to get it running fast on CUDA by letting Codex implement inference from scratch. Currently I am at ~180 t/s while llama.cpp gives me 215 t/s on my RTX 5080 mobile. I should probably try tinygrad as well.

the tiny corp @__tinygrad__ ·
We set out to replicate Kimi's 193 tok/s Qwen3.5-0.8B on M3 Max. Our baseline is already 178 tok/s, beating LMStudio (160) and llama.cpp (140) out of the box, but with tinygrad's custom kernel feature Claude cranked it to 195.7!
[image]

Sebastian @Danmoreng ·
@antirez I agree on React and Angular being cumbersome. I love Vue.js as a framework though, but only for complex applications. Websites that don't need lots of reusable components or centralised state management do just fine with vanilla JavaScript.

antirez @antirez ·
My POV on front-end of 2026
[image]

Sebastian @Danmoreng ·
@ngxson I don’t have the comparison with Claude Code, but I’ve worked quite a bit with gemini-cli and codex-cli, and sadly I must say that Codex 5.3 is superior to Gemini 3.1 Pro at the moment. It simply gets more things right in C++ & Kotlin development.

Xuan-Son Nguyen @ngxson ·
I stopped using Claude Code on all of my llama.cpp workflows for the past few days. The quality degradation is just too significant. Experimenting with a mixed usage of Gemma 4 26B-A4B and Gemini 3.1 Pro; so far it is much better than what Anthropic can offer.
Simon Willison @simonw
Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right

Xuan-Son Nguyen @ngxson ·
llama.cpp now supports Qwen3-ASR, Qwen3-Omni and Gemma 4 audio/vision input 🔥 Mixed modalities is the future 😼😼
[image]

Sebastian @Danmoreng ·
@zeeg The biggest gap is the included knowledge, simply because of model size; other than that I'd argue yes, GPT-4 level. Conversational and reasoning capabilities are really, really good. For example Qwen3.5 35B-A3B (22GB) or Gemma4 26B-A4B (17GB).

David Cramer @zeeg ·
@Danmoreng i don't think that's actually true? you might be able to do some fun children's stories, but i don't think local hardware runs an equiv to gpt.. 4?

David Cramer @zeeg ·
an awful lot of people promote local models when they're unusable (hardware-wise, perf-wise, or simply in outcomes). one of the many small litmus tests of "does this person have anything to contribute to the conversation"

Christian P. @Bassmaster187 ·
When I look at what free AI LLMs that can be used locally are capable of (a decent graphics card assumed), it becomes really hard to see how OpenAI, xAI and Anthropic will ever recoup the hundreds of billions. The first steps I took were no worse than Codex or Opus. I will report back in the coming days. Qwen 2.5-coder 14b is pretty fast on my setup.
[images]

Sebastian @Danmoreng ·
@TheEthanDing As someone who has to use Workday, this worries me about Anthropic's software quality. 🙃

Sebastian @Danmoreng ·
@karpathy I count myself in the second group, as a daily Codex user for lots of programming. While current coding agents are crazy good already, they still lack A LOT of good software engineering. You need to steer closely to avoid rookie mistakes, and without SWE knowledge you cannot do that.

Andrej Karpathy @karpathy ·
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.
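Karpathy's point about verifiable rewards can be made concrete: for code, a reward function can literally run the tests. A minimal sketch of that idea; `verifiable_reward` is a hypothetical helper, not any lab's actual training code:

```python
def verifiable_reward(candidate_fn, test_cases):
    """Reward 1.0 only if the candidate passes every test case.

    Unlike judging prose, this yields an unambiguous binary signal,
    which is what makes coding tasks so amenable to RL training.
    """
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # crashes count as failure too
    return 1.0

# Example: grading two candidate implementations of absolute value.
tests = [((3,), 3), ((-4,), 4), ((0,), 0)]
good = lambda x: x if x >= 0 else -x
bad = lambda x: x  # fails on negatives
print(verifiable_reward(good, tests), verifiable_reward(bad, tests))  # 1.0 0.0
```

Writing quality has no such oracle, which is one reason those capabilities have climbed more slowly.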

Sebastian @Danmoreng ·
@wesbos Remind me in ~3 years, when all major browsers have this in stable release...

Taelin @VictorTaelin ·
Any local LLM nerds around? I'm trying to run speculative decoding on Gemma 26B A4B. I'm a newbie at running that stuff locally; got it to 200 B tokens/s on B200. I wonder if I could make it much faster?

Sebastian @Danmoreng ·
@LyalinDotCom Until a salesman presents a customer with a solution that he vibe-coded, and the customer wants exactly that, right up until he hears from the engineers what the actual timeline and costs for a proper implementation would be. 🥲

Dmitry Lyalin @LyalinDotCom ·
Yes. This.
Ivan Burazin @ivanburazin
The founder of Postman says you have to kill your existing org chart, especially if you're still operating with a pre-AI hierarchy. The modern org chart, according to @a85:
- wide span of control (even within the exec team)
- work directly with ICs, not through layers
- either you're building, or you're selling
Projects are led by staff/principal engineers with high agency. They see across the board as well as deep in the stack. Product managers are building APIs and prototyping in Claude instead of writing PRDs. Designers are shipping PRs through Cursor directly instead of relying solely on Figma. Everyone is building. And management's job is to develop better judgment.

Sebastian @Danmoreng ·
@zeeg What do you need WSL for though? Just use codex-cli in a normal PowerShell; it works fine. The Codex app didn't work for me when I tried it, though.

David Cramer @zeeg ·
can someone besides Microsoft please make a coding harness (UI) that works with WSL? I realize I'm a unicorn over here running Windows, but you too one day will get fed up with iOS as a desktop OS.

Sebastian @Danmoreng ·
@antirez @BlackOpsREPL For one-shot it is, but it still works okayish. Just tried that this weekend: extract how Google's Android app Edge Gallery uses Gemma 4, to implement that in my own app. Worked pretty well, until I tried GPU instead of CPU. Had Codex look again: it had forgotten the GPU libraries.

antirez @antirez ·
@BlackOpsREPL I believe "extract xyz" -> MD file is too lossy.

antirez @antirez ·
One of the most powerful automatic coding (autocoding) tricks that almost nobody uses: "Create this new project as specified in SPEC.md, using the code at /foo/bar/ as a guide for coding style, design sensibility, comments, ...". Style transfer is very powerful.

Ivan Fioravanti ᯅ @ivanfioravanti ·
Is anyone able to tell me the llama-server arguments used to create this? I tried many possible combinations without success. Thanks to anyone willing to help 🙏 I love this idea and I'd like to dig deeper.
Georgi Gerganov @ggerganov
Let me demonstrate the true power of llama.cpp:
- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding
The result: 300t/s (realtime video)
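For Ivan's question, a plausible starting point rather than the actual command from Gerganov's video; the model filenames and flag values below are placeholders, and the exact speculative-decoding setup shown there is unknown:

```shell
# Hypothetical llama-server invocation (guessed values, placeholder paths).
llama-server \
  -m gemma-4-26b-a4b-Q8_0.gguf \
  -md gemma-draft.gguf \
  --ctx-size 16384 \
  --n-gpu-layers 99 \
  --draft-max 16 \
  --draft-min 1 \
  --host 127.0.0.1 --port 8080
# -md supplies a draft model for speculative decoding; the built-in
# WebUI is then served at http://127.0.0.1:8080 out of the box.
```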