Aivan Monceller

1K posts

@aivandroid

I share interesting things I find, what I’m learning, and projects I’m trying. Follow me if you like tech, creative stuff, and learning new things. INFJ-T

Singapore ⇄ Philippines · Joined March 2018
1K Following · 187 Followers
송준 Jun Song@songjunkr·
DO NOT ASK AI ANYTHING ABOUT LOCAL LLMs. They are not up to date. 😡
Q: what is the optimal local LLM for a 3090 GPU?
Gemini-3.1-pro: Qwen 2.5, Llama 3.1
GPT-Instant: Qwen3.6 35b, Qwen3 30b
Sonnet-4.6: Qwen3 14b, Qwen3.5 27b, Deepseek R1
Grok-Fast: Qwen3.5, Qwen3, GLM-4.7-Flash
None of these is the correct answer. Same results from Opus-4.7, Grok4.3, Gemini Deepthink.
Only GPT5.5-PRO got the right answer: Qwen3.6-27b.
Now I know why people keep saying local LLM is stupid. 😮‍💨
Catalin@catalinmpit·
Now that I have the ChatGPT Pro plan, give me your best Codex tips and tricks.
Theo - t3.gg@theo·
I sent a single message on Copilot and it did over 60m tokens. It's still going. $30 of inference so far. In their current billing model, you get 1,500 messages, regardless of how expensive each is. I'm pretty sure I can do $45,000 of messaging on this plan
Aivan Monceller@aivandroid·
@alicalimli_dev This is a game changer. I've done so many hacks in the past just to keep that scrollbar stable.
Ali@alicalimli_dev·
This one line of CSS will fix the annoying layout shift that scrollbars cause. It happens when a non-scrollable container becomes scrollable due to its content. This gets rid of that problem:

.container { scrollbar-gutter: stable; }

With that, space is reserved for the scrollbar before it even appears, so there are no layout shifts when content grows. Use both-edges if your content is centered: it mirrors the reserved space on both sides of the container to keep the layout balanced. If you found this one useful, follow for more. ❤️
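A tiny companion sketch for containers created from script rather than a stylesheet. It assumes an element matching .container exists, and that the browser (and your TS DOM typings) support the scrollbar-gutter property, which is exposed in camelCase on element.style:

```ts
// Reserve scrollbar space at runtime; same effect as the CSS rule above.
const container = document.querySelector<HTMLElement>(".container");
if (container) {
  // "stable" reserves the gutter on the scrollbar side only;
  // "stable both-edges" mirrors it so centered content stays centered.
  container.style.scrollbarGutter = "stable both-edges";
}
```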
Tibo@thsottiaux·
Tell your neighbor they can just codex things. Then come back and share their reaction with me here. How confused are they on a scale of 1 to 10?
Aivan Monceller@aivandroid·
Eight years in, custom-elements-ts just got its biggest update: a reactive html/render() runtime with deeply-proxied @State, in addition to the 6-decorator API I shipped in 2018. Native Web Components in TypeScript. Zero dependencies. Framework free. geocine.github.io/custom-element…
Aivan Monceller@aivandroid·
I built and have been maintaining custom-elements-ts since 2018. TS decorators for native Web Components, zero deps. It sat quietly for years. This weekend GPT-5.5 + Codex helped me ship the part I never finished: a reactive html/render() runtime with deeply proxied @State, plus a todo-dashboard demo and a live showcase site. Same 2018 API. New reactive core. Still framework free. geocine.github.io/custom-element…
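The tweet doesn't show the runtime's internals, but the "deeply proxied @State" idea can be sketched generically. This is a minimal illustration of deep-proxy reactivity, not the actual custom-elements-ts API; deepProxy and onChange are made-up names:

```ts
// A generic sketch of "deeply proxied" reactive state: every nested object
// is wrapped in a Proxy, so a mutation at any depth can schedule a re-render.
function deepProxy<T extends object>(target: T, onChange: () => void): T {
  return new Proxy(target, {
    get(obj, key, receiver) {
      const value = Reflect.get(obj, key, receiver);
      // Wrap nested objects/arrays on access so `state.a.b.c = x` is observed too.
      // (A real implementation would cache these wrappers.)
      return typeof value === "object" && value !== null
        ? deepProxy(value as object, onChange)
        : value;
    },
    set(obj, key, value, receiver) {
      const ok = Reflect.set(obj, key, value, receiver);
      onChange(); // schedule a re-render after every mutation
      return ok;
    },
  });
}

// Usage: mutating deep inside the tree still triggers the callback.
const state = deepProxy({ todos: [{ done: false }] }, () => console.log("re-render"));
state.todos[0].done = true; // logs "re-render"
```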
Vaibhav (VB) Srivastav
Weekend hack: Build with GPT-5.5 + Codex. Drop your demo in the replies.
#1 by likes: 1 year of ChatGPT Pro
2 runner-ups: 6 months each
Bonus: Codex picks a wild card winner. Enjoy!
Aivan Monceller@aivandroid·
@kohya_tech I hope you get it back. Pro is expensive, and I can't even imagine life nowadays without ChatGPT.
Kohya Tech@kohya_tech·
My subscription period hasn't run out yet, so this has to be a bug... I talked to the chatbot and got escalated to a human, but not being able to use it for a while is a real pain.
Kohya Tech@kohya_tech·
Somehow the ChatGPT web app suddenly switched to the Free plan (;・∀・) I'm subscribed to Pro, though…
Aivan Monceller@aivandroid·
I built a fork of llama-swap that turns llama.cpp into a full OpenAI + Anthropic compatible server with reliable hot-swapping. Added protected web UI with persistent chats and image support, one-click GPU deploy, model edit/duplicate/delete, Codex support, and encrypted activity captures. github.com/geocine/llama-…
Aivan Monceller@aivandroid·
if you aren’t running e2e tests on your TUI, you’re just shipping and praying. i finally got this full terminal-emulator automation harness working on my local machine. no more manual clicking to see if the UI broke. how are you guys testing terminal layouts? or are we all just "eyeballing it" in 2026?
Chris Tate@ctatedev

Terminal automation + e2e testing: solved. Now as simple as snapshot, click, type:
– wterm renders terminal-in-html, every cell in the a11y tree
– agent-browser automates pages via the a11y tree
Here's opencode in one browser driving Claude Code in another

Aivan Monceller@aivandroid·
@kohya_tech 8 tokens/sec is unusable though. But still impressive considering the 4GB VRAM
Kohya Tech@kohya_tech·
Gemma-4 26B-A4B is getting around 8 tokens/sec even on a laptop with 4GB of VRAM.
Aivan Monceller@aivandroid·
the era of the "code monkey" is officially over. if you're still measuring your value by lines of code, you're already a bottleneck. the next generation of 10x devs won't ship features. they'll ship the systems that ship features. from operator to architect.
Tibo@thsottiaux·
Send us feature requests for Codex in the form of an Images 2.0-generated image. It makes it easier for Codex to implement if we decide to go for it. Saw some good ones today already that Codex is cooking on.
Gorilla Rogue A.I.@GorillaRogueGam·
@sudoingX People can run Q8 Qwen 3.6 27b with full context in LM Studio easily as long as they have 64gb of system ram. Flash attention and KV cache for the win.
Sudo su@sudoingX·
"how do you fit qwen 3.6 27b q4 on 24gb at 262k context" lands in my dms 5 times a week. here is the exact memory math. model bytes at idle = 16gb (q4_k_m of 27b dense) kv cache at 262k context with q4_0 for both k and v = 5gb total = 21gb on the card headroom = 3gb for prompts and tool call traces the magic is the kv cache type. most people leave it at default fp16 or push to q8 thinking quality wins. on qwen 3.6 27b dense at 262k: - fp16 kv cache = does not fit at all - q8 kv cache = fits at 23gb but runs 3x slower (double penalty: more vram, less speed) - q4_0 kv cache = fits at 21gb at full speed (40 tok/s flat curve, same speed at 4k or 262k) most builders never test the kv cache type because tutorials never mention it. it is the single biggest unlock on consumer 24gb hardware. flags i run: ./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 what they do: -ngl 99 = offload everything to gpu -c 262144 = 262k context window -np 1 = single user slot (do not enable multi-slot, eats headroom) -fa on = flash attention on (memory and speed both win) --cache-type-k q4_0 --cache-type-v q4_0 = the unlock if you are sitting on 24gb and not running this config, you are leaving 250k of context on the table. or worse, you are running q8 kv cache and burning 3x your speed for nothing. q4 is not a compromise on consumer hardware. it is the right call.
Nigmat@OmniScopeBio·
@thsottiaux I love u tibo, I use codex every day for more than 10h (it runs even when I'm sleeping)
Tibo@thsottiaux·
Looking at the traffic dashboard for Codex just now, it would be scary if we didn't have a lot more compute coming online in the coming weeks. All according to plan, fortunately.