Aivan Monceller

1K posts

@aivandroid

I share interesting things I find, what I’m learning, and projects I’m trying. Follow me if you like tech, creative stuff, and learning new things. INFJ-T

Singapore ⇄ Philippines · Joined March 2018
1K Following · 187 Followers
송준 Jun Song@songjunkr·
DO NOT ASK AI ANYTHING ABOUT LOCAL LLMs. They are not up to date. 😡
Q: what is the optimal local LLM for a 3090 GPU?
Gemini-3.1-pro: Qwen 2.5, Llama 3.1
GPT-Instant: Qwen3.6 35b, Qwen3 30b
Sonnet-4.6: Qwen3 14b, Qwen3.5 27b, Deepseek R1
Grok-Fast: Qwen3.5, Qwen3, GLM-4.7-Flash
None of these is the correct answer. Same results from Opus-4.7, Grok4.3, Gemini Deepthink.
Only GPT5.5-PRO got the right answer: Qwen3.6-27b.
Now I know why people keep saying local LLM is stupid. 😮‍💨
Catalin@catalinmpit·
Now that I have the ChatGPT Pro plan, give me your best Codex tips and tricks.
Theo - t3.gg@theo·
I sent a single message on Copilot and it did over 60m tokens. It's still going. $30 of inference so far. In their current billing model, you get 1,500 messages, regardless of how expensive each is. I'm pretty sure I can do $45,000 of messaging on this plan
Aivan Monceller@aivandroid·
@alicalimli_dev This is a game changer. I've done so many hacks in the past just to keep that scrollbar stable.
Ali@alicalimli_dev·
This one line of CSS will fix the annoying layout shift that scrollbars cause. It happens when a non-scrollable container becomes scrollable due to its content. This gets rid of that problem:

.container { scrollbar-gutter: stable; }

With that, space is reserved for the scrollbar before it even appears, so there are no layout shifts when content grows. Use both-edges if your content is centered: it mirrors the reserved space on both sides of the container to keep the layout balanced. If you found this one useful, follow for more. ❤️
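A tiny companion sketch for containers created from script rather than a stylesheet. It assumes an element matching .container exists, and that the browser (and your TS DOM typings) support the scrollbar-gutter property, which is exposed in camelCase on element.style:

```ts
// Reserve scrollbar space at runtime; same effect as the CSS rule above.
const container = document.querySelector<HTMLElement>(".container");
if (container) {
  // "stable" reserves the gutter on the scrollbar side only;
  // "stable both-edges" mirrors it so centered content stays centered.
  container.style.scrollbarGutter = "stable both-edges";
}
```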
Tibo@thsottiaux·
Tell your neighbor they can just codex things. Then come back and share their reaction with me here. How confused are they on a scale of 1 to 10?
Aivan Monceller@aivandroid·
Eight years in, custom-elements-ts just got its biggest update: a reactive html/render() runtime with deeply-proxied @State, in addition to the 6-decorator API I shipped in 2018. Native Web Components in TypeScript. Zero dependencies. Framework free. geocine.github.io/custom-element…
Aivan Monceller@aivandroid·
I built and have been maintaining custom-elements-ts since 2018. TS decorators for native Web Components, zero deps. It sat quietly for years. This weekend GPT-5.5 + Codex helped me ship the part I never finished: a reactive html/render() runtime with deeply proxied @State, plus a todo-dashboard demo and a live showcase site. Same 2018 API. New reactive core. Still framework free. geocine.github.io/custom-element…
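The tweet doesn't show the runtime's internals, but the "deeply proxied @State" idea can be sketched generically. This is a minimal illustration of deep-proxy reactivity, not the actual custom-elements-ts API; deepProxy and onChange are made-up names:

```ts
// A generic sketch of "deeply proxied" reactive state: every nested object
// is wrapped in a Proxy, so a mutation at any depth can schedule a re-render.
function deepProxy<T extends object>(target: T, onChange: () => void): T {
  return new Proxy(target, {
    get(obj, key, receiver) {
      const value = Reflect.get(obj, key, receiver);
      // Wrap nested objects/arrays on access so `state.a.b.c = x` is observed too.
      // (A real implementation would cache these wrappers.)
      return typeof value === "object" && value !== null
        ? deepProxy(value as object, onChange)
        : value;
    },
    set(obj, key, value, receiver) {
      const ok = Reflect.set(obj, key, value, receiver);
      onChange(); // schedule a re-render after every mutation
      return ok;
    },
  });
}

// Usage: mutating deep inside the tree still triggers the callback.
const state = deepProxy({ todos: [{ done: false }] }, () => console.log("re-render"));
state.todos[0].done = true; // logs "re-render"
```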
Vaibhav (VB) Srivastav
Weekend hack: Build with GPT-5.5 + Codex. Drop your demo in the replies.
#1 by likes: 1 year of ChatGPT Pro
2 runner-ups: 6 months each
Bonus: Codex picks a wild card winner. Enjoy!
Aivan Monceller@aivandroid·
@kohya_tech I hope you get it back. Pro is expensive, and I can't even imagine life nowadays without ChatGPT.
Kohya Tech@kohya_tech·
My subscription period hasn't run out yet, so this has to be a bug... I talked to the chatbot and got escalated to a human, but not being able to use it for a while is a real pain.
Kohya Tech@kohya_tech·
Somehow the ChatGPT web app suddenly switched to the Free plan (;・∀・) I'm subscribed to Pro, though…
Aivan Monceller@aivandroid·
I built a fork of llama-swap that turns llama.cpp into a full OpenAI + Anthropic compatible server with reliable hot-swapping. Added protected web UI with persistent chats and image support, one-click GPU deploy, model edit/duplicate/delete, Codex support, and encrypted activity captures. github.com/geocine/llama-…
Aivan Monceller@aivandroid·
if you aren’t running e2e tests on your TUI, you’re just shipping and praying. i finally got this full terminal-emulator automation harness working on my local machine. no more manual clicking to see if the UI broke. how are you guys testing terminal layouts? or are we all just "eyeballing it" in 2026?
Chris Tate@ctatedev

Terminal automation + e2e testing: solved. Now as simple as snapshot, click, type:
– wterm renders terminal-in-html, every cell in the a11y tree
– agent-browser automates pages via the a11y tree
Here's opencode in one browser driving Claude Code in another

Aivan Monceller@aivandroid·
@kohya_tech 8 tokens/sec is unusable though. But still impressive considering the 4GB VRAM
Kohya Tech@kohya_tech·
Gemma-4 26B-A4B is getting around 8 tokens/sec even on a laptop with 4GB of VRAM.
Aivan Monceller@aivandroid·
the era of the "code monkey" is officially over. if you're still measuring your value by lines of code, you're already a bottleneck. the next generation of 10x devs won't ship features. they'll ship the systems that ship features. from operator to architect.
Tibo@thsottiaux·
Send us feature requests for Codex in the form of an Images 2.0-generated image. It makes it easier for Codex to implement if we decide to go for it. Saw some good ones today already that Codex is cooking on.
Gorilla Rogue A.I.@GorillaRogueGam·
@sudoingX People can run Q8 Qwen 3.6 27b with full context in LM Studio easily as long as they have 64gb of system ram. Flash attention and KV cache for the win.
Sudo su@sudoingX·
"how do you fit qwen 3.6 27b q4 on 24gb at 262k context" lands in my dms 5 times a week. here is the exact memory math. model bytes at idle = 16gb (q4_k_m of 27b dense) kv cache at 262k context with q4_0 for both k and v = 5gb total = 21gb on the card headroom = 3gb for prompts and tool call traces the magic is the kv cache type. most people leave it at default fp16 or push to q8 thinking quality wins. on qwen 3.6 27b dense at 262k: - fp16 kv cache = does not fit at all - q8 kv cache = fits at 23gb but runs 3x slower (double penalty: more vram, less speed) - q4_0 kv cache = fits at 21gb at full speed (40 tok/s flat curve, same speed at 4k or 262k) most builders never test the kv cache type because tutorials never mention it. it is the single biggest unlock on consumer 24gb hardware. flags i run: ./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 what they do: -ngl 99 = offload everything to gpu -c 262144 = 262k context window -np 1 = single user slot (do not enable multi-slot, eats headroom) -fa on = flash attention on (memory and speed both win) --cache-type-k q4_0 --cache-type-v q4_0 = the unlock if you are sitting on 24gb and not running this config, you are leaving 250k of context on the table. or worse, you are running q8 kv cache and burning 3x your speed for nothing. q4 is not a compromise on consumer hardware. it is the right call.
Nigmat@OmniScopeBio·
@thsottiaux I love u tibo, I use codex every day for more than 10h (it runs even when I'm sleeping)
Tibo@thsottiaux·
Looking at the traffic dashboard for Codex just now, it would be scary if we didn't have a lot more compute coming online in the coming weeks. All according to plan, fortunately.