g023

5K posts

g023

@g023dev

developer/programmer/ai nerd

Canada Присоединился Ekim 2023

2.3K Подписки515 Подписчики

Закреплённый твит

g023@g023dev·25 Nis

So I optimized the model, i optimized the harness, now I'm optimizing the endpoint by making an openai api to deepseek endpoint proxy that has some context compression features automatically integrated to attempt to save $$$ (works well with copilot): gist.github.com/g023/c2bb7b540…

English

290

g023@g023dev·2m

@merlinaudio_ I prefer it for remembering what was done where, but still like to make excuses to burn my opus rips.

English

merlin@merlinaudio_·5h

in the age of AI at least some of us still code by hand

English

568

54.3K

g023@g023dev·27m

Check out Matthew Wesley on Facebook from Regina, Sask

English

g023@g023dev·1h

@n3r4 @vineerpasam Yep. They pointed at a mazak and said can you program this, and I was like yup and bam I got that job. Modern problems = modern solutions and all.

English

quick@n3r4·2h

that's a dumb interviewer. As a machinist/fabricator for 30+ years. we dont ask ppl to recite the machinery handbook, or specifics on weld amperage/wire speeds or pen depth.. or all of gd&t or cad/cam post processors. Even the most skilled 'engineers' cant answer them on the spot. bottom line is.. can they use their tools. id fire the interviewer. find someone that knows how to find talent.

English

127

Vineer@vineerpasam·8h

My vibe coder friend built multiple AI apps over the past year. He went into an interview yesterday thinking the company would be impressed by his project showcase. The interviewer asked him the difference between Git merge and Git rebase. My friend has never even pushed code without Claude Code's help 😭

English

669

137.1K

g023@g023dev·1h

Do you believe in Magic? (@deepseek_ai v4 powered)

English

g023@g023dev·1h

@tapodhana_ use llamacpp server. Way faster and more compatible.

English

tapodhana@tapodhana_·12h

Spent 2–3 hours setting up ollama, only to end up with dumb models that can’t even use hermes tools properly. Any local models that actually work well for instruction following and tool calling on 16gb ram and 6gb vram? Currently using gemma4:e2b(doesn't work!)

English

311

g023@g023dev·8h

@oota_yoshinori0 thats the beauty of open source... its an adventure.

English

Yoshinori Oota@oota_yoshinori0·15h

ほう。試してみるか。

ollama@ollama

Gemma 4 Quantization-Aware Training (QAT) weights are now available on Ollama! They reduce memory requirements while maintaining model quality. E2B: ollama run gemma4:e2b-it-qat E4B: ollama run gemma4:e4b-it-qat 12B: ollama run gemma4:12b-it-qat 26B: ollama run gemma4:26b-a4b-it-qat 31B: ollama run gemma4:31b-it-qat Try them with ollama launch integrations to use with your favorite tools 👇👇👇

日本語

g023@g023dev·8h

@RoyShilkrot ... also a lot less reading to see what it messed around with.

English

g023@g023dev·8h

@RoyShilkrot it truly does help to isolate and work on the problem as a component, rather than the whole, for speed and token efficiency. Especially when dealing with smaller models for tasks.

English

Roy Shilkrot@RoyShilkrot·1d

The bigger your software project is - the higher the context token cost is. Therefore the KISS principle in software dev still holds, 40 years after its inception. Intelligence is intelligence. Artificial or Human. Holding too much in context doesn’t scale. Keep small. Pay less

English

268

g023@g023dev·9h

@hyuki I mean 12b would be a bit large for that task for most people and might be really slow on large volumes of pics. You can find some nice ~2b models that can do a good enough job for most purposes.

English

129

結城浩 / Hiroshi Yuki@hyuki·21h

LM Studio + gemma-4-12b-qat でOpenAI コンパチなAPI持つローカルサーバ立ち上げると、無課金で画像処理AIが使える。たとえば大量のスクリーンショットや写真の分類整理やタグ付けにはぴったりではないだろうか。クラウドに出すのに抵抗があり、分量が多く、スピードと精度はそこそこで良い。

日本語

505

50.5K

g023@g023dev·9h

@antirez @ivanfioravanti I have little faith in those benchmarks. Real life is the best test.

English

237

antirez@antirez·9h

@ivanfioravanti No, if it misrepresents models in random ways, how is it good? Only because 5.5 happens to be on top?

English

antirez@antirez·10h

For days, many folks here are citing DeepSWE as the benchmark that restores reality only because it shows GPT 5.5 on top. But actually, it almost gets a single entry right: the top one, and all the rest is shuffled.

English

154

26.1K

g023@g023dev·9h

@lmrankhan depending on the task, cleaving out the subagents altogether gives some surprisingly good results.

English

Imran@lmrankhan·17h

A lot of people are talking about running tons of agents, parallel workflows, skills, and orchestration layers. Honestly, for building an app, I've found two coding agents running in async works perfectly fine, Codex for backend and Opus/Claude Code for frontend. Haven't had to use more than that, skills, or complex workflows. The bottleneck is usually figuring out what to build, not how many agents you're running or using any of the advanced workflows. I'm sure there are more advanced things people are doing, but for most MVPs or early stage products, simplicity works

English

321

31K

g023@g023dev·10h

@yuhasbeentaken wow thats a pretty good incentive to burn tokens

English

Yum⋆₊˚@yuhasbeentaken·22h

at tencent (china’s largest internet company), the token reimbursement quota is dynamic. the more you use, the more you get when it refreshes next month. so… it kinda looks like you’re incentivized to build side projects at work? 😂😂

Zack Korman@ZackKorman

Companies are like "we are spending all this money on AI but we don't know what the devs are even doing with it." Let me answer that for you: They're working on their personal side projects.

English

1.4K

g023@g023dev·10h

@djcows Does give a bit of an esteem bump for the day when some bigwig top-dawg gives a response to your random yellings on the internet.

English

djcows@djcows·14h

you can dm some of the smartest people on earth here and they'll sometimes just answer casually, it's honestly crazy and humbling

English

149

3.1K

g023@g023dev·10h

@RoguePoma I share a lot of my things in public, and yes sometimes they are pretty raw but useful to me. Always like to learn from others too.

English

Alex Poma 🏗️@RoguePoma·1d

I’m building construction SaaS in public while working full-time as an architect. I’d like to connect with more people doing the same kind of thing: - Building products. - Learning in public. - Sharing the messy times - Sharing the good times What are you building?

English

1.4K

g023@g023dev·10h

@shyamalanadkat I think what would qualify as AGI would be a session that is always on, has infinite history that doesn't need to be cleared, and carries out its business on its own, either seeding itself with roles, or being directed to a role as a seed role.

English

shyamal@shyamalanadkat·22h

early days of agi are going to be so special

English

g023@g023dev·10h

@rajyaligar @smhanov try using deepseek as a subagent and opus as orchestrator to stretch out the window

English

Raj@rajyaligar·13h

@smhanov Had to upgrade my codex sub this month from 5x to 20x cause the 5 hour window wasn’t cutting it for my workflows Big month for shipping

English

Steve Hanov@smhanov·1d

What was your AI bill? I've been using Claude Code and Hermes pretty heavily and up to $33 last month

English

273

g023@g023dev·11h

Made a deepseek powered agentic html editor tonite that runs amazing (of course because deepseek is amazing). Man we've come a long ways since Dreamweaver lol. Oh ya, deepseek made it too.

English

g023 ретвитнул

Viv@Vtrivedy10·10 Mar

x.com/i/article/2031…

ZXX

339

2.2K

793.3K

g023@g023dev·13h

@antoniolupetti I'm working on a concept: an agent that maintains a large, external, sparse key-value memory (not vector database, but differentiable memory like a sparse Transformer memory layer) that is updated during a single long session compressing past into mem tkns & retrieve w/attention

English

Antonio Lupetti@antoniolupetti·1d

"Graph Memory for LLM Agents" is a recent paper that explores an idea that I find quite interesting. Most AI memory systems treat remembering as a retrieval problem (the model searches its memory, retrieves relevant information, and then reasons about it). This paper argues that the process may be more dynamic than that and, instead of simply retrieving memories, an AI agent could reconstruct them during reasoning, following clues, associations, and intermediate evidence as they emerge. What I find interesting is the possibility that memory and reasoning may not be separate processes at all, but that remembering itself could be part of reasoning. arxiv.org/abs/2606.06036

English

2.5K

g023@g023dev·13h

@dosco Try the LFM2.5 models (especially the 8b A1B moe)

English

spacy@dosco·20h

my whole feed is local models after the big drops last week excited for this future it’s also exactly where DSPy and RLM wins

Alok@analogalok

a new 8GB VRAM GPU dense Local LLM leader was born yesterday runs on: RTX 4060 / RTX 3070 / RTX 2080. any 8GB card Qwen 3.5 9B (dense) was the go to for 6-8GB VRAM builds. Gemma 4 12B QAT (dense) just changed that. same llama.cpp + cuda 13.2. i7 12700H. 16GB RAM. same -ngl 99 flags. same 48k context. unsloth gemma-4-12b-it-Q4_K_M.gguf → 15 tok/sec @ 48k ctx unsloth gemma-4-12B-it-qat-UD-Q4_K_XL.gguf → 32 tok/sec @ 48k ctx → 26 tok/sec @ 64k ctx 64k context is a big deal. Hermes 3 agent requires 64k minimum to run. you're now getting full hermes compatible context on a budget consumer GPU at 26 tok/sec locally. 2.1x faster on identical hardware. and here's the part that breaks your brain: the QAT-UD-Q4_K_XL is actually SMALLER than the Q4_K_M "XL" why? QAT = Quantization Aware Training Google didn't train the model first and compress it later they trained it to be quantized from day one the weights already know how to survive low precision that's why you get more quality per byte llamacpp flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -cnv -ngl 99 -c 48000 -v fits in 8GB VRAM clean. no API. no cloud. no subscription. and this isn't even the MTP variant yet Gemma-4-E2B QAT runs on 3GB RAM, E4B on 5GB, 12B on 7GB, 26-A4B on 15GB and 31B on 18GB. I have benchmarked the 26b and 31b qat as well on a single RTX 4090, checkout the comments for details. If you have a 6GB or 8GB VRAM GPU, post your numbers. more benchmarks and configs coming soon

English

3.3K

Открыть

@merlinaudio_ @n3r4 @vineerpasam @deepseek_ai @tapodhana_ @oota_yoshinori0 @RoyShilkrot @hyuki