g023

5K posts

g023

@g023dev

developer/programmer/ai nerd

Canada Beigetreten Ekim 2023

2.3K Folgt514 Follower

Angehefteter Tweet

g023@g023dev·25 Nis

So I optimized the model, i optimized the harness, now I'm optimizing the endpoint by making an openai api to deepseek endpoint proxy that has some context compression features automatically integrated to attempt to save $$$ (works well with copilot): gist.github.com/g023/c2bb7b540…

English

285

g023@g023dev·6h

@oota_yoshinori0 thats the beauty of open source... its an adventure.

English

Yoshinori Oota@oota_yoshinori0·12h

ほう。試してみるか。

ollama@ollama

Gemma 4 Quantization-Aware Training (QAT) weights are now available on Ollama! They reduce memory requirements while maintaining model quality. E2B: ollama run gemma4:e2b-it-qat E4B: ollama run gemma4:e4b-it-qat 12B: ollama run gemma4:12b-it-qat 26B: ollama run gemma4:26b-a4b-it-qat 31B: ollama run gemma4:31b-it-qat Try them with ollama launch integrations to use with your favorite tools 👇👇👇

日本語

2.2K

g023@g023dev·6h

@RoyShilkrot ... also a lot less reading to see what it messed around with.

English

g023@g023dev·6h

@RoyShilkrot it truly does help to isolate and work on the problem as a component, rather than the whole, for speed and token efficiency. Especially when dealing with smaller models for tasks.

English

Roy Shilkrot@RoyShilkrot·23h

The bigger your software project is - the higher the context token cost is. Therefore the KISS principle in software dev still holds, 40 years after its inception. Intelligence is intelligence. Artificial or Human. Holding too much in context doesn’t scale. Keep small. Pay less

English

265

g023@g023dev·6h

@hyuki I mean 12b would be a bit large for that task for most people and might be really slow on large volumes of pics. You can find some nice ~2b models that can do a good enough job for most purposes.

English

103

結城浩 / Hiroshi Yuki@hyuki·18h

LM Studio + gemma-4-12b-qat でOpenAI コンパチなAPI持つローカルサーバ立ち上げると、無課金で画像処理AIが使える。たとえば大量のスクリーンショットや写真の分類整理やタグ付けにはぴったりではないだろうか。クラウドに出すのに抵抗があり、分量が多く、スピードと精度はそこそこで良い。

日本語

484

48.2K

g023@g023dev·6h

@antirez @ivanfioravanti I have little faith in those benchmarks. Real life is the best test.

English

213

antirez@antirez·6h

@ivanfioravanti No, if it misrepresents models in random ways, how is it good? Only because 5.5 happens to be on top?

English

1.8K

antirez@antirez·8h

For days, many folks here are citing DeepSWE as the benchmark that restores reality only because it shows GPT 5.5 on top. But actually, it almost gets a single entry right: the top one, and all the rest is shuffled.

English

136

21.9K

g023@g023dev·6h

@lmrankhan depending on the task, cleaving out the subagents altogether gives some surprisingly good results.

English

Imran@lmrankhan·14h

A lot of people are talking about running tons of agents, parallel workflows, skills, and orchestration layers. Honestly, for building an app, I've found two coding agents running in async works perfectly fine, Codex for backend and Opus/Claude Code for frontend. Haven't had to use more than that, skills, or complex workflows. The bottleneck is usually figuring out what to build, not how many agents you're running or using any of the advanced workflows. I'm sure there are more advanced things people are doing, but for most MVPs or early stage products, simplicity works

English

307

28.8K

g023@g023dev·7h

@yuhasbeentaken wow thats a pretty good incentive to burn tokens

English

Yum⋆₊˚@yuhasbeentaken·19h

at tencent (china’s largest internet company), the token reimbursement quota is dynamic. the more you use, the more you get when it refreshes next month. so… it kinda looks like you’re incentivized to build side projects at work? 😂😂

Zack Korman@ZackKorman

Companies are like "we are spending all this money on AI but we don't know what the devs are even doing with it." Let me answer that for you: They're working on their personal side projects.

English

1.4K

g023@g023dev·7h

@djcows Does give a bit of an esteem bump for the day when some bigwig top-dawg gives a response to your random yellings on the internet.

English

djcows@djcows·12h

you can dm some of the smartest people on earth here and they'll sometimes just answer casually, it's honestly crazy and humbling

English

138

2.8K

g023@g023dev·7h

@RoguePoma I share a lot of my things in public, and yes sometimes they are pretty raw but useful to me. Always like to learn from others too.

English

Alex Poma 🏗️@RoguePoma·1d

I’m building construction SaaS in public while working full-time as an architect. I’d like to connect with more people doing the same kind of thing: - Building products. - Learning in public. - Sharing the messy times - Sharing the good times What are you building?

English

1.4K

g023@g023dev·7h

@shyamalanadkat I think what would qualify as AGI would be a session that is always on, has infinite history that doesn't need to be cleared, and carries out its business on its own, either seeding itself with roles, or being directed to a role as a seed role.

English

shyamal@shyamalanadkat·19h

early days of agi are going to be so special

English

2.8K

g023@g023dev·7h

@rajyaligar @smhanov try using deepseek as a subagent and opus as orchestrator to stretch out the window

English

Raj@rajyaligar·10h

@smhanov Had to upgrade my codex sub this month from 5x to 20x cause the 5 hour window wasn’t cutting it for my workflows Big month for shipping

English

Steve Hanov@smhanov·1d

What was your AI bill? I've been using Claude Code and Hermes pretty heavily and up to $33 last month

English

261

g023@g023dev·8h

Made a deepseek powered agentic html editor tonite that runs amazing (of course because deepseek is amazing). Man we've come a long ways since Dreamweaver lol. Oh ya, deepseek made it too.

English

g023 retweetet

Viv@Vtrivedy10·10 Mar

x.com/i/article/2031…

ZXX

337

2.2K

784.5K

g023@g023dev·10h

@antoniolupetti I'm working on a concept: an agent that maintains a large, external, sparse key-value memory (not vector database, but differentiable memory like a sparse Transformer memory layer) that is updated during a single long session compressing past into mem tkns & retrieve w/attention

English

Antonio Lupetti@antoniolupetti·22h

"Graph Memory for LLM Agents" is a recent paper that explores an idea that I find quite interesting. Most AI memory systems treat remembering as a retrieval problem (the model searches its memory, retrieves relevant information, and then reasons about it). This paper argues that the process may be more dynamic than that and, instead of simply retrieving memories, an AI agent could reconstruct them during reasoning, following clues, associations, and intermediate evidence as they emerge. What I find interesting is the possibility that memory and reasoning may not be separate processes at all, but that remembering itself could be part of reasoning. arxiv.org/abs/2606.06036

English

2.4K

g023@g023dev·11h

@dosco Try the LFM2.5 models (especially the 8b A1B moe)

English

spacy@dosco·17h

my whole feed is local models after the big drops last week excited for this future it’s also exactly where DSPy and RLM wins

Alok@analogalok

a new 8GB VRAM GPU dense Local LLM leader was born yesterday runs on: RTX 4060 / RTX 3070 / RTX 2080. any 8GB card Qwen 3.5 9B (dense) was the go to for 6-8GB VRAM builds. Gemma 4 12B QAT (dense) just changed that. same llama.cpp + cuda 13.2. i7 12700H. 16GB RAM. same -ngl 99 flags. same 48k context. unsloth gemma-4-12b-it-Q4_K_M.gguf → 15 tok/sec @ 48k ctx unsloth gemma-4-12B-it-qat-UD-Q4_K_XL.gguf → 32 tok/sec @ 48k ctx → 26 tok/sec @ 64k ctx 64k context is a big deal. Hermes 3 agent requires 64k minimum to run. you're now getting full hermes compatible context on a budget consumer GPU at 26 tok/sec locally. 2.1x faster on identical hardware. and here's the part that breaks your brain: the QAT-UD-Q4_K_XL is actually SMALLER than the Q4_K_M "XL" why? QAT = Quantization Aware Training Google didn't train the model first and compress it later they trained it to be quantized from day one the weights already know how to survive low precision that's why you get more quality per byte llamacpp flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -cnv -ngl 99 -c 48000 -v fits in 8GB VRAM clean. no API. no cloud. no subscription. and this isn't even the MTP variant yet Gemma-4-E2B QAT runs on 3GB RAM, E4B on 5GB, 12B on 7GB, 26-A4B on 15GB and 31B on 18GB. I have benchmarked the 26b and 31b qat as well on a single RTX 4090, checkout the comments for details. If you have a 6GB or 8GB VRAM GPU, post your numbers. more benchmarks and configs coming soon

English

3.1K

g023@g023dev·11h

@ThePeterMick Haven't got me yet.

English

Peter Mick@ThePeterMick·1d

If you’re verified on X I want to follow you back Let me know if I haven’t followed you back

English

165

134

8.8K

g023@g023dev·11h

@Hikari_07_jp Proxmox?

English

Hikari∣LocalLLM⚡@Hikari_07_jp·13h

I'm in Tokyo for an AI-related conference. I'm 400 kilometers away from my home lab, but I can remotely connect using my Macbook and run experiments using VRAM anytime. To put it mildly, it's awesome✨

English

1.1K

g023@g023dev·11h

@TomTSEC the government is stealing money from the majority to give to a certain class of voters to buy their vote.

English

Tom Quiggin@TomTSEC·1d

Things have gotten so bad in Canada that the government is handing out money to people so they can afford groceries.

English

118

602

g023@g023dev·11h

@SolaTheAnalyst Try owning one in Calgary lol. Can't live without it, but you'll get taken to the cleaners.

English

Sola 🇨🇦🇳🇬@SolaTheAnalyst·1d

Owning a car in Toronto is a personality disorder. 🇨🇦 $200 insurance before you move it. $300 parking if you work downtown. The 401 on a Friday. The TTC is $156 a month. But sure. Keep the car.

English

135

327

62.5K

g023@g023dev·11h

@Sean_Speer Well considering AI is now being used in Alberta and BC to write all the police reports, guess what you'll be up against in court? These datacenters are for them, not you, but they'll be used against you for sure.

English

Sean Speer@Sean_Speer·1d

The Carney government gets it wrong on AI This week, the Carney government released AI for All, its long-awaited national artificial intelligence strategy. Although there are some useful aspects to the strategy—including the government’s recognition that Canada suffers too little AI adoption—its central premise is basically wrong. The document repeatedly frames AI through the lens of “sovereignty,” including the need for greater control over AI infrastructure, data, and advanced models. But sovereignty is a poor organizing principle for Canadian AI policy. Frontier AI development is increasingly concentrated among a handful of American and Chinese firms with capital budgets that exceed the annual spending of most national governments. The hyperscalers are investing hundreds of billions of dollars in chips, data centres, models, and talent. The notion that Ottawa can engineer a domestically controlled frontier AI ecosystem capable of competing head-to-head with those firms is an unserious starting point for Canadian policy. University of Toronto economist @Afinetheorem has made the point particularly well. In his view, countries such as Canada face a simple strategic choice: they must find a way to become essential to either the American or Chinese AI stack. Attempting to recreate a fully sovereign stack of our own is neither economically realistic nor technologically plausible. That insight exposes the main weakness of the government’s approach. The strategy contains pages of discussion about Canadian leadership, sovereignty, and domestic capacity. Yet it says comparatively little about how Canada will position itself within the global AI ecosystem that’s already emerging. There’s little discussion of guaranteed access to frontier models, Canada’s role in AI supply chains, or how Canadian firms can become indispensable partners to the companies building the world’s most advanced systems. Canada has genuine advantages. We possess abundant energy resources, a strong research base, world-class universities, significant mineral assets, and geographic proximity to the United States. The goal should be to leverage those strengths to attract investment, host infrastructure, develop specialized applications, and deepen our integration into the North American AI economy. Put simply: Canada’s AI future is more likely to depend on integration than independence. Yet if policymakers become so preoccupied with the political goal of sovereignty, they risk undermining the country’s place in the AI economy around taking shape.

The Hub@TheHubCanada

.@Sean_Speer: The Carney government gets it wrong on AI thehub.ca/2026/06/05/the…

English

121

18.2K

Entdecken

@oota_yoshinori0 @RoyShilkrot @hyuki @antirez @ivanfioravanti @lmrankhan @yuhasbeentaken @djcows