g023

5K posts

g023 banner
g023

g023

@g023dev

developer/programmer/ai nerd

Canada شامل ہوئے Ekim 2023
2.3K فالونگ515 فالوورز
پن کیا گیا ٹویٹ
g023
g023@g023dev·
So I optimized the model, i optimized the harness, now I'm optimizing the endpoint by making an openai api to deepseek endpoint proxy that has some context compression features automatically integrated to attempt to save $$$ (works well with copilot): gist.github.com/g023/c2bb7b540…
English
0
0
4
290
g023
g023@g023dev·
@merlinaudio_ I prefer it for remembering what was done where, but still like to make excuses to burn my opus rips.
English
0
0
0
1
merlin
merlin@merlinaudio_·
in the age of AI at least some of us still code by hand
English
39
29
565
53.7K
g023
g023@g023dev·
Check out Matthew Wesley on Facebook from Regina, Sask
g023 tweet media
English
0
0
0
8
g023
g023@g023dev·
@n3r4 @vineerpasam Yep. They pointed at a mazak and said can you program this, and I was like yup and bam I got that job. Modern problems = modern solutions and all.
English
0
0
1
13
quick
quick@n3r4·
that's a dumb interviewer. As a machinist/fabricator for 30+ years. we dont ask ppl to recite the machinery handbook, or specifics on weld amperage/wire speeds or pen depth.. or all of gd&t or cad/cam post processors. Even the most skilled 'engineers' cant answer them on the spot. bottom line is.. can they use their tools. id fire the interviewer. find someone that knows how to find talent.
English
1
0
1
127
Vineer
Vineer@vineerpasam·
My vibe coder friend built multiple AI apps over the past year. He went into an interview yesterday thinking the company would be impressed by his project showcase. The interviewer asked him the difference between Git merge and Git rebase. My friend has never even pushed code without Claude Code's help 😭
English
76
20
664
136.4K
g023
g023@g023dev·
Do you believe in Magic? (@deepseek_ai v4 powered)
English
0
0
1
12
g023
g023@g023dev·
@tapodhana_ use llamacpp server. Way faster and more compatible.
English
0
0
0
12
tapodhana
tapodhana@tapodhana_·
Spent 2–3 hours setting up ollama, only to end up with dumb models that can’t even use hermes tools properly. Any local models that actually work well for instruction following and tool calling on 16gb ram and 6gb vram? Currently using gemma4:e2b(doesn't work!)
tapodhana tweet media
English
2
0
5
310
g023
g023@g023dev·
@oota_yoshinori0 thats the beauty of open source... its an adventure.
English
0
0
0
13
g023
g023@g023dev·
@RoyShilkrot ... also a lot less reading to see what it messed around with.
English
0
0
0
10
g023
g023@g023dev·
@RoyShilkrot it truly does help to isolate and work on the problem as a component, rather than the whole, for speed and token efficiency. Especially when dealing with smaller models for tasks.
English
1
0
1
11
Roy Shilkrot
Roy Shilkrot@RoyShilkrot·
The bigger your software project is - the higher the context token cost is. Therefore the KISS principle in software dev still holds, 40 years after its inception. Intelligence is intelligence. Artificial or Human. Holding too much in context doesn’t scale. Keep small. Pay less
English
1
0
2
268
g023
g023@g023dev·
@hyuki I mean 12b would be a bit large for that task for most people and might be really slow on large volumes of pics. You can find some nice ~2b models that can do a good enough job for most purposes.
English
0
0
0
129
結城浩 / Hiroshi Yuki
LM Studio + gemma-4-12b-qat でOpenAI コンパチなAPI持つローカルサーバ立ち上げると、無課金で画像処理AIが使える。たとえば大量のスクリーンショットや写真の分類整理やタグ付けにはぴったりではないだろうか。クラウドに出すのに抵抗があり、分量が多く、スピードと精度はそこそこで良い。
日本語
12
77
505
50.5K
antirez
antirez@antirez·
@ivanfioravanti No, if it misrepresents models in random ways, how is it good? Only because 5.5 happens to be on top?
English
4
1
14
2K
antirez
antirez@antirez·
For days, many folks here are citing DeepSWE as the benchmark that restores reality only because it shows GPT 5.5 on top. But actually, it almost gets a single entry right: the top one, and all the rest is shuffled.
English
16
2
154
26.1K
g023
g023@g023dev·
@lmrankhan depending on the task, cleaving out the subagents altogether gives some surprisingly good results.
English
0
0
0
23
Imran
Imran@lmrankhan·
A lot of people are talking about running tons of agents, parallel workflows, skills, and orchestration layers. Honestly, for building an app, I've found two coding agents running in async works perfectly fine, Codex for backend and Opus/Claude Code for frontend. Haven't had to use more than that, skills, or complex workflows. The bottleneck is usually figuring out what to build, not how many agents you're running or using any of the advanced workflows. I'm sure there are more advanced things people are doing, but for most MVPs or early stage products, simplicity works
English
65
18
321
30.9K
g023
g023@g023dev·
@yuhasbeentaken wow thats a pretty good incentive to burn tokens
English
0
0
0
12
Yum⋆₊˚
Yum⋆₊˚@yuhasbeentaken·
at tencent (china’s largest internet company), the token reimbursement quota is dynamic. the more you use, the more you get when it refreshes next month. so… it kinda looks like you’re incentivized to build side projects at work? 😂😂
Zack Korman@ZackKorman

Companies are like "we are spending all this money on AI but we don't know what the devs are even doing with it." Let me answer that for you: They're working on their personal side projects.

English
3
0
12
1.4K
g023
g023@g023dev·
@djcows Does give a bit of an esteem bump for the day when some bigwig top-dawg gives a response to your random yellings on the internet.
English
0
0
0
31
djcows
djcows@djcows·
you can dm some of the smartest people on earth here and they'll sometimes just answer casually, it's honestly crazy and humbling
English
22
2
149
3.1K
g023
g023@g023dev·
@RoguePoma I share a lot of my things in public, and yes sometimes they are pretty raw but useful to me. Always like to learn from others too.
English
1
0
1
6
Alex Poma 🏗️
Alex Poma 🏗️@RoguePoma·
I’m building construction SaaS in public while working full-time as an architect. I’d like to connect with more people doing the same kind of thing: - Building products. - Learning in public. - Sharing the messy times - Sharing the good times What are you building?
English
45
0
37
1.4K
g023
g023@g023dev·
@shyamalanadkat I think what would qualify as AGI would be a session that is always on, has infinite history that doesn't need to be cleared, and carries out its business on its own, either seeding itself with roles, or being directed to a role as a seed role.
English
0
0
0
39
shyamal
shyamal@shyamalanadkat·
early days of agi are going to be so special
English
5
3
50
3K
g023
g023@g023dev·
@rajyaligar @smhanov try using deepseek as a subagent and opus as orchestrator to stretch out the window
English
0
0
0
15
Raj
Raj@rajyaligar·
@smhanov Had to upgrade my codex sub this month from 5x to 20x cause the 5 hour window wasn’t cutting it for my workflows Big month for shipping
English
2
0
0
22
Steve Hanov
Steve Hanov@smhanov·
What was your AI bill? I've been using Claude Code and Hermes pretty heavily and up to $33 last month
Steve Hanov tweet mediaSteve Hanov tweet media
English
2
0
5
273
g023
g023@g023dev·
Made a deepseek powered agentic html editor tonite that runs amazing (of course because deepseek is amazing). Man we've come a long ways since Dreamweaver lol. Oh ya, deepseek made it too.
g023 tweet media
English
0
0
1
29
g023
g023@g023dev·
@antoniolupetti I'm working on a concept: an agent that maintains a large, external, sparse key-value memory (not vector database, but differentiable memory like a sparse Transformer memory layer) that is updated during a single long session compressing past into mem tkns & retrieve w/attention
English
0
0
0
23
Antonio Lupetti
Antonio Lupetti@antoniolupetti·
"Graph Memory for LLM Agents" is a recent paper that explores an idea that I find quite interesting. Most AI memory systems treat remembering as a retrieval problem (the model searches its memory, retrieves relevant information, and then reasons about it). This paper argues that the process may be more dynamic than that and, instead of simply retrieving memories, an AI agent could reconstruct them during reasoning, following clues, associations, and intermediate evidence as they emerge. What I find interesting is the possibility that memory and reasoning may not be separate processes at all, but that remembering itself could be part of reasoning. arxiv.org/abs/2606.06036
Antonio Lupetti tweet media
English
5
2
54
2.5K
g023
g023@g023dev·
@dosco Try the LFM2.5 models (especially the 8b A1B moe)
English
1
0
0
27
spacy
spacy@dosco·
my whole feed is local models after the big drops last week excited for this future it’s also exactly where DSPy and RLM wins
Alok@analogalok

a new 8GB VRAM GPU dense Local LLM leader was born yesterday runs on: RTX 4060 / RTX 3070 / RTX 2080. any 8GB card Qwen 3.5 9B (dense) was the go to for 6-8GB VRAM builds. Gemma 4 12B QAT (dense) just changed that. same llama.cpp + cuda 13.2. i7 12700H. 16GB RAM. same -ngl 99 flags. same 48k context. unsloth gemma-4-12b-it-Q4_K_M.gguf → 15 tok/sec @ 48k ctx unsloth gemma-4-12B-it-qat-UD-Q4_K_XL.gguf → 32 tok/sec @ 48k ctx → 26 tok/sec @ 64k ctx 64k context is a big deal. Hermes 3 agent requires 64k minimum to run. you're now getting full hermes compatible context on a budget consumer GPU at 26 tok/sec locally. 2.1x faster on identical hardware. and here's the part that breaks your brain: the QAT-UD-Q4_K_XL is actually SMALLER than the Q4_K_M "XL" why? QAT = Quantization Aware Training Google didn't train the model first and compress it later they trained it to be quantized from day one the weights already know how to survive low precision that's why you get more quality per byte llamacpp flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -cnv -ngl 99 -c 48000 -v fits in 8GB VRAM clean. no API. no cloud. no subscription. and this isn't even the MTP variant yet Gemma-4-E2B QAT runs on 3GB RAM, E4B on 5GB, 12B on 7GB, 26-A4B on 15GB and 31B on 18GB. I have benchmarked the 26b and 31b qat as well on a single RTX 4090, checkout the comments for details. If you have a 6GB or 8GB VRAM GPU, post your numbers. more benchmarks and configs coming soon

English
2
1
28
3.3K