Akram Shehadi

107 posts


@AkramShehadi

Joined February 2026
200 Following, 5 Followers
Kevin
Kevin@kcosr·
In my no/low code projects I'm not often referencing specific file, function or variable names so speech is sufficient. I mostly use two-way voice when I'm away from my computer, which is most of the time for design discussions where I'll walk around or do other things like housework. If I need to communicate more specifics I'm probably sitting at my machine and in that case would just be typing and reading. I'm using Kokoro for TTS and Parakeet for STT, running on a 4090 in my personal assistant app.
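For anyone wondering what that kind of loop looks like in practice, here is a minimal sketch assuming the kokoro Python package for TTS and NVIDIA NeMo's Parakeet model for STT; the model names, the reply_to() helper, and file paths are illustrative, not Kevin's actual app.

# Minimal voice round-trip sketch: Parakeet (STT) in, Kokoro (TTS) out.
# Assumes `pip install kokoro "nemo_toolkit[asr]" soundfile`; names below are illustrative.
import soundfile as sf
import nemo.collections.asr as nemo_asr
from kokoro import KPipeline

asr = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
tts = KPipeline(lang_code="a")  # American English voices

def reply_to(text: str) -> str:
    # Placeholder for whatever assistant/LLM call actually handles the request.
    return f"You said: {text}"

def voice_turn(wav_path: str, out_path: str = "reply.wav") -> None:
    result = asr.transcribe([wav_path])[0]               # speech -> text
    transcript = getattr(result, "text", str(result))    # NeMo versions differ in return type
    answer = reply_to(transcript)                        # text -> assistant reply
    for _, _, audio in tts(answer, voice="af_heart"):    # reply -> synthesized audio chunks
        sf.write(out_path, audio, 24000)                 # Kokoro outputs 24 kHz audio
        break                                            # keep only the first chunk for brevity

voice_turn("question.wav")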
English
1
0
1
24
Santiago
Santiago@svpino·
I'm now using two separate skills to generate a plan before I give it to Claude Code: designing → planning

The "designing" skill does the following:
1. Gathers context by looking at local files, commits, and documentation.
2. Analyzes my request
3. Interviews me with clarifying questions
4. Proposes alternative solutions
5. Writes a specification document

Then, the "planning" skill takes over:
1. Reads the specification document
2. Breaks it down into small tasks
3. Generates a plan specifying how to complete each task

The output of running these two skills is a set of markdown files I can give the agent to implement (sometimes in parallel, sometimes sequentially). These plans are very prescriptive. Here is an example of what a potential plan could look like:

"""
Goal: Implement add() function
File: src/calculator.py
Description: Implement an add() function that takes two values and returns the sum.
Step 1: Implement a failing test in tests/test_calculator.py
Step 2: Run the test and ensure it fails
Step 3: Implement the add() function
Step 4: Run the test and ensure it passes
Step 5: Run a code review
Step 6: Commit the code
"""

I've found that these agentic coding tools love detailed, bite-sized instructions.
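To make that plan concrete, here is roughly what Steps 1 and 3 of the example would produce; a minimal sketch assuming a plain pytest layout with the src/calculator.py and tests/test_calculator.py paths named in the plan.

# tests/test_calculator.py — Step 1: the failing test, written before the implementation
from src.calculator import add

def test_add_returns_sum():
    assert add(2, 3) == 5

# src/calculator.py — Step 3: the implementation that makes the test pass
def add(a, b):
    return a + b

Running pytest between Steps 2 and 4 gives the agent the fail-then-pass signal the plan asks for.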
English
30
16
178
13.8K
Julien Chaumond
Julien Chaumond@julien_c·
This is where we are right now. And I'm not gonna lie, it feels pretty magical 🧚‍♀️ Qwen3.6 27B running inside the Pi coding agent via llama.cpp on the MacBook Pro. For non-trivial tasks on the @huggingface codebases, this feels very, very close to hitting the latest Opus in Claude Code, or whatever the shiny monopolistic closed-source API of the day is. In full airplane mode. Most people haven't realized this yet. If you have, it means you have a huge head start on what I call the second revolution of AI. Powerful local models for efficiency, security, privacy, sovereignty 🔥
Julien Chaumond tweet media
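For context on what "running inside a coding agent via llama.cpp" means mechanically: llama.cpp's llama-server exposes an OpenAI-compatible HTTP endpoint, so an agent can point at the local model instead of a hosted API. A minimal sketch; the port, model name, and prompt are illustrative, and this shows the general mechanism rather than Pi's actual configuration.

# Talk to a local llama.cpp server the same way an agent talks to a hosted API.
# Assumes `llama-server` is already running on localhost:8080 with a Qwen GGUF loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="qwen3.6-27b",  # illustrative; llama-server serves whichever model it loaded
    messages=[{"role": "user", "content": "Summarize what this repo's build script does."}],
)
print(resp.choices[0].message.content)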
English
248
426
5.1K
566.8K
Akram Shehadi
Akram Shehadi@AkramShehadi·
How do you use STT for coding-related words, filenames, variables, etc.? I like STT for describing the problem, the goals, etc., but later when I need to specify certain variables, file names, and so on, STT engines don't work as well. I've tried Wispr and Aqua, and I'm using Handy (with Parakeet v2) now due to cost. Any tips?
English
1
0
1
32
Kevin
Kevin@kcosr·
This is a great pattern. I've taken this a step further by making the plan structured, ingesting it into an orchestrator, and having agents mark off tasks as they complete them. The quality of the final output has improved significantly.
1. Chat with agent via 2-way voice to capture requirements.
2. Invoke planner with design handoff.
3. Planner creates structured plan and invokes a plan review agent.
4. Plan review agent evaluates plan against requirements and plan creation rules.
5. Planner agent invokes implementation agent.
6. Implementation agent writes code and invokes code review agent.
7. Code review agent reviews implementation against design, plan, and its own code review rules.
8. Implementation agent opens PR.
At each review step the caller makes corrections and re-invokes the reviewer until approved. When each agent loop ends, the agent is re-invoked with a reminder if its task list was not completed. Hooks validate deterministically when possible, like ensuring a PR was opened.
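A rough sketch of what a "structured plan the orchestrator can ingest" might look like; the task fields, statuses, and reminder loop here are illustrative guesses at the pattern Kevin describes, not his actual tool.

# Illustrative orchestrator sketch: a structured plan whose tasks agents mark off,
# with a reminder re-invocation if an agent stops before finishing its list.
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    DONE = "done"

@dataclass
class Task:
    id: str
    description: str
    status: Status = Status.PENDING

@dataclass
class Plan:
    goal: str
    tasks: list[Task] = field(default_factory=list)

    def unfinished(self) -> list[Task]:
        return [t for t in self.tasks if t.status is Status.PENDING]

def run_agent(plan: Plan, prompt: str) -> None:
    """Placeholder for invoking an implementation agent; it marks tasks DONE as it completes them."""
    ...

def orchestrate(plan: Plan, max_reminders: int = 3) -> Plan:
    run_agent(plan, f"Implement the plan for: {plan.goal}")
    # If the agent loop ended with open tasks, re-invoke it with a reminder.
    for _ in range(max_reminders):
        remaining = plan.unfinished()
        if not remaining:
            break
        todo = ", ".join(t.id for t in remaining)
        run_agent(plan, f"Reminder: tasks still open: {todo}. Finish them before stopping.")
    return plan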
English
4
0
6
761
Nick Dobos
Nick Dobos@NickADobos·
@jxnlco Xcode and CodeX are taken, which leaves 2 options: xXxCodexXx, CoXde
English
23
8
426
19.1K
jason liu
jason liu@jxnlco·
Heard that Elon is trying to rename cursor to xcode
English
270
173
5K
344.9K
Akram Shehadi
Akram Shehadi@AkramShehadi·
@haz3rbageax @dannytt @julien_c @huggingface I wrote a quick guide on how to set it up if interested: x.com/AkramShehadi/s…
English
0
0
3
39
Akram Shehadi
Akram Shehadi@AkramShehadi·
@dannytt @julien_c @huggingface Try using oMLX and the MLX versions of the models. I am hitting 77 t/s with Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit and Qwen3.5-4B-MLX-4bit for SpecPrefill
English
1
1
10
385
Akram Shehadi
Akram Shehadi@AkramShehadi·
Took me a bit to understand how to do the local setup, so here's a guide:

1) Install oMLX
Download the Apple Silicon macOS build of oMLX from the official oMLX releases/site.

2) Create a Hugging Face account and a read-only token

3) Install the Hugging Face CLI
pip install -U huggingface_hub
Then log in interactively:
hf auth login

4) Know where oMLX stores models
By default, oMLX uses: ~/.omlx/models

I tested 2 model options:
- Option A: Qwen3.6 + DFlash draft model
- Option B: Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit + Qwen3.5-4B-MLX-4bit

Option A: Qwen3.6 + DFlash

5A) Download the correct model pair
Use the 4-bit MLX base:
hf download mlx-community/Qwen3.6-35B-A3B-4bit \
  --local-dir ~/.omlx/models/Qwen3.6-35B-A3B-4bit
Draft model: use the DFlash draft:
hf download z-lab/Qwen3.6-35B-A3B-DFlash \
  --local-dir ~/.omlx/models/Qwen3.6-35B-A3B-DFlash

Note: the Qwen/Qwen3.6-35B-A3B setup previously failed with:
- model size: ~70 GB
- oMLX max model memory: ~23 GB
because I only have 32 GB RAM. So I used:
- mlx-community/Qwen3.6-35B-A3B-4bit
- not Qwen/Qwen3.6-35B-A3B

6A) Start oMLX
/Applications/oMLX.app/Contents/MacOS/omlx-cli serve

7A) Configure DFlash in oMLX
Open the oMLX admin dashboard.
For Qwen3.6-35B-A3B-4bit:
- enable DFlash
- set Draft Model to Qwen3.6-35B-A3B-DFlash
- set Draft quant bits to 4
- optional: set this as default
For Qwen3.6-35B-A3B-DFlash:
- leave DFlash disabled
- do not set it as default
- do not use it as the Pi model
Settings used:
- thinking budget enabled
- thinking budget tokens: 8192
- TurboQuant KV: on
- TurboQuant KV bits: 4
- TurboQuant skip last: on
- DFlash: on
- DFlash draft quant bits: 4

Option B: Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit

5B) Download the main model
hf download mlx-community/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit \
  --local-dir ~/.omlx/models/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit

6B) Optional SpecPrefill draft model: Qwen3.5-4B-MLX-4bit
Download it with:
hf download mlx-community/Qwen3.5-4B-MLX-4bit \
  --local-dir ~/.omlx/models/Qwen3.5-4B-MLX-4bit
Lower-memory alternative that was also recommended:
hf download mlx-community/Qwen3.5-0.8B-4bit \
  --local-dir ~/.omlx/models/Qwen3.5-0.8B-4bit

7B) Configure the 27B model in oMLX
For Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit, the config I used:
- max context window: 65536
- max tokens: 8192
- enable thinking: true
- thinking budget enabled: true
- thinking budget tokens: 1024
- TurboQuant KV: on
- TurboQuant KV bits: 4
- TurboQuant skip last: true
- SpecPrefill: on
- SpecPrefill draft model: Qwen3.5-4B-MLX-4bit
- SpecPrefill keep rate: 0.2
- DFlash: off

Recommended fast coding preset
If you want speed over depth:
- max context window: 8192 or 16384
- max tokens: 1024
- TurboQuant KV: on
- KV bits: 4
- SpecPrefill: on only when prompts are long
- SpecPrefill keep rate: 0.2
- draft: Qwen3.5-4B-MLX-4bit or Qwen3.5-0.8B-4bit

9) Run Pi with the local oMLX models
For the Qwen3.6 + DFlash setup, use the base 4-bit model:
'/Applications/oMLX.app/Contents/MacOS/omlx-cli' launch pi \
  --model 'Qwen3.6-35B-A3B-4bit' \
  --api-key "$OMLX_API_KEY"
For Qwen3.5 27B distilled:
'/Applications/oMLX.app/Contents/MacOS/omlx-cli' launch pi \
  --model 'Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit' \
  --api-key "$OMLX_API_KEY"
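The main gotcha in the guide is the memory ceiling: the full-precision Qwen3.6-35B-A3B is ~70 GB against oMLX's ~23 GB model-memory cap on a 32 GB machine, which is why the 4-bit mlx-community builds are the ones to download. Here is a small stdlib-only sketch for checking that the folders downloaded into the default ~/.omlx/models location actually fit before serving; the 23 GB threshold simply mirrors the cap mentioned above and is illustrative.

# Quick sanity check: how big is each downloaded model directory under ~/.omlx/models?
from pathlib import Path

MODELS_DIR = Path.home() / ".omlx" / "models"
MAX_MODEL_GB = 23.0  # illustrative cap, per the note in the guide

def dir_size_gb(path: Path) -> float:
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file()) / 1e9

if MODELS_DIR.exists():
    for model_dir in sorted(MODELS_DIR.iterdir()):
        if not model_dir.is_dir():
            continue
        size = dir_size_gb(model_dir)
        flag = "OK" if size <= MAX_MODEL_GB else "exceeds the configured cap"
        print(f"{model_dir.name}: {size:.1f} GB ({flag})")
else:
    print(f"No models found at {MODELS_DIR}")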
English
0
0
1
88
Akram Shehadi
Akram Shehadi@AkramShehadi·
@Prince_Canuma Damn... how long did it take you? You make it sound as if you just need to push a button "port to MLX" and you are done! 😆 Do you have a sort of recipe book for these ports or is it all ad-hoc every time?
English
0
0
0
107
Akram Shehadi
Akram Shehadi@AkramShehadi·
Great thread with super useful information. I ended up testing Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit and using Qwen3.5-4B-MLX-4bit for SpecPrefill, and it runs pretty well on my M2 Max 32 GB. I wish I had bought the 64 GB version when I had the money though, because I have to close almost everything to have enough available RAM for it. However, I'm impressed at how well it works with @badlogicgames Pi harness. I originally tried using Qwen3.6-35B-A3B-DFlash as the draft model for Qwen3.6-35B-A3B but it requires much more RAM :( Definitely worth trying.
AVB@neural_avb

Qwen3.6-27B is the first model that made me want that Mac Studio. My current Mac runs the 4-bit version at just 15 tok/s. Loading the initial opencode system prompt takes like 30-50 seconds. Is there a draft model I could be using inside LM Studio for faster inference?

English
1
0
1
74
Akram Shehadi
Akram Shehadi@AkramShehadi·
@kcosr @lukalotl Definitely. I get much more tired from focus work now that I use coding agents constantly. There's also a sort of idea burnout, I think, where I have so many ideas I want to build but just don't have the time, so I fear I won't be able to build them and stress out about it.
English
0
0
1
13
luka
luka@lukalotl·
everyone (that's relevant anyway) is going to have a shell. kind of like a custom operating system, more like Emacs if anything, and custom in that to varying degrees it's quite unique. large chunks of people will use widely distributed ones and smaller and smaller groups will use more valuable and harder to learn ones, otherwise inaccessible both by literal privacy and by cultural knowledge.

it's just like existing software except it will be a layer on top, using all of the existing abstractions, even emulating outdated desktop applications and puppeteering their visual interfaces, and often written as a web app or at least using modern libraries. they'll bring with them homogenization of all of the apps you use every day into one object, yet diversification of software used from person to person. the smaller ones will be developed by individuals and communities for themselves, and be the most powerful.

relative to the software of old, these things are quite organic, ugly (or beautiful, but certainly opinionated), hulking interfaces. easy to navigate for their users because they are a familiar home, not because they are easy to learn at first sight. few people are ever introduced, and when they are it's with the oversight of the dweller

there's been a long standing fear in software development of:
- scope creep, overscoping and becoming an emacs style or otherwise bloated piece of software
- developing for yourself instead of the market

both of these incentives will be inverted. the best software is software you can use more effectively than a competing team. your ability to make software that improves others will be bottlenecked by your ability to make software that improves you.
English
7
10
126
17.2K
Akram Shehadi
Akram Shehadi@AkramShehadi·
That workflow sounds pretty compelling and I'd definitely like to try building my own, but it feels like there's a natural limit due to "context/focus burnout", no? I mean, even though I could have the most efficient AI-interaction tools, my brain only has a limited amount of focus capacity (as has been discussed many times). So at some point an ultra-efficient workflow would hit the same diminishing returns.
English
1
0
1
67
Kevin
Kevin@kcosr·
Woah, this is right up my alley. Excellent post. Over the summer I created a tool to organize my CLI terminal sessions with a sidebar and tabs, mixed with web tabs for GitHub issues, PRs, etc. Why not just use a browser? Because having things grouped and easily accessible makes me just a bit faster. Over the past few weeks I've been experimenting with something more general purpose: an AI-focused workspace for doing anything, managing lists, using terminals, reviewing diffs. The layout is complex and difficult to master with splits and tabs, but I can set it up in a way that gives me an edge. I can focus on any panel and ask a model via chat to interact with it using built-in tools to read or type into a terminal, pull up an artifact (notes, lists), or manage artifacts (find everything due today and display it on a view, etc.). Both are accessible from mobile, with the underlying sessions running on a headless backend. Just another factor in giving me the edge, as I can open my workspace and take care of something (or ask an agent to do so) from anywhere.
English
1
0
3
263
AVB
AVB@neural_avb·
@okbhaicool I can see it on my phone! (Not on web)
AVB tweet media
English
1
0
3
115
AVB
AVB@neural_avb·
Wait comments now have a dislike button!!
AVB tweet media
English
2
0
12
1.2K
Yakov
Yakov@yak32·
I finally finished my Gaussian Splat based FPS demo. It's a @playcanvas project, runs in a browser. On a real photoscan. With physics, baked lighting, pathfinding NPCs. Here's how 👇
English
41
135
1.5K
168.7K
Akram Shehadi
Akram Shehadi@AkramShehadi·
If you are ever curious about how absolutely magical microchip production is, watch this video: 38C3 - From Silicon to Sovereignty: How Advanced Chips are Redefining Global Dominance youtube.com/watch?v=NdppYY… The absurdity of it is just incredible.
YouTube video
YouTube
English
0
0
0
19
Gwen (Chen) Shapira
Gwen (Chen) Shapira@gwenshap·
@AkramShehadi This really depends on the customers. But often true. Note that this solution gives "transparent physical separation" *and* performance optimization. So a true win-win.
English
1
0
2
58
Gwen (Chen) Shapira
Gwen (Chen) Shapira@gwenshap·
Got a really good question during my talk: What do you do with the embedding table when one tenant is much larger than the others?

This happens a lot! Sometimes the biggest tenant is over 50% of the data.

Solution:
1. Partition the table by tenant.
2. Use FDW to place the tenant's partition on a separate machine while maintaining the top-level partitioned table.
3. "Right size" the separate machine so the partition index fits in memory.

Nile does this automatically. But everyone with Postgres can do this.
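A minimal sketch of that pattern, assuming PostgreSQL with postgres_fdw and the pgvector extension; the table, server, tenant, and credential names are all made up for illustration, and the matching table must already exist on the remote machine.

# Illustrative sketch: list-partition the embedding table by tenant, then attach the
# oversized tenant's partition as a postgres_fdw foreign table on its own right-sized machine.
import psycopg  # psycopg 3

STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",        # assumes pgvector for the embedding column
    "CREATE EXTENSION IF NOT EXISTS postgres_fdw",
    """
    CREATE TABLE embeddings (
        tenant_id text NOT NULL,
        doc_id    bigint NOT NULL,
        embedding vector(1536)
    ) PARTITION BY LIST (tenant_id)
    """,
    # Long tail of small tenants stays local.
    "CREATE TABLE embeddings_default PARTITION OF embeddings DEFAULT",
    # The huge tenant gets its own machine, sized so its partition index fits in memory.
    """
    CREATE SERVER big_tenant_box FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'big-tenant.internal', dbname 'embeddings')
    """,
    """
    CREATE USER MAPPING FOR CURRENT_USER SERVER big_tenant_box
        OPTIONS (user 'app', password 'secret')
    """,
    # Foreign table attached as a partition of the same top-level table.
    """
    CREATE FOREIGN TABLE embeddings_big_tenant
        PARTITION OF embeddings FOR VALUES IN ('tenant-with-half-the-data')
        SERVER big_tenant_box
    """,
]

with psycopg.connect("dbname=app") as conn:
    for stmt in STATEMENTS:
        conn.execute(stmt)

Queries against the top-level embeddings table keep working unchanged, which is the "transparent physical separation" mentioned in the reply above.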
Gwen (Chen) Shapira@gwenshap

My @postgresconf talk on “Postgres for Production AI Agents” is at 1pm in San Pedro room (level C). Hope to see you there :)

English
1
1
27
3.3K