Akram Shehadi

Took me a bit to understand how to do the local setup, so here's a guide:

1) Install oMLX
Download the Apple Silicon macOS build of oMLX from the official oMLX releases/site.

2) Create a Hugging Face account and a read-only token

3) Install the Hugging Face CLI
pip install -U huggingface_hub
Then log in interactively:
hf auth login

4) Know where oMLX stores models
By default, oMLX uses:
~/.omlx/models

I tested two model options:
- Option A: Qwen3.6 + DFlash draft model
- Option B: Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit + Qwen3.5-4B-MLX-4bit

Option A: Qwen3.6 + DFlash

5A) Download the correct model pair
Base model: use the 4-bit MLX build:
hf download mlx-community/Qwen3.6-35B-A3B-4bit \
  --local-dir ~/.omlx/models/Qwen3.6-35B-A3B-4bit
Draft model: use the DFlash draft:
hf download z-lab/Qwen3.6-35B-A3B-DFlash \
  --local-dir ~/.omlx/models/Qwen3.6-35B-A3B-DFlash

Note: the full Qwen/Qwen3.6-35B-A3B setup previously failed for me:
- model size: ~70 GB
- oMLX max model memory: ~23 GB
because I only have 32 GB RAM. So I used mlx-community/Qwen3.6-35B-A3B-4bit, not Qwen/Qwen3.6-35B-A3B.

6A) Start oMLX
/Applications/oMLX.app/Contents/MacOS/omlx-cli serve

7A) Configure DFlash in oMLX
Open the oMLX admin dashboard.

For Qwen3.6-35B-A3B-4bit:
- enable DFlash
- set Draft Model to Qwen3.6-35B-A3B-DFlash
- set Draft quant bits to 4
- optional: set this as default

For Qwen3.6-35B-A3B-DFlash:
- leave DFlash disabled
- do not set it as default
- do not use it as the Pi model

Settings used:
- thinking budget enabled
- thinking budget tokens: 8192
- TurboQuant KV: on
- TurboQuant KV bits: 4
- TurboQuant skip last: on
- DFlash: on
- DFlash draft quant bits: 4

Option B: Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit

5B) Download the main model
hf download mlx-community/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit \
  --local-dir ~/.omlx/models/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit

6B) Optional SpecPrefill draft model: Qwen3.5-4B-MLX-4bit
Download it with:
hf download mlx-community/Qwen3.5-4B-MLX-4bit \
  --local-dir ~/.omlx/models/Qwen3.5-4B-MLX-4bit
Lower-memory alternative that was also recommended:
hf download mlx-community/Qwen3.5-0.8B-4bit \
  --local-dir ~/.omlx/models/Qwen3.5-0.8B-4bit

7B) Configure the 27B model in oMLX
For Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit, the config I used:
- max context window: 65536
- max tokens: 8192
- enable thinking: true
- thinking budget enabled: true
- thinking budget tokens: 1024
- TurboQuant KV: on
- TurboQuant KV bits: 4
- TurboQuant skip last: true
- SpecPrefill: on
- SpecPrefill draft model: Qwen3.5-4B-MLX-4bit
- SpecPrefill keep rate: 0.2
- DFlash: off

8) Recommended fast coding preset
If you want speed over depth:
- max context window: 8192 or 16384
- max tokens: 1024
- TurboQuant KV: on
- KV bits: 4
- SpecPrefill: on only when prompts are long
- SpecPrefill keep rate: 0.2
- draft: Qwen3.5-4B-MLX-4bit or Qwen3.5-0.8B-4bit

9) Run Pi with the local oMLX models
For the Qwen3.6 + DFlash setup, use the base 4-bit model:
'/Applications/oMLX.app/Contents/MacOS/omlx-cli' launch pi \
  --model 'Qwen3.6-35B-A3B-4bit' \
  --api-key "$OMLX_API_KEY"
For the Qwen3.5 27B distilled model:
'/Applications/oMLX.app/Contents/MacOS/omlx-cli' launch pi \
  --model 'Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit' \
  --api-key "$OMLX_API_KEY"
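Optional sanity check before launching Pi. This is a minimal sketch that assumes oMLX exposes an OpenAI-compatible HTTP API once serve is running; the port (8080) and the endpoint paths below are my assumptions, so check the admin dashboard for the actual address and API key of your install.

# Assumption: oMLX serves an OpenAI-compatible API; swap in the host/port
# shown in the admin dashboard for your install.
export OMLX_BASE_URL="http://localhost:8080"

# List the models the server picked up from ~/.omlx/models
curl -s "$OMLX_BASE_URL/v1/models" -H "Authorization: Bearer $OMLX_API_KEY"

# Minimal chat completion against the 4-bit base model (DFlash is applied server-side)
curl -s "$OMLX_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $OMLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3.6-35B-A3B-4bit", "messages": [{"role": "user", "content": "Say hello in five words."}], "max_tokens": 32}'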



Great thread with super useful information. I ended up testing Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit with Qwen3.5-4B-MLX-4bit for SpecPrefill, and it runs pretty well on my M2 Max 32 GB. I wish I had bought the 64 GB version when I had the money though, because I have to close almost everything to have enough available RAM for it. However, I'm impressed at how well it works with @badlogicgames' Pi harness. I originally tried using Qwen3.6-35B-A3B-DFlash as the draft model for Qwen3.6-35B-A3B, but it requires much more RAM :( Definitely worth trying.
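If you're in the same 32 GB boat, here's a quick way to see how much headroom you actually have before loading the model; these are plain macOS tools (nothing oMLX-specific), and 16384 bytes is the Apple Silicon page size that vm_stat reports.

# Total physical memory in bytes
sysctl hw.memsize
# Free pages; multiply "Pages free" by the 16384-byte page size for free RAM
vm_stat
# System-wide memory free percentage (last line of the report)
memory_pressure | tail -1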



🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n
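For anyone wiring this into existing tooling: DeepSeek's API has historically been OpenAI-compatible, and the sketch below assumes that still holds for V4; the model id "deepseek-chat" is a guess at the alias for the new default, so check the updated API docs for the exact V4 model names.

# Assumption: the V4 API keeps the OpenAI-compatible chat/completions endpoint
# and "deepseek-chat" resolves to the new default model.
curl -s https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "What is new in DeepSeek-V4?"}]}'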

Qwen3.6-27B is the first model that made me want that Mac Studio. My current Mac runs the 4-bit version at just 15 tok/s, and loading the initial opencode system prompt takes like 30-50 seconds. Is there a draft model I could be using inside LM Studio for faster inference?

Folks talk about the mental burden of working with multiple agents and context switching. I don't hear much about how tiring working with a single agent can be when you have a good planning workflow and a streamlined I/O interface (voice, comprehensive visual plan artifacts presented on screen, etc). Even working in serial mode you have to process a lot of information, digest it, and think deeply about the next decision before sending the agent off on its next task. To me, this is more tiring than interruptions from concurrent small back-and-forth interactions.









My @postgresconf talk on “Postgres for Production AI Agents” is at 1pm in San Pedro room (level C). Hope to see you there :)





