@haroun

1.8K posts

@danyharoun

Beirut · Joined July 2010
1.2K Following · 79 Followers
@haroun retweeted
Joseph Azar (@parazar)
Wild what's possible now: I can ship a 3D experience prototype to a museum in a single afternoon. Everything coded and generated with Omma, leveraging AI to iterate fast. 3D models generated in Omma. Scene built on top of Three.js. Prompted for dissolving shaders + particle systems?
English
6
19
260
15.8K
@haroun retweeted
Eric (@Ex0byt)
the different flavors of specdec, and why I'm trying to produce a Qwen-3.6-27b EAGLE-3 drafter for y'all
English
8
10
101
8.4K
@haroun retweeted
Petri Kuittinen (@KuittinenPetri)
Many people say the Nvidia DGX Spark is too slow and not worth the money. I'm getting crazy speed from qwen3.5-35b-a3b-nvfp4 on my ASUS Ascent GX10: over 200 token/s, 495k prefill. Real-life performance is lower, but still 100+ token/s. sparkrun run @atlas/qwen3.5-35b-a3b-nvfp4
English
19
15
139
10.7K
@haroun retweeted
mr-r0b0t (@mr_r0b0t)
Another 🔥 @UnslothAI NVFP4 unsloth/Qwen3.6-27B-NVFP4 @NVIDIAAI GB10 (DGX Spark) really liked those Qwen3.6 quants!
English
8
5
47
2.3K
@haroun retweeted
Victor M (@victormustar)
Supertonic 3 is incredible 🤯 Open source on-device TTS that runs in the browser, sounds human, handles a dozen+ languages, and finishes generating before you finish reading your prompt! ⬇️ demo available on Hugging Face
English
9
11
79
6.5K
@haroun retweeted
Tengfei Wang (@DylanTFWang)
⚡️After weeks of hard work, we're thrilled to fully open-source HY World 2.0 today -- full inference code and all models! Build, explore, and create your own interactive worlds with us.👇 github.com/Tencent-Hunyua…
Tengfei Wang (@DylanTFWang)

Genie3 generates videos. We generate 𝟯𝗗 𝘄𝗼𝗿𝗹𝗱𝘀 you can actually use. Launching tomorrow — Tencent #HYWorld 2.0, an engine-ready World Model🚀 This isn't a video. It's a real 3D scene, all generated & editable. One image in. A whole 3D world out. 🔥Open-source tomorrow

English
19
50
438
41.2K
@haroun retweeted
mr-r0b0t (@mr_r0b0t)
Dear @UnslothAI My @NVIDIAAI GB10 (DGX Spark) absolutely loved your quantization of this model! unsloth/Qwen3.6-35B-A3B-NVFP4 It ran stably up to concurrency 64, not optimal but I did that for no other reason than to see if it would 😂 🧵
mr-r0b0t (@mr_r0b0t)

Dear @UnslothAI I just found your Qwen3.6 NVFP4 quants. It made me quite happy and I have the GB10 benchmarking them now 😁 Please continue with these!

English
2
4
35
1.9K
@haroun retweeted
Erick (@ErickSky)
What is your biggest headache with the support tools you use today? Cost, vendor dependence, lack of control over your data, or difficulty customizing? There is a serious, modern alternative that is completely under your control. It's called Chatwoot. It is an open source omnichannel support platform that you can install on your own servers. With more than 29,000 stars on GitHub, it has become one of the most solid options for teams that want enterprise quality without paying million-dollar licenses or giving up sovereignty over their information. It centralizes every conversation in a single inbox: live chat from your website, email, WhatsApp Business, Instagram, Facebook Messenger, Telegram, and other channels. The customer feels they are talking to one company, even if you use several channels. Its AI agent is called Captain. It is not a generic chatbot. Captain learns from your own help center, from previous conversations, and from frequently asked questions to automatically resolve a large share of repetitive queries. Your team only steps in when it is really needed. It also includes:
- A professional, customizable Help Center that reduces tickets before they arrive.
- Well-executed collaboration tools: private notes, @ mentions, labels, canned responses, automatic assignment, and collision detection.
- Clear reports on performance, response times, and customer satisfaction (CSAT).
- A complete API, webhooks, and integrations (Slack, Shopify, Dialogflow, Linear, and more).
And the most important point for many: it is self-hosted. You decide where the data lives. That completely changes the game if you are in a regulated industry, if you value privacy, or simply if you don't want an external provider to hold all of your customers' information. Deploying it is fairly accessible with Docker. It has good documentation, Kubernetes support, and an active community that keeps improving it.
Chatwoot is not trying to be "the free, limited option". It aims to be a real, professional alternative to Intercom, Zendesk, or Freshdesk for those who want to scale support without scaling costs at the same rate and without giving up control. REPOOO👇
Español
1
2
54
2.6K
@haroun retweeted
NOGUCHI, Shoji (@noguchis)
DeepSeek-V4-Flash on DGX Spark (GB10, sm_121): CLI single decode 17.85 t/s → 24.45 t/s, +37%.
Q4 preload + dispatch wire-in + fused shared_gate_up_swiglu + F16→Q4 preload (358 tensors).
Now gateup 0.165 / down 0.106 ms
fork of @antirez's ds4 github.com/antirez/ds4/co…
English
2
3
54
7.5K
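The +37% figure in the post above follows directly from the two decode rates it quotes. A minimal check, where the helper name `speedup_percent` is only an illustrative assumption and not part of ds4 or any tool named in the post:

```python
def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain, in percent, going from before_tps to after_tps."""
    return (after_tps - before_tps) / before_tps * 100.0


# Decode rates quoted in the post (tokens per second).
gain = speedup_percent(17.85, 24.45)
print(f"{gain:.1f}%")  # prints "37.0%"
```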
@haroun retweeted
Joey (@aijoey)
tested Qwen3.6-35B-A3B-NVFP4 today on the NVIDIA DGX Spark / GB10 using the public Atlas recipe.
single GB10
Atlas runtime
TP=1
NVFP4 weights
FP8 KV cache
speculative decoding on
prefix caching on
ctx_tg @ d4096, concurrency 1
result: 176.66 tok/s decode
the screenshot i was checking against showed 202.63 tok/s, so i didn’t reproduce the exact claim from the public setup, but it’s still a strong local result.
next up: comparing Atlas/NVFP4 against my llama.cpp GGUF/MTP runs for real workflow usefulness.
reproducible numbers > benchmark vibes.
Azeez (@AtlasInference)

DGX Spark just benched 200+ tok/s for Qwen3.6-35B with @AtlasInference on @spark_arena 🔥 How's that possible? Providers like Codex and Claude get ~60. Other major engines don't come close 🦥 We haven't seen speeds like this on GB10. NO ONE HAS. Atlas is shattering records 🚀

English
11
10
97
12.9K
@haroun retweeted
wd 🔺 (@populartourist)
Ornstein3.6-27B-MTP-NSC-ACE-SABER-Q6_K-MTP Doing some repo work with 5090 MTP acceptance rate 0.94 is bosh 👊
English
3
1
35
3.2K
@haroun retweeted
sparkarena (@spark_arena)
Atlas Inference runtime is now supported on github.com/spark-arena/sp… Run their Qwen3.6-35B NVFP4 recipe with one line: sparkrun run @atlas/qwen3.5-35b-a3b-nvfp4
Azeez (@AtlasInference)

DGX Spark just benched 200+ tok/s for Qwen3.6-35B with @AtlasInference on @spark_arena 🔥 How's that possible? Providers like Codex and Claude get ~60. Other major engines don't come close 🦥 We haven't seen speeds like this on GB10. NO ONE HAS. Atlas is shattering records 🚀

English
4
2
22
2.8K
@haroun retweeted
Sudo su (@sudoingX)
i've run a stack of models across a single 3090, a 5090, and a 128GB DGX Spark. exactly three are worth building on. the honest list.
the three worth it:
> 1. StepFun Step-3.5 Flash, the REAP pruned 121B MoE (Q6, DGX Spark)
a 121 billion parameter mixture of experts running on a single desktop box. the most worth-it model in everything i've tested.
> 2. Qwen 3.6 27B Dense, Q4 (single RTX 3090)
the undisputed king of the 24GB tier. one shot a playable game, around 41 tok/s, fits with context headroom to spare. one 24GB card, this is your answer.
> 3. NVIDIA Nemotron 3 Nano Omni, 30B-A3B (DGX Spark)
the best multimodal i've tested for video classification work. vision in, runs clean on the Spark.
the rest, ran them, they hold up fine:
on the Spark: DeepSeek V4 Flash 158B, GLM 4.7 Flash, GLM 4.5 Air REAP 82B-A12B, Gemma 4 26B-A4B, Qwen3-VL 235B-A22B, Qwen3 Coder 30B-A3B, Qwen3 30B-A3B, Carnice 35B-A3B.
on consumer GPUs: Kimi K2.5 1T, Qwen3-Coder-Next 80B, Hermes 4.3 36B, Qwen 3.5 27B Dense.
single 3090 to a 128GB Spark, that's the range. the three up top are the ones worth your hardware today.
English
24
16
256
37.4K
@haroun retweeted
Loktar 🇺🇸 (@loktar00)
Crazy that 200 tps is slowly becoming the floor for local inference... a year ago that was top-tier API speed, now it's what a single 5090 or dual 3090 with MTP can get you for "free" ™
English
11
3
70
4.4K
@haroun retweeted
Joey (@aijoey)
live demo, no bells and whistles. just raw.
qwen3.6 35b running locally on the nvidia dgx spark through llama.cpp.
mtp is doing the heavy lifting here. the model drafts multiple future tokens, verifies them, and gets a real speedup without needing a separate draft model.
q5_k_m
thinking disabled
this is not a polished benchmark video. it’s just the box, the terminal, live ttft, live token stats, and me seeing how far local inference can go today.
github.com/ggml-org/llama…
English
3
3
20
2.3K
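The draft-then-verify loop described in the post above can be sketched as a toy. This is not llama.cpp's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the full model and its cheap multi-token drafter, and `draft_n_max` is only an illustrative parameter. What the sketch shows is that greedy verification accepts the longest matching prefix of the draft, so the output equals plain greedy decoding while needing fewer verification passes.

```python
from typing import List, Tuple

Token = int


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the full model's greedy next token."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Hypothetical cheap drafter: wrong whenever len(context) is a multiple of 4."""
    guess = target_next(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(
    prompt: List[Token], n_new: int, draft_n_max: int = 2
) -> Tuple[List[Token], int]:
    """Draft up to draft_n_max tokens, then verify them against the target."""
    out = list(prompt)
    target_len = len(prompt) + n_new
    passes = 0  # verification rounds; a real engine batches each into one forward pass
    while len(out) < target_len:
        draft: List[Token] = []
        for _ in range(draft_n_max):
            draft.append(draft_next(out + draft))
        passes += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token, stop
                break
            accepted.append(tok)
        else:
            accepted.append(target_next(out + accepted))  # all accepted: free bonus token
        out.extend(accepted)
    return out[:target_len], passes


tokens, passes = speculative_decode([1, 2, 3], 20)
print(tokens == greedy_decode([1, 2, 3], 20), passes)
```

Each round emits at least one token, so the loop never does worse than one pass per token, and how much better it does depends on how often the drafter agrees with the target.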
@haroun retweeted
Daniel Moll (@rumgewieselt)
This is ridiculous: MTP with llama.cpp now runs ...
3× GTX 1080 Ti from 2017
No Tensor Cores
No NVLink
33GB total VRAM
Qwen 3.6 35B A3B MoE
229K context
71.43 tok/s benchy best
78 tok/s peak
Qwen 3.6 27B Dense
196K context
29.68 tok/s best
No TurboQuant fork.
Key flags:
--cache-type-k q4_0
--cache-type-v q4_0
--spec-type draft-mtp
--spec-draft-n-max 2
Pascal is not dead 😄
English
24
26
260
13.4K
@haroun retweeted
Nous Research (@NousResearch)
It has been a pleasure collaborating with the @NVIDIAAI team to ensure that Hermes Agent runs perfectly on DGX Spark!
NVIDIA AI PC (@NVIDIA_AI_PC)

Run @NousResearch's Hermes Agent fully locally on DGX Spark. 🚀 Our newest playbook shows you how to get set up via @Ollama step by step. 👇

English
50
77
1.2K
67.3K
@haroun retweeted
Dheepan Ratnam (@Dheepanratnam)
AI is wild.
ChatGPT 2.0 images + Seedance 2.0
Tried to generate a red carpet AI video using this custom Sydney Sweeney character sheet. The result looks exactly like "we have sydney sweeney at home" 😭
A simple prompt used to test: Use the provided character reference for the facial features and body structure. Create a 15-second red carpet multi-shot walk with elegant designer wear.
English
42
101
980
91.2K