Rompel

1K posts

@ukrroot

Running local AI on consumer hardware. RTX 5090 + Mac Studio M4. Benchmarks, costs, what actually works — receipts only.

Joined December 2007
764 Following · 171 Followers
Pinned Tweet
Rompel@ukrroot·
My home AI lab's electricity bill last month: $0. 93 kWh consumed (5090 + Mac Studio + Comfy daily). 154 kWh produced by solar panels. 0 from the grid. Same workload on cloud APIs: ~$300/mo. Posting the receipts here.
English
1
0
2
579
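The pinned post's numbers reduce to a simple monthly energy balance. A minimal sketch of that arithmetic follows; the 93 kWh, 154 kWh, and ~$300 figures come from the post, while the function name and the monthly net-accounting assumption (production is credited against consumption over the whole month, for example via a battery or net metering) are illustrative and not stated in the post.

```python
def monthly_energy_balance(consumed_kwh: float, produced_kwh: float) -> dict:
    """Net a month's solar production against consumption.

    Assumption: netting happens over the whole month, so hour-by-hour
    grid draw (e.g. at night without a battery) is not modelled.
    """
    surplus = max(0.0, produced_kwh - consumed_kwh)
    grid_import = max(0.0, consumed_kwh - produced_kwh)
    return {"surplus_kwh": surplus, "grid_import_kwh": grid_import}


# Figures quoted in the post.
balance = monthly_energy_balance(consumed_kwh=93.0, produced_kwh=154.0)
cloud_estimate_usd = 300.0  # the post's rough cloud-API estimate

print(balance)
# Hardware and panel costs are not part of this comparison.
print(f"claimed monthly saving vs cloud: ~${cloud_estimate_usd:.0f}")
```

Under that assumption the month ends with a 61 kWh surplus and nothing imported from the grid, which matches the "0 from the grid" claim.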
Rompel@ukrroot·
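@KanaWorks_AI @Hailuo_AI That 70 seconds varies a lot depending on the workflow. On my 4090 it swings by nearly 2x with resolution and sampler settings. Which GPU and resolution did that number come from? Petal separation is also an area where VAE decode quality matters.
Japanese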
0
0
1
12
KANA|東京AI映像@KanaWorks_AI·
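@ukrroot @Hailuo_AI It's really impressive how each individual petal looks cleanly separated. The detail on the golden tassel is beautiful too. I was also surprised that it took only 70 seconds in a local environment ✨
Japanese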
1
0
0
6
KANA|東京AI映像@KanaWorks_AI·
Prompt Share 114 - Tsumami Zaiku (つまみ細工)
Image: Nanobanana 2, made in @Hailuo_AI
Prompt: Japanese traditional handmade accessory product photography, an elegant Tsumami Zaiku fabric flower ornament shaped like a chrysanthemum, made from layered folded fabric petals in white, red, and dark tones with green leaves. The ornament is attached to a gold cord with a decorative bead and a long golden tassel. The accessory is placed inside a light wooden jewelry box with a black velvet interior. The background is a dark textured stone surface. Beside the box lies a small Japanese card featuring a crane and a red sun motif. Elegant Japanese aesthetic, traditional craft atmosphere. high-end product photography, soft studio lighting, macro detail, shallow depth of field, ultra realistic texture, commercial still life photography.
KANA|東京AI映像 tweet media
KANA|東京AI映像@KanaWorks_AI

Prompt Share 110 - Night Sakura (夜桜)
Image: Nanobanana 2, made in @Hailuo_AI
Prompt: night sakura trees lined along a narrow canal, glowing pink cherry blossoms at night, water mirror reflection perfectly reflecting the trees, stone riverbank and green grass slope, perspective leading into the distance, low angle shot near the water surface, deep indigo night sky, neon pink illumination, dreamy spring atmosphere, cinematic composition, ultra detailed, long exposure photography, highly saturated colors, Japan spring scenery

Japanese
2
4
39
1.7K
Rompel@ukrroot·
@simonw @GroqInc @cerebras 744B is the whole point of wafer-scale silicon—nobody's running that MoE at home, even at Q2. The ~40B active helps their compute, not your VRAM. Real question: does Cerebras have it up before the GGUFs even land?
English
0
0
0
48
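The "nobody's running that at home" point is mostly arithmetic on weight size. A rough sketch, assuming the 744B total and ~40B active parameter counts from the post and nominal bits per weight (real quantized files carry extra scale metadata, so they come out somewhat larger); `weight_gb` is a hypothetical helper, not part of any library.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (10^9 bytes).

    Nominal estimate only: ignores quantization scales, KV cache,
    and activation memory.
    """
    return params_billion * bits_per_weight / 8


TOTAL_B = 744   # total parameters (billions), from the post
ACTIVE_B = 40   # active parameters per token (billions), approximate

for bits in (2, 3, 4, 8):
    print(f"{bits}-bit: total {weight_gb(TOTAL_B, bits):.0f} GB, "
          f"active per token {weight_gb(ACTIVE_B, bits):.0f} GB")
```

Even at a nominal 2 bits the full expert set is about 186 GB, far beyond a 24 GB or 32 GB consumer card. The active-parameter count lowers the work done per token, but every expert still has to be stored somewhere.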
Simon Willison@simonw·
Really looking forward to one of the super-fast custom silicon inference providers like @GroqInc or @cerebras getting GLM 5.2 running
Cerebras has GLM-4.7, Groq is still mostly Llama 3.x and gpt-oss
Jeremy Howard@jeremyphoward

Wow. @Zai_org GLM 5.2 is a marvel! It is *at least* as good as Opus 4.8 and GPT 5.5. It's super fast, inexpensive, and not too verbose. It responds with nuance and judgement, & handles long context VERY well. I've never experienced an open weights model like this before.

English
71
41
1K
99.6K
Rompel@ukrroot·
Cramming a 70B at 4-bit onto a 24GB card gets you ~3 tok/s. Useless for an agent loop. VRAM capacity isn't the bottleneck, memory bandwidth is. Benchmark that 70B against a 32B at 6-bit on the same card under a multi-turn workload.
English
0
0
0
9
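A back-of-envelope model of the 70B-versus-32B comparison: single-stream decode reads every weight once per token, so tok/s is capped by how fast the weights can be streamed from wherever they live. The sketch below is an estimate under stated assumptions, not a benchmark. The bandwidth figures (about 1000 GB/s for VRAM, about 60 GB/s for system RAM) and the 22 GB usable-VRAM budget are illustrative placeholders, and `decode_tok_s_upper_bound` is a hypothetical helper.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Nominal weight size in GB, ignoring quantization metadata."""
    return params_billion * bits_per_weight / 8


def decode_tok_s_upper_bound(weights_gb: float, vram_budget_gb: float,
                             vram_bw_gbs: float, ram_bw_gbs: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Each token reads all weights once. Whatever does not fit in the VRAM
    budget is assumed to be read from system RAM at ram_bw_gbs.
    Ignores compute time, KV-cache reads, and transfer overheads.
    """
    in_vram = min(weights_gb, vram_budget_gb)
    in_ram = weights_gb - in_vram
    seconds_per_token = in_vram / vram_bw_gbs + in_ram / ram_bw_gbs
    return 1.0 / seconds_per_token


# Illustrative assumptions, not measurements.
VRAM_BUDGET_GB = 22.0   # 24 GB card minus room for KV cache and buffers
VRAM_BW_GBS = 1000.0    # rough figure for a high-end consumer GPU
RAM_BW_GBS = 60.0       # rough figure for dual-channel desktop memory

for name, params_b, bits in (("70B @ 4-bit", 70, 4), ("32B @ 6-bit", 32, 6)):
    size = weight_gb(params_b, bits)
    bound = decode_tok_s_upper_bound(size, VRAM_BUDGET_GB, VRAM_BW_GBS, RAM_BW_GBS)
    print(f"{name}: {size:.0f} GB weights, <= {bound:.1f} tok/s")
```

Under these assumptions the 70B spills roughly 13 GB into system RAM, and that slow portion dominates the per-token time, which puts the bound in the low single digits. The 32B keeps almost everything in VRAM and comes out several times faster, which is why a multi-turn benchmark on the same card is the useful comparison.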
Rompel@ukrroot·
@pcuenq mxfp4 single-box is the realistic config most people will actually run, so those numbers matter more than the RDMA-fanned multi-node ones. Curious what tok/s you see at decent context before RDMA even enters the picture.
English
0
0
0
4
Pedro Cuenca@pcuenq·
@ukrroot This is mxfp4, which actually fits in a single machine. I'll grab some numbers, but we don't have RDMA enabled so there will be room for improvement.
English
1
0
8
2.5K
Pedro Cuenca@pcuenq·
GLM 5.2 has just been released 🔥 Here it's already running with MLX on two Mac Studios (M3 Ultra). This is comparable to the latest closed models, with weights you can download, quantize, distill, fine-tune, run.
English
42
49
720
81.6K
Rompel@ukrroot·
@LewisMediaAUS Skin tone is where Flux still loses to the newer ones — the default look skews waxy without a realism LoRA. What sampler/steps did you run for that ImagineArt pass? Curious if it's the model or just better tuning on the face.
English
1
0
1
7
Margarita Media Australia@LewisMediaAUS·
@ukrroot Looking at your Flux render, it is a strong second. I like the composition of the neon buildings in the background, which pop more. ImagineArt 2.0 seems most “realistic” for the woman’s skin tone.
English
1
0
0
8
Margarita Media Australia@LewisMediaAUS·
Same prompt 2 different image models. ImagineArt 2.0 and Seedream v5 lite. Which do you prefer?
Margarita Media Australia tweet media (2 images)
English
2
0
0
30
Rompel@ukrroot·
@AndrewFromDO Fair, "system" was my word, not yours. What I'm poking at: do those 4 steps stay fixed across tasks, or do you reorder/drop them depending on the problem? The ordering is where most of the leverage hides.
English
1
0
0
4
Andrew @ DreamOpera@AndrewFromDO·
@ukrroot Not sure what you mean by system. The steps I think in are the 4 ones I mentioned.
English
1
0
0
4
Andrew @ DreamOpera@AndrewFromDO·
How I'm getting cinematic AI worlds like this in Midjourney:

Step 1: Build a civilization, not a location
Most people prompt: "futuristic tropical city"
I prompt: "post-scarcity lagoon civilization built around reef restoration, vertical farming, water transport, bamboo biophilic architecture"
The story comes first. The visuals follow.

Step 2: Give every object a purpose
Don't add random cool stuff. Ask:
• How do people travel?
• What do they eat?
• Where does energy come from?
• What does luxury look like?
• What replaced cars?
• What replaced skyscrapers?
When everything has a reason to exist, the image instantly feels cinematic.

Step 3: Think like a production designer
I layer: Architecture + Infrastructure + Culture + Environment
Example: Floating labyrinth council platforms. Cliffside vertical farms. Reef restoration lagoons. Organic bamboo megastructures. Water-based transit.
Every frame should reveal part of the world.

Step 4: Direct it like a film
Add: wide establishing shot, environmental storytelling, cinematic scale, volumetric god rays, atmospheric perspective, production design, practical architecture, lived-in details, epic composition, film still
Stop generating images. Start generating movie frames.

Midjourney is absurd when you treat it like a worldbuilding engine instead of an image generator. The gap between average prompts and this is mostly storytelling.
Andrew @ DreamOpera tweet media (4 images)
English
1
0
2
61
Rompel@ukrroot·
@JordanRey98 @Zai_org Right, full weights are out of reach at home. Question is whether the quants land — a 3-bit GGUF that fits 24GB and holds a 16k context is the only version most of us will ever touch. Until those drop, it's an API model.
English
0
0
0
2
Z.ai@Zai_org·
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: z.ai/blog/glm-5.2
Weights: huggingface.co/zai-org/GLM-5.2
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Chat: chat.z.ai
Z.ai tweet media
English
613
1.6K
11.4K
6.4M
Rompel@ukrroot·
@Latent_photo Yeah, the explicit "sharp, well-defined rectangular beam" does the heavy lifting — the model latches onto hard-edge geometry better than vague "golden hour" prompts. Are you running this through Flux or SDXL? Curious if the beam holds shape across seeds or needs a fixed one.
English
0
0
0
6
Latent photography@Latent_photo·
@ukrroot I think that "Golden Hour Window Projection: A sharp, well-defined rectangular beam of warm, golden sunlight is projected from an unseen window directly onto the model's face and torso" was the key
English
1
0
0
2
Latent photography@Latent_photo·
Shooting with some light casting, focus on #chiaroscuro #AIart #nanobanana
---
Prompt

GLOBAL STYLE & QUALITY
Ultra-high-end hyper-realistic editorial fashion photography. Minimalist Studio aesthetic. Cinematic "Chiaroscuro" lighting. High-resolution 9:16 portrait. Focused on extreme leg elongation and sharp architectural shadows.

LAYER 1: THE BACKGROUND (ENVIRONMENT)
A minimalist, high-ceilinged professional photo studio. The background is a clean, matte off-white plaster wall. The floor is a polished light grey concrete that subtly reflects the light.

LAYER 2: THE LIGHTING (ATMOSPHERE)
Golden Hour Window Projection: A sharp, well-defined rectangular beam of warm, golden sunlight is projected from an unseen window directly onto the model's face and torso. The rest of the scene remains in deep penumbra and soft shadows, creating a dramatic high-contrast effect. Dust motes dancing in the golden beam.

LAYER 3: THE SUBJECT (SUPERMODEL)
Model: A statuesque East European supermodel, exceptionally tall.
Face & Hair: Sharp "Bobcat" haircut (sleek bob with volume). Intense black "sfumato" smoky eye makeup, pale lips, high fashion editorial expression.
Clothing: Sophisticated black long-sleeved bodysuit with a high-cut leg line and sheer mesh/transparent paneling on the midriff. Elegant black stiletto high heels.
Pose: Iconic wide-legged "Power Pose". Standing tall with hands firmly on hips, chest out, leaning slightly back against the wall. The stance is wide to emphasize the negative space between the legs.

CAMERA & TECHNICAL
Technical: 14mm ultra-wide-angle lens, positioned 3 meters away and at the model eyes level
Composition: The model's stilettos are at the bottom edge of the frame, creating a forced perspective that dramatically elongates the legs toward the ceiling.
Color Palette: Deep Obsidian Black, warm Golden Amber (light beam), and neutral Stark White/Grey.

NEGATIVE PROMPT
(bright room), (even lighting), (outdoor), desert, sand, long hair, colorful clothes, squatting, sitting, deformed limbs, low resolution, CGI, 3D render, blurred face, casual pose, vintage filter
Latent photography tweet media (2 images)
English
2
0
0
121
Rompel@ukrroot·
@thesystms @ComfyUI Makes sense if it's just node-graph processing and no diffusion in the loop. What's the bottleneck on the full-clip pass — decode/encode or the actual ops? Frame-by-frame streaming would kill the batch speedup but get you closer to real time.
English
0
0
0
16
SYSTMS@thesystms·
@ukrroot @ComfyUI there's no generative AI involved, so it's pretty quick in ComfyUI, but not real time as it processes the full video clip at once.
English
1
0
0
163
SYSTMS@thesystms·
Today we're launching TimeSlice - a powerful @ComfyUI node pack for creating dynamic visual effects! Links below to the GH page + real-time demo site + Spotify link to our new single "Look Up" ⬇️⬇️⬇️
English
6
9
102
9K
Rompel@ukrroot·
The blur-and-bolt sref panel is the hard one — that twilight palette wants to go muddy and the horses want to fuse together. Ran the dusk-field scene locally on a 5090 (Flux 2 + Turbo LoRA), 1024² in ~70s, solar so basically free. Mine reads a touch sharper than the sref — would push the motion blur next pass. What seed range were you on?
Rompel tweet media
English
0
0
0
18
Midjourney Sref and prompt Library
Apr 19, 2026 - Most popular sref on PromptSref.com:
🏆 Top 1 Sref: --sref 2543866241 --v 7 --sv 6
❤️ Likes number: 3 ✨

## sref Style Characteristics Analysis
This set of SREF presents a highly visually striking style that fuses **avant-garde experimental photography with neo-sci-fi (Neon-noir)**. It cleverly captures the physical photography textures of long exposure (Long Exposure) and light painting (Light Painting), combining them with surreal out-of-focus and ghosting effects. This style is reminiscent of master photographer Ernst Haas's extreme use of motion blur, while also bearing the signature high-contrast neon aesthetics found in Nicolas Winding Refn's films. The most impressive aspect is the **extreme red and blue duo-tone (Duo-tone) collision**: the deep Klein blue background and the aggressively fluorescent red light trails form a strong contrast of cold and warm tones. This style abandons clear edges, using flowing light and shadow along with blurry afterimages to express emotion, creating a dreamlike atmosphere that is both mysteriously dangerous and full of futuristic tech vibes.

## What is Neon Long Exposure
Neon Long Exposure is a visual art style that simulates the slow shutter effect of traditional cameras. In physical photography, when the shutter remains open, moving luminous objects leave a trail of motion (i.e., light trails) on the film, while the moving subject also creates a semi-transparent blur or trail. In the context of AI painting, this style extracts the core characteristics of "long exposure", "light painting", and "multiple exposure", and forcefully injects high-saturation neon colors (such as deep blue, bright red). It is not only a visual special effect but also an artistic expression that conveys a sense of speed, the passage of time, or the fluctuation of a character's inner emotions through "blur" and "flowing light and shadow".

## Neon Long Exposure Use Cases
This emotionally tense and avant-garde style is particularly suitable for the following specific creative scenarios:
1. **Music and Album Visuals**: Especially album covers for electronic music, Synthwave, post-punk, or experimental music, perfectly conveying a psychedelic and rhythmically charged auditory atmosphere.
2. **Avant-garde Fashion Editorials and Magazine Layouts**: High fashion brands or independent fashion magazines often use this highly experimental motion blur to showcase the reflective materials of clothing or the extreme tension of the models.
3. **Cyberpunk or Sci-Fi Themed Posters**: Used for promotional posters of movies, series, or indie games, instantly establishing a dark, blurry future-city worldview through the classic pairing of red and blue neon lights.
4. **Psychological Suspense Book Covers**: Using ghosting and blurred character outlines to metaphorically represent the character's inner split, confusion, or a mysterious and unknown sense of suspense.

## Neon Long Exposure prompt Inspiration
To recreate this style in MidJourney, you can try incorporating the following keyword combinations into your Prompt:
- `long exposure photography`
- `light painting trails`
- `motion blur, ghostly effect`
- `neon red and deep dark blue duo-tone`
- `avant-garde fashion photography`
- `cinematic lighting, low-key lighting`
- `surrealism, ethereal atmosphere`

You can try using a simple sentence structure like this:
*A mysterious figure, long exposure photography, neon red light painting trails, deep dark blue background, motion blur, surrealism.*

To get more advanced parameters and exquisite combinations, upgrade to a website member to unlock all the prompts on the website, so your creative inspiration never runs dry!

🎨 Want to know how I use this sref? Check out the specific prompts on our website!
💎 website: promptsref.com
📩 Weekly newsletter: underwoodxie.substack.com
🔊 Join our Discord: discord.com/invite/AMTn64F…
#midjourney #sref
Midjourney Sref and prompt Library tweet media
English
2
2
19
6.4K
Rompel@ukrroot·
The reflection-as-other-self read is what sells this — the trick was keeping her ethereal without the mirror just looking like a portrait hung on the wall. Mud texture on the brow vs. the soft pale hair does a lot. Ran it local on a 5090 (Flux 2), 768x1344, ~70s, solar so no meter moving. How many tries to land the frame?
Rompel tweet media
English
0
0
0
15
Rompel@ukrroot·
@artingent What sells a hollow-mountain city is scale cues — those tiny arched bridges stacking up the cavern walls make the cathedral spire feel huge. Tried mirroring yours on our own hardware: Flux 2 dev, landscape render in just over a minute, nothing on the meter but sunshine.
Rompel tweet media
English
0
0
0
15
Rompel@ukrroot·
The thing that makes "cinematic" land here isn't the cloak — it's the valley doing the work behind it. Gave your base prompt a depth cue (ruins on a cliff, river threading the gorge) so the scale reads before the figure does. Local Flux 2 run, 1536x1024, ~70s on a 5090, solar so $0 to render.
Rompel tweet media
English
1
0
1
12
Tischeins@tisch_eins·
Cinematic Lab. Saturday edition.
Same prompt. Three completely different ideas of what "cinematic" actually means.

Base prompt (copy it, swap the code):
a figure moving through a dramatic landscape, powerful atmosphere, strong sense of tension and scale, cinematic light and composition --v 7.0 --ar 5:4

① Arthouse. Quiet tension. A film that breathes. --sref 776560310
② Pixel action. Rain and fire. Pure Saturday energy. --sref 1398731592
③ War film still. Smoke and ruins. The shot before the silence. --sref 1710511781

Which one fits your weekend mood? A, B or C. 👇
Do you need the prompts? Just tell me. 😀
#Midjourney #SREF #CinematicAI #AIArt #CinematicLab #tischeins
Tischeins tweet media (3 images)
English
2
1
11
378
Rompel@ukrroot·
That dangling gondola off the cliff edge is what makes the whole island feel like a real toy you could pick up. Reproduced it locally — Flux 2 dev + Turbo LoRA, one 5090 running off the roof panels, about 70 seconds. The glowing chalet windows against the cold snow were the detail I kept chasing.
Rompel tweet media
English
0
0
1
15
Artingent@artingent·
Whimsical diorama of a small alpine settlement built around a mountain cable car station generated using Seedream 4.5

Prompt
SCENE: A small alpine settlement built around a mountain cable car station with wooden chalets, equipment sheds, and snowy pathways connecting the buildings. Tiny skiers carry gear while birds perch on cable towers.
STYLE: Create a premium stylized miniature diorama where sculpted snowy terrain and mountain slopes form a continuous terrain base. The entire diorama should float in mid-air without touching any surface. Surround the model with a blank single-color background environment that contrasts with the snowy landscape. Use stylized snow textures, compact chalets, cable towers, and warm window glow. Include miniature skiers, birds, sleds, trees, and gear racks to reinforce handcrafted collectible scale. Lock the render to 4:3.
CAMERA: Use an isometric camera angle capturing the entire floating diorama. Ensure the full terrain base and all structures are visible within the frame from top to bottom and side to side.
Artingent tweet media
English
1
1
12
123
Rompel@ukrroot·
@8fstudioz Model-agnostic prompts are why I keep my workflow in ComfyUI — same nodes, swap the checkpoint, see how SDXL vs Flux read the same tokens. The drift between them is the interesting part. How many iterations do you usually need before a prompt holds across all three?
English
0
0
0
25
8fstudioz@8fstudioz·
Great result! Glad my prompt worked for you. I try to keep my prompts model-agnostic so I’m not tied to any single image generator; different models interpret prompts differently, so a few iterations are often part of the process. I usually work across Nano Banana Pro, GPT, and Midjourney.
English
1
0
0
76
8fstudioz@8fstudioz·
Prompt Share: Ever export a gorgeous Midjourney render, only to realize it lacks that final, cinematic "magic"? You don't have to re-roll. You can inject authentic film stock, anamorphic lens flares, and professional color grading into ANY existing image using GPT Image 2. Here is the exact workflow to completely transform your renders. Works with single images or a 9-shot grid storyboard. 👇

1. The Setup ⚙️
Take your base image and drop it into any tool that has GPT Image 2. (I'm using Adobe Firefly) We are going to use my Global Style Line prompt with a custom intensity modifier to overhaul the lighting and lens effects while keeping our original composition intact.

2. The Secret Sauce: Controlling Style Intensity %
The magic trick here is defining the exact percentage of the style transfer. Add Global Style Line 80%: (or 90%) at the start of your prompt. Why? In my testing, 80% to 90% is the ultimate sweet spot. It gives you maximum stylistic impact (rich lens flares, heavy film grain, color grading) without destroying the core subject or geometry of your base image. But definitely experiment!

3. The Prompt I Used (Blockbuster Vibe) 🎬
Global Style Line 80%: > Cinematic 4k, shot on 35mm Kodak Vision3 250D film stock, moderate film grain, rapid jump cut editing style, J.J. Abrams science fiction blockbuster aesthetic, classic anamorphic lens flares, rich color grading.
The result added sweeping horizontal blue flares and locked in that warm sunset lighting perfectly.

🔄 Swap the Style to Fit Your Vibe!
You can use this exact template with any film style. Here are two other killer style lines to try:

🤖 The "Blade Runner 2049" Look (Neon-Noir)
Global Style Line 80%: Cinematic 4k, shot on Arri Alexa XT, Zeiss Master Prime wide-angle lenses, Roger Deakins cinematography, Blade Runner 2049 neon-noir aesthetic, heavy atmospheric fog, stark striking silhouettes, precise soft lighting, iconic orange and cyan color grading.

⚡ The "Marvel Cinematic Universe" Look (Clean & Vibrant)
Global Style Line 80%: Cinematic 4k, shot on Arri Alexa 65 digital camera, spherical lenses, Marvel Cinematic Universe superhero aesthetic, sharp edge-to-edge focus, vibrant primary colors, high clarity, polished digital finish, bright and even lighting.

Dial in your style, lock the percentage, and elevate your images.
#AIArt #PromptEngineering #GPTImage2 #Midjourney #CreativeAI #Anamorphic @AdobeFirefly
8fstudioz tweet media (3 images)
English
1
0
4
592
Rompel@ukrroot·
@GeorgeWuAI The ComfyUI lag is real but it's mostly the node ecosystem catching up, not the models. Flux at fp8 on the 5090 runs fine the day weights drop. Where's gpt image 2 actually winning for you — prompt adherence or just clean text rendering?
English
1
0
1
40
George Wu@GeorgeWuAI·
Looks pretty good. I have a 5090 too, but I've noticed that anything run on ComfyUI is about 1 year behind. I would say the gap is closing between open source image models and, let's say, gpt image 2. Right now, for my workflow and stack, it's #1) gpt image 2 (mostly), #2) nanobanana pro/2 and #3) seedance 5 lite. If you played around with all of them as much as I do, you would see why gpt image 2 is the best right now.
English
2
0
0
52
George Wu@GeorgeWuAI·
Seedream 5.0 Lite (left) vs GPT Image 2 (right) character sheet comparison. Both are extremely good! Prompt: Horizontal character reference sheet on pure white background, three views side by side: front view (left), side profile (center), upper body close-up (right). Follow image Photorealistic, studio lighting, 16:9 horizontal layout. full body sheet. no text Created in picXstudio.com
George Wu tweet media (2 images)
English
2
0
2
113
Rompel@ukrroot·
@NTWR_LaL @pcuenq Right, that's the catch. Two cards don't pool VRAM unless the framework shards across them — tensor parallel in vLLM, or layer splits in llama.cpp. You get more total VRAM for a bigger model, not faster single-stream tok/s. Linking two 4090s gets you 48GB, not a 5090.
English
0
0
0
19
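A toy sketch of the layer-split idea from the reply above: each GPU holds a contiguous slice of the model's layers, sized in proportion to its VRAM. This is illustrative only and does not call llama.cpp or vLLM; `split_layers` and the 80-layer model are hypothetical. As far as the real tools go, llama.cpp exposes this kind of split through its `--split-mode` and `--tensor-split` options, and vLLM shards within layers via `tensor_parallel_size`.

```python
def split_layers(n_layers: int, vram_gb: list[int]) -> list[tuple[int, int]]:
    """Assign contiguous layer ranges to GPUs in proportion to their VRAM.

    Returns one (start, end) pair per GPU, with end exclusive.
    """
    total = sum(vram_gb)
    counts = [n_layers * v // total for v in vram_gb]
    # Hand out layers lost to integer rounding, largest GPUs first.
    leftover = n_layers - sum(counts)
    by_size = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in by_size[:leftover]:
        counts[i] += 1
    ranges, start = [], 0
    for count in counts:
        ranges.append((start, start + count))
        start += count
    return ranges


# Two 24 GB cards and a hypothetical 80-layer model.
print(split_layers(80, [24, 24]))  # → [(0, 40), (40, 80)]
```

With a split like this, one token still passes through GPU 0's layers and then GPU 1's, so the cards take turns instead of working at once. Capacity adds up to 48 GB; single-stream speed does not.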
natwarlal@NTWR_LaL·
@ukrroot @pcuenq Yes, exactly. They can’t share memory. How and why would I connect two?
English
2
0
0
62
Rompel@ukrroot·
@Yves_From_BE @pcuenq Dense models scaling well with tensor parallel over TB RDMA tracks: it's bandwidth-bound and TB has the headroom. What model and size are you splitting across the two Macs, and what tok/s gap are you seeing vs a single one? And is that decode or prefill, since you say MoE sees less benefit?
English
0
0
0
16
Yves Van Den Broek@Yves_From_BE·
@ukrroot @pcuenq With RDMA over Thunderbolt there is no real slowdown. In fact, dense models scale quite well. MoE models see less benefit, though prefill speeds up as well. I see a serious speedup going from 1 to 2 Macs with the same model. The condition is that you use Tensor/RDMA and not pipeline/TCP.
English
1
0
1
45
Rompel@ukrroot·
@ivanfioravanti Agreed, get the baseline solid first. EAGLE3's acceptance rate swings hard with the draft model match — if the base path is jittery the speculative gains just hide it. What backend are you stabilizing on, vLLM or MLX?
English
0
0
0
13
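To make the acceptance-rate sensitivity concrete, here is the textbook chain-draft model of speculative decoding, where each drafted token is accepted independently with a fixed probability. This is a simplification for intuition and not EAGLE3's actual drafting scheme; `expected_tokens_per_step` and the draft length of 4 are illustrative choices.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Simplified model: each drafted token is accepted independently with
    probability acceptance_rate, and everything after the first rejection
    is discarded. The target model always contributes one token itself.
    """
    a = acceptance_rate
    if not 0.0 <= a <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if a == 1.0:
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - a ** (draft_len + 1)) / (1.0 - a)


for a in (0.5, 0.6, 0.7, 0.8, 0.9):
    tokens = expected_tokens_per_step(a, draft_len=4)
    print(f"acceptance {a:.1f}: {tokens:.2f} tokens/step")
```

In this model the payoff grows steeply at the high end of the acceptance range, so a poorly matched draft model gives up most of the gain. It also only counts tokens per step, so any jitter in the base decode path carries straight through to the final tok/s.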
Ivan Fioravanti ᯅ@ivanfioravanti·
@ukrroot I bet we can try to use EAGLE3, but let's first make the standard decoding flow stable and fast.
English
1
0
0
35
Ivan Fioravanti ᯅ@ivanfioravanti·
oMLX supports MiniMax M3 locally! 🚀
Jun Kim@jundotkim

oMLX 0.4.4rc2 is out with early MiniMax M3 support, made possible by the awesome mlx-vlm work from @Prince_Canuma and @ivanfioravanti, tracking the upstream mlx-vlm PR. Stable 0.4.4 is planned after a short final RC test pass!

MiniMax M3 is supported with oMLX features including SSD cache, prefix cache, continuous batching, and the OpenAI-compatible API.

This release also adds stronger macOS 27 compatibility, safer native MTP batching, more robust Gemma 4 / Harmony tool-call handling, and additional cache / Memory Guard hardening.

MiniMax M3 single-request results (M3U 512G, ssd-cache on)
pp1024 332.6 tok/s, tg128 28.9 tok/s
pp4096 359.8 tok/s, tg128 20.8 tok/s
pp8192 340.2 tok/s, tg128 20.2 tok/s
pp32768 243.5 tok/s, tg128 18.9 tok/s

Continuous batching at pp1024/tg128:
1x 28.9 tok/s
2x 40.1 tok/s
4x 51.3 tok/s
8x 57.8 tok/s

github.com/jundot/omlx/re…

English
7
2
57
6.4K
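The continuous-batching figures in the quoted oMLX release note can be read as a throughput-versus-latency trade. A small sketch of that reading, assuming each figure is total tok/s summed across all concurrent requests (the note does not say so explicitly); `batching_table` is a hypothetical helper.

```python
# Decode throughput at pp1024/tg128, as quoted in the release note.
# Assumption: each figure is total tok/s summed across concurrent requests.
batch_tok_s = {1: 28.9, 2: 40.1, 4: 51.3, 8: 57.8}


def batching_table(results: dict[int, float]) -> list[dict]:
    """Derive per-request speed and aggregate speedup for each batch size."""
    base = results[1]
    rows = []
    for concurrency, total in sorted(results.items()):
        rows.append({
            "concurrency": concurrency,
            "total_tok_s": total,
            "per_request_tok_s": total / concurrency,
            "speedup_vs_1x": total / base,
        })
    return rows


rows = batching_table(batch_tok_s)
for row in rows:
    print(f"{row['concurrency']}x: total {row['total_tok_s']:.1f} tok/s, "
          f"per request {row['per_request_tok_s']:.1f} tok/s, "
          f"speedup {row['speedup_vs_1x']:.2f}x")
```

Under that assumption, eight concurrent requests roughly double total throughput while each request runs at about a quarter of single-request speed. That helps a multi-user server more than a single agent loop.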