Josh Leverette
@coder543
543 posts

Ready to move to space. Software engineer specializing in Rust and Go, with a variety of other languages used from time to time.

Joined September 2009
135 Following · 86 Followers
Roni Goldshmidt @ronigoldshmidt
A demo thread of the generalization capabilities of BADAS-2.0, our physical hazard-prediction model. A short while ago we officially launched our world-model family. These models show extreme generalization ability, and in the thread below I'll give a peek at a few examples. The model is available on our launch page; you're welcome to try it yourselves:
8 replies · 1 repost · 22 likes · 3.5K views
Josh Leverette @coder543
@wbjang11 That’s really cool! Are the code/weights going to be available under a commercial-friendly license? I’ve wanted a model like this for a long time!
0 replies · 0 reposts · 0 likes · 45 views
Josh Leverette @coder543
@MiniMax_AI “Open source” and a non-commercial license aren’t really compatible. This is “source available”.
0 replies · 0 reposts · 19 likes · 876 views
Josh Leverette @coder543
@Alibaba_Qwen It makes us nervous that this is a “Plus” model, one of the proprietary models. And you just released a proprietary Omni model. Is Alibaba Qwen still doing open models?
0 replies · 0 reposts · 14 likes · 1K views
Qwen @Alibaba_Qwen
Our new model is now live on OpenRouter for an early preview. Go give it a try! Looking forward to your feedback~😎

Quoting OpenRouter @OpenRouter:
Qwen 3.6 Plus Preview from @Alibaba_Qwen is live now for free for a limited time on OpenRouter! During this free period, prompts and completions will be collected and may be used to improve the model.

59 replies · 76 reposts · 1.1K likes · 107.5K views
Omar Sanseviero @osanseviero
New GitHub org 🔥 So far, we have a cookbook with inference and fine-tuning recipes for Gemma. What else would you like to see here? github.com/google-gemma
18 replies · 13 reposts · 166 likes · 11K views
Josh Leverette @coder543
@skalskip92 LightOnOCR-2 is really good, and FireRed OCR has also impressed me. GLM-OCR is good, but it's not even the one I'm most likely to reach for. But, directing it to extract specific things is an unconventional use case for an OCR model, and it is interesting to see that it works.
3 replies · 1 repost · 38 likes · 2.7K views
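For a concrete picture of what "directing" an OCR model can look like, here is a minimal sketch that sends an extraction instruction alongside an image through an OpenAI-compatible chat endpoint. The base URL, model id, and image URL are placeholders, not a real deployment.

```python
# A minimal sketch of "directed" OCR via an OpenAI-compatible endpoint.
# base_url, model id, and image URL are placeholders (assumptions), not real.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="glm-ocr",  # hypothetical id for a locally served OCR model
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/container.jpg"}},
            {"type": "text", "text": "Extract only the shipping container serial number."},
        ],
    }],
)
print(response.choices[0].message.content)
```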
SkalskiP @skalskip92
spent most of my day playing with GLM-OCR. it's a 0.9B-param vision-language model: supports 8K resolution, 8+ languages, and has built-in text, LaTeX, and table recognition modes. awesome! I tested it across different OCR tasks, starting with shipping container serial numbers.
17 replies · 55 reposts · 817 likes · 273.1K views
Josh Leverette @coder543
@skalskip92 Note that Qwen3.5 seems to be overpriced because it is so new. Look at even bigger models (more expensive to serve) like DeepSeek-V3.2, which is $0.25/$0.40 per Mtok on OpenRouter.
1 reply · 0 reposts · 1 like · 73 views
SkalskiP @skalskip92
$5K figure is based on Anthropic's retail API prices, not actual compute costs. on OpenRouter, comparable models cost ~10x less:
- Opus 4.6 API: $5 / $25 per MTok
- Qwen 3.5 397B: $0.39 / $2.34 per MTok
real cost per power user? ~$500, not $5,000. [2/5]
3 replies · 0 reposts · 15 likes · 3.1K views
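A rough check of the ~10x claim from the quoted prices. The input:output token mix is an assumption, since the thread doesn't state one.

```python
# Back-of-the-envelope check of the thread's ~10x claim, using the quoted prices.
opus = {"in": 5.00, "out": 25.00}  # Opus 4.6 API, $/MTok
qwen = {"in": 0.39, "out": 2.34}   # Qwen 3.5 397B on OpenRouter, $/MTok

mix_in, mix_out = 10, 1  # assumed input:output token mix for a coding agent
ratio = (opus["in"] * mix_in + opus["out"] * mix_out) / (
    qwen["in"] * mix_in + qwen["out"] * mix_out
)
print(f"blended price ratio: ~{ratio:.0f}x")             # ~12x with this mix
print(f"$5,000/mo at retail -> ~${5000/ratio:,.0f}/mo")  # roughly $400, near the ~$500 figure
```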
SkalskiP @skalskip92
Cursor claims it costs Anthropic $5,000/mo to serve each $200 Claude Code user. 25x loss on every subscriber. but does it actually? [1/5]
21 replies · 0 reposts · 55 likes · 21.1K views
Josh Leverette @coder543
@ArtificialAnlys What about Soniox? Parakeet TDT V2 is supposed to be better than V3 at English transcription. Let’s not forget the crowd pleasers: if you want to get attention, add Apple and Google’s default keyboard transcription models to the benchmark dataset… they’re amusingly terrible!
0 replies · 0 reposts · 0 likes · 231 views
Artificial Analysis @ArtificialAnlys
Announcing AA-WER v2.0, our Speech to Text accuracy benchmark, and AA-AgentTalk, a new proprietary dataset focused on speech directed at voice agents.

AA-AgentTalk focuses on the speech that matters most to voice agents. As a held-out, proprietary dataset, AA-AgentTalk also mitigates the risk of models training to perform well on public test sets.

Leading public Speech to Text datasets contain errors in their reference transcripts, where the ground truth doesn't match what was actually said. We've manually corrected these and are open-sourcing cleaned versions of VoxPopuli and Earnings22 on Hugging Face.

What's changed in v2.0:
➤ New held-out, proprietary dataset - AA-AgentTalk (50% weighting): 469 samples (~250 minutes) of speech directed at voice agents, and it's private so models can't train on it. Spans voice agent & call center interaction, AI agent interaction, industry jargon, meetings, consumer & personal, and media content across 17 accent groups, 8 speaking styles, and a mix of devices and environments.
➤ Cleaned transcripts for existing public datasets: We identified errors in the original ground truth transcriptions for the public datasets VoxPopuli and Earnings22 - instances where reference transcripts didn't accurately capture what was actually said. Inaccurate ground truth unfairly penalizes models that correctly transcribe the audio, so we manually reviewed and created cleaned versions, VoxPopuli-Cleaned-AA and Earnings22-Cleaned-AA.
➤ Removal of AMI-SDM: We removed the AMI-SDM dataset as the transcript errors were too extensive to correct without making a large number of judgment calls we weren't comfortable with (e.g., heavily overlapping speech).
➤ Improved text normalization: We developed a custom text normalizer building on OpenAI’s whisper normalizer package to reduce artificially inflated WER from formatting differences rather than genuine transcription errors. Key fixes include digit splitting to prevent number grouping mismatches (e.g., 1405 553 272 vs. 1405553272), preservation of leading zeros, normalization of spoken symbols (e.g., “+”, “_”), stripping redundant :00 in times (e.g., 7:00pm vs. 7pm), additional US/UK English spelling equivalences (e.g., totalled vs. totaled), and accepted equivalent spellings for ambiguous proper nouns in our dataset (e.g., Mateo vs. Matteo). This ensures models are evaluated on actual transcription accuracy rather than surface-level formatting choices.

The new weighting is 50% AA-AgentTalk, 25% VoxPopuli-Cleaned-AA, 25% Earnings22-Cleaned-AA.

Key results: @elevenlabs's Scribe v2 leads at 2.3% AA-WER v2.0, followed by @GoogleDeepMind's Gemini 3 Pro at 2.9%, @MistralAI's Voxtral Small at 3.0%, Google's Gemini 3 Flash at 3.1%, and ElevenLabs Scribe v1 at 3.2%. ElevenLabs Scribe v2 leads on two of the three component datasets, AA-AgentTalk and Earnings22-Cleaned-AA, while Google's Gemini 3 Pro leads on VoxPopuli-Cleaned-AA. See below for further detail.
10 replies · 22 reposts · 204 likes · 27.1K views
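To make the normalization fixes and the 50/25/25 weighting concrete, here is an illustrative sketch. It mimics a few of the fixes described (digit splitting, stripping :00 from times, US/UK spelling equivalences) but is not Artificial Analysis's actual normalizer.

```python
import re

SPELLINGS = {"totalled": "totaled"}  # one US/UK equivalence as an example

def normalize(text: str) -> str:
    """Illustrative normalizer in the spirit of the fixes described above."""
    t = text.lower()
    # strip redundant :00 in times, e.g. "7:00pm" -> "7pm"
    t = re.sub(r"(\d+):00\s*(am|pm)", r"\1\2", t)
    # digit splitting, so "1405 553 272" and "1405553272" normalize identically
    t = re.sub(r"\d+", lambda m: " ".join(m.group()), t)
    return " ".join(SPELLINGS.get(w, w) for w in t.split())

assert normalize("1405 553 272") == normalize("1405553272")
assert normalize("7:00pm") == normalize("7pm")

def aa_wer_v2(agenttalk: float, voxpopuli: float, earnings22: float) -> float:
    """Composite score with the stated 50/25/25 dataset weighting."""
    return 0.50 * agenttalk + 0.25 * voxpopuli + 0.25 * earnings22
```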
Josh Leverette @coder543
@skalskip92 Is there a pothole-detection huggingface space we can use to try this out? It looks fun!
1 reply · 0 reposts · 2 likes · 180 views
Josh Leverette @coder543
@ArtificialAnlys @bfl_ml That is not true: “All four variants are released under the Apache 2.0 license, enabling unrestricted commercial use.” Only the 4B models are under Apache 2.0. The 9B models are under a non-commercial license.
1 reply · 0 reposts · 2 likes · 250 views
Artificial Analysis @ArtificialAnlys
FLUX.2 [klein] is the new open weights image model from Black Forest Labs, with the 9B variant ranking as the top open weights image editing model in the Artificial Analysis Image Editing Arena!

FLUX.2 [klein] is the spiritual successor to FLUX.1 [schnell] from @bfl_ml, released in four variants across two sizes: 9B and 4B parameters, each with main and base variants. The main models are 4-step distilled for faster generation, while the base models may be better suited for fine-tuning. All four variants are released under the Apache 2.0 license, enabling unrestricted commercial use.

In image editing, FLUX.2 [klein] 9B surpassed even FLUX.2 [dev], Black Forest Labs' own 32B non-commercial open weights model, while the 4B variant performed slightly better than the original Qwen Image Edit. For text to image, FLUX.2 [klein] 9B ranks #4 among open weights models, trailing FLUX.2 [dev] Turbo and Qwen Image 2512.

FLUX.2 [klein] uses megapixel-based pricing with a fixed base for the first megapixel (on BFL’s first party API). For text to image at 1MP, the 9B variant costs $15/1k images and the 4B variant costs $14/1k images, compared to $20/1k for Qwen Image 2512 or $8/1k for FLUX.2 [dev] Turbo. For image editing with an additional 1MP input, pricing increases to $17/1k (9B) and $15/1k (4B), substantially cheaper than Qwen Image Edit 2511 at $60/1k images.

Weights are available on @huggingface, with the model also accessible via Black Forest Labs' API and third-party inference providers. See below for comparisons between FLUX.2 [klein] and other leading models in our Artificial Analysis Image Arena 🧵
3 replies · 7 reposts · 72 likes · 7.6K views
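A small sketch encoding only the 1MP price points quoted above, for quick comparisons. The tiered pricing beyond the first megapixel isn't modeled, and the variant keys are made up for the example.

```python
# $ per 1k images at 1MP, as quoted in the post above (keys are made-up labels).
PRICE_PER_1K = {
    ("klein-9b", "text-to-image"): 15.0,
    ("klein-4b", "text-to-image"): 14.0,
    ("klein-9b", "editing"): 17.0,  # with an additional 1MP input image
    ("klein-4b", "editing"): 15.0,
}

def cost_usd(variant: str, task: str, images: int) -> float:
    """Cost for `images` generations, assuming flat 1MP pricing (an assumption)."""
    return PRICE_PER_1K[(variant, task)] / 1000 * images

# 10k edits on klein-9b: $170, vs $600 at Qwen Image Edit 2511's quoted $60/1k rate.
print(cost_usd("klein-9b", "editing", 10_000))
```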
Josh Leverette @coder543
@osanseviero Gemma 4 is desperately needed, in both dense and MoE variants with reasoning.
0 replies · 0 reposts · 0 likes · 38 views
Omar Sanseviero @osanseviero
Ok, back to shipping. What do you want to see from GDM in the next couple of months? (Gemini, Nano Banana, AI Studio, Gemma, Veo, and other AI dev products)
285 replies · 23 reposts · 527 likes · 94.2K views
Josh Leverette @coder543
@agammessi10 @skalskip92 Also consider that the human is supervising. The automated annotations might be wrong, or might not fit perfectly, but a model can often do 80% of the work in one click, and then the human cleans up the data, which the model will learn from and do better next time.
2 replies · 0 reposts · 2 likes · 49 views
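A self-contained sketch of the loop described above: the model proposes annotations, a human corrects them, and the corrected data trains the next round. Every name here is a stand-in, not a real API.

```python
# All functions are stand-ins for a real detector and labeling UI.
def model_predict(image: str) -> dict:
    return {"boxes": []}  # model proposes annotations (~80% of the work)

def human_review(image: str, proposal: dict) -> dict:
    return proposal  # human corrects the remaining errors

def train(dataset: list) -> None:
    pass  # model learns from the cleaned data, improving the next round

dataset = []
for image in ["frame_001.jpg", "frame_002.jpg"]:  # placeholder inputs
    labels = human_review(image, model_predict(image))
    dataset.append((image, labels))
train(dataset)
```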
Agamdeep Singh @agammessi10
@skalskip92 This is awesome. As a student, I'm still trying to understand: if a model can already do the annotation, why is curating the data and training another model important or needed? Is it to train a smaller specialised model? Could you please share your thoughts on this?
2 replies · 0 reposts · 3 likes · 1.7K views
SkalskiP @skalskip92
this is how you annotate data in 2025 (I've been told this egg demo may trigger some US people)
21 replies · 45 reposts · 645 likes · 108.6K views
Josh Leverette @coder543
@ClassicMain @OfficialLoganK Gemini 1.5 Flash is $0.075/Mtok for prompts <= 128k tokens, $0.15/Mtok for prompts > 128k tokens, for input tokens. Gemini 2.0 Flash-Lite is *always* $0.075/Mtok. So it is half of the price of Gemini 1.5 Flash if you go above 128k tokens. Unless I’m very bad at reading things..?
1 reply · 0 reposts · 3 likes · 177 views
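Spelling the arithmetic out as a sketch, using only the rates quoted in this exchange (and assuming, per the tweet, that 1.5 Flash's rate applies to the whole prompt based on its size):

```python
def flash_15_input_cost(tokens: int) -> float:
    # $0.075/Mtok for prompts <= 128k tokens, $0.15/Mtok above (rates from the thread)
    rate = 0.075 if tokens <= 128_000 else 0.15
    return tokens / 1e6 * rate

def flash_lite_20_input_cost(tokens: int) -> float:
    return tokens / 1e6 * 0.075  # flat rate regardless of prompt length

# A 200k-token prompt: 1.5 Flash costs $0.03, Flash-Lite $0.015 (half the price).
print(flash_15_input_cost(200_000), flash_lite_20_input_cost(200_000))
```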
ClassicMain @ClassicMain
@OfficialLoganK Why do you say that it costs the same, when it is more expensive (2x the price of Gemini 1.5 Flash) if you go above 128k tokens?
1 reply · 0 reposts · 0 likes · 1.4K views
Logan Kilpatrick @OfficialLoganK
Gemini 2.0 Flash-Lite is now available for production use, at $0.075 / 1M input tokens and $0.30 / 1M output tokens (same cost as 1.5 Flash) and a great performance upgrade with a 1-line code change : ) developers.googleblog.com/en/start-build…
62 replies · 114 reposts · 1.1K likes · 135.5K views
Josh Leverette @coder543
@skalskip92 @onuralpszr Why do these object detectors all stop at such small parameter counts? The mAP values seem so low…? It seems like there would be use cases for a bigger, better object detector, even if it either can’t run in real time or requires an H200 to run in real time.
1 reply · 0 reposts · 1 like · 393 views
SkalskiP @skalskip92
"new" SOTA object detector, and it's NOT YOLOv12. D-FINE is a model released 3 months ago under the Apache-2.0 license; I have no idea how it flew under my radar. @onuralpszr thanks for adding it to the leaderboard. leaderboard link: leaderboard.roboflow.com ↓ more about architecture

Quoting SkalskiP @skalskip92:
YOLOv12 is out; I made a fine-tuning tutorial. instead of relying heavily on CNN-based architectures like its predecessors, YOLOv12 introduces an “area attention” module, which strategically partitions the feature map to reduce the quadratic complexity of full self-attention. notebook link: github.com/roboflow/noteb…

18 replies · 71 reposts · 626 likes · 51.8K views
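For intuition about the partitioning idea the quoted tweet describes, here is a minimal PyTorch sketch. It is not YOLOv12's actual module; the segment count, head count, and shapes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class AreaAttention(nn.Module):
    """Sketch: split the flattened feature map into `areas` contiguous segments
    and attend within each one, cutting self-attention's quadratic cost by ~areas."""

    def __init__(self, dim: int, num_heads: int = 4, areas: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.areas = areas

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape  # (batch, tokens, channels); n must divide by `areas`
        seg = x.reshape(b * self.areas, n // self.areas, c)
        out, _ = self.attn(seg, seg, seg)  # attention within each area only
        return out.reshape(b, n, c)

x = torch.randn(2, 64 * 64, 128)  # a flattened 64x64 feature map, 128 channels
y = AreaAttention(dim=128)(x)     # same shape out: (2, 4096, 128)
```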
Josh Leverette @coder543
@OfficialLoganK How is a preview different from experimental? And why isn’t the flash lite “preview” model under the Preview section of AI Studio? Seems confusing! But, I am excited that 2.0 Flash is finally GA!
0 replies · 0 reposts · 2 likes · 58 views
Logan Kilpatrick @OfficialLoganK
Introducing Gemini 2.0 Flash-Lite, our workhorse model which is much stronger than 1.5 Flash, at the same industry-leading price. Flash-Lite is available to preview today, and will be generally available for production in the next few weeks. (3/n)
5 replies · 1 repost · 162 likes · 10.3K views
Logan Kilpatrick @OfficialLoganK
Today is the day I’ve been looking forward to for almost a year now… Say hello to Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, and Gemini 2.0 Pro, our strongest lineup of models ever, available to all developers. 🧵 developers.googleblog.com/en/gemini-2-fa…
236 replies · 287 reposts · 3.1K likes · 293.7K views
Josh Leverette @coder543
@SciGuySpace I suspect I’d be more surprised if another company or country had landed a Falcon 9 by now! :P
0 replies · 0 reposts · 0 likes · 513 views
Eric Berger @SciGuySpace
After tonight, SpaceX has now landed a Falcon 9 family booster 362 times. In the nearly nine years since the first orbital Falcon landing, no other company or country has done this even a single time. Hopefully that finally changes within the next 12 months.
60 replies · 252 reposts · 3.9K likes · 160.3K views
Josh Leverette @coder543
@skalskip92 It’s cool that it works better, but I feel like the prompt language is ambiguous. I expected “left lane” to be the leftmost lane out of the 4 lanes visible, not the left roadway/carriageway. Also, what is a “car”? It highlighted a truck, which is arguably not a car.
0 replies · 0 reposts · 0 likes · 144 views