Josh Leverette
@coder543
543 posts

Ready to move to space. Software engineer specializing in Rust and Go, with a variety of other languages used from time to time.

Joined September 2009
135 Following · 86 Followers
Roni Goldshmidt @ronigoldshmidt
A demo thread of the generalization capabilities of BADAS-2.0, our physical hazard-prediction model. A short while ago we officially launched our world-model family. These models show extreme generalization ability, and in the thread below I'll give a peek at a few examples. The model is available on our launch page; you're welcome to try it yourselves:
8 replies · 1 repost · 22 likes · 3.5K views
Josh Leverette @coder543
@wbjang11 That’s really cool! Are the code/weights going to be available under a commercial-friendly license? I’ve wanted a model like this for a long time!
0 replies · 0 reposts · 0 likes · 45 views
Josh Leverette @coder543
@MiniMax_AI “Open source” and a non-commercial license aren’t really compatible. This is “source available”.
0 replies · 0 reposts · 19 likes · 876 views
Josh Leverette @coder543
@Alibaba_Qwen It makes us nervous that this is a “Plus” model, one of the proprietary models. And you just released a proprietary Omni model. Is Alibaba Qwen still doing open models?
0 replies · 0 reposts · 14 likes · 1K views
Qwen @Alibaba_Qwen
Our new model is now live on OpenRouter for an early preview. Go give it a try! Looking forward to your feedback~😎

Quoting OpenRouter @OpenRouter:
Qwen 3.6 Plus Preview from @Alibaba_Qwen is live now for free for a limited time on OpenRouter! During this free period, prompts and completions will be collected and may be used to improve the model.

59 replies · 76 reposts · 1.1K likes · 107.5K views
Omar Sanseviero @osanseviero
New GitHub org 🔥 So far, we have a cookbook with inference and fine-tuning recipes for Gemma. What else would you like to see here? github.com/google-gemma
18 replies · 13 reposts · 166 likes · 11K views
Josh Leverette @coder543
@skalskip92 LightOnOCR-2 is really good, and FireRed OCR has also impressed me. GLM-OCR is good, but it's not even the one I'm most likely to reach for. But, directing it to extract specific things is an unconventional use case for an OCR model, and it is interesting to see that it works.
3 replies · 1 repost · 38 likes · 2.7K views
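For a concrete picture of what "directing" an OCR model can look like, here is a minimal sketch that sends an extraction instruction alongside an image through an OpenAI-compatible chat endpoint. The base URL, model id, and image URL are placeholders, not a real deployment.

```python
# A minimal sketch of "directed" OCR via an OpenAI-compatible endpoint.
# base_url, model id, and image URL are placeholders (assumptions), not real.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="glm-ocr",  # hypothetical id for a locally served OCR model
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/container.jpg"}},
            {"type": "text", "text": "Extract only the shipping container serial number."},
        ],
    }],
)
print(response.choices[0].message.content)
```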
SkalskiP @skalskip92
spent most of my day playing with GLM-OCR. it's a 0.9B-param vision-language model: supports 8K resolution, 8+ languages, and has built-in text, LaTeX, and table recognition modes. awesome! I tested it across different OCR tasks, starting with shipping container serial numbers.
17 replies · 55 reposts · 817 likes · 273.1K views
Josh Leverette @coder543
@skalskip92 Note that Qwen3.5 seems to be overpriced because it is so new. Look at even bigger models (more expensive to serve) like DeepSeek-V3.2, which is $0.25/$0.40 per Mtok on OpenRouter.
1 reply · 0 reposts · 1 like · 73 views
SkalskiP @skalskip92
$5K figure is based on Anthropic's retail API prices, not actual compute costs. on OpenRouter, comparable models cost ~10x less:
- Opus 4.6 API: $5 / $25 per MTok
- Qwen 3.5 397B: $0.39 / $2.34 per MTok
real cost per power user? ~$500, not $5,000. [2/5]
3 replies · 0 reposts · 15 likes · 3.1K views
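A rough check of the ~10x claim from the quoted prices. The input:output token mix is an assumption, since the thread doesn't state one.

```python
# Back-of-the-envelope check of the thread's ~10x claim, using the quoted prices.
opus = {"in": 5.00, "out": 25.00}  # Opus 4.6 API, $/MTok
qwen = {"in": 0.39, "out": 2.34}   # Qwen 3.5 397B on OpenRouter, $/MTok

mix_in, mix_out = 10, 1  # assumed input:output token mix for a coding agent
ratio = (opus["in"] * mix_in + opus["out"] * mix_out) / (
    qwen["in"] * mix_in + qwen["out"] * mix_out
)
print(f"blended price ratio: ~{ratio:.0f}x")             # ~12x with this mix
print(f"$5,000/mo at retail -> ~${5000/ratio:,.0f}/mo")  # roughly $400, near the ~$500 figure
```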
SkalskiP @skalskip92
Cursor claims it costs Anthropic $5,000/mo to serve each $200 Claude Code user. 25x loss on every subscriber. but does it actually? [1/5]
21 replies · 0 reposts · 55 likes · 21.1K views
Josh Leverette @coder543
@ArtificialAnlys What about Soniox? Parakeet TDT V2 is supposed to be better than V3 at English transcription. Let’s not forget the crowd pleasers: if you want to get attention, add Apple and Google’s default keyboard transcription models to the benchmark dataset… they’re amusingly terrible!
0 replies · 0 reposts · 0 likes · 231 views
Artificial Analysis @ArtificialAnlys
Announcing AA-WER v2.0, our Speech to Text accuracy benchmark, and AA-AgentTalk, a new proprietary dataset focused on speech directed at voice agents.

AA-AgentTalk focuses on the speech that matters most to voice agents. As a held-out, proprietary dataset, AA-AgentTalk also mitigates the risk of models training to perform well on public test sets.

Leading public Speech to Text datasets contain errors in their reference transcripts, where the ground truth doesn't match what was actually said. We've manually corrected these and are open-sourcing cleaned versions of VoxPopuli and Earnings22 on Hugging Face.

What's changed in v2.0:
➤ New held-out, proprietary dataset - AA-AgentTalk (50% weighting): 469 samples (~250 minutes) of speech directed at voice agents, and it's private so models can't train on it. Spans voice agent & call center interaction, AI agent interaction, industry jargon, meetings, consumer & personal, and media content across 17 accent groups, 8 speaking styles, and a mix of devices and environments.
➤ Cleaned transcripts for existing public datasets: We identified errors in the original ground truth transcriptions for the public datasets VoxPopuli and Earnings22 - instances where reference transcripts didn't accurately capture what was actually said. Inaccurate ground truth unfairly penalizes models that correctly transcribe the audio, so we manually reviewed and created cleaned versions, VoxPopuli-Cleaned-AA and Earnings22-Cleaned-AA.
➤ Removal of AMI-SDM: We removed the AMI-SDM dataset as the transcript errors were too extensive to correct without making a large number of judgment calls we weren't comfortable with (e.g., heavily overlapping speech).
➤ Improved text normalization: We developed a custom text normalizer building on OpenAI’s whisper normalizer package to reduce artificially inflated WER from formatting differences rather than genuine transcription errors. Key fixes include digit splitting to prevent number grouping mismatches (e.g., 1405 553 272 vs. 1405553272), preservation of leading zeros, normalization of spoken symbols (e.g., “+”, “_”), stripping redundant :00 in times (e.g., 7:00pm vs. 7pm), additional US/UK English spelling equivalences (e.g., totalled vs. totaled), and accepted equivalent spellings for ambiguous proper nouns in our dataset (e.g., Mateo vs. Matteo). This ensures models are evaluated on actual transcription accuracy rather than surface-level formatting choices.

The new weighting is 50% AA-AgentTalk, 25% VoxPopuli-Cleaned-AA, 25% Earnings22-Cleaned-AA.

Key results: @elevenlabs's Scribe v2 leads at 2.3% AA-WER v2.0, followed by @GoogleDeepMind's Gemini 3 Pro at 2.9%, @MistralAI's Voxtral Small at 3.0%, Google's Gemini 3 Flash at 3.1%, and ElevenLabs Scribe v1 at 3.2%. ElevenLabs Scribe v2 leads on two of the three component datasets, AA-AgentTalk and Earnings22-Cleaned-AA, while Google's Gemini 3 Pro leads on VoxPopuli-Cleaned-AA. See below for further detail.
10 replies · 22 reposts · 204 likes · 27.1K views
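To make the normalization fixes and the 50/25/25 weighting concrete, here is an illustrative sketch. It mimics a few of the fixes described (digit splitting, stripping :00 from times, US/UK spelling equivalences) but is not Artificial Analysis's actual normalizer.

```python
import re

SPELLINGS = {"totalled": "totaled"}  # one US/UK equivalence as an example

def normalize(text: str) -> str:
    """Illustrative normalizer in the spirit of the fixes described above."""
    t = text.lower()
    # strip redundant :00 in times, e.g. "7:00pm" -> "7pm"
    t = re.sub(r"(\d+):00\s*(am|pm)", r"\1\2", t)
    # digit splitting, so "1405 553 272" and "1405553272" normalize identically
    t = re.sub(r"\d+", lambda m: " ".join(m.group()), t)
    return " ".join(SPELLINGS.get(w, w) for w in t.split())

assert normalize("1405 553 272") == normalize("1405553272")
assert normalize("7:00pm") == normalize("7pm")

def aa_wer_v2(agenttalk: float, voxpopuli: float, earnings22: float) -> float:
    """Composite score with the stated 50/25/25 dataset weighting."""
    return 0.50 * agenttalk + 0.25 * voxpopuli + 0.25 * earnings22
```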
Josh Leverette @coder543
@skalskip92 Is there a pothole-detection huggingface space we can use to try this out? It looks fun!
1 reply · 0 reposts · 2 likes · 180 views
Josh Leverette @coder543
@ArtificialAnlys @bfl_ml That is not true: “All four variants are released under the Apache 2.0 license, enabling unrestricted commercial use.” Only the 4B models are under Apache 2.0. The 9B models are under a non-commercial license.
1 reply · 0 reposts · 2 likes · 250 views
Artificial Analysis @ArtificialAnlys
FLUX.2 [klein] is the new open weights image model from Black Forest Labs, with the 9B variant ranking as the top open weights image editing model in the Artificial Analysis Image Editing Arena!

FLUX.2 [klein] is the spiritual successor to FLUX.1 [schnell] from @bfl_ml, released in four variants across two sizes: 9B and 4B parameters, each with main and base variants. The main models are 4-step distilled for faster generation, while the base models may be better suited for fine-tuning. All four variants are released under the Apache 2.0 license, enabling unrestricted commercial use.

In image editing, FLUX.2 [klein] 9B surpassed even FLUX.2 [dev], Black Forest Labs' own 32B non-commercial open weights model, while the 4B variant performed slightly better than the original Qwen Image Edit. For text to image, FLUX.2 [klein] 9B ranks #4 among open weights models, trailing FLUX.2 [dev] Turbo and Qwen Image 2512.

FLUX.2 [klein] uses megapixel-based pricing with a fixed base for the first megapixel (on BFL’s first party API). For text to image at 1MP, the 9B variant costs $15/1k images and the 4B variant costs $14/1k images, compared to $20/1k for Qwen Image 2512 or $8/1k for FLUX.2 [dev] Turbo. For image editing with an additional 1MP input, pricing increases to $17/1k (9B) and $15/1k (4B), substantially cheaper than Qwen Image Edit 2511 at $60/1k images.

Weights are available on @huggingface, with the model also accessible via Black Forest Labs' API and third-party inference providers. See below for comparisons between FLUX.2 [klein] and other leading models in our Artificial Analysis Image Arena 🧵
3 replies · 7 reposts · 72 likes · 7.6K views
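A small sketch encoding only the 1MP price points quoted above, for quick comparisons. The tiered pricing beyond the first megapixel isn't modeled, and the variant keys are made up for the example.

```python
# $ per 1k images at 1MP, as quoted in the post above (keys are made-up labels).
PRICE_PER_1K = {
    ("klein-9b", "text-to-image"): 15.0,
    ("klein-4b", "text-to-image"): 14.0,
    ("klein-9b", "editing"): 17.0,  # with an additional 1MP input image
    ("klein-4b", "editing"): 15.0,
}

def cost_usd(variant: str, task: str, images: int) -> float:
    """Cost for `images` generations, assuming flat 1MP pricing (an assumption)."""
    return PRICE_PER_1K[(variant, task)] / 1000 * images

# 10k edits on klein-9b: $170, vs $600 at Qwen Image Edit 2511's quoted $60/1k rate.
print(cost_usd("klein-9b", "editing", 10_000))
```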
Josh Leverette @coder543
@osanseviero Gemma 4 is desperately needed, in both dense and MoE variants with reasoning.
0 replies · 0 reposts · 0 likes · 38 views
Omar Sanseviero @osanseviero
Ok, back to shipping. What do you want to see from GDM in the next couple of months? (Gemini, Nano Banana, AI Studio, Gemma, Veo, and other AI dev products)
285 replies · 23 reposts · 527 likes · 94.2K views
Josh Leverette @coder543
@agammessi10 @skalskip92 Also consider that the human is supervising. The automated annotations might be wrong, or might not fit perfectly, but a model can often do 80% of the work in one click, and then the human cleans up the data, which the model will learn from and do better next time.
2 replies · 0 reposts · 2 likes · 49 views
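A self-contained sketch of the loop described above: the model proposes annotations, a human corrects them, and the corrected data trains the next round. Every name here is a stand-in, not a real API.

```python
# All functions are stand-ins for a real detector and labeling UI.
def model_predict(image: str) -> dict:
    return {"boxes": []}  # model proposes annotations (~80% of the work)

def human_review(image: str, proposal: dict) -> dict:
    return proposal  # human corrects the remaining errors

def train(dataset: list) -> None:
    pass  # model learns from the cleaned data, improving the next round

dataset = []
for image in ["frame_001.jpg", "frame_002.jpg"]:  # placeholder inputs
    labels = human_review(image, model_predict(image))
    dataset.append((image, labels))
train(dataset)
```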
Agamdeep Singh @agammessi10
@skalskip92 This is awesome. As a student, I'm still trying to understand: if a model can already do the annotation, why is curating the data and training another model important or needed? Is it to train a smaller specialised model? Could you please share your thoughts on this?
2 replies · 0 reposts · 3 likes · 1.7K views
SkalskiP @skalskip92
this is how you annotate data in 2025 (I've been told this egg demo may trigger some US people)
21 replies · 45 reposts · 645 likes · 108.6K views
Josh Leverette @coder543
@ClassicMain @OfficialLoganK Gemini 1.5 Flash is $0.075/Mtok for prompts <= 128k tokens, $0.15/Mtok for prompts > 128k tokens, for input tokens. Gemini 2.0 Flash-Lite is *always* $0.075/Mtok. So it is half of the price of Gemini 1.5 Flash if you go above 128k tokens. Unless I’m very bad at reading things..?
1 reply · 0 reposts · 3 likes · 177 views
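Spelling the arithmetic out as a sketch, using only the rates quoted in this exchange (and assuming, per the tweet, that 1.5 Flash's rate applies to the whole prompt based on its size):

```python
def flash_15_input_cost(tokens: int) -> float:
    # $0.075/Mtok for prompts <= 128k tokens, $0.15/Mtok above (rates from the thread)
    rate = 0.075 if tokens <= 128_000 else 0.15
    return tokens / 1e6 * rate

def flash_lite_20_input_cost(tokens: int) -> float:
    return tokens / 1e6 * 0.075  # flat rate regardless of prompt length

# A 200k-token prompt: 1.5 Flash costs $0.03, Flash-Lite $0.015 (half the price).
print(flash_15_input_cost(200_000), flash_lite_20_input_cost(200_000))
```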
ClassicMain @ClassicMain
@OfficialLoganK Why do you say that it costs the same, when it is more expensive (2x the price of Gemini 1.5 Flash) if you go above 128k tokens?
1 reply · 0 reposts · 0 likes · 1.4K views
Logan Kilpatrick @OfficialLoganK
Gemini 2.0 Flash-Lite is now available for production use, at $0.075 / 1M input tokens and $0.30 / 1M output tokens (same cost as 1.5 Flash) and a great performance upgrade with a 1-line code change : ) developers.googleblog.com/en/start-build…
62 replies · 114 reposts · 1.1K likes · 135.5K views
Josh Leverette @coder543
@skalskip92 @onuralpszr Why do these object detectors all stop at such small parameter counts? The mAP values seem so low…? It seems like there would be use cases for a bigger, better object detector, even if it either can’t run in real time or requires an H200 to run in real time.
1 reply · 0 reposts · 1 like · 393 views
SkalskiP @skalskip92
"new" SOTA object detector, and it's NOT YOLOv12. D-FINE is a model released 3 months ago under the Apache-2.0 license; I have no idea how it flew under my radar. @onuralpszr thanks for adding it to the leaderboard. leaderboard link: leaderboard.roboflow.com ↓ more about architecture

Quoting SkalskiP @skalskip92:
YOLOv12 is out; I made a fine-tuning tutorial. instead of relying heavily on CNN-based architectures like its predecessors, YOLOv12 introduces an “area attention” module, which strategically partitions the feature map to reduce the quadratic complexity of full self-attention. notebook link: github.com/roboflow/noteb…

18 replies · 71 reposts · 626 likes · 51.8K views
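For intuition about the partitioning idea the quoted tweet describes, here is a minimal PyTorch sketch. It is not YOLOv12's actual module; the segment count, head count, and shapes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class AreaAttention(nn.Module):
    """Sketch: split the flattened feature map into `areas` contiguous segments
    and attend within each one, cutting self-attention's quadratic cost by ~areas."""

    def __init__(self, dim: int, num_heads: int = 4, areas: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.areas = areas

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape  # (batch, tokens, channels); n must divide by `areas`
        seg = x.reshape(b * self.areas, n // self.areas, c)
        out, _ = self.attn(seg, seg, seg)  # attention within each area only
        return out.reshape(b, n, c)

x = torch.randn(2, 64 * 64, 128)  # a flattened 64x64 feature map, 128 channels
y = AreaAttention(dim=128)(x)     # same shape out: (2, 4096, 128)
```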
Josh Leverette @coder543
@OfficialLoganK How is a preview different from experimental? And why isn’t the flash lite “preview” model under the Preview section of AI Studio? Seems confusing! But, I am excited that 2.0 Flash is finally GA!
0 replies · 0 reposts · 2 likes · 58 views
Logan Kilpatrick @OfficialLoganK
Introducing Gemini 2.0 Flash-Lite, our workhorse model which is much stronger than 1.5 Flash, at the same industry-leading price. Flash-Lite is available to preview today, and will be generally available for production in the next few weeks. (3/n)
5 replies · 1 repost · 162 likes · 10.3K views
Logan Kilpatrick @OfficialLoganK
Today is the day I’ve been looking forward to for almost a year now… Say hello to Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, and Gemini 2.0 Pro, our strongest lineup of models ever, available to all developers. 🧵 developers.googleblog.com/en/gemini-2-fa…
236 replies · 287 reposts · 3.1K likes · 293.7K views
Josh Leverette @coder543
@SciGuySpace I suspect I’d be more surprised if another company or country had landed a Falcon 9 by now! :P
0 replies · 0 reposts · 0 likes · 513 views
Eric Berger @SciGuySpace
After tonight, SpaceX has now landed a Falcon 9 family booster 362 times. In the nearly nine years since the first orbital Falcon landing, no other company or country has done this even a single time. Hopefully that finally changes within the next 12 months.
60 replies · 252 reposts · 3.9K likes · 160.3K views
Josh Leverette @coder543
@skalskip92 It’s cool that it works better, but I feel like the prompt language is ambiguous. I expected “left lane” to be the leftmost lane out of the 4 lanes visible, not the left roadway/carriageway. Also, what is a “car”? It highlighted a truck, which is arguably not a car.
0 replies · 0 reposts · 0 likes · 144 views