Andrej Baranovskij

5.3K posts

@andrejusb

Sparrow Creator: Open-Source AI Doc Extraction 🚀 | ML/Oracle Dev | @katana_ml | Try: https://t.co/V0h9FMJzKb | https://t.co/nRgXgLL0mO

Katana ML 👉 Joined March 2010
153 Following · 6.7K Followers
Andrej Baranovskij retweeted
Minghao Wu (@WuMinghao_nlp)
@natolambert I don't know how this dude got this conclusion. As far as I know, we are going to keep cooking and open-sourcing SOTA LLMs for the community.
2 replies · 2 retweets · 21 likes · 1.1K views
Andrej Baranovskij retweeted
vLLM (@vllm_project)
🎉 Congrats to @MistralAI on releasing Mistral Small 4 — a 119B MoE model (6.5B active per token) that unifies instruct, reasoning, and coding in one checkpoint. Multimodal, 256K context. Day-0 support in vLLM — MLA attention backend, tool calling, and configurable reasoning mode, verified on @nvidia GPUs. 🔗 huggingface.co/mistralai/Mist…
Quoting Mistral AI for Developers (@MistralDevs):

🔥 Meet Mistral Small 4: One model to do it all.
⚡ 128 experts, 119B total parameters, 256k context window
⚡ Configurable Reasoning
⚡ Apache 2.0
⚡ 40% faster, 3x more throughput
Our first model to unify the capabilities of our flagship models into a single, versatile model.

7 replies · 37 retweets · 382 likes · 28.7K views
Andrej Baranovskij retweeted
vLLM (@vllm_project)
vLLM Production Stack now has an end-to-end deployment guide on @OracleCloud OKE 🚀 Self-hosted LLM inference on OCI bare metal GPUs (A10, A100, H100) — from provisioning to first request. OCI deployment scripts are contributed and maintained in the official production-stack repo. Great option for teams that need full control over GPU drivers, CUDA versions, and model configs while keeping cloud elasticity. Thanks @OracleDevs!
Quoting Oracle Developers (@OracleDevs):

This tutorial walks you through deploying the vLLM Production Stack on OKE—from infrastructure provisioning to running your first inference request. social.ora.cl/6012hNgEp

3 replies · 9 retweets · 72 likes · 6.3K views
moonliteTech (@MoonliteTechLLC)
@andrejusb you used a png, is it able to grab from a pdf as well, or would you convert it to png first?
1 reply · 0 retweets · 0 likes · 110 views
Andrej Baranovskij (@andrejusb)
Qwen 3.5 Test for JSON Structured Data Extraction

Quick test of the new Qwen 3.5 models on JSON structured data extraction from images. Testing and comparing results for 9B FP16, 27B Q8, and A3B 35B Q8. The 35B Q8 model wins in terms of both speed and accuracy. Test was run on MLX-VLM using a Mac Mini M4 Pro with 64GB RAM.

Video: youtube.com/watch?v=zCoBF1…
Code: github.com/katanaml/sparr…
Sparrow UI: sparrow.katanaml.io
3 replies · 4 retweets · 55 likes · 4.3K views
Thanh Pham (@runsonai)
@andrejusb I’m guessing it shows that MoE works well with 35B. Same memory as 27B (dense) and yet faster too.
1 reply · 0 retweets · 1 like · 184 views
Ivan Fioravanti ᯅ (@ivanfioravanti)
@andrejusb Thanks Andrej and keep pushing with Sparrow, you have built something great there!
1 reply · 0 retweets · 1 like · 73 views
Andrej Baranovskij (@andrejusb)
@ivanfioravanti LinkedIn keeps sending profile viewers info. Totally useless. What am I supposed to do, contact people who are viewing my profile, or what? lol :)
0 replies · 0 retweets · 0 likes · 69 views
Ivan Fioravanti ᯅ (@ivanfioravanti)
LinkedIn is so terrible! It’s beyond cringe! Why do most people love to appear so dumb publicly???
12 replies · 0 retweets · 41 likes · 2.5K views
Andrej Baranovskij retweeted
Valeriy M., PhD, MBA, CQF (@predict_addict)
Why Dividing by 5 Is Easy

A simple arithmetic observation. To divide by 5, multiply by 2 and divide by 10.

Example: 85 ÷ 5
Multiply by 2 → 170
Divide by 10 → 17

Why does this work? Because 5 = 10 / 2. So dividing by 5 is the same as multiplying by 2 and then dividing by 10.
4 replies · 5 retweets · 46 likes · 2.4K views
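
The divide-by-5 shortcut in the post above can be checked with a few lines of Python. This is only a minimal sketch; the function name `divide_by_five` is made up for illustration and does not come from the post.

```python
def divide_by_five(n: int) -> float:
    """Divide n by 5 using the shortcut: double it, then divide by 10."""
    doubled = n * 2       # step 1: multiply by 2
    return doubled / 10   # step 2: divide by 10


# The example from the post: 85 / 5
print(divide_by_five(85))  # prints 17.0
```

Doubling first keeps the intermediate value a whole number, and dividing by 10 is just a decimal-point shift, which is why the trick is convenient for mental arithmetic.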
Julien Chaumond (@julien_c)
get PRO on @huggingface and instantly 10x your storage to 1 TB private + 10 TB public ...for $9 a month 😮 a deal this good should be illegal
12 replies · 13 retweets · 159 likes · 29.3K views
Andrej Baranovskij retweeted
Robert Scoble (@Scobleizer)
49 years ago my dad bought an Apple II and my junior high, Hyde, in Cupertino, became one of the first schools to get an Apple II. I was one of five kids in its first computer club. 48 years ago my mom got a job building Apple II motherboards. She paid me and my brothers to help make them. Learned how to solder on them. Since then my life has always been affected by Apple. Siri was launched in my house. Was the first to buy an iPhone at Steve Jobs store in Palo Alto. Wrote two books about spatial computing because it kept buying startups I interviewed. Studied every morning for a semester with Apple cofounder @stevewoz. Great friend Andy Grignon was one of first 12 to build the iPhone. It has brought me so much magic. It is why I am still in love with new things and the people who build them today. Happy 50th!
Quoting Tim Cook (@tim_cook):

April 1st marks 50 years of Apple. Thank you to everyone who’s been a part of our journey. apple.com/50-years-of-th… #Apple50

36 replies · 68 retweets · 983 likes · 83.9K views
Andrej Baranovskij retweeted
vLLM (@vllm_project)
🚀 vLLM v0.17.0 is here! 699 commits from 272 contributors (48 new!)

This is a big one. Highlights:
⚡ FlashAttention 4 integration
🧠 Qwen3.5 model family with GDN (Gated Delta Networks)
🏗️ Model Runner V2 maturation: Pipeline Parallel, Decode Context Parallel, Eagle3 + CUDA graphs
🎛️ New --performance-mode flag: balanced / interactivity / throughput
💾 Weight Offloading V2 with prefetching
🔀 Elastic Expert Parallelism Milestone 2
🔧 Quantized LoRA adapters (QLoRA) now loadable directly
22 replies · 86 retweets · 948 likes · 60.8K views
Andrej Baranovskij retweeted
Prince Canuma (@Prince_Canuma)
mlx-vlm v0.4.0 is here 🚀

New models:
• Moondream3 by @vikhyatk
• Phi-4-reasoning-vision by @MSFTResearch
• Phi4-multimodal-instruct by @MSFTResearch
• Minicpm-o-2.5 (except tts) by @OpenBMB

What's new:
→ Full weight finetuning + ORPO h/t @ActuallyIsaak
→ Tool calling in server
→ Thinking budget support
→ KV cache quantization for server
→ Fused SDPA attention optimization
→ Streaming & OpenAI-compatible endpoint improvements

Fixes:
• Gemma3n
• Qwen3-VL
• Qwen3.5-MoE
• Qwen3-Omni h/t @ronaldseoh
• Batch inference, and more.

Big shoutout to 7 new contributors this release! 🙌

Get started today:
> uv pip install -U mlx-vlm

Leave us a star ⭐️ github.com/Blaizzy/mlx-vl…
6 replies · 20 retweets · 126 likes · 15.1K views
Andrej Baranovskij retweeted
Steve the Beaver (@beaversteever)
incredible that we built all this RAG and vector database stuff and it turns out that grep from 1973 works better than all that
182 replies · 363 retweets · 8.6K likes · 502.9K views
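
For readers who have not used it, the kind of plain pattern search the post refers to can be sketched in a few lines of Python. This is an illustrative stand-in for a case-insensitive `grep`, not a benchmark against RAG or vector databases; the function name `grep_lines` and the sample documents are invented for the example.

```python
import re


def grep_lines(pattern: str, documents: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (document name, line number, line) for every line matching the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in documents.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, lineno, line))
    return hits


# Hypothetical sample data, invented for this sketch.
docs = {
    "invoice.txt": "Invoice number: 1042\nTotal due: 310.50 EUR",
    "notes.txt": "Call the supplier\nAsk about invoice 1042",
}

for name, lineno, line in grep_lines(r"invoice", docs):
    print(f"{name}:{lineno}:{line}")
```

The search needs no index, embeddings, or model. It simply scans every line, which is the trade-off the post is joking about.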