

Max K
@max_does_tech
building @visionagents_ai. dev advocate @getstream_io, ex @IBM, @Vonage. public speaker, OS maintainer. python, vision AI, APIs


To pineapple or not to pineapple? @max_does_tech built a voice agent using @inworld_ai's Realtime API to answer the age-old question 🍕 Quickstart: visionagents.ai/introduction/v… Example code: github.com/GetStream/Visi…

This entire example is only 87 lines of code 🤯 This fully-local processor pipeline with @huggingface Transformers object inference and segmentation is running 100% in realtime on a Macbook with the @visionagents_ai SDK!
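The processor-pipeline pattern behind this can be sketched in plain Python. This is a minimal illustration of the idea (per-frame processors chained in order), not the actual @visionagents_ai SDK API; the class and function names here are hypothetical stand-ins, and the model calls are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Callable

# A video frame with a list of results accumulated by each processor.
@dataclass
class Frame:
    pixels: list                     # stand-in for real image data
    results: list = field(default_factory=list)

# Each processor is simply frame -> frame; a real SDK wraps model calls here.
Processor = Callable[[Frame], Frame]

def detect_objects(frame: Frame) -> Frame:
    # Hypothetical stand-in for a Transformers object-detection call.
    frame.results.append({"stage": "detection", "boxes": [(0, 0, 10, 10)]})
    return frame

def segment(frame: Frame) -> Frame:
    # Hypothetical stand-in for a segmentation model call.
    frame.results.append({"stage": "segmentation", "masks": 1})
    return frame

def run_pipeline(frame: Frame, processors: list[Processor]) -> Frame:
    # Apply each processor in order; results accumulate on the frame.
    for proc in processors:
        frame = proc(frame)
    return frame

out = run_pipeline(Frame(pixels=[0] * 100), [detect_objects, segment])
print([r["stage"] for r in out.results])  # → ['detection', 'segmentation']
```

Because each stage is just a callable, swapping a cloud model for a local one is a one-line change, which is what makes a fully-local realtime pipeline like this practical.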

v0.5.0 of the Vision Agents SDK is out now! New in this release: run agents directly on your hardware, Anam avatar integration, way faster @DeepgramAI TTS, @AssemblyAI support, and much more. Details in 🧵👇

Trying the new @Alibaba_Qwen 3.6-Plus model: it's very capable and responsive, not to mention that it takes a lot of creative liberties... 😆



Gemini 3.1 Flash Live just dropped; check out our demo with it! 🙌 This @googleaidevs native audio model now comes with lower latency, stronger instruction following, and more reliable tool calling. We'll share the full demo soon!

Using @roboflow's Neural Architecture Search to make a video moderation bot with under 1.8ms average inference time! In this demo we're able to moderate video coming in on a video call so quickly that you almost don't see the offending content before it's censored 🙌
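The gating logic for a moderation bot like this can be sketched in a few lines: classify each frame, censor it if flagged, and check the result against a latency budget. The classifier and censor below are toy stand-ins (a Roboflow NAS model would do the real inference); the 1.8 ms budget comes from the tweet above.

```python
import time

def flag_frame(frame: bytes) -> bool:
    # Hypothetical classifier stand-in; a trained moderation model goes here.
    return b"\xff" in frame  # toy rule: "offending" frames contain 0xff

def censor(frame: bytes) -> bytes:
    # Replace flagged content with black pixels before it reaches viewers.
    return b"\x00" * len(frame)

def moderate(frame: bytes, budget_ms: float = 1.8) -> bytes:
    start = time.perf_counter()
    out = censor(frame) if flag_frame(frame) else frame
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms:
        # In a real pipeline you'd log this and consider a smaller model.
        print(f"over budget: {elapsed_ms:.2f} ms")
    return out

print(moderate(b"\x01\xff\x02"))  # flagged frame comes back fully censored
print(moderate(b"\x01\x02"))      # clean frame passes through unchanged
```

Keeping inference under the per-frame budget is what lets the censored frame replace the original before the viewer ever sees it.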

Added Cursor CLI with Composer 2 to the eval based on community feedback: it's the #2 coding agent for vision tasks. Updated the blog post if you want the details: blog.roboflow.com/best-coding-ag… Would like to add more evals, so share what issues you're having with vision tasks and we can add them.


Build a local vision + voice agent with Qwen 3.5 Small. Runs entirely on your laptop using Ollama + Python. No cloud LLM calls.

Stack:
- Qwen 3.5 Small
- Stream WebRTC
- Deepgram + ElevenLabs
- @visionagents_ai

Thread with code ↓
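The shape of one turn in a stack like this is audio in → STT → local LLM → TTS → audio out. Here's a minimal sketch of that loop with each stage stubbed out; in the real app the stubs would be Deepgram (STT), Qwen via Ollama (LLM), and ElevenLabs (TTS). All function names and behaviors here are hypothetical placeholders, not any vendor's API.

```python
from typing import Callable

# Stubs for each stage of the stack. Each stage is a plain callable so a
# local model can be swapped in without changing the loop itself.
def transcribe(audio: bytes) -> str:
    # Hypothetical STT: pretend the "audio" already decodes to its transcript.
    return audio.decode()

def ask_llm(prompt: str) -> str:
    # Hypothetical local-model reply; a Qwen call via Ollama goes here.
    return f"echo: {prompt}"

def synthesize(text: str) -> bytes:
    # Hypothetical TTS: encode the reply back to "audio" bytes.
    return text.encode()

def agent_turn(audio_in: bytes,
               stt: Callable[[bytes], str] = transcribe,
               llm: Callable[[str], str] = ask_llm,
               tts: Callable[[str], bytes] = synthesize) -> bytes:
    """One turn of the voice loop: audio in -> text -> reply -> audio out."""
    return tts(llm(stt(audio_in)))

print(agent_turn(b"what do you see?"))  # → b'echo: what do you see?'
```

With every stage behind a callable, "no cloud LLM calls" is just a matter of which function you pass for `llm`.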

Here's @NVIDIAAIDev Nemotron-3-Super-49B used in a real-time Vision Agents application as a fraud assistant! You can see every action it takes, and it's all happening in real-time 😮 Using the Nemotron model hosted on @baseten for reliability.

v0.4 of Vision Agents is here! Here's the most intense video possible to catch you up on the big hits 🤷‍♂️ Link to the VA GitHub (7K stars, join us!) in bio


Telling stories with the new GPT-5.4 and got into a fight with it... I want to end the story but it wants to keep playing. This is a Vision Agents app where the video is streamed by an SFU to a k8s instance running the actual agent.

This is @GoogleDeepMind Gemini 3.1 Flash-Lite responding in real time in a Vision Agents app. It's able to handle a lot of different video understanding questions much more quickly than the previous gen... and this is on release day, when everyone's hitting the API! 😆

Running the new @Alibaba_Qwen 3.5 2B parameter model LOCALLY here in a Vision Agents app. This is all in realtime and it can understand my handwriting and respond to questions... this wouldn't have been possible even MONTHS ago!
