

Max K
@max_does_tech
building @visionagents_ai. dev advocate @getstream_io, ex @IBM, @Vonage. public speaker, OS maintainer. python, vision AI, APIs


To pineapple or not to pineapple? @max_does_tech built a voice agent using @inworld_ai's Realtime API to answer the age-old question 🍕 Quickstart: visionagents.ai/introduction/v… Example code: github.com/GetStream/Visi…

This entire example is only 87 lines of code 🤯 This fully-local processor pipeline with @huggingface Transformers object inference and segmentation is running 100% in realtime on a Macbook with the @visionagents_ai SDK!
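The processor-pipeline pattern behind this can be sketched in plain Python. This is a minimal illustration of the idea (per-frame processors chained in order), not the actual @visionagents_ai SDK API; the class and function names here are hypothetical stand-ins, and the model calls are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Callable

# A video frame with a list of results accumulated by each processor.
@dataclass
class Frame:
    pixels: list                     # stand-in for real image data
    results: list = field(default_factory=list)

# Each processor is simply frame -> frame; a real SDK wraps model calls here.
Processor = Callable[[Frame], Frame]

def detect_objects(frame: Frame) -> Frame:
    # Hypothetical stand-in for a Transformers object-detection call.
    frame.results.append({"stage": "detection", "boxes": [(0, 0, 10, 10)]})
    return frame

def segment(frame: Frame) -> Frame:
    # Hypothetical stand-in for a segmentation model call.
    frame.results.append({"stage": "segmentation", "masks": 1})
    return frame

def run_pipeline(frame: Frame, processors: list[Processor]) -> Frame:
    # Apply each processor in order; results accumulate on the frame.
    for proc in processors:
        frame = proc(frame)
    return frame

out = run_pipeline(Frame(pixels=[0] * 100), [detect_objects, segment])
print([r["stage"] for r in out.results])  # → ['detection', 'segmentation']
```

Because each stage is just a callable, swapping a cloud model for a local one is a one-line change, which is what makes a fully-local realtime pipeline like this practical.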

v0.5.0 of the Vision Agents SDK is out now! New in this release: run agents directly on your hardware, Anam avatar integration, way faster @DeepgramAI TTS, @AssemblyAI support, and much more. Details in 🧵👇

Trying the new @Alibaba_Qwen 3.6-Plus model: it's very capable and responsive, not to mention that it takes a lot of creative liberties... 😆



Gemini 3.1 Flash Live just dropped; check out our demo with it! 🙌 This @googleaidevs native audio model now comes with lower latency, stronger instruction following, and more reliable tool calling. We'll share the full demo soon!

Using @roboflow's Neural Architecture Search to make a video moderation bot with under 1.8ms average inference time! In this demo we're able to moderate video coming in on a video call so quickly that you almost don't see the offending content before it's censored 🙌
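The gating logic for a moderation bot like this can be sketched in a few lines: classify each frame, censor it if flagged, and check the result against a latency budget. The classifier and censor below are toy stand-ins (a Roboflow NAS model would do the real inference); the 1.8 ms budget comes from the tweet above.

```python
import time

def flag_frame(frame: bytes) -> bool:
    # Hypothetical classifier stand-in; a trained moderation model goes here.
    return b"\xff" in frame  # toy rule: "offending" frames contain 0xff

def censor(frame: bytes) -> bytes:
    # Replace flagged content with black pixels before it reaches viewers.
    return b"\x00" * len(frame)

def moderate(frame: bytes, budget_ms: float = 1.8) -> bytes:
    start = time.perf_counter()
    out = censor(frame) if flag_frame(frame) else frame
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms:
        # In a real pipeline you'd log this and consider a smaller model.
        print(f"over budget: {elapsed_ms:.2f} ms")
    return out

print(moderate(b"\x01\xff\x02"))  # flagged frame comes back fully censored
print(moderate(b"\x01\x02"))      # clean frame passes through unchanged
```

Keeping inference under the per-frame budget is what lets the censored frame replace the original before the viewer ever sees it.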

Added Cursor CLI with Composer 2 to the eval based on community feedback: it's the #2 coding agent for vision tasks. Updated the blog post if you want the details: blog.roboflow.com/best-coding-ag… Would like to add more evals, so share what issues you're having with vision tasks and we can add them.


Build a local vision + voice agent with Qwen 3.5 Small. Runs entirely on your laptop using Ollama + Python. No cloud LLM calls.

Stack:
- Qwen 3.5 Small
- Stream WebRTC
- Deepgram + ElevenLabs
- @visionagents_ai

Thread with code ↓
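The shape of one turn in a stack like this is audio in → STT → local LLM → TTS → audio out. Here's a minimal sketch of that loop with each stage stubbed out; in the real app the stubs would be Deepgram (STT), Qwen via Ollama (LLM), and ElevenLabs (TTS). All function names and behaviors here are hypothetical placeholders, not any vendor's API.

```python
from typing import Callable

# Stubs for each stage of the stack. Each stage is a plain callable so a
# local model can be swapped in without changing the loop itself.
def transcribe(audio: bytes) -> str:
    # Hypothetical STT: pretend the "audio" already decodes to its transcript.
    return audio.decode()

def ask_llm(prompt: str) -> str:
    # Hypothetical local-model reply; a Qwen call via Ollama goes here.
    return f"echo: {prompt}"

def synthesize(text: str) -> bytes:
    # Hypothetical TTS: encode the reply back to "audio" bytes.
    return text.encode()

def agent_turn(audio_in: bytes,
               stt: Callable[[bytes], str] = transcribe,
               llm: Callable[[str], str] = ask_llm,
               tts: Callable[[str], bytes] = synthesize) -> bytes:
    """One turn of the voice loop: audio in -> text -> reply -> audio out."""
    return tts(llm(stt(audio_in)))

print(agent_turn(b"what do you see?"))  # → b'echo: what do you see?'
```

With every stage behind a callable, "no cloud LLM calls" is just a matter of which function you pass for `llm`.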

Here's @NVIDIAAIDev Nemotron-3-Super-49B used in a real-time Vision Agents application as a fraud assistant! You can see every action it takes, and it's all happening in real-time 😮 Using the Nemotron model hosted on @baseten for reliability.

v0.4 of Vision Agents is here! Here's the most intense video possible to catch you up on the big hits 🤷‍♂️ Link to the VA GitHub (7K stars, join us!) in bio


Telling stories with the new GPT-5.4 and got into a fight with it... I want to end the story but it wants to keep playing. This is a Vision Agents app where the video is streamed by an SFU to a k8s instance running the actual agent.

This is @GoogleDeepMind Gemini 3.1 Flash-Lite responding in real time in a Vision Agents app. It's able to handle a lot of different video understanding questions much more quickly than the previous gen... and this is on release day, when everyone's hitting the API! 😆

Running the new @Alibaba_Qwen 3.5 2B parameter model LOCALLY here in a Vision Agents app. This is all in realtime and it can understand my handwriting and respond to questions... this wouldn't have been possible even MONTHS ago!
