Andreas Hronopoulos

42 posts

Andreas  Hronopoulos banner
Andreas  Hronopoulos

Andreas Hronopoulos

@Ahronopoulos

San Diego, California Katılım Nisan 2010
2.4K Takip Edilen434 Takipçiler
Noah Frydberg | Tiktok Shop For Brands
Clawdbot + Kling = 550 videos per day Fully-realistic UGC ads — cinematic lighting, human motion, perfect pacing — powered by AI agents. UGC cost: $5 Production time: minutes Scale: instant One AI engine that creates, tests, and scales short-form ads automatically — nonstop. It’s live. Campaigns are scaling now. Comment + RT “AGENT” and I’ll DM you the full workflow. (Must be following)
Noah Frydberg | Tiktok Shop For Brands tweet media
English
789
385
1K
114K
Min Choi
Min Choi@minchoi·
Yikes. Llama 4 benchmarks looked insane but something feels off. Reddit leaks claim Meta cooked it. Here’s what people are saying:
Min Choi tweet media
English
35
29
257
119.6K
maz
maz@miralizain·
it’s never been easier to build distribution for b2c i started this page 6 days ago and i’m already averaging 15k views all with just AI UGC videos
maz tweet media
English
226
11
657
113.1K
Daniel Jeffries
Daniel Jeffries@Dan_Jeffries1·
Our Applied AI team at Kentauros (kentauros.ai) spends a lot of time tackling the problem of GUI navigation for agents. Today, we are releasing our first dataset, WaveUI-25k, and our first fine-tuned vision-language model: huggingface.co/collections/ag… Why bother about GUI Navigation? Won’t AI just use APIs or have a clear machine-learning interface? Yes, but APIs aren’t always available, and it will take many years for ML APIs to become a reality, so we’re meeting the world where it is today. It’s also because UIs are uniquely human, much like language. They’re a high level abstraction of the world and that makes them a fantastic training ground for advanced AI development because they’re complex, challenging, ever changing, and filled with endless edge cases. They’re a great approximation for AI that uses real-world models, planning, prediction and adaptation on the fly. Next week we’re releasing a new agent that navigates GUIs very well with a combination of heuristics, classical computer vision and MLLMs. Although we’re super excited about our Gen 2 agent release there are still some limitations. Rather than bolting an expert system onto a ML model, it would be absolutely ideal if models were good at knowing where and what to click on a UI. Unfortunately, we’ve discovered that today’s frontier models like GPT-4o, Claude Opus 3, Sonnet 3.5, Qwen2, and InternVL are absolutely terrible at three key things on UIs: 1) Returning coordinates 2) Moving the mouse 3) Drawing bounding boxes on GUIs The reasons shouldn't be surprising. They just weren't trained to do any of these things. That’s why, for our Gen 3 agent, we’re already working on a click model as the primary navigator, and we’ll fall back to classical techniques as needed. Our WaveUI-35k dataset is 25K well labeled images that you can use to train a click and coordinate model. We’re currently auto-labeling a larger version of the dataset that is roughly 80K images and we’ll release that soon. Why do you want this? Well, maybe you want a model that can navigate smartphones or desktops. Or maybe you want to control remote desktops the way our AgentSea platform can navigate a virtual machine like a human navigating TeamViewer but with the Agent in charge, the way we do with our ToolFuse protocol (github.com/agentsea/toolf…) and the AgentDesk (github.com/agentsea/agent…) Python packages. We built this dataset by collecting examples from other existing datasets. The three main sources are: - WebUI (uimodeling.github.io) - RoboFlow Website Screenshots (universe.roboflow.com/roboflow-gw7yv…) - GroundUI-18K (huggingface.co/datasets/agent…) The datasets are great and we have tremendous respect for the teams that gathered them but unfortunately many of the labels were less than ideal or just wrong. So we set about fixing these fantastic resources into a new, unified, well labeled dataset. We’ve preprocessed them to have a matching format and programmatically filtered out unwanted examples, such as duplicated, overlapping or low-quality elements. The resulting pre-processed dataset has ~80k examples at the object/element level -- i.e. each row consists of the bounding box coordinates of a UI object, along with the screenshot of underlying UI (and extra info, like source, screen size, etc). We then took ~25k examples from the dataset and annotated them to get the following additional info, per example: - name: A descriptive name of the UI element. - description: A long, detailed description of the element - type: The type of the UI element - OCR: OCR of the UI element. Set to null if no text is available - language: The language of the OCR text, if available. Set to null if no text is available - purpose: A general purpose of the element - expectation: An expectation of what will happen when you click this element You can explore the dataset at huggingface.co/spaces/agentse… We’ve also fine-tuned a very small PaliGemma model as a proof of concept. You can check it out here: huggingface.co/spaces/agentse… You can upload a screenshot and use the phrase “detect” XYZ. For example, on a screenshot of Google Flights: * detect the “Return” field * detect the Vacation rentals button * detect the flying plane in the picture To be clear, the PaliGemma model is too small and has too low a resolution to be state of the art. It’s a little too rigid in how it detects things and we want it to be more flexible. It sometimes nails detection tasks and it’s sometimes just wildly off base. But it is proof that a properly trained model with a well-annotated dataset can make headway on the gnarly problem of navigating GUIs on the wide and wonderful world of the web. Out next steps are simple: * Get the full dataset annotated and release it in the next few weeks (we only used 25k out of ~85k potential examples) * Training the variant of PaliGemma that has higher resolution. * Training a larger multimodal model like InternVL or Chameleon We’ll see you next week when we release G2 agents! See you then.
Daniel Jeffries tweet media
English
16
43
291
36.4K
Matteo V.
Matteo V.@teovito·
These pics were AI-generated using @everartai. Not perfect, but almost. The role of social media managers is getting easier. Someone still has to prompt the AI, generate pictures and videos, and build engagement, but very soon a social media manager will be able to manage more accounts simultaneously. As a business and agency owner, I see thousands of dollars in savings each year. Well done @skirano
Matteo V. tweet mediaMatteo V. tweet media
English
3
3
21
4.2K
Marc Hemeon
Marc Hemeon@hemeon·
No excuses. Take care of your health. Don’t wait until you’re 48 like me. - Weight train M-F - Yoga Sat-Sun - Eating Macros / Protein - Daily Meditation - 10 pages nonfiction a day - Accountability friends (txt weight ea day) - Joined @danmartell fitness program
Marc Hemeon tweet media
English
16
0
70
5.5K
Marc Hemeon
Marc Hemeon@hemeon·
I’m doing pushups everyday until I can do 100 in a row. I’m up to 26 unbroken. Tomorrow I’ll do 27.. Then keep adding 1 until I hire 100 in a row.
English
82
2
340
76.2K
Ben Lang
Ben Lang@benz145·
XR & VISION PRO NETWORKING THREAD 🚨 I see lots of old school XR people decrying the new wave of Vision Pro excitement with the old "we've been doing that for years!" But rather than cast new people as 'the other', we should all get connected so ideas new and old can synergize! So, reply to tell us who you are and what you're building or interested in. And retweet to spread this thread!
English
151
47
298
129.2K
Matt Schlicht
Matt Schlicht@MattPRD·
@Scobleizer Probably instagram and TikTok, but could also be someone new. Vision Pro isn’t perfect but volumetric videos are A+++
English
1
0
2
177
Matt Schlicht
Matt Schlicht@MattPRD·
Volumetric videos are insane to experience in Vision Pro. They have VERY real depth. Any day now we are going to be scrolling through viral volumetric content from around the world, teleporting into each other’s bodies for 15 seconds at a time.
English
2
0
3
1.9K
Andreas Hronopoulos retweetledi
Jason Gu
Jason Gu@jasonprompts·
Found this neat little number on r/artificial:
English
4
5
29
5.8K
Andreas  Hronopoulos
Andreas Hronopoulos@Ahronopoulos·
@nutlope What if you wanted to take a specific style of images and use that as the theme? Let's say a specific designer who has a style.
English
1
0
0
345
Hassan
Hassan@nutlope·
Announcing roomGPT! Redesign your room in seconds with AI! 100% free and open source. roomgpt.io
English
264
1.3K
8.4K
1.5M
Andreas  Hronopoulos
Andreas Hronopoulos@Ahronopoulos·
@devinaconley What about staking NFTs for music/video in the form of a license with the gysr managing the royalties?
English
1
0
0
0
Devin Conley
Devin Conley@devinaconley·
I started an EIP discussion on a standard interface to represent NFT "functional ownership" when held by another smart contract. This is very relevant for many DeFi + NFT use cases. Would love to hear any thoughts or feedback! ethereum-magicians.org/t/erc-standard…
English
6
5
24
0