Andreas Hronopoulos
@Ahronopoulos
42 posts
San Diego, California · Joined April 2010
2.4K Following · 434 Followers

Clawdbot + Kling = 550 videos per day
Fully realistic UGC ads — cinematic lighting, human motion, perfect pacing — powered by AI agents.
UGC cost: $5
Production time: minutes
Scale: instant
One AI engine that creates, tests, and scales short-form ads automatically — nonstop.
It’s live. Campaigns are scaling now.
Comment + RT “AGENT” and I’ll DM you the full workflow.
(Must be following)


Our Applied AI team at Kentauros (kentauros.ai) spends a lot of time tackling the problem of GUI navigation for agents.
Today, we are releasing our first dataset, WaveUI-25k, and our first fine-tuned vision-language model:
huggingface.co/collections/ag…
Why bother with GUI navigation? Won’t AI just use APIs or have a clear machine-learning interface?
Yes, but APIs aren’t always available, and it will take many years for ML APIs to become a reality, so we’re meeting the world where it is today.
It’s also because UIs are uniquely human, much like language. They’re a high-level abstraction of the world, and that makes them a fantastic training ground for advanced AI development because they’re complex, challenging, ever-changing, and filled with endless edge cases.
They’re a great approximation for AI that uses real-world models, planning, prediction and adaptation on the fly.
Next week we’re releasing a new agent that navigates GUIs very well with a combination of heuristics, classical computer vision, and MLLMs. Although we’re super excited about our Gen 2 agent release, there are still some limitations. Rather than bolting an expert system onto an ML model, it would be absolutely ideal if models were good at knowing where and what to click on a UI.
Unfortunately, we’ve discovered that today’s frontier models like GPT-4o, Claude 3 Opus, Claude 3.5 Sonnet, Qwen2, and InternVL are absolutely terrible at three key things on UIs:
1) Returning coordinates
2) Moving the mouse
3) Drawing bounding boxes on GUIs
The reasons shouldn't be surprising. They just weren't trained to do any of these things.
That’s why, for our Gen 3 agent, we’re already working on a click model as the primary navigator, and we’ll fall back to classical techniques as needed.
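Roughly, the loop we have in mind looks like this. It's a minimal sketch in Python: the click_model wrapper, the fallback function, and the confidence threshold are all illustrative placeholders, not our actual code.

def locate_element(screenshot, target_description, click_model, fallback, min_confidence=0.8):
    """Find the (x, y) point to click for a described UI element.

    Try the fine-tuned click model first; if it is not confident enough,
    fall back to classical techniques (template matching, OCR search, heuristics).
    All names and the threshold here are illustrative.
    """
    prediction = click_model.predict(screenshot, target_description)
    if prediction.confidence >= min_confidence:
        return prediction.x, prediction.y
    return fallback(screenshot, target_description)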
Our WaveUI-25k dataset is 25K well-labeled images that you can use to train a click and coordinate model. We’re currently auto-labeling a larger version of the dataset that is roughly 80K images, and we’ll release that soon.
Why do you want this?
Well, maybe you want a model that can navigate smartphones or desktops.
Or maybe you want to control remote desktops the way our AgentSea platform does: the agent drives a virtual machine much like a human would over TeamViewer, but with the agent in charge, using our ToolFuse protocol (github.com/agentsea/toolf…) and the AgentDesk (github.com/agentsea/agent…) Python packages.
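For a rough feel of what that looks like in code, here is an illustrative snippet with the agentdesk package. Treat the method names as assumptions and check the AgentDesk README for the exact API.

from agentdesk import Desktop

# Illustrative usage only; the exact method names may differ, see the AgentDesk README.
desktop = Desktop.local()                  # create or attach to a local VM-backed desktop
desktop.move_mouse(500, 300)               # move the cursor to absolute screen coordinates
desktop.click()                            # click at the current cursor position
desktop.type_text("hello from the agent")  # type into whatever has focus
screenshot = desktop.take_screenshots()    # grab the screen so a model can look at it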
We built this dataset by collecting examples from other existing datasets. The three main sources are:
- WebUI (uimodeling.github.io)
- Roboflow Website Screenshots (universe.roboflow.com/roboflow-gw7yv…)
- GroundUI-18K (huggingface.co/datasets/agent…)
The datasets are great and we have tremendous respect for the teams that gathered them, but unfortunately many of the labels were less than ideal or just wrong. So we set about fixing these fantastic resources into a new, unified, well-labeled dataset.
We’ve preprocessed them to have a matching format and programmatically filtered out unwanted examples, such as duplicated, overlapping or low-quality elements.
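The overlap check can be as simple as an intersection-over-union (IoU) test between element boxes. Here is an illustrative sketch, not the exact rules we applied; the 0.9 threshold is a placeholder.

def iou(a, b):
    # Boxes are (x_min, y_min, x_max, y_max) in pixels.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def drop_overlaps(boxes, threshold=0.9):
    # Keep a box only if it does not heavily overlap one we already kept.
    kept = []
    for box in boxes:
        if all(iou(box, k) < threshold for k in kept):
            kept.append(box)
    return kept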
The resulting pre-processed dataset has ~80k examples at the object/element level -- i.e. each row consists of the bounding box coordinates of a UI object, along with the screenshot of the underlying UI (and extra info, like source, screen size, etc.).
We then took ~25k examples from the dataset and annotated them to get the following additional info, per example:
- name: A descriptive name of the UI element.
- description: A long, detailed description of the element.
- type: The type of the UI element.
- OCR: OCR of the UI element. Set to null if no text is available.
- language: The language of the OCR text, if available. Set to null if no text is available.
- purpose: A general purpose of the element.
- expectation: An expectation of what will happen when you click this element.
You can explore the dataset at huggingface.co/spaces/agentse…
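You can also load it directly with the Hugging Face datasets library. The repo id and field names below are assumptions based on the names in this thread, since the links here are truncated.

from datasets import load_dataset

# Repo id assumed from the project name; double-check it on our Hugging Face org page.
ds = load_dataset("agentsea/wave-ui-25k", split="train")

row = ds[0]
print(row["bbox"])                               # bounding box of the UI element (field name assumed)
print(row["name"], row["type"], row["purpose"])  # annotated fields described above
row["image"].show()                              # screenshot of the underlying UI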
We’ve also fine-tuned a very small PaliGemma model as a proof of concept. You can check it out here: huggingface.co/spaces/agentse…
You can upload a screenshot and use the phrase “detect XYZ”.
For example, on a screenshot of Google Flights:
* detect the “Return” field
* detect the Vacation rentals button
* detect the flying plane in the picture
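If you prefer running it locally instead of in the Space, the PaliGemma classes in transformers handle this kind of prompt. The checkpoint id below is the base PaliGemma model as a stand-in, because the exact name of our fine-tune is truncated above.

import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Stand-in checkpoint; swap in the fine-tuned WaveUI model from the collection linked above.
model_id = "google/paligemma-3b-mix-224"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("google_flights_screenshot.png")  # any UI screenshot you want to probe
prompt = "detect the Vacation rentals button"

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)

# PaliGemma answers detection prompts with <loc####> tokens: four per box,
# in the order y_min, x_min, y_max, x_max, each normalized to a 0-1023 grid.
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))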
To be clear, the PaliGemma model is too small and has too low a resolution to be state of the art. It’s a little too rigid in how it detects things and we want it to be more flexible. It sometimes nails detection tasks and it’s sometimes just wildly off base.
But it is proof that a properly trained model with a well-annotated dataset can make headway on the gnarly problem of navigating GUIs on the wide and wonderful world of the web.
Our next steps are simple:
* Get the full dataset annotated and release it in the next few weeks (we only used 25k out of ~85k potential examples)
* Train the variant of PaliGemma that has higher resolution.
* Train a larger multimodal model like InternVL or Chameleon.
We’ll see you next week when we release our Gen 2 agent!


@teovito @skirano @everartai any suggestions on how to improve writing with negative prompts?

These pics were AI-generated using @everartai. Not perfect, but almost. The role of social media managers is getting easier. Someone still has to prompt the AI, generate pictures and videos, and build engagement, but very soon a social media manager will be able to manage more accounts simultaneously. As a business and agency owner, I see thousands of dollars in savings each year. Well done @skirano



No excuses.
Take care of your health.
Don’t wait until you’re 48 like me.
- Weight train M-F
- Yoga Sat-Sun
- Eating Macros / Protein
- Daily Meditation
- 10 pages nonfiction a day
- Accountability friends (txt weight ea day)
- Joined @danmartell fitness program


XR & VISION PRO NETWORKING THREAD 🚨 I see lots of old school XR people decrying the new wave of Vision Pro excitement with the old "we've been doing that for years!" But rather than cast new people as 'the other', we should all get connected so ideas new and old can synergize!
So, reply to tell us who you are and what you're building or interested in. And retweet to spread this thread!

@MattPRD @Scobleizer What volumetric video are you watching on Vision Pro?

@Scobleizer Probably instagram and TikTok, but could also be someone new.
Vision Pro isn’t perfect but volumetric videos are A+++

@WebAMV Does this work on any of the headsets for webxr?

Using the GPU to build the BVH on top of WebGPU means there’s no need to wait for build time when switching between large models, and the Viewer experience becomes smoother👏
Try it too😆:
lgltracer.com/viewer/index.h…
#webgpu #raytracing #threejs #viewer
Andreas Hronopoulos retweeted

@nutlope What if you wanted to take a specific style of images and use that as the theme? Let's say a specific designer who has a style.

Announcing roomGPT!
Redesign your room in seconds with AI! 100% free and open source.
roomgpt.io
Andreas Hronopoulos retweeted

In this moment, I am euphoric.
Kraken@krakenfx
As we gear up for 2022, Kraken is celebrating its continued growth and success with the expansion of its ‘staking’ tentacle through an acquisition of Staked. 🥳 Check out our blog for more details about this exciting deal! blog.kraken.com/post/12265/kra…

@devinaconley What about staking NFTs for music/video in the form of a license, with GYSR managing the royalties?

I started an EIP discussion on a standard interface to represent NFT "functional ownership" when held by another smart contract. This is very relevant for many DeFi + NFT use cases.
Would love to hear any thoughts or feedback!
ethereum-magicians.org/t/erc-standard…
