Kostas Karolemeas

1.3K posts

@VoxelPerfect

Agentic AI Architect | 2x founder

Athens, Greece · Joined April 2010

338 Following · 219 Followers

Kostas Karolemeas reposted

Brett Adcock@adcock_brett·18 May

Congrats to Aime!! He said his left forearm is basically broken 😂

Final scores:
→ F.03: 12,732 packages (2.83 seconds/package)
→ Aime: 12,924 packages (2.79 seconds/package)

This is the last time a human will ever win

Kostas Karolemeas@VoxelPerfect·14 May

@neural_avb Women are sooo practical!

AVB@neural_avb·13 May

Told my girlfriend, "current $20 Codex sub is not enough. I need to go $100/mo" She suggested, "why don't you open a second $20 Codex account" I feel so dumb🤔

Kostas Karolemeas@VoxelPerfect·14 May

@Surendar__05 that happens when the core business is advertising.

Surendar@Surendar__05·14 May

this is actually sad.

Kostas Karolemeas@VoxelPerfect·14 May

@jmao0001 @ash_twtz I know. I mentioned the harness (the coding agent) + the LLM.

jmao@jmao0001·14 May

@VoxelPerfect @ash_twtz OpenCode is great, I mean, as a coding agent, not as an LLM, so it does not compete with LLMs.

Mr Ash@ash_twtz·14 May

Why doesn’t Google have a coding specific Gemini model yet?

Kostas Karolemeas@VoxelPerfect·14 May

@steipete you are about to release OpenHello?

Peter Steinberger 🦞@steipete·14 May

hello

Kostas Karolemeas@VoxelPerfect·14 May

@Star_Knight12 16.2.4 is too old

Prasenjit@Star_Knight12·14 May

Next.js just got its worst vulnerability ever, CVSS 8.6.
→ affects versions 13.4.13+, 14.x, 15.x, and 16.0.0–16.2.4
→ attackers can access your internal services, cloud credentials, API keys, and admin panels
→ no authentication needed
→ one crafted request is all it takes
→ roughly 79,000 instances are exploitable right now
→ vercel-hosted apps are safe, self-hosted are not

upgrade to 15.5.16 or 16.2.5 immediately.
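
The affected ranges in this post can be turned into a quick self-check. The sketch below is hypothetical: `parse_version`, `is_affected`, and `AFFECTED_RANGES` are illustrative names, not part of Next.js or any advisory tooling, and the ranges encode only the post's wording, assuming that "13.4.13+" means the rest of the 13.x line and that 15.x is fixed from 15.5.16 on. Confirm against the official advisory before relying on it.

```python
# Hypothetical helper: flags a Next.js version as affected, using only the
# ranges quoted in the post (assumption: 15.x is fixed from 15.5.16 onward).


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (pre-release tags unsupported)."""
    major, minor, patch = (int(part) for part in text.strip().lstrip("v").split("."))
    return (major, minor, patch)


# Half-open ranges [low, high) of affected versions, per the post.
AFFECTED_RANGES = [
    ((13, 4, 13), (14, 0, 0)),  # 13.4.13+ within the 13.x line (assumption)
    ((14, 0, 0), (15, 0, 0)),   # all of 14.x
    ((15, 0, 0), (15, 5, 16)),  # 15.x before the 15.5.16 fix
    ((16, 0, 0), (16, 2, 5)),   # 16.0.0 through 16.2.4
]


def is_affected(version: str) -> bool:
    """True if the version falls inside any of the listed affected ranges."""
    v = parse_version(version)
    return any(low <= v < high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    for candidate in ["13.4.12", "14.2.3", "15.5.15", "15.5.16", "16.2.4", "16.2.5"]:
        status = "affected" if is_affected(candidate) else "not in the listed ranges"
        print(f"{candidate}: {status}")
```

Comparing versions as integer tuples rather than strings keeps the ordering correct for cases such as 16.10.0 versus 16.2.0. Pre-release tags like `-canary` are not handled by this sketch.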

Kostas Karolemeas@VoxelPerfect·14 May

It was about time!

OpenAI@OpenAI

Rolling out today as a preview on iOS and Android in all supported regions. Support for connecting your phone to the Codex app on Windows is coming soon. openai.com/index/work-wit…

Kostas Karolemeas@VoxelPerfect·14 May

@jmao0001 @ash_twtz Is it better? For large, complex projects you need a combination of a good model and a good agent harness. Who cares whether it is proprietary or open source, as long as you get the best outcome?

jmao@jmao0001·14 May

@VoxelPerfect @ash_twtz They didn't. It's just a game where no one gives a damn about the winner because OpenCode, for that, is already better than all big companies like Anthropic and it's open-source

Kostas Karolemeas@VoxelPerfect·14 May

Be honest, what is this "Be honest, which x is best at y?" all over X?

Kostas Karolemeas@VoxelPerfect·14 May

@ash_twtz Their core business is ads. They built Android out of fear of losing mobile as an ad platform. They cannot succeed with that mentality.

Mr Ash@ash_twtz·14 May

@VoxelPerfect many said the same about bard

Kostas Karolemeas@VoxelPerfect·14 May

@KaiXCreator GPT 5.5 (Codex used to be a separate model; it is now included in 5.5)

Kaito@KaiXCreator·14 May

What’s your go-to AI model today?
- Gemini 3.1
- Opus 4.7
- Sonnet 4.6
- Codex
- GPT 5.5

Kostas Karolemeas@VoxelPerfect·14 May

@Taniyatweets_ MacBook, without a second thought!

Taniya@Taniyatweets_·14 May

Which one are you picking for daily work? MacBook or ThinkPad

Kostas Karolemeas@VoxelPerfect·14 May

@haider1 Is it a matter of a world model, or just of closing the loop by letting the model see the result and improve based on feedback, as a human would? Even humans who may be able to predict the consequences of their actions still rely on feedback.

Haider.@haider1·13 May

Yann LeCun says you cannot build a reliable agentic system without a world model LLMs don't have world models. They can't predict the consequences of their actions before taking them "they just act, and whatever happens next is someone else's problem" Without that, it's not intelligence

Kostas Karolemeas@VoxelPerfect·14 May

@HopeForTheWorl1 @OpenAIDevs @cursor_ai Yes, this is already available in GitHub Copilot. However, you can tell it that you do not want any code changes, and it will respect that.

VibeCoder@HopeForTheWorl1·14 May

@OpenAIDevs We need an Ask Mode in Codex like with @cursor_ai ! Sometimes the agent starts coding right away even though I wanted it to discuss the issue first with me. Plan mode on the other side seems like an overkill if you only have a small question. Anyone agree with me here?

OpenAI Developers@OpenAIDevs·14 May

2000 developers reached out in 3 hours. Let's build things.

OpenAI Developers@OpenAIDevs

Want to (officially) use Codex at work? Send this post to your CTO to bring your team to Codex. Eligible enterprise customers who switch in the next 30 days get 2 free months of Codex usage for new users.

Kostas Karolemeas@VoxelPerfect·13 May

@elonmusk How many languages does it support with good quality?

Elon Musk@elonmusk·12 May

Grok Voice is #1!

Artificial Analysis@ArtificialAnlys

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios.

Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks.

Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use.

Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge.

Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost.

About 𝜏-Voice:
Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems.

The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions.

This complements our Big Bench Audio benchmark measuring intelligence and Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness.

Scores are the average of three independent pass@1 trials. We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains:
➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints
➤ Retail (114 scenarios): e.g., disputing a charge, processing a return
➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem

Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator.

Key results:
xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall.
OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min).

Speech to Speech is a fast evolving modality and we expect movement in rankings as we continue to add new models with these capabilities, and model robustness improves.

Congratulations @xAI @elonmusk! See below for further detail ⬇️
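
The scoring rule described in this post ("the average of three independent pass@1 trials" over 50 + 114 + 114 = 278 scenarios) can be sketched as follows. This is an illustration under stated assumptions, not Artificial Analysis's implementation: it assumes a trial's score is the share of all scenarios, pooled across domains, whose single attempt passes the deterministic checks, and the outcomes in the demo are synthetic placeholders, not published results.

```python
# Sketch of "average of three independent pass@1 trials".
# Assumption: a trial's score is the share of scenarios whose single attempt
# passes the deterministic checks, pooled over all domains.

from statistics import mean

# Scenario counts per domain, as listed in the post.
DOMAIN_SIZES = {"airline": 50, "retail": 114, "telecom": 114}


def pass_at_1(outcomes: list[bool]) -> float:
    """Fraction of scenarios solved on the single attempt of one trial."""
    if not outcomes:
        raise ValueError("a trial needs at least one scenario")
    return sum(outcomes) / len(outcomes)


def benchmark_score(trials: list[list[bool]]) -> float:
    """Average pass@1 over independent trials, expressed as a percentage."""
    return 100 * mean(pass_at_1(trial) for trial in trials)


if __name__ == "__main__":
    total = sum(DOMAIN_SIZES.values())
    print("scenarios per trial:", total)  # 278
    # Synthetic outcomes for three trials (NOT real benchmark data):
    # each trial passes its first `solved` scenarios and fails the rest.
    trials = [[i < solved for i in range(total)] for solved in (139, 145, 133)]
    print(f"score: {benchmark_score(trials):.1f}%")
```

Pooling weights every scenario equally, so Retail and Telecom (114 scenarios each) count for more than Airline (50); a per-domain average would weight the three domains equally instead. The post does not say which aggregation is used.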

Kostas Karolemeas@VoxelPerfect·13 May

I tried both and eventually chose Codex as a more complete application I can rely on for complex tasks.

Kostas Karolemeas@VoxelPerfect·13 May

It’s fascinating watching Codex vs Claude turn into the Apple vs Google of AI coding tools. People aren’t just comparing capabilities anymore - they’re forming identities, tribes, and loyalties.

Kostas Karolemeas@VoxelPerfect·13 May

@adcock_brett @hark_labs It was about time for something new!

Brett Adcock@adcock_brett·24 Mar

@hark_labs Welcome Hark

Hark@hark_labs·24 Mar

Introducing Hark, an AI lab building the most advanced, personal intelligence in the world. We're creating intelligent foundation models paired with next generation hardware designed to serve as a universal interface between humans and machines. hark.com

Kostas Karolemeas reposted

Figure@Figure_robot·8 May

We taught two F.03 robots to clean a room and make a bed in under 2 minutes - fully autonomous.
