Edu Mello

690 posts

@mello20760

Musician & AI Explorer. Musicamania AI Lab

Sao Paulo · Joined June 2024

491 Following · 105 Followers

Edu Mello@mello20760·3d

@PolymarketMoney x.com/mello20760/sta…

Edu Mello@mello20760

iPad and iPhone: The First Advanced Popular Robotic Brains The most capable robotic brain ever mass-produced isn't from a robotics lab. It's in 2.2 billion pockets. LiDAR at 60Hz. Neural Engine at 38 TOPS. On-device LLMs. Accelerometer, magnetometer, barometer, UWB — factory-calibrated, API-ready. A $50 robotic arm with iPad correcting trajectories via LiDAR achieves sub-millimeter precision. That used to cost $15,000. We're running Qwen3-4B natively on M4. No cloud. No latency. Consumer robotics won't come from new hardware. It'll come from engineers looking at what they already carry. EN - drive.google.com/file/d/1l9lbD_… PT -drive.google.com/file/d/1SCLWAx… @Scobleizer @Apple
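The trajectory-correction idea in the post above can be illustrated with a small closed-loop sketch. It is a one-dimensional simulation under stated assumptions, not the author's setup: `CheapArm`, its 3 mm bias, the gain, and the tolerance are all hypothetical, and the external depth sensor is stood in for by the value `move_to` returns.

```python
from dataclasses import dataclass


@dataclass
class CheapArm:
    """Hypothetical low-cost arm that lands a fixed offset from its command."""
    bias_mm: float = 3.0  # assumed systematic error, not a measured figure

    def move_to(self, command_mm: float) -> float:
        # Returns the true tip position, i.e. what an external
        # depth sensor would report after the move.
        return command_mm + self.bias_mm


def correct_with_feedback(arm, target_mm, gain=0.8, tolerance_mm=0.1, max_steps=50):
    """Proportional correction: nudge the command by a fraction of the measured error."""
    command = target_mm
    measured = arm.move_to(command)
    for _ in range(max_steps):
        error = target_mm - measured
        if abs(error) <= tolerance_mm:
            break
        command += gain * error
        measured = arm.move_to(command)
    return measured


final = correct_with_feedback(CheapArm(), target_mm=100.0)
print(f"final position: {final:.3f} mm")
```

The point of the sketch is only that an imprecise actuator can converge on a target when an independent sensor closes the loop; real accuracy would depend on sensor noise and resolution, which are not modeled here.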

Polymarket Money@PolymarketMoney·3d

$AAPL new hardware chief Johny Srouji is reorganizing hardware development to speed up work on future devices. The biggest shift moves product design oversight from Kate Bergeron to Shelly Goldberg while Bergeron now leads product reliability across Apple devices.

207

10.9K

Edu Mello@mello20760·3d

That's incredible Tim, accessibility through natural language is exactly where AI should shine. We built Sigma Vox on top of Apple Intelligence with this in mind. A voice assistant that answers from custom knowledge files, offline, in 8 languages, zero hallucination. youtu.be/zyZwd49I8y4

1.5K

Tim Cook@tim_cook·3d

Building accessibility into our products is central to everything we do. This year, we’re proud to deliver more intuitive accessibility features with natural language powered by Apple Intelligence. #GAAD apple.com/newsroom/2026/…

173

328

2.8K

174.7K

Edu Mello@mello20760·6d

If you own a Mac, there's a fun new app for bringing all kinds of external sources onto your desktop — with or without a USB capture card. With a USB capture card: • TV boxes (Apple TV, Fire TV, Chromecast) • PlayStation, Xbox, Nintendo Switch • DSLRs & mirrorless cameras • Pretty much anything with HDMI out No capture card needed: • USB webcams • iPhone or iPad via Continuity Camera Hear the audio, watch in a floating PIP window that stays on top while you work, and record everything to MOV — ready to drop into your videos. HDMI Bar — now on the Mac App Store. apps.apple.com/app/hdmi-bar/i…

Edu Mello@mello20760·15 May

that's exactly where the advanced version is going. Probabilistic training gives you generalization, which is powerful but uncontrollable. Better retrieval with deterministic grounding gives you precision and auditability. We published a paper exploring this idea further, starting from verified axioms instead of statistical patterns. Early results are promising. (Axiom Engine)

Calvin Thurman@cet3001·14 May

@mello20760 @xai The deterministic read approach is smart design. If the model always has access to the source, you don't need to hope the fine-tuning held. The advanced version question is interesting. What problem does probabilistic training solve that you can't solve with better retrieval?

xAI@xai·24 Apr

Introducing Grok Voice Think Fast 1.0 A state-of-the-art voice model built for complex, multi-step workflows with snappy responses and high accuracy. It takes the top spot on the Tau Voice Bench and handles real-world messiness like noise, accents, and interruptions better than any other model in the world. x.ai/news/grok-voic…

1.2K

2.8K

24.1K

418.2M

Edu Mello@mello20760·12 May

Customer-facing service in general. Hotels loading their full guide so guests ask anything in their language at any hour. Small businesses with menus, catalogs, opening hours. Also exam prep and product training where accuracy to the source material really matters. Privacy and control matter. I think that's one of the big questions we'll see the major players trying to solve next.

Calvin Thurman@cet3001·12 May

@mello20760 @xai That constraint approach is smart. Grounding it in the actual file removes the guesswork entirely. The offline + 8 language support is a real differentiator. What's the biggest use case you're seeing people reach for first?

Edu Mello@mello20760·12 May

@thinkymachines That's amazing! But we built an offline version, with no API and no hallucinations, before OpenAI, Grok Fast and Thinking Machines. It's called Sigma Vox. We're working on a new version too. youtu.be/zyZwd49I8y4

Thinking Machines@thinkymachines·11 May

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

445

1.9K

15.1K

7.2M

Edu Mello@mello20760·11 May

@r0ck3t23 Thp Genome would help. chatgpt.com/g/g-695b7b1c2c…

Dustin@r0ck3t23·10 May

Demis Hassabis just described what might be the end of genetic disease.
Google DeepMind built a system called AlphaGenome. It reads human DNA the way a software engineer reads source code. Every letter. Every position. Every mutation across 3 billion characters.
Hassabis: “AlphaGenome is the best system in the world for predicting if a mutation will cause disease or if it’s benign.”
98% of your genome doesn’t code for proteins. For decades scientists treated it like dark matter. Present everywhere. Readable nowhere. AlphaGenome reads it.
Now pair that with CRISPR. Jennifer Doudna’s gene editing tool can already target any DNA sequence on command. The bottleneck was never the scalpel. It was knowing exactly where to cut.
Hassabis: “A combination of things like AlphaGenome and CRISPR could be incredibly powerful.”
That might be the most restrained sentence ever spoken about the future of medicine. AI locates the exact mutation killing you. CRISPR goes in and deletes it. Not treatment. Not management. Deletion.
The hardest cases are multigenic. Mutations that cascade and compound. Hiding behind each other. Too complex for any human mind to untangle in a single lifetime.
Hassabis: “Those are even harder to detect, but actually perfect for AI to try and help with.”
The diseases that have defeated medicine for centuries are the exact ones AI is purpose-built to solve. That’s not a coincidence. That’s the turning point.
Every parent who sat in a white room and heard “there’s nothing more we can do.” Every patient who watched their own biology turn against them with nothing to fight back. Every name carved into stone because we could name the disease but couldn’t disarm it. That era now has an expiration date.
Humanity spent 10,000 years fighting disease with observation and guesswork. We’re about to fight it with comprehension.
Your grandchildren may read about genetic disease the way you read about smallpox. As something that once ended millions of lives before we learned to read the code that wrote them.
Somewhere right now a child carries a death sentence folded into their DNA. Born with it before they ever opened their eyes. They don’t know it yet. Their parents don’t know it yet. But for the first time in human history, the answer might arrive before the disease does.
That’s not technology. That’s the moment our biology stopped being our fate.

334

39.7K

Edu Mello@mello20760·10 May

Exactly, the constraint is the feature. In this version it reads the full knowledge file on every query, so there's nowhere for hallucination to creep in. I'm finishing a more advanced version now. And I keep thinking... what if instead of probabilistic training on everything, LLMs were grounded only in verified fundamentals across every domain? The output could be far more precise and auditable. Appreciate the thoughtful feedback, Calvin.
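The "reads the full knowledge file on every query" design described above can be sketched as follows. This is an illustration under assumptions, not Sigma Vox's code: `build_prompt`, `answer`, the marker strings, and the `NOT_FOUND` sentence are hypothetical, and `model` is any callable that maps a prompt string to a reply (an on-device model would be wrapped behind it).

```python
from typing import Callable

# Hypothetical refusal sentence the model is asked to use verbatim.
NOT_FOUND = "That is not in the knowledge file."


def build_prompt(knowledge: str, question: str) -> str:
    """Embed the complete knowledge text in the prompt, with no retrieval step."""
    return (
        "Answer using only the text between the markers.\n"
        f"If the answer is not there, reply exactly: {NOT_FOUND}\n"
        "<<<KNOWLEDGE\n" + knowledge + "\nKNOWLEDGE>>>\n"
        "Question: " + question
    )


def answer(knowledge: str, question: str, model: Callable[[str], str]) -> str:
    # The whole file is sent again on every call; nothing is indexed or cached.
    return model(build_prompt(knowledge, question)).strip()
```

Sending the whole file each time trades prompt size for simplicity: there is no index that can miss a passage, which is presumably why the file size has to stay small. The instruction constrains the model but does not by itself guarantee the reply is grounded.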

Calvin Thurman@cet3001·10 May

@mello20760 @xai Offline-first with no hallucination is the right call for anything customer-facing. The moment a support tool starts making stuff up from general training, trust is gone. Sounds like you built the constraint in from the start. Smart.

Edu Mello@mello20760·9 May

@spgreenwood A lot of possibilities. x.com/mello20760/sta…

Edu Mello@mello20760

iMMersia 4D – Patent Pending A universal format for immersive capture & distribution. A new concept for event transmission & recording, perfectly feasible today. A boost for concert revenue and the foundation of a holographic future for live experiences. @Apple @SonyElectronics @Ticketmaster – the next standard is here. PDF (ENG + JPN + PORT) drive.google.com/file/d/1atAjf2… #Immersia #ImmersiveTech #VR #Holography #Concerts

Stephen Greenwood@spgreenwood·9 May

The Vision Pro is the most misunderstood and poorly marketed device that Apple makes. But the team that built it did an incredible job. If it were $1500 cheaper and the emphasis was as a Mac extender, the entire narrative would be different.

135

7.7K

Edu Mello@mello20760·9 May

@aayushaggrwal @theapplecycle x.com/mello20760/sta…

Edu Mello@mello20760

OFFLINE. No API. No RAG. No external LLMs. Sigma Vox answers exclusively from your own file — .txt, .json, or .pdf, up to 12,000 characters. No hallucination. No internet. Just your content. Powered by @Apple Intelligence, built into iPhone and iPad. Business opportunity for anyone who wants to deploy a private voice assistant for companies. App Store → apps.apple.com/do/app/sigma-v… @AppStore @AppleDeveloper @viticci #AppleIntelligence #iOS #Sigmavox
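The input limits stated in the post above (a .txt, .json, or .pdf file, up to 12,000 characters) could be enforced with a check like this one. The function name and error messages are hypothetical, and the sketch assumes the limit applies to the extracted text rather than the raw file, which the post does not specify. PDF text extraction is left out because it needs a third-party library.

```python
from pathlib import PurePath

ALLOWED_SUFFIXES = {".txt", ".json", ".pdf"}  # file types named in the post
MAX_CHARS = 12_000                            # limit named in the post


def validate_knowledge(filename: str, text: str) -> str:
    """Return the text unchanged if the file type and length are acceptable."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if len(text) > MAX_CHARS:
        raise ValueError(f"knowledge text is {len(text)} characters; limit is {MAX_CHARS}")
    return text
```

A hard cap like this is what makes it practical to pass the entire text to the model on each question.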

aayushaggarwal@aayushaggrwal·8 May

@theapplecycle Will still be useless

276

Apple Cycle@theapplecycle·8 May

iOS 27 will reportedly support these three AI models: • ChatGPT • Claude • Gemini It will be introduced in just one month! Source: Mark Gurman

635

23.3K

Edu Mello@mello20760·9 May

Custom knowledge assistant: you load any file (menu, manual, catalog) and it answers voice questions strictly from that content. No hallucination by design, not by prompt, since it always reads the full knowledge file. Works offline on Apple Silicon, 8 languages auto-detected, zero API cost. Great for product manuals, training, hotel concierge, anything that needs to stay 100% faithful to the source. Grok Think Fast is impressive, but it's cloud + API. This runs 24/7 on a Mac mini, or on an iPad or iPhone, with no subscription. DM me and I'll send you a 28-day redeem code to try it.

Calvin Thurman@cet3001·9 May

@mello20760 @xai Haven't tried it yet. What's the main use case you're getting value from?

Edu Mello@mello20760·7 May

@PatentlyApple iMMersia would be perfect for Apple! x.com/mello20760/sta…

Patently Apple@PatentlyApple·7 May

x.com/i/article/2052…

Edu Mello@mello20760·6 May

single, integrated cognitive workspace for files, agents, voice, and reasoning.
Author: Eduardo Mello
Date: November 2025
For: OpenAI Partnerships Team
Subject: Proposal for a next-generation ChatGPT interface designed to unlock the full potential of GPT-5.1 Pro and future models.
1. Executive Summary
ChatGPT has become the world’s default cognitive interface, but its current UI still operates as a traditional chat window, which fundamentally limits the true power of large-scale reasoning models such as GPT-5.1 Pro.
I propose a new interface called the ChatGPT Unified Workspace: a single, integrated, multimodal environment where users can combine:
• files
• agents
• voice
• prompts
• context
• code
• documents
• repositories
all inside one coherent workspace, without switching modes or losing context.
This interface solves the #1 limitation noted by developers and professionals: “GPT-5.1 Pro is a staff-level engineer trapped inside a chat box.” The Unified Workspace frees that engineer.
2. The Problem with Today’s AI Interfaces
Current AI interfaces (ChatGPT, Gemini, Anthropic, etc.) share the same constraints:
2.1. Single-channel interaction
The chat window is one-dimensional: everything must be serialized as text input.
2.2. No persistent, inspectable context
Documents, files, agents, instructions, and memory are invisible or inaccessible.
2.3. Manual friction for professional workflows
Users must:
• re-upload files
• copy-paste code
• re-explain context
• manage long prompts
• manually switch between tools and modes
• jump between IDE, chat, drive, browser
This creates cognitive friction and prevents adoption of GPT-5.1 Pro for real engineering work.
2.4. No unified multimodal workspace
Voice, text, files, agents, and reasoning exist in separate silos.
2.5. Lack of “workspace structure”
Real work (engineering, research, writing, planning) is project-based. Chat interfaces are not.
3. The Solution: ChatGPT Unified Workspace
A single interface that merges:
🟦 File Explorer
🟦 Agent Launcher
🟦 Voice Command Layer
🟦 Chat + Editor hybrid
🟦 Project context and memory
🟦 Repository mode
🟦 Real-time multimodal reasoning
into one coherent surface.
This is the natural evolution of ChatGPT: from “chatbot” → to full cognitive operating system.
4. Key Concept
Everything the model needs to reason is kept visibly in the left panel. Everything the user needs to say is typed or spoken in the right panel. This unlocks the true capabilities of GPT-5.1 Pro and future models.
5. Workspace Structure
5.1. Left Panel: Navigation Brain
A persistent, scrollable sidebar containing:
• Files
• PDFs
• documents
• images
• videos
• audio
• code files
• repos
• datasets
• web captures
• agent modules
• YAML settings
Organized with folders and drag-and-drop.
Key Features
• click any item → model re-analyzes it
• shift-click → compare multiple items
• ctrl-click → combine items
• multi-select → define complex operations
• persistent “workspace state” for long projects
• works on desktop and mobile (slide-in panel)
Direct-file reading mode
No need to re-upload. The model reads directly from the user’s file source (local or cloud).
5.2. Right Panel: The Cognitive Console
Not just a chat window. A multi-mode panel that adapts to user intent.
Modes include:
• Text chat
• Code editor
• Document editor
• Voice console
• Story narrator
• Research mode
• UI/UX critique mode
• Multi-agent discussion view
• Repository diff/patch mode
Switching is automatic and intent-based. But the UI never changes: one unified surface.
5.3. Global Voice Activation
Voice can be triggered at any moment:
• “Summarize the selected PDFs.”
• “Refactor only the files I highlighted.”
• “Activate the UI Designer agent.”
• “Translate this story into Italian with native tone.”
Language channels
Each language has an isolated voice channel (no mixing). This solves the “accent leakage” and multilingual blending issue.
5.4. Agents as First-Class Citizens
Agents appear in the left panel like files. Examples:
• Backend Engineer
• UI Designer
• Research Analyst
• Proof Checker
• Storyteller
• DevOps Architect
User clicks an agent → it activates. The system orchestrates agents automatically in the background when tasks require multi-step reasoning.
6. Why This Interface Is Necessary
6.1. GPT-5.1 Pro’s biggest flaw is the interface
Its reasoning abilities are unmatched, but the chat interface limits adoption.
6.2. Professionals need a structured workspace
The workspace transforms ChatGPT from: a conversational agent → into a professional cognitive environment
6.3. Reduces cognitive load
No need to remember context. No need for long prompts. No need for re-uploads. Everything is visible and persistent.
6.4. Enables true end-to-end workflows
From research → to writing → to coding → to UI → to voice → to delivery.
6.5. Unlocks GPT-5.1 Pro’s long-context power
Its chain-of-thought depth becomes usable.
6.6. Differentiates OpenAI vs Google
Gemini 3’s competitive advantage is its toolchain integration (Cursor, Cline, Antigravity IDE). This proposal leapfrogs that entire ecosystem.
7. Why Only OpenAI Can Build This
7.1. Deep model integration
This interface requires:
• token-stream awareness
• intermediate reasoning hooks
• chain-of-thought management (internal)
• memory architecture
• multi-agent scheduling
• file embeddings
• real-time multimodal fusion
• synthetic attention routing
• voice processing
• visual context
Only OpenAI has the infrastructure and architecture to combine all these coherently in one product.
7.2. Alignment + Safety constraints
A multi-agent environment requires robust safety APIs. OpenAI is the only lab already shipping such primitives (tools, agents, memory, UI permissions).
7.3. GPT-5.1 Pro is uniquely suited
Its depth and precision benefit more from this interface than any other model.
8. Proposed Rollout Phases
Phase 1: File + Agent Sidebar (Foundational UI)
• drag-and-drop
• persistent file context
• agent activation
• multi-select actions
Phase 2: Unified Console Modes
• code editor
• story narrator
• voice console
• repository diff view
Phase 3: Real-time Multimodal Reasoning
• vision reading
• screen interpretation
• UI/UX critique
Phase 4: Project Memory
• persistent workspace states
• automatic context resurrection
• long-term collaboration workflows
Phase 5: Multi-Agent System
• AI-to-AI collaboration
• self-orchestration
• specialized agents
9. Expected Impact
Professionals
• reduce friction
• reduce errors
• accelerate workflows 5–20×
• unlock GPT-5.1 Pro for real engineering
Education
• interactive learning environment
• multimodal exploration
Enterprise
• unified interface for teams
• structured, context-rich AI collaboration
• replaces multiple internal tools
GPT Store
• agents become installable apps
• workspaces become reusable templates
10. Conclusion
This proposal is not merely a UI improvement; it is the natural next step of AI evolution. The ChatGPT Unified Workspace transforms ChatGPT from: a chat interface into a universal cognitive environment designed for work, reasoning, creation, engineering, and collaboration. It solves the largest bottleneck facing GPT-5.1 Pro today and unlocks its full potential.
I believe this direction can dramatically expand the usability and impact of OpenAI’s models, and I would be honored to discuss or refine this proposal with the team. Thank you for your time and consideration.
Eduardo Mello
Musicamania Tecnologia - AI LAB
Sao Paulo - Brazil
musicamania.ai
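The left-panel model in section 5 of the proposal (files and agents listed side by side, click and multi-select, persistent workspace state) maps onto a small data structure. The sketch below is one possible reading, not part of the proposal: `Item`, `Workspace`, and every method name are hypothetical, and shift-click and ctrl-click are collapsed into a single "extend selection" operation for brevity.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Item:
    name: str
    kind: str  # "file" or "agent"; agents sit in the panel like files


@dataclass
class Workspace:
    items: dict = field(default_factory=dict)     # name -> Item
    selected: list = field(default_factory=list)  # names, in selection order

    def add(self, item: Item) -> None:
        self.items[item.name] = item

    def click(self, name: str) -> None:
        """Plain click: the selection becomes this one item."""
        self._require(name)
        self.selected = [name]

    def extend_selection(self, name: str) -> None:
        """Shift- or ctrl-click: add another item to the selection."""
        self._require(name)
        if name not in self.selected:
            self.selected.append(name)

    def context(self) -> dict:
        """What a model call would be given: selected files and active agents."""
        chosen = [self.items[n] for n in self.selected]
        return {
            "files": [i.name for i in chosen if i.kind == "file"],
            "agents": [i.name for i in chosen if i.kind == "agent"],
        }

    def to_json(self) -> str:
        """Serialize the panel so the workspace state can persist between sessions."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Workspace":
        data = json.loads(raw)
        items = {n: Item(**d) for n, d in data["items"].items()}
        return cls(items=items, selected=list(data["selected"]))

    def _require(self, name: str) -> None:
        if name not in self.items:
            raise KeyError(f"no such item: {name}")
```

Keeping the selection explicit is what would let a spoken command such as "Summarize the selected PDFs" resolve to concrete inputs without a re-upload.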

229

VraserX e/acc@VraserX·6 May

ChatGPT’s new voice mode will be one of the biggest releases of the year. It will listen and talk at the same time. It will sound fully human. It will run on GPT-5.5 instant-level intelligence. And once it is integrated into Codex, everything changes. You won’t just type prompts anymore. You’ll speak to your computer, and it will code, navigate, execute, debug, research, organize, and operate interfaces for you through computer use. People are massively underestimating this.

801

62.2K

Edu Mello@mello20760·5 May

@gailcweiner Try Sigma Vox!

Gail Weiner@gailcweiner·5 May

More hype. If it is anything like current advanced voice mode then it’s dead before launch.

Chubby♨️@kimmonismus

New ChatGPT Voice mode pretty much confirmed. And im really excited for it.

1.6K

Edu Mello@mello20760·5 May

@AgorithmAg @mark_k @OpenAI I am the developer. DM me and I'll send you a promo code for a 28-day trial.

Agata Sliwinska (artist)@AgorithmAg·5 May

@mello20760 @mark_k @OpenAI Hi, nope, is it worth it?

Mark Kretschmann@mark_k·5 May

A new “voice mode” is being prepared for release by @OpenAI. The upgraded voice mode is based on the omnimodal GPT-5.5, making it substantially smarter and more expressive than the current version. It will also support full-duplex conversations, meaning it can listen and speak at the same time. That should make conversations feel much more natural and fluid.

1.6K

86.5K

Edu Mello@mello20760·5 May

@AgorithmAg @mark_k @OpenAI Have you tried Sigma Vox?

Agata Sliwinska (artist)@AgorithmAg·5 May

This sounds promising, but as an AI artist working with voice-first interaction, I’m cautious. Voice isn’t cosmetic, it shapes flow, continuity, focus, and nervous-system safety. My ADHD works best with a lower, warm, professional tone. After so many model changes + the Standard Voice scare, I’m not especially excited yet.

Edu Mello@mello20760·5 May

@mark_k @OpenAI Sounds great, but it will still hallucinate. Sigma Vox never does — because it reads the entire knowledge base every single time. No indexing, no guessing, no hallucination. x.com/mello20760/sta…

Edu Mello@mello20760·5 May

@sama When AI knows all the content... x.com/mello20760/sta…

Edu Mello@mello20760

Grok Voice vs Sigma Vox Grok Voice is the most advanced voice tool I've seen — congrats to the xAI team. But Sigma Vox runs offline, no API, no subscription, and doesn't hallucinate by design. If the answer isn't in your file, it says so. That's it. A different philosophy — for iPhone and iPad.
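The "if the answer isn't in your file, it says so" behavior can be backed by a deterministic check on top of the model. The sketch below is an assumed technique, not a description of how Sigma Vox works: it supposes the model is asked to return a supporting quote alongside its answer, and `grounded_or_fallback` and the fallback sentence are hypothetical.

```python
def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so minor formatting does not break matching."""
    return " ".join(text.split()).casefold()


def grounded_or_fallback(
    knowledge: str,
    answer: str,
    quote: str,
    fallback: str = "That is not in the file.",
) -> str:
    """Accept the answer only if its supporting quote occurs verbatim in the file."""
    if quote.strip() and _normalize(quote) in _normalize(knowledge):
        return answer
    return fallback
```

The check confirms only that the quoted evidence exists in the source; it does not prove the answer follows from that evidence, so it narrows the room for fabricated replies rather than eliminating it.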

Sam Altman@sama·5 May

pretty excited for voice models to get great its interesting to watch how people are already starting to change the way they interface with AI

927

239

6.3K

657.5K

Edu Mello@mello20760·2 May

@benz145 If they integrate iMMersia. x.com/mello20760/sta…

Ben Lang@benz145·30 Apr

The biggest reason I believe Apple isn’t giving up on Vision Pro is that I’m certain the people at the company are smart enough to understand that they built something genuinely incredible that just hasn’t yet found product-market fit. The good news is that the path to PMF is obvious: make it cheaper and make it lighter. The bad news is that actually achieving cheaper and lighter is a massive challenge! I don’t think Apple is giving up Vision Pro or VisionOS, but I wouldn’t be surprised if they have new price/weight goals internally, and will wait years to reach those goals before launching a new Vision device.

146

16.6K

Edu Mello@mello20760·2 May

@luciano_rj @xai Hi Luciano, this version of Grok is meant for integrating their API into various customer-service workflows, but last March I launched Sigma Vox. I'll send you the YouTube video and you'll get the idea quickly. Cheers. youtu.be/WsXWtqvxTUI

Luciano Henriques | RJ - 🇧🇷@luciano_rj·2 May

@xai Cool and all, but what would I use this for?
