Carlos Santana

23.9K posts

@DotCSV

🤖 Artificial Intelligence communicator (DotCSV) ✉️ Business contact: [email protected] 📚 I teach about AI on YouTube, TikTok and Instagram

Madrid, Comunidad de Madrid · Joined March 2016

1.2K Following · 209.8K Followers

Pinned Tweet

Carlos Santana @DotCSV · Jan 14

🔮 MY AI PREDICTIONS FOR 2026! 🔮 Once again this year, I bring you 24 ideas about what I think could happen in the world of Artificial Intelligence over the year. 24 predictions that you can vote for and support one by one using the 💖 button. Share the thread! We'll check back in 12 months 😄👇

Carlos Santana @DotCSV · 9h

Smart move by OpenAI, creating the $100 Pro tier that people had been asking for! Especially since they know lots of users are migrating from Anthropic right now.

OpenAI @OpenAI

We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex. We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions. In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models. To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.

Carlos Santana @DotCSV · 12h

Don't panic over the claim that OpenAI will limit access to its next model to just a few companies: false alarm.

Carlos Santana @DotCSV · 14h

@ismaMorale115 4.9x fewer tokens on this benchmark

Isma @ismaMorale115 · 14h

@DotCSV 5x more expensive, but not 5x more efficient, as far as I know
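The back-and-forth above is a price-versus-tokens calculation: total cost is price per token times tokens used, so the two ratios divide. A minimal sketch, assuming the 5x figure is a per-token price ratio and the 4.9x figure is a token ratio measured on the same benchmark run, and ignoring input tokens; the function name `relative_cost` is hypothetical:

```python
def relative_cost(price_ratio: float, token_reduction: float) -> float:
    """Total cost of model A divided by total cost of model B.

    price_ratio: how many times more A charges per token than B (5x in the reply).
    token_reduction: how many times fewer tokens A spends than B (4.9x in the reply).
    """
    # cost = price per token * tokens, so A/B = price_ratio * (1 / token_reduction)
    return price_ratio / token_reduction


print(f"{relative_cost(5.0, 4.9):.2f}x")  # 1.02x
```

Under those assumptions the two effects nearly cancel out on that one benchmark; the result does not carry over to tasks where the token ratio is different.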

Carlos Santana @DotCSV · 15h

Regarding Mythos, people have asked me why I didn't mention, in the YouTube video, this chart that everyone is talking about. There are a couple of reasons why I decided not to discuss it after reading the Model Card. 1) Reporting a model's efficiency on a single benchmark doesn't really tell you whether the model is more or less efficient in general, only on the task that benchmark evaluates: in this case, finding very hard-to-locate information on the internet using browsing tools. That task therefore depends much more on the harness used and on how the model uses it, and gives few clues about the model's real intelligence and efficiency per token. It would be great if, in the future, every benchmark reported by every lab came with its token-usage metrics and API price, for example. - And then there's the second reason...
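The point about single-benchmark efficiency can be made concrete. A minimal sketch, assuming a hypothetical reporting record (`BenchmarkResult` and its fields are illustrative, not any lab's actual format) that carries accuracy, output tokens, and API price for every benchmark. The token ratio between two models is computed per benchmark, and the made-up numbers in the usage lines show how a 4.9x gap on one task can shrink on another:

```python
from dataclasses import dataclass
from math import exp, log


@dataclass
class BenchmarkResult:
    # Hypothetical record with the metrics the post asks labs to publish.
    benchmark: str
    accuracy: float                # fraction of tasks solved, 0..1
    output_tokens: int             # total output tokens spent on the benchmark
    usd_per_million_tokens: float  # API price per million output tokens

    @property
    def cost_usd(self) -> float:
        # Output-token cost only; input tokens are ignored in this sketch.
        return self.output_tokens / 1_000_000 * self.usd_per_million_tokens


def token_ratios(a: list[BenchmarkResult], b: list[BenchmarkResult]) -> dict[str, float]:
    """Per benchmark, how many times fewer output tokens model A used than model B."""
    b_by_name = {r.benchmark: r for r in b}
    return {
        r.benchmark: b_by_name[r.benchmark].output_tokens / r.output_tokens
        for r in a
        if r.benchmark in b_by_name
    }


def geometric_mean(values: list[float]) -> float:
    # Ratios are usually averaged geometrically rather than arithmetically.
    return exp(sum(log(v) for v in values) / len(values))


# Hypothetical numbers, for illustration only.
model_a = [
    BenchmarkResult("web-browsing", 0.80, 1_000_000, 25.0),
    BenchmarkResult("coding", 0.70, 4_000_000, 25.0),
]
model_b = [
    BenchmarkResult("web-browsing", 0.60, 4_900_000, 5.0),
    BenchmarkResult("coding", 0.65, 5_000_000, 5.0),
]

ratios = token_ratios(model_a, model_b)
print(ratios)  # {'web-browsing': 4.9, 'coding': 1.25}
print(round(geometric_mean(list(ratios.values())), 2))  # 2.47
```

Publishing tokens and price next to every score lets readers compute both the per-benchmark ratios and any aggregate they prefer, instead of extrapolating from one task.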

Carlos Santana @DotCSV · 14h

@dobleio *capacity understood as number of parameters

Carlos Santana @DotCSV · 14h

@dobleio But the greater the model's capacity, the greater the risk of memorization.

Carlos Santana @DotCSV · 15h

Here's the Mythos analysis, if you haven't seen it yet: youtu.be/e2M5vRKcyuI

Carlos Santana @DotCSV · 15h

Always on Team Noam here. I hope more and more benchmarks get reported crossing accuracy with cost! x.com/polynoamial/st…

Noam Brown @polynoamial

I'm surprised that, more than a year later, it's still the norm to compare reasoning models on evals by a single number.
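Noam Brown's point, and the wish for benchmarks that cross accuracy with cost, is usually framed as a Pareto frontier: keep only the models that no other model beats on both axes at once. A minimal sketch; `Point`, `pareto_frontier`, and the models A to D with their numbers are hypothetical, chosen only to show the idea:

```python
from typing import NamedTuple


class Point(NamedTuple):
    model: str
    accuracy: float  # higher is better
    cost: float      # e.g. USD or output tokens per run; lower is better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Points not dominated by any other point, ordered from cheapest to priciest.

    q dominates p if q is at least as accurate and at most as costly, and
    strictly better on at least one axis.
    """
    frontier: list[Point] = []
    best_accuracy = float("-inf")
    # Cheapest first; among equal costs, the most accurate first.
    for p in sorted(points, key=lambda p: (p.cost, -p.accuracy)):
        # Every point seen so far is at most as costly, so p survives only
        # if it is strictly more accurate than all of them.
        if p.accuracy > best_accuracy:
            frontier.append(p)
            best_accuracy = p.accuracy
    return frontier


# Hypothetical models, for illustration only.
points = [
    Point("A", 0.60, 1.0),
    Point("B", 0.70, 3.0),
    Point("C", 0.65, 4.0),
    Point("D", 0.80, 10.0),
]
print([p.model for p in pareto_frontier(points)])  # ['A', 'B', 'D']
```

Model C drops out because B is both more accurate and cheaper, so the report shows three trade-off points instead of a single winner. Exact duplicates collapse to one entry.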

Carlos Santana @DotCSV · 16h

@carlosazaustre Exactly. The tone is the saddest part, because Opus's warmth is completely lost, and even when you push it to try, it sounds like this the whole time 👇

Carlos Azaustre @carlosazaustre · 17h

After trying several things, I'm sticking with gpt-5.4-codex for now. I can use it through a subscription instead of paying per API token, as I used to with Claude. Its level is very similar to Opus, although it has a different tone. From what I've been able to test, it gives more information and delivers much faster than Opus or Sonnet.

Carlos Azaustre @carlosazaustre

Many people are already writing OpenClaw off, but OpenClaw is not a Claude wrapper. It is a model router that maintains memory, personality, and context. The alternatives I plan to look into for now:
- gpt-codex
- Ollama cloud with Gemma 4, MiniMax, Qwen, etc…
- Gemma running locally with 64 GB of memory
- Gemma running locally on a 4070 with 12 GB of VRAM

They won't even let you rest on vacation 😅

Carlos Santana @DotCSV · 17h

The great loop is closing 🫡 youtu.be/GOhMh__Z4xI?is…

Carlos Santana @DotCSV · 21h

Another 5 Erdős problems solved using an internal OpenAI model. And the count keeps growing.

Mehtaab Sawhney @mehtaab_sawhney

We’ve just released another paper solving five further Erdős problems with an internal model at OpenAI: arxiv.org/abs/2604.06609. Several of the proofs were especially enjoyable to digest while writing the paper. My personal favorite was the solution to Erdős Problem 1091. The question asks: if a graph G has chromatic number 4, while every small subgraph has chromatic number at most 3, must it contain an odd cycle with many diagonals? The internal model gives a very enlightening counterexample to this conjecture, and the proof was a pleasure to understand. For those so inclined, a really fun exercise is to try to reconstruct the proof from Figure 5 of the paper, which was of course produced by Codex.
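For readers new to the terminology in Problem 1091: the chromatic number of a graph is the smallest number of colors that can be assigned to its vertices so that no edge connects two vertices of the same color. A minimal brute-force sketch; the helper names are hypothetical, and this exhaustive search is only a definition check for toy graphs, not the technique used in the paper:

```python
from itertools import product


def is_proper(coloring: tuple[int, ...], edges: list[tuple[int, int]]) -> bool:
    # Proper coloring: no edge joins two vertices that share a color.
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(n: int, edges: list[tuple[int, int]]) -> int:
    """Smallest k such that vertices 0..n-1 admit a proper k-coloring.

    Tries every one of the k**n colorings for k = 1, 2, ..., so it only
    works for tiny graphs.
    """
    if n == 0:
        return 0
    for k in range(1, n + 1):
        if any(is_proper(c, edges) for c in product(range(k), repeat=n)):
            return k
    # With n colors every vertex can get its own color, so we only get here
    # if some edge is a self-loop (u, u).
    raise ValueError("graph with a self-loop has no proper coloring")


def cycle(n: int) -> list[tuple[int, int]]:
    # Edges of the cycle 0-1-...-(n-1)-0.
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n: int) -> list[tuple[int, int]]:
    # Edges of the complete graph K_n.
    return [(u, v) for u in range(n) for v in range(u + 1, n)]


print(chromatic_number(4, cycle(4)))     # 2: even cycles are bipartite
print(chromatic_number(5, cycle(5)))     # 3: odd cycles need a third color
print(chromatic_number(4, complete(4)))  # 4: every pair of vertices is adjacent
```

The search tries up to k**n colorings for each k, so it is only practical for graphs with a handful of vertices.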

Carlos Santana @DotCSV · 17h

@ismaguimarais @grok They cherry-pick only the benchmarks where they come out on top, but they have fallen behind Gemini, GPT and Claude. When they release updates that do put them at the frontier, I usually make a video. But they've been lagging behind for a while now.

Ismael Guimarais @ismaguimarais · 20h

@DotCSV Why do you almost never talk about @grok? Is it because every advantage they announce is simply a lie? I mean, is it one of the big competitors, or is it more Musk advertising than anything else?

X Freeze @XFreeze

Grok 4.20 Non-Hallucination rate improved to even higher than the previous highest. Just days ago, it hit a record-breaking 78% Non-Hallucination Rate - already #1 in the world, smoking Claude Opus 4.6 (max), Gemini 3.1, GPT-5.4 (xhigh), and every other major model. Now it has pushed that number even higher, to 83%. While every other AI confidently makes stuff up and fabricates answers it doesn't know, Grok simply says "I don't know"

Carlos Santana @DotCSV · 1d

Zuckerberg's GPUs went brrrrr brrrrr to make a jump they needed to stay in the race. Now they have to keep up the pace! x.com/ArtificialAnly…

Artificial Analysis @ArtificialAnlys

Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights.

Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights.

The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet. For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release.

The model is not open source and is not yet accessible via an API, but Meta has shared that it expects this to come soon. Meta is also integrating Muse Spark into its first-party products, including its Meta AI chat product, Facebook, Instagram and Threads.

Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7 and Grok 4.20, and behind Gemini 3.1 Pro Preview, GPT-5.4 and Claude Opus 4.6
➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M) and GLM-5 (110M)
➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%)
➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved the 5th-highest score in CritPT, 11%, an eval focused on difficult physics research questions. This is substantially above Gemini 3 Flash (9%) and Claude 4.6 Sonnet (3%)
➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real-world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 at 1648 and GPT-5.4 at 1676, but ahead of Gemini 3.1 Pro Preview at 1320. On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%

Key model details:
➤ Modalities: Multimodal, including text and vision input, text output
➤ License: Proprietary, Meta's first frontier model not released as open weights
➤ Availability: No public API at the time of publishing. Meta expects to provide API access soon. Meta has started integration into its first-party AI offering Meta AI and inside Facebook, Instagram, and Threads
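The token-efficiency bullet reads more easily as ratios. A minimal sketch that uses only the output-token counts quoted in the post (in millions, for running the Intelligence Index); `relative_to` is a hypothetical helper, and the comparison covers token usage only, not price or accuracy:

```python
# Output tokens (in millions) used to run the Artificial Analysis Intelligence
# Index, as quoted in the post above.
OUTPUT_TOKENS_M = {
    "Muse Spark": 58,
    "Gemini 3.1 Pro Preview": 57,
    "Claude Opus 4.6 (max effort)": 157,
    "GPT-5.4 (xhigh)": 120,
    "GLM-5": 110,
}


def relative_to(baseline: str, usage: dict[str, float]) -> dict[str, float]:
    """How many times more output tokens each model used than the baseline."""
    base = usage[baseline]
    return {model: tokens / base for model, tokens in usage.items()}


# Print from most to least token-efficient.
for model, ratio in sorted(relative_to("Muse Spark", OUTPUT_TOKENS_M).items(), key=lambda kv: kv[1]):
    print(f"{model:30s} {ratio:.2f}x")
```

With Muse Spark as the baseline, Claude Opus 4.6 (max effort) used about 2.7x as many output tokens and GPT-5.4 (xhigh) about 2.1x, while Gemini 3.1 Pro Preview used slightly fewer.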

Carlos Santana @DotCSV · 1d

The model's performance puts it close to Opus 4.6, Gemini 3.1 and GPT 5.4, without standing out notably in any dimension. My impression is that they rushed the release to stay in the race, given the moves by Anthropic and OpenAI. x.com/AIatMeta/statu…

AI at Meta @AIatMeta

Muse Spark is built from the ground up to integrate visual information across domains and tools. It achieves strong performance on visual STEM questions, entity recognition, and localization, enabling interactive experiences like troubleshooting your home appliances with dynamic annotations.

Carlos Santana @DotCSV · 1d

🔴 META IS BACK IN THE FIGHT! After the Llama 4 flop, Meta has spent the past year reorienting its entire AI strategy, and today it finally shows its first (closed) model aimed at competing head-to-head with the big players: its new Muse line of models 👇

Carlos Santana @DotCSV · 1d

Twitter, work your magic! Open this tweet and share it for wider reach and support 👇🙏

Míriam González @miriamgonp

I'm 35 and I have metastatic breast cancer, a rare case: fewer than 1% of breast tumors are like mine, and there is little documentation about it. That's why I'd like to find people who work on this and who would want to do research with my case. Twitter, work your magic

Carlos Santana @DotCSV · 1d

x.com/i/status/20418…

Carlos Santana @DotCSV

Share the video if you liked it! 👀 youtu.be/e2M5vRKcyuI?is…

Carlos Santana @DotCSV · 2d

And now, with this jump in capabilities, I can only keep thinking about what I was telling you yesterday in this video: what jump in math capabilities will Mythos have? What new discoveries will it unlock? youtu.be/GOhMh__Z4xI

Carlos Santana @DotCSV · 2d

🔴 ANTHROPIC MYTHOS PREVIEW! Anthropic has just published what appear to be the first benchmarks of Mythos, its big upcoming model that had "leaked". Honestly, the jump in programming and reasoning is MASSIVE!

Carlos Santana @DotCSV · 1d

🔥 NEW VIDEO in the LAB! 🔥 Claude Mythos Preview caught us by surprise, even though we're used to the pace of AI progress. A model as powerful as it is dangerous, to the point that it won't see the light of day... Today we analyze this new beast! Link below
