ElliotSecOps

6K posts

@ElliotSecOps

Elliot as an alias, security as a profession. Security & fintech founder

/home/user · Joined June 2023
847 Following · 5.2K Followers
Pinned Tweet
ElliotSecOps @ElliotSecOps ·
The only jobs that AI and robotics won't be able to replace are the ones with the most human connection, not necessarily because these technologies can't perform those roles, but simply because people prefer to interact with other humans.
ElliotSecOps @ElliotSecOps ·
Mistral is that honorary friend who joins in just to take part, to round out the team. You know he's not excellent, he doesn't stand out, he isn't brilliant; you simply like him, and every once in a while you talk to him, but he's never your first choice 😅
Artificial Analysis @ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index.

@MistralAI's Small 4 is a 119B mixture-of-experts model with 6.5B active parameters per token, supporting both reasoning and non-reasoning modes. In reasoning mode, Mistral Small 4 scores 27 on the Artificial Analysis Intelligence Index, a 12-point improvement from Small 3.2 (15), and is now among the most intelligent models Mistral has released, surpassing Mistral Large 3 (23) and matching the proprietary Magistral Medium 1.2 (27). However, it lags open weights peers with similar total parameter counts such as gpt-oss-120B (high, 33), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 36), and Qwen3.5 122B A10B (Reasoning, 42).

Key takeaways:
➤ Reasoning and non-reasoning modes in a single model: Mistral Small 4 supports configurable hybrid reasoning with reasoning and non-reasoning modes, rather than the separate reasoning variants Mistral has released previously with their Magistral models. In reasoning mode, the model scores 27 on the Artificial Analysis Intelligence Index. In non-reasoning mode, the model scores 19, a 4-point improvement from its predecessor Mistral Small 3.2 (15).
➤ More token efficient than peers of similar size: At ~52M output tokens, Mistral Small 4 (Reasoning) uses fewer tokens to run the Artificial Analysis Intelligence Index than reasoning models such as gpt-oss-120B (high, ~78M), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, ~110M), and Qwen3.5 122B A10B (Reasoning, ~91M). In non-reasoning mode, the model uses ~4M output tokens.
➤ Native support for image input: Mistral Small 4 is a multimodal model, accepting image input as well as text. On our multimodal evaluation, MMMU-Pro, Mistral Small 4 (Reasoning) scores 57%, ahead of Mistral Large 3 (56%) but behind Qwen3.5 122B A10B (Reasoning, 75%). Neither gpt-oss-120B nor NVIDIA Nemotron 3 Super 120B A12B supports image input. All models support text output only.
➤ Improvement in real-world agentic tasks: Mistral Small 4 scores an Elo of 871 on GDPval-AA, our evaluation based on OpenAI's GDPval dataset that tests models on real-world tasks across 44 occupations and 9 major industries, with models producing deliverables such as documents, spreadsheets, and diagrams in an agentic loop. This is more than double the Elo of Small 3.2 (339) and close to Mistral Large 3 (880), but behind gpt-oss-120B (high, 962), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 1021), and Qwen3.5 122B A10B (Reasoning, 1130).
➤ Lower hallucination rate than peer models of similar size: Mistral Small 4 scores -30 on AA-Omniscience, our evaluation of knowledge reliability and hallucination, where scores range from -100 to 100 (higher is better) and a negative score indicates more incorrect than correct answers. Mistral Small 4 scores ahead of gpt-oss-120B (high, -50), Qwen3.5 122B A10B (Reasoning, -40), and NVIDIA Nemotron 3 Super 120B A12B (Reasoning, -42).

Key model details:
➤ Context window: 256K tokens (up from 128K on Small 3.2)
➤ Pricing: $0.15/$0.60 per 1M input/output tokens
➤ Availability: Mistral first-party API only. At native FP8 precision, Mistral Small 4's 119B parameters require ~119GB to self-host the weights (more than the 80GB of HBM3 memory on a single NVIDIA H100)
➤ Modality: Image and text input with text output only
➤ Licensing: Apache 2.0 license

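The quoted post's self-hosting point follows from a simple rule of thumb: weight memory is roughly parameter count times bytes per parameter, before counting KV cache or activations. A minimal sketch of that arithmetic (the function name is mine, for illustration only):

```python
# Rough memory needed just to hold model weights (ignores KV cache,
# activations, and framework overhead).
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    # 1 billion parameters at 1 byte each occupy ~1 GB
    return params_billion * bytes_per_param

mistral_small_4 = weight_memory_gb(119, 1.0)  # 119B params at FP8 (1 byte each)
h100_hbm3_gb = 80                             # HBM3 on a single NVIDIA H100

print(mistral_small_4)                 # 119.0 GB
print(mistral_small_4 > h100_hbm3_gb)  # True: the weights alone exceed one H100
```

This is why the post notes the model cannot be self-hosted on a single H100 at native precision, even though only 6.5B parameters are active per token.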
ElliotSecOps @ElliotSecOps ·
Moonshot officially confirms that Cursor used its Kimi-K2.5 model for Composer 2! There is no doubt about it anymore: Chinese models no longer have anything to envy Western models for, and the race gets more interesting every day.
Kimi.ai @Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

ElliotSecOps retweeted
Kimi.ai @Kimi_Moonshot ·
(retweet of the Kimi.ai post quoted above)
ElliotSecOps @ElliotSecOps ·
@feregri_no well, you tell the same AI that wrote the code to debug it: you write a meta-prompt with the error clearly explained, along with the diagnosis and the possible fixes, and that's it
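The "meta-prompt" described in that reply can be sketched as a prompt builder: hand the model back its own code together with the error, a diagnosis, and candidate fixes. Everything here (the function name, the field layout, the toy bug) is a hypothetical illustration, not any particular tool's API:

```python
# Build a debugging "meta-prompt" that sends an error back to the model
# that wrote the code, with a structured diagnosis and candidate fixes.
def build_debug_prompt(code: str, error: str, diagnosis: str, fixes: list[str]) -> str:
    fixes_text = "\n".join(f"- {f}" for f in fixes)
    return (
        "You wrote the code below and it fails. Debug it.\n\n"
        f"Code:\n{code}\n\n"
        f"Error:\n{error}\n\n"
        f"Diagnosis:\n{diagnosis}\n\n"
        f"Possible fixes to evaluate:\n{fixes_text}\n"
    )

# Example with a toy bug: a method referenced but never called
prompt = build_debug_prompt(
    code="total = sum(items.values)",
    error="TypeError: 'builtin_function_or_method' object is not iterable",
    diagnosis="items.values is a method that was never called",
    fixes=["call items.values()", "iterate over items.items() instead"],
)
print(prompt)
```

The resulting string would then be sent to whatever LLM client you use; the point is that the model gets the error plus your reasoning, not the raw traceback alone.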
Antonio Feregrino @feregri_no ·
I no longer ask my junior devs whether they used AI. I ask them whether they could debug what they just pushed.
ElliotSecOps @ElliotSecOps ·
@DamianCatanzaro Can you connect it without an API key? I have ChatGPT Business, which gives me Codex, but it doesn't give me an API key; it makes me sign in with OAuth instead.
Damián Catanzaro ☕️ @DamianCatanzaro ·
You don't have to pay anything extra to use Codex/GPT 5.4 in OpenCode. If you already have the ChatGPT web subscription, you can connect it with that, and it's accepted by OpenAI.
Faku @FakuCrypto ·
if your friends aren't talking about:
- claude code
- creatine
- openclaw
- looksmaxxing
- AI agents
- prediction markets
- mac mini
it's time to find new friends
ElliotSecOps @ElliotSecOps ·
Impressive: MiniMax-M2.7 is on par with GLM-5 and Codex-GPT-5.2xh at a third of the cost. Chinese models are closing the gap with Western models at full speed.
Artificial Analysis @ArtificialAnlys

MiniMax has released MiniMax-M2.7, delivering GLM-5-level intelligence for less than one third of the cost.

MiniMax-M2.7 from @MiniMax_AI scores 50 on the Artificial Analysis Intelligence Index, an 8-point improvement over MiniMax-M2.5, which was released one month ago. This is driven by stronger performance on real-world agentic tasks and reduced hallucinations. MiniMax-M2.7 is now ahead of MiMo-V2-Pro (Reasoning, 49) and Kimi K2.5 (Reasoning, 47), and equivalent to GLM-5 (Reasoning, 50) while using 20% fewer output tokens and costing less than a third as much to run. MiniMax-M2.7 is a reasoning-only model and maintains the same per-token pricing as MiniMax-M2.5.

Key takeaways:
➤ Strong performance on real-world agentic tasks: MiniMax-M2.7 achieves a GDPval-AA Elo of 1494, a significant improvement from MiniMax-M2.5 (1203) and ahead of MiMo-V2-Pro (Reasoning, 1426), GLM-5 (Reasoning, 1406), and Kimi K2.5 (Reasoning, 1283). It remains behind frontier models such as GPT-5.4 (xhigh, 1667) and Claude Opus 4.6 (Adaptive Reasoning, max effort, 1606).
➤ Reduced hallucinations: MiniMax-M2.7 scores +1 on the AA-Omniscience Index, up from MiniMax-M2.5 (-40). This is competitive with GPT-5.2 (xhigh, -1) and GLM-5 (Reasoning, +2), and well ahead of Kimi K2.5 (Reasoning, -8). The improvement from M2.5 is purely driven by reduced hallucinations, meaning the model is more likely to abstain from answering when it doesn't know the answer, rather than guessing. M2.7 achieves a hallucination rate of 34%, lower than Claude Sonnet 4.6 (Adaptive Reasoning, max effort, 46%) and Gemini 3.1 Pro Preview (50%).
➤ Gains across most evaluations compared to MiniMax-M2.5: Outside of the GDPval-AA and AA-Omniscience improvements noted above, MiniMax-M2.7 improves on HLE (+9 p.p.), TerminalBench Hard (+5 p.p.), SciCode (+4 p.p.), IFBench (+4 p.p.), GPQA (+3 p.p.), and LCR (+3 p.p.). We saw a notable regression in τ²-Bench (-11 p.p.).
➤ Increased token use: MiniMax-M2.7 used ~87M output tokens to run the Artificial Analysis Intelligence Index, up 55% from MiniMax-M2.5 (~56M). It remains more token-efficient than other models such as GLM-5 (Reasoning, ~110M) and Kimi K2.5 (Reasoning, ~89M).
➤ Leading cost efficiency: MiniMax-M2.7 cost $176 to run the Artificial Analysis Intelligence Index, maintaining the same $0.30/$1.20 per 1M input/output pricing as M2.5. This places it on the Pareto frontier of our Intelligence vs. Cost chart. For context, GLM-5 (Reasoning) cost $547 at equivalent intelligence, Kimi K2.5 (Reasoning) cost $371, and Gemini 3 Flash Preview (Reasoning) cost $278.

Key model details:
➤ Context window: 200K tokens (equivalent to MiniMax-M2.5)
➤ Pricing: $0.30/$1.20 per 1M input/output tokens (unchanged from MiniMax-M2.5)
➤ Availability: MiniMax first-party API only
➤ Modality: Text input and output only (no multimodality)
➤ Licensing: MiniMax has not announced whether MiniMax-M2.7 will be open weights. MiniMax-M2.5 is available under the MIT license.

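The "less than one third of the cost" claim in the quoted post can be checked directly against the benchmark-run totals it reports; a quick sanity check using only numbers from the post:

```python
# Cost to run the Artificial Analysis Intelligence Index (USD, from the post)
minimax_m27 = 176.0
glm_5 = 547.0

print(minimax_m27 / glm_5)        # ~0.32, i.e. less than one third of GLM-5
print(minimax_m27 / glm_5 < 1/3)  # True

# Token use grew ~55% over M2.5: ~87M vs ~56M output tokens
print(round(87 / 56 - 1, 2))      # 0.55
```

Both headline figures (under a third of GLM-5's cost, 55% more output tokens than M2.5) are consistent with the underlying numbers the post gives.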
ElliotSecOps @ElliotSecOps ·
@_denial Brother in Christ, that is genuinely JavaScript code, what are you even talking about, skill issue confirmed
daniel T. @_denial ·
@ElliotSecOps "skill issue" 🥴🥴🥴 and it's just a markdown file in the project root, dude
ElliotSecOps @ElliotSecOps ·
The programmers' job is no longer writing code with AI; agents already do that superbly.
The real work is maintaining it, deploying it, scaling it, hardening it, and debugging it.
The problem is no longer syntax; it's design, security, and architecture.
ElliotSecOps @ElliotSecOps ·
@_denial Skill issue. It's well known that for serious work you have to use the SOTA models
daniel T. @_denial ·
@ElliotSecOps "superbly," dude? which one is AI and which one was written by the human (?)
ElliotSecOps @ElliotSecOps ·
@rcaceres_cl Qwen is fast and precise, but it needs a lot of guidance for programming work: 83 pts.
GLM is very intelligent and more autonomous, but it can be slow, and if you use WebFetch and image analysis heavily you can hit the rate limit very fast: 85 pts.
ElliotSecOps @ElliotSecOps ·
Qwen is the best Chinese model for writing and text in general, and GLM is the best model for code.
Kimi and Deepseek still haven't fully convinced me, but I acknowledge the work.
ElliotSecOps @ElliotSecOps ·
@_Ivans_ I haven't tried it. They say it's very good, but I haven't taken the time to put it to the test.
ElliotSecOps @ElliotSecOps ·
@JesSC407 I talk to both models in Spanish without any problem, but I feel I get the best answers in English, and GLM follows instructions better when I speak to it in English.
agustina @aguusmood ·
Looking for a Senior DevOps/Infrastructure Engineer.
Stack: GCP, Kubernetes, Docker, Terraform/Terragrunt, CI/CD (GitLab), Linux, and scripting (Bash). Experience managing cloud infrastructure in production.
• Remote - #LATAM
• Full time
• Flexible hours
• English not required