ElliotSecOps

6K posts

@ElliotSecOps

Elliot as an alias, security as a profession. Security & fintech founder

/home/user · Joined June 2023
847 Following · 5.2K Followers
Pinned Tweet
ElliotSecOps @ElliotSecOps ·
The only jobs that AI and robotics won't be able to replace are the ones with the most human connection, not necessarily because these technologies can't perform those roles, but simply because people prefer to interact with other humans.
ElliotSecOps @ElliotSecOps ·
Mistral is that honorary friend who joins in just to take part, to round out the team. You know he's not excellent, he doesn't stand out, he isn't brilliant; you simply like him, and every once in a while you talk to him, but he's never your first choice 😅
Artificial Analysis @ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index.

@MistralAI's Small 4 is a 119B mixture-of-experts model with 6.5B active parameters per token, supporting both reasoning and non-reasoning modes. In reasoning mode, Mistral Small 4 scores 27 on the Artificial Analysis Intelligence Index, a 12-point improvement from Small 3.2 (15), and is now among the most intelligent models Mistral has released, surpassing Mistral Large 3 (23) and matching the proprietary Magistral Medium 1.2 (27). However, it lags open weights peers with similar total parameter counts such as gpt-oss-120B (high, 33), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 36), and Qwen3.5 122B A10B (Reasoning, 42).

Key takeaways:
➤ Reasoning and non-reasoning modes in a single model: Mistral Small 4 supports configurable hybrid reasoning with reasoning and non-reasoning modes, rather than the separate reasoning variants Mistral has released previously with their Magistral models. In reasoning mode, the model scores 27 on the Artificial Analysis Intelligence Index. In non-reasoning mode, the model scores 19, a 4-point improvement from its predecessor Mistral Small 3.2 (15).
➤ More token efficient than peers of similar size: At ~52M output tokens, Mistral Small 4 (Reasoning) uses fewer tokens to run the Artificial Analysis Intelligence Index than reasoning models such as gpt-oss-120B (high, ~78M), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, ~110M), and Qwen3.5 122B A10B (Reasoning, ~91M). In non-reasoning mode, the model uses ~4M output tokens.
➤ Native support for image input: Mistral Small 4 is a multimodal model, accepting image input as well as text. On our multimodal evaluation, MMMU-Pro, Mistral Small 4 (Reasoning) scores 57%, ahead of Mistral Large 3 (56%) but behind Qwen3.5 122B A10B (Reasoning, 75%). Neither gpt-oss-120B nor NVIDIA Nemotron 3 Super 120B A12B supports image input. All models support text output only.
➤ Improvement in real-world agentic tasks: Mistral Small 4 scores an Elo of 871 on GDPval-AA, our evaluation based on OpenAI's GDPval dataset that tests models on real-world tasks across 44 occupations and 9 major industries, with models producing deliverables such as documents, spreadsheets, and diagrams in an agentic loop. This is more than double the Elo of Small 3.2 (339) and close to Mistral Large 3 (880), but behind gpt-oss-120B (high, 962), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 1021), and Qwen3.5 122B A10B (Reasoning, 1130).
➤ Lower hallucination rate than peer models of similar size: Mistral Small 4 scores -30 on AA-Omniscience, our evaluation of knowledge reliability and hallucination, where scores range from -100 to 100 (higher is better) and a negative score indicates more incorrect than correct answers. Mistral Small 4 scores ahead of gpt-oss-120B (high, -50), Qwen3.5 122B A10B (Reasoning, -40), and NVIDIA Nemotron 3 Super 120B A12B (Reasoning, -42).

Key model details:
➤ Context window: 256K tokens (up from 128K on Small 3.2)
➤ Pricing: $0.15/$0.60 per 1M input/output tokens
➤ Availability: Mistral first-party API only. At native FP8 precision, Mistral Small 4's 119B parameters require ~119GB to self-host the weights (more than the 80GB of HBM3 memory on a single NVIDIA H100)
➤ Modality: Image and text input with text output only
➤ Licensing: Apache 2.0 license

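The quoted post's self-hosting point follows from a simple rule of thumb: weight memory is roughly parameter count times bytes per parameter, before counting KV cache or activations. A minimal sketch of that arithmetic (the function name is mine, for illustration only):

```python
# Rough memory needed just to hold model weights (ignores KV cache,
# activations, and framework overhead).
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    # 1 billion parameters at 1 byte each occupy ~1 GB
    return params_billion * bytes_per_param

mistral_small_4 = weight_memory_gb(119, 1.0)  # 119B params at FP8 (1 byte each)
h100_hbm3_gb = 80                             # HBM3 on a single NVIDIA H100

print(mistral_small_4)                 # 119.0 GB
print(mistral_small_4 > h100_hbm3_gb)  # True: the weights alone exceed one H100
```

This is why the post notes the model cannot be self-hosted on a single H100 at native precision, even though only 6.5B parameters are active per token.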
ElliotSecOps @ElliotSecOps ·
Moonshot officially confirms that Cursor used its Kimi-K2.5 model for Composer 2! There is no doubt about it anymore: Chinese models no longer have anything to envy Western models for, and the race gets more interesting every day.
Kimi.ai @Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

ElliotSecOps retweeted
Kimi.ai @Kimi_Moonshot ·
(retweet of the Kimi.ai post quoted above)
ElliotSecOps @ElliotSecOps ·
@feregri_no well, you tell the same AI that wrote the code to debug it: you write a meta-prompt with the error clearly explained, along with the diagnosis and the possible fixes, and that's it
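The "meta-prompt" described in that reply can be sketched as a prompt builder: hand the model back its own code together with the error, a diagnosis, and candidate fixes. Everything here (the function name, the field layout, the toy bug) is a hypothetical illustration, not any particular tool's API:

```python
# Build a debugging "meta-prompt" that sends an error back to the model
# that wrote the code, with a structured diagnosis and candidate fixes.
def build_debug_prompt(code: str, error: str, diagnosis: str, fixes: list[str]) -> str:
    fixes_text = "\n".join(f"- {f}" for f in fixes)
    return (
        "You wrote the code below and it fails. Debug it.\n\n"
        f"Code:\n{code}\n\n"
        f"Error:\n{error}\n\n"
        f"Diagnosis:\n{diagnosis}\n\n"
        f"Possible fixes to evaluate:\n{fixes_text}\n"
    )

# Example with a toy bug: a method referenced but never called
prompt = build_debug_prompt(
    code="total = sum(items.values)",
    error="TypeError: 'builtin_function_or_method' object is not iterable",
    diagnosis="items.values is a method that was never called",
    fixes=["call items.values()", "iterate over items.items() instead"],
)
print(prompt)
```

The resulting string would then be sent to whatever LLM client you use; the point is that the model gets the error plus your reasoning, not the raw traceback alone.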
Antonio Feregrino @feregri_no ·
I no longer ask my junior devs whether they used AI. I ask them whether they could debug what they just pushed.
ElliotSecOps @ElliotSecOps ·
@DamianCatanzaro Can you connect it without an API key? I have ChatGPT Business, which gives me Codex, but it doesn't give me an API key; it makes me sign in with OAuth instead.
Damián Catanzaro ☕️ @DamianCatanzaro ·
You don't have to pay anything extra to use Codex/GPT 5.4 in OpenCode. If you already have the ChatGPT web subscription, you can connect it with that, and it's accepted by OpenAI.
Faku @FakuCrypto ·
if your friends aren't talking about:
- claude code
- creatine
- openclaw
- looksmaxxing
- AI agents
- prediction markets
- mac mini
it's time to find new friends
ElliotSecOps @ElliotSecOps ·
Impressive: MiniMax-M2.7 is on par with GLM-5 and Codex-GPT-5.2xh at a third of the cost. Chinese models are closing the gap with Western models at full speed.
Artificial Analysis @ArtificialAnlys

MiniMax has released MiniMax-M2.7, delivering GLM-5-level intelligence for less than one third of the cost.

MiniMax-M2.7 from @MiniMax_AI scores 50 on the Artificial Analysis Intelligence Index, an 8-point improvement over MiniMax-M2.5, which was released one month ago. This is driven by stronger performance on real-world agentic tasks and reduced hallucinations. MiniMax-M2.7 is now ahead of MiMo-V2-Pro (Reasoning, 49) and Kimi K2.5 (Reasoning, 47), and equivalent to GLM-5 (Reasoning, 50) while using 20% fewer output tokens and costing less than a third as much to run. MiniMax-M2.7 is a reasoning-only model and maintains the same per-token pricing as MiniMax-M2.5.

Key takeaways:
➤ Strong performance on real-world agentic tasks: MiniMax-M2.7 achieves a GDPval-AA Elo of 1494, a significant improvement from MiniMax-M2.5 (1203) and ahead of MiMo-V2-Pro (Reasoning, 1426), GLM-5 (Reasoning, 1406), and Kimi K2.5 (Reasoning, 1283). It remains behind frontier models such as GPT-5.4 (xhigh, 1667) and Claude Opus 4.6 (Adaptive Reasoning, max effort, 1606).
➤ Reduced hallucinations: MiniMax-M2.7 scores +1 on the AA-Omniscience Index, up from MiniMax-M2.5 (-40). This is competitive with GPT-5.2 (xhigh, -1) and GLM-5 (Reasoning, +2), and well ahead of Kimi K2.5 (Reasoning, -8). The improvement from M2.5 is purely driven by reduced hallucinations, meaning the model is more likely to abstain from answering when it doesn't know the answer, rather than guessing. M2.7 achieves a hallucination rate of 34%, lower than Claude Sonnet 4.6 (Adaptive Reasoning, max effort, 46%) and Gemini 3.1 Pro Preview (50%).
➤ Gains across most evaluations compared to MiniMax-M2.5: Outside of the GDPval-AA and AA-Omniscience improvements noted above, MiniMax-M2.7 improves on HLE (+9 p.p.), TerminalBench Hard (+5 p.p.), SciCode (+4 p.p.), IFBench (+4 p.p.), GPQA (+3 p.p.), and LCR (+3 p.p.). We saw a notable regression in τ²-Bench (-11 p.p.).
➤ Increased token use: MiniMax-M2.7 used ~87M output tokens to run the Artificial Analysis Intelligence Index, up 55% from MiniMax-M2.5 (~56M). It remains more token-efficient than other models such as GLM-5 (Reasoning, ~110M) and Kimi K2.5 (Reasoning, ~89M).
➤ Leading cost efficiency: MiniMax-M2.7 cost $176 to run the Artificial Analysis Intelligence Index, maintaining the same $0.30/$1.20 per 1M input/output pricing as M2.5. This places it on the Pareto frontier of our Intelligence vs. Cost chart. For context, GLM-5 (Reasoning) cost $547 at equivalent intelligence, Kimi K2.5 (Reasoning) cost $371, and Gemini 3 Flash Preview (Reasoning) cost $278.

Key model details:
➤ Context window: 200K tokens (equivalent to MiniMax-M2.5)
➤ Pricing: $0.30/$1.20 per 1M input/output tokens (unchanged from MiniMax-M2.5)
➤ Availability: MiniMax first-party API only
➤ Modality: Text input and output only (no multimodality)
➤ Licensing: MiniMax has not announced whether MiniMax-M2.7 will be open weights. MiniMax-M2.5 is available under the MIT license.

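The "less than one third of the cost" claim in the quoted post can be checked directly against the benchmark-run totals it reports; a quick sanity check using only numbers from the post:

```python
# Cost to run the Artificial Analysis Intelligence Index (USD, from the post)
minimax_m27 = 176.0
glm_5 = 547.0

print(minimax_m27 / glm_5)        # ~0.32, i.e. less than one third of GLM-5
print(minimax_m27 / glm_5 < 1/3)  # True

# Token use grew ~55% over M2.5: ~87M vs ~56M output tokens
print(round(87 / 56 - 1, 2))      # 0.55
```

Both headline figures (under a third of GLM-5's cost, 55% more output tokens than M2.5) are consistent with the underlying numbers the post gives.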
ElliotSecOps @ElliotSecOps ·
@_denial Brother in Christ, that is genuinely JavaScript code, what are you even talking about, skill issue confirmed
daniel T. @_denial ·
@ElliotSecOps "skill issue" 🥴🥴🥴 and it's just a markdown file in the project root, dude
ElliotSecOps @ElliotSecOps ·
The programmers' job is no longer writing code with AI; agents already do that superbly.
The real work is maintaining it, deploying it, scaling it, hardening it, and debugging it.
The problem is no longer syntax; it's design, security, and architecture.
ElliotSecOps @ElliotSecOps ·
@_denial Skill issue. It's well known that for serious work you have to use the SOTA models
daniel T. @_denial ·
@ElliotSecOps "superbly," dude? which one is AI and which one was written by the human (?)
ElliotSecOps @ElliotSecOps ·
@rcaceres_cl Qwen is fast and precise, but it needs a lot of guidance for programming work: 83 pts.
GLM is very intelligent and more autonomous, but it can be slow, and if you use WebFetch and image analysis heavily you can hit the rate limit very fast: 85 pts.
ElliotSecOps @ElliotSecOps ·
Qwen is the best Chinese model for writing and text in general, and GLM is the best model for code.
Kimi and Deepseek still haven't fully convinced me, but I acknowledge the work.
ElliotSecOps @ElliotSecOps ·
@_Ivans_ I haven't tried it. They say it's very good, but I haven't taken the time to put it to the test.
ElliotSecOps @ElliotSecOps ·
@JesSC407 I talk to both models in Spanish without any problem, but I feel I get the best answers in English, and GLM follows instructions better when I speak to it in English.
agustina @aguusmood ·
Looking for a Senior DevOps/Infrastructure Engineer.
Stack: GCP, Kubernetes, Docker, Terraform/Terragrunt, CI/CD (GitLab), Linux, and scripting (Bash). Experience managing cloud infrastructure in production.
• Remote - #LATAM
• Full time
• Flexible hours
• English not required