soner

42 posts

@sonercirit

IT Lead at @gencturk_resmi and member of @hurriyetpartitr | Senior Software Engineer and Systems Architect

The Netherlands · Joined May 2024

70 Following · 40 Followers

soner@sonercirit·1d

MiniMax-M2.7 has turned out to be an incredibly efficient model in terms of price/performance. 🔥 I guess it's time to build my own army of AI agents. 🤖⚔️ @MiniMax_AI

Artificial Analysis@ArtificialAnlys

MiniMax has released MiniMax-M2.7, delivering GLM-5-level intelligence for less than one third of the cost
MiniMax-M2.7 from @MiniMax_AI scores 50 on the Artificial Analysis Intelligence Index, an 8-point improvement over MiniMax-M2.5, which was released one month ago. This is driven by stronger performance on real-world agentic tasks and reduced hallucinations. MiniMax-M2.7 is now ahead of MiMo-V2-Pro (Reasoning, 49) and Kimi K2.5 (Reasoning, 47), and equivalent to GLM-5 (Reasoning, 50) while using 20% fewer output tokens and costing less than a third as much to run. MiniMax-M2.7 is a reasoning-only model and maintains the same per-token pricing as MiniMax-M2.5.
Key takeaways:
➤ Strong performance on real-world agentic tasks: MiniMax-M2.7 achieves a GDPval-AA Elo of 1494, a significant improvement from MiniMax-M2.5 (1203) and ahead of MiMo-V2-Pro (Reasoning, 1426), GLM-5 (Reasoning, 1406), and Kimi K2.5 (Reasoning, 1283). It remains behind frontier models such as GPT-5.4 (xhigh, 1667) and Claude Opus 4.6 (Adaptive Reasoning, max effort, 1606)
➤ Reduced hallucinations: MiniMax-M2.7 scores +1 on the AA-Omniscience Index, up from MiniMax-M2.5 (-40). This is competitive with GPT-5.2 (xhigh, -1) and GLM-5 (Reasoning, +2), and well ahead of Kimi K2.5 (Reasoning, -8). The improvement from M2.5 is purely driven by reduced hallucinations, meaning the model is more likely to abstain from answering when it doesn’t know the answer, rather than guessing. M2.7 achieves a hallucination rate of 34%, lower than Claude Sonnet 4.6 (Adaptive Reasoning, max effort, 46%) and Gemini 3.1 Pro Preview (50%).
➤ Gains across most evaluations compared to MiniMax-M2.5: Outside of the GDPval-AA and AA-Omniscience improvements noted above, MiniMax-M2.7 improves in HLE (+9 p.p.), TerminalBench Hard (+5 p.p.), SciCode (+4 p.p.), IFBench (+4 p.p.), GPQA (+3 p.p.), and LCR (+3 p.p.). We saw a notable regression in τ²-Bench (-11 p.p.).
➤ Increased token use: MiniMax-M2.7 used ~87M output tokens to run the Artificial Analysis Intelligence Index, up 55% from MiniMax-M2.5 (~56M). It remains more token-efficient than other models such as GLM-5 (Reasoning, 110M) and Kimi K2.5 (Reasoning, ~89M)
➤ Leading cost efficiency: MiniMax-M2.7 cost $176 to run the Artificial Analysis Intelligence Index, maintaining the same $0.30/$1.20 per 1M input/output pricing as M2.5. This places it on the Pareto frontier of our Intelligence vs. Cost chart. For context, GLM-5 (Reasoning) cost $547 at equivalent intelligence, Kimi K2.5 (Reasoning) cost $371, and Gemini 3 Flash Preview (Reasoning) cost $278
Key model details:
➤ Context window: 200K tokens (equivalent to MiniMax-M2.5).
➤ Pricing: $0.30/$1.20 per 1M input/output tokens (unchanged from MiniMax-M2.5).
➤ Availability: MiniMax first-party API only.
➤ Modality: Text input and output only (no multimodality).
➤ Licensing: MiniMax has not announced whether MiniMax-M2.7 will be open weights. MiniMax-M2.5 is available under the MIT license.
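The ratios the Artificial Analysis post states in words can be cross-checked with a few lines of arithmetic. This is only a sketch: every number is copied from the post (several are marked approximate there), and the function and dictionary names are made up for illustration.

```python
# Figures copied from the post above; values marked "~" there are approximate.
RUN_COST_USD = {
    "MiniMax-M2.7": 176,
    "GLM-5 (Reasoning)": 547,
    "Kimi K2.5 (Reasoning)": 371,
    "Gemini 3 Flash Preview (Reasoning)": 278,
}
OUTPUT_TOKENS_MILLIONS = {
    "MiniMax-M2.7": 87,
    "MiniMax-M2.5": 56,
    "GLM-5 (Reasoning)": 110,
}
OUTPUT_PRICE_USD_PER_MILLION = 1.20


def cross_check() -> dict[str, float]:
    """Recompute the ratios that the post describes in prose."""
    m27_cost = RUN_COST_USD["MiniMax-M2.7"]
    m27_tokens = OUTPUT_TOKENS_MILLIONS["MiniMax-M2.7"]
    return {
        # "less than a third as much to run" as GLM-5
        "cost_vs_glm5": m27_cost / RUN_COST_USD["GLM-5 (Reasoning)"],
        # "up 55% from MiniMax-M2.5"
        "token_growth_vs_m25": m27_tokens / OUTPUT_TOKENS_MILLIONS["MiniMax-M2.5"] - 1,
        # "20% fewer output tokens" than GLM-5
        "token_saving_vs_glm5": 1 - m27_tokens / OUTPUT_TOKENS_MILLIONS["GLM-5 (Reasoning)"],
        # Share of the run cost attributable to output tokens alone
        "output_token_cost_usd": m27_tokens * OUTPUT_PRICE_USD_PER_MILLION,
    }


if __name__ == "__main__":
    for name, value in cross_check().items():
        print(f"{name}: {value:.3f}")
```

With these inputs the cost ratio comes out at roughly 0.32 (under a third), token growth at roughly 0.55, and the saving versus GLM-5 at roughly 0.21, which the post rounds to 20%. Output tokens account for about $104 of the $176 total; the post does not break down the remainder.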

soner@sonercirit·3d

With open-source models working to match the same performance at a much lower cost, will they even have enough leverage to pull off a rug pull?

Kyle Gawley@kylegawley

I've noticed ~3x increase in token costs recently It's getting much more expensive to generate code The rug pull is coming

soner@sonercirit·3d

As OpenAI attracts what it calls "hardcore builders," it will be able to collect better training data for its models. Data is the most important thing. Collecting data from the people at the very top is far more valuable.

Sam Altman@sama

The Codex team are hardcore builders and it really comes through in what they create. No surprise all the hardcore builders I know have switched to Codex. Usage of Codex is growing very fast:

soner@sonercirit·5d

King's College London study: "AI Arms and Influence" 🔗 arxiv.org/html/2602.1474…

soner@sonercirit·5d

Scientists pitted three AI models against each other in war games. In 20 of the 21 games, they chose to use nuclear weapons.

soner@sonercirit·13 Mar

I just tried it with pi.dev; the auto selector works much better for open-source models that do tool calling.

OpenRouter@OpenRouter

"Auto Exacto" is now live, and on by default for tool-calling requests. Over the last few days, OpenRouter has reduced tool error rates by 15-90% across providers automatically. Here's how it works:

soner@sonercirit·13 Mar

PostgreSQL + Object Storage (S3) is all you need. Documents go to object storage, everything else goes straight into PostgreSQL. For the queue system, PG again, with `SKIP LOCKED`.

Duca@big_duca

Postgres is just incredible. We avg ~500k-1M db writes a minute. And it just handles it like a champ. Incredible tek.
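The `SKIP LOCKED` queue idea mentioned in the post above could look roughly like this. It is a minimal sketch, not anyone's production setup: the `jobs` table, its `id`, `status`, `payload`, and `started_at` columns, and the `claim_next_job` helper are all hypothetical, and `conn` is assumed to be any DB-API 2.0 connection to PostgreSQL (for example one opened with psycopg).

```python
# Hypothetical schema: jobs(id, status, payload, started_at).
# The inner SELECT locks one pending row; SKIP LOCKED makes concurrent
# workers pass over rows another transaction has already locked, so two
# workers never claim the same job and never block on each other.
CLAIM_SQL = """
UPDATE jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""


def claim_next_job(conn):
    """Claim one pending job; return (id, payload) or None if the queue is empty."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        conn.commit()  # persist the 'running' status and release the row lock
        return row
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

A worker would call `claim_next_job(conn)` in a loop, process the returned payload, and then mark the row as done in a separate statement. Committing right after the claim keeps the lock short; a job that crashes mid-processing stays in `running` and would need its own retry policy.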

soner@sonercirit·13 Mar

@devagrawal09 Everyone defaults to Python, so it's hard to find all the equivalent libraries in TS. There are some GPU libraries for TS, but I don't know if they compile to CUDA.

Dev Agrawal@devagrawal09·13 Mar

Does python ecosystem still have a massive edge over typescript for AI related libraries? Or is it possible to do serious AI/ML/data science in typescript these days?

soner@sonercirit·11 Mar

I'm going to add DRY and KISS straight into the system prompt soon anyway. I've repeated them until I'm blue in the face.

Christian Findlay@CFDevelop

DRY is critical for AI If you don’t constantly tell it not to duplicate code, you’ll be swamped in duplicate code

soner@sonercirit·11 Mar

This is one of the things I noticed too. I sort of figured it out when I saw how quickly I hit the limit, even on the 20 plan. But I didn't know there was a 2x promo. Contrary to popular belief, GPT isn't that cheap a model anyway. Of course, it's still cheaper than Opus.

Gael Breton@GaelBreton

GPT 5.4 is great but damn it’s hungry with your Codex limits. There’s going to be a sea of salt in here when the 2x usage promo ends soon and everyone has to bump to $200/month and still hit their limits.

soner@sonercirit·11 Mar

@SMSTexts I built myself a custom workflow with pi.dev; every time I send a message, it automatically creates/updates a plan. That helped quite a bit.

botuhan@SMSTexts·9 Mar

@sonercirit exactly. gpt is extremely autistic. but once you learn how to talk to it, it's much better than claude

soner@sonercirit·8 Mar

Lately I've been trying GPT 5.4 instead of Opus 4.6. No problems with the code, but communication is a struggle. Opus gets what I mean before I finish the sentence, whereas with GPT we go back and forth trying to understand each other. But once it understands the task, GPT seems to produce better output.

soner@sonercirit·11 Mar

@alpoezcan Yes, and the number of these incidents has been steadily rising. People blame the AI-written code, but the real problem lies with the developer who never took ownership of the code in the first place.

Alp Özcan@alpoezcan·11 Mar

I usually have it analyze things in a fresh context window, man, but I need to give it quite a bit of context about the project and about "things that could go wrong." Amazon has apparently also made it mandatory for PR reviews to be approved by senior engineers :D they seem to have had quite a few incidents.. x.com/anisha_moonka/…

soner@sonercirit·11 Mar

My experience matches this. Having an AI model do a first-pass review can catch some issues, but it usually misses the ones I catch. Then again, the day it can review its own work and fix every problem is the day we're in trouble. :)

Nils Adermann@naderman

Review 2 PRs for $37 this morning. Took 9 minutes on a 8 line template change, removing some unnecessary info, got a tiny nitpick out of it. Then 30 mins on a 15 file (+331/-153) PR where it missed the actual bug introduced, which manual review found quickly. Turning off for now.

soner@sonercirit·8 Mar

As these tools become more widespread for cyberattacks, using AI models for cybersecurity will also become inevitable.

Tib3rius@0xTib3rius

You may not like it, but this is what peak hacking looks like.

soner@sonercirit·8 Mar

Isolated payment environments for agents could turn into a market of their own down the line.

0xMarioNawfal@RoundtableSpace

YOUR CLAUDE AGENT CAN NOW CREATE ONE-TIME VISA CARDS ON DEMAND JUST BY BEING ASKED. AGENTS THAT SPEND MONEY ONLINE WITHOUT EVER TOUCHING YOUR REAL CARD DETAILS. THE AGENTIC PAYMENTS SYSTEM IS HERE.

soner retweeted

Hürriyet Partisi@HurriyetPartiTr·8 Mar

HAPPY MARCH 8 INTERNATIONAL WOMEN'S DAY! It is a source of pride for our nation that Turkish women, who have waged struggles worthy of the highest praise, won their equal rights earlier than women in many other countries. With the Republic, women's gains in rights were consolidated and have continually advanced through the labor of women themselves. Yet it is clear that this progress has been interrupted from time to time. Women in Türkiye today still feel unsafe in many areas of life. Being discriminated against, harassed, subjected to violence, and killed remains the reality for women. We celebrate the courage and struggle of women who brave every hardship, drawing strength from their sisters, their mothers, their dreams, and their solidarity. As the Hürriyet Partisi, in which women have played a leading role since its founding, we value women's participation in politics and work to carry the struggle for rights into politics. Happy March 8 International Women's Day!

soner retweeted

Ali Gül@avaligul·7 Mar

Accusing an artist of "targeting the constitutional order" because "whoever doesn't jump is a Tayyip supporter" was chanted at a concert is a charge that would be absurd even under the harshest despotic regimes. The state that the judiciary and public administration in Türkiye have reached is, in a word, a disaster.

🎙️ Muhbir@ajansmuhbir1923

An investigation has been opened into Hande Yener over allegations of "targeting the constitutional order" and "remarks insulting the President." The investigation covers the slogan "Jump, jump, whoever doesn't jump is a Tayyip supporter," chanted at the concert, and certain remarks made on stage. (Newspects)

soner@sonercirit·7 Mar

amplifying.ai/research/claud…

soner@sonercirit·7 Mar

AI models' preferences have already started to affect the real economy. Resend reportedly doubled its weekly installs in a single month. I'm leaving the Claude Stack link below.

Wes Bos@wesbos

The Claude Stack is real
