Naeem (@identity_matrix) - Twitter Profile

Naeem@identity_matrix·27m

@lorenlugosch @jasonderulo ?

Loren Lugosch@lorenlugosch·7h

Whoa: first author is a one-namer

Naeem@identity_matrix·37m

Bonus, your competitors are in absolute dumpster fire. like ant and google. x.com/themarginguy/s…

Margins@themarginguy

We are seeing a sustained statistically significant degradation in Claude Code with Opus 4.7 since last Friday May 22nd at marginlab.ai/trackers/claud…

Naeem@identity_matrix·46m

The lead openai have is disgustingly Good. like, I don't know how they created their models such a beauty to work with. Plus, codex is the best harness [period] Never been this bullish on OpenAI.

Naeem@identity_matrix·1h

Google can stop horrible PR by releasing 3.5 pro ASAP

Naeem@identity_matrix·1h

@umeshjj7 UPI

uj@umeshjj7·2h

Canada has e-transfer 🇨🇦 America has cash app, venmo, zelle, tipping screens everywhere 🇺🇸 what does your country have?

Naeem@identity_matrix·1h

@zxcodes Unless it's a multimodality or math benchmark

Mohammed Farmaan.@zxcodes·4h

I’ll trust any benchmark where Google’s models perform poorly.

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Naeem@identity_matrix·3h

@basedjensen Probably otw to interview wenfeng

Hensen Juang@basedjensen·9h

Brah what kind of side quest is lex on ?

Tintin 叮叮@tindingtin

Came across two foreigners hitchhiking on the road. Seeing how poor they looked, I figured I'd help out if I could.

Naeem@identity_matrix·3h

@giordanorandone For UI I either use opus OR kimi/Mimo

Giordano Randone@giordanorandone·10h

Do you prefer Composer 2.5 now, or do you still reach for Opus for UI-heavy work?

Naeem@identity_matrix·3h

Minimax M3 and Qwen 3.7 Plus this week pleaseeee

Naeem@identity_matrix·3h

@nickbaumann_ What did you feed this beast? Damn

Nick@nickbaumann_·7h

Not surprising for people using these models

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Naeem@identity_matrix·4h

@adonis_singh Next version of claude, 100% on DeepSwe 👀

adi@adonis_singh·5h

the rankings are very good, but now that it has been released it will be fucked into oblivion

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Naeem@identity_matrix·5h

@MatternJustus Add qwen 3.7 max, It's a strong model

Justus Mattern@MatternJustus·6h

Composer 2.5 outperforms all open source models and clearly beats its base model Kimi 2.5 as well as Kimi 2.6. It is roughly on par and slightly ahead of Gemini 3.1 Pro We still see a large gap between models from Anthropic / OpenAI and other labs

Proximal@ProximalHQ

Composer 2.5 is ranked #5 on FrontierSWE The model is broadly on par with Gemini 3.1 Pro, with a slight edge in our evaluation, and it beats all open source models. We still observe a significant performance gap between Composer and models from Anthropic and OpenAI

Naeem@identity_matrix·5h

I hate bot-like speaking humans on this platform, insufferable.

Naeem@identity_matrix·5h

Wtf is growth hacking?

Naeem@identity_matrix·6h

@serenaa_ge Add qwen 3.7 max please

Serena Ge (Datacurve)@serenaa_ge·11h

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.