Datacurve (@datacurve) - Twitter Profile

Pinned Tweet

Datacurve@datacurve·1d

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

70

112

1.6K

736.8K

Datacurve@datacurve·1d

@winkey_h 🫡🫡

0

821

Kevin Huang (Wenqi)@winkey_h·1d

Opus 4.8 is a clear efficiency step up over 4.7 on DeepSWE, same-or-better task success with fewer steps and fewer input tokens per task. Worth noting DeepSWE measures backend/implementation, not frontend. Hoping to release a deep dive soon.

Datacurve@datacurve

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

4

1

41

6.3K

Datacurve@datacurve·1d

@bmptrsn @winkey_h W animated graph 📊

1

0

11

447

brayden petersen ⁂@bmptrsn·1d

WEEEEEEEEEEE

Datacurve@datacurve

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

2

80

19.7K

Datacurve@datacurve·1d

Full deep dive coming soon. Check out the full benchmark here → deepswe.datacurve.ai

2

4

88

17.5K

Datacurve@datacurve·1d

Opus 4.8 delivers efficiency gains by solving tasks in fewer steps, directly reducing the total number of input tokens required per task.

8

175

106.2K

Datacurve retweeted

Matthew Berman@MatthewBerman·4d

DeepSWE reflects what I’m hearing from engineers better than any other benchmark. They took the hard path to build a good one.

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

19

14

174

34.8K

Datacurve@datacurve·4d

@cbovolo @Neesh774 ok. datacurve.ai

2

0

3

56

Christopher Bovolos@cbovolo·5d

@Neesh774 @datacurve Actually insane Now do mobile

1

0

3

242

Neesh 🥭@Neesh774·5d

The new @datacurve site is insane

11

6

510

28.3K

Datacurve@datacurve·4d

@Neesh774 lots of love put in here! @bmptrsn @shiqyy @LeonardMainnet & albert 🩵 who said data has to be boring

1

0

5

363

Datacurve retweeted

Garry Tan@garrytan·5d

This is the new standard for engineering evals

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

32

63

858

113.2K

Datacurve retweeted

Serena Ge (Datacurve)@serenaa_ge·5d

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

506

746

6K

1.9M

Datacurve retweeted

Serena Ge (Datacurve)@serenaa_ge·Apr 5

I presented today at Demo day Day 2 and @TechCrunch featured us @datacurve! Just been reading TC and listening to TC Daily Crunch since high school mornings... a surreal feeling to see us on it. Also, post-demo sadness cuz now YC is coming to an end

9

5

150

29.8K
