James Shi

123 posts

@shiqyy

curving data @datacurve

san francisco · Joined April 2023

305 Following · 215 Followers

James Shi reposted

Datacurve@datacurve·1d

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

James Shi@shiqyy·4d

@danielchingwq @datacurve great to have u here!

Daniel Ching@danielchingwq·5d

joined @datacurve to work on shipping rl envs for the summer! time to grind and work on hard problems🫡

James Shi reposted

Serena Ge (Datacurve)@serenaa_ge·5d

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

James Shi reposted

brayden petersen ⁂@bmptrsn·5d

i’ve joined @datacurve to lead design in san francisco (as an intern)! check out our new site :)

James Shi reposted

will depue@willdepue·24 May

academics are unprepared for the coming world where much scientific progress is majorly a function of inference compute. whether OpenAI points the Eye of Stargate at your particular field will decide its acceleration. talent will leach away into the labs. it's already begun

James Shi@shiqyy·6d

ever-improving models are a forcing function for us to live more deeply. automation buys us the time to go on longer walks, learn that new language, or play the piano. the richness the world has to offer floods back into our work. our ideas become more abstract, creative, better. better ideas build better models. then the cycle continues : D

James Shi reposted

Kevin Huang (Wenqi)@winkey_h·12 Apr

If Opus 4.6 has felt worse lately, this may be part of why: On a 100-task swe-bench pro sample, opus 4.6 on high finished behind both sonnet 4.6 and opus 4.5 on the same setting! Turning off adaptive reasoning didn’t seem to change much but max effort did

Evan You@evanyou

Honestly don't know what happened to Claude Code. Tried a one-off simple task on a fresh directory yesterday, tried a bunch of things that didn't work, asked for a ton of permissions, and then got stuck for 4 minutes before I got tired of waiting and killed the session. This was on medium effort. Switched to Codex gpt 5.4 with medium effort and one shotted the task in under 1 minute.

James Shi reposted

luffy@0xluffy·24 Feb

every so often a human being does something that rewires how the rest of the world thinks about what's possible

> neil armstrong stepped onto the moon
> usain bolt ran 100m in 9.58s
> hafthor bjornsson deadlifted 501kg

before these moments, the achievement existed only as a fairy tale, ambitious but delusional

after? it became a target

4 minute mile, someone broke it, dozens followed. the ceiling wasn't physical, it was psychological. someone just had to go first

but all these breakthroughs share underlying logic. push harder, grind, better outcome. peak human performance has always been measured by results. the blood, sweat and tears are just the admission ticket

but alysa liu broke a different kind of ceiling. what she showed was not a new record. she showed that the highest form of human potential is enjoying the process. the courage to say if the pursuit of winning kills the joy of doing, you've already lost the thing that actually mattered

we live in the age of AI. doomers are afraid of getting replaced. you tie your identity to your output. if you are a programmer, claude code writes better lines faster. if you are an analyst, claude crunches numbers in seconds. what's left of you is a shell of nothingness

alysa liu shows us that we have been asking the wrong question

a machine can eventually land a triple axel triple lutz triple toe with perfection. but that doesn't compare to what alysa liu did. the falls, the morning ice, the moment your body finally understands the rotation. the meaning was never in the landing but the learning and act of doing it

people who struggle most in this age are the ones who were already disconnected from the experience. the ones who were only ever in it for the output, the status, the paycheck. blame AI all you want, but it did not create the emptiness. it just made it impossible to ignore

the ones who would thrive are the ones who were already doing things because the doing itself was the point. the programmer who loves the puzzle. the filmmaker who writes because it's a story she wants to express. the violinist who finds something close to nirvana in the music. for them AI is just another tool in a practice that was always about something deeper than the outcome

what alysa showed the world, with or without the medal is: decouple your worth from your output. the outcome was never the point. the act of doing, fully, happily, on your own terms, that was always the real feat. just like every other paradigm-breaking achievement before it, now someone has shown us it's possible, and the rest of us can follow

Serena Ge (Datacurve)@serenaa_ge·9 Oct

Today we’re announcing we’ve raised $17.7 million in funding across a $15M Series A led by Chemistry and a $2.7M Seed to accelerate foundation model progress through providing frontier training data for LLMs. When we first started Datacurve, it came from a simple realization: foundation model progress is limited not just by compute, but by data quality and complexity. The right data unlocks new capabilities, especially in coding, where accuracy and reasoning matter most. We’re now proud to partner with the world’s leading foundation-model labs, providing them with high-quality, complex training data that helps push the boundaries of what AI can do. This is still just the start. Come build the future of technology with us in San Francisco: datacurve.ai/careers Huge thanks to our incredible team and investors who’ve believed in us since day one and beyond: @garrytan at @ycombinator, @1vnzh from @cohere, @Mark_Goldberg_ from @chemistry_fund, @TheDerrickLi from @AforeVC, @forwarddeploy, @SoheilK, and @shyamalanadkat.

James Shi@shiqyy·9 Oct

@serenaa_ge LFG

James Shi reposted

Flowers ☾@flowersslop·21 Sep

There’s this small niche of people with no technical background or flashy résumés, but who are obsessively into AI on a deep, non-technical level. They follow every new model, know their quirks and capabilities, and often end up knowing more about current systems than some computer science folks in the field, simply because they’re constantly experimenting and staying up to date. These people have valuable insights but no real way to apply them: labs don’t see their use, they lack the skills to build things themselves, and they’re not rich enough to hire others. Ironically, they may benefit from AGI more than most technical staff. And if labs were smart, they’d bring them in for their unique outside perspective and uncanny intuition about AI, even if it’s not technical.

James Shi@shiqyy·24 Jul

this must be cluely's head of talent

James Shi reposted

Serena Ge (Datacurve)@serenaa_ge·19 Jun

Datacurve is hiring a contract designer ASAP to work on our gamified problem solving / bounty platform! (which turns into data for the tier 1 labs) DM me!

James Shi@shiqyy·14 Jun

started using orion browser again and its been the best experience ive had with a browser in ages:
- nested tabs bar when cmd clicking
- all tabs overview view when you pinch out
- supports both firefox + chrome extensions (although most i've tried just don't work lol)

plus its webkit and it looks fking amazing

James Shi@shiqyy·4 Jun

@serenaa_ge @0xluffy what the fuck

Serena Ge (Datacurve)@serenaa_ge·3 Jun

@0xluffy @shiqyy

James Shi@shiqyy·30 May

@BrianMiki03 lmao they might just have more taste than most devs in general

BrianMiki@BrianMiki03·30 May

Windsurf has better product taste than most devs building with it

James Shi@shiqyy·26 May

@0xluffy loy en

luffy@0xluffy·26 May

the lion this the lion that stop lion to yourself bro

James Shi@shiqyy·22 May

woah sonnet 4 is supposedly first model to cross 80% on swe bench

Anthropic@AnthropicAI

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

James Shi reposted

Serena Ge (Datacurve)@serenaa_ge·14 May

Datacurve is hiring engineers and a designer to join us in SF!

Theo - t3.gg@theo

Do you work a big corporate tech job? Do you want to explore other options? I help a lot of start ups with hiring. The need for experienced engineers down to work in-person in SF is insane. If you’re an experienced dev who wants to see what’s available, DM me.

James Shi@shiqyy·9 May

@Mahfuzurocks 🙇‍♂️🙇‍♂️

Pogman@Mahfuzurocks·9 May

@shiqyy My company 🥰

James Shi@shiqyy·9 May

we are soo back lfg
