
Announcing a new division of Midjourney called "Midjourney Medical"
Elias Munk








Can coding agents stay coherent over a 1 billion token budget? Can they build Slack from scratch? Rewrite a JAX codebase in PyTorch? Build a C compiler in Rust? Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.






We also asked forecasters to predict the longest 80% success time horizon achieved by the end of 2026. All three groups had medians between 3 and 4 hours, up from a baseline of 1.5 hours when the survey launched. During our survey period, METR updated its benchmark to include a preview of Anthropic's Mythos model. That preview achieved an 80% time horizon of 3 hours and 6 minutes, which already falls within the range of the median expert and superforecaster predictions for the end of 2026.