Emily Jones

2.7K posts

@Emilyixg8a

No matter how fierce the storms you encounter, I hope you always possess the ability to find happiness within yourself

United States · Joined February 2018
3K Following · 1.2K Followers
Pinned Tweet
Emily Jones @Emilyixg8a
A good mood is created.
English
15
0
84
3.9K
Emily Jones retweeted
Andrej Karpathy @karpathy
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Claude @claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

English
1.3K
2.4K
25.4K
2.7M
Emily Jones retweeted
Ethan Mollick @emollick
There has been a push to use OpenEvidence AI for doctors. But this paper suggests general models are much better: “Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ.”
Eric Topol @EricTopol

For medical information, general AI frontier models (Google, OpenAI, Anthropic) outperformed specialized @EvidenceOpen and @UpToDate as assessed by 12 US clinicians, randomized and blinded to which model and extensive testing/benchmarks. This was not anticipated. @NatureMedicine nature.com/articles/s4159…

English
40
48
428
71.9K
Emily Jones retweeted
Michael Saylor @saylor
The hardest thing in business is not seeing the future. It is surviving long enough to build it. My fireside chat with @Julian_Liniger at @BTCPrague on focus, endurance, corporate transformation, and how entrepreneurs can use Bitcoin, AI, and digital finance to create the next generation of products. Full interview below.
00:00 - Bitcoin as the dominant global Digital Capital network: 17 years, hundreds of billions invested, and a potential $100T opportunity
00:51 - Bitcoin near the 200-week moving average: why $BTC is more compelling after a 50% drawdown
01:52 - Strategy’s scale and the media narrative: from ~$600M enterprise value to as high as ~$120B
10:29 - Bitcoin fundamentals: economic empowerment, sovereign property rights, and the dominant digital monetary network
12:16 - Why there is no second best: Bitcoin as Digital Capital, Digital Money, and a potential $100T network
16:09 - Entrepreneur advice: build a simple product using new technology to solve a real problem
20:30 - Focus, endurance, and the danger of dilutive distractions
32:25 - What I would build today: AI plus Digital Assets, especially Digital Money and Digital Yield
33:27 - Digital Credit: taking a 40 vol asset, stripping it to ~4 vol, and creating new yield products
34:57 - Digital Money: 6–8% yield in major currencies with no volatility
38:05 - $STRC, $SATA, and the next layer of bitcoin-backed financial products
48:52 - Q&A: why Strategy sold 32 BTC and why bitcoin-backed capital must support credit and equity
59:29 - Q&A: Strategy as a shock absorber: selling 32 BTC while buying net ~250,000 BTC during the bear market
01:02:39 - Why public companies protect Bitcoin through accounting, tax, legal, political, and economic advocacy
01:07:58 - Strategy as the extension of the Bitcoin network into the free market system
English
304
387
2.9K
136.5K
Emily Jones retweeted
Nav Toor @heynavtoor
@OpenAI Building software is getting easier day by day.
English
0
5
1
1.3K
Emily Jones retweeted
Anthropic @AnthropicAI
The speedup isn’t just in volume. On open-ended coding problems where answers are unclear, Claude’s success rate is now 76%—a 50 point jump in just 6 months. Many engineers also say Claude’s code quality is now on par with human code; we expect it to be better within the year.
English
44
115
2.2K
503K
Emily Jones retweeted
Phong Le @phongle
My conversation with @scottmelker on @YahooFinance on @Strategy, the largest holder of Bitcoin in the world, why $GOOG and $MSTR use preferred equity, paying $STRC dividends, buying and selling $BTC, and $MSTR performance.
English
95
134
1K
120.7K
Emily Jones retweeted
Perplexity @perplexity_ai
Two new ways to bring your health data into Perplexity. Perplexity now connects to Apple Health on iPhone. Use your sleep, activity, and HRV data in Computer. Function is now available in Perplexity Health. Add labs and ask about biomarkers, blood draws, or panel results.
English
52
60
624
60.8K
Emily Jones retweeted
Gemmie 🧡 @CryptoGemmie_
@cz_binance so true 💭 AI is amazing for generating snippets but if you just paste without reading you end up debugging a stranger's logic at 3am ✨ understanding > shipping fast
English
0
1
0
28
Emily Jones retweeted
Zengyi Qin @qinzytech
@gdb interesting
English
0
1
0
78
Emily Jones retweeted
Google Research @GoogleResearch
Welcome to CVPR! Google is proud to be a Platinum Sponsor of CVPR 2026 in Denver! 🏔️ Visit the Google booth (#557) @CVPR to see how we're pushing the boundaries of CV, see live demos and chat with our researchers. #CVPR2026 goo.gle/4tzRdFQ
English
8
18
216
28.6K
Emily Jones retweeted
Coinbase 🛡️ @coinbase
Stablecoin demand is at an all-time high. With Coinbase Payments, businesses get a complete, out-of-the-box solution. Custody, compliance, settlement, fiat rails, and agentic commerce - all covered. You focus on building. We’ll handle the rest.
English
41
33
268
43.3K
Emily Jones retweeted
Bindu Reddy @bindureddy
🚨 FUSION AGENTS - Build Complete SaaS Apps With Open-Source AI
Fusion agents combine Kimi 2.7 and GLM with Opus 4.8 and GPT 5.5
- multi-agent architectures with open-source sub-agents
- build complete SaaS apps
- one click connectors to 100+ products
- create companion iOS and android apps with one prompt
- accept stripe payments
English
32
55
492
2.6M
Emily Jones retweeted
Minn @minney_cat
In the age of AI, human ingenuity becomes even more valuable. The likes of @elonmusk @spacex, @nvidia, @anthropic, @OpenAI were founded in the U.S. for a reason. Over the years, I've met thousands of talented founders, engineers, and technologists, and many left everything familiar for the chance to build something new. So we sat down to hear their stories. First Landing 👇
English
75
54
459
81.4K
Emily Jones retweeted
hardmaru @hardmaru
For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead. By training a Conductor model with RL, we found that it naturally learns to write highly effective, custom instructions for a whole pool of other models. It essentially learns to ‘manage’ them in natural language. What surprised me most was how it dynamically adapts. For simple factual questions, it just queries one model. But for hard coding problems, it autonomously spins up a whole pipeline of planners, coders, and verifiers. Really excited to see where this paradigm of “AI managing AI” goes next, especially as we start moving from single-agent chain-of-thought to multi-agent “chain-of-command”. Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388 Along with our TRINITY paper which we announced earlier, this work also powers our new multi-agent system: Sakana Fugu (sakana.ai/fugu-beta) 🐡
Sakana AI @SakanaAILabs

Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 arxiv.org/abs/2512.04388
What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs?
To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.
We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies:
1/ Which agent to call
2/ What specific subtask to give them (acting as an expert prompt engineer)
3/ What previous messages they can see in their context window
Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.
The results are very promising: The 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.
One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.
This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (sakana.ai/fugu-beta) 🐡 OpenReview: openreview.net/forum?id=U23A2… (ICLR 2026)
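The three things the Conductor is said to specify per step (which agent, what subtask, which earlier messages are visible) can be pictured as a small data structure. What follows is only an illustrative sketch and not code from the paper: `Step`, `run_workflow`, and the stub workers are hypothetical names, and in a real system each worker would be an LLM call and the steps would be parsed from the Conductor's natural-language output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A worker takes (subtask, visible_messages) and returns its answer as text.
Worker = Callable[[str, List[str]], str]


@dataclass
class Step:
    """One step of a hypothetical workflow."""
    agent: str                     # 1/ which agent to call
    subtask: str                   # 2/ the instruction written for that agent
    visible: List[int] = field(default_factory=list)  # 3/ indices of earlier outputs it may see


def run_workflow(steps: List[Step], workers: Dict[str, Worker]) -> List[str]:
    """Run steps in order, passing each worker only the outputs it is allowed to see."""
    outputs: List[str] = []
    for step in steps:
        context = [outputs[i] for i in step.visible]
        outputs.append(workers[step.agent](step.subtask, context))
    return outputs


def make_stub(name: str) -> Worker:
    """Stand-in for a real model call; it just echoes what it was given."""
    def worker(subtask: str, context: List[str]) -> str:
        return f"{name}({subtask}; saw {len(context)} prior messages)"
    return worker


workers = {name: make_stub(name) for name in ("planner", "coder", "verifier")}

# A planner-executor-verifier pipeline like the one described for hard coding problems.
steps = [
    Step("planner", "outline a solution"),
    Step("coder", "implement the outline", visible=[0]),
    Step("verifier", "check the code against the outline", visible=[0, 1]),
]

if __name__ == "__main__":
    for line in run_workflow(steps, workers):
        print(line)
```

A simple factual question would correspond to a single-element `steps` list. The recursive behavior described above would amount to letting one of the workers be the orchestrator itself, which this sketch does not attempt.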

English
40
181
1.4K
183.9K
Emily Jones retweeted