Sapient Intelligence

19 posts


@Sapient_Int

We are building self-evolving Machine Intelligence to solve the world's most challenging problems.

Palo Alto, CA · Joined July 2024
28 Following · 1.6K Followers

Sapient Intelligence @Sapient_Int
We were honored to support the global AI community as a Gold Sponsor of the #AAAI26 Conference on Artificial Intelligence. It was truly inspiring to connect with so many brilliant minds across the industry. The future of AGI isn’t just being imagined, it is being built.

Sapient Intelligence @Sapient_Int
Proud to share that TRM, derived from our HRM model, is highlighted in Nature! 🎉🎉🎉 This marks an important step forward for HRM-based reasoning systems, demonstrating the strength of small, structured models on complex reasoning tasks. 💡

Sapient Intelligence @Sapient_Int
🔥 It’s official: the Sapient HRM Discord community is now live! This is a place to discuss, connect, and collaborate as we shape HRM’s future together. We will be sharing our latest work, releases, and tips, as well as hosting Q&A sessions 💬💬 Hop on this journey with us as we push the boundaries of what HRM, and AGI at large, can achieve! 🙌 ➡️ Join us on Discord: discord.gg/sapient

Sapient Intelligence @Sapient_Int
Google is expanding AI Mode to 180 countries, offering users a personalized restaurant-reservation service. Given specific details such as date, time, and party size, the AI can precisely filter and recommend restaurants that meet the criteria, greatly improving the efficiency and personalization of search. This shows the potential of AI in daily life, integrating it ever more deeply with user needs. However, as AI's personalized recommendation and predictive capabilities grow, balancing data privacy, user autonomy, and technological advancement remains an issue the industry must continuously address. theverge.com/news/763367/go…

Sapient Intelligence @Sapient_Int
We are on the @arcprize leaderboard now - a good starting point! Meanwhile, we are accelerating the iteration and application of the HRM model; stay tuned!
Guan Wang @makingAGI

Thanks to @arcprize for reproducing and verifying the results!

ARC-AGI-1: public 41% pass@2, semi-private 32% pass@2
ARC-AGI-2: public 4% pass@2, semi-private 2% pass@2

Due to differences in testing environments, a certain amount of variance in results is acceptable. In tests run on our infrastructure, the open-source version of HRM on our GitHub can reach 5.4% pass@2 on ARC-AGI-2. We welcome everyone to run it on your own infra and share your scores~

This is our first submission to the leaderboard, and it's a good starting point. We appreciate everyone's support and feedback on HRM, both before and after our appearance on the ARC leaderboard. All of it encourages and motivates us to improve.

The hierarchical architecture is designed to resolve premature convergence in long-horizon tasks, like master-level Sudoku puzzles that take humans hours to solve. See the comparison with a simple recurrent Transformer. Such a long chain might not be essential for ARC problems, and we only used a high-low ratio of 1/2; larger ratios are often needed for optimal performance on Sudoku.

In the case of ARC-AGI, the success of HRM is a testament to the model's fluid intelligence, that is, its capability to infer and apply abstract rules from independent, flat examples. We are glad a recent blog post discovered that the outer loop and data augmentation are essential for this ability, and we especially thank @fchollet @GregKamradt @k_schuerholt for pointing this out.

Finally, we are accelerating iteration on the HRM model and continuously pushing its limits, with good progress so far. We also believe the hierarchical architecture is highly effective in many scenarios; moving forward, we will make further targeted updates to the architecture and validate it on more applications. We will also release an FAQ addressing the key questions raised by the community. 🧠 Stay tuned!
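
A minimal sketch of the two-timescale recurrence described above, using GRU cells as stand-ins (module names, sizes, and wiring are illustrative assumptions, not the actual HRM implementation). The low-level module updates every step, while the high-level module updates once every `ratio` steps, matching one reading of the "high-low ratio" mentioned in the thread:

```python
# Hypothetical two-level recurrent loop (not the actual HRM code).
import torch
import torch.nn as nn

class TwoLevelRecurrent(nn.Module):
    def __init__(self, dim: int = 128, ratio: int = 2):
        super().__init__()
        self.ratio = ratio                    # low-level steps per high-level step
        self.low = nn.GRUCell(dim * 2, dim)   # fast module: sees input + high state
        self.high = nn.GRUCell(dim, dim)      # slow module: sees low-level state

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: (steps, batch, dim)
        steps, batch, dim = x_seq.shape
        z_low = x_seq.new_zeros(batch, dim)
        z_high = x_seq.new_zeros(batch, dim)
        for t in range(steps):
            # fast update at every step, conditioned on the slow state
            z_low = self.low(torch.cat([x_seq[t], z_high], dim=-1), z_low)
            if (t + 1) % self.ratio == 0:
                # slow update only every `ratio` steps
                z_high = self.high(z_low, z_high)
        return z_high

model = TwoLevelRecurrent(dim=128, ratio=2)
out = model(torch.randn(8, 4, 128))   # 8 steps, batch of 4
print(out.shape)                      # torch.Size([4, 128])
```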

Sapient Intelligence @Sapient_Int
Bigger ≠ Better. The GPT-5 rollout reminded everyone that raw scale isn’t a strategy. Real value now lives in agent reliability, not leaderboard one-shots. Our stance: optimize for closed-loop task success (plans → tools → checks → handoff), not just next-token accuracy. We benchmark Sapient HRM against process metrics: tool-call precision, recovery after tool error, and end-to-end SLA success.
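
A hypothetical sketch of how those three process metrics could be computed from agent-run traces (the Trace schema and field names are assumptions for illustration, not the actual benchmark harness):

```python
# Illustrative process-metric computation over agent run traces.
from dataclasses import dataclass

@dataclass
class Trace:
    tool_calls: int     # tool calls issued during the run
    correct_calls: int  # calls with the right tool and arguments
    tool_errors: int    # calls that returned an error
    recovered: int      # errors the agent recovered from
    met_sla: bool       # end-to-end task finished within the SLA

def process_metrics(traces: list[Trace]) -> dict[str, float]:
    calls = sum(t.tool_calls for t in traces)
    errors = sum(t.tool_errors for t in traces)
    return {
        "tool_call_precision": sum(t.correct_calls for t in traces) / max(calls, 1),
        "error_recovery_rate": sum(t.recovered for t in traces) / max(errors, 1),
        "sla_success_rate": sum(t.met_sla for t in traces) / len(traces),
    }

print(process_metrics([Trace(10, 9, 2, 1, True), Trace(5, 5, 0, 0, False)]))
```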

Sapient Intelligence @Sapient_Int
At the “Beyond Human: AGI And The Future We’re Building” Town Hall at Fortune Brainstorm AI 2025 in Singapore, William outlined our vision for AGI: “We’re exploring new architectures to push the boundaries - to make AI think like a human, not just model probabilities. True AGI will not only advance the AI frontier but also help with everyday tasks and generate real‑world revenue.” #AGI #FortuneAISingapore Want to see the full discussion? 📺 Watch here: fortune.com/videos/watch/t…

Sapient Intelligence @Sapient_Int
Our co-founder William Chen is going to share more about the open-sourced Hierarchical Reasoning Model (HRM) at #FortuneAISingapore @FortuneMagazine tomorrow, under the panel theme "Beyond Human: AGI And The Future We’re Building"! We are excited about the practical path towards universally capable reasoning systems that rely on architectures, not scale, to reach real AGI. ⏰16:10-16:40 SGT, July 23, Mainstage

Sapient Intelligence @Sapient_Int
Hierarchical Recurrent Models towards AGI

Excited to have Sapient Intelligence’s Yue Wu share insights on our Sapient-H Architecture alongside @nvidia in Wuhan, China. Stay tuned - something interesting is brewing!

Sapient Intelligence @Sapient_Int
To add on to point #1: for DeepSeek's fine-grained MoE architecture, the first advantage of FP8 is that it reduces the communication volume of MoE dispatch and combine by 50% compared to BF16; per DeepSeek's tech report, the communication-to-computation ratio is roughly 1:1 with FP8. The second advantage is that FP8 GEMMs are faster than BF16 GEMMs on Hopper GPUs (the hardware spec is 2× throughput, though the practical speedup is lower, and DeepSeek adopted an online block-wise/tile-wise quantization strategy that adds overhead). The third advantage is memory savings, which can be translated into training efficiency, e.g., by increasing the number of micro-batches in pipeline parallelism (PP).
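
A back-of-the-envelope check of the 50% figure above (the token count and hidden size below are illustrative, not DeepSeek's actual dispatch sizes):

```python
# MoE dispatch/combine traffic scales with bytes per element,
# so FP8 (1 B/element) halves BF16 (2 B/element).
tokens, hidden = 4096, 7168          # illustrative: tokens routed per step, hidden dim
bf16_bytes = tokens * hidden * 2     # BF16: 2 bytes per element
fp8_bytes = tokens * hidden * 1      # FP8: 1 byte per element
print(f"BF16: {bf16_bytes / 2**20:.0f} MiB, FP8: {fp8_bytes / 2**20:.0f} MiB, "
      f"saving: {1 - fp8_bytes / bf16_bytes:.0%}")   # saving: 50%
```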

Sapient Intelligence @Sapient_Int
Respectfully disagree:
1. Most SoTA models are trained in BF16 (some operations are mixed-precision, but the main activations and GEMMs are in BF16), so it's not an FP32→FP8 leap. Also, memory savings won't directly translate into training efficiency.
2. DeepSeek wouldn't do this compression during pretraining. However, the low-rank structure of the Q/K/V projection can keep computation cost low while increasing the number of attention heads (DeepSeek-R1 has significantly more attention heads than Qwen/Llama), which increases the capacity of the model. Of course, this optimization can help RL rollouts, but DeepSeek didn't disclose its RL training efficiency.
3. Inference speed can only help RL rollouts, and again, DeepSeek didn't disclose its RL training efficiency. MTP won't make pretraining faster, but it will make pretraining better -- effectively making it more efficient.
4. DeepSeek didn't train their model on consumer-grade GPUs.
Jared Friedman @snowmaker

Lots of hot takes on whether it's possible that DeepSeek made training 45x more efficient, but @doodlestein wrote a very clear explanation of how they did it. Once someone breaks it down, it's not hard to understand. Rough summary:
* Use 8-bit instead of 32-bit floating point numbers, which gives massive memory savings
* Compress the key-value indices which eat up much of the VRAM; they get 93% compression ratios
* Do multi-token prediction instead of single-token prediction, which effectively doubles inference speed
* Mixture of Experts model decomposes a big model into small models that can run on consumer-grade GPUs
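
For point 2 in the reply above, a minimal sketch of a low-rank Q/K/V projection (dimensions and names are illustrative assumptions; this is not DeepSeek's actual MLA implementation). Projecting through a small latent rank r keeps the projection cost near d·r instead of d·d, leaving budget for more attention heads:

```python
# Illustrative low-rank (factored) QKV projection.
import torch
import torch.nn as nn

d_model, rank, n_heads, d_head = 1024, 128, 32, 64   # assumed sizes

class LowRankQKV(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)                  # d -> r
        self.up_qkv = nn.Linear(rank, 3 * n_heads * d_head, bias=False)   # r -> Q,K,V

    def forward(self, x: torch.Tensor):
        q, k, v = self.up_qkv(self.down(x)).chunk(3, dim=-1)
        return q, k, v

x = torch.randn(2, 16, d_model)        # (batch, seq, d_model)
q, k, v = LowRankQKV()(x)
dense = d_model * 3 * n_heads * d_head                      # unfactored QKV params
factored = d_model * rank + rank * 3 * n_heads * d_head     # low-rank QKV params
print(q.shape, f"{dense / factored:.1f}x fewer projection params")  # ~6.9x here
```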

Sapient Intelligence @Sapient_Int
Greetings from Sapient Intelligence! Happy New Year! LFG AGI 2025!