
Sapient Intelligence
28 posts

Sapient Intelligence
@Sapient_Int
Building efficient & powerful general intelligence through brain-inspired architecture






















Hierarchical reasoning works well on large language models!🎉



Thanks to @arcprize for reproducing and verifying the results! ARC-AGI-1: public 41% pass@2 - semi private 32% pass@2 ARC-AGI-2: public 4% pass@2 - semi private 2% pass@2 Due to differences in testing environments, a certain amount of variance in results is acceptable. According to tests run on our infrastructure, the open-source version of HRM on our GitHub can achieve a score of 5.4% pass@2 on the ARC-AGI-2. We welcome everyone to run it on your own infra and share your scores~ This is our first submission to the leaderboard, and it's a good starting point. We appreciate everyone for your support and feedback on HRM, both before and after our appearance on the ARC leaderboard. All of this encourages and motivates us to improve. The hierarchical architecture is designed to resolve premature convergence in long-horizon tasks, like master-level Sudoku that takes hours for humans to solve. See the comparison with a simple recurrent Transformer. Such a long chain might not be essential for ARC problems, and we only used a high-low ratio of 1/2. Larger ratios are often needed for optimal performance for Sudoku problems. In the case of ARC-AGI, the success of HRM is a testament to the model's ability to exhibit fluid intelligence - that is, its capability to infer and apply abstract rules from independent and flat examples. We are glad it was discovered in a recent blog post that the outer loop and data augmentation are essential for this ability, and we especially thank @fchollet @GregKamradt @k_schuerholt for pointing this out. Finally, we are accelerating the iteration of the HRM model and continuously pushing its limits, with good progress so far. At the same time, we believe the hierarchical architecture is highly effective in many scenarios. Moving forward, we will make further targeted updates to the architecture and validate it on more applications. We will also release an FAQ to address the key questions raised by the community. 🧠 Stay tuned!


