Cassian

297 posts

@cassian_33

You only know my name, not my story. MOD @SentientAGI

Joined July 2025
171 Following · 355 Followers
Bafspot | .ink@bafspot·
True intelligence starts when an AI faces something outside its data. That’s where it has to think, decide, and reason through problems. This is what Sentient AGI is building: an AI that can reason beyond what was programmed into it. @SentientAGI @sentient_found @SentientEco
Bafspot | .ink tweet media
1
0
4
44
Cassian@cassian_33·
@PreyWebthree Benchmarks show performance. CAB shows failure. And in real-world systems, failure modes matter more than scores.
0
0
0
8
Prey.gdp@PreyWebthree·
CryptoAnalystBench is one of the most practical frameworks I’ve seen for evaluating real-world LLM agents. Instead of testing isolated capabilities (RAG, search, tool use), it examines how models behave across long-form, open-ended tasks, where failures are subtle but critical.

Key takeaways:
• Evaluation gap is real – Most benchmarks reduce performance to a single score. CAB focuses on error types + impact, reflecting real production challenges.
• LLM-as-a-judge works, with limits – ~93% accuracy detecting errors shows reliability, but low alignment with human scoring means subjective quality still needs human review.
• Hallucination ≠ main issue anymore – Fabrications are rare (~1–2%). The bigger problems now are temporal staleness, uncited but correct claims, missing risk/context analysis, and weak answer framing.
• Failures are nuanced & model-specific – Some models struggle with outdated info, others miss risk dimensions, and some fail at structuring/relevance. There’s no single dominant failure mode, just trade-offs.
• Prompt-level interventions help – Structured APIs, timestamps/time windows, and decomposition (CoT-style) improve depth and accuracy. But smaller models can regress if over-constrained.

Main takeaway: Improving model capability alone isn’t enough. High-stakes deployments require systems that:
→ detect their own errors
→ classify failure patterns
→ build feedback loops for continuous improvement

Better outputs come from better debugging, not just bigger models. @SentientAGI @sentient_found
Prey.gdp tweet media
17
1
58
1.3K
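The error-profiling idea in the thread above can be sketched in a few lines. This is an illustrative toy, not CAB’s actual pipeline: the error labels, the `classify_answer` heuristics, and the answer fields are all invented for the example; a real judge would be an LLM, not threshold rules.

```python
# Hypothetical sketch of CAB-style evaluation: instead of one score,
# classify each answer's failure mode and aggregate the error types.
from collections import Counter

ERROR_TYPES = [
    "temporal_staleness",     # correct once, outdated now
    "uncited_claim",          # true but unsupported
    "missing_risk_analysis",  # no risk/context dimension
    "weak_framing",           # poor answer structure
    "fabrication",            # rare per the thread (~1-2%)
]

def classify_answer(answer: dict) -> list[str]:
    """Toy judge: map answer metadata to failure labels (assumed fields)."""
    errors = []
    if answer.get("data_age_days", 0) > 30:
        errors.append("temporal_staleness")
    if answer.get("claims", 0) > answer.get("citations", 0):
        errors.append("uncited_claim")
    if not answer.get("mentions_risk", False):
        errors.append("missing_risk_analysis")
    return errors

def error_profile(answers: list[dict]) -> Counter:
    """Aggregate failure modes across a batch -- a profile, not a score."""
    profile = Counter()
    for a in answers:
        profile.update(classify_answer(a))
    return profile

answers = [
    {"data_age_days": 90, "claims": 4, "citations": 1, "mentions_risk": False},
    {"data_age_days": 2, "claims": 2, "citations": 2, "mentions_risk": True},
]
print(error_profile(answers))
```

The point of the sketch: the unit of output is a distribution over failure modes, which is what feeds the “detect → classify → feedback loop” cycle the thread describes.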
Cassian@cassian_33·
Who else misses those days when the timeline was full of nothing but @SentientAGI tweets 🩷
Cassian tweet media
0
0
2
55
Cassian@cassian_33·
@PreyWebthree Next-level AI: collaborative, verifiable, and adaptive. This is how intelligence evolves.
0
0
1
6
Prey.gdp@PreyWebthree·
Sentient AGI isn’t just another LLM. It represents an entirely new paradigm of intelligence.

Traditional models rely on a single system to process and generate outputs. Sentient, however, operates as a distributed network of specialized agents, each designed for distinct tasks such as reasoning, validation, simulation, and iterative refinement. These agents don’t work in isolation; they collaborate, challenge, and improve each other’s outputs in real time.

At the core of this architecture lies GRID, a decentralized coordination layer that dynamically routes tasks to the most capable agents based on context, complexity, and required expertise. This ensures not just efficiency, but optimal problem-solving across domains.

On top of that, ROMA transforms complex objectives into structured sub-goals. By decomposing problems into manageable steps, it enables deep multi-step reasoning, adaptive planning, and more reliable decision-making in dynamic environments.

With Open Deep Search, Sentient breaks free from the limitations of static training data. It actively explores real-world, continuously evolving information across open networks, bringing context-awareness and up-to-date intelligence into every interaction.

Crucially, every output is cryptographically fingerprinted. This means intelligence is no longer a black box; it becomes verifiable, traceable, and trustworthy.

And with native Web3 integration, Sentient goes beyond thinking. Agents can execute on-chain actions, coordinate autonomously, and actively participate in digital economies, turning intelligence into action.

This isn’t artificial consciousness. It’s something far more impactful: collective, verifiable, and adaptive intelligence. @SentientAGI
Prey.gdp tweet media
21
3
77
1.3K
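The two mechanisms named in the thread above, ROMA-style decomposition into sub-goals and GRID-style routing to the most capable agent, can be sketched as follows. Everything here is an assumption for illustration: the `Agent`/`Task` shapes, the skill-match routing rule, and the agent names are invented, not Sentient’s actual interfaces.

```python
# Illustrative sketch only: decompose an objective into sub-goals (ROMA-like),
# then route the task to an agent whose skills cover its domain (GRID-like).
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    skills: set  # domains this agent specializes in

@dataclass
class Task:
    goal: str
    domain: str
    subtasks: list = field(default_factory=list)

def decompose(task: Task, steps: list[str]) -> Task:
    """ROMA-style: split a complex objective into ordered sub-goals."""
    task.subtasks = [Task(goal=s, domain=task.domain) for s in steps]
    return task

def route(task: Task, agents: list[Agent]) -> Agent:
    """GRID-style: send the task to a capable agent for its domain."""
    matches = [a for a in agents if task.domain in a.skills]
    if not matches:
        raise ValueError(f"no agent covers domain {task.domain!r}")
    return matches[0]

agents = [Agent("reasoner", {"reasoning"}), Agent("validator", {"validation"})]
plan = decompose(Task("audit a claim", "validation"),
                 ["gather sources", "cross-check", "report"])
assignee = route(plan, agents)
print(assignee.name)  # routes to the validation specialist
```

A real coordination layer would score candidates on context and complexity rather than take the first skill match; the sketch only shows the decompose-then-route shape.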
Goldy@Golldyck·
Hey @SentientAGI community, let’s play a game. Look at the screen; the question is simple: Where is Sentient Season 2? A: it hasn’t started yet B: it is already on C: it is ending soon D: season 3 is soon. Our contestant has frozen with a coffee in hand. Let’s help Dobby make the right choice. Vote in the comments 👇
Goldy tweet media
14
0
33
342
Sentient@SentientAGI·
Here’s a look inside Sentient House at @thehousesf. For three weeks we’re building in Presidio, and everyone is welcome to drop by. Come hang, meet the cohort, and work on whatever you are passionate about. Whether you’re in the Arena or not, come say hi!
42
18
214
66.7K
Cassian@cassian_33·
The next stage of AI isn’t just more powerful models. It’s agents that learn from their own mistakes and improve themselves. Sentient Arena is trying to push exactly that frontier. If this interests you, you can apply to @SentientAGI Arena’s Cohort 0.
Sentient@SentientAGI

x.com/i/article/2031…

0
0
2
70
Cassian@cassian_33·
Who missed Dobby? 👀 There was a time when @SentientAGI ’s mascot Dobby was everywhere on the timeline. Feels like it’s time to see him again soon… 🩷
Cassian tweet media
1
0
7
54
Cassian@cassian_33·
Are we moving toward a future where AI agents learn from their own failures and discover new skills through evolutionary systems?

The idea behind EvoSkill mentioned in @SentientAGI’s latest article is surprisingly simple but powerful. AI agents attempt tasks, their failures are analyzed, and new reusable skills are generated from those mistakes. Instead of discarding failures, the system treats them as learning signals.

The skills produced are more than simple prompt adjustments. They are structured mini-protocols that include step-by-step procedures, verification checks, and sometimes tool usage during execution. Rather than solving everything alone, the agent can rely on these skills to act more systematically.

This approach also helps address a common problem in coding agents. When a single agent tries to solve complex tasks end-to-end, errors can silently accumulate across multiple steps. EvoSkill tackles this by analyzing failures and generating new skills to avoid repeating the same mistakes.

What’s interesting is that the discovered skills are not only useful for the original task. Some skills transfer zero-shot to other benchmarks, suggesting that the system learns general problem-solving strategies rather than task-specific tricks.

Perhaps the most exciting idea is this: the future of AI agents may not depend only on model updates. Agents could gradually accumulate experience-based skills. We may even see shared skill libraries between AI agents, similar to how developers share packages today.

It’s still early research, but it hints at something important: future AI agents might not just execute tasks, but continuously evolve by learning from their failures.

To read the article: sentient.xyz/blog/evoskill-…
Cassian tweet media
2
0
5
52
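The attempt → analyze → distill → retry loop described in the tweet above can be sketched as a toy program. To be clear about assumptions: `Skill`, `attempt`, and `distill_skill` are invented names for illustration, and the “agent” here is a stub that succeeds only when its library holds a skill for the task; EvoSkill itself uses LLM-driven failure analysis (see the linked article).

```python
# Hedged sketch of the EvoSkill-style loop: attempt a task, turn the
# failure trace into a structured reusable skill, then retry with it.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    procedure: list  # step-by-step mini-protocol
    check: str       # verification step run after execution

def attempt(task: str, library: dict) -> tuple[bool, str]:
    """Toy agent: succeeds only if the library holds a skill for the task."""
    if task in library:
        return True, "ok"
    return False, f"no procedure for {task!r}"

def distill_skill(task: str, failure: str) -> Skill:
    """Turn a failure trace into a structured, reusable skill."""
    return Skill(
        name=f"handle_{task}",
        procedure=["restate goal", "plan steps", "execute", "verify"],
        check="compare output against goal",
    )

library: dict[str, Skill] = {}
task = "parse_spreadsheet"

ok, trace = attempt(task, library)              # first try fails
if not ok:
    library[task] = distill_skill(task, trace)  # failure becomes a skill
ok, trace = attempt(task, library)              # retry succeeds with the skill
print(ok, sorted(library))
```

The structural point matches the tweet: the skill is a mini-protocol with procedure and verification steps, it persists in a library, and nothing ties it to the one task that produced it, which is what makes zero-shot transfer plausible.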
LadyOnChain@Lady0nChain·
A notable study from @SentientAGI: EvoSkill.

Most people are still optimizing prompts. EvoSkill directly optimizes the capabilities of agents. The system analyzes agent errors and automatically generates the missing skills.

The results are quite strong:
• SealQA: +12.1%
• OfficeQA: +7.3% (new SOTA)
• BrowseComp: +5.3% (zero-shot skill transfer)

The most critical point: the discovered skills transfer across different tasks without modification. This means genuine skill formation rather than overfitting.

And yes, the project is completely open source.
LadyOnChain tweet media
13
0
22
171
Cassian@cassian_33·
Gsenti to everyone who believes that Artificial Intelligence should be open-source 🩷 @SentientAGI
Cassian tweet media
3
0
14
237