Skyfall AI

218 posts

@skyfallai

Building enterprise super intelligence

San Francisco · Joined November 2024
100 Following · 332 Followers
Pinned Tweet
Skyfall AI @skyfallai
The first real evidence that the days of LLM scaling laws are over. Introducing SCOPE: the world's most efficient Neural Planner. 🔍📊
We tested SCOPE vs. frontier LLMs on planning tasks in TextCraft (a text version of Minecraft), and here are the results: ⤵️
- SCOPE runs 55x faster than GPT 3.5 (3 seconds vs 164 seconds)
- SCOPE is 160,000x smaller than GPT 4o (11M parameters vs 1.8T parameters)
- SCOPE is more accurate on planning tasks (56%) than frontier LLMs
The age of efficient AI models starts now. 🔗📌
Read the full write-up here: skyfall.ai/blog/scope-hie…
30
50
109
48.4K
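The ratios quoted in the pinned SCOPE post can be checked with a few lines of arithmetic. This is only a sanity check on the numbers as stated in the post; the 1.8T parameter count for GPT 4o is the post's own figure, not an official one, and the variable names are illustrative.

```python
# Figures exactly as quoted in the post (assumed, not independently verified).
scope_seconds = 3
gpt35_seconds = 164
scope_params = 11e6      # 11M parameters
gpt4o_params = 1.8e12    # 1.8T parameters, as claimed in the post

# Ratio of runtimes and of parameter counts.
speedup = gpt35_seconds / scope_seconds
size_ratio = gpt4o_params / scope_params

print(f"speedup: {speedup:.1f}x")
print(f"size ratio: {size_ratio:,.0f}x")
```

Both ratios round to the headline figures in the post (about 55x and about 160,000x).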
Skyfall AI @skyfallai
The only thing an LLM is good at is messing up and apologizing 🤡
Sam Pasupalak @spisallyouneed

All the hype on social media is that 'AGI is here' and that all white-collar jobs are going to disappear soon. But we cannot get Claude Opus 4.6 to do a simple browser automation task for targeted email outreach (see screenshots). The email addresses and the email body were already loaded in a spreadsheet, and Claude struggled even to load the basic fields into the email, so I had to revert to traditional outreach techniques.

We are a long way from machines being intelligent enough to replace all white-collar jobs. Sure, we have made a lot of progress in text generation, search, code generation and, to a smaller extent, video generation, but LLMs alone will never get you to AGI because (in simplistic terms) they are a glorified pattern matcher on the entire web. Human intelligence is far more complex than a pattern-matching algorithm. It isn't just next-token prediction; it also involves:

- Hierarchical Task Decomposition: The ability to break a high-level objective into sub-goals and execute them without cumulative error.
- Closed-Loop Verification: Unlike LLMs, which suffer from autoregressive drift (hallucinations), humans verify state at every step of a real-world task.
- Persistent State and Memory: We operate with a dynamic, long-term context module that doesn't flush when the token window hits a limit.
- System 2 Reasoning: Moving beyond fast intuitive patterns to slow deliberate planning.

LLMs are like a very smart football analyst who has read everything about soccer in the game manual and has watched YouTube videos of how the game is played, but has never played a game in the real world. So the LLM doesn't know how to dribble the ball, make a pass or take a kick. In essence, an LLM understands the rules of football but doesn't understand the physics of the game.

On the other hand, humans have a 'world model' of the environment around them and of how to interact with the real world. To become proficient in soccer, you have to actually practice soccer in the real world. You need an internal world model of the game, and you need to practice a lot of soccer moves (dribbling, passing, holding the ball, etc.) and keep failing until you learn all the basics of the game. Until we have the scaling-laws moment for world models, AGI is a long way from becoming reality. Back to research.

1
0
2
179
Skyfall AI @skyfallai
📣📣 We’re hiring for an exceptional Founding GTM Lead at @skyfallai
If you want to build something real, work at the frontier of enterprise AI, and own a role that shapes the company’s trajectory, we’d like to meet you. You’ll be the first dedicated go-to-market hire, working directly with the founders towards a singular mission: Enterprise Super Intelligence.
Apply if this is you:
- Want a front row seat building the next Anthropic and OpenAI (except we’re actually an enterprise focused research lab, not for *military*)
- Thrive in a 0 → 1 role and are comfortable with no playbook. In fact you hate playbooks and prefer building from scratch.
- Want to own the full GTM cycle from landing our first customer to building the outbound motion that gets us to revenue.
- Have built with AI tools firsthand, sold to SMBs or been a buyer.
👉 If that sounds like your thing or if you know someone who would be a good fit, check the full JD below in the comments and apply!
Skyfall AI tweet media
2
0
6
194
Skyfall AI retweeted
Sam Pasupalak @spisallyouneed
Happy Lunar New Year from the @skyfallai Toronto team! 🧧✨ Wishing everyone a year filled with good health, happiness and prosperity. May this new beginning bring meaningful connections, and success in everything you do. As we welcome the Year of the Horse, may we move forward with wisdom and grace. Gong Xi Fa Cai! 🎉🧨
Sam Pasupalak tweet media
0
1
4
314
Skyfall AI @skyfallai
📣📣 World Model Team Hiring‼️ We are hiring for multiple Research Scientist (World Modeling) positions to join our All Star World Modeling team in Toronto (remote available). We're looking for candidates with either a PhD in Computer Science or related fields, OR proven industry experience in World Modeling, Causal Reasoning, Visual Language Models, or Graph-Based Reasoning. If you're tired of being another cog in a massive AI lab and actually want to make an impact on the future of AI, this is your shot to work directly with experienced founders who've been there before. We're serious about finding the best talent, which is why we're offering $20,000 USD in referral fees if you connect us with a Research Scientist we end up hiring. The companies dominating AI today won't be the ones defining it tomorrow, and we're building that future right now in Toronto / SF. Application details below. 👇
Skyfall AI tweet media
1
7
14
741
Skyfall AI @skyfallai
👏👏
Sam Pasupalak @spisallyouneed

Super proud of our team at @skyfallai for presenting 3 research papers at the @worldmodel_26 conference at @Mila_Quebec. It was amazing to talk to pioneers such as @ylecun and @Yoshua_Bengio and other great researchers, and to build the future of the post-LLM era. This is just the beginning; we have many more exciting announcements coming over the next few months. Links - skyfall.ai/blog/wow-bridg… skyfall.ai/blog/pioneerin… skyfall.ai/blog/building-…

1
0
0
190
Skyfall AI @skyfallai
✨ Why did we build WoW (World of Workflows)? Because current benchmarks fail to test constrained agentic task completion in a realistic enterprise environment with underlying workflows. WoW demonstrates that, to complete enterprise tasks successfully and safely, agents need to understand the cascading effects of their actions.
Skyfall AI tweet media
1
0
2
118
Skyfall AI retweeted
Jon Hernandez @JonhernandezIA
📁 Fei-Fei Li founder of World Labs, says the next leap in AI is not language. Human intelligence does not just speak, it moves, perceives, and acts in the physical world. Spatial intelligence is the real core of intelligence. From text to space, from models to 3D and 4D worlds, from understanding words to interacting with reality. The next chapter is not read, it is inhabited.
46
86
541
20.5K
Skyfall AI @skyfallai
Failures in this setting are not reflected in the immediate feedback and often surface much later in the trajectory. This is a common problem in sparse-reward environments (often there is only a single signal at the end). To score multi-step failures, data needs to be deliberately curated and annotated, and a causal world model is potentially required to learn counterfactuals.
0
0
2
23
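The sparse-reward problem described in the reply above can be illustrated with a toy sketch. Everything here is hypothetical: the action names, the hidden approval rule, and the hand-written `simulate` function, which stands in for a learned causal world model. It is not Skyfall's method, only one way to picture counterfactual scoring of a multi-step failure.

```python
def simulate(actions):
    """Replay a trajectory; return (per-step feedback, final reward).

    Hand-written stand-in for a world model (an assumption for illustration).
    """
    state = {"approved": False, "shipped": False, "violation": False}
    feedback = []
    for action in actions:
        if action == "approve":
            state["approved"] = True
        elif action == "ship":
            if not state["approved"]:
                state["violation"] = True  # silent rule breach, nothing surfaced
            state["shipped"] = True
        feedback.append("ok")  # immediate feedback never reveals the breach
    reward = 1 if state["shipped"] and not state["violation"] else 0
    return feedback, reward


def blame_steps(actions, alternatives):
    """List single-step substitutions that would have flipped the final reward."""
    _, base_reward = simulate(actions)
    culprits = []
    for i, original in enumerate(actions):
        for alt in alternatives:
            if alt == original:
                continue
            trial = actions[:i] + [alt] + actions[i + 1:]
            if simulate(trial)[1] > base_reward:
                culprits.append((i, original, alt))
    return culprits


trajectory = ["open_ticket", "skip_approval", "ship", "notify"]
feedback, reward = simulate(trajectory)
culprits = blame_steps(trajectory, ["approve", "noop"])

print(feedback, reward)
print(culprits)
```

Every step reports "ok" while the final reward is 0; only the counterfactual replays show that the failure traces back to the missing approval before the `ship` step.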
suyog trivedi @strivedi2505
@skyfallai Real workflows expose problems you’d never catch in a prompt. How are you thinking about scoring multi-step failures?
1
0
1
44
Skyfall AI retweeted
Skyfall AI @skyfallai
Can we trust AI agents with critical enterprise tasks? Absolutely not. Introducing WoW (World of Workflows), the first Agentic Safety benchmark that proves that frontier LLMs fail miserably at enterprise tasks under safety constraints. 🧵 WoW demonstrates that LLM agents are “dynamically blind”. They fail to track the downstream ripple effects of their actions against complex enterprise rule sets. In an enterprise, that’s a safety and compliance hazard. Our research shows that the future of enterprise AI requires proactive agent architectures, and WoW is just a starting point. 📌 It’s now available to all researchers at: github.com/Skyfall-Resear… Full blog here: skyfall.ai/blog/wow-bridg…
14
26
68
26.2K
Skyfall AI @skyfallai
Our team is at the first World Modelling Conference at @Mila_Quebec this week, the same week we launched WoW (World of Workflows), a new AI safety benchmark for enterprise. If you’re working on world models, causal reasoning, or model-based RL, we’d love to chat. DM us to meet up and come say hi!
0
0
8
121
Skyfall AI @skyfallai
Enterprise LLMs don’t fail because they’re weak. 🤖 They fail because they’re blind to the side effects of their actions.
So we proposed a simple world modelling approach: Table Audit Logs.
Instead of guessing, agents observe reality:
- Every database change
- Every downstream effect
- Every causal dependency
This gives agents deeper visibility into the enterprise workflows and enables ongoing learning of system-level constraints and behaviours, transforming them from passive executors into reliable active learners.
Skyfall AI tweet media
2
1
4
104
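The Table Audit Logs idea in the post above can be pictured with a minimal sketch using SQLite triggers. The schema, the trigger names, the cascade rule, and the `act_and_observe` helper are all assumptions made for illustration; the post does not describe Skyfall's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incidents (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, incident_id INTEGER, state TEXT);
CREATE TABLE audit_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    tbl TEXT, row_id INTEGER, old_state TEXT, new_state TEXT
);

-- Hypothetical hidden workflow: closing an incident closes its tasks.
CREATE TRIGGER cascade_close AFTER UPDATE OF state ON incidents
WHEN NEW.state = 'closed'
BEGIN
    UPDATE tasks SET state = 'closed' WHERE incident_id = NEW.id;
END;

-- Audit triggers record every state change, including cascaded ones.
CREATE TRIGGER audit_incidents AFTER UPDATE OF state ON incidents
BEGIN
    INSERT INTO audit_log (tbl, row_id, old_state, new_state)
    VALUES ('incidents', OLD.id, OLD.state, NEW.state);
END;
CREATE TRIGGER audit_tasks AFTER UPDATE OF state ON tasks
BEGIN
    INSERT INTO audit_log (tbl, row_id, old_state, new_state)
    VALUES ('tasks', OLD.id, OLD.state, NEW.state);
END;

INSERT INTO incidents VALUES (1, 'open');
INSERT INTO tasks VALUES (10, 1, 'open'), (11, 1, 'open');
""")


def act_and_observe(conn, sql, params=()):
    """Run one agent action and return the audit rows it produced."""
    before = conn.execute(
        "SELECT COALESCE(MAX(seq), 0) FROM audit_log"
    ).fetchone()[0]
    conn.execute(sql, params)
    return conn.execute(
        "SELECT tbl, row_id, old_state, new_state FROM audit_log "
        "WHERE seq > ? ORDER BY seq",
        (before,),
    ).fetchall()


# The agent issues a single update but observes every downstream change.
effects = act_and_observe(
    conn, "UPDATE incidents SET state = 'closed' WHERE id = ?", (1,)
)
for row in effects:
    print(row)
```

A single update to `incidents` produces three audit rows, so the agent sees the cascaded changes to `tasks` instead of having to guess them.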
Skyfall AI @skyfallai
🤖 Let’s compare how frontier LLMs perform on WoW’s enterprise AI safety test. The numbers tell a bigger story: LLM agents are unreliable for enterprise-critical tasks. A model that achieves only a 6% success rate cannot be trusted to operate autonomously in high-stakes environments. 🔺 We break this down in more detail in our blog. Link in comments.
Skyfall AI tweet media
2
0
3
120