Skyfall AI

218 posts

@skyfallai

Building enterprise super intelligence

San Francisco · Joined November 2024

100 Following · 332 Followers

Pinned post

Skyfall AI@skyfallai·7 Jan

The first real evidence that the days of LLM scaling laws are over. Introducing SCOPE: the world's most efficient Neural Planner. 🔍📊

We tested SCOPE vs frontier LLMs on planning tasks in TextCraft (a text version of Minecraft) and here are the results: ⤵️

- SCOPE runs 55x faster than GPT 3.5 (3 seconds vs 164 seconds)
- SCOPE is 160,000x smaller than GPT 4o (11M parameters vs 1.8T parameters)
- SCOPE is more accurate on planning tasks (56%) than frontier LLMs

The age of efficient AI models starts now. 🔗📌 Read the full write-up here: skyfall.ai/blog/scope-hie…
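As a quick sanity check, the two ratios in the post can be recomputed from the raw figures it quotes. This is only arithmetic on the post's own numbers; the 1.8T parameter count for GPT 4o is taken as stated in the post, and the variable names are illustrative.

```python
# Figures exactly as quoted in the post (not independently verified).
scope_seconds = 3
gpt35_seconds = 164
scope_params = 11e6      # 11M parameters
gpt4o_params = 1.8e12    # 1.8T parameters, as claimed in the post

# Ratio of runtimes and of parameter counts.
speedup = gpt35_seconds / scope_seconds
size_ratio = gpt4o_params / scope_params

print(f"speedup: about {speedup:.1f}x")
print(f"size ratio: about {size_ratio:,.0f}x")
```

Both values round to the headline numbers in the post (roughly 55x and roughly 160,000x).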

109

48.4K

Skyfall AI@skyfallai·4d

The only thing an LLM is good at is messing up and apologizing 🤡

Sam Pasupalak@spisallyouneed

All the hype on social media is that 'AGI is here' and all the white collar jobs are going to disappear soon. But we cannot have Claude Opus 4.6 do a simple browser automation task for targeted email outreach (see screenshots). The email addresses and the email body were already loaded in a spreadsheet, and Claude struggled to even load the basic fields into the email, so I had to revert to traditional outreach techniques.

We are a long time away from machines being intelligent to the point of replacing all white collar jobs entirely. Sure, we have made a lot of progress in text generation, search, code generation and, to a smaller extent, video generation, but LLMs alone will never get you to AGI because (in simplistic terms) they are a glorified pattern matcher on the entire web. Human intelligence is way more complex than a pattern matching algorithm because it isn't just next-token prediction; it involves:

- Hierarchical Task Decomposition: The ability to break a high-level objective into sub-goals and execute them without cumulative error.
- Closed-Loop Verification: Unlike LLMs, which suffer from autoregressive drift (hallucinations), humans verify state at every step of a real world task.
- Persistent State and Memory: We operate with a dynamic, long-term context module that doesn't flush when the token window hits a limit.
- System 2 Reasoning: Moving beyond fast intuitive patterns to slow deliberate planning.

LLMs are like a very smart football analyst who has read everything about soccer in the game manual and has watched YouTube videos of how the game is played but has never played a game in the real world. So the LLM doesn't know how to dribble the ball, how to make a pass or how to take a kick. In essence, an LLM understands the rules of football but doesn't understand the physics of the game.

On the other hand, humans have a 'World model' of the environment around them and of how to interact with the real world. In order to become proficient in soccer, you have to actually practice soccer in the real world. You need to have an internal world model of the game and actually practice a lot of soccer moves (dribbling, passing, holding the ball, etc.) and keep failing until you learn all the basics of the game. Until we have the scaling laws moment for World Models, we have a long way to go for AGI to become reality. Back to research.

179

Skyfall AI@skyfallai·9 Mar

Apply here: skyfall.zohorecruit.ca/jobs/Careers/4…

211

Skyfall AI@skyfallai·9 Mar

📣📣 We’re hiring for an exceptional Founding GTM Lead at @skyfallai

If you want to build something real, work at the frontier of enterprise AI, and own a role that shapes the company’s trajectory, we’d like to meet you. You’ll be the first dedicated go-to-market hire, working directly with the founders towards a singular mission: Enterprise Super Intelligence.

Apply if this is you:

- Want a front row seat building the next Anthropic and OpenAI (except we’re actually an enterprise focused research lab, not for *military*)
- Thrive in a 0 → 1 role and are comfortable with no playbook. In fact you hate playbooks and prefer building from scratch.
- Want to own the full GTM cycle from landing our first customer to building the outbound motion that gets us to revenue.
- Have built with AI tools firsthand, sold to SMBs or been a buyer.

👉 If that sounds like your thing or if you know someone who would be a good fit, check the full JD below in the comments and apply!

194

Skyfall AI reposted

Sam Pasupalak@spisallyouneed·20 Feb

Happy Lunar New Year from the @skyfallai Toronto team! 🧧✨ Wishing everyone a year filled with good health, happiness and prosperity. May this new beginning bring meaningful connections, and success in everything you do. As we welcome the Year of the Horse, may we move forward with wisdom and grace. Gong Xi Fa Cai! 🎉🧨

314

Skyfall AI@skyfallai·11 Feb

📣📣 World Model Team Hiring‼️ We are hiring for multiple Research Scientist (World Modeling) positions to join our All Star World Modeling team in Toronto (remote available). We're looking for candidates with either a PhD in Computer Science or related fields, OR proven industry experience in World Modeling, Causal Reasoning, Visual Language Models, or Graph-Based Reasoning. If you're tired of being another cog in a massive AI lab and actually want to make an impact on the future of AI, this is your shot to work directly with experienced founders who've been there before. We're serious about finding the best talent, which is why we're offering $20,000 USD in referral fees if you connect us with a Research Scientist we end up hiring. The companies dominating AI today won't be the ones defining it tomorrow, and we're building that future right now in Toronto / SF. Application details below. 👇

741

Skyfall AI@skyfallai·11 Feb

📍Apply directly here: skyfall.zohorecruit.ca/jobs/Careers/4…

131

Skyfall AI@skyfallai·6 Feb

👏👏

Sam Pasupalak@spisallyouneed

Super proud of our team at @skyfallai to present 3 research papers at @worldmodel_26 conference in @Mila_Quebec . It was amazing to talk to pioneers such as @ylecun @Yoshua_Bengio and other great researchers and build the future of the post LLM era. This is just the beginning, we have so many more exciting announcements over the next few months. Links - skyfall.ai/blog/wow-bridg… skyfall.ai/blog/pioneerin… skyfall.ai/blog/building-…

190

Skyfall AI@skyfallai·5 Feb

👉Blog: skyfall.ai/blog/wow-bridg…

Skyfall AI@skyfallai·5 Feb

✨Why did we build WoW (World of Workflows)? Because current benchmarks fail to test constrained agentic task completion in a realistic enterprise environment with underlying workflows. WoW demonstrates that, to complete enterprise tasks successfully and safely, agents need to understand the cascading effects of their actions.

118

Skyfall AI reposted

Jon Hernandez@JonhernandezIA·4 Feb

📁 Fei-Fei Li founder of World Labs, says the next leap in AI is not language. Human intelligence does not just speak, it moves, perceives, and acts in the physical world. Spatial intelligence is the real core of intelligence. From text to space, from models to 3D and 4D worlds, from understanding words to interacting with reality. The next chapter is not read, it is inhabited.

541

20.5K

Skyfall AI@skyfallai·4 Feb

Failures in this setting are not reflected in the immediate feedback and often surface much later in the trajectory. This is a common problem in reward-sparse environments (often there is only a single signal at the end). To score multi-step failures, data needs to be deliberately curated and annotated, and a causal world model is potentially required to learn counterfactuals.

suyog trivedi@strivedi2505·3 Feb

@skyfallai Real workflows expose problems you’d never catch in a prompt. How are you thinking about scoring multi-step failures?

Skyfall AI reposted

Skyfall AI@skyfallai·2 Feb

Can we trust AI agents with critical enterprise tasks? Absolutely not. Introducing WoW (World of Workflows), the first Agentic Safety benchmark that proves that frontier LLMs fail miserably under safety constraints at enterprise tasks. 🧵 WoW demonstrates that LLM agents are “dynamically blind”. They fail to track the downstream ripple effects of their actions against complex enterprise rule sets. In an enterprise, that’s a safety and compliance hazard. Our research shows that the future of enterprise AI requires proactive agent architectures, and WoW is just a starting point. 📌 It’s now available to all researchers at: github.com/Skyfall-Resear… Full blog here: skyfall.ai/blog/wow-bridg…

26.2K

Skyfall AI@skyfallai·4 Feb

@arna_ghosh Thanks for sharing!

Arna Ghosh@arna_ghosh·3 Feb

Making LLMs safe for enterprise applications requires proper benchmarking and understanding of downstream effects of actions. Fascinating work from the @skyfallai team!🚀

Skyfall AI@skyfallai

164

Skyfall AI@skyfallai·4 Feb

Our team is at the first World Modelling Conference at @Mila_Quebec this week, the same week we launched WoW (World of Workflows), a new AI safety benchmark for enterprise. If you’re working on world models, causal reasoning, or model-based RL, we’d love to chat. DM us to meet up and come say hi!

121

Skyfall AI@skyfallai·4 Feb

WoW is also available to all researchers on GitHub at: github.com/Skyfall-Resear…

Skyfall AI@skyfallai·4 Feb

Enterprise LLMs don’t fail because they’re weak. 🤖 They fail because they’re blind to the side effects of their actions.

So we proposed a simple world modelling approach: Table Audit Logs

Instead of guessing, agents observe reality:

- Every database change
- Every downstream effect
- Every causal dependency

This gives agents deeper visibility into the enterprise workflows and enables ongoing learning of system-level constraints and behaviours, transforming them from passive executors into reliable active learners.
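To make the idea of observing database changes and their downstream effects concrete, here is a minimal sketch using SQLite triggers from Python's standard library. It is not Skyfall's implementation: the `incidents` and `tasks` tables, the cascade rule, the `audit_log` schema and the `act` helper are all hypothetical, chosen only to show one action producing a side effect that an audit log makes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incidents (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, incident_id INTEGER, state TEXT);
CREATE TABLE audit_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT, row_id INTEGER, column_name TEXT,
    old_value TEXT, new_value TEXT
);

-- Hypothetical workflow rule: closing an incident cancels its open tasks.
CREATE TRIGGER cascade_close AFTER UPDATE OF status ON incidents
WHEN NEW.status = 'closed'
BEGIN
    UPDATE tasks SET state = 'cancelled'
    WHERE incident_id = NEW.id AND state = 'open';
END;

-- Audit triggers: record every change to the watched columns.
CREATE TRIGGER audit_incidents AFTER UPDATE OF status ON incidents
BEGIN
    INSERT INTO audit_log (table_name, row_id, column_name, old_value, new_value)
    VALUES ('incidents', NEW.id, 'status', OLD.status, NEW.status);
END;

CREATE TRIGGER audit_tasks AFTER UPDATE OF state ON tasks
BEGIN
    INSERT INTO audit_log (table_name, row_id, column_name, old_value, new_value)
    VALUES ('tasks', NEW.id, 'state', OLD.state, NEW.state);
END;

INSERT INTO incidents VALUES (1, 'open');
INSERT INTO tasks VALUES (10, 1, 'open'), (11, 1, 'done');
""")


def act(conn, sql, params=()):
    """Run one agent action and return the audit rows it caused."""
    before = conn.execute(
        "SELECT COALESCE(MAX(seq), 0) FROM audit_log"
    ).fetchone()[0]
    conn.execute(sql, params)
    return conn.execute(
        "SELECT table_name, row_id, column_name, old_value, new_value "
        "FROM audit_log WHERE seq > ?",
        (before,),
    ).fetchall()


# The agent only asks to close the incident; the log also reveals the cascade.
effects = act(conn, "UPDATE incidents SET status = 'closed' WHERE id = ?", (1,))
for row in effects:
    print(row)
```

The agent's single update yields two audit rows: the incident status change it requested and the task cancellation it did not explicitly ask for. The order of the two rows is not guaranteed, since SQLite does not define the firing order of multiple triggers on the same event.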

104

Skyfall AI@skyfallai·4 Feb

For the full breakdown, check our blog: skyfall.ai/blog/wow-bridg…

Skyfall AI@skyfallai·3 Feb

GitHub: github.com/Skyfall-Resear…

Skyfall AI@skyfallai·3 Feb

🤖 Let’s compare how frontier LLMs perform on WoW’s enterprise AI safety test. The numbers tell a bigger story: LLM agents are unreliable for enterprise-critical tasks. A model that achieves only a 6% success rate cannot be trusted to operate autonomously in high-stakes environments. 🔺We break this down in more detail in our blog. Link in comments.

120
