Unify
260 posts

Unify retweeted

Not engaged with Twitter in a lonnnng time, but for anyone interested, I've decided to start doing regular (unfiltered) posts of my own experience onboarding a fully virtual colleague. AGI is certainly not solved (yet), and so I'll focus on what works well, what doesn't work well, and where the biggest gaps are 🔍
In this video I'm just setting the scene, explaining the basics and hiring Rachel (no fireworks quite yet). In the next videos I'll give her access to everything and start to see how well she *really* learns and internalizes the nuances of my own day-to-day, how she fares when the number of different "flows" keep piling up, and how conversational she can be whilst navigating all of this.
On the more technical side, I'm interested in:
1) How well does the underlying semantic + symbolic DB storage and search mimic the implicit skill storage and memory retrieval that a person would have? (DB reads/writes are much less efficient and less tightly coupled than an end-to-end jointly trained implicit memory module would be; the latter is closer to how our own connectionist brains work.)
2) Can a hierarchy of fast-thinking (less intelligent) and slow-thinking (smarter-model) sub-agents communicating with one another really feel as conversational as a real person with their single brain? (Again, of course not, but how close can we get with a tiered thinking-fast + thinking-slow design for smooth conversation management?)
3) Can repeated post-action storage of skills and functions, with continual self-refactoring, improve speed and efficiency for future actions (not burning through tokens re-discovering the same thing every time)? How does this scale as the number of self-stored skills and functions grows? Do the embeddings and semantic retrieval hold up when there are maybe hundreds of entries?
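To make question 3 concrete, here is a minimal sketch of what post-action skill storage with semantic retrieval could look like. Everything here is illustrative, not Unify's actual stack: a toy bag-of-words embedding stands in for a real embedding model, and the skill names are invented.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SkillStore:
    """Stores named skills with a description, retrieved by semantic similarity."""
    def __init__(self):
        self.skills = []  # list of (name, description, embedding)

    def add(self, name, description):
        self.skills.append((name, description, embed(description)))

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.skills, key=lambda s: cosine(q, s[2]), reverse=True)
        return [name for name, _, _ in ranked[:k]]

store = SkillStore()
store.add("export_report", "export the weekly sales report as a PDF and email it")
store.add("triage_inbox", "label and archive incoming support emails by urgency")
print(store.retrieve("send the weekly sales PDF"))  # -> ['export_report']
```

The open question above is exactly whether this kind of retrieval stays precise once the store holds hundreds of entries with overlapping descriptions.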
We've seen very good results on all fronts for smaller-scale tasks (which would take a person a few hours), and it's also worked well when continually learning over the course of a few weeks. Despite this, the above questions remain open, and I'm curious to see how they hold up as I start this longer-running experiment.
Watch along with some of these vids if interested; or scroll right on by if not 😁
Thoughts + feedback welcome as always! 🫶
Unify retweeted

We've been heads down building for the past few months (custom stack, not OpenClaw 🦞), and I'm excited to finally launch our virtual teammates! Huge shout out to the team (and many long nights) to get us here ❤️💪
You onboard your new teammate exactly how you'd onboard any other new colleague. Share your screen and guide them through, send onboarding docs, record voice notes, hop on a call, whatever is easiest. They learn how you (and your team) work, and they continually reflect, ask follow-up questions, and improve over time 📈
We built our own stack from scratch because we wanted something that genuinely feels like a colleague, with a fully realtime “there in the room with you” experience. This requires more than a flat tool loop with pluggable skills. We use top-down (ask, interject, pause, resume, stop) and bottom-up (notify, request_clarification) steerable handles throughout a nested call stack of sub-agents, with concurrent multi-task execution, and a code-first (not JSON tool) engine powering every action. All of this lives inside the terminal and/or live Python sessions, each in a dedicated per-agent computer and filesystem 📟
In practice, this means your new colleague can be simultaneously using their own computer, talking to you via voice over a live meet, following your own guided screenshare instructions, working across multiple concurrent tasks, and consolidating all of these into new skills on-the-fly. They can be interrupted and redirected at any point in time, and they’re continually chunking all of their experience into reusable skills. People don’t perform tasks in “prompt then execute” windows, and neither should your virtual colleagues in our view.
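A minimal sketch of what a top-down steerable worker loop could look like, assuming a queue-based control channel. The handle names (pause, resume, interject, stop) follow the list above, but the implementation is invented for illustration and is not Unify's API:

```python
import queue

class SteerableAgent:
    """Toy worker agent that drains control signals between units of work."""
    def __init__(self):
        self.controls = queue.Queue()
        self.log = []

    # Top-down handles: callers push signals the worker processes between steps.
    def pause(self): self.controls.put(("pause", None))
    def resume(self): self.controls.put(("resume", None))
    def interject(self, msg): self.controls.put(("interject", msg))
    def stop(self): self.controls.put(("stop", None))

    def run(self, steps):
        paused = False
        for step in steps:
            while True:
                try:
                    cmd, arg = self.controls.get_nowait()
                except queue.Empty:
                    if not paused:
                        break
                    cmd, arg = self.controls.get()  # block until resumed/stopped
                if cmd == "pause":
                    paused = True
                elif cmd == "resume":
                    paused = False
                elif cmd == "interject":
                    self.log.append(f"interjection: {arg}")
                elif cmd == "stop":
                    return
            self.log.append(f"did: {step}")

agent = SteerableAgent()
agent.interject("prioritize the invoice task")
agent.run(["read email", "draft invoice"])
print(agent.log)
# -> ['interjection: prioritize the invoice task', 'did: read email', 'did: draft invoice']
```

The key design point is that control signals are handled between (and conceptually within) work units, so the agent can be redirected at any time rather than only between "prompt then execute" windows.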
We're really happy with the feedback we’ve received thus far. We’ve helped several teams (in real estate, finance, and housing) streamline day-to-day processes which would have been difficult to “prompt” into hand-crafted skills, because these tasks are hard to fully articulate upfront. They require continual judgment, context, and incremental back-and-forth work with people to really learn and internalize what's needed over time.
The best feedback we've received (which makes us most excited 👀) is that the colleague is already much better on day 2 than on day 1, and then even better on day 3, with a holistic understanding evolving quickly and organically 🧠
If you're curious to see how it works, then give it a try with this free credit link!
console.unify.ai/assistants?tok…
I would love to hear people's honest thoughts (both positive and negative) 🙏
ps we're also live on Product Hunt, so any feedback or support here would also be appreciated: producthunt.com/products/unify…
Thanks! 🫶
Unify retweeted

MCP servers are ONLY as good as their abstractions 🧱 and docs 📄. The official MCP for Google Drive fails at even the most basic tasks (see video). Building an MCP server is VERY EASY. Crafting the correct abstractions is VERY HARD. Very few servers are production ready; most are just POCs (not a criticism, this is their intention). Benchmarking and evals are not only important for system prompts, but will also be increasingly important for MCP designs. Exciting times ahead! 👀
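To illustrate why abstraction design matters more than server plumbing, here is a toy in-memory "drive" contrasting a low-level passthrough tool with a task-level one. Neither function is the real Google Drive MCP API; both are invented for this example:

```python
# Toy in-memory "drive": file id -> (title, body).
FILES = {
    "1": ("Q3 roadmap", "launch virtual teammates"),
    "2": ("standup notes", "discuss MCP evals"),
}

# Low-level abstraction: the model must already know internal method
# names and parameter shapes, so even simple requests tend to fail.
def raw_api_call(method, params):
    if method == "files.get":
        return FILES.get(params["fileId"])
    raise ValueError(f"unknown method: {method}")

# Task-level abstraction: tools match what a user actually asks for.
def search_files(query):
    return [fid for fid, (title, body) in FILES.items()
            if query.lower() in title.lower() or query.lower() in body.lower()]

def read_file(file_id):
    title, body = FILES[file_id]
    return f"{title}\n\n{body}"

# "Find the roadmap doc" becomes two obvious calls with the good abstraction.
ids = search_files("roadmap")
print(read_file(ids[0]))
```

Both designs expose the same underlying data; only the second gives the model tools it can reliably choose and chain, which is exactly what benchmarks and evals over MCP designs would measure.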
Unify retweeted

Unify (@letsunifyai) is building Notion for AI observability, with a lightweight, hackable, fast, and flexible framework.
It's built for products with or without LLMs, letting you focus on the data, plots, and metrics that matter.
ycombinator.com/launches/N5M-u…
Unify retweeted

RT @DanielLenton1: Incredibly flattered that @amazon have invited me to be their keynote speaker for this year's AWS Gen AI Loft event. Can…
Unify retweeted

OpenAI released the new o1 model, and I tested and compared its logical thinking capabilities with Claude 3.5 Sonnet using @letsunifyai
I prompted both of them with a mathematical riddle which required some calculations, and guess who won?
Puzzle: Hansa ate a meal at Jugju

We’re excited to have @shirleyxiaoyic from @IndianaUniv, co-author of the paper "The Janus Interface: How Fine-Tuning in Large Language Models Amplifies Privacy Risks," joining us this Friday for our Paper Reading Session! 🤩
RSVP 👉 lu.ma/jok7bp2m
The research introduces a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from LLM pre-training data. It also formalizes the privacy leakage problem in LLMs and explains, through empirical analysis on open-source language models, why forgotten PIIs can be recovered. 🧠
Check out the Paper: arxiv.org/pdf/2310.15469
See you there!


We're thrilled to announce that @Vapi_AI will be joining us for our weekly Webinar Series tomorrow! (Wednesday) 🤩
RSVP here: lu.ma/cke3hpft
Join us as we welcome Sahil Suman, Solution Engineer at Vapi AI, to the session. Discover how VAPI enables the quick setup of high-quality voice agents and see the integration of @letsunifyai with VAPI for seamless access to various models and providers. See you there! 🧑💻
Explore Vapi:
⚡️vapi.ai
⚡️github.com/VapiAI


We are really excited to welcome @zlwang_cs from @ucsd_cse, who co-authored the paper "Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting". Happening Tomorrow!🤩
RSVP 👉 lu.ma/zesezv3u
The research introduces SPECULATIVE RAG, a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM 🧠🤖
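The control flow described above can be sketched in a few lines. Both LMs are stubbed with simple functions (word-overlap scoring stands in for the verifier, and the documents are invented), so this only shows the draft-then-verify shape, not the paper's actual models:

```python
# Toy corpus: document id -> content.
DOCS = {
    "d1": "unify launched virtual teammates in 2025",
    "d2": "speculative rag verifies parallel drafts",
}

def specialist_draft(question, doc_ids):
    # Stub drafter: in reality a smaller, distilled specialist LM generates
    # one answer draft per retrieved-document subset.
    return " ".join(DOCS[d] for d in doc_ids)

def generalist_score(question, draft):
    # Stub verifier: in reality a larger generalist LM scores each draft;
    # here, plain word overlap with the question is a stand-in.
    return len(set(question.lower().split()) & set(draft.split()))

def speculative_rag(question, subsets):
    drafts = [specialist_draft(question, s) for s in subsets]  # parallelizable
    return max(drafts, key=lambda d: generalist_score(question, d))

answer = speculative_rag("what did unify launch", [["d1"], ["d2"]])
print(answer)  # -> unify launched virtual teammates in 2025
```

The efficiency win comes from the expensive model only scoring short drafts instead of generating over the full retrieved context.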
Check out the Paper👉arxiv.org/pdf/2407.08223
See you there!


We are really excited to announce that we will be joined by @llmware for our Webinar Series today! (Tuesday)🤩
RSVP👉 lu.ma/e651sgj8
In this session, we're excited to welcome Darren Oberst and Namee Oberst from LLMware. We will explore how small specialized LLMs can compete with the larger models for specific use cases, especially for Financial, Legal, Compliance, and Regulatory-Intensive Industries. See you there! 🧑💻
Check out LLMWare:
⚡️llmware.ai
⚡️github.com/llmware-ai/llm…


We are really excited to welcome Devichand Budagam from @IITKgp, who co-authored the paper "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models". Happening Friday! 🤩
RSVP👉 lu.ma/7j11iscl
The research introduces Hierarchical


We are really excited to announce that we will be joined by @tavilyai for our Webinar Series this Tuesday!🤩
RSVP👉 lu.ma/a77wgrao
In this session, we'll explore how the Tavily API provides a search engine optimised for LLMs and RAG, delivering efficient, quick, and persistent search results. We'll also showcase Unify's SSO integration with Tavily 🧠🧑💻
Check out Tavily:
⚡️tavily.com
⚡️github.com/tavily-ai


We are really excited to welcome @sh_reya from @Berkeley_EECS, who co-authored the paper "Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences". Happening Tomorrow! 🤩
RSVP 👉lu.ma/ttwrh0n4
The research introduces EvalGen, an interface that provides automated assistance in generating evaluation criteria and implementing assertions🧠👩💻
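In the spirit of the paper, criteria become executable assertions that grade LLM outputs. A minimal sketch, with criteria and sample output invented for illustration (not taken from EvalGen itself):

```python
# Each criterion is a named predicate over a model output string.
criteria = {
    "is_concise": lambda out: len(out.split()) <= 20,
    "mentions_price": lambda out: "$" in out,
}

def grade(output):
    # Run every assertion and report pass/fail per criterion.
    return {name: check(output) for name, check in criteria.items()}

report = grade("The plan costs $20 per month.")
print(report)  # -> {'is_concise': True, 'mentions_price': True}
```

The hard part the paper tackles is generating and aligning these criteria with human preferences, rather than running them once they exist.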
Check out the Paper👉 arxiv.org/pdf/2404.12272
See you there!


