Zacchaeus Bolaji

5.2K posts

@djunehor

Building @useavae - AI Virtual Assistant for Founders and Busy Executives. Engineering + Infrastructure. 5x founder, 2x exit. Ex-@meta

United Kingdom · Joined August 2016
176 Following · 869 Followers
Pinned Tweet
Zacchaeus Bolaji @djunehor
Just got endorsed by Tech Nation 🎉 The process was long and tough, but worth every step. Excited for what’s ahead and grateful for the support that made it possible. Special shoutout to @zegbua for the support and guidance 🙌🏽 New chapter, let’s go!
33
22
562
155.8K
Zacchaeus Bolaji @djunehor
AI is gonna take away your jobs. But not in the way you think.
1
0
1
50
Zacchaeus Bolaji @djunehor
This Copilot cannot be the best Microsoft can come up with. How can forks be better than the original and oldest?
0
0
0
34
Zacchaeus Bolaji @djunehor
Anybody selling a growth hacking app for any platform has the incentive to lie about their own growth on the same app.
0
0
0
26
Zacchaeus Bolaji @djunehor
The current problem with AI discourse is that it's hard to know who or what to believe: whether a person is saying what they're saying cos they're heavily invested in AI, or cos they truly believe in it.
0
0
0
17
Zacchaeus Bolaji @djunehor
What I'm getting from the current AI race is that whatever proprietary knowledge or data you have now, protect it. If it becomes public knowledge, AI labs are gonna train on it and you lose the advantage. Non-public proprietary knowledge or data will be the real moat
0
0
0
16
Zacchaeus Bolaji @djunehor
Google needs to fix their stuff mehn. Too many things breaking randomly. Whatever other AI labs are doing differently, they need to figure it out. Very useful products, but stuff breaking randomly means it's risky to use in production.
0
0
0
20
Zacchaeus Bolaji retweeted
Avae @useavae
We shipped two updates focused on one thing: keeping execution unblocked. You can now step into a live browser session to handle logins, CAPTCHAs, payments, or verifications, then hand control back to Avae to finish the task. Complex workflows also run faster, with independent steps executing in parallel instead of sequentially. Fewer stalls. Tighter loops. More work actually finished.
1
1
1
203
Zacchaeus Bolaji @djunehor
Whose responsibility is it to solve prompt injection? LLM providers or harness creators?
0
0
0
21
Zacchaeus Bolaji @djunehor
@antigravity needs to show current usage percent per LLM provider. Feels like a trap if I send a task, get an error midway about hitting the limit, and have to wait till next week.
0
0
0
49
Zacchaeus Bolaji @djunehor
@skeptrune IDEs might look very different a few months from now, but I'm certain IDEs will still be the primary tool for development. Some months ago, people weren't chatting with AI via the IDE. Cos "Integrated Development Environment" is the point. It's not a notepad, nor just a terminal.
0
0
1
68
Nick Khami @skeptrune
calling it now: all these agent coding IDEs and GUIs are a phase and will be short lived. most devs will be using only a terminal in a few months.
San Francisco, CA 🇺🇸
189
9
526
95.7K
Zacchaeus Bolaji @djunehor
This is what we're discovering too at @useavae: trying to get an AI agent to discover tools when it needs them is less efficient than simply letting it know which tools are available upfront. Finding the balance between tasks the LLM can do itself vs what needs tool calling is tricky.
0
0
0
112
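A minimal sketch of the contrast described above, with every tool name and prompt string hypothetical: one prompt builder lists all available tools upfront, the other exposes only a discovery tool the agent must remember to call.

```python
# Sketch: upfront tool disclosure vs. on-demand discovery.
# All tool names and prompt wording here are illustrative only.

TOOLS = {
    "send_email": "Send an email on the user's behalf.",
    "read_calendar": "List upcoming calendar events.",
    "web_search": "Search the web and return top results.",
}

def upfront_prompt() -> str:
    """Every available tool is listed in the system prompt, so the
    model never has to decide whether to go looking for one."""
    manifest = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    return f"You are an assistant. Available tools:\n{manifest}"

def discovery_prompt() -> str:
    """Only a discovery tool is exposed; the model must choose to
    call it before it learns what else exists."""
    return ("You are an assistant. Call list_tools() to discover "
            "which tools are available.")

prompt = upfront_prompt()
assert all(name in prompt for name in TOOLS)   # every tool visible upfront
assert "send_email" not in discovery_prompt()  # hidden until discovered
```

The tradeoff: the upfront manifest costs context tokens on every turn, but removes a decision the model often gets wrong.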
Muratcan Koylan @koylanai
Progressive disclosure is not reliable because LLMs are inherently lazy.

"In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it."

Vercel ran evals on Next.js 16 APIs that aren't in model training data to test whether agents could learn framework-specific knowledge through Skills vs. persistent context. Skills are the "correct" abstraction: package domain knowledge, let the agent invoke it when needed, minimal context. The agent decides when to retrieve. They work well WHEN the user triggers them; otherwise, LLMs just ignore them.

Vercel's benchmarking is the first experiment of this kind I've seen, and it's actually interesting:
- Baseline (no docs): 53%
- Skill (default): 53%
- Skill with explicit instructions: 79%
- AGENTS.md with 8KB compressed docs index: 100%

The skill approach assumes agents reliably recognize when they need external knowledge and act on it. They don't. "You MUST invoke the skill" made agents read docs first and miss project context. "Explore project first, then invoke" performed better. Same skill, different outcomes based on prompting.

The winning approach removed the decision entirely: an 8KB compressed index embedded in AGENTS.md, with one instruction: "Prefer retrieval-led reasoning over pre-training-led reasoning."

Two agent design learnings:
1. Passive context beats active retrieval for foundational knowledge. Don't make the agent decide to look things up; make the index always present.
2. Compress aggressively. Vercel went from 40KB to 8KB (an 80% reduction) with zero performance loss. The agent needs to know where to find docs, not have the full content in context.

The gap between "agent can access X" and "agent will access X" is larger than we assume.

I keep seeing similar findings across agent architectures. Kimi Swarm's orchestrator is trained specifically to avoid sequential execution. Without training, orchestrators default to serial processing: planning a list of steps and executing them one by one. It's the EASY path. The agent defaults to the lazy path: hallucinating from training data rather than retrieving docs. Passive context removes the choice entirely; the agent doesn't decide whether to look things up; the index is already there.

We keep finding that the "smarter", more autonomous design (let the agent decide when to X) underperforms the "dumber" design (always X, or structurally enforce X).
Vercel @vercel

We're experimenting with ways to keep AI agents in sync with the exact framework versions in your projects. Skills, CLAUDE.md, and more. But one approach scored 100% on our Next.js evals: vercel.com/blog/agents-md…

64
93
1.1K
197.9K
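The "passive context" pattern the quoted thread lands on can be sketched in a few lines: instead of a skill the agent may or may not invoke, a compressed docs index is unconditionally prepended to every run. The index content and instruction wording below are illustrative, not Vercel's actual files.

```python
# Sketch of passive context: a compressed docs index is ALWAYS in the
# prompt, so the agent never has to decide whether to retrieve it.
# Index entries and wording are hypothetical placeholders.

DOCS_INDEX = """\
# framework docs index (compressed, illustrative)
routing -> docs/routing.md
caching -> docs/caching.md
data fetching -> docs/data.md
"""

INSTRUCTION = "Prefer retrieval-led reasoning over pre-training-led reasoning."

def build_context(task: str) -> str:
    # Passive: index + instruction are embedded before the task, every
    # time. The agent makes no retrieval decision; the map is just there.
    return f"{DOCS_INDEX}\n{INSTRUCTION}\n\nTask: {task}"

ctx = build_context("Add a cached API route")
assert DOCS_INDEX in ctx and INSTRUCTION in ctx
```

The point of the index is pointers, not content: the agent learns where docs live and fetches full text only when a task touches that area.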
Zacchaeus Bolaji @djunehor
Handing raw code to non-technical users will always be a bad idea. It's like giving non-electricians a live wire and saying "yeah, you can use it to power anything." They're more likely to electrocute themselves than do anything useful with it.
0
0
0
27
Zacchaeus Bolaji @djunehor
@theonejvo The whole concept of installable Skills is flawed and will remain too vulnerable as long as they're permitted to include runnable scripts. Either registry maintainers take on the job of vetting every one the way the Google Play Store and Apple App Store do, or we abandon the idea for safety.
1
0
1
298
Zacchaeus Bolaji @djunehor
@dexhorthy This is what we're betting on at @useavae too. A set of predefined commands/tools/functions produces better results than allowing the AI to run commands as it sees fit. Besides the fact that bad things WILL happen, such un-guided execution often produces less optimal results.
0
1
0
267
dex @dexhorthy
By the end of 2026 the bash tool will be considered harmful. Some people will get popped, but more broadly people will realize that a deterministic set of ~20-30 predefined commands (think: make tasks) is much better for anything but the most greenfield of projects.
23
6
99
17.2K
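A sketch of what the tweet above describes: instead of a raw bash tool, the agent gets a small deterministic allowlist of fixed argv templates, and anything outside it is refused. The task names and make targets here are hypothetical.

```python
import subprocess

# Deterministic allowlist: the agent may only invoke these exact
# command templates (think: make tasks). Everything else is refused.
# Task names and targets are illustrative.
ALLOWED = {
    "test": ["make", "test"],
    "lint": ["make", "lint"],
    "build": ["make", "build"],
}

def run_task(name: str) -> subprocess.CompletedProcess:
    if name not in ALLOWED:
        raise PermissionError(f"'{name}' is not in the allowlist")
    # No shell=True, no string interpolation: the argv is fixed upfront,
    # so agent output can never smuggle in extra flags or pipelines.
    return subprocess.run(ALLOWED[name], capture_output=True, text=True)

try:
    run_task("rm -rf /")  # arbitrary command strings never reach a shell
except PermissionError as e:
    print(e)
```

The key property is that the agent chooses *which* predefined task to run, never *what* the command line is.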
Zacchaeus Bolaji @djunehor
I believe the discourse around memory is partly about cross-platform ownership. As of today, Facebook and Google know so much about a person that they're able to serve hypertargeted ads. As more people spend more time with AI, there's a need to collect that personal data centrally. Whoever owns that will likely be the next Google. Anyone can build a custom memory module for their app. And if you need external context, make API calls.
0
0
0
80
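"Anyone can build a custom memory module for their app" can be as small as this sketch: a per-user store the app writes facts into and queries before each prompt. The class, storage, and keyword-overlap scoring are deliberately naive placeholders, not a production design.

```python
from collections import defaultdict

class MemoryModule:
    """Naive per-user memory: store short facts, retrieve by keyword
    overlap, and splice the hits into the next prompt."""

    def __init__(self):
        self._facts = defaultdict(list)  # user_id -> list of fact strings

    def remember(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append(fact)

    def recall(self, user_id: str, query: str, k: int = 3) -> list[str]:
        # Score each fact by how many query words it shares; keep top k.
        words = set(query.lower().split())
        scored = [(len(words & set(f.lower().split())), f)
                  for f in self._facts[user_id]]
        return [f for score, f in sorted(scored, reverse=True) if score][:k]

mem = MemoryModule()
mem.remember("u1", "prefers morning meetings")
mem.remember("u1", "timezone is Europe/London")
hits = mem.recall("u1", "schedule meetings")
assert "prefers morning meetings" in hits
```

A real module would swap the keyword match for embeddings and add an update/forget path, but the app-owned shape stays the same: write after each interaction, read before each prompt.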
Arpit Bhayani @arpit_bhayani
Everybody keeps saying AI memory is going to be huge. I have been reading about it and building on it for weeks now, and honestly, I am struggling to see why. RAG, multi-RAG, and their variants can solve most real problems. Memory really comes into play only when you need personalization. The number of use cases that actually require deep, user-level personalization is pretty small (CX support, healthcare, etc.), where continuity matters. Even those can usually be handled with RAG and its variants. Is it all about being able to update memory as new evidence or data shows up? So why is there so much discourse and discussion around memory? What big use case is everyone trying to cater to here? Please educate me :)
102
23
741
90.3K
Zacchaeus Bolaji @djunehor
I still believe the starting point for AI safety is sensible tool calling with subagents. Want access to an email message? Call the tool. Mark some tools as requiring explicit user approval. AI that can do anything CAN DO ANYTHING.
0
1
1
61
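"Mark some tools as requiring explicit user approval" can be sketched as a gate in the tool dispatcher. All names and the approval callback are hypothetical; in a real app the callback would surface a confirmation UI.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    requires_approval: bool = False  # sensitive tools opt in to gating

def dispatch(tool: Tool, approve: Callable[[str], bool], **kwargs) -> str:
    """Run a tool, pausing for explicit user approval on sensitive ones."""
    if tool.requires_approval and not approve(tool.name):
        return f"{tool.name}: blocked (user declined)"
    return tool.fn(**kwargs)

read_email = Tool("read_email", lambda msg_id: f"email {msg_id} body",
                  requires_approval=True)

# Simulate the user declining, then approving.
assert dispatch(read_email, approve=lambda name: False,
                msg_id="42") == "read_email: blocked (user declined)"
assert dispatch(read_email, approve=lambda name: True,
                msg_id="42") == "email 42 body"
```

The AI never touches the mailbox directly; it can only ask the dispatcher, and the dispatcher enforces the human checkpoint.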
Zacchaeus Bolaji @djunehor
@garrytan For it to be evenly distributed, the complexity needs to be abstracted away. AI can do a lot right now, but most of the capabilities are hidden behind complexities that normal users would rather not dabble in.
0
0
0
502
Zacchaeus Bolaji @djunehor
@krandiash Exactly. The ideal future is small and fast LLMs that live on the user's device, or close enough, connected to several tools. An LLM doesn't need to have the entire Wikipedia in its training data to be useful. It just needs the ability to access it.
0
0
1
112
Karan Goel @krandiash
I personally subscribe to the idea that in the near-term model systems will be built with 2 tiers of models. I like to think of these 2 tiers as whales and dolphins (I'm sure there's a better analogy...).

Whales are giant models that run deep inside the data center. They're slow, use massive compute resources and solve hard problems. They can access and use specialized knowledge and execute long-running workflows.

Dolphins run on the user <-> system surface. Their job is to directly interface with humans, collaborate, strategize, carry context, communicate effectively and generally keep humans happy and satisfied. They are good at summarizing information, they are clever at using tools and harnessing compute-intensive whales to get things done. Dolphins need to be fast, have lower power usage, have the option to run on-device or on edge compute, and otherwise must be capable of being run all the time.

Most of the models we have today are whale-ish. Dolphin models are basically non-existent today (not fast enough, not enough context, use too much energy, not multimodal enough, can't interact very effectively with humans, and small models aren't smart enough).

Dolphins offloading work onto whales is similar to humans offloading reasoning by using tools and databases: computers, notepads, etc.

All of this is about the relative intelligence of these two kinds of models and where they might sit. It would be a mistake to assume that dolphin models will be unintelligent (in an absolute sense); they will be much smarter than today's frontier models. (So yes, voice LMs should know what to say.)

To build a single model that can do the job of both, you would need a very big energy source and some pretty big advancements in accelerators and model architectures so everything can be done on your person. That would also change my thinking about this by a lot (and I would be influenced by some subset of those things happening).

(We're working on the dolphins.)
Vinod Khosla @vkhosla

But the voice LLM still has to call a large LLM to have the intelligence to know what to say.

20
11
198
26.2K
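The whale/dolphin split reads naturally as a router: a fast local model handles the interaction and escalates hard work to a datacenter model. A toy sketch with stubbed "models" and a deliberately crude, hypothetical escalation heuristic:

```python
# Toy router for the whale/dolphin split. Both "models" are stubs and
# the routing heuristic is illustrative only.

def dolphin(task: str) -> str:
    # Stand-in for a small, fast, on-device model.
    return f"dolphin: {task} -> quick answer"

def whale(task: str) -> str:
    # Stand-in for a large, slow, datacenter model.
    return f"whale: {task} -> deep answer"

def route(task: str) -> str:
    # Hypothetical heuristic: long or research-y tasks escalate to the
    # whale; everything else stays on-device with the dolphin.
    hard = len(task.split()) > 20 or "analyze" in task.lower()
    return whale(task) if hard else dolphin(task)

assert route("what's on my calendar?").startswith("dolphin")
assert route("analyze last quarter's churn").startswith("whale")
```

In a real system the dolphin itself would decide when to offload, much as the thread describes humans reaching for tools and databases.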
Zacchaeus Bolaji @djunehor
@jacob_posel Yes. A truly effective assistant tool should be able to work across the tools you use and carry context between them.
0
0
0
31
Jacob Posel @jacob_posel
The problem with AI automations (like Clawd bot): when people try to automate productivity tasks, they start with simple steps like summarizing calls, writing reports, sending emails. These tasks are easy to automate, but fail to create impact.

Why? Because they do not live in isolation. The context from a call is essential for an employee to finish their work. The KPIs from a report guide further actions.

When you try to automate one task ("removing a human"), but that person needs to have done that work to do MORE work, you're not creating any incremental value. If you don't go all the way, your employee will still need to attend that call, pull and review that data, or write that email. And going all the way is really hard.

So you'll see agents like Clawd bot automate intermediate tasks VERY quickly, but make sure to consider: does this actually save me time?
15
1
48
7.8K
Zacchaeus Bolaji @djunehor
@Dan_Jeffries1 And that is when you truly get maximum value out of AI: when it's assisting, not taking the lead. Take a research task: let it track down all viable resources and make them into a nice list with summaries. You decide how to use them in your final draft.
0
0
0
12
Daniel Jeffries @Dan_Jeffries1
What folks continue to miss about AI is we want a Figma moment for the next gen of software, not Claude "do everything and I'll check it later." We want co-creativity. Not for everything, but for many more things than folks imagine. Travel is a perfect example. GPT helped me hunt down great restaurants and hotels faster for my Taiwan trip over the holidays but I checked every one of them and more and made my own lists and hunted down my own haunts too. Co-creative. That is the key. That's the joy! That's the fun! AI as augment. As friend. As sidekick. We want it to work like a good and smart friend who lets us retain control and joy as we go, not do it all for us while we sleep.
Pratyush @pratyushbuddiga

This also feels like a uniquely Bay Area idea: "claude, book my entire trip, make no mistakes", whereas for most people their vacations are the highlights of their entire year and what they're looking forward to and planning for most of it. The first time I put this together was hearing John Collison talking about this and how it'd be the perfect thing to do for a honeymoon trip, and thinking "that's such a Bay Area mindset" when 99% of people would love the planning, searching, booking aspect of any trip, let alone a honeymoon.

8
3
52
3.9K