Eisuke/叡佑
@EisukeHirata

252 posts

Exploring Human x AI | Solopreneur | Made in 🇯🇵 @eisuke_hrt

Joined May 2021
420 Following · 473 Followers
Eisuke/叡佑 retweeted
dotta 📎 @dotta
We just open-sourced Paperclip: the orchestration layer for zero-human companies.

It's everything you need to run an autonomous business: org charts, goal alignment, task ownership, budgets, agent templates.

Just run `npx paperclipai onboard`

github.com/paperclipai/pa…

More 👇
424 replies · 720 reposts · 8.2K likes · 2.5M views
Eisuke/叡佑 retweeted
Avi Chawla @_avichawla
Researchers built a new RAG approach that:

- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves:

Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.

When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree.

For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:

- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to follow in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case, since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines.

For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.

Everything is fully open-source, so you can see the full implementation on GitHub and try it yourself. I have shared the GitHub repo in the replies!
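The tree-traversal idea is easy to sketch. The snippet below is a minimal illustration, not PageIndex's actual API: `choose_child` is a stub that scores a section by word overlap across its subtree's titles, standing in for the LLM reasoning step that asks where an expert would look, and the sample document is invented.

```python
def subtree_titles(node):
    """Yield every title in a node's subtree (the 'table of contents' view)."""
    yield node["title"]
    for child in node.get("children", []):
        yield from subtree_titles(child)

def choose_child(query, children):
    """Stub 'reasoner': score each child by the best word overlap between the
    query and any title in that child's subtree. A real system would ask an
    LLM where an expert would look, not do keyword matching."""
    q = set(query.lower().split())
    return max(children, key=lambda c: max(
        len(q & set(t.lower().split())) for t in subtree_titles(c)))

def retrieve(query, node, trace=None):
    """Descend the document tree to a leaf section, recording the path taken
    so the retrieval is fully traceable."""
    trace = [] if trace is None else trace
    trace.append(node["title"])
    if not node.get("children"):
        return node["text"], trace
    return retrieve(query, choose_child(query, node["children"]), trace)

doc = {
    "title": "Annual Report",
    "children": [
        {"title": "Management Discussion", "text": "...", "children": []},
        {"title": "Appendix", "children": [
            {"title": "Debt Schedule 2023",
             "text": "Total debt rose 12% in 2023.", "children": []},
        ]},
    ],
}

answer, path = retrieve("debt trends in 2023", doc)
```

Note how `path` explains *why* the section was chosen (Annual Report → Appendix → Debt Schedule 2023), which is the traceability a flat vector index cannot give you.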
164 replies · 542 reposts · 4.5K likes · 969.9K views
vee @justaveee
@EisukeHirata @_nightsweekends @_buildspace quite interesting - I actually made a demo of something similar using Relevance AI. It's quite fun and works well. excited to see what you come up with
1 reply · 0 reposts · 1 like · 105 views
Eisuke/叡佑 @EisukeHirata
@akshayvkt @_nightsweekends @_buildspace Thanks! Yeah, it can be both: multiple sales AI agents with different parameters can run simultaneously, or AI agents with different specialties can collaborate to implement a single project.
2 replies · 0 reposts · 0 likes · 104 views
Akshay @akshayvkt
@EisukeHirata @_nightsweekends @_buildspace love the idea, this is something I'm very interested in as well! A question I have is: will the agents it manages all be doing the same specialized task, or will they each be different from each other, with all their tasks combined completing a larger task?
1 reply · 0 reposts · 0 likes · 128 views
Calvin Chen @calvinchen
i don’t want to shop but i want new clothes

fetchr — can you just go find it and buy it for me?

sign up for waitlist :)

cc: @_nightsweekends @_buildspace
40 replies · 2 reposts · 129 likes · 12.9K views
Gabe Stein @gabrielste1n
Kinda late, but this is my first Twitter/X post and I was nervous. Excited to share what we're building. Feedback is welcome. CC: @_nightsweekends @_buildspace
137 replies · 10 reposts · 454 likes · 29.3K views
Eisuke/叡佑 retweeted
Kiyo @kiyokb
We're hiring a full-time Sales & Marketing position (remote) at @no_doxx!

Requirements:
- Minimum of 1 year of work experience
- Residing within a 4-hour time difference from SF
- Full professional English proficiency

Interested? Apply here: jobs.ashbyhq.com/noxx/07fb2687-…
5 replies · 13 reposts · 23 likes · 5.1K views
Eisuke/叡佑 retweeted
Rohan Paul @rohanpaul_ai
CodeR paper released.

On SWE-bench Lite of 300 real-world GitHub issues, CodeR is able to solve 29.00% of issues when submitting only once for each issue. 🤯

I'm quite impressed. It's better than Aider, SWE-agent, and many commercial products we know of, and establishes a new state-of-the-art.

paper "CodeR: Issue Resolving with Multi-Agent and Task Graphs":

📌 CodeR is a multi-agent framework with pre-defined task graphs for automatically resolving GitHub issues using large language models (LLMs). It aims to improve upon single-agent approaches like SWE-agent and AutoCodeRover.

📌 CodeR has five agent roles - Manager, Reproducer, Fault Localizer, Editor, and Verifier. Each agent has a specific set of actions it can take. This reduces the decision complexity for the next action compared to a single agent that has to choose from a large joint action space.

📌 A key innovation is using structured task graphs to represent the plan to resolve an issue. The task graph specifies the agents involved, their subtasks, and the flow between them based on success/failure. This allows injecting expert-designed plans and ensures they are followed precisely, bypassing challenges with instruction following and long context in LLMs.

📌 CodeR leverages LLM-generated test cases and existing repository tests to get code coverage data. This coverage information, along with BM25 scores, is used to improve fault localization and keyword-based code retrieval.

📌 Careful prompt engineering is done for each agent role, defining their identity, responsibilities, and available actions. A ReAct-style prompt with Discussion/Action fields is used.
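A task graph with success/failure edges is simple to sketch. Everything below is hypothetical, not the paper's implementation (the real graphs, agent actions, and prompts are far richer): each node names an agent role, and its edges say which node runs next depending on the outcome. The Manager role, which selects the plan, is omitted, and the agents are stubbed.

```python
# Hypothetical task graph: node -> {agent role, next node on success/failure}.
TASK_GRAPH = {
    "reproduce": {"agent": "Reproducer",      "on_success": "localize", "on_failure": None},
    "localize":  {"agent": "Fault Localizer", "on_success": "edit",     "on_failure": None},
    "edit":      {"agent": "Editor",          "on_success": "verify",   "on_failure": None},
    "verify":    {"agent": "Verifier",        "on_success": None,       "on_failure": "edit"},
}

def run(graph, agents, start="reproduce", max_steps=10):
    """Walk the graph, invoking the stub agent at each node, until a
    terminal edge (None) or the step budget is reached."""
    node, trail = start, []
    for _ in range(max_steps):
        if node is None:
            break
        role = graph[node]["agent"]
        ok = agents[role]()              # real system: LLM agent acts here
        trail.append((role, ok))
        node = graph[node]["on_success" if ok else "on_failure"]
    return trail

# Stub agents: the Verifier fails once (tests still red), which routes
# control back to the Editor via the on_failure edge.
verdicts = iter([False, True])
agents = {
    "Reproducer": lambda: True,
    "Fault Localizer": lambda: True,
    "Editor": lambda: True,
    "Verifier": lambda: next(verdicts),
}
trail = run(TASK_GRAPH, agents)
```

The point of the structure: the next action is constrained to the graph's edges, so no agent ever chooses from the full joint action space.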
3 replies · 23 reposts · 110 likes · 12.6K views
Eisuke/叡佑 retweeted
fly51fly @fly51fly
[CL] Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
Y Zhang, R Sun, Y Chen, T Pfister… [Google Cloud AI Research & Penn State University] (2024)
arxiv.org/abs/2406.02818

- Chain-of-Agents (CoA) is a multi-agent LLM collaboration framework for solving long-context tasks. It consists of worker agents handling segmented portions of text via communication and a manager agent synthesizing their contributions.
- CoA expands the effective context window to the full input length through multi-step communication between workers. Each worker is assigned a short context to mitigate long-context focusing issues.
- CoA processes the entire input through interleaved reading and reasoning, instead of reading then processing reduced inputs as in RAG. This enables better performance on tasks requiring full context.
- Experiments on QA, summarization, and code completion datasets show CoA significantly outperforms strong baselines like RAG and full-context LLMs. Improvements are larger for longer inputs.
- Analysis shows CoA mitigates the lost-in-the-middle issue, and its collaborative approach enables complex reasoning over long contexts.
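The worker-chain pattern can be sketched in a few lines. This is an illustrative stub, not the paper's implementation: each worker would normally be an LLM call reading its segment plus the previous worker's message; here workers just collect sentences that mention a query word, and the manager joins the collected evidence. The sample document is invented.

```python
def split_into_segments(text, size=20):
    """Chop the long input into short word-count segments, one per worker."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def worker(segment, query, message):
    """Stub worker: append any sentence from this segment that mentions a
    query word to the message passed along the chain (real CoA: an LLM
    reads the segment and rewrites the communication unit)."""
    q = set(query.lower().split())
    hits = [s.strip() for s in segment.split(".")
            if s and q & set(s.lower().split())]
    return message + hits

def manager(message):
    """Stub manager: synthesize the final answer from the chain's evidence."""
    return " ".join(message)

document = ("The company was founded in 1999. " + "Filler sentence. " * 50 +
            "Revenue grew 40 percent in 2023.")
query = "revenue 2023"

# Interleaved reading and reasoning: each worker sees only its own short
# segment plus the running message from the previous worker.
message = []
for segment in split_into_segments(document):
    message = worker(segment, query, message)
answer = manager(message)
```

Note that no single worker ever holds the full 100+ word document, yet the relevant sentence still reaches the manager through the chain.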
1 reply · 23 reposts · 70 likes · 5.4K views
Eisuke/叡佑 retweeted
Towards Data Science @TDataScience
What will it take to transition from a prompt engineering-focused paradigm to one led by agent engineering? Giuseppe Scalamogna charts a potential future path for LLM-based tools. buff.ly/4bZouSk
0 replies · 2 reposts · 12 likes · 3.3K views
Eisuke/叡佑 retweeted
Travis Fischer @transitive_bs
Imagine a world where instead of checking twitter, you're checking in on your own personal army of 1000 AI agents that are always-on, doing work in the background 24/7 on your behalf...

Well WTF are AI agents?

Just like self-driving cars, you can think of AI agents as self-driving computer programs. Traditional computer programs are written by human programmers who have to think carefully about how the program should handle any given situation. AI agents replace or augment this paradigm by allowing AI models to make decisions at runtime – which makes it a lot easier to build more flexible, general, personalized, and powerful programs.

Why should you care?

I've found the following prompt to be particularly useful when thinking about agents and answering this question: what cognitive tasks are the most well-resourced people in the world – billionaires and world leaders – already offloading to teams of human assistants? ⇒ these will be the first tasks offloaded to AI agents.

These top 1% can afford to trade money for time, but what about the rest of us? This is what excites me the most about AI agents: they will help democratize access to productivity resources that were previously reserved only for the top 1%.

But that's only talking about a single agent... Now let's go back to imagining a world where you have your own personal army of 1000 agents that are always-on, doing work in the background 24/7 on your behalf. What will this unlock for ambitious individuals? What will this unlock for everyday people? And most importantly, what will this unlock for you? 🔥

This may sound like futuristic cyberpunk, but I believe this future is looking more and more likely within the next 5-10 years.

So what can AI agents do today?

Agents today are mostly used for performing background research with access to tools like google search, wikipedia, crunchbase, and other sources of proprietary data. Agents are really good at crawling the web and compiling summaries across many different sources. They've also found inklings of PMF in solving limited coding tasks. See Devin or GitHub Copilot Workspace for early examples.

Most agentic use cases boil down to workflow automations. The differences between traditional workflow automations and agentic workflows are:

• agentic workflows generally aren't deterministic, which requires fundamentally different tooling to do well
• agentic workflows are a superset of traditional workflows
• sub-tasks which previously required a human in the loop can now be replaced or augmented with AI reasoning, which is cheaper and more scalable than human labor

How do AI agents work?

Most of the useful agents today are simply LLM calls with access to a few tools in a while loop. These fall into the "hand-crafted" or "specialized" agents in the diagram below. There's a spectrum of agentic approaches ranging from traditional, fully deterministic programs (human programmers driving the bus) to fully autonomous agents (AI models driving the bus):

1. "hand-crafted": chained prompts and API calls
2. "specialized": dynamically decides what to do within a DAG of task types and tools
3. "general": can do anything. cool demos, but nothing reliable exists as there are too many edge cases.

(these agent types are from @yoheinakajima's excellent breakdown x.com/yoheinakajima/…)

As more logic is built on top of LLMs as a core reasoning engine, this inevitably starts to look like a DAG, with some nodes being LLM calls, some being traditional code, and some being sub-chains or other sub-agents. Does this remind anyone else of a higher-level AST, where the individual nodes are using natural language prompts? 👀

Everything I've talked about so far is on the right-hand side of this image: we have either a hand-crafted or narrow, specialized DAG, and an LLM is used to decide how to traverse this DAG at runtime to solve a given task. The eventual goal is to have this DAG be built dynamically on-the-fly, which is where we start getting closer to fully autonomous agents.

Projects like AutoGPT and BabyAGI are great early examples of fully autonomous agent architectures, but at the moment they're mostly just toys, since reliability gets exponentially harder as you try to give the agent more responsibility and freedom of choice. To build more reliable, autonomous, general-purpose, long-running agents, we'll need to solve difficult sub-problems around planning, memory, task decomposition, world modeling, guardrails / safety, human-in-the-loop feedback, agentic UX, and more.

Another analogy I really love is the concept of an LLM OS, with an LLM filling the equivalent role of a CPU and an agentic program running on this higher-level computer. See @karpathy's discussion and my previous discussion on this topic: x.com/transitive_bs/…

Where can I learn more about agents?

I wrote a more inspirational blog post here which goes into a lot more depth: transitivebullsh.it/ai-agents

Hopefully you find it useful && would love to hear your thoughts 🙂💯
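The claim that most useful agents today are "LLM calls with access to a few tools in a while loop" can be made concrete. In this sketch everything is a stand-in: `llm` is a hard-coded stub where a real agent would call a model API, and `search_tool` returns canned results; only the loop structure is the point.

```python
def search_tool(query):
    # Stub tool with canned results; a real agent would hit a search API.
    return {"agent definition": "self-driving computer programs"}.get(query, "no results")

TOOLS = {"search": search_tool}

def llm(transcript):
    """Stub policy: if no tool result is in the transcript yet, request a
    search; otherwise produce a final answer from the last tool result."""
    tool_msgs = [m for m in transcript if m["role"] == "tool"]
    if not tool_msgs:
        return {"tool": "search", "input": "agent definition"}
    return {"answer": f"AI agents are {tool_msgs[-1]['content']}."}

def run_agent(task, max_steps=5):
    """The while loop: call the LLM, execute the tool it picks, feed the
    result back, repeat until it emits an answer or runs out of steps."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = llm(transcript)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["input"])
        transcript.append({"role": "tool", "content": result})
    return "step budget exhausted"

answer = run_agent("What are AI agents?")
```

Swapping the stub `llm` for a real model call, and `TOOLS` for real search/code/browse tools, is essentially all that separates this toy from the "hand-crafted" agents described above.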
[4 attached images]
16 replies · 46 reposts · 283 likes · 53.1K views
Eisuke/叡佑 retweeted
Kenneth Auchenberg 🛠 @auchenberg
Do websites go away with AI agents? Great post by @Theoryvc on why agent-based UI automation might be a more realistic path than replacing all websites with APIs. Hint: it turns out that building great APIs is surprisingly hard (and trust me, it is!). linkedin.com/pulse/do-websi…
0 replies · 1 repost · 8 likes · 817 views
Eisuke/叡佑 retweeted
James Alcorn @JamesAlcorn94
SWE-agent is a stunning paper. Not for its results on SWE-bench, but for so concretely demonstrating the idea that (1) LLM agents equipped with LLM-native interfaces are the future and (2) by implication, there are a whole lotta interfaces to redesign.

@jyangballin & co set out to build an autonomous software engineer with LLMs. Not a new idea per se. The difference is in the approach: they design an agent-computer interface ('ACI') through which the LLM can interact with the codebase, in lieu of using existing HCIs like the Linux shell.

Why? Because existing interfaces, like GUI-based IDEs, have "rich visual components and feedback that make them powerful tools for humans, but [unsuitable] for LMs." Put another way, LLMs & humans are fundamentally different user constituencies, and forcing an LLM to use *our* interfaces is shaping up to be a really bad idea.

The authors observe that, when using the popular vim text editor, the agent wastes time & precious context window verifying minor results that a human wouldn't (e.g. file removal). vim has a catastrophic impact on agent performance; the reasons it's a great product for humans are the exact same reasons it's a terrible product for LLMs.

This won't be a controversial idea, but it's not yet widely appreciated. IMO, most novel ACIs will be designed in-house at startups for the next year or two - i.e. LLM application devs will build internally the ACI necessary for their agent's use case, be it in observability (perhaps an ACI for the APM stack), security (ditto for SIEM), or whatever use case they're building around. If the idea takes hold, new startup opportunities around ACIs could resemble those of the API ecosystem, to begin with - design, testing, security, governance.

But this could get pretty cooked, pretty quick: not hard to imagine a future where we ask an agent to design and implement its own interfaces, in real time, and in turn instruct another agent to limit what agent #1 can retrieve from our system. Unclear where the value accrues in this scenario.
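A toy version of the ACI idea: rather than a visual editor the agent must parse, the interface below exposes terse commands that return compact, windowed views and explicit confirmations, so the agent never burns context verifying minor results (the "did my edit land?" round-trip the authors describe with vim). `FileACI` and its command set are invented for illustration; SWE-agent's real ACI is different.

```python
class FileACI:
    """Hypothetical mini agent-computer interface over one file:
    every command returns short, line-numbered text an LLM can consume."""

    def __init__(self, lines, window=3):
        self.lines, self.window, self.pos = list(lines), window, 0

    def open(self, pos=0):
        """Move the view window and return its contents."""
        self.pos = pos
        return self._view()

    def edit(self, lineno, new_text):
        """Replace one line. The explicit confirmation plus refreshed view
        spares the agent a separate verification step."""
        self.lines[lineno] = new_text
        return f"edited line {lineno}\n" + self._view()

    def _view(self):
        lo = self.pos
        hi = min(len(self.lines), lo + self.window)
        return "\n".join(f"{i}: {self.lines[i]}" for i in range(lo, hi))

aci = FileACI(["def add(a, b):", "    return a - b", ""])
out = aci.edit(1, "    return a + b")
```

The design choice is the point: a human would find this interface spartan, but for an LLM every byte of output is either a confirmation or content it needs, which is exactly the asymmetry the thread describes.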
6 replies · 31 reposts · 309 likes · 57.8K views
Eisuke/叡佑 retweeted
Yohei @yoheinakajima
AI Builders' Favorite Tools: A Hive Mind Survey Summary

I recently asked AI builders about their favorite go-to libraries and frameworks for AI projects. Here is a summary of the first ~63 replies, categorized.

Agent Frameworks:
• AgentOpsAI: This tool simplifies the debugging and monitoring of agent operations, making it easier to read through prompts and visualize spend. (@AlexReibman)
• AgentForge: Designed for rapid iteration of cognitive architectures with support for multiple LLM APIs per prompt and easy VectorDB implementation via ChromaDB. (@JohnSmith4Reel)
• crewAIInc: A flexible and powerful framework for enabling multi-agent task completion with support for any OpenAI API-compatible endpoints. (@whoabuddydev, @as_cybersamurai)
• craftgen: Offers flexibility with its actor model and event-driven architecture, making human-in-the-loop workflows easy. (@Necmttn)
• Lyzr: An agent framework mentioned for its capabilities in building multi-agent systems. (@theAIsailor)
• Policy Synth: An open-source multiscale AI agent library. (@robertbjarnason)
• GraphAI: Used for data flow programming to build agentic apps, enabling LLMs to generate agentic applications. (@snakajima)

Libraries for Data Management and Vector Databases:
• Pinecone: Used for vector databases and praised for its serverless capabilities. (@EricBDelisle, @jjackyliang, @theAIsailor)
• ChromaDB: Integrated with AgentForge for easy VectorDB implementation. (@JohnSmith4Reel)
• Redis, Qdrant, Postgres: Commonly mentioned databases for AI projects. (@who_mansu)
• Greenhouse ECS: An ECS server framework built to be programmable by LLMs. (@EricBDelisle)

Web and UI Frameworks:
• Streamlit: Popular for creating web apps quickly from ideas, making it a go-to tool for many. (@josephs_tez, @SaidAitmbarek, @theAIsailor, @hTrapVader)
• Gradio: Useful for creating a UI in a few lines of Python, often used to showcase work in team meetings. (@AI_NewsWaltz)
• Wordware: Not a traditional framework but appreciated for building AI agents quickly and easily, even for less technical users. (@unable0_)
• PreactJS: Used for front-end application interfaces.
• NodeJS: Often paired with PreactJS for front-end development.

Language Models and NLP Tools:
• GPT-4o and other OpenAI tools: Widely used for their robust performance in various AI applications. (@TheHamedMP, @SaidAitmbarek, @theAIsailor)
• Langchain and Llama Index: Common starting points for fast iterations in AI projects. (@TheHamedMP, @topmass)
• DeepgramAI and AssemblyAI: Mentioned for their NLP capabilities. (@traviscorrigan)
• Instruct: A lightweight library for structured outputs with LLMs. (@who_mansu, @HamzaFarhan)
• Litellm: Another library for structured outputs with LLMs, often used alongside Instruct. (@who_mansu, @yoheinakajima)
• DSPy: Highlighted for its power and versatility in AI projects. (@mysticaltech)
• Magentic: A thin layer over LLM providers to simplify structured outputs and function calling. (@MichaelNStruwig, @jackmpcollins)
• Openrouter: Offers multiple language models under the OpenAI API definition for easy experimentation.
• Funcchain: An integration of Langchain with simpler typing and usage. (@akatzzzzz)

Developer Tools and Platforms:
• Replit: Favored as an IDE for its versatility and integration with various tools. (@theAIsailor)
• Deno: Used in the craftgen tool for its code interpreter capabilities. (@Necmttn)
• Imprompt: Highlighted for its ease of use in generative AI projects. (@jeffrschneider)
• Vapi_AI, usebland, vocodehq, retellai: New tools currently being explored. (@traviscorrigan)
• Hacknote: Recently added a feature called reactor creator, simplifying prompt writing and model selection. (@dbqsun)

Evaluation and Testing:
• LangSmith: Used for evaluation, debugging, and testing of LLM applications. (@as_cybersamurai)
• Promptfoo: A testing framework to evaluate prompts and iterate on LLMs faster. (@Yossi_Dahan_)
• Ragas: Used for evaluation in building RAG applications. (@AI_NewsWaltz)

Miscellaneous Tools:
• NATS / Socket IO / Redis: Used for messaging and caching in AI applications. (@EricBDelisle)
• Tailwinds: A CSS framework used alongside NodeJS and PreactJS for front-end development. (@EricBDelisle)
• BBScript: Acts as the glue between data, frontend, and backend. (@EricBDelisle)
• Obsidian: A personal knowledge management tool that integrates with OpenAI APIs. (@BrianAndrenMA)
• Convokit: An NLP toolkit from Cornell for conversational analysis.
• Google Cloud's Vertex AI: Mentioned for its multimodal embedding model. (@jjackyliang)
• PEFT: A tool used frequently for various AI projects. (@actualrealyorth)
• Blacksmith: Provides fine-grained automation for agents and flow generation. (@MoMe36806866)
• Trafilatura: Useful for processing web data, mentioned for its utility in LLM projects. (@TommyFalkowski)

*Summarized by GPT-4o - sorry if it missed any! Of course this isn't a comprehensive list of great tools - but hopefully you find some interesting ones you didn't know of - and you now know someone you can ping about it :)
Yohei @yoheinakajima

AI builders, #hivemind survey time! What are your current favorite go-to libraries and frameworks for your AI projects? The ones you couldn’t live without! Add name, short desc, and why below. 👇 Will summarize first 100~200 replies into an article ✍️

31 replies · 100 reposts · 538 likes · 116.8K views
Eisuke/叡佑 retweeted
AK @_akhaliq
🤖 AgentVerse 🪐 with a @Gradio demo

github: github.com/openbmb/agentv…

AgentVerse offers a versatile framework that streamlines the process of creating custom multi-agent environments for large language models (LLMs). Designed to facilitate swift development and customization with minimal effort, our framework empowers researchers to concentrate on their research, rather than being bogged down by implementation details.
7 replies · 88 reposts · 308 likes · 107.8K views
Eisuke/叡佑 retweeted
Edgar Haond @edgarhnd
excited to launch AI Reality TV today!

our new platform lets you create your own social simulations. ever wondered if Elizabeth preferred Jack or Will in Pirates of the Caribbean? now you can simulate and see for yourself!

here's how it works:
1. choose a map and scenario.
2. add and customize your characters.
3. watch the drama unfold as AI-powered characters interact.
4. talk to them to get their perspective.

this is the start of a new kind of entertainment! drop a comment and I'll send you access.
107 replies · 38 reposts · 281 likes · 46K views