



how to set up hermes agent step by step. built-in memory, 40+ tools, works on your phone, and what to think of hermes vs openclaw:

1. hermes is a personal AI agent that runs in your terminal. think of it like openclaw but with built-in memory, 40+ tools out of the box, and 90% cheaper token costs. you install it with one command.

2. the 3 problems with openclaw that hermes solves: no memory (you keep repeating yourself), constant gateway restarts, and zero visibility into what you're spending on tokens.

3. hermes remembers everything. every completed task gets saved to memory. it searches through past logs to find solutions. over time it gets better at your specific workflows.

4. connect it to openrouter. you see exact costs per model per task. free models rotate weekly. one founder went from $130 every five days on openclaw to $10 on hermes. same output.

5. it comes preloaded with skills: apple notes, imessage, find my, browser, web search, image generation, cron jobs. no hunting for plugins.

6. connect it to obsidian so it reads your entire vault. connect it to gstack for your dev environment. create custom skills for your specific workflows.

7. the biggest money saver: have it write code once for recurring tasks. then it runs without burning tokens every time. stop paying an LLM to do the same scrape or report daily.

8. run it on android via telegram. name your agents. talk to them like coworkers. in this episode imran shows you how to set this up.

9. you can run it bare metal, in docker, or serverless on modal. pick your risk level.

i begged @imranye to come on @startupideaspod and walk through the full installation live. he made it impossibly clear.

if you've heard of Hermes Agent and want the clearest explanation of how to get set up like a pro, this is the best personal agent setup video on the internet right now.

let me know what you want me to cover on the next ep.

watch
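
the "write code once" idea from point 7 can be sketched roughly like this. this is not hermes's actual API or skill format; `get_or_generate_script`, `run_script`, and `fake_llm_generate` are hypothetical names, and the "LLM" here is a stub that returns a fixed script. the point is only the pattern: pay for generation on the first run, then reuse the saved script with no model call.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_or_generate_script(path: Path, generate: Callable[[], str]) -> str:
    """Return the saved script; call the (expensive) generator only on a cache miss."""
    if path.exists():
        return path.read_text()
    source = generate()
    path.write_text(source)
    return source


def run_script(source: str, **inputs):
    """Execute the saved script and call its run() entry point (assumed convention)."""
    namespace = {}
    exec(source, namespace)  # review generated code before running it for real
    return namespace["run"](**inputs)


llm_calls = 0


def fake_llm_generate() -> str:
    """Stand-in for a paid model call; counts how often it is invoked."""
    global llm_calls
    llm_calls += 1
    return "def run(values):\n    return sum(values) / len(values)\n"


with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "daily_report.py"
    results = []
    # Two "days" of the same recurring task with different input data.
    for day_values in ([10, 20, 30], [4, 6]):
        source = get_or_generate_script(script_path, fake_llm_generate)
        results.append(run_script(source, values=day_values))

print(results, "model calls:", llm_calls)
```

the second day reuses the file written on the first day, so the stubbed model is only called once. in a real setup the saved script would typically be scheduled with a cron job rather than looped in-process.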






Holy shit... Microsoft open sourced an inference framework that runs a 100B parameter LLM on a single CPU.

It's called BitNet. And it does what was supposed to be impossible.

No GPU. No cloud. No $10K hardware setup. Just your laptop running a 100-billion parameter model at human reading speed.

Here's how it works:

Most LLMs store weights in 32-bit or 16-bit floats. BitNet uses 1.58 bits.

Weights are ternary: just -1, 0, or +1. That's it. No floating-point weights. No expensive matrix math. Pure integer operations your CPU was already built for.

The result:

- 100B model runs on a single CPU at 5-7 tokens/second
- 2.37x to 6.17x faster than llama.cpp on x86
- 82% lower energy consumption on x86 CPUs
- 1.37x to 5.07x speedup on ARM (your MacBook)
- Memory drops by 16-32x vs full-precision models

The wildest part: accuracy barely moves.

BitNet b1.58 2B4T, their flagship model, was trained on 4 trillion tokens and benchmarks competitively against full-precision models of the same size. The quantization isn't destroying quality. It's just removing the bloat.

What this actually means:

- Run AI completely offline. Your data never leaves your machine
- Deploy LLMs on phones, IoT devices, edge hardware
- No more cloud API bills for inference
- AI in regions with no reliable internet

The model supports ARM and x86. Works on your MacBook, your Linux box, your Windows machine.

27.4K GitHub stars. 2.2K forks. Built by Microsoft Research.

100% Open Source. MIT License
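
To make the ternary idea concrete, here is a minimal pure-Python sketch. It assumes the absmean scheme described in the BitNet b1.58 paper (divide by the mean absolute weight, round, clip to [-1, 1]); the function names are hypothetical and this is not how bitnet.cpp's optimized kernels are written. It also keeps activations as plain floats for readability, which the real system does not. The "1.58 bits" figure is log2(3), the information needed to pick one of three values.

```python
import math


def absmean_ternarize(weights, eps=1e-8):
    """Map a float matrix to {-1, 0, +1} plus one scale (absmean scheme, assumed)."""
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_matvec(ternary, scale, x):
    """Matrix-vector product whose inner loop only adds or subtracts."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
            # t == 0 contributes nothing, so it is skipped entirely
        out.append(acc * scale)  # one multiply per output row
    return out


def weight_gigabytes(n_params, bits_per_weight):
    """Ideal storage for the weights alone, ignoring packing overhead."""
    return n_params * bits_per_weight / 8 / 1e9


W = [[0.9, -0.05, -1.2], [0.1, 0.7, -0.6]]
T, s = absmean_ternarize(W)
y = ternary_matvec(T, s, [1.0, 2.0, 3.0])

print("ternary weights:", T)
print("output:", y)
print("100B params at log2(3) bits:", round(weight_gigabytes(100e9, math.log2(3)), 1), "GB")
print("100B params at 16 bits:", weight_gigabytes(100e9, 16), "GB")
```

The footprint numbers are an idealized lower bound for the weights only. Real formats pack ternary values into whole bits and may keep some layers in higher precision, so actual file sizes will differ.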










