CBir

876 posts

@c__bir

Joined December 2023
1.1K Following · 59 Followers
Pinned Tweet
CBir
CBir@c__bir·
Here is my take on alignment. I think it is solvable @elonmusk @sama @ylecun @demishassabis @mustafasuleyman. We use language to communicate goals with symbolic representations. By designing powerful AI systems like this, we can understand them and grow alongside them.
CBir tweet media
6
0
14
8.4K
Kat ⊷ the Poet Engineer
Kat ⊷ the Poet Engineer@poetengineer__·
trying to use topological data analysis to map the shape of my x bookmarks through mapper + embedding extraction and generated 3 views:
- density: where attention keeps gravitating
- pca: the dominant axes of variation
- centroid: center vs edge (typical -> outlier)
135
617
6.1K
721.3K
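The "pca" and "centroid" views described in the post can be sketched with plain NumPy. This is a hedged illustration, not the author's pipeline: the embeddings here are random stand-ins for whatever bookmark embeddings she extracted.

```python
import numpy as np

# Hypothetical stand-in for bookmark embeddings; in practice these would
# come from an embedding model run over each saved post.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 384))

# "pca" view: project onto the two dominant axes of variation.
centered = embeddings - embeddings.mean(axis=0)
# Right singular vectors of the centered matrix are the principal axes.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_view = centered @ vt[:2].T          # (200, 2) coordinates to plot

# "centroid" view: distance from the mean embedding (typical -> outlier).
centroid_view = np.linalg.norm(centered, axis=1)
```

The density view would additionally need a kernel density estimate or a Mapper cover over these coordinates.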
Dillon Uzar
Dillon Uzar@DillonUzar·
@c__bir @Zai_org @128k Working on it! Hitting rate limits for Qwen 3.5 and 3.6 models. For Gemma 4 hitting a couple of issues in my setup, but hoping to finish the results this week :)
1
0
1
72
Dillon Uzar
Dillon Uzar@DillonUzar·
Context Arena: Added @Zai_org's GLM-5.1 on 8-needle GDM-MRCRv2. All scores below are with reasoning enabled.

AUC @128k:
- Kimi K2.6: 53.8%
- GLM-5.1: 50.2%
- Kimi K2.5: 46.2%

Cum AVG @128k:
- Kimi K2.6: 65.8%
- GLM-5.1: 63.3%
- Kimi K2.5: 60.5%

See image attached for non-reasoning. GLM-5.1 slots right between Kimi K2.6 and K2.5. All three models are essentially tied through 8k-32k (within CI). GLM-5.1 stays neck-and-neck with K2.6 through 64k. The gap only opens up at 128k, where GLM drops to 27.4% vs K2.6's 39.0%. @Zai_org @Kimi_Moonshot

(New website at contextarena.ai isn't live yet - just sharing results for those interested while we finalize things. ~aiming for tomorrow)
Dillon Uzar tweet media
1
0
14
1.3K
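The tweet does not define "AUC @128k" or "Cum AVG @128k" exactly; one plausible reading is sketched below, with made-up per-context-length scores (not real Context Arena numbers): cumulative average as the plain mean of scores up to 128k, and AUC as normalized trapezoidal area under score vs. log2 of context length.

```python
import numpy as np

# Hypothetical per-context-length scores, NOT real Context Arena numbers.
lengths = np.array([8_000, 16_000, 32_000, 64_000, 128_000])
scores = np.array([0.90, 0.82, 0.71, 0.55, 0.274])

# "Cum AVG @128k" read as the plain mean of the scores up to 128k.
cum_avg = scores.mean()

# "AUC @128k" read as trapezoidal area under score vs. log2(context length),
# normalized by the log-length span so the result stays on a 0-1 scale.
x = np.log2(lengths)
auc = np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(x)) / (x[-1] - x[0])
```

Under either reading, a model that holds its score at long contexts scores higher than one that collapses at 128k, matching the GLM-5.1 vs K2.6 gap described.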
CBir
CBir@c__bir·
@arcprize can we have an ARC game app where one can solve random daily challenges from any of the public ARC puzzles until all are completed? Would be fun and would reach more people 😎 @fchollet
0
0
0
89
ARC Prize
ARC Prize@arcprize·
ARC-AGI-3 Efficiency

ARC-AGI-3 offers the first formal measure of learning efficiency in the ARC-AGI series. Efficiency measures not just whether a level is completed, but how many actions it took relative to the human baseline. This connects directly to the definition of intelligence: skill-acquisition efficiency. The chart below shows the distribution of human per-level efficiency across the public demo set.
ARC Prize tweet media
5
2
50
4.2K
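The tweet describes efficiency as actions taken relative to the human baseline without giving a formula; a minimal sketch of one natural per-level score is below. The ratio form, the cap at 1.0, and the zero-for-unsolved convention are assumptions, not ARC Prize's published definition.

```python
def level_efficiency(agent_actions: int, human_baseline_actions: int,
                     solved: bool) -> float:
    """Assumed per-level efficiency: 1.0 means at-or-better-than the human
    baseline action count; 0.0 means the level was not completed at all."""
    if not solved or agent_actions <= 0:
        return 0.0
    return min(1.0, human_baseline_actions / agent_actions)

# Example: human baseline of 40 actions, agent needed 80 -> efficiency 0.5.
print(level_efficiency(80, 40, solved=True))
```

Averaging this score across levels would give one overall learning-efficiency number in the spirit of the post.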
ARC Prize
ARC Prize@arcprize·
ARC-AGI-3 Human Baseline Dataset

Today we're open-sourcing the ARC-AGI-3 Human Baseline. This is the most exhaustive human testing study in the ARC-AGI series. Every environment was solved by at least 2 people (many by more) from the general public, with no prior training.
21
73
541
160.8K
CBir
CBir@c__bir·
@fchollet bit of random deviations + curiosity (fun of exploring) we know there?
0
0
1
10
François Chollet
François Chollet@fchollet·
Simply retrieving a reasoning trace looks a lot like human reasoning, until it's time to navigate uncharted territory. If you memorized all reasoning traces of humans from 10,000 BC, you could automate their lives but you could not invent modern civilization.
65
48
564
39.4K
CBir
CBir@c__bir·
@Teknium @0xSimony what's the best inference stack for Hermes Gemma 31B agents? LM Studio with a multi-access local server, or vLLM?
1
0
1
45
Simony
Simony@0xSimony·
Hermes or OpenClaw?
52
0
32
19.7K
CBir
CBir@c__bir·
@digitalix why is vLLM so much worse than LM Studio in terms of KV cache / max context size?
0
0
0
35
Alex Ziskind
Alex Ziskind@digitalix·
New vid coming shortly
Alex Ziskind tweet media
31
2
131
17.5K
CBir
CBir@c__bir·
@karpathy are you using obsidian cli?
1
0
19
96.7K
Andrej Karpathy
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
2.8K
6.9K
57.6K
20.6M
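The "small and naive search engine over the wiki" mentioned in the post could look something like the sketch below: rank each .md file by query-term frequency. This is an assumed reconstruction for illustration, not Karpathy's actual script; the directory layout and scoring are hypothetical.

```python
from pathlib import Path

def search_wiki(wiki_dir: str, query: str, top_k: int = 5):
    """Naive term-frequency search over a directory of markdown files.
    Returns (score, path) pairs, best match first."""
    terms = query.lower().split()
    scored = []
    for md in Path(wiki_dir).rglob("*.md"):
        text = md.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(text.count(t) for t in terms)
        if score:
            scored.append((score, str(md)))
    return sorted(scored, reverse=True)[:top_k]
```

Exposed as a CLI, a tool like this is easy to hand to an LLM agent as the post describes: the agent issues a query, gets back candidate files, then reads the top hits directly.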
Demis Hassabis
Demis Hassabis@demishassabis·
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use - happy building!
Demis Hassabis tweet media
328
888
8K
979.3K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing Gemma 4, our series of open weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B dense model!
Logan Kilpatrick tweet media
287
598
6.2K
521.6K
CBir
CBir@c__bir·
@UnslothAI Is it performing better than default Qwen 27B? Got some benchmarks to throw at it @UnslothAI?
1
0
2
3K
Unsloth AI
Unsloth AI@UnslothAI·
This model has been #1 trending for 3 weeks now. It's Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model: huggingface.co/Jackrong/Qwen3…
Unsloth AI tweet media
88
228
2.8K
207.9K
CBir
CBir@c__bir·
@deedydas is the code open source? want to test with 4B and 9B LLMs
0
0
0
62
Deedy
Deedy@deedydas·
Karpathy's Autoresearch pushed my vibecoded Rust chess engine AI from "expert" to a top 50 grandmaster, a #311 chess engine. It ran over 70 experiments on its own and tried to hill climb to the top ELO score it could, landing at 2718!
98
230
3.5K
377.6K
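The experiment loop described (run experiments, keep only changes that raise the measured ELO) is classic greedy hill climbing. A minimal sketch is below; the parameters and the `evaluate` function are hypothetical stand-ins, since a real run would measure rating by playing games against reference engines.

```python
import random

def evaluate(params):
    # Hypothetical smooth stand-in for "measured ELO", peaking at
    # depth=12, aggression=0.5. A real evaluation would play rated games.
    return (2718
            - 10 * (params["depth"] - 12) ** 2
            - 400 * (params["aggression"] - 0.5) ** 2)

def hill_climb(params, steps=70, seed=0):
    """Greedy hill climb: mutate one parameter, keep only improvements."""
    rng = random.Random(seed)
    best, best_score = dict(params), evaluate(params)
    for _ in range(steps):
        cand = dict(best)
        if rng.random() < 0.5:
            cand["depth"] += rng.choice([-1, 1])
        else:
            cand["aggression"] += rng.uniform(-0.1, 0.1)
        score = evaluate(cand)
        if score > best_score:          # accept only strict improvements
            best, best_score = cand, score
    return best, best_score

best, elo = hill_climb({"depth": 8, "aggression": 0.2})
```

Because only improvements are accepted, the final score can never fall below the starting configuration's score; escaping local optima would need restarts or a more exploratory search.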
CBir
CBir@c__bir·
@kepano @obsdmd Obsidian mobile is really useful nowadays - thx 😎 @kepano - auto full screen feature is one of the good ones
0
0
0
19
Elon Musk
Elon Musk@elonmusk·
@beffjezos Entropy, entropy, no escaping that for me.
929
328
3.9K
335.3K
CBir
CBir@c__bir·
@TheRealAdamG co learning - of concepts has huge potential 😎
0
0
0
48
Adam.GPT
Adam.GPT@TheRealAdamG·
openai.com/index/new-ways… "Today, we’re making learning these [math and science] concepts in ChatGPT even more interactive with new dynamic visual explanations. Starting with more than 70 core math and science concepts, ChatGPT will guide learners by showing how formulas, variables, and relationships behave in real time. These experiences will be available globally across all plans starting today."
45
110
980
104.2K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Say hello to Gemini Embedding 2, our new SOTA multimodal model that lets you bring text, images, video, audio, and docs into the same embedding space! 👀
Logan Kilpatrick tweet media
273
451
5.5K
858.4K
Demis Hassabis
Demis Hassabis@demishassabis·
Ten years ago, AlphaGo’s legendary match in Seoul heralded the start of the modern era in AI. Its famous ‘Move 37’ signaled to us that AI techniques were ready to tackle real-world problems in areas like science - and ideas inspired by these methods are critical to building AGI
175
503
3.6K
708.4K
CBir
CBir@c__bir·
@ficlive curious about Qwen3.5 35BA3B and 27B Scores 👀
0
0
0
29
Fiction.live
Fiction.live@ficlive·
qwen-3.5 plus and qwen3-max-thinking and opus 4.6
Fiction.live tweet media
11
15
143
22.8K