CBir

876 posts

@c__bir

Joined December 2023
1.1K Following · 59 Followers
Pinned Tweet
CBir
CBir@c__bir·
Here is my take on alignment. I think it is solvable @elonmusk @sama @ylecun @demishassabis @mustafasuleyman. We use language to communicate goals with symbolic representations. By designing powerful AI systems like this, we can understand them and grow alongside them.
CBir tweet media
6
0
14
8.4K
Kat ⊷ the Poet Engineer
Kat ⊷ the Poet Engineer@poetengineer__·
trying to use topological data analysis to map the shape of my x bookmarks through mapper + embedding extraction and generated 3 views:
- density: where attention keeps gravitating
- pca: the dominant axes of variation
- centroid: center vs edge (typical -> outlier)
135
617
6.1K
721.3K
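The "pca" and "centroid" views described in the post can be sketched with plain NumPy. This is a hedged illustration, not the author's pipeline: the embeddings here are random stand-ins for whatever bookmark embeddings she extracted.

```python
import numpy as np

# Hypothetical stand-in for bookmark embeddings; in practice these would
# come from an embedding model run over each saved post.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 384))

# "pca" view: project onto the two dominant axes of variation.
centered = embeddings - embeddings.mean(axis=0)
# Right singular vectors of the centered matrix are the principal axes.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_view = centered @ vt[:2].T          # (200, 2) coordinates to plot

# "centroid" view: distance from the mean embedding (typical -> outlier).
centroid_view = np.linalg.norm(centered, axis=1)
```

The density view would additionally need a kernel density estimate or a Mapper cover over these coordinates.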
Dillon Uzar
Dillon Uzar@DillonUzar·
@c__bir @Zai_org @128k Working on it! Hitting rate limits for Qwen 3.5 and 3.6 models. For Gemma 4 hitting a couple of issues in my setup, but hoping to finish the results this week :)
1
0
1
72
Dillon Uzar
Dillon Uzar@DillonUzar·
Context Arena: Added @Zai_org's GLM-5.1 on 8-needle GDM-MRCRv2. All scores below are with reasoning enabled.

AUC @128k:
- Kimi K2.6: 53.8%
- GLM-5.1: 50.2%
- Kimi K2.5: 46.2%

Cum AVG @128k:
- Kimi K2.6: 65.8%
- GLM-5.1: 63.3%
- Kimi K2.5: 60.5%

See image attached for non-reasoning. GLM-5.1 slots right between Kimi K2.6 and K2.5. All three models are essentially tied through 8k-32k (within CI). GLM-5.1 stays neck-and-neck with K2.6 through 64k. The gap only opens up at 128k, where GLM drops to 27.4% vs K2.6's 39.0%. @Zai_org @Kimi_Moonshot

(New website at contextarena.ai isn't live yet - just sharing results for those interested while we finalize things. ~aiming for tomorrow)
Dillon Uzar tweet media
1
0
14
1.3K
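The tweet does not define "AUC @128k" or "Cum AVG @128k" exactly; one plausible reading is sketched below, with made-up per-context-length scores (not real Context Arena numbers): cumulative average as the plain mean of scores up to 128k, and AUC as normalized trapezoidal area under score vs. log2 of context length.

```python
import numpy as np

# Hypothetical per-context-length scores, NOT real Context Arena numbers.
lengths = np.array([8_000, 16_000, 32_000, 64_000, 128_000])
scores = np.array([0.90, 0.82, 0.71, 0.55, 0.274])

# "Cum AVG @128k" read as the plain mean of the scores up to 128k.
cum_avg = scores.mean()

# "AUC @128k" read as trapezoidal area under score vs. log2(context length),
# normalized by the log-length span so the result stays on a 0-1 scale.
x = np.log2(lengths)
auc = np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(x)) / (x[-1] - x[0])
```

Under either reading, a model that holds its score at long contexts scores higher than one that collapses at 128k, matching the GLM-5.1 vs K2.6 gap described.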
CBir
CBir@c__bir·
@arcprize can we have an ARC game app where one can solve random daily challenges from any of the public ARC puzzles until all are completed? Would be fun and would reach more people 😎 @fchollet
0
0
0
89
ARC Prize
ARC Prize@arcprize·
ARC-AGI-3 Efficiency

ARC-AGI-3 offers the first formal measure of learning efficiency in the ARC-AGI series. Efficiency measures not just whether a level is completed, but how many actions it took relative to the human baseline. This connects directly to the definition of intelligence: skill-acquisition efficiency. The chart below shows the distribution of human per-level efficiency across the public demo set.
ARC Prize tweet media
5
2
50
4.2K
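The tweet describes efficiency as actions taken relative to the human baseline without giving a formula; a minimal sketch of one natural per-level score is below. The ratio form, the cap at 1.0, and the zero-for-unsolved convention are assumptions, not ARC Prize's published definition.

```python
def level_efficiency(agent_actions: int, human_baseline_actions: int,
                     solved: bool) -> float:
    """Assumed per-level efficiency: 1.0 means at-or-better-than the human
    baseline action count; 0.0 means the level was not completed at all."""
    if not solved or agent_actions <= 0:
        return 0.0
    return min(1.0, human_baseline_actions / agent_actions)

# Example: human baseline of 40 actions, agent needed 80 -> efficiency 0.5.
print(level_efficiency(80, 40, solved=True))
```

Averaging this score across levels would give one overall learning-efficiency number in the spirit of the post.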
ARC Prize
ARC Prize@arcprize·
ARC-AGI-3 Human Baseline Dataset

Today we're open-sourcing the ARC-AGI-3 Human Baseline. This is the most exhaustive human testing study in the ARC-AGI series. Every environment was solved by at least 2 people (many by more) from the general public, with no prior training.
21
73
541
160.8K
CBir
CBir@c__bir·
@fchollet bit of random deviations + curiosity (fun of exploring) we know there?
0
0
1
10
François Chollet
François Chollet@fchollet·
Simply retrieving a reasoning trace looks a lot like human reasoning, until it's time to navigate uncharted territory. If you memorized all reasoning traces of humans from 10,000 BC, you could automate their lives but you could not invent modern civilization.
65
48
564
39.4K
CBir
CBir@c__bir·
@Teknium @0xSimony what's the best inference stack for Hermes Gemma 31B agents? LM Studio with a multi-access local server, or vLLM?
1
0
1
45
Simony
Simony@0xSimony·
Hermes or OpenClaw?
52
0
32
19.7K
CBir
CBir@c__bir·
@digitalix why is vLLM so much worse than LM Studio in terms of KV cache / max context size?
0
0
0
35
Alex Ziskind
Alex Ziskind@digitalix·
New vid coming shortly
Alex Ziskind tweet media
31
2
131
17.5K
CBir
CBir@c__bir·
@karpathy are you using obsidian cli?
1
0
19
96.7K
Andrej Karpathy
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
2.8K
6.9K
57.6K
20.6M
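The "small and naive search engine over the wiki" mentioned in the post could look something like the sketch below: rank each .md file by query-term frequency. This is an assumed reconstruction for illustration, not Karpathy's actual script; the directory layout and scoring are hypothetical.

```python
from pathlib import Path

def search_wiki(wiki_dir: str, query: str, top_k: int = 5):
    """Naive term-frequency search over a directory of markdown files.
    Returns (score, path) pairs, best match first."""
    terms = query.lower().split()
    scored = []
    for md in Path(wiki_dir).rglob("*.md"):
        text = md.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(text.count(t) for t in terms)
        if score:
            scored.append((score, str(md)))
    return sorted(scored, reverse=True)[:top_k]
```

Exposed as a CLI, a tool like this is easy to hand to an LLM agent as the post describes: the agent issues a query, gets back candidate files, then reads the top hits directly.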
Demis Hassabis
Demis Hassabis@demishassabis·
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use - happy building!
Demis Hassabis tweet media
328
888
8K
979.3K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing Gemma 4, our series of open weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B dense model!
Logan Kilpatrick tweet media
287
598
6.2K
521.6K
CBir
CBir@c__bir·
@UnslothAI Is it performing better than default Qwen 27B? Got some benchmarks to throw at it @UnslothAI?
1
0
2
3K
Unsloth AI
Unsloth AI@UnslothAI·
This model has been #1 trending for 3 weeks now. It's Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model: huggingface.co/Jackrong/Qwen3…
Unsloth AI tweet media
88
228
2.8K
207.9K
CBir
CBir@c__bir·
@deedydas is the code open source? want to test with 4B and 9B LLMs
0
0
0
62
Deedy
Deedy@deedydas·
Karpathy's Autoresearch pushed my vibecoded Rust chess engine AI from "expert" to a top 50 grandmaster, a #311 chess engine. It ran over 70 experiments on its own and tried to hill climb to the top ELO score it could, landing at 2718!
98
230
3.5K
377.6K
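The experiment loop described (run experiments, keep only changes that raise the measured ELO) is classic greedy hill climbing. A minimal sketch is below; the parameters and the `evaluate` function are hypothetical stand-ins, since a real run would measure rating by playing games against reference engines.

```python
import random

def evaluate(params):
    # Hypothetical smooth stand-in for "measured ELO", peaking at
    # depth=12, aggression=0.5. A real evaluation would play rated games.
    return (2718
            - 10 * (params["depth"] - 12) ** 2
            - 400 * (params["aggression"] - 0.5) ** 2)

def hill_climb(params, steps=70, seed=0):
    """Greedy hill climb: mutate one parameter, keep only improvements."""
    rng = random.Random(seed)
    best, best_score = dict(params), evaluate(params)
    for _ in range(steps):
        cand = dict(best)
        if rng.random() < 0.5:
            cand["depth"] += rng.choice([-1, 1])
        else:
            cand["aggression"] += rng.uniform(-0.1, 0.1)
        score = evaluate(cand)
        if score > best_score:          # accept only strict improvements
            best, best_score = cand, score
    return best, best_score

best, elo = hill_climb({"depth": 8, "aggression": 0.2})
```

Because only improvements are accepted, the final score can never fall below the starting configuration's score; escaping local optima would need restarts or a more exploratory search.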
CBir
CBir@c__bir·
@kepano @obsdmd Obsidian mobile is really useful nowadays - thx 😎 @kepano - auto full screen feature is one of the good ones
0
0
0
19
Elon Musk
Elon Musk@elonmusk·
@beffjezos Entropy, entropy, no escaping that for me.
929
328
3.9K
335.3K
CBir
CBir@c__bir·
@TheRealAdamG co learning - of concepts has huge potential 😎
0
0
0
48
Adam.GPT
Adam.GPT@TheRealAdamG·
openai.com/index/new-ways… "Today, we’re making learning these [math and science] concepts in ChatGPT even more interactive with new dynamic visual explanations. Starting with more than 70 core math and science concepts, ChatGPT will guide learners by showing how formulas, variables, and relationships behave in real time. These experiences will be available globally across all plans starting today."
45
110
980
104.2K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Say hello to Gemini Embedding 2, our new SOTA multimodal model that lets you bring text, images, video, audio, and docs into the same embedding space! 👀
Logan Kilpatrick tweet media
273
451
5.5K
858.4K
Demis Hassabis
Demis Hassabis@demishassabis·
Ten years ago, AlphaGo’s legendary match in Seoul heralded the start of the modern era in AI. Its famous ‘Move 37’ signaled to us that AI techniques were ready to tackle real-world problems in areas like science - and ideas inspired by these methods are critical to building AGI
175
503
3.6K
708.4K
CBir
CBir@c__bir·
@ficlive curious about Qwen3.5 35BA3B and 27B Scores 👀
0
0
0
29
Fiction.live
Fiction.live@ficlive·
qwen-3.5 plus and qwen3-max-thinking and opus 4.6
Fiction.live tweet media
11
15
143
22.8K