Jd3d
@Jd3d4
60 posts
Joined March 2019
183 Following · 4 Followers
Dillon Uzar
Dillon Uzar@DillonUzar·
Context Arena Update: Added @MoonshotAI's Kimi K2.5 to the MRCR leaderboards (2-, 4-, 8-needle)!

K2.5 is a major step up from K2 and trades blows with @Google's Gemini 3 Flash (base), beating it on 4- and 8-needle retrieval despite being half the cost.

moonshotai/kimi-k2.5:thinking results (@ 128k):
2-Needle: AUC 81.9% (vs Gemini 3 Flash: 85.5%), Pointwise 81.9% (vs K2: 52.1%)
4-Needle: AUC 55.6% (vs Gemini 3 Flash: 50.4%), Pointwise 56.3%
8-Needle: AUC 30.4% (vs Gemini 3 Flash: 29.3%), Pointwise 26.9%

Full data: contextarena.ai
@Kimi_Moonshot @GoogleDeepMind
Dillon Uzar tweet media
6 replies · 3 reposts · 29 likes · 2K views
Jd3d
Jd3d@Jd3d4·
@ben_burtenshaw @huggingface This is great! Can you add additional benchmarks like: MRCR v2, SWE-Bench Pro, ARC-AGI 2, OSWorld, GDPval-AA, Terminal-Bench Hard, SciCode, AA-Omniscience, CritPt
0 replies · 0 reposts · 1 like · 143 views
Ben Burtenshaw
Ben Burtenshaw@ben_burtenshaw·
Eval scores in 2026 are broken. MMLU at 91%+, GSM8K at 94%+, yet models still can't handle basic multi-step tasks. And reported scores don't even agree across model cards, papers, and platforms.

We just shipped Community Evals on @huggingface:
- Benchmark datasets now host live leaderboards (MMLU-Pro, GPQA, HLE)
- Scores live in model repos as versioned YAML
- Anyone can submit evals to any model via PR, without merging
- Verified badges for reproducible runs via Inspect AI

This won't fix saturation or stop test set contamination. But it makes the game visible: what was evaluated, how, when, and by whom. Done trusting black-box leaderboards. Time to decentralize evals.
10 replies · 11 reposts · 92 likes · 19.3K views
Bindu Reddy
Bindu Reddy@bindureddy·
Finding it super hard to shut up about this new thing I am testing! So much so that we are going to embargo it - that is, you can't talk about it even if you are invited to test it. The best things are built when you can combine multiple frontier LLMs with insanely good infrastructure.
37 replies · 18 reposts · 150 likes · 18.4K views
Anuja U
Anuja U@heyanuja·
AidanBench Scores 🥳
> some of these results might raise some eyebrows and we implore you to take a look at the data on our website: aidanbench.com
> AidanBench is the brainchild of me, James, and Aidan & VERY soon we’ll be dropping officially on arXiv, stay tuned!
Anuja U tweet media
17 replies · 10 reposts · 166 likes · 27.7K views
Topaz Labs
Topaz Labs@topazlabs·
🚀Big news! We’re launching Project Starlight: the first-ever diffusion model for video restoration. Enhance old, low-quality videos to stunning high-resolution. This is our biggest leap since Video AI was first launched. Like & comment Starlight 👇 to get early-access!
2.2K replies · 996 reposts · 10K likes · 868K views
Jd3d
Jd3d@Jd3d4·
@Andercot Note: the entire article appears twice in a row.
0 replies · 0 reposts · 1 like · 162 views
Andrew McCarthy
Andrew McCarthy@AJamesMcCarthy·
What do you think is the most underrated sci-fi movie? I have a couple hours free tomorrow and I’m thinking I want to turn off my brain for a bit.
2.6K replies · 89 reposts · 1.9K likes · 469.6K views
adi
adi@adonis_singh·
give me build ideas to try. Make it challenging; the models are getting quite good.
55 replies · 3 reposts · 88 likes · 7.4K views
Jd3d
Jd3d@Jd3d4·
@_xjdr Note that NanoGPT is only a 127M parameter model, so it's around 10x smaller than the GPT-2 1.5B that was deemed 'too dangerous to release'. I think it takes around 14 hours to train the 1.5B on 8xH100s (about $233 worth of compute).
0 replies · 0 reposts · 2 likes · 94 views
xjdr
xjdr@_xjdr·
i think it's worth taking a moment to put into perspective how cool this work is. GPT-2 is really what the entire OpenAI empire was built on / was deemed too dangerous to release a few short years ago, and it is now reproducible in less than 8 min on a single (large) machine
Keller Jordan@kellerjordan0

New NanoGPT training speed record: 3.28 FineWeb val loss in 7.23 minutes on 8xH100
Previous record: 7.8 minutes
Changelog:
- Added U-net-like connectivity pattern
- Doubled learning rate
This record is by @brendanh0gan

12 replies · 33 reposts · 503 likes · 33.2K views
Jd3d
Jd3d@Jd3d4·
@phill__1 Note that the Llama 3 70B Instruct version gets 81.7% on HumanEval. It remains to be seen what the 3.1 Instruct will get on it.
0 replies · 0 reposts · 1 like · 101 views
Phil
Phil@phill__1·
Llama 3.1 70B seems like the most interesting model launching tomorrow. HumanEval jumped from 39% to 79% between llama 3 and 3.1 70B
10 replies · 15 reposts · 161 likes · 15.9K views
Jd3d
Jd3d@Jd3d4·
@ml_perception Can you clarify why the 8B version has a March 2023 knowledge cutoff instead of December 2023?
0 replies · 0 reposts · 0 likes · 323 views
Jd3d
Jd3d@Jd3d4·
@mixedrealityTV Yes so good! You should read the books too, they are one of my favorite trilogies.
0 replies · 0 reposts · 1 like · 53 views
Sebastian Ang
Sebastian Ang@mixedrealityTV·
3 Body Problem. Wow.
6 replies · 0 reposts · 16 likes · 1.6K views
Jd3d
Jd3d@Jd3d4·
@mixedrealityTV Quest 2/3 have Auto Wake. You can turn it on or off in the settings.
1 reply · 0 reposts · 1 like · 100 views
Sebastian Ang
Sebastian Ang@mixedrealityTV·
Putting the Quest 3 on after prolonged AVP usage: you wonder why it doesn’t turn on. Right…you have to push a button to do so! Putting it on is not good enough to show your intent of using it. #avp #quest3
3 replies · 0 reposts · 5 likes · 1.3K views
Jd3d
Jd3d@Jd3d4·
@casper_hansen_ Keep in mind Gemma is a significantly larger model than Mistral 7B. Much closer to 8B. That makes a big difference.
0 replies · 0 reposts · 0 likes · 106 views
Casper Hansen
Casper Hansen@casper_hansen_·
it seems likely to me, based on the attached chart, that:
- mistral was trained on 4T tokens
- gemma was trained on 6T tokens
- yi was trained on 3T tokens
Casper Hansen tweet media
2 replies · 0 reposts · 26 likes · 3.4K views
Jd3d
Jd3d@Jd3d4·
@JeffDean Can you make a statement on when this will be available for paying Gemini Advanced customers?
0 replies · 0 reposts · 0 likes · 69 views
Jeff Dean
Jeff Dean@JeffDean·
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length

Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long context capabilities, supporting millions of tokens of multimodal input. The multimodal capabilities of the model mean you can interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more.

Gemini 1.5 was built by an amazing team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google. @OriolVinyals (my co-technical lead for the project) and I are incredibly proud of the whole team, and we’re so excited to be sharing this work and what long context and in-context learning can mean for you today!

There’s lots of material about this, some of which is linked below.
Main blog post: blog.google/technology/ai/…
Technical report: “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context” goo.gle/GeminiV1-5
Videos of interactions with the model that highlight its long context abilities:
Understanding the three.js codebase: youtube.com/watch?v=SSnsmq…
Analyzing a 45 minute Buster Keaton movie: youtube.com/watch?v=wa0MT8…
Apollo 11 transcript interaction: youtube.com/watch?v=LHKL_2…

Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on these blogs:
Google for Developers blog: developers.googleblog.com/2024/02/gemini…
Google Cloud blog: cloud.google.com/blog/products/…

We’ll also introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model. Early testers can try the 1 million token context window at no cost during the testing period.

We’re excited to see what developers’ creativity unlocks with a very long context window. Let me walk you through the capabilities of the model and what I’m excited about!
Jeff Dean tweet media
184 replies · 1.1K reposts · 6K likes · 1.7M views
Jd3d
Jd3d@Jd3d4·
@ylecun Back in December I told the LocalLlama community on reddit that Llama3 was coming Feb 2024. My post was seen by 15k people and I don't want to let them down so could you see it in your heart to release maybe just the smallest Llama3 model before March? Thanks in advance!
0 replies · 0 reposts · 0 likes · 91 views
Jd3d
Jd3d@Jd3d4·
@PranMan @levelsio The CO2 reduction in the paper you linked to is for a single plant in a 1-cubic-meter enclosure over a duration of 8 hours. It's the time duration that's the big problem here. If you introduced a human, they would overwhelm that plant, and even 100 plants would not be enough.
0 replies · 0 reposts · 0 likes · 9 views
Jd3d reposted
MrBeast
MrBeast@MrBeast·
I’m gonna give 10 random people that repost this and follow me $25,000 for fun (the $250,000 my X video made). I’ll pick the winners in 72 hours
383.7K replies · 2.6M reposts · 1.9M likes · 282.1M views
Jd3d
Jd3d@Jd3d4·
@burny_tech An NVIDIA H200 does about 38 terabits/s of memory bandwidth.
0 replies · 0 reposts · 3 likes · 498 views
Burny - Effective Curiosity
Burny - Effective Curiosity@burny_tech·
This really feels like singularity.

New ultra-high speed processor to advance AI, driverless vehicles and more, operating more than 10,000 times faster than typical electronic processors (which operate in gigabytes/s), at a record 17 terabits/s. The system processes 400,000 video signals concurrently, performing 34 functions simultaneously that are key to object edge detection, edge enhancement and motion blur.

"Photonic signal processor based on a Kerr microcomb for real-time video image processing"
Burny - Effective Curiosity tweet media
10 replies · 34 reposts · 240 likes · 38.1K views
Jd3d
Jd3d@Jd3d4·
@ylecun I think you meant to say that PaLM2 was trained on 2 trillion tokens (not billion)?
0 replies · 0 reposts · 0 likes · 61 views
Yann LeCun
Yann LeCun@ylecun·
Dear journalists, it makes absolutely no sense to write: "PaLM 2 is trained on about 340 billion parameters. By comparison, GPT-4 is rumored to be trained on a massive dataset of 1.8 trillion parameters."

It would make more sense to write: "PaLM 2 possesses about 340 billion parameters and is trained on a dataset of 2 billion tokens (or words). By comparison, GPT-4 is rumored to possess a massive 1.8 trillion parameters trained on untold trillions of tokens."

Parameters are coefficients inside the model that are adjusted by the training procedure. The dataset is what you train the model on. Language models are trained with tokens that are subword units (e.g. prefix, root, suffix). Saying "trained a dataset of X billion parameters" reveals that you have absolutely no understanding of what you're talking about.
132 replies · 518 reposts · 4.5K likes · 1.1M views