erogol (@erogol)

8.6K posts

Doing ML | Web - https://t.co/yxKAwSSkgR | Substack - https://t.co/W9Qg4M3AZg

Joined October 2008
581 Following · 1.5K Followers

Pinned Tweet
erogol (@erogol):
XTTS is still being downloaded almost 5M times every month, 2.1M on HF alone. That's more than many recent hyped models. I hope people use it well, enough to make the burnout I'm still recovering from worth it. Coqui has been one of the most successful broke startups.
[image]
erogol (@erogol):
@jeremyphoward I found Gemini tamer and more like an assistant. However, it agrees with everything.
Jeremy Howard (@jeremyphoward):
Opus & Sonnet 4.6 haven't been a great hit for most of my work, or our customers, since (as warned in their tech report) they're over-enthusiastic about agentically taking over rather than letting the human lead. Any suggestions for competent models that are patient followers?
erogol (@erogol):
I was vibing on a few LLM projects with zero visibility into what was actually happening: silent retry bugs burning tokens, wrong API keys, traffic to dead endpoints. I couldn't debug any of it, so I built this: github.com/erogol/toklog
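The tweet names exactly the failure mode such a tool targets: retries happening silently inside a wrapper, with no record of how many tokens or attempts they burned. toklog's actual API isn't shown here, so the sketch below is not its interface; it is a minimal stdlib illustration of the idea, a decorator that retries a flaky call but logs every attempt (the `flaky_llm_call` function and its error are made up for the demo):

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("llm-calls")

def logged_retries(max_attempts=3, delay=0.0):
    """Decorator: retry a flaky call, but log every attempt instead of failing silently."""
    def wrap(fn):
        def inner(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    result = fn(*args, **kwargs)
                    log.info("call=%s attempt=%d status=ok", fn.__name__, attempt)
                    return result
                except Exception as exc:
                    log.warning("call=%s attempt=%d error=%s", fn.__name__, attempt, exc)
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return inner
    return wrap

calls = {"n": 0}

@logged_retries(max_attempts=3)
def flaky_llm_call(prompt):
    """Stand-in for an LLM API call that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("dead endpoint")
    return "ok: " + prompt

print(flaky_llm_call("hello"))  # succeeds on the third attempt, with two warnings logged
```

With visibility like this, the "silent retry burning tokens" case shows up as a run of warning lines instead of an unexplained bill.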
erogol retweeted
Hugging Models (@HuggingModels):
Meet XTTS-v2: a text-to-speech model that's changing how we create voice content. It generates natural, expressive speech from text, supporting multiple languages and voices. With over 6.7M downloads, it's clearly a community favorite!
[image]
erogol (@erogol):
While working on my personal AI, I shared a screenshot of a workout and it automatically added it to my Garmin as a workout plan. Garmin people will know what that means. That's AGI to me :)
erogol (@erogol):
Machine Learns 66 is out. This issue is heavy on model architecture + training tricks:
• Nemotron v3
• Mamba-3
• Attention Residuals
• LM head as a gradient bottleneck
• Fish Audio S2
• speculative decoding models
Full issue 👇 erogol.substack.com/p/machine-lear…
erogol (@erogol):
Machine Learns #65 🤖📬 Steerling-8B (causal diffusion + interp), LK Losses for speculative decoding, MLRA + Untied Ulysses for KV/memory, SD-MoE on "fake experts", plus DashengTokenizer + FlexiCodec/MSR-Codec + MeanVoiceFlow. erogol.substack.com/p/machine-lear…
erogol retweeted
Sakana AI (@SakanaAILabs):
We're excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/

By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.

This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore.

Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t…
Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

[GIF]
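The core mechanism described above, a hypernetwork that maps a task embedding to LoRA factors in a single forward pass, can be sketched at toy scale. This is not Sakana's architecture: the dimensions, the random untrained "hypernetwork" weights, and the `hyper_lora` helper are all placeholders for illustration, so the produced adapter is shape-correct but not meaningful. The point is only the data flow: task embedding → flattened (A, B) → low-rank update added to a frozen weight, with no per-task optimization loop.

```python
import random

random.seed(0)

d_in, d_out, r, d_task = 4, 4, 2, 3  # toy dims: LoRA rank r, task-embedding size d_task

def matmul(A, B):
    """Plain-Python matrix multiply for the toy sizes used here."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Frozen base weight W (d_out x d_in) of some layer in the LLM.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# "Hypernetwork": a linear map from the task embedding to the flattened LoRA
# factors. In Text-to-LoRA this map is meta-trained; here it is random, so only
# the shapes and the single-forward-pass flow are illustrative.
n_lora = r * d_in + d_out * r
H = [[random.gauss(0, 0.1) for _ in range(d_task)] for _ in range(n_lora)]

def hyper_lora(task_emb):
    """One forward pass: task embedding -> (A, B) LoRA factors."""
    flat = [sum(h * t for h, t in zip(row, task_emb)) for row in H]
    A = [flat[i * d_in:(i + 1) * d_in] for i in range(r)]                        # r x d_in
    B = [flat[r * d_in + i * r: r * d_in + (i + 1) * r] for i in range(d_out)]   # d_out x r
    return A, B

task_emb = [1.0, -0.5, 0.2]  # stands in for an encoded task description or document
A, B = hyper_lora(task_emb)
delta = matmul(B, A)          # low-rank update, d_out x d_in
W_adapted = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

Amortization is visible even in the toy: once `H` exists, adapting to a new task is one cheap forward pass through `hyper_lora`, not a fine-tuning run.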
erogol (@erogol):
For AI thinking tokens, Chinese and Japanese could be ~50% more token-efficient than English, since they have higher information density. Same information, fewer tokens, less money and energy. It might even give those companies a big edge in the long term.
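The density claim is easy to eyeball at the character level. The sketch below (sentence pairs are my own rough translations, not from the tweet) compares character and UTF-8 byte counts for parallel English/Chinese sentences. Note the hedge: characters are only a proxy; actual token counts depend entirely on the tokenizer's vocabulary and CJK coverage, and the byte counts show Chinese is not automatically cheaper in bytes, since each character costs 3 bytes in UTF-8.

```python
pairs = [
    # (English, Chinese) rough translations of the same sentence
    ("Artificial intelligence models process text as tokens.",
     "人工智能模型将文本处理为词元。"),
    ("Fewer tokens mean less compute and lower cost.",
     "更少的词元意味着更少的计算和更低的成本。"),
]

for en, zh in pairs:
    print(f"en: {len(en):3d} chars  {len(en.encode('utf-8')):3d} bytes   {en}")
    print(f"zh: {len(zh):3d} chars  {len(zh.encode('utf-8')):3d} bytes   {zh}")
```

A tokenizer that maps roughly one CJK character to one token would turn this character-count gap directly into a token-count gap; one that splits characters into bytes would erase it.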
erogol retweeted
Larry Dial (@classiclarryd):
New NanoGPT Speedrun WR at 89.1 (-0.7s) from @sisovicm, with a technique called partitioned hyperconnections. The learned weights reveal that the final attn modules prefer to ignore the prediction vectors generated by the final MLPs, and instead query representations from slightly earlier layers. github.com/KellerJordan/m…
[image]
erogol (@erogol):
isn't "developing an AI agent framework" an oxymoron?
erogol (@erogol):
No bad intentions, but Gemini 3.1 Pro is quite dumb as an agent. It confuses things in the context a lot.
erogol (@erogol):
Here is a great tip: keep your openclaw memories as emails. Works like magic 🪄
erogol (@erogol):
@michalwols nice! I'm also vibe coding something atm. Please ping me when it's out. I don't use WB, but I'm happy to check it.
Michal Wolski (@michalwols):
@erogol I hacked together a multimodal logging tool that writes to ducklake or lance/parquet files and supports custom monitors for things like outlier or drift detection. Will probably open source it soon with a wandb/datadog-like UI on top.
erogol (@erogol):
There is space for a TensorBoard-like tool with AI that monitors training runs, detects irregularities and possible improvements, and informs you about the overall health of the run. Let me know if you build it; I'm your first customer.
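The "detects irregularities" part of that wish has a simple non-AI baseline worth naming: outlier detection on the loss curve. The sketch below is my own minimal illustration, not any existing tool's method; it flags a training step whose loss is a z-score outlier against a trailing window (window size, threshold, and the synthetic loss curve are all arbitrary choices for the demo).

```python
from collections import deque
from statistics import mean, stdev

def spike_alerts(losses, window=20, z_thresh=4.0):
    """Flag steps whose loss is a z-score outlier vs. a trailing window."""
    recent = deque(maxlen=window)
    alerts = []
    for step, loss in enumerate(losses):
        if len(recent) >= 5:  # need a few points before stats are meaningful
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and (loss - mu) / sigma > z_thresh:
                alerts.append((step, loss))
        recent.append(loss)
    return alerts

# Smoothly decaying synthetic loss with one injected spike at step 60.
losses = [2.0 * (0.99 ** i) for i in range(100)]
losses[60] = 8.0
print(spike_alerts(losses))  # → [(60, 8.0)]
```

An AI layer on top would explain *why* a flagged step spiked (bad batch, LR schedule, data issue); the statistical trigger itself is the cheap part.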