Craig Pfeifer

22.4K posts

@aCraigPfeifer

currently AI integration @ TCG, ex-@lightningai, ex-@Mitrecorp, @purduecs, PhD dropout @umbccsee

Flyover Country, USA · Joined September 2011
3.4K Following · 725 Followers
Pinned Tweet
Craig Pfeifer @aCraigPfeifer
@vboykis
Producer: Pitch me.
Me: It's a psych horror about a software engineer who will be auctioned off to someone who will inhabit his body. His only clues are commit messages in a code repository. It's called "Git Out."
Producer: Get out.
Me: no, git out. Git is a
Producer: Get out.
Craig Pfeifer reposted
Benjamin Van Durme @ben_vandurme
JHU mmBERT extended from 8k to 32k token length by vLLM Semantic Router Team. Cutting edge results on 1,800+ languages, now with longer context! huggingface.co/llm-semantic-r…
Craig Pfeifer reposted
Akshay 🚀 @akshay_pachaar
A dead-simple trick to improve LLM performance: just repeat your prompt twice.

No fancy prompting techniques, no chain-of-thought, just plain repetition. Google researchers tested this across Gemini, GPT, Claude, and Deepseek, and the results were surprisingly good.

Here's why it works: LLMs are causal, meaning tokens can only see what came before them. When you ask a question after providing context, the question tokens never "saw" the full picture. By repeating the prompt, every token gets to attend to every other token during prefill.

The best part:
- No increase in output length
- No increase in latency
- Works as a simple drop-in replacement

On one task, Gemini Flash-Lite jumped from 21% to 97% accuracy just by repeating the input.

Important note: this helps most when reasoning is disabled. If you're already using "think step-by-step," the gains are mostly neutral, since reasoning models tend to repeat the prompt internally anyway.

Paper: "Prompt Repetition Improves Non-Reasoning LLMs" from Google Research.

Sometimes the simplest ideas win. Link to the paper in the next tweet.
Akshay 🚀 tweet media
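The repetition trick above needs no library support; it is just a prompt transform before the model call. A minimal sketch (the blank-line separator between the two copies is my assumption; the paper may join them differently):

```python
def repeat_prompt(prompt: str) -> str:
    """Return the prompt duplicated, so that during prefill the tokens of
    the second copy can attend to the entire first copy."""
    return f"{prompt}\n\n{prompt}"
```

The doubled string is then sent to the model as-is; output length is unchanged because the model still generates a single answer.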
Craig Pfeifer reposted
Akshay 🚀 @akshay_pachaar
Stanford researchers built a new prompting technique! By adding ~20 words to a prompt, it:
- boosts LLM creativity by 1.6-2x
- raises human-rated diversity by 25.7%
- beats a fine-tuned model without any retraining
- restores 66.8% of the creativity the LLM lost to alignment

Let's understand why and how it works:

Post-training alignment methods like RLHF make LLMs helpful and safe, but they unintentionally cause mode collapse: the model favors a narrow set of predictable responses.

This happens because of typicality bias in human preference data. When annotators rate LLM responses, they naturally prefer answers that are familiar, easy to read, and predictable. The reward model then learns to boost these "safe" responses, aggressively sharpening the probability distribution and killing creative output.

But here's the interesting part: the diverse, creative model isn't gone. After alignment, the LLM still has two personalities: the original pre-trained model with rich possibilities, and the safety-focused aligned model.

Verbalized Sampling (VS) is a training-free prompting strategy that recovers the diverse distribution learned during pre-training. The idea is simple: instead of prompting "Tell me a joke" (which triggers the aligned personality), you prompt: "Generate 5 responses with their corresponding probabilities. Tell me a joke."

By asking for a distribution instead of a single instance, you force the model to tap into its full pre-trained knowledge rather than defaulting to the most reinforced answer.

Results show verbalized sampling enhances diversity by 1.6-2.1x over direct prompting while maintaining or improving quality. Variants like VS-based Chain-of-Thought and VS-based Multi push diversity even further.

You can find the paper link in the next tweet.

👉 Over to you: What other methods can be used to improve LLM diversity?
Akshay 🚀 tweet media
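The Verbalized Sampling rewrite described above is plain string templating. A minimal sketch (the prompt wording is taken from the tweet; the helper name and default k=5 are mine):

```python
def verbalized_sampling(task: str, k: int = 5) -> str:
    """Wrap a task so the model is asked for a distribution of k candidate
    responses with verbalized probabilities, instead of a single answer."""
    return f"Generate {k} responses with their corresponding probabilities. {task}"
```

For example, `verbalized_sampling("Tell me a joke.")` produces the exact prompt quoted in the tweet; the model's k responses can then be sampled from or filtered downstream.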
Craig Pfeifer reposted
Sebastian Raschka @rasbt
Just read Apple's "OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework". Similar to OLMo, it's refreshing to see an LLM paper that shares details on the architecture, training methods, and training data.

Let's start with the most interesting tidbits:
- OpenELM comes in 4 relatively small and convenient sizes: 270M, 450M, 1.1B, and 3B
- OpenELM performs slightly better than OLMo even though it's trained on 2x fewer tokens
- The main architecture tweak is a layer-wise scaling strategy

Sharing details is not the same as explaining them, which is what research papers aimed to do when I was a graduate student. For instance, they sampled a relatively small subset of 1.8T tokens from various publicly available datasets (RefinedWeb, RedPajama, The PILE, and Dolma). This subset was 2x smaller than Dolma, which was used for training OLMo. What was the rationale for this subsampling, and what were the criteria?

The layer-wise scaling strategy (adopted from the "DeLighT: Deep and Light-weight Transformer" paper) is very interesting. I wish there were an ablation study training an LLM with and without this strategy on the same dataset. But those experiments are expensive, and I can understand why they didn't do it.

An interesting bonus that I didn't expect was that the researchers compared LoRA and DoRA (which I discussed a few weeks ago) for parameter-efficient finetuning! It turns out that there wasn't a noticeable difference between the two methods, though.

Anyways, great work, and big kudos to the researchers (and Apple) for sharing!
Sebastian Raschka tweet media
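For intuition on the layer-wise scaling idea mentioned above: instead of giving every transformer layer the same width, OpenELM (following DeLighT) interpolates per-layer multipliers for things like FFN width and attention-head count from the first layer to the last. A toy sketch of the linear interpolation (the function name and this simplified schedule are mine, not Apple's code):

```python
def layerwise_multipliers(n_layers: int, min_mult: float, max_mult: float) -> list[float]:
    """Linearly interpolate a width multiplier from min_mult (first layer)
    to max_mult (last layer), so early layers are narrower than late ones."""
    if n_layers == 1:
        return [max_mult]
    step = (max_mult - min_mult) / (n_layers - 1)
    return [min_mult + i * step for i in range(n_layers)]
```

Each layer's FFN dimension would then be something like `round(mult * d_model)`, so the parameter budget is allocated non-uniformly across depth rather than evenly.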
Craig Pfeifer reposted
Linus @thesephist
A while ago I complained here about persistent storage in Google Colab. Have been using @LightningAI Studios for a while now for:
- Full VSCode (incl. GH Copilot)
- Persisted files shared across notebooks
- Multi-GPU/node (!!)

It's been great. Feels like a remote ML workstation
Linus tweet media
Craig Pfeifer @aCraigPfeifer
1: did you hear bob was fired?
2: I didn't, what did they even do?
1: no one knows, maybe that's why they got fired

(Two weeks later)

2: oh, yeah bob kind of did a lot.
Craig Pfeifer @aCraigPfeifer
"How long have you been working in deep learning?" "Since import theano"
Joseph Fasano @Joseph_Fasano_
What line of poetry would you tattoo on your body?
Craig Pfeifer @aCraigPfeifer
Q: What song is the @IndianaUniv Computer Science Marching Band most famous for? A: String, String, String
Craig Pfeifer @aCraigPfeifer
@deliprao Also necessary vs sufficient. Models keep getting bigger, but what is necessary for different use cases? What is the 'right size' for different tasks? Different data sets? When does a small domain-specific model beat a large, general model?
Delip Rao e/σ @deliprao
Asking on behalf of a prospective #nlproc PhD student. If you were to start a PhD program today, what are some broader issues you would absolutely study? - Be specific (i.e., don’t just say “multimodal LLMs”, “interpretability in LLMs”) - Don’t be too specific either
Craig Pfeifer @aCraigPfeifer
@deliprao Representation learning. Everyone looks at what you can do with LLMs, but few understand what they actually are. Open the hood and poke around.
Craig Pfeifer @aCraigPfeifer
My favorite part of big data? Big debugging. Said no one ever.
Craig Pfeifer reposted
Jason Wei @_jasonwei
One pattern I noticed is that great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly.

Though not glamorous, manually examining data gives valuable intuitions about the problem. The canonical example here is Andrej Karpathy doing the ImageNet 2000-way classification task himself. And in the era of large language models, manually examining data is probably even more insightful, since completions are hard to evaluate via benchmarks.

In this spirit, I recently did a few days of pair programming with @hwchung27 where we were starting on a new problem. Instead of trying to replicate baselines and design new methods, we ran some evaluations and manually inspected them to gain insights.

We first paid about one day of overhead getting all the relevant information in a single UI so we could examine the data without having to click through multiple web pages. The second day, we spent an afternoon reading examples together and taking notes on the patterns that we noticed in the examples. ChatGPT generates long text, and we actually read the whole thing carefully, even if one example took 20 minutes to understand.

I think we both gained a deeper understanding of the problem that we could not have gotten from reading research papers.

(In 2018, for example, I helped pathologists label a lot of data to train a lung cancer classifier. After having manually labeled 200+ images (with pathologist correction), I'd probably gained a pathologist-level understanding at that one particular lung cancer classification task :))
Craig Pfeifer @aCraigPfeifer
TFW your interviewer says "we've built our own ML tech stack from the ground up"
GIF
Craig Pfeifer reposted
Alec Stapp @AlecStapp
The backstory to how GPS became freely available for civilian use 🤯
Alec Stapp tweet media
Craig Pfeifer reposted
hardmaru @hardmaru
TinyML and Efficient Deep Learning Computing, MIT 6.5940 (efficientml.ai)

"This course will introduce efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models, diffusion models, video recognition, and point cloud. This course will also cover topics about quantum machine learning. Students will get hands-on experience deploying large language models (e.g., LLaMA 2) on a laptop."
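Of the course topics listed above, quantization is the easiest to demo in a few lines. A toy symmetric per-tensor int8 quantizer (a generic illustration of the idea, not the course's code):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: scale so max |w| maps to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale
```

Storing int8 codes plus one scale cuts weight memory roughly 4x versus float32, at the cost of a rounding error of at most half a quantization step (scale/2) per weight.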