Craig Pfeifer

22.4K posts

@aCraigPfeifer

currently AI integration @ TCG, ex-@lightningai, ex-@Mitrecorp, @purduecs, PhD dropout @umbccsee

Flyover Country, USA · Joined September 2011
3.4K Following · 725 Followers
Pinned Tweet
Craig Pfeifer @aCraigPfeifer
@vboykis
Producer: Pitch me.
Me: It's a psych horror about a software engineer who will be auctioned off to someone who will inhabit his body. His only clues are commit messages in a code repository. It's called "Git Out."
Producer: Get out.
Me: no, git out. Git is a
Producer: Get out.
Craig Pfeifer reposted
Benjamin Van Durme @ben_vandurme
JHU mmBERT extended from 8k to 32k token length by vLLM Semantic Router Team. Cutting edge results on 1,800+ languages, now with longer context! huggingface.co/llm-semantic-r…
Craig Pfeifer reposted
Akshay 🚀 @akshay_pachaar
A dead-simple trick to improve LLM performance: just repeat your prompt twice.

No fancy prompting techniques, no chain-of-thought, just plain repetition. Google researchers tested this across Gemini, GPT, Claude, and Deepseek, and the results were surprisingly good.

Here's why it works: LLMs are causal, meaning tokens can only see what came before them. When you ask a question after providing context, the question tokens never "saw" the full picture. By repeating the prompt, every token gets to attend to every other token during prefill.

The best part:
- No increase in output length
- No increase in latency
- Works as a simple drop-in replacement

On one task, Gemini Flash-Lite jumped from 21% to 97% accuracy just by repeating the input.

Important note: this helps most when reasoning is disabled. If you're already using "think step-by-step," the gains are mostly neutral, since reasoning models tend to repeat the prompt internally anyway.

Paper: "Prompt Repetition Improves Non-Reasoning LLMs" from Google Research.

Sometimes the simplest ideas win. Link to the paper in the next tweet.
Akshay 🚀 tweet media
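The repetition trick above needs no library support; it is just a prompt transform before the model call. A minimal sketch (the blank-line separator between the two copies is my assumption; the paper may join them differently):

```python
def repeat_prompt(prompt: str) -> str:
    """Return the prompt duplicated, so that during prefill the tokens of
    the second copy can attend to the entire first copy."""
    return f"{prompt}\n\n{prompt}"
```

The doubled string is then sent to the model as-is; output length is unchanged because the model still generates a single answer.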
Craig Pfeifer reposted
Akshay 🚀 @akshay_pachaar
Stanford researchers built a new prompting technique! By adding ~20 words to a prompt, it:
- boosts LLM creativity by 1.6-2x
- raises human-rated diversity by 25.7%
- beats a fine-tuned model without any retraining
- restores 66.8% of the creativity the LLM lost to alignment

Let's understand why and how it works:

Post-training alignment methods like RLHF make LLMs helpful and safe, but they unintentionally cause mode collapse: the model favors a narrow set of predictable responses.

This happens because of typicality bias in human preference data. When annotators rate LLM responses, they naturally prefer answers that are familiar, easy to read, and predictable. The reward model then learns to boost these "safe" responses, aggressively sharpening the probability distribution and killing creative output.

But here's the interesting part: the diverse, creative model isn't gone. After alignment, the LLM still has two personalities: the original pre-trained model with rich possibilities, and the safety-focused aligned model.

Verbalized Sampling (VS) is a training-free prompting strategy that recovers the diverse distribution learned during pre-training. The idea is simple: instead of prompting "Tell me a joke" (which triggers the aligned personality), you prompt: "Generate 5 responses with their corresponding probabilities. Tell me a joke."

By asking for a distribution instead of a single instance, you force the model to tap into its full pre-trained knowledge rather than defaulting to the most reinforced answer.

Results show verbalized sampling enhances diversity by 1.6-2.1x over direct prompting while maintaining or improving quality. Variants like VS-based Chain-of-Thought and VS-based Multi push diversity even further.

You can find the paper link in the next tweet.

👉 Over to you: What other methods can be used to improve LLM diversity?
Akshay 🚀 tweet media
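The Verbalized Sampling rewrite described above is plain string templating. A minimal sketch (the prompt wording is taken from the tweet; the helper name and default k=5 are mine):

```python
def verbalized_sampling(task: str, k: int = 5) -> str:
    """Wrap a task so the model is asked for a distribution of k candidate
    responses with verbalized probabilities, instead of a single answer."""
    return f"Generate {k} responses with their corresponding probabilities. {task}"
```

For example, `verbalized_sampling("Tell me a joke.")` produces the exact prompt quoted in the tweet; the model's k responses can then be sampled from or filtered downstream.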
Craig Pfeifer reposted
Sebastian Raschka @rasbt
Just read Apple's "OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework". Similar to OLMo, it's refreshing to see an LLM paper that shares details on the architecture, training methods, and training data.

Let's start with the most interesting tidbits:
- OpenELM comes in 4 relatively small and convenient sizes: 270M, 450M, 1.1B, and 3B
- OpenELM performs slightly better than OLMo even though it's trained on 2x fewer tokens
- The main architecture tweak is a layer-wise scaling strategy

Sharing details is not the same as explaining them, which is what research papers aimed to do when I was a graduate student. For instance, they sampled a relatively small subset of 1.8T tokens from various publicly available datasets (RefinedWeb, RedPajama, The PILE, and Dolma). This subset was 2x smaller than Dolma, which was used for training OLMo. What was the rationale for this subsampling, and what were the criteria?

The layer-wise scaling strategy (adopted from the "DeLighT: Deep and Light-weight Transformer" paper) is very interesting. I wish there were an ablation study training an LLM with and without this strategy on the same dataset. But those experiments are expensive, and I can understand why they didn't do it.

An interesting bonus that I didn't expect was that the researchers compared LoRA and DoRA (which I discussed a few weeks ago) for parameter-efficient finetuning! It turns out that there wasn't a noticeable difference between the two methods, though.

Anyways, great work, and big kudos to the researchers (and Apple) for sharing!
Sebastian Raschka tweet media
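For intuition on the layer-wise scaling idea mentioned above: instead of giving every transformer layer the same width, OpenELM (following DeLighT) interpolates per-layer multipliers for things like FFN width and attention-head count from the first layer to the last. A toy sketch of the linear interpolation (the function name and this simplified schedule are mine, not Apple's code):

```python
def layerwise_multipliers(n_layers: int, min_mult: float, max_mult: float) -> list[float]:
    """Linearly interpolate a width multiplier from min_mult (first layer)
    to max_mult (last layer), so early layers are narrower than late ones."""
    if n_layers == 1:
        return [max_mult]
    step = (max_mult - min_mult) / (n_layers - 1)
    return [min_mult + i * step for i in range(n_layers)]
```

Each layer's FFN dimension would then be something like `round(mult * d_model)`, so the parameter budget is allocated non-uniformly across depth rather than evenly.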
Craig Pfeifer reposted
Linus @thesephist
A while ago I complained here about persistent storage in Google Colab. Have been using @LightningAI Studios for a while now for:
- Full VSCode (incl. GH Copilot)
- Persisted files shared across notebooks
- Multi-GPU/node (!!)

It's been great. Feels like a remote ML workstation
Linus tweet media
Craig Pfeifer @aCraigPfeifer
1: did you hear bob was fired?
2: I didn't, what did they even do?
1: no one knows, maybe that's why they got fired

(Two weeks later)

2: oh, yeah bob kind of did a lot.
Craig Pfeifer @aCraigPfeifer
"How long have you been working in deep learning?" "Since import theano"
Joseph Fasano @Joseph_Fasano_
What line of poetry would you tattoo on your body?
Craig Pfeifer @aCraigPfeifer
Q: What song is the @IndianaUniv Computer Science Marching Band most famous for? A: String, String, String
Craig Pfeifer @aCraigPfeifer
@deliprao Also necessary vs sufficient. Models keep getting bigger, but what is necessary for different use cases? What is the 'right size' for different tasks? Different data sets? When does a small domain-specific model beat a large, general model?
Delip Rao e/σ @deliprao
Asking on behalf of a prospective #nlproc PhD student. If you were to start a PhD program today, what are some broader issues you would absolutely study? - Be specific (i.e., don’t just say “multimodal LLMs”, “interpretability in LLMs”) - Don’t be too specific either
Craig Pfeifer @aCraigPfeifer
@deliprao Representation learning. Everyone looks at what you can do with LLMs, but few understand what they actually are. Open the hood and poke around.
Craig Pfeifer @aCraigPfeifer
My favorite part of big data? Big debugging. Said no one ever.
Craig Pfeifer reposted
Jason Wei @_jasonwei
One pattern I noticed is that great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly.

Though not glamorous, manually examining data gives valuable intuitions about the problem. The canonical example here is Andrej Karpathy doing the ImageNet 2000-way classification task himself. And in the era of large language models, manually examining data is probably even more insightful, since completions are hard to evaluate via benchmarks.

In this spirit, I recently did a few days of pair programming with @hwchung27 where we were starting on a new problem. Instead of trying to replicate baselines and design new methods, we ran some evaluations and manually inspected them to gain insights.

We first paid about one day of overhead getting all the relevant information in a single UI so we could examine the data without having to click through multiple web pages. The second day, we spent an afternoon reading examples together and taking notes on the patterns that we noticed in the examples. ChatGPT generates long text, and we actually read the whole thing carefully, even if one example took 20 minutes to understand.

I think we both gained a deeper understanding of the problem that we could not have gotten from reading research papers.

(In 2018, for example, I helped pathologists label a lot of data to train a lung cancer classifier. After having manually labeled 200+ images (with pathologist correction), I'd probably gained a pathologist-level understanding at that one particular lung cancer classification task :))
Craig Pfeifer @aCraigPfeifer
TFW your interviewer says "we've built our own ML tech stack from the ground up"
GIF
Craig Pfeifer reposted
Alec Stapp @AlecStapp
The backstory to how GPS became freely available for civilian use 🤯
Alec Stapp tweet media
Craig Pfeifer reposted
hardmaru @hardmaru
TinyML and Efficient Deep Learning Computing, MIT 6.5940 (efficientml.ai)

"This course will introduce efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models, diffusion models, video recognition, and point cloud. This course will also cover topics about quantum machine learning. Students will get hands-on experience deploying large language models (e.g., LLaMA 2) on a laptop."
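Of the course topics listed above, quantization is the easiest to demo in a few lines. A toy symmetric per-tensor int8 quantizer (a generic illustration of the idea, not the course's code):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: scale so max |w| maps to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale
```

Storing int8 codes plus one scale cuts weight memory roughly 4x versus float32, at the cost of a rounding error of at most half a quantization step (scale/2) per weight.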