James Barry

193 posts

@jamesarbarry

Research Scientist @ IBM Research | PhD from ADAPT Centre, Dublin City University

Dublin, Ireland · Joined March 2017
1.1K Following · 215 Followers
James Barry@jamesarbarry·
If any Irish speakers are interested in helping annotate some of the BLEnD examples into Irish, please let me know (can be as little as a few examples). We aim to have Irish included in the next release. huggingface.co/datasets/nayeo…
Stas Bekman@StasBekman·
At his GTC talk, @JordanNanos included an important slide showing that it's not about whether SLURM is better or worse than k8s; it's about using the right tool for the right job. The rough mapping:
- you want SLURM for pre- and post-training
- you want k8s for inference
Jordan Nanos@JordanNanos

Last week at @NVIDIA GTC I had a talk on what it takes to build a GPU cluster with @HPE and @HPE_Cray It breaks down into a 12 step program 🧵

James Barry@jamesarbarry·
@mark_l_watson Totally get the sentiment but sometimes it's good to be able to see the main functionality without the layers of abstraction of large libraries. I think the first example of tool calling agents I'd seen in vanilla python code was this helpful reference: github.com/ScaDS/BIDS-lec…
mark_l_watson@mark_l_watson·
Sort of sad, but I think I’ll shelve my current writing project: a book on building LLM-based agents from scratch, with separate book editions featuring examples in Common Lisp, Racket, Clojure, Python, and Haskell. The turning point came after spending a few days experimenting with Microsoft’s AutoGen v0.4.2, where I found it remarkably straightforward to create versatile agents without writing everything from the ground up. Sometimes, leveraging the innovations of others, while not as much fun, is far more effective than reinventing the wheel. I’m also quite impressed by Hugging Face’s smol agents, but AutoGen feels more effective, at least for right now. github.com/microsoft/auto…
James Barry@jamesarbarry·
@_philschmid @huggingface Been following your guides this year for distributed training, instruction tuning, model serving, they've all been so relevant and helpful. Keep it up!
Philipp Schmid@_philschmid·
The only fine-tuning guide you need for 2025 ‼️ Excited to share “How to fine-tune open LLMs in 2025 with @huggingface”, covering everything from Q-LoRA to Spectrum methods with a focus on optimization, efficiency, and distributed training. 👀 Fine-tuning still matters for specialized use cases despite better models - especially for consistency, domain expertise, controlling output style, or reducing costs. In this guide, you will learn how to:
🎯 Define good use cases for fine-tuning vs. prompting
🛠️ Set up your development environment using Hugging Face libraries
📚 Create and prepare datasets in conversation format
⚡ Use Q-LoRA for efficient 4-bit training or the Spectrum method to selectively fine-tune important layers
💨 Speed up training with Flash Attention and Liger Kernels
💻 Scale across multiple GPUs with DeepSpeed and accelerate
📊 Test and evaluate models with an evaluation harness
🔥 Run in production with TGI/vLLM
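For flavor, the Q-LoRA piece of that recipe boils down to roughly the following sketch. This assumes the peft and bitsandbytes libraries; the model id and hyperparameters are illustrative placeholders, not taken from the guide itself:

```python
# Illustrative Q-LoRA setup sketch (placeholder model id and values).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",              # placeholder model id
    quantization_config=bnb_config,
)
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, # common starting points
    target_modules="all-linear",            # adapt every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)  # only adapters are trainable
```

The frozen 4-bit base plus small trainable adapters is what makes single-GPU fine-tuning of 7B+ models feasible.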
James Barry@jamesarbarry·
IBM Research Europe (UK) are looking for interns in generative AI in sustainability: reasoning over multimodal data, foundation models in the climate/materials domain, and exploring agentic systems. Please consider applying here: careers.ibm.com/job/21206334/r…
James Barry@jamesarbarry·
@carrigmat Ah I see, yes the v0.3 Mistral Instruct model has new special tokens like [TOOL_CALL], [TOOL_RESULT] and their closing equivalents, so I can see how they would be interleaved in the chat, thanks!
Matthew Carrigan@carrigmat·
@jamesarbarry The chat templates are modified to handle this! Usually, it'll get inserted into the formatted text with something like: [TOOL_RESULTS] {"content": "22.0"} [/TOOL_RESULTS]
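The token layout described above can be sketched in plain Python. This is a hypothetical mini-formatter for illustration only, assuming Mistral-v0.3-style special tokens as quoted in the thread; the real chat template is a Jinja template and considerably more involved:

```python
import json

# Hypothetical mini-formatter mimicking the layout described in the
# tweet; NOT the actual Mistral chat template.
def format_message(message):
    role, content = message["role"], message["content"]
    if role == "user":
        return f"[INST] {content} [/INST]"
    if role == "assistant":
        return f"{content}</s>"
    if role == "tool":
        # Tool output is wrapped in the special tokens from the thread.
        return f"[TOOL_RESULTS] {json.dumps({'content': content})} [/TOOL_RESULTS]"
    raise ValueError(f"unknown role: {role}")

def format_chat(messages):
    return "".join(format_message(m) for m in messages)

chat = [
    {"role": "user", "content": "What's the temperature in Paris?"},
    {"role": "tool", "content": "22.0"},
]
print(format_chat(chat))
```

The point is that a `"role": "tool"` message becomes its own delimited span in the formatted text, rather than being aliased to a user or assistant turn.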
Matthew Carrigan@carrigmat·
Even the closed-source APIs often have messy documentation spread between "tool use" and "assistants" workflows. Figuring it out isn't easy! That's changed, though. Let's walk through a simple tool use process:
James Barry@jamesarbarry·
@carrigmat When we have a "role": "tool" message, does this get aliased as a "user" or "assistant" message in the chat history, or are the chat templates modified to handle these cases?
Matthew Carrigan@carrigmat·
Step 4: Call the tool function with the arguments the model requested, and add the result to the chat as well.
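Step 4 in plain Python: a sketch with a hypothetical `get_current_temperature` tool, where the model's requested call is shown as a hand-written dict rather than real parsed model output:

```python
import json

# Hypothetical tool the model can request (a real one would hit a weather API).
def get_current_temperature(location: str) -> float:
    return 22.0

tools = {"get_current_temperature": get_current_temperature}

chat = [{"role": "user", "content": "What's the temperature in Paris?"}]

# Pretend the model responded with this tool call (normally you'd parse
# it out of the generated text after apply_chat_template + generate).
tool_call = {"name": "get_current_temperature",
             "arguments": {"location": "Paris"}}
chat.append({"role": "assistant",
             "tool_calls": [{"type": "function", "function": tool_call}]})

# Step 4: run the requested tool and append the result to the chat.
result = tools[tool_call["name"]](**tool_call["arguments"])
chat.append({"role": "tool", "name": tool_call["name"],
             "content": json.dumps({"content": str(result)})})
```

From here the whole chat goes back through the template for the next generation step, so the model can read the tool result and answer the user.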
James Barry@jamesarbarry·
Had a great time presenting our Demo Paper, KnowledgeHub: An End-to-End Tool for Assisted Scientific Discovery at IJCAI 2024 in Jeju Island, Korea today!
Liliang Ren@liliang_ren·
Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin.😮 And it has an infinite context length with linear complexity.🤯 Paper: arxiv.org/abs/2406.07522 (1/6)
James Barry@jamesarbarry·
@francoisfleuret Reminds me of Norvig’s PAIP, which mentions a student who used to write one-letter names so his variables would be looked up faster, but “Every variable, regardless of its name, is just a memory location, and the time to access the location does not depend on the name of the variable.”
James Barry@jamesarbarry·
With #Nvidia stock rallying, competitors are no doubt trying to capture back some of that market share. Realistically, how big are Nvidia's hardware/software moats - how likely is widespread adoption of:
- AMD/ROCm
- Intel/Habana
- Apple/MLX
- Tesla/Dojo?
James Barry@jamesarbarry·
@jackminong @StasBekman True. Another problem I find with remote editing is that there can be latency when opening and saving files etc. (esp. if someone's running some heavy program on the remote), and if you're using emacs or vim where keystrokes feel automatic, the latency messes up the flow
Jackmin@jackminong·
@jamesarbarry @StasBekman This really depends on your connectivity to the remote machine. I have a machine in Berlin where I live and a machine in Kuala Lumpur. Running notebooks on the KL machine is basically infeasible.
Stas Bekman@StasBekman·
Is it true that Mac users have absolutely no way of doing NVidia GPU-based software development on their laptops and must connect to a remote server with GPUs? Why choose Mac to do ML development then? Isn't it counterproductive for fast development? I understand that even eGPU is a no go.
James Barry@jamesarbarry·
@StasBekman @jackminong Developing on a remote cluster is never as seamless as local. The VSCode remote extension can be quite finicky, with random disconnects and having to reopen your project on a login node and then again on a compute node. Alternatives such as Emacs in TRAMP mode aren't painless either
Stas Bekman@StasBekman·
I hear you that it's doable, Jackmin. From seeing my colleagues work, it doesn't look easy, especially if you have to use SLURM to first allocate a compute node, then figure out its hostname, etc., etc. - this is not fast. And if you spin up a rentable GPU, then you have to be careful to spin it down so you don't pay more. This doesn't sound like a very developer-friendly situation to me. I was just wondering if anybody is trying to solve it.
Sara Hooker@sarahookr·
what papers establish data leakage between train and test sets in pretrained large language models? I vaguely remember a few, but struggling to recall names.
Sasha Rush@srush_nlp·
I wanted to pay my respects to Eugene Charniak who recently passed away. I only met him a couple of times, but his work was a major inspiration to me as a researcher.
James Barry@jamesarbarry·
If anyone is interested in learning about the Common Lisp Object System (classes and object-oriented programming in #Lisp), I wrote a blog post about it here, using the example of scoring a game of bowling: jbrry.github.io/2022/09/22/bow…
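The blog post's code is in Common Lisp; for readers without a Lisp handy, the underlying scoring logic translates roughly to this Python sketch (my translation for illustration, not the post's code):

```python
def score_game(rolls):
    """Score a complete game of ten-pin bowling from a flat list of rolls."""
    score, i = 0, 0
    for _ in range(10):                      # ten frames
        if rolls[i] == 10:                   # strike: 10 + next two rolls
            score += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:  # spare: 10 + next roll
            score += 10 + rolls[i + 2]
            i += 2
        else:                                # open frame
            score += rolls[i] + rolls[i + 1]
            i += 2
    return score

score_game([10] * 12)  # perfect game -> 300
score_game([0] * 20)   # all gutter balls -> 0
```

The bonus-ball lookahead (strikes count the next two rolls, spares the next one) is the part that makes bowling a nice little exercise in any language.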
James Barry@jamesarbarry·
@MOMeachair For me I was mainly exposed to NLP from the deep learning side, so I always wanted to learn more about previous approaches. And you're right, there's too many things on the list and too little time 😅
Mícheál Johnny@MOMeachair·
@jamesarbarry Ah yeh, sometimes it's the best way of knowing/remembering where we are now. I've liked reviewing courses too, hopefully I'll have the time again... Some day... 😂😅🙏
James Barry@jamesarbarry·
I'm going through Michael Collins' 2013 Columbia NLP course that was on Coursera. Here is an implementation of a trigram UD POS tagger using the Viterbi Algorithm as described in the Chapter 2 notes: github.com/jbrry/HMM-Tagg…
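The dynamic-programming core can be sketched compactly. This is a simplified bigram Viterbi (the linked repo implements the full trigram version from the Chapter 2 notes), with toy hand-made probabilities rather than counts estimated from a treebank:

```python
from math import log

# Toy bigram HMM: two tags, hand-made probabilities for illustration.
states = ["DET", "NOUN"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9},
           "NOUN": {"DET": 0.4, "NOUN": 0.6}}
emit_p = {"DET": {"the": 0.9, "dog": 0.1},
          "NOUN": {"the": 0.1, "dog": 0.9}}

def viterbi(words):
    # V[i][s]: best log-probability of any tag sequence for words[:i+1]
    # ending in tag s; back[i][s] remembers the best predecessor tag.
    V = [{s: log(start_p[s]) + log(emit_p[s][words[0]]) for s in states}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[i - 1][p] + log(trans_p[p][s]))
            V[i][s] = V[i - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s][w])
            back[i][s] = prev
    # Recover the best path by following back-pointers from the best end tag.
    tag = max(states, key=lambda s: V[-1][s])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        tags.append(tag)
    return tags[::-1]

viterbi(["the", "dog"])  # -> ["DET", "NOUN"]
```

The trigram version in the notes is the same idea with states being tag *pairs*, which is what makes the deleted-interpolation smoothing in Chapter 2 necessary.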
James Barry@jamesarbarry·
@MOMeachair Thanks Mícheál! I took a brief look at this course in 2016 but am only really going through it properly now. It's a real gem, and it's nice to look at some of the older stuff too!