James Barry

193 posts

@jamesarbarry

Research Scientist @ IBM Research | PhD from ADAPT Centre, Dublin City University

Dublin, Ireland · Joined March 2017
1.1K Following · 215 Followers
James Barry@jamesarbarry·
If any Irish speakers are interested in helping annotate some of the BLEnD examples into Irish, please let me know (can be as little as a few examples). We aim to have Irish included in the next release. huggingface.co/datasets/nayeo…
Stas Bekman@StasBekman·
At his GTC talk, @JordanNanos included an important slide showing that it's not about whether SLURM is better or worse than k8s; it's about using the right tool for the right job. The rough mapping:
- you want SLURM for pre- and post-training
- you want k8s for inference
Jordan Nanos@JordanNanos

Last week at @NVIDIA GTC I had a talk on what it takes to build a GPU cluster with @HPE and @HPE_Cray It breaks down into a 12 step program 🧵

James Barry@jamesarbarry·
@mark_l_watson Totally get the sentiment but sometimes it's good to be able to see the main functionality without the layers of abstraction of large libraries. I think the first example of tool calling agents I'd seen in vanilla python code was this helpful reference: github.com/ScaDS/BIDS-lec…
mark_l_watson@mark_l_watson·
Sort of sad, but I think I’ll shelve my current writing project: a book on building LLM-based agents from scratch, with separate book editions featuring examples in Common Lisp, Racket, Clojure, Python, and Haskell. The turning point came after spending a few days experimenting with Microsoft’s AutoGen v0.4.2, where I found it remarkably straightforward to create versatile agents without writing everything from the ground up. Sometimes, leveraging the innovations of others, while not as much fun, is far more effective than reinventing the wheel. I’m also quite impressed by Hugging Face’s smol agents, but AutoGen feels more effective, at least for right now. github.com/microsoft/auto…
James Barry@jamesarbarry·
@_philschmid @huggingface Been following your guides this year for distributed training, instruction tuning, model serving, they've all been so relevant and helpful. Keep it up!
Philipp Schmid@_philschmid·
The only fine-tuning guide you need for 2025 ‼️ Excited to share “How to fine-tune open LLMs in 2025 with @huggingface”, covering everything from Q-LoRA to Spectrum methods with a focus on optimization, efficiency, and distributed training. 👀 Fine-tuning still matters for specialized use cases despite better models - especially for consistency, domain expertise, controlling output style, or reducing costs. In this guide, you will learn how to:
🎯 Define good use cases for fine-tuning vs. prompting
🛠️ Set up your development environment using Hugging Face libraries
📚 Create and prepare datasets in conversation format
⚡ Use Q-LoRA for efficient 4-bit training or the Spectrum method to selectively fine-tune important layers
💨 Speed up training with Flash Attention and Liger Kernels
💻 Scale across multiple GPUs with DeepSpeed and accelerate
📊 Test and evaluate models with an evaluation harness
🔥 Run in production with TGI/vLLM
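For flavor, the Q-LoRA piece of that recipe boils down to roughly the following sketch. This assumes the peft and bitsandbytes libraries; the model id and hyperparameters are illustrative placeholders, not taken from the guide itself:

```python
# Illustrative Q-LoRA setup sketch (placeholder model id and values).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",              # placeholder model id
    quantization_config=bnb_config,
)
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, # common starting points
    target_modules="all-linear",            # adapt every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)  # only adapters are trainable
```

The frozen 4-bit base plus small trainable adapters is what makes single-GPU fine-tuning of 7B+ models feasible.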
James Barry@jamesarbarry·
IBM Research Europe (UK) are looking for interns in generative AI in sustainability: reasoning over multimodal data, foundation models in the climate/materials domain, and exploring agentic systems. Please consider applying here: careers.ibm.com/job/21206334/r…
James Barry@jamesarbarry·
@carrigmat Ah I see, yes the v0.3 Mistral Instruct model has new special tokens like [TOOL_CALL], [TOOL_RESULT] and their closing equivalents, so I can see how they would be interleaved in the chat, thanks!
Matthew Carrigan@carrigmat·
@jamesarbarry The chat templates are modified to handle this! Usually, it'll get inserted into the formatted text with something like: [TOOL_RESULTS] {"content": "22.0"} [/TOOL_RESULTS]
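The token layout described above can be sketched in plain Python. This is a hypothetical mini-formatter for illustration only, assuming Mistral-v0.3-style special tokens as quoted in the thread; the real chat template is a Jinja template and considerably more involved:

```python
import json

# Hypothetical mini-formatter mimicking the layout described in the
# tweet; NOT the actual Mistral chat template.
def format_message(message):
    role, content = message["role"], message["content"]
    if role == "user":
        return f"[INST] {content} [/INST]"
    if role == "assistant":
        return f"{content}</s>"
    if role == "tool":
        # Tool output is wrapped in the special tokens from the thread.
        return f"[TOOL_RESULTS] {json.dumps({'content': content})} [/TOOL_RESULTS]"
    raise ValueError(f"unknown role: {role}")

def format_chat(messages):
    return "".join(format_message(m) for m in messages)

chat = [
    {"role": "user", "content": "What's the temperature in Paris?"},
    {"role": "tool", "content": "22.0"},
]
print(format_chat(chat))
```

The point is that a `"role": "tool"` message becomes its own delimited span in the formatted text, rather than being aliased to a user or assistant turn.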
Matthew Carrigan@carrigmat·
Even the closed-source APIs often have messy documentation spread between "tool use" and "assistants" workflows. Figuring it out isn't easy! That's changed, though. Let's walk through a simple tool use process:
James Barry@jamesarbarry·
@carrigmat When we have a "role": "tool" message, does this get aliased as a "user" or "assistant" message in the chat history, or are the chat templates modified to handle these cases?
Matthew Carrigan@carrigmat·
Step 4: Call the tool function with the arguments the model requested, and add the result to the chat as well.
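Step 4 in plain Python: a sketch with a hypothetical `get_current_temperature` tool, where the model's requested call is shown as a hand-written dict rather than real parsed model output:

```python
import json

# Hypothetical tool the model can request (a real one would hit a weather API).
def get_current_temperature(location: str) -> float:
    return 22.0

tools = {"get_current_temperature": get_current_temperature}

chat = [{"role": "user", "content": "What's the temperature in Paris?"}]

# Pretend the model responded with this tool call (normally you'd parse
# it out of the generated text after apply_chat_template + generate).
tool_call = {"name": "get_current_temperature",
             "arguments": {"location": "Paris"}}
chat.append({"role": "assistant",
             "tool_calls": [{"type": "function", "function": tool_call}]})

# Step 4: run the requested tool and append the result to the chat.
result = tools[tool_call["name"]](**tool_call["arguments"])
chat.append({"role": "tool", "name": tool_call["name"],
             "content": json.dumps({"content": str(result)})})
```

From here the whole chat goes back through the template for the next generation step, so the model can read the tool result and answer the user.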
James Barry@jamesarbarry·
Had a great time presenting our Demo Paper, KnowledgeHub: An End-to-End Tool for Assisted Scientific Discovery at IJCAI 2024 in Jeju Island, Korea today!
Liliang Ren@liliang_ren·
Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin.😮 And it has an infinite context length with linear complexity.🤯 Paper: arxiv.org/abs/2406.07522 (1/6)
James Barry@jamesarbarry·
@francoisfleuret Reminds me of Norvig’s PAIP, which mentions a student who used to write one-letter names so his variables would be looked up faster, but “Every variable, regardless of its name, is just a memory location, and the time to access the location does not depend on the name of the variable.”
James Barry@jamesarbarry·
With #Nvidia stock rallying, competitors are no doubt trying to capture back some of that market share. Realistically, how big are Nvidia's hardware/software moats - how likely is widespread adoption of:
- AMD/ROCm
- Intel/Habana
- Apple/MLX
- Tesla/Dojo?
James Barry@jamesarbarry·
@jackminong @StasBekman True. Another problem I find with remote editing is that there can be latency when opening and saving files etc. (esp. if someone's running some heavy program on the remote), and if you're using emacs or vim where keystrokes feel automatic, the latency messes up the flow
Jackmin@jackminong·
@jamesarbarry @StasBekman This really depends on your connectivity to the remote machine. I have a machine in Berlin where I live and a machine in Kuala Lumpur. Running notebooks on the KL machine is basically infeasible.
Stas Bekman@StasBekman·
Is it true that Mac users have absolutely no way of doing NVidia GPU-based software development on their laptops and must connect to a remote server with GPUs? Why choose Mac to do ML development then? Isn't it counterproductive for fast development? I understand that even eGPU is a no go.
James Barry@jamesarbarry·
@StasBekman @jackminong Developing on a remote cluster is never as seamless as local. The VSCode remote extension can be quite finicky, with random disconnects and having to reopen your project on a login node and then again on a compute node. Alternatives such as Emacs in TRAMP mode aren't painless either
Stas Bekman@StasBekman·
I hear you that it's doable, Jackmin. From seeing my colleagues work, it doesn't look easy, especially if you have to use SLURM to first allocate a compute node, then figure out its hostname, etc., etc. - this is not fast. And if you spin up a rentable GPU, then you have to be careful to spin it down so you don't pay more. This doesn't sound like a very developer-friendly situation to me. I was just wondering if anybody is trying to solve it.
Sara Hooker@sarahookr·
what papers establish data leakage between train and test sets in pretrained large language models? I vaguely remember a few, but struggling to recall names.
Sasha Rush@srush_nlp·
I wanted to pay my respects to Eugene Charniak who recently passed away. I only met him a couple of times, but his work was a major inspiration to me as a researcher.
James Barry@jamesarbarry·
If anyone is interested in learning about the Common Lisp Object System (classes and object-oriented programming in #Lisp), I wrote a blog post about it here, using the example of scoring a game of bowling: jbrry.github.io/2022/09/22/bow…
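The blog post's code is in Common Lisp; for readers without a Lisp handy, the underlying scoring logic translates roughly to this Python sketch (my translation for illustration, not the post's code):

```python
def score_game(rolls):
    """Score a complete game of ten-pin bowling from a flat list of rolls."""
    score, i = 0, 0
    for _ in range(10):                      # ten frames
        if rolls[i] == 10:                   # strike: 10 + next two rolls
            score += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:  # spare: 10 + next roll
            score += 10 + rolls[i + 2]
            i += 2
        else:                                # open frame
            score += rolls[i] + rolls[i + 1]
            i += 2
    return score

score_game([10] * 12)  # perfect game -> 300
score_game([0] * 20)   # all gutter balls -> 0
```

The bonus-ball lookahead (strikes count the next two rolls, spares the next one) is the part that makes bowling a nice little exercise in any language.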
James Barry@jamesarbarry·
@MOMeachair For me I was mainly exposed to NLP from the deep learning side, so I always wanted to learn more about previous approaches. And you're right, there's too many things on the list and too little time 😅
Mícheál Johnny@MOMeachair·
@jamesarbarry Ah yeh, sometimes it's the best way of knowing/remembering where we are now. I've liked reviewing courses too, hopefully I'll have the time again... Some day... 😂😅🙏
James Barry@jamesarbarry·
I'm going through Michael Collins' 2013 Columbia NLP course that was on Coursera. Here is an implementation of a trigram UD POS tagger using the Viterbi Algorithm as described in the Chapter 2 notes: github.com/jbrry/HMM-Tagg…
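The dynamic-programming core can be sketched compactly. This is a simplified bigram Viterbi (the linked repo implements the full trigram version from the Chapter 2 notes), with toy hand-made probabilities rather than counts estimated from a treebank:

```python
from math import log

# Toy bigram HMM: two tags, hand-made probabilities for illustration.
states = ["DET", "NOUN"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9},
           "NOUN": {"DET": 0.4, "NOUN": 0.6}}
emit_p = {"DET": {"the": 0.9, "dog": 0.1},
          "NOUN": {"the": 0.1, "dog": 0.9}}

def viterbi(words):
    # V[i][s]: best log-probability of any tag sequence for words[:i+1]
    # ending in tag s; back[i][s] remembers the best predecessor tag.
    V = [{s: log(start_p[s]) + log(emit_p[s][words[0]]) for s in states}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[i - 1][p] + log(trans_p[p][s]))
            V[i][s] = V[i - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s][w])
            back[i][s] = prev
    # Recover the best path by following back-pointers from the best end tag.
    tag = max(states, key=lambda s: V[-1][s])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        tags.append(tag)
    return tags[::-1]

viterbi(["the", "dog"])  # -> ["DET", "NOUN"]
```

The trigram version in the notes is the same idea with states being tag *pairs*, which is what makes the deleted-interpolation smoothing in Chapter 2 necessary.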
James Barry@jamesarbarry·
@MOMeachair Thanks Mícheál! I took a brief look at this course in 2016 but am only really going through it properly now. It's a real gem, and it's nice to look at some of the older stuff too!