Austin Veselka
@further_ai
Senior RL Engineer at @OpenPipeAI/@CoreWeave. Goal: further AI
Joined February 2024
119 Following · 65 Followers

71 posts
Austin Veselka@further_ai·
This is similar to implicit CoT work, which is all pretty cool stuff. But this is a new way to internalize reasoning and a new control mechanism over performance. Check out the paper for details and the ablations that narrow down explanations!
Austin Veselka@further_ai·
I use task arithmetic/model merging to limit degradation (0.25 * SFT). During evaluation, I noticed that with <cot> the model doesn't reason out loud, but removing <cot> still degrades performance. Thus: internalized reasoning.
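A minimal sketch of what the "0.25 * SFT" task arithmetic could look like, assuming the merge scales the SFT task vector (SFT weights minus base weights) by 0.25 per parameter; the exact merge recipe isn't spelled out in the thread, so names and the alpha placement here are assumptions:

```python
def merge_task_arithmetic(base, sft, alpha=0.25):
    """Task arithmetic: move base weights alpha of the way toward the
    fine-tuned (SFT) weights, i.e. base + alpha * (sft - base)."""
    return {name: w + alpha * (sft[name] - w) for name, w in base.items()}

# Toy per-parameter example (a real merge would operate on tensors):
base = {"layer.weight": 1.0}
sft = {"layer.weight": 3.0}
merged = merge_task_arithmetic(base, sft)  # {"layer.weight": 1.5}
```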
Austin Veselka@further_ai·
I made a paper! arxiv.org/abs/2604.02371 Essentially: Extract evidence from pages and sort the top K to make a reasoning trace. Add in a control token and we can turn it on or off. Internalize with model merging.
Austin Veselka@further_ai

So, I used a control token - <cot> - and trained the model without the CoT when it is not in the prompt. If you train entirely without the synthetic CoT traces, it performs **very** similarly to not prompting <cot>. So, the model can turn the internalized algorithm on or off? Cool

Austin Veselka@further_ai·
I also trained two versions of the models, one with the mixed think + non-think examples and one with only non-think examples. I evaluate both versions with and without the <cot> token in the system prompt:
Austin Veselka@further_ai·
When building examples, I make most examples with a control token <cot> in the system prompt. These examples include the reasoning trace. This gives me a switch that affects the model's reasoning or path to the answer.
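The scheme above could be sketched roughly as follows. The exact system prompt, chat template, and trace formatting are assumptions; the only parts taken from the thread are that `<cot>` sits in the system prompt and gates whether the reasoning trace appears in the target:

```python
def build_example(question, answer, cot_trace=None):
    """Build one SFT example. The <cot> control token in the system
    prompt gates whether the reasoning trace is included in the target.
    (System prompt wording and trace formatting are assumed here.)"""
    system = "Answer the question about the document."
    if cot_trace is not None:
        system += " <cot>"
        target = f"{cot_trace}\n{answer}"
    else:
        target = answer
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
        {"role": "assistant", "content": target},
    ]
```

Training on a mix of both kinds of examples is what gives the model a runtime switch: include `<cot>` in the system prompt and it reasons; omit it and it answers directly.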
Austin Veselka@further_ai·
There are two exclusive branches next:
- Vision: receives the sorted pages and the question only
- Text: receives the sorted, extracted evidence, the question, and context on the input (ties the answer causally to the reasoning trace)
Both branches generate a final answer.
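A rough sketch of how the two branch inputs might be assembled, assuming the top pages arrive as dicts with the page image and its extracted evidence; field names and the input layout are my assumptions, not the paper's:

```python
def build_branch_input(branch, question, top_pages):
    """Assemble the input for one of the two exclusive branches."""
    if branch == "vision":
        # Vision branch: sorted page images plus the question only.
        return {"images": [p["page"] for p in top_pages],
                "text": question}
    if branch == "text":
        # Text branch: sorted extracted evidence plus the question in the
        # input, tying the final answer causally to the evidence trace.
        evidence = "\n".join(p["evidence"] for p in top_pages)
        return {"images": [], "text": f"{evidence}\n\nQuestion: {question}"}
    raise ValueError(f"unknown branch: {branch}")
```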
Austin Veselka@further_ai·
This seems to be key to getting good performance. The first version just kept each page's evidence from the whole document (marking some pages as "irrelevant"), but during inference the model tended to loop on "irrelevant". I believe the model learns v2's RAG-like algorithm.
Austin Veselka@further_ai·
Thus, ground truth pages are ~always marked relevant, with some score flexibility. I filter for scores above a threshold, sort the pages + their evidence from greatest to least and take the top K (16).
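The filter/sort/top-K step above can be sketched in a few lines; the threshold value below is an assumption (the post only says "a threshold"), and K = 16 comes from the thread:

```python
def select_top_evidence(scored_pages, threshold=5.0, k=16):
    """Keep pages scoring above the threshold, sort by relevance score
    from greatest to least, and take the top K.
    (threshold=5.0 is an assumed value, not stated in the thread.)"""
    kept = [p for p in scored_pages if p["score"] > threshold]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return kept[:k]
```

Because gold pages are steered into the [6.0, 10.0] band at extraction time, they nearly always survive this filter.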
Austin Veselka@further_ai·
I take a document and a synthetic question. For each page, I task a VLM with extracting evidence relevant to the question, along with a relevance score in [0.0, 10.0]. If the page was used to generate the question, I prompt the model to score between [6.0, 10.0].
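The per-page extraction loop could look like this; `vlm_call` is a stand-in for a real VLM API (its signature is an assumption), and the prompt wording is illustrative, not the actual prompt:

```python
def extract_page_evidence(vlm_call, pages, question, gold_page_ids):
    """For each page, ask a VLM for evidence relevant to the question
    plus a relevance score. Gold pages (those used to generate the
    question) are steered into the [6.0, 10.0] score band."""
    results = []
    for page_id, page in enumerate(pages):
        lo = 6.0 if page_id in gold_page_ids else 0.0
        prompt = (f"Question: {question}\n"
                  f"Extract evidence from this page relevant to the question, "
                  f"and rate its relevance with a score in [{lo}, 10.0].")
        evidence, score = vlm_call(page, prompt)
        results.append({"page": page_id, "evidence": evidence, "score": score})
    return results
```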
Austin Veselka@further_ai·
I trained Qwen3 VL 32B to a new SOTA on MMLongBenchDoc, 58.3 (leaderboard update coming soon). I also trained Mistral Small 3.1 24B and the method is highly impactful across both models. These models are also very token efficient. Here's a little more detail on how it works:
Austin Veselka@further_ai·
@thsottiaux Code can be 10x as many lines as needed, with hasattr()s, .get()s, etc., and it raises errors for everything (assert isinstance(count, int) with a 3-line message) even when we fully know that count is an int. This wastes space with insanely defensive code, and it can fall back silently
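An illustrative contrast (my own toy example, not actual codex output) of the defensive style being criticized versus the direct version:

```python
# Defensive style: hasattr/.get/assert everywhere, with a silent fallback.
def total_defensive(items):
    if not hasattr(items, "__iter__"):
        return 0  # silently swallows a type bug instead of surfacing it
    total = 0
    for item in items:
        count = item.get("count", 0) if isinstance(item, dict) else 0
        assert isinstance(count, int), (
            "count must be an int\n"
            "got an unexpected type\n"
            "check the input data")
        total += count
    return total

# Direct style, when the shape of the data is already known:
def total_counts(items):
    return sum(item["count"] for item in items)
```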
Tibo@thsottiaux·
What are we consistently getting wrong with codex that you wish we would improve / fix?
Austin Veselka@further_ai·
@oskar_hallstrom Thanks for your help and advice with the project, it went a long way toward its success
Austin Veselka@further_ai·
Excited to share my work "How to Train Your Long-Context Visual Document Model." (arxiv.org/abs/2602.15257) Research and recipes for training long-context VLMs for document understanding are entirely lacking. In this paper, I explore this frontier with extensive ablations.
Austin Veselka@further_ai·
Check out the paper for the extensive details!
Austin Veselka@further_ai·
For reproducibility and open insights, I am releasing a full leaderboard of my training runs with data recipes included for the community to explore! Please enjoy huggingface.co/spaces/lighton…