Austin Veselka
@further_ai
Senior RL Engineer at @OpenPipeAI/@CoreWeave. Goal: further AI
Joined February 2024
119 Following · 65 Followers

71 posts
Austin Veselka@further_ai·
This is similar to implicit CoT work, which is all pretty cool stuff. But this is a new way to internalize reasoning and a new control mechanism over performance. Check out the paper for details and the ablations that narrow down explanations!
Austin Veselka@further_ai·
I use task arithmetic/model merging to limit degradation (0.25 * SFT). During evaluation, I noticed that with <cot> the model doesn't reason out loud, but removing <cot> still degrades performance. Thus: internalized reasoning.
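A minimal sketch of what the "0.25 * SFT" task arithmetic could look like, assuming the merge scales the SFT task vector (SFT weights minus base weights) by 0.25 per parameter; the exact merge recipe isn't spelled out in the thread, so names and the alpha placement here are assumptions:

```python
def merge_task_arithmetic(base, sft, alpha=0.25):
    """Task arithmetic: move base weights alpha of the way toward the
    fine-tuned (SFT) weights, i.e. base + alpha * (sft - base)."""
    return {name: w + alpha * (sft[name] - w) for name, w in base.items()}

# Toy per-parameter example (a real merge would operate on tensors):
base = {"layer.weight": 1.0}
sft = {"layer.weight": 3.0}
merged = merge_task_arithmetic(base, sft)  # {"layer.weight": 1.5}
```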
Austin Veselka@further_ai·
I made a paper! arxiv.org/abs/2604.02371 Essentially: Extract evidence from pages and sort the top K to make a reasoning trace. Add in a control token and we can turn it on or off. Internalize with model merging.
Austin Veselka@further_ai

So, I used a control token - <cot> - and trained the model without the CoT when it is not in the prompt. If you train entirely without the synthetic CoT traces, it performs **very** similarly to not prompting <cot>. So, the model can turn the internalized algorithm on or off? Cool

Austin Veselka@further_ai·
I also trained two versions of the models, one with the mixed think + non-think examples and one with only non-think examples. I evaluate both versions with and without the <cot> token in the system prompt:
Austin Veselka@further_ai·
When building examples, I make most examples with a control token <cot> in the system prompt. These examples include the reasoning trace. This gives me a switch that affects the model's reasoning or path to the answer.
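The scheme above could be sketched roughly as follows. The exact system prompt, chat template, and trace formatting are assumptions; the only parts taken from the thread are that `<cot>` sits in the system prompt and gates whether the reasoning trace appears in the target:

```python
def build_example(question, answer, cot_trace=None):
    """Build one SFT example. The <cot> control token in the system
    prompt gates whether the reasoning trace is included in the target.
    (System prompt wording and trace formatting are assumed here.)"""
    system = "Answer the question about the document."
    if cot_trace is not None:
        system += " <cot>"
        target = f"{cot_trace}\n{answer}"
    else:
        target = answer
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
        {"role": "assistant", "content": target},
    ]
```

Training on a mix of both kinds of examples is what gives the model a runtime switch: include `<cot>` in the system prompt and it reasons; omit it and it answers directly.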
Austin Veselka@further_ai·
There are two exclusive branches next:
- Vision: receives the sorted pages and the question only
- Text: receives the sorted, extracted evidence, the question, and context on the input (ties the answer causally to the reasoning trace)
Both branches generate a final answer.
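A rough sketch of how the two branch inputs might be assembled, assuming the top pages arrive as dicts with the page image and its extracted evidence; field names and the input layout are my assumptions, not the paper's:

```python
def build_branch_input(branch, question, top_pages):
    """Assemble the input for one of the two exclusive branches."""
    if branch == "vision":
        # Vision branch: sorted page images plus the question only.
        return {"images": [p["page"] for p in top_pages],
                "text": question}
    if branch == "text":
        # Text branch: sorted extracted evidence plus the question in the
        # input, tying the final answer causally to the evidence trace.
        evidence = "\n".join(p["evidence"] for p in top_pages)
        return {"images": [], "text": f"{evidence}\n\nQuestion: {question}"}
    raise ValueError(f"unknown branch: {branch}")
```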
Austin Veselka@further_ai·
This seems to be key to getting good performance. The first version just kept each page's evidence from the whole document (marking some pages as "irrelevant"), but during inference the model tended to loop on "irrelevant". I believe the model learns v2's RAG-like algorithm.
Austin Veselka@further_ai·
Thus, ground truth pages are ~always marked relevant, with some score flexibility. I filter for scores above a threshold, sort the pages + their evidence from greatest to least and take the top K (16).
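The filter/sort/top-K step above can be sketched in a few lines; the threshold value below is an assumption (the post only says "a threshold"), and K = 16 comes from the thread:

```python
def select_top_evidence(scored_pages, threshold=5.0, k=16):
    """Keep pages scoring above the threshold, sort by relevance score
    from greatest to least, and take the top K.
    (threshold=5.0 is an assumed value, not stated in the thread.)"""
    kept = [p for p in scored_pages if p["score"] > threshold]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return kept[:k]
```

Because gold pages are steered into the [6.0, 10.0] band at extraction time, they nearly always survive this filter.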
Austin Veselka@further_ai·
I take a document and a synthetic question. For each page, I task a VLM with extracting evidence relevant to the question, along with a relevance score in [0.0, 10.0]. If the page was used to generate the question, I prompt the model to score between [6.0, 10.0].
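The per-page extraction loop could look like this; `vlm_call` is a stand-in for a real VLM API (its signature is an assumption), and the prompt wording is illustrative, not the actual prompt:

```python
def extract_page_evidence(vlm_call, pages, question, gold_page_ids):
    """For each page, ask a VLM for evidence relevant to the question
    plus a relevance score. Gold pages (those used to generate the
    question) are steered into the [6.0, 10.0] score band."""
    results = []
    for page_id, page in enumerate(pages):
        lo = 6.0 if page_id in gold_page_ids else 0.0
        prompt = (f"Question: {question}\n"
                  f"Extract evidence from this page relevant to the question, "
                  f"and rate its relevance with a score in [{lo}, 10.0].")
        evidence, score = vlm_call(page, prompt)
        results.append({"page": page_id, "evidence": evidence, "score": score})
    return results
```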
Austin Veselka@further_ai·
I trained Qwen3 VL 32B to a new SOTA on MMLongBenchDoc, 58.3 (leaderboard update coming soon). I also trained Mistral Small 3.1 24B and the method is highly impactful across both models. These models are also very token efficient. Here's a little more detail on how it works:
Austin Veselka@further_ai·
@thsottiaux Code can be 10x as many lines as needed, with hasattr()s, .get()s, etc., and it raises errors for everything (assert isinstance(count, int) with a 3-line message) even when we fully know that count is an int. This wastes space with insanely defensive code, and it can fall back silently
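An illustrative contrast (my own toy example, not actual codex output) of the defensive style being criticized versus the direct version:

```python
# Defensive style: hasattr/.get/assert everywhere, with a silent fallback.
def total_defensive(items):
    if not hasattr(items, "__iter__"):
        return 0  # silently swallows a type bug instead of surfacing it
    total = 0
    for item in items:
        count = item.get("count", 0) if isinstance(item, dict) else 0
        assert isinstance(count, int), (
            "count must be an int\n"
            "got an unexpected type\n"
            "check the input data")
        total += count
    return total

# Direct style, when the shape of the data is already known:
def total_counts(items):
    return sum(item["count"] for item in items)
```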
Tibo@thsottiaux·
What are we consistently getting wrong with codex that you wish we would improve / fix?
Austin Veselka@further_ai·
@oskar_hallstrom Thanks for your help and advice with the project, it went a long way toward its success
Austin Veselka@further_ai·
Excited to share my work "How to Train Your Long-Context Visual Document Model." (arxiv.org/abs/2602.15257) Research and recipes for training long-context VLMs for document understanding are entirely lacking. In this paper, I explore this frontier with extensive ablations.
Austin Veselka@further_ai·
Check out the paper for the extensive details!
Austin Veselka@further_ai·
For reproducibility and open insights, I am releasing a full leaderboard of my training runs with data recipes included for the community to explore! Please enjoy huggingface.co/spaces/lighton…