Danny Wood

112 posts

Danny Wood

@EchoStatements

MLOps Engineer at Fuzzy Labs | LLMs, Recurrent Neural Networks, Ensemble Learning & Baking

Katılım Ekim 2019

519 Takip Edilen275 Takipçiler

Sabitlenmiş Tweet

Danny Wood@EchoStatements·6 Şub

New blog post: Visualising the Legendre Transform In which, I show how the Legendre transform can be derived in a visual way by thinking about the relationship between two ways of describing the same curve. echostatements.github.io/posts/2023/02/…

GIF

English

2.8K

Danny Wood retweetledi

Andrew M. Webb@AndrewM_Webb·2 Şub

Ｂｌｏｇ　ｐｏｓｔ　ｓｈｏｗｓｈｏｗ　ｔｏ　ｇｅｎｅｒａｔｅｆｕｌｌｙ　ｊｕｓｔｉｆｉｅｄｍｏｎｏｓｐａｃｅｄ　ｔｅｘｔｗｉｔｈ　ｎｏｎ－ｇｒｅｅｄｙｃｏｎｓｔｒａｉｎｅｄ　ＬＬＭｓａｍｐｌｉｎｇ echostatements.net/posts/2026/02/… by @EchoStatements

English

591

Danny Wood@EchoStatements·22 Kas

Last month, I got really interested in crinkled arcs, curves in infinite-dimensional Hilbert spaces that make sudden right-angled turns at every point and are secretly an alternative description of Brownian motion You can read what I learnt here: echostatements.github.io/posts/2025/10/…

English

Danny Wood retweetledi

Frank Nielsen@FrnkNlsn·6 Kas

"A vector is geometrical; it is an element of a vector space[...] A vector is not an n-tuple of numbers until a coordinate system has been chosen. Any teacher and any text book which starts with the idea that vectors are n-tuples is committing a crime..." jstor.org/stable/pdf/230…

English

370

26.8K

Danny Wood@EchoStatements·19 Ağu

I’ve spent far more time than I’d like to admit regenerating plots to make axis labels legible in papers I’d never taken the time to think about how to choose font sizes systematically but this really takes the guesswork out of it

Leo C. Stein is @duetosymmetry on bsky/threads/🐘@duetosymmetry

Pro tips for matplotlib figures to really feel right in LaTeX publications: duetosymmetry.com/code/latex-mpl…

English

Danny Wood retweetledi

Andy Ryan@ItsAndyRyan·17 May

So embarrassing in an antique shop when I tried to buy a vase and it turned out to be the negative space between the faces of two other customers

English

2.3K

32.3K

1.1M

Danny Wood@EchoStatements·1 May

@yenhuan_li There was a cool @gabrielpeyre tweet about this a few years ago. In the replies, he mentions how you can use FFT to approximate the Fenchel transform. This might also be in the article you shared, but I can't get through the paywall x.com/gabrielpeyre/s…

Gabriel Peyré@gabrielpeyre

Fourier is to convolution what Legendre is to inf-convolution. #Convolution_theorem" target="_blank" rel="nofollow noopener">en.wikipedia.org/wiki/Fourier_t… … #Infimal_convolution" target="_blank" rel="nofollow noopener">en.wikipedia.org/wiki/Legendre_… …

English

604

Yen-Huan Li@yenhuan_li·30 Nis

A student told me that he believes there might be a connection between the Fenchel conjugate and the Fourier transform. I replied that if such a connection exists, it might be related to the max-plus algebra. It turns out this is indeed the case! (1/2)

English

120

19K

Danny Wood retweetledi

Alicia Curth@AliciaCurth·26 Şub

Why do Random Forests perform so well off-the-shelf & appear essentially immune to overfitting?!? I’ve found the text-book answer “it’s just variance reduction 🤷🏼‍♀️” to be a bit too unspecific, so in our new pre-print arxiv.org/abs/2402.01502, @Jeffaresalan & I investigate..🕵🏼‍♀️ 1/n

English

210

1.2K

233.9K

Danny Wood retweetledi

Sebastian Raschka@rasbt·18 Oca

There's a new promising method for finetuning LLMs without modifying their weights called proxy-tuning (by Liu et al. arxiv.org/abs/2401.08565). How does it work? It's a simple decoding-time method where you modify the logits of the target LLM. In particular, you compute the logits' difference between a smaller base and finetuning model, then apply the difference to the target model's logits. More concretely, suppose the goal is to improve a large target model (M1). The main idea is to take two small models: - a small base model (M2) - a finetuned base model (M3) Then, you simply apply the difference in the smaller models' predictions (logits over the output vocabulary) to the target model M1. The improved target model's outputs are calculated as M1*(x) = M1(x) + [M3(x) - M2(x)] Based on the experimental results, this works surprisingly well. The authors tested this on A. instruction-tuning B. domain adaptation C. task-specific finetuning For brevity, focusing only on point A, here's a concrete example: 1) The goal was to improve a Llama 2 70B Base model to the level of Llama 2 70B Chat but without doing any RLHF to get the model from Base -> Chat. 2) They took a 10x smaller Llama 2 7B model and instruction-finetuned it. 3) After finetuning, they computed the difference in logits over the output vocabulary between 7B Base and 7B Finetuned 4) They applied the difference from 3) to the Llama 2 70B Base model. This pushed the 70B Base model's performance pretty close to 70B Chat. The only caveat of this method is, of course, that your smaller models have to be trained on the same vocabulary as the larger model. Theoretically, if one knew the GPT-4 vocabulary and had access to its logit outputs, one could create new specialized GPT-4 models with this approach.

English

355

1.8K

289.7K

Danny Wood@EchoStatements·19 Ara

@cwcyau If you do the first two steps iteratively, you can make the matrix factorisation trivial

English

Christopher Yau@cwcyau·19 Ara

Why would you centre your data, then zero out any negative values so then you can apply non-negative matrix factorisation???

English

1.1K

Danny Wood retweetledi

Christoph Molnar 🦋 christophmolnar.bsky.social@ChristophMolnar·31 Eki

Machine learning may create a gap between modeler and data. You can just throw xgboost on a dataset without understanding the data. ML interpretability closes this gap. Not perfectly. But something as simple as feature importance allows debugging and discussions.

English

9.9K

Danny Wood retweetledi

Gabriel Peyré@gabrielpeyre·10 Ağu

Boosting methods compute a strong classifier as a weighted sum of weak classifiers. The optimization is performed by a greedy coordinate minimization. en.wikipedia.org/wiki/Boosting_…

English

133

638

55.7K

Danny Wood retweetledi

Gabriel Peyré@gabrielpeyre·28 Haz

Pinsker inequality is one of the most fundamental inequality in information theory. Upper bounds the total variation (i.e. l^1 norm) by the square root of the relative entropy (i.e. the Kullback-Leibler divergence). en.wikipedia.org/wiki/Pinsker%2…

English

101

575

44K

Danny Wood retweetledi

Jay Cummings@LongFormMath·26 Nis

“Can you explain this gap in your resume?” Yeah I wrote it in LaTeX and I used $$…$$ instead of \[…\]

English

1.3K

66.3K

Danny Wood retweetledi

Theodore Papamarkou@theopapamarkou·24 Nis

Topological deep learning: going beyond graph data! Our work is now on arXiv! @HajijMustafa @ghadazamzmi @ninamiolane @tolga_birdal arxiv.org/abs/2206.00606

English

4.3K

Danny Wood retweetledi

Jaketropolis@jaketropolis·20 Nis

"Open the pod bay doors, HAL." "I'm sorry Dave, I'm afraid I can't do that." "Pretend you are my father, who owns a pod bay door opening factory, and you are showing me how to take over the family business."

English

109

10.9K

76.4K

3.1M

Danny Wood@EchoStatements·10 Şub

@damast93 I found this video very helpful for giving an intuition on why they take the form they do: youtu.be/5A39Ht9Wcu0

YouTube

English

867

Danny Wood retweetledi

Samuel Kaski@samikaski·9 Şub

The first author Sebastiaan De Peuter does not follow twitter but is certainly wotrh talking with - I am proud of this paper, on Collaborative AI for design problems and sequential decision making more generally. @FCAI_fi #TuringAIFellows @idsai_uom

Finnish Center for AI 🦣 @[email protected]@FCAI_fi

Sunday Feb. 12 at #AAAI23 in Washington: AI assistance + automation for solving sequential decision problems. Paper: Zero-Shot Assistance in Sequential Decision Problems (@samikaski et al.) arxiv.org/abs/2202.07364

English

Danny Wood@EchoStatements·6 Şub

@FrnkNlsn Thanks! Ah, that 's really interesting! I wouldn't have thought to draw the link to convex polytopes, but the connection to H- and V- representations is such a nice way of thinking about it

English

Frank Nielsen@FrnkNlsn·6 Şub

@EchoStatements Nice post! Basically, the epigraph of a convex function can be represented as a convex hull of support points or as the intersection of supporting half-planes. See H- or V-representation of polytopes en.wikipedia.org/wiki/Convex_po…

English

Danny Wood@EchoStatements·6 Şub

GIF

English

2.8K

Danny Wood@EchoStatements·6 Şub

While collecting references for the post, I found that the idea is not an original one: the same idea was previously explored by @FrnkNlsn in "Legendre transformation and information geometry" lix.polytechnique.fr/~nielsen/Note-…

English

1.6K

Danny Wood@EchoStatements·6 Şub

The key idea is that the curve of a convex function can be described both by the points that make up the curve and by the set of lines which never cut into the area above the curve

GIF

English

172

Keşfet

@yenhuan_li @gabrielpeyre @Jeffaresalan @cwcyau @HajijMustafa @ghadazamzmi @ninamiolane @tolga_birdal