Ivan's Cat

344 posts

@IvansCat1

@[email protected]

Joined April 2019
2.8K Following 98 Followers
Ivan's Cat
Ivan's Cat@IvansCat1·
@StefanFSchubert This is just horrific. So sad that Western Europe and the US unconditionally support this movement.
English
1
0
6
1K
Ivan's Cat
Ivan's Cat@IvansCat1·
@kadirnardev What kind of data are you training and testing on? Outperforming whisper-large-v3 is quite a big ask for many languages.
English
1
0
0
120
Kadir Nar
Kadir Nar@kadirnardev·
I started training an omni model. Right now I'm working on the ASR feature for stage 1 training. I'm going to use the Qwen3.5-2B model. I want it to outperform Whisper-large-v3 and Qwen3-ASR. I plan to open-source the first versions. And I'm currently experimenting with 2 different omni models. I wish I had more GPUs 😢
Kadir Nar tweet media
English
2
0
44
4.8K
Ivan's Cat
Ivan's Cat@IvansCat1·
@_philschmid Video understanding is the best thing about Gemini! I believe it's still the only model that natively analyzes videos in a multimodal way?
English
0
0
3
161
Philipp Schmid
Philipp Schmid@_philschmid·
One of the most underrated features of Gemini is that it can ace minutes or hours of video understanding in seconds! Below is an example of how to analyze YouTube videos with a single API call using the Gemini Interactions API! Give it a try! You will be surprised how much progress we made.
Philipp Schmid tweet media
English
27
44
532
33.5K
Ivan's Cat
Ivan's Cat@IvansCat1·
@StefanFSchubert Rare case that I disagree with you. Being terminally online makes you an uninteresting person.
English
0
0
0
27
Ivan's Cat
Ivan's Cat@IvansCat1·
@xriskology How should you respond to someone who falsely accuses you of eugenics out of the blue?
English
0
0
1
82
Dr. Émile P. Torres (they/them)
This is a wild overreaction to Emily Bender's response. Read that response, and then look at what some of the EA people are saying. They seem to be "triggered" by anything she writes.
Dylan HadfieldMenell@dhadfieldmenell

@xriskology It is a combination of namecalling, ad hominem, and in-group signaling of disdain/contempt. It’s communicating that these people are beneath our concern and critiques like this are taboo and deserving of ridicule. It doesn’t engage with the substance of the article.

English
6
0
14
9K
Armin Ronacher ⇌
Armin Ronacher ⇌@mitsuhiko·
I do wish there was a way/an app to actually stream the transcription in as it happens, but seemingly nobody does that today. Might need private APIs?
English
16
0
13
4.8K
Armin Ronacher ⇌
Armin Ronacher ⇌@mitsuhiko·
I switched from a quantized Whisper Large v3 Turbo to Parakeet V3 in VoiceInk and the latency is much better.
English
19
8
207
18.5K
Ivan's Cat
Ivan's Cat@IvansCat1·
@christianmiele But Berlin is governed by the CDU, and Munich is governed by the SPD and the Greens? And Munich is obviously governed considerably better than Berlin.
German
0
0
0
40
Christian Miele
Christian Miele@christianmiele·
Unfortunately, Nils is right about that. This pandering by the CDU in Berlin toward the left-wing forces is getting harder and harder to bear, and I can't blame any founder / VC for drawing personal consequences from it as well. The brain drain toward other cities is real, at least anecdotally among my acquaintances. And those who haven't left yet are at least talking about their options over dinner with friends.
Dr. Nils Heisterhagen@N_Heisterhagen

Munich and Hamburg will be the big winners of Berlin's current political decline. Munich will become the new start-up metropolis. Otherwise Paris and London.

German
8
9
131
14.6K
Ivan's Cat reposted
Barack Obama
Barack Obama@BarackObama·
The killing of Alex Pretti is a heartbreaking tragedy. It should also be a wake-up call to every American, regardless of party, that many of our core values as a nation are increasingly under assault.
Barack Obama tweet media
English
66K
116.3K
813.1K
43.8M
Ivan's Cat
Ivan's Cat@IvansCat1·
@emollick This may not test the frontier, but non-reproducible research on closed APIs that can change every day is not useful research either. In the NHS case, the sensitivity of the data probably makes it legally difficult to send it to the computers of some dudes in California.
English
2
0
5
294
Ivan's Cat
Ivan's Cat@IvansCat1·
@kavaslug @Shayan86 This could actually be valid, because Google includes invisible watermarks in their generated images, which can be detected by their own chatbots: blog.google/technology/ai/… Does not work if the image was generated by different systems, though.
English
1
0
10
1.3K
Shayan Sardarizadeh
Shayan Sardarizadeh@Shayan86·
Trains were cancelled in Lancaster, UK, after an AI-generated image that seemed to show major damage to a railway bridge was posted on social media following an earthquake. bbc.co.uk/news/articles/…
English
20
293
1.8K
813.7K
Ivan's Cat
Ivan's Cat@IvansCat1·
@dwarkesh_sp Any of the authors of this paper would be great guests: Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, E253. cambridge.org/core/journals/…
English
0
0
0
38
Dwarkesh Patel
Dwarkesh Patel@dwarkesh_sp·
Looking for a neuroscientist to interview on my podcast. Keen for someone who can draw ML analogies for how the brain works (what's the architecture & loss/reward function of different parts, why can we generalize so well, how important is the particular hardware, etc).
English
359
50
1.4K
146.7K
Lazarz
Lazarz@Laz4rz·
so @MistralAI just opened in Zurich and Lausanne lol With Paris, Zurich (ETH), Lausanne (EPFL), Warsaw (UW), they're sucking like 70% of EU talent, only Tübingen missing
Lazarz tweet media
English
38
44
1.3K
91.4K
Ivan's Cat
Ivan's Cat@IvansCat1·
@abeirami I seem to be unable to find the link to the paper? Even a Google search for the title did not turn it up. Do you happen to have a link?
English
1
0
0
186
Ahmad Beirami
Ahmad Beirami@abeirami·
If you care about Eval, this cool paper is highly recommended! It goes deep into one of the common sources of bias in LLM-as-a-judge evaluations and gives practical guidance.
Kangwook Lee@Kangwook_Lee

LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique. But despite how widely this is used, almost all reported results are highly biased. Excited to share our preprint on how to properly use LLM as a judge. 🧵

===

So how do people actually use LLM as a judge? Most people just use the LLM as an evaluator and report the empirical probability that the LLM says the answer looks correct. When the LLM is perfect, this works fine and gives an unbiased estimator. If the LLM is not perfect, this breaks.

Consider a case where the LLM evaluates correctly 80 percent of the time. More specifically, if the answer is correct, the LLM says "this looks correct" with 80 percent probability, and the same 80 percent applies when the answer is actually incorrect. In this situation, you should not report the empirical probability, because it is biased. Why? Let the true probability of the tested model being correct be p. Then the empirical probability that the LLM says "correct" (= q) is

q = 0.8p + 0.2(1 - p) = 0.2 + 0.6p

So the unbiased estimate should be

(q - 0.2) / 0.6

Things get even more interesting if the error pattern is asymmetric or if you do not know these error rates a priori.

===

So what does this mean?

First, follow the suggested guideline in our preprint. There is no free lunch. You cannot evaluate how good your model is unless your LLM as a judge is known to be perfect at judging it. Depending on how close it is to a perfect evaluator, you need a sufficient size of test set (= calibration set) to estimate the evaluator’s error rates, and then you must correct for them.

Second, very unfortunately, many findings we have seen in papers over the past few years need to be revisited. Unless two papers used the exact same LLM as a judge, comparing results across them could have produced false claims. The improvement could simply come from changing the evaluation pipeline slightly. A rigorous meta study is urgently needed.

===

tldr:
(1) Almost all LLM-as-a-judge evaluations in the past few years were reported with a biased estimator.
(2) It is easy to fix, so wait for our full preprint.
(3) Many LLM-as-a-judge results should be taken with grains of salt.

Full preprint coming in a few days, so stay tuned! Amazing work by my students and collaborators. @chungpa_lee @tomzeng200 @jongwonjeong123 and @jysohn1108
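
The correction described in the thread above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the preprint's method: the function names `observed_rate` and `corrected_accuracy` are hypothetical, the judge's error rates are assumed to be known exactly, and the two-parameter form (`tpr` for the chance the judge says "correct" on a correct answer, `fpr` for the chance it says "correct" on an incorrect one) is a generalization of the thread's symmetric 80 percent example, which corresponds to `tpr=0.8`, `fpr=0.2`.

```python
def observed_rate(p: float, tpr: float = 0.8, fpr: float = 0.2) -> float:
    """Rate q at which the judge says "correct" when the true accuracy is p.

    tpr and fpr are assumed, illustrative judge error rates (hypothetical names).
    """
    return tpr * p + fpr * (1.0 - p)


def corrected_accuracy(q: float, tpr: float = 0.8, fpr: float = 0.2) -> float:
    """Invert q = tpr*p + fpr*(1 - p) to recover p from the judge's raw rate q."""
    if tpr <= fpr:
        # The inversion is undefined (or flips sign) if the judge is no better
        # than chance at separating correct from incorrect answers.
        raise ValueError("tpr must be greater than fpr")
    return (q - fpr) / (tpr - fpr)


if __name__ == "__main__":
    true_p = 0.5
    q = observed_rate(true_p)            # what a naive report would show
    print("raw judge rate:", q)
    print("corrected estimate:", corrected_accuracy(q))
```

With the defaults, `corrected_accuracy(q)` reduces to the thread's (q - 0.2) / 0.6. On a finite sample the result can fall slightly outside [0, 1], and in practice the two rates would themselves have to be estimated from a calibration set, as the thread notes.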

English
4
12
164
21.7K
Ivan's Cat
Ivan's Cat@IvansCat1·
@giffmana @burkov There's a cool paper by someone who works at a company called Anthropic that suggests an idea which would make this plot completely fine: Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations arxiv.org/pdf/2411.00640?
English
0
0
0
132
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
@burkov They do clearly break the axis though. Should have additionally broken the bars, but other than that, this is an ok graph in my book.
English
6
0
88
4.2K
BURKOV
BURKOV@burkov·
How to lie with charts? Anthropic knows how. I was actually surprised that they started the bars at 70%. They should have started at 74.5%. Indeed, "Lies, damned lies, and statistics."
BURKOV tweet media
English
56
9
215
23.9K
Ivan's Cat
Ivan's Cat@IvansCat1·
@simonw For viewers, I believe only the here's-something-I-prepared-earlier approach is interesting.
English
0
0
0
5
Simon Willison
Simon Willison@simonw·
YouTube question: I've been making a few videos recently and I'm torn between the honest no-cheating live coding approach and the here's-something-I-prepared-earlier approach. I'm aiming for 10-30 minutes per video. Which format do people find more useful?
English
147
3
334
42.1K
Ivan's Cat
Ivan's Cat@IvansCat1·
@sea_snell They have error bars on 2 out of 5 plots in the launch blog. I assume only some of their teams use them? Bit of a weird plot mix in their "publications".
English
0
0
1
434
Ivan's Cat
Ivan's Cat@IvansCat1·
@DKokotajlo This is basically pseudoscientific conjecture, like astrology. It is not too late to switch your energy to something more substantial. Anything really.
English
1
0
4
213
Daniel Kokotajlo
Daniel Kokotajlo@DKokotajlo·
Some people are unhappy with the AI 2027 title and our AI timelines. Let me quickly clarify:

We’re not confident that:
1. AGI will happen in exactly 2027 (2027 is one of the most likely specific years though!)
2. It will take <1 yr to get from AGI to ASI
3. AGIs will definitely be misaligned

We’re confident that:
1. AGI and ASI will eventually be built and might be built soon
2. ASI will be wildly transformative
3. We’re not ready for AGI and should be taking this whole situation way more seriously

🧵 with more details
English
118
97
1.1K
197.9K
Ivan's Cat
Ivan's Cat@IvansCat1·
@chrisalbon If you instruct it clearly, it is pretty helpful when making plots with seaborn and the like. But it fails horribly for most kinds of complex analyses and sometimes even simple data transformations.
English
0
0
0
90
Chris Albon
Chris Albon@chrisalbon·
Agentic coding always, ALWAYS fails with data analyses. I think there is a level of precision (edge cases, context, etc) that you can't vibe.
English
20
3
83
9.7K
Ivan's Cat
Ivan's Cat@IvansCat1·
@AravSrinivas @comet This behaviour is awful. Why are you celebrating it? I hope every website puts you on a very extensive blocklist for this stuff
English
0
0
0
64