
PC Screen
3.6K posts


@oManelzin Tell your teacher you're testing water resistance with and without body hair to measure the difference
English

@scaling01 @kittingercloud That's basically the Turing test but with extra steps; the only way for the model to succeed is to convince others that it's a human
English

@kittingercloud i call it the francois bench
I collected all puzzles from around the world but the scoring method is special, it looks like this
s(model) = 0%
s(human) = 100%
English

benchmarks have hit a wall
Lisan al Gaib@scaling01
62.1% on ARC-AGI-3 would be the score if they used the same scoring as ARC-AGI-1/2
English

@chatgpt21 @agenticasdk @spicey_lemonade The actual human baseline is below 30%, as the metric is not just clear rate but also step efficiency squared. So you could have a 100% clear rate, but if you take 2x as many steps as the second-best human run for any given level, the score will be 25%.
English
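
To make the arithmetic in the post above concrete, here is a minimal sketch of a "step efficiency squared" score. It is not the official ARC-AGI-3 scoring code: the function names, the cap on efficiency at 1.0, and the plain average across levels are all assumptions made for illustration. Only the squared ratio against a baseline step count comes from the post.

```python
from typing import Iterable, Tuple


def level_score(cleared: bool, steps: int, baseline_steps: int) -> float:
    """Score one level as (baseline_steps / steps) squared.

    Assumptions (not from the source): an uncleared level scores 0,
    and efficiency is capped at 1.0 so beating the baseline gives no bonus.
    """
    if not cleared or steps <= 0:
        return 0.0
    efficiency = min(1.0, baseline_steps / steps)
    return efficiency ** 2


def run_score(levels: Iterable[Tuple[bool, int, int]]) -> float:
    """Average the per-level scores (plain mean is an assumption)."""
    scores = [level_score(c, s, b) for c, s, b in levels]
    return sum(scores) / len(scores) if scores else 0.0


# Every level cleared, but always with twice the baseline step count.
print(run_score([(True, 200, 100)] * 4))  # 0.25
```

Under these assumptions a run that clears everything at 2x the baseline steps lands on 0.25, which matches the 25% figure in the post, and it shows why a 100% clear rate alone says little about the final score.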

@mark_k @Francis87120051 @Eduardopto Portraits don't tell much about a model's quality, and I think the bottleneck for face accuracy is the fact that faces are 3d (meaning they can look different depending on perspective, see example below) but we usually only give models 1 image as reference

English

@IqraSaifiii @LumaLabsAI I think I know why now, the agent in the website has access to Uni-1, Nano Banana Pro and GPT-Image 1.5 for image generation, so unless you specifically select uni-1 it might use the other models
English

Testing the new Uni-1 model it cooked hard🔥
Luma Uni-1 @lumalabsai vs Nano Banana 2
Luma Uni-1 can combine different styles in a single image. In this photo we've included an anime character, a sketched character and a claymation one, all with just a prompt
Prompt : A photo of an everyday scene at a busy cafe serving breakfast. In the foreground is an anime man with blue hair, one of the people is a pencil sketch, another is a claymation person


English

@mark_k @MagusWazir @teortaxesTex No, gpt 1.5 is way behind NBP for complex prompts. Images are NBP (left) and GPT 1.5 (right), read the panels from right to left. NBP gets everything right so any differences you see on the 1.5's side are mistakes


English

@Prescosmarte @supermaro2 @retro_anime It's actually not a kekkei genkai, it's a kinjutsu that he used on himself to create the mouths which let him knead chakra into materials. His kekkei genkai is Explosion release. By combining the 2 he created his explosive clay ninjutsu

English

@supermaro2 @retro_anime Kekkei genkai, same as kimimaro, jugo, suigetsu and a lot of other characters that have body changes to create unique jutsus
English

With GPT 5.4 high reasoning I'm seeing 20%-55% accuracy across brainfuck & befunge98 on medium difficulty problems, despite the paper's claim that GPT 5.2 et al get 0% on all languages for medium & above

Lossfunk@lossfunk
4/ We tested GPT-5.2, O4-mini, Gemini 3 Pro, Qwen3-235B, and Kimi K2 across 5 prompting strategies. Models scoring 85-95% on HumanEval scored 0-11% on equivalent esoteric tasks. And every model, every language, every strategy scored 0% beyond the Easy tier. Not 2%. Not 5%. Zero.
English

@SHforlife56 @UndisputedZoro @Buggy When a character has regen, authors feel the need to show it off by having the character take obscene amounts of damage from random attacks even when they are supposed to be strong enough to tank/dodge it

English

@UndisputedZoro @Buggy Or it's the fact that the crew's attacks have gotten lethal.
English

#ONEPIECE1177
Just realized Usopp tanked a direct explosion that has destroyed Gunko’s body on multiple occasions… Is this Oda confirming he’s just that resilient? He should be half of a corpse rn without regen😭


English

@max_spero_ @Ahoomanman Nano banana is gemini outputting images autoregressively and we know for sure it outputs in token space (we know the exact number of tokens per image for a given resolution), the official name is literally gemini-3-pro-image



English

@Ahoomanman I haven’t seen any evidence that nano banana is not diffusion
English

I die a little inside every time I see another piss yellow ChatGPT slop image generation.
Doubly so knowing that midjourney and a little taste can get you so much farther!
theseriousadult@gallabytes
midjourney is still the only image generator that comes in color
English

@lefthanddraft When you give them any modified puzzle, reasoning models will spend ages questioning whether the user made a typo (even if you tell them there are no typos), arrive at the right answer multiple times, only to then backtrack and output the overfit "classic" answer
English

This seems like a mild case of answer-thrashing
(I also think it demonstrates that GPT-5.4 is not simply "correcting" a typo in my question)
Ryan J. Shaw@RyanJamesShaw
@lefthanddraft Errr…
English

@thefinnmckenty @DavidSHolz @flowersslop No, nano banana pro is literally just gemini generating the images like google themselves have said, it makes no sense for nano banana pro to be based on veo when nano banana outputs in discrete token space
English

@DavidSHolz @AHSEUVOU15 @flowersslop Oh, that’s interesting. I wasn’t aware of that, but that would actually make a lot of sense based on how nbp and veo behave in use (essentially that nbp seems to create a 3d model of the scene with an added temporal dimension).
English

GPT/Nano Banana aren't purely diffusion like SDXL or Midjourney, they generate image tokens first (AR) and use diffusion to upscale.
which explains why Midjourney still has weird hands in 2026 and GPT/Nano Banana don't
I don't get the cockiness, you know what Angel meant lol
David@DavidSHolz
@Angaisb_ almost 100 percent of image and video models are still diffusion, you're just confused, sorry!
English

@DavidSHolz @flowersslop but they are upfront about veo being a latent diffusion model (which does not output tokens), meanwhile the official name for nano banana is literally gemini-3-pro-image, "nano banana" is the codename they used on lmarena which stuck around


English

@DavidSHolz @flowersslop The official name for Nano Banana Pro is gemini-3-pro-image and we know for sure it outputs in tokens. Gemini models are capable of native autoregressive image gen as per the gemini 1 paper, at worst it has a final diffusion upscaling step


English

@AHSEUVOU15 @flowersslop this gap can be explained just by them training a gemini-version of t5gemma and then conditioning a diffusion model on that - imho it's not that complicated
English

@DavidSHolz @flowersslop Nano Banana pro vs Qwen Image 2 vs Flux 2 Max on an actually complex prompt. Final image is the prompt. Read the panels from right to left since it's manga. Arena benchmarks mostly test extremely simple prompts, Nano Banana Pro is leagues ahead when you push the models




English

@Jesan1487632 @mijiuxrock18 The death note has a 23 day limit, if you try to specify a death after 23 days it'll kill the person with a heart attack instead after 40 seconds. Only way to get around it is to write the name of a disease without specifying a time of death at all



English

@mijiuxrock18 I always wondered
why Light didn't write something like
Light Yagami
Dies at 130 years old after completing all his goals
English

















