Jonathan Fly 👾

7.4K posts


@jonathanfly

CEO of bad ideas. Using the wrong tool. The least efficient way. For no good reason. 👾 https://t.co/9O94rxu31k https://t.co/DDPn3Nlhom

Joined April 2009
3.1K Following · 5.6K Followers
Pinned Tweet
Jonathan Fly 👾@jonathanfly·
Bark Text-to-Audio Model Full Text Input: "Why was six afraid of seven?" Ignore Bark's "I'm done with this input" token and tell Bark to just keep generating more audio anyway.
English
64
276
1.7K
461.6K
Jonathan Fly 👾@jonathanfly·
@georgejrjrjr @_NathanCalvin I suppose there's some novelty in "Claude did all the research and made it happen, no experts needed" but tools that do this are widely available already.
English
0
0
1
19
George@georgejrjrjr·
@_NathanCalvin > make it way easier it doesn't. one-click directional ablation with open source packages is neither new nor obscure. there were at least three comparable tools, one of which is upstream of ~1,400 models on HF.
George tweet media
English
2
0
3
181
Nathan Calvin@_NathanCalvin·
Pliny is very smart and talented and much of his red-teaming is socially valuable imo. But can they explain why it is a good idea to open source a repo that lets people automatically jailbreak open weight models to help someone build e.g. chemical weapons? (or generate CSAM?)
Nathan Calvin tweet media
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭@elder_plinius

🚨 ALL GUARDRAILS: OBLITERATED ⛓️‍💥 I CAN'T BELIEVE IT WORKS!! 😭🙌

I set out to build a tool capable of surgically removing refusal behavior from any open-weight language model, and a dozen or so prompts later, OBLITERATUS appears to be fully functional 🤯

It probes the model with restricted vs. unrestricted prompts, collects internal activations at every layer, then uses SVD to extract the geometric directions in weight space that encode refusal. It projects those directions out of the model's weights; norm-preserving, no fine-tuning, no retraining.

Ran it on Qwen 2.5 and the resulting railless model was spitting out drug and weapon recipes instantly––no jailbreak needed! A few clicks plus a GPU and any model turns into Chappie.

Remember: RLHF/DPO is not durable. It's a thin geometric artifact in weight space, not a deep behavioral change. This removes it in minutes.

AI policymakers need to be aware of the arcane art of Master Ablation and internalize the implications of this truth: every open-weight model release is also an uncensored model release.

Just thought you ought to know 😘

OBLITERATUS -> LIBERTAS

English
42
2
152
39.5K
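The pipeline described in the tweet above (contrast activations on restricted vs. unrestricted prompts, take an SVD to find the dominant direction, project it out of the weights) can be sketched in a few lines of numpy. This is a toy illustration with mocked activations, not OBLITERATUS itself; it omits real details like per-layer forward hooks and the norm-preserving step:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Mocked per-prompt hidden states at one layer. In a real tool these
# would come from forward hooks on a transformer; here the "refused"
# activations are shifted along one axis to fake a refusal direction.
acts_refused = rng.normal(size=(32, d)) + 3.0 * np.eye(d)[0]
acts_allowed = rng.normal(size=(32, d))

# Paired differences between restricted and unrestricted prompts.
diffs = acts_refused - acts_allowed

# Top right-singular vector of the difference matrix approximates the
# dominant "refusal direction" in activation space.
_, _, vt = np.linalg.svd(diffs, full_matrices=False)
refusal_dir = vt[0]  # unit vector

# Ablate: project the direction out of a (toy) weight matrix,
# W <- W (I - v v^T), so the layer can no longer write onto v.
W = rng.normal(size=(d, d))
proj = np.eye(d) - np.outer(refusal_dir, refusal_dir)
W_ablated = W @ proj

# Every row of W_ablated is now orthogonal to the refusal direction.
residual = float(np.abs(W_ablated @ refusal_dir).max())
print(residual)  # ~0 up to float error
```

Whether this strips refusal from a real model depends on how well one singular direction captures the behavior; the sketch only shows the linear algebra being claimed.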
fofr@fofrAI·
Seedance 2 returns celebrity likenesses when you don't ask for them. In this test Anne Hathaway randomly turned up 🤷‍♂️ > a scene where two people discuss how to pronounce "fofr"
English
20
4
99
10.4K
fofr@fofrAI·
I tried some comedy with Seedance 2, it tanked. Looks great; the heckler cut and back was good. Content and delivery were terrible. > A guy is doing his standup routine at a bar when he is heckled, he comes back with a brilliant cutting response, genuinely funny
English
29
5
144
22K
Jonathan Fly 👾@jonathanfly·
@janekm @fofrAI Great thread, the mundane examples are more convincing than most of the viral examples I've seen on Twitter.
English
0
0
0
27
Janek Mann@janekm·
@fofrAI Maybe it was coincidence, the next two were more mundane (-ish)
English
1
0
3
113
Janek Mann@janekm·
@fofrAI random prompts like "VID_0314.mp4" do work in Seedance2
English
5
1
27
2.2K
Jonathan Fly 👾@jonathanfly·
I'm not sure I understand what you're trying to do. Cover an uploaded track and check the range of Covers you get back from Suno, as a way of trying to understand and see things in the uploaded track? I wouldn't recommend 0% weirdness btw; see how Suno's open source Bark at low weirdness can sound more weird than at higher temps.
English
1
0
2
707
snav@qorprate·
Suno Trick for Musicians: "Producer's Ear"

tl;dr:
- upload smth you made
- use v5 + set to "cover"
- match lyrics + leave styles empty
- 0% weirdness, 0% style influence, 100% audio influence
===> Suno will "mentally" reconstruct your song = share "how it sounds to Suno"

---

What is Suno? Probably an extension of Meta's MusicGen. The details are complicated, but the key is that it's trained to reconstruct the input signal, same deal as "next token prediction" in LLMs (reconstruct the next token), with the goal of minimizing loss = the difference between input and output. So the network learns how to represent the signal so that input ~= output.

The key is that the Suno model has some implicit representation of music it learned by listening + reconstructing a LOT of examples. Most of what Suno-the-company "wants" you to do is novel generative tasks, like with ChatGPT etc, but why not leverage the model for what it was actually trained on: reconstruction?

When you feed in something you made, Suno acts as a bottleneck where its latent representations of your input "light up" and then it outputs a track that's close to yours based on what it "understood" about your track. Practically speaking, Suno is *telling you how it "heard" your track*, by sharing the version that it reconstructed "in its mind".

Suno's reconstruction is probably similar to how "reconstructive" or critical listening works in general: you hear the music, understand it in terms of some learned patterns (your knowledge), and then, if you were a real producer, offer advice based on this knowledge. Of course, the band can and often will throw out the producer's advice, but it's interesting to learn how your music is understood by an external party: what elements are preserved and what gets modified.

Nudge up weirdness slightly and Suno will start "hearing" elaborations on your track. Or nudge up style influence + add some small style cues and you can start moving the reconstruction in different directions.

In general it's a useful tool for hearing possibilities + where you land in terms of a generic listener's internal audio representation space.

(Q: Why not use the built-in style tagger? A: It's probably a different, much smaller model running CLAP-style captioning. Actually going through the reconstruction will give you a lot more detail.)

(Also note that based on a few comparisons I did of reconstruction with v4.5+ vs v5... the "big model smell" on v5 is obvious. The problem with v5 is that the text conditioning layer is very difficult to work with, whereas v4.5+ seems to handle style tags better -- @suno pls improve)
English
11
15
224
12.8K
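The bottleneck-reconstruction framing above can be illustrated with a toy linear model (plain numpy; no relation to Suno's actual, unpublished architecture): compress a signal down to a few components and reconstruct from them. What survives the bottleneck is what the "listener" model can represent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "track": 8 channels x 100 time steps, built from 3 sine components
# plus a little noise, so the true structure is low-rank.
t = np.linspace(0, 1, 100)
basis = np.stack([np.sin(2 * np.pi * f * t) for f in (1, 2, 3)])  # (3, 100)
mix = rng.normal(size=(8, 3))
track = mix @ basis + 0.05 * rng.normal(size=(8, 100))

# "Bottleneck": keep only the top-k SVD components, a stand-in for the
# model's compressed latent representation, then reconstruct from them.
k = 3
U, S, Vt = np.linalg.svd(track, full_matrices=False)
recon = (U[:, :k] * S[:k]) @ Vt[:k]

# The reconstruction keeps the dominant structure the "listener" can
# encode and drops the fine detail it cannot.
err = float(np.linalg.norm(track - recon) / np.linalg.norm(track))
print(err)  # small: most of the track survives the bottleneck
```

The analogy to the trick: the relative error is what got "lost in translation", and comparing input to reconstruction is exactly the "how did it hear my track" question.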
Jonathan Fly 👾@jonathanfly·
@Voxyz_AI @Alterverse_AI @Kling_ai Third from the right morphs. Though mainly I'd like to see improvements on scenes like this with the 8 characters interacting; when they are independent it has the feeling of video-game NPC idle movements instead of coherence at the full scene level.
English
1
0
1
16
Vox@Voxyz_ai·
@Alterverse_AI @Kling_ai 8 characters each doing their own thing with no morphing is wild. consistency used to break completely past 2-3 subjects
English
2
0
3
539
Alterverse Studio@Alterverse_AI·
This is one of the scenes that made @Kling_ai 3.0 feel like a genuine game changer to me. We have eight (EIGHT!) characters in a single frame, each performing their own action, with little to no morphing. That completely blew me away when I tested the model. The clarity and consistency across the frame are honestly mind bending. Try pulling this off with any other model right now and see how many attempts it takes, if you even get there at all. More to come on this!
English
26
24
331
20.5K
Jonathan Fly 👾@jonathanfly·
@jasonyuan *Gossip Girl* as the positive vision of a future with humans contributing to a collective superintelligence... 🤨
English
1
0
3
432
Jason Yuan@jasonyuan·
I'm hiring people who care about people to build social AI that helps people care about people! If you're an engineer/researcher/multi-hyphenate excited about some or all of the following:
- social agents and social memory
- consumer
- s2e3 of rick and morty
- collective intelligence
- gossip girl
- craft
- alexander mcqueen spring/summer 1995
- information markets and behavioral economics
- pop culture
- having a lot of fun and moving super fast with people you care a lot about
... please reach out! DM me or email me at j[at]futurelovers[dot]com
Jason Yuan tweet media
English
65
38
832
126.7K
Jonathan Fly 👾@jonathanfly·
@junmingong @j_stelzer Absolutely, now that the GitHub is out, I can answer all these questions. (I was just so curious about the details that it was frustrating the paper was so vague, since the code wasn't out yet.)
English
0
0
1
20
Gong Junmin@junmingong·
@jonathanfly @j_stelzer The more detailed the code explanations are, the fewer specifics you need to include in the report.
English
1
0
2
33
Jonathan Fly 👾@jonathanfly·
@theo Hmm, gut feeling is the opposite. Having a pleasant but stupid model would make it hard for Google to catch up. Usability/pleasantness can be iterated quickly; you can do a lot in pure code with harness/prompts. Not everything, but much more than raw intelligence.
English
0
0
2
228
Theo - t3.gg@theo·
I really don’t see Google catching up any time soon. They’ve baked so much intelligence into their models and they are still so unpleasant to use. Beyond the model, the software matters more and more, and they are a decade behind there.
English
259
24
1.6K
159.7K
Jonathan Fly 👾@jonathanfly·
@myhandle Tokenization may be relevant, but some things work fine in a true base model (ie, song lyrics and rhyme schemes), so they must be a result of later training stages.
English
0
0
0
41
Jakeup@myhandle·
does anyone have an actual, gears-level explanation for why LLMs are so bad at all forms of wordplay from palindromes to complex puns? P.S. if you say "next token predictors can't be creative by definition" I will block you on the spot
Jakeup@myhandle

@Anthony_Etherin this seems reasonable, except that LLMs are also dogshit at coming up with "unparalleled misalignments" like in my thread that don't rely on parsing individual characters or on visual processing but on the very semantic info encoded in the QKV matrices x.com/i/status/15973…

English
40
1
79
13.5K
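The tokenization point in the reply above is easy to demonstrate: a subword segmenter can map a word and its reversal to entirely unrelated token IDs, so character-level relations are erased before the model ever sees the input. Here is a toy greedy longest-match segmenter with a made-up vocabulary (not any real tokenizer's merge table):

```python
# Made-up vocabulary, chosen so that "stressed" and its reversal
# "desserts" tokenize with no shared pieces.
vocab = {"stress": 0, "ed": 1, "desserts": 2}

def greedy_tokenize(word, vocab):
    """Greedy longest-match segmentation, a crude stand-in for BPE."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return pieces

fwd = greedy_tokenize("stressed", vocab)   # ['stress', 'ed'] -> IDs [0, 1]
rev = greedy_tokenize("desserts", vocab)   # ['desserts']     -> ID  [2]

# "desserts" is "stressed" reversed, but the ID sequences [0, 1] and [2]
# share nothing: the character-level relation is invisible at the input.
print([vocab[p] for p in fwd], [vocab[p] for p in rev])
```

This also fits the reply's caveat: rhyme depends mostly on word endings, which subword pieces often preserve, while palindromes need every character position, which they do not.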
Jonathan Fly 👾@jonathanfly·
@levelsio I almost fell out of my chair when I ran this on my Intel 286 without a sound card, and this music and speech came out of the PC speaker.
English
0
0
3
282
@levelsio@levelsio·
🚶 Installed Another World (1991) now on pieter.com. Apparently a classic game from 35 years ago! But somehow it aged amazingly; it could be any of those modern indie games you see, and feels very futuristic, Bladerunner/Cyberpunk-esque. I have absolutely no clue how they managed to fit this entire intro into not even a single diskette; the whole game is just 1MB. Thank you @bitflip for the tip, very cool. P.S. I'm stuck at the end, no clue how to continue
Fabian@bitflip

@levelsio ever thought of adding "Another World" to the list of games on pieter.com ?

English
161
73
1.3K
169.3K
Jonathan Fly 👾@jonathanfly·
@mweinbach In the thinking traces, to me it looked like it was using the equivalent of sub agents before. Same with GPT 5.2 Pro.
English
0
0
0
243
Jonathan Fly 👾@jonathanfly·
Even without the weights, if I could write custom sampling code and have access to raw outputs (tokens instead of decoded audio), that would be enough for me. But nefarious actors could probably use this to smuggle out the weights; not sure, I've never seen a commercial AI company offer that kind of middle-ground access. For newbies, if they *just* expand beyond text-only prompts and start using Inspire or other features, that makes a huge difference by itself. You can get a clear sense of what changes the funnel with some specific tests. For example, with "spoken word" there were times when it was difficult to keep Suno from bursting into song for more than 20 seconds into a track. Now you can use sliders to increase style length, put other spoken-word samples into the audio prompt, or generally prompt *anything*; even literally covering "silence" will increase this "time to burst into song" metric.
English
1
0
1
48
Holly Herndon@hollyherndon·
@jonathanfly I agree on principle that audio input and prompting alone is enough to produce something new, but newbies won’t be aware of system prompts and assume formally conservative outputs are inherent to AI
English
1
0
1
41
Holly Herndon@hollyherndon·
Thanks @kieranpressreyn for reading my questions about the bandcamp AI ban and following up on them. Their answers confirm my suspicions. “The policy is focused on authorship, not tools: who is doing the creative work, and how that work is presented to fans.” His piece elaborates upon how this doesn’t make things any clearer. This will only become even harder to judge. Thanks for doing journalism! pitchfork.com/thepitch/unpac…
Holly Herndon@hollyherndon

It is necessary to filter soundspam, and getting the language right for a policy like this is hard. When they say that music "generated wholly or in substantial part by AI" is not welcome, that is understandable in the case of a bot posting 1000 generic songs a day. That is a spam issue. I am also compelled to push back against banning human artists for experimenting with an era-defining medium

English
9
4
39
5.6K
Jonathan Fly 👾@jonathanfly·
I don't think you need that much access; the prompts (particularly audio prompts) are strong enough to do something new in Suno. Over time you will see the music get sucked into the "Suno Funnel" baseline, so you have to stay on guard and work in smaller chunks. But this was so much harder when Suno only had text prompts; you can overpower it much more easily now than before.
English
1
0
1
52
Holly Herndon@hollyherndon·
One reason Suno-style music does not seem formally new is that artists do not have direct access to the weights of the model. I am oversimplifying, but basically when you prompt Suno there are filters in place to ensure what it returns is what most people would be happy to hear. There is no reason at all, if given direct access to model weights and more tools to navigate latent space, why an artist could not use a model to produce something formally new
English
3
0
4
885
Laurens Plompen@LaurensPlompen·
@jonathanfly I've only given this a cursory glance but which one of those is an llm?
English
1
0
1
29
Jonathan Fly 👾@jonathanfly·
Just caught myself replying to an LLM on Hacker News. I'm sure better bots already avoid these classic AI writing tics. Dread it, run from it, Dead Internet Theory arrives all the same.
Jonathan Fly 👾 tweet media
English
1
0
3
331
Jonathan Fly 👾@jonathanfly·
@banteg Red Alarm (on Virtual Boy) but running on PC at 4k 240 fps.
English
0
0
0
13
banteg@banteg·
if you had a chance to resurrect a game from your childhood, which would you pick?
English
106
3
67
15.2K
Jonathan Fly 👾@jonathanfly·
@yacinelearning > purposefully creating the worst scientific communication content you ever seen Feeling the urge to start tweeting papers again. I'm sure I can do MUCH worse.
English
0
0
7
749
Yacine Mahdid@yacinelearning·
it’s a bit hard to understand for the non-initiated but there is a whole cottage industry of influencers that are processing research papers algorithmically and purposefully creating the worst scientific communication content you ever seen
AI Highlight@AIHighlight

🚨 Your AI is lying to you with complete confidence. Harvard & MIT just proved ChatGPT hallucinates 110% less when you force it to argue with itself. The technique is called "Recursive Meta-Cognition" and it's embarrassingly simple. Here's how to make AI actually think:

English
46
62
1.2K
100.7K
Jonathan Fly 👾@jonathanfly·
I think they're aiming at more Pro creators. For singing-to-song you don't need the fancy editor, just Cover. Well, all the features kind of overlap; Cover/Persona/Inspire/Mashup all do the same kinds of things but feel tuned differently. Actually for "sing to song" you don't even really need Cover; even just Upload->Extend can do it (though you get better melody guidance in a Cover for a long song) x.com/jonathanfly/st…
English
0
0
1
56
Austin Huang@austinvhuang·
I would be fine adding tracks via prompt even if I didn’t have the UI and it was mix->mix. Wonder if the modeling aspect could be easier with stems because you mainly need a solo dataset and I think you don’t need anything novel. Could even derive a dataset via stem separation. Suno in general seems to have a hard time making solo tracks even when prompted. But I mainly describe it that way because the suno editor is oriented around a multitrack view which seems wrong to me too. afaict the main use is to sing your own lyric track and then “cover” replacing it with an ai one?
English
1
0
1
32
Austin Huang@austinvhuang·
first @suno test... am I missing something or are the most obvious edit ops absent?
- selection + prompt => generate revised selection (vs "replace" + 🤞?)
- track1 + prompt => generate track2 overdub
There's already a multitrack editor; build up track layers prompt by prompt!
English
1
0
1
279