Bishamon

1.9K posts

@9thbeer

passive llm research enjoyer, model is the product

part of your inner monologue · Joined August 2022

301 Following · 663 Followers

Bishamon@9thbeer·1h

claude is getting one more philosopher.

Overlap: Business & Tech@Overlap_Tech

Dario Amodei: Mythos Was Better Than Human Engineers at Finding Exploits. "Mythos was much better than human software engineers at finding ways that code could be exploited and broken into." - @DarioAmodei

Bishamon retweeted

obsidian capital@obsidiancap1·12h

Good example of @JaredSleeper’s point that “single player software” is the most automatable. Multi player can be a very different ballgame

Rishabh Mukherjee@rishabhm

I can share an interesting experience from last week. We have a person who is in charge of buying hardware, software and data sets. This might sound stupid but when you are buying 100s of servers, workstations and laptops a month, it's complicated. This dude used Claude to create an entire tracking and maintenance portal that inventoried everything. He even managed to integrate the portal with our monitoring software to display the status of every server VM. He then modified it to store invoices and so on. He's been at it for a couple of weeks and we've been able to identify wastage and needs. Without Claude, this would have been a maze of spreadsheets and a lot of manual labor. But we wouldn't have hired a developer for this. To me, this kind of software is the killer use case for AI. Enough to simplify your life, but not enough to justify hiring someone or buying a product. Is the code great? Is it scalable? Is it good software engineering? No, no and no. But that's beside the point.

Bishamon@9thbeer·6h

wish someone would create eng + romaji lyrics covers for all anime songs using AI and upload them on YT.

Bishamon retweeted

Travis Vought@TravisJonVought·1d

@davbelian Visualize in systems.

Bishamon retweeted

Ethan Mollick@emollick·2d

I would push back a little: because the models are so good & improving, they don't have to be the product. But it is the model that is the prime mover. If they weren't so generally capable, the harnesses & apps the labs build around them would be hard to build and wouldn't work.

Greg Brockman@gdb

the model alone is no longer the product

Bishamon@9thbeer·1d

@TonyTheMathLion let's make a coordinate-invariant object, boom, we got tensors.

tony the math lion 🦁@TonyTheMathLion·2d

Mathematicians: let's invent ways to avoid local coordinates. Et voilà, differential geometry is born

Bishamon@9thbeer·2d

in information theory, the cost of frequent info goes down. the same idea applies to network packet unmarshalling, REST vs gRPC, LLM prefill, and the KV cache.

Goshawk Trades@GoshawkTrades

Jane Street's head of technology just explained the full spectrum of how fast their trading decisions are made. the fastest systems turn around a packet in under 100 nanoseconds. at that speed, if you attached an oscilloscope to the wire going in and the wire going out, you'd see the response start to leave before the incoming packet has finished arriving. at that speed, you can't use a CPU. you can't use any programming language. you're on an FPGA direct wired to the network. and the decisions you're making are incredibly simple. because you literally can't compute anything complex in that time. but here's the part most people miss: that's just one end of the spectrum. Jane Street runs an ensemble of systems operating at every timescale simultaneously. some decisions happen in nanoseconds. some in microseconds. some in milliseconds. some take hours or a full day. "the right way to build an optimal trading strategy is an ensemble approach. for some decisions you're making very simple decisions very quickly. for others, you're operating at the scale of microseconds, milliseconds. and in some cases, if you can get that decision turned around in an hour, that's totally fine." the faster you need to respond, the simpler the decision has to be. the slower you can afford to go, the smarter the model can be. this is why "Jane Street is just a speed game" is wrong. speed is one dimension. intelligence is the other.
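
A minimal sketch of the information-theory remark in the post above (frequent information gets cheaper), assuming the usual Shannon ideal code length of -log2(p) bits per symbol. The function name and the toy message are hypothetical and only for illustration; nothing here models packets, gRPC, or KV caches.

```python
import math
from collections import Counter


def ideal_code_lengths(message: str) -> dict[str, float]:
    """Return the Shannon ideal code length, -log2(p), for each symbol.

    p is the empirical frequency of the symbol in `message`, so the more
    often a symbol appears, the fewer bits an optimal code spends on it.
    """
    counts = Counter(message)
    total = sum(counts.values())
    return {sym: -math.log2(n / total) for sym, n in counts.items()}


# Hypothetical toy message: 'a' is the most frequent symbol, 'c' and 'd' the rarest.
lengths = ideal_code_lengths("aaaabbcd")
for symbol, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(f"{symbol}: {bits:.1f} bits")
```

The most frequent symbol gets the shortest ideal code, which is the sense in which the cost of frequent information goes down.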

Bishamon retweeted

Goodfire@GoodfireAI·2d

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

Goodfire@GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

Bishamon@9thbeer·4d

@TonyTheMathLion me when I skip author's prerequisite chapter on set theory

tony the math lion 🦁@TonyTheMathLion·4d

Me opening a math book

Bishamon@9thbeer·4d

on-policy distillation is the final_final_v3 form of post-training.

Bishamon@9thbeer·4d

@serialdotai on a side note, heads of data science are now VPs of Applied AI.

serial@serialdotai·4d

are there still data scientists out there, or is everyone a machine learning engineer?

Bishamon@9thbeer·4d

Mythos news be like:

Jake Wintermute 🧬/acc@SynBio1

Doing biotech is so metal

Bishamon@9thbeer·5d

agents using paths similar to URLs, e.g. /goal, /plan, /goal/subagent, may be an early sign of how fast, tiny agents could directly serve web requests.

Bishamon retweeted

wh@nrehiew_·5d

Interestingly, albeit unsurprisingly, normal GRPO does not change the representation of the environment-related tokens which is kinda to be expected given they are usually masked out. ECHO naturally does model the environment better. (world modelling)

Bishamon retweeted

Prime Intellect@PrimeIntellect·5d

The next step toward automating AI is automating RL environments. Introducing General-Agent: a fully synthetic environment whose task corpus self-evolves and grows harder over time. 4,504 tool-use tasks · 1,040 domains · 8,159 unique tools

Bishamon@9thbeer·5d

@vikramskr overall, do all these books have 50% overlapping content?

Vikram Sekar@vikramskr·6d

All my EE books. And no, I haven’t read all of them.

Bishamon@9thbeer·5d

DevRel role got upgraded with AI.

Wulfie Bain@wulfie_bain_

Hiring in Bengaluru, India 🇮🇳 for my Startups Applied AI team at @OpenAI. Apply if you want to support the incredible startup ecosystem & shape the future of OpenAI. The team I'm building is already full of ex-founder/CTOs, AI PHDs, MLEs, DSs. We work with frontier startups, and closely with Product & Research. The team works hard, but I can genuinely say we love it. So if you’re obsessed with startups, high agency, & deeply technical - and you like the sound of that team - you should apply or reach out.

Bishamon retweeted

Rosmine@rosmine·5d

I fixed why LLMs write so poorly, and I have a demo to prove it. Announcing Distribution Fine Tuning (DFT): a post-training step that fixes LLM writing. Model outputs fooled pangram on 100% of test cases

Bishamon retweeted

Oxford Mathematics@OxUniMaths·17 May

Mathematics is a universal language. Isn't it? The Tower of Babel - Episode 1

Bishamon@9thbeer·17 May

@BushnaqLucius @GoodfireAI accepted. layernorm as a solution is now downvoted, let me try a different candidate for this arithmetic-by-rotating-shapes behaviour.

Lucius Bushnaq ⏹️@BushnaqLucius·16 May

@9thbeer @GoodfireAI Yes, but in a D-dimension residual stream it'd be constraining them to the D-1 dimensional surface of a D dimensional hypersphere. That's a very meaningful difference for D=3, but for D=10,000 not so much.

Goodfire@GoodfireAI·14 May

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)
