Magnus
@WasBruba
180 posts

hallo

Joined June 2022
176 Following · 6 Followers
Magnus@WasBruba·
@BourbonCap Everything AI coming from microsoft has been so incredibly mid, always late and worse than competitors, why would you want them in control?
Bourbon Capital@BourbonCap·
$MSFT still owns 27% of OpenAI. Unpopular opinion: I want OpenAI to fail as a business so Microsoft can eat the whole thing. If nothing happens, that's fine as well.
[image attached]
Magnus@WasBruba·
@mtbomb @thdxr Cloud infrastructure is one of the most profitable, highest margin businesses (AWS, Azure…)
tower9000@mtbomb·
@thdxr This plan only works if the tokens you're selling are not a commodity. If several firms are offering tokens that are interchangeable, there won't be a ton of profit, similar to web hosting.
dax@thdxr·
inference is very profitable and probably a good opportunity to understand some basic business math

1. companies buy long-lived assets like GPUs. these are one-time costs and the asset depreciates over time
2. once you own this asset, you can plug it in and produce tokens which you can sell. the cost of goods sold here can be very low and you might be making 90% margins at scale. this is why we say inference is profitable
3. then you also hire employees to do r&d work to improve your systems, come up with new models, expand the business

if you add these 3 up you end up with $0. you're not producing a profit because the business is growing and you're reinvesting it all, buying assets or r&d to meet demand. if it's obvious to other people the business is working, you can raise money from them to accelerate all these numbers so they max out in 5 years instead of 25.

so on paper you'll be "losing money" every year, but that's because you want to make sure you lock down the opportunity before someone else. the bigger your market is, the bigger this burn can be because it's a function of potential.

so when you see these companies losing a lot of money it doesn't mean the whole concept of their business is broken. it's possible they misjudge and overinvest on 1+3 and will suffer some consequences, but fundamentally 2 does work.
dax@thdxr

@d4m1n i'm a bit confused why so many people say api tokens are sold at a loss. this isn't true - these models are incredibly expensive compared to the gpu time cost. there's potential for 90% margin depending on the model

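The three buckets in dax's thread add up in a few lines of arithmetic. A minimal sketch with invented figures: the $1B revenue, 10% COGS (the "90% margin" claim), $2.5B of GPUs over a 5-year life, and $400M of R&D are all assumptions chosen to make the buckets sum to roughly zero.

```python
def annual_pnl(revenue, cogs_rate, gpu_capex, depreciation_years, rd_spend):
    """Return (gross_margin, operating_income) for one modeled year."""
    cogs = revenue * cogs_rate                     # serving cost: power, hosting
    gross_profit = revenue - cogs
    depreciation = gpu_capex / depreciation_years  # straight-line over asset life
    operating_income = gross_profit - depreciation - rd_spend
    return gross_profit / revenue, operating_income

# Assumed figures: $1B token revenue at 10% COGS, $2.5B of GPUs
# depreciated over 5 years, $400M of R&D reinvestment.
margin, net = annual_pnl(1_000_000_000, 0.10, 2_500_000_000, 5, 400_000_000)
print(round(margin, 2))  # 0.9 -> inference itself carries a high margin
print(round(net))        # ~0 -> growth spending eats the profit on paper
```

Point 2 (high gross margin) and the "losing money on paper" outcome are both visible in the same numbers; only the depreciation and R&D lines move the result to zero.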
Magnus@WasBruba·
@GrindstoneSEO Heard of em, but should be pretty easily detectable with a small scraper script? Could also check for topical relevance of the linking content using embeddings/NLP when scraping
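The topical-relevance check mentioned above can be sketched with plain cosine similarity over embedding vectors. The vectors here are made-up stand-ins for real embedding-model output, and the 0.7 threshold is an arbitrary assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend these came from an embedding model run over the linking page
# and the linked (target) page; real vectors have hundreds of dimensions.
linking_page = [0.9, 0.1, 0.2]
target_page  = [0.8, 0.2, 0.1]
off_topic    = [0.0, 0.1, 0.9]

THRESHOLD = 0.7  # arbitrary cutoff; would need tuning against labeled links
print(cosine_similarity(linking_page, target_page) > THRESHOLD)  # True
print(cosine_similarity(linking_page, off_topic) > THRESHOLD)    # False
```

A scraper would embed the text around each backlink and flag links whose similarity to the target page falls below the cutoff.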
Grind Stone@GrindstoneSEO·
@WasBruba Yeah, that's the link farm indicator metric. They're not really link farms but scraper farms designed to trap Googlebot on the network but Claude won't listen to me about that part.
Magnus@WasBruba·
@GrindstoneSEO Makes sense, yeah. I guess you are running with a dataset of sites spammers link to, based on the wording? Probably the fuzziest part of this. But do you also look at the sites linked from the linking URL? I'd say that's an indicator.
Grind Stone@GrindstoneSEO·
@WasBruba No single factor qualifies or disqualifies a link. It's a very complex scoring system that I developed through a LOT of trial and error.
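The "no single factor decides" idea maps onto a weighted score. A hypothetical sketch only: every signal name, weight, and cutoff below is invented, since the real scoring system isn't public.

```python
# Invented signals and weights; each signal contributes, and only the
# combined score qualifies or disqualifies a link.
WEIGHTS = {
    "topical_relevance":  0.4,  # embedding similarity of linking page
    "outbound_quality":   0.3,  # what else the linking URL links to
    "anchor_naturalness": 0.2,  # money-keyword anchors score low
    "network_footprint":  0.1,  # shared hosting/ownership patterns
}

def link_score(signals):
    """Combine per-signal values in [0, 1] into one score in [0, 1]."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

good = {"topical_relevance": 0.9, "outbound_quality": 0.8,
        "anchor_naturalness": 0.9, "network_footprint": 0.7}
print(link_score(good) >= 0.6)  # True: passes despite no perfect signal
```

The design point is that a weak value on any one signal (e.g. a money-keyword anchor) lowers the score without vetoing the link outright.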
Magnus@WasBruba·
@GrindstoneSEO Are you really running with no money-keyword anchors? Haven't been too engaged with link tactics recently, but doesn't that filter legit links?
Grind Stone@GrindstoneSEO·
Who can spot the logic error?
[two images attached]
Magnus@WasBruba·
@alxfazio You can just put running the linter into your agents.md. Don't know any LLM that fails doing that after edits. What more could you want?
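A minimal sketch of the kind of agents.md rule being described; the lint command is a placeholder, not a real project's setup.

```markdown
## Checks

- After every file edit, run `npm run lint` (placeholder command) and fix
  all reported issues before declaring the task done.
```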
Magnus@WasBruba·
@badlogicgames This feels very Notion-todo-listing. Why spend time working on the tool when you could be using - you know - the software it produces lol
ninthalek@ninthalek·
@thdxr Did anyone try to introduce testing axioms for LLMs? Like:
- Testing is the process of executing a program with the intention of finding errors.
- A good test case is one that has a high probability of detecting an undiscovered error.
- It is impossible to test your own program.
dax@thdxr·
lmao this is maybe the craziest LLM written test i've ever seen
[image attached]
akira@realmcore_·
@WasBruba No? They will build their own tools
akira@realmcore_·
All sales people will be technical
akira@realmcore_·
Doing lit review for this blog post and I really do not want to do it. This is pretty awful. However, I will be including references to some of the most insightful people in the space at each level of the stack. If you have anyone in mind, please let me know so I can evaluate.
Magnus@WasBruba·
@BrendanFalk I think skills mostly work pretty badly. Personally landed on <thing>.<type>.md; that .<type>. ironically helps a lot to get agents to actually read that stuff. Then just guide properly in your main prompt/agents.md.
Brendan Falk@BrendanFalk·
Key takeaway from all the comments: Use nested skills. e.g. instead of separate skills for "create PDF" and "parse PDF", have one skill called "manage PDF" which then routes to the relevant sub-skills With good nesting, this can likely scale to 1000+ skills/sub-skills!
Brendan Falk@BrendanFalk

Question for AI engineering community: what is the current best practice for giving a single agent access to a potentially unbounded number of skills? Goals are (in priority order):
1. Maximize skill use accuracy
2. Minimize context use
3. Minimize unnecessary tool calls

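The nested-skills idea above can be sketched as a two-level lookup: the agent's context only lists top-level skills, and each one routes to its sub-skills. All skill names and handlers below are invented for illustration.

```python
# Toy registry: one top-level skill per domain, routing to sub-skills.
SKILLS = {
    "manage_pdf": {
        "create": lambda doc: f"created PDF from {doc}",
        "parse":  lambda doc: f"parsed text out of {doc}",
    },
    "manage_csv": {
        "read":  lambda doc: f"read rows from {doc}",
        "write": lambda doc: f"wrote rows to {doc}",
    },
}

def invoke(skill, sub_skill, arg):
    """Route a request through the top-level skill to the sub-skill."""
    try:
        return SKILLS[skill][sub_skill](arg)
    except KeyError:
        available = {k: sorted(v) for k, v in SKILLS.items()}
        raise ValueError(f"unknown path {skill}/{sub_skill}; have {available}")

print(invoke("manage_pdf", "parse", "report.pdf"))  # parsed text out of report.pdf
```

With this shape, context cost grows with the number of top-level domains rather than the total sub-skill count, which is what makes the "1000+ skills" claim plausible.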
Magnus@WasBruba·
@lateinteraction Put "don't use libraries" in your agents.md or whatever, then let it iterate. I mostly tell codex to work in "slices" and just chain "read agents.md" prompts; that gets it pretty well to implement stuff properly on a small scale.
Omar Khattab@lateinteraction·
I still find it borderline stupid that coding agents seem inclined to use APIs or libraries in complex scripts before tinkering at small scale, as in bottom-up notebooks, to make sure they're modeling these APIs correctly. Who is responsible for this and what are they thinking.
Omar Khattab@lateinteraction

Though bash is a completely valid REPL, the amount of time coding agents lose during experimentation because they iterate on scripts instead of a Jupyter-like in-memory REPL is basically dumb. Fixing 1 local bug should not require restarting the whole job. Need better scaffolds.

Magnus@WasBruba·
@Gana_L_ @thsottiaux Switched to medium, it's much less prone to overthinking and I didn't notice a difference for most tasks tbh; only using xhigh to plan complex tasks.
Gana@Gana_L_·
@thsottiaux I'm also seeing insane "draining" when using 5.4 high. It uses like 3x more tokens than 5.3 codex xhigh consistently. I burnt through 50% of weekly usage on the Plus sub in under 3h, while 5.3 xhigh couldn't even use 30% in half a day.
Tibo@thsottiaux·
We have found one issue that leads to some users seeing inconsistent usage across sessions, but it is quite rare, affecting less than 1% of users. We are working on a mitigation and continuing the investigation. For the rest, we are not seeing evidence of higher usage consumption other than the advertised token cost increase of GPT-5.4 being 30% higher than GPT-5.2 and GPT-5.3-Codex.
Tibo@thsottiaux

We are investigating reports of higher usage drain than expected for Codex when WebSockets are enabled; the team is on it and we will provide updates as we go.

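The advertised change alone explains part of the perceived drain: at +30% cost per token, a fixed budget buys about 77% as many tokens. A quick sketch of that arithmetic (the unit normalization is an assumption):

```python
# Numbers from the thread: GPT-5.4 token cost is 30% higher than
# GPT-5.2 / GPT-5.3-Codex.
old_cost_per_token = 1.0   # normalized baseline
new_cost_per_token = 1.3   # +30% advertised increase

# A fixed weekly budget buys 1/1.3 of the tokens it used to, so usage
# appears to drain ~1.3x faster at identical token counts.
tokens_ratio = old_cost_per_token / new_cost_per_token
print(round(tokens_ratio, 3))  # 0.769
```

That accounts for roughly a 1.3x faster drain, well short of the 3x reported above, which is consistent with Tibo treating the larger reports as a separate issue.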
Magnus@WasBruba·
@realmcore_ (and not by the claude "production ready trust me bro" criteria)
Magnus@WasBruba·
@realmcore_ This over decomposition: I mostly just define exit criteria for a task, which for something like this are obviously basically endless. It's more an experiment than anything else lol, but for things like CRUD apps and sane stuff you can get the model to call something done.
akira@realmcore_·
I’m not sure RLM is the best way to do it right now… Some thoughts:
akira@realmcore_·
you thought there was anything better than RLM? Currently writing a blog about this exact topic and why RLM could be peak but also where it breaks
Magnus@WasBruba·
Truly one of a kind @Apple, the most basic stuff doesn't work anymore. Is anyone capable left at that company?
[two images attached]
Magnus@WasBruba·
Code Analysis Intensifies
[image attached]