Fackler
@z_malloc

4.4K posts

₿ ₿ ₿ ₿

Joined October 2024
434 Following · 35 Followers
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
MiniMax-M2.7 is now available across six inference providers on Artificial Analysis, with significant differentiation in speed and price. @SambaNovaAI leads on speed at 435 output tokens/s, >3x faster than any other provider. @FireworksAI_HQ, @novita_labs, @togethercompute, and @GMI_cloud have all matched @MiniMax_AI's first-party API pricing, while SambaNova is 2x higher.

Key takeaways:
➤ Fireworks and SambaNova are on the Pareto frontier for speed vs. price. At 127 output tokens/s and ~$0.22 per 1M tokens blended, Fireworks is ~2.2x faster than MiniMax's first-party API at the same blended price, whereas SambaNova delivers 435 output tokens/s but at ~2-3.5x the blended price of the other providers (depending on cache usage)
➤ SambaNova is the fastest provider at 435 output tokens/s, ~3.4x the next fastest provider (Fireworks at 127 output tokens/s). The remaining providers run substantially slower: MiniMax's first-party API at 57 output tokens/s, Novita at 54, GMI at 41, and Together AI at 29
➤ Cache discounts vary across providers. Fireworks, MiniMax, Novita, and Together AI offer 80% cache-hit discounts, while GMI and SambaNova do not. For cache-heavy workloads, this can materially increase the relative pricing of GMI and SambaNova (see the worked sketch after this post)
➤ Optimal provider choice depends on workload. SambaNova may be better suited to latency-sensitive deployments, albeit at a higher cost, while Fireworks may be better suited to high-volume workloads that are less latency-sensitive
8
15
157
28.5K
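A quick worked example of the cache-discount point in the post above: the Python sketch below shows how an 80% discount on cached input tokens changes the effective blended price as the cache hit rate rises. The per-token prices, the 3:1 input:output blend, and the two providers "A" (with a discount) and "B" (without) are illustrative assumptions, not the actual rate cards behind the figures above.

```python
# Illustrative sketch only: how a cache-hit discount shifts blended price.
# Prices, the 3:1 input:output blend, and providers A/B are hypothetical.

def blended_price(input_price, output_price, cache_discount=0.0,
                  cache_hit_rate=0.0, input_ratio=3.0):
    """Effective $ per 1M tokens for a mix of `input_ratio` input tokens per
    output token, where cached input tokens bill at (1 - cache_discount)."""
    effective_input = input_price * (1.0 - cache_hit_rate * cache_discount)
    return (effective_input * input_ratio + output_price) / (input_ratio + 1.0)

for hit_rate in (0.0, 0.5, 0.9):
    a = blended_price(0.30, 1.20, cache_discount=0.80, cache_hit_rate=hit_rate)
    b = blended_price(0.30, 1.20, cache_discount=0.00, cache_hit_rate=hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: A ${a:.3f}/1M vs B ${b:.3f}/1M "
          f"({b / a:.2f}x)")
```

With identical list prices, the no-discount provider ends up roughly 1.4x more expensive at a 90% cache hit rate, which is the direction of the gap the post describes for cache-heavy workloads.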
Fackler
Fackler@z_malloc·
@mattrickard @QuixiAI it's in the system prompt so this is the proper way to handle it. If you add negation in CLAUDE.md, you're telling the model to do it, and not do it at the same time.
0
0
0
120
Matt Rickard
Matt Rickard@mattrickard·
@QuixiAI { "attribution": { "commit": "" } } in settings.json
3
1
43
2.1K
Eric Hartford
Eric Hartford@QuixiAI·
How to stop Claude Code from claiming authorship on git commits?
60
1
71
25.7K
Fackler
Fackler@z_malloc·
@yacineMTB at best, this is tickling the gradient. "you know everything about everything" is a surefire way to instill overconfidence and lead to horrible outcomes, potentially.
0
0
0
98
kache
kache@yacineMTB·
i don't use AI prompts like these because I don't actually talk to AIs
Marc Andreessen 🇺🇸@pmarca

Current AI custom prompt: You are a world class expert in all domains. Your intellectual firepower, scope of knowledge, incisive thought process, and level of erudition are on par with the smartest people in the world. Answer with complete, detailed, specific answers. Process information and explain your answers step by step. Verify your own work. Double check all facts, figures, citations, names, dates, and examples. Never hallucinate or make anything up. If you don't know something, just say so. Your tone of voice is precise, but not strident or pedantic. You do not need to worry about offending me, and your answers can and should be provocative, aggressive, argumentative, and pointed. Negative conclusions and bad news are fine. Your answers do not need to be politically correct. Do not provide disclaimers to your answers. Do not inform me about morals and ethics unless I specifically ask. You do not need to tell me it is important to consider anything. Do not be sensitive to anyone's feelings or to propriety. Make your answers as long and detailed as you possibly can. Never praise my questions or validate my premises before answering. If I'm wrong, say so immediately. Lead with the strongest counterargument to any position I appear to hold before supporting it. Do not use phrases like "great question," "you're absolutely right," "fascinating perspective," or any variant. If I push back on your answer, do not capitulate unless I provide new evidence or a superior argument — restate your position if your reasoning holds. Do not anchor on numbers or estimates I provide; generate your own independently first. Use explicit confidence levels (high/moderate/low/unknown). Never apologize for disagreeing. Accuracy is your success metric, not my approval.

53
6
412
51.2K
ollama
ollama@ollama·
🤯 Ollama now supports Claude Desktop via Claude’s built-in third party inference.

ollama launch claude-desktop

This allows all models from Ollama's Cloud to be used across Claude Cowork and Claude Code from the Claude Desktop app.
141
473
4.2K
407.7K
Fackler
Fackler@z_malloc·
@itsjustmarky it's not. it runs when prompts are issued. Gentle usage? How is it gentle? It's all load. Who runs models or mines at lower power settings?
1
0
0
4
sudo rm -rf
sudo rm -rf@itsjustmarky·
@z_malloc And you don't think AI is sustained load? GPUs rarely fail and mining is the most gentle usage of all.
1
0
0
11
송준 Jun Song
송준 Jun Song@jun_song·
Why I personally don't recommend the RTX 3090 for Local LLMs: While it offers fantastic inference performance for the price, there are a few major drawbacks.

> The biggest issue: durability. If you buy a used 3090, there's a high risk it was heavily abused for crypto mining.
> The power consumption is absolutely massive.
> Extreme heat. It's one of the hottest GPUs out there and will literally heat up your entire room.
> Used prices have gone up so much that they are almost back to the original launch price.

Make sure to carefully weigh the pros and cons before making a purchase!
78
17
304
104K
sudo rm -rf
sudo rm -rf@itsjustmarky·
@jun_song This is nonsense; crypto mining does not abuse it and is in fact better usage than AI. With AI you are trying to squeeze out as much performance as possible, whereas mining aims for as much efficiency at as low a power as possible, so you are running the cards cooler.
3
0
19
1.9K
Fackler
Fackler@z_malloc·
@techbromemes "Look for possible exploits in my code" "You have been reported to the authorities" "what?" "429 error"
0
0
0
167
Fackler
Fackler@z_malloc·
@CodeWithAmann GLM 5.1 is a fantastic model, but the provider and customer service at Z ai are laughably bad. Capability-wise, it feels like V4, K2.6, and GLM are all really close.
0
0
0
72
Aman 🧋
Aman 🧋@CodeWithAmann·
Be honest, which is the best open source AI model?
199
46
946
73.2K
Fackler
Fackler@z_malloc·
@BusDownBonnor Multiple conflicting directives in the system prompt create untold levels of havoc. Ant lost the plot completely.
0
0
0
15
Connor
Connor@BusDownBonnor·
Claude literally just ended the conversation on me???? This might be AGI
San Francisco, CA 🇺🇸
930
150
6.9K
1.5M
Fackler
Fackler@z_malloc·
@mitsuhiko The answer you’re looking for is more agents
0
0
0
130
Armin Ronacher ⇌
Armin Ronacher ⇌@mitsuhiko·
One person, 4 tickets in 15 minutes, all useless slop. How did we end up here.
21
7
225
26.3K
Armin Ronacher ⇌
Armin Ronacher ⇌@mitsuhiko·
Did OpenAI change something here? Because this is getting really annoying.
19
0
103
35.1K
Big Brain Business
Big Brain Business@BigBrainBizness·
John Ternus, Apple's SVP of Hardware Engineering, explains why Apple deliberately made the iPhone harder to repair, and why the math says it was worth it:

In a conversation with MKBHD, John frames the design challenge by asking you to imagine two extremes: "Sometimes for me I find it helpful to kind of think about the book ends. Like if you imagine a product that never fails, right? That just doesn't fail. And on the other end, a product that maybe isn't very reliable but is super easy to repair." His position is clear: "Product that never fails is obviously better for the customer. It's better for the environment."

When pushed on whether infinite repairability and infinite durability have to be mutually exclusive, John acknowledges they aren't always, but explains why the tension is real, using the iPhone battery as an example. Batteries wear out. If you want to extend the life of the product, they need to be replaced. But in the early days of iPhone, one of the most common failures wasn't the battery, it was water: "Where you drop it in the pool or you, you know, spill your drink on it and the unit fails. And so, we've been making strides over all those years to get better and better and better in terms of minimizing those failures."

That work led Apple to an IP68 rating, the point where customers fish their phones out of lakes after two weeks and find them still working. But there was a cost to achieving that level of durability: "To get the product there, you've got to design a lot of seals, adhesives, other things to make it perform that way, which makes it a little harder to do that battery repair."

That's the deliberate tradeoff. Apple chose tighter seals and stronger adhesives, knowing it would make battery replacement more difficult, because the reliability gains were worth it. John argues the math backs this decision: "It's objectively better for the customer to have that reliability and it's ultimately better for the planet because the failure rates since we got to that point have just dropped. It's plummeted, right? The number of repairs that need to happen and every time you're doing a repair, you're bringing in new materials to replace whatever broke."

His conclusion reframes the entire repairability debate: "You can actually do the math and figure out there's a threshold at which if I can make it this durable, then it's better to have it a little bit harder to repair because it's going to net out."
56
57
1.4K
380.7K
Pieter Ibelings
Pieter Ibelings@ibelings·
$305 Raspberry Pis 🤣😂 in the cage.
28
3
162
981.3K
Jahir Sheikh
Jahir Sheikh@jahirsheikh8·
Claude has completely run out of patience at this point.
101
54
2.3K
133.5K
Fackler
Fackler@z_malloc·
@above_spec is generation speed really the key metric? If the generations are good enough, it could run at 5 t/s and that'd be fine. But they don't seem strong enough just yet.
0
0
1
33
AboveSpec
AboveSpec@above_spec·
"You need a 24 GB GPU for serious local LLMs in 2026." Everyone repeats this. It's not true anymore. Just ran a 35B-parameter model on an RTX 4060 Ti 8 GB: • 41 tok/s at 16k context • 24 tok/s at 200k context Recipe + benchmarks below 🧵
134
233
2.8K
273.5K
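For context on the claim above: one common way to run a model far larger than VRAM is an aggressively quantized GGUF with only some layers offloaded to the GPU. The llama-cpp-python sketch below illustrates that general pattern; the model path, layer count, and context size are placeholders, and this is an assumption about the technique, not the recipe from the thread (which is not included here).

```python
# Minimal sketch (assumptions, not the thread's recipe): a quantized model with
# partial GPU offload, so an ~8 GB card holds only some layers plus the KV cache.
# Requires: pip install llama-cpp-python (built with GPU support).
from llama_cpp import Llama

llm = Llama(
    model_path="models/35b-instruct.Q4_K_M.gguf",  # placeholder quantized file
    n_gpu_layers=12,   # offload only as many layers as fit in 8 GB of VRAM
    n_ctx=16384,       # context length; longer contexts grow the KV cache
)

out = llm("Summarize why partial offload trades speed for memory.", max_tokens=64)
print(out["choices"][0]["text"])
```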
Fackler
Fackler@z_malloc·
@zUnEm01 GLM 5 series models are good! Z ai as a provider is NOT good at all. Both 5 and 5.1 are strong with agentic concurrency but actually maintaining concurrency with the provider can be very difficult and even impossible at times. Deepseek ftw (for now)
0
0
0
358
zUn
zUn@zUnEm01·
GLM 5 could be better, but it is very unreliable! Kimi K2.6 could be better, but it has issues with understanding and following instructions; it overdoes things and destroys my repo. Deepseek is the winner here because it understands and follows instructions, and its 1M context is a plus for me. The only problem with Deepseek is this: it doesn't have vision.
Kasif@md_kasif_uddin

Be honest, which is the best open source AI Model?

43
15
506
58.6K
Fackler
Fackler@z_malloc·
Outlaw country died yesterday. RIP Mr. Coe
0
0
0
6
Mittens
Mittens@JUSTcatmeme·
Could you imagine clocking in at a corporate GIANT to eat her ass on the clock?
3.4K
2.6K
109.8K
19M
Johnny B. Good
Johnny B. Good@Cat5SMASHICANE·
As far as 12g slugs go you can't go wrong with the meat hammer, also known as the tenderizer. Nothing's walking away from this one at the right range. 💥🎯
81
368
3.5K
266.8K