Fackler

4.4K posts

@z_malloc

₿ ₿ ₿ ₿

Joined October 2024

434 Following, 35 Followers

Fackler@z_malloc·9h

@ArtificialAnlys @SambaNovaAI @FireworksAI_HQ @novita_labs @togethercompute

Artificial Analysis@ArtificialAnlys·10h

MiniMax-M2.7 is now available across six inference providers on Artificial Analysis, with significant differentiation in speed and price.
@SambaNovaAI leads on speed at 435 output tokens/s, >3x faster than any other provider. @FireworksAI_HQ, @novita_labs, @togethercompute, and @GMI_cloud have all matched @MiniMax_AI's first-party API pricing, while SambaNova is 2x higher.
Key takeaways:
➤ Fireworks and SambaNova are on the Pareto frontier for Speed vs. Price. At 127 output tokens/s and ~$0.22 per 1M tokens blended, Fireworks is ~2.2x faster than MiniMax's first-party API at the same blended price, whereas SambaNova delivers 435 output tokens/s but at ~2-3.5x the blended price of the other providers (depending on cache usage)
➤ SambaNova is the fastest provider at 435 output tokens/s, ~3.4x the next fastest provider (Fireworks at 127 output tokens/s). The remaining providers run substantially slower: MiniMax’s first-party API at 57 output tokens/s, Novita at 54, GMI at 41, and Together AI at 29
➤ Cache discounts vary across providers. Fireworks, MiniMax, Novita, and Together AI offer 80% cache hit discounts, while GMI and SambaNova do not offer a discount. For cache-heavy workloads, this can materially increase the relative pricing for GMI and SambaNova
➤ Optimal provider choice depends on workload. SambaNova may be more suited to latency-sensitive deployments, albeit at a higher cost, while Fireworks may be more suitable for high-volume workloads that are not as latency-sensitive
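
The Pareto-frontier claim in the post above can be checked with a rough sketch. Speeds are taken from the post. The prices are assumptions for illustration only: every price-matched provider is set to the ~$0.22 blended figure quoted for Fireworks, SambaNova is set to 2x that, and cache discounts are ignored. The names `Provider`, `dominates`, and `pareto_frontier` are hypothetical helpers, not anything from Artificial Analysis.

```python
from typing import NamedTuple


class Provider(NamedTuple):
    name: str
    tokens_per_s: float  # output speed, as quoted in the post
    price_per_m: float   # blended USD per 1M tokens (ASSUMED, see lead-in)


PROVIDERS = [
    Provider("SambaNova", 435, 0.44),   # assumed: 2x the matched price
    Provider("Fireworks", 127, 0.22),
    Provider("MiniMax", 57, 0.22),
    Provider("Novita", 54, 0.22),       # assumed equal to the matched price
    Provider("GMI", 41, 0.22),          # assumed; ignores its lack of cache discount
    Provider("Together AI", 29, 0.22),  # assumed equal to the matched price
]


def dominates(a: Provider, b: Provider) -> bool:
    """a dominates b if it is no slower and no pricier, and strictly better on one axis."""
    no_worse = a.tokens_per_s >= b.tokens_per_s and a.price_per_m <= b.price_per_m
    strictly_better = a.tokens_per_s > b.tokens_per_s or a.price_per_m < b.price_per_m
    return no_worse and strictly_better


def pareto_frontier(providers: list[Provider]) -> list[Provider]:
    """Keep every provider that no other provider dominates."""
    return [p for p in providers if not any(dominates(q, p) for q in providers)]


frontier_names = sorted(p.name for p in pareto_frontier(PROVIDERS))
print(frontier_names)  # ['Fireworks', 'SambaNova']
```

Under these assumed prices, Fireworks dominates the four slower providers at the same price, and SambaNova survives because nothing else is faster. Different price assumptions (for example, a cache-heavy workload) could change the result.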

157

28.5K

Fackler@z_malloc·1d

@mattrickard @QuixiAI it's in the system prompt so this is the proper way to handle it. If you add negation in CLAUDE.md, you're telling the model to do it, and not do it at the same time.

120

Matt Rickard@mattrickard·1d

@QuixiAI { "attribution": { "commit": "" } } in settings.json

2.1K

Eric Hartford@QuixiAI·1d

How to stop Claude Code from claiming authorship on git commits?

25.7K

Fackler@z_malloc·1d

@yacineMTB at best, this is tickling the gradient. "you know everything about everything" is a surefire way to instill overconfidence and lead to horrible outcomes, potentially.

kache@yacineMTB·1d

i don't use AI prompts like these because I don't actually talk to AIs

Marc Andreessen 🇺🇸@pmarca

Current AI custom prompt: You are a world class expert in all domains. Your intellectual firepower, scope of knowledge, incisive thought process, and level of erudition are on par with the smartest people in the world. Answer with complete, detailed, specific answers. Process information and explain your answers step by step. Verify your own work. Double check all facts, figures, citations, names, dates, and examples. Never hallucinate or make anything up. If you don't know something, just say so. Your tone of voice is precise, but not strident or pedantic. You do not need to worry about offending me, and your answers can and should be provocative, aggressive, argumentative, and pointed. Negative conclusions and bad news are fine. Your answers do not need to be politically correct. Do not provide disclaimers to your answers. Do not inform me about morals and ethics unless I specifically ask. You do not need to tell me it is important to consider anything. Do not be sensitive to anyone's feelings or to propriety. Make your answers as long and detailed as you possibly can. Never praise my questions or validate my premises before answering. If I'm wrong, say so immediately. Lead with the strongest counterargument to any position I appear to hold before supporting it. Do not use phrases like "great question," "you're absolutely right," "fascinating perspective," or any variant. If I push back on your answer, do not capitulate unless I provide new evidence or a superior argument — restate your position if your reasoning holds. Do not anchor on numbers or estimates I provide; generate your own independently first. Use explicit confidence levels (high/moderate/low/unknown). Never apologize for disagreeing. Accuracy is your success metric, not my approval.

412

51.2K

Fackler@z_malloc·1d

@ollama

GIF

747

ollama@ollama·1d

🤯 Ollama now supports Claude Desktop via Claude’s built-in third-party inference.
ollama launch claude-desktop
This allows all models from Ollama's Cloud to be used across Claude Cowork and Claude Code from the Claude Desktop app.

141

473

4.2K

407.7K

Fackler@z_malloc·1d

@itsjustmarky it's not. it runs when prompts are issued. Gentle usage? How is it gentle? It's all load. Who runs models or mines at lower power settings?

sudo rm -rf@itsjustmarky·1d

@z_malloc And you don't think AI isn't sustained load? GPUs rarely fail and mining is the most gentle usage of all.

송준 Jun Song@jun_song·2d

Why I personally don't recommend the RTX 3090 for Local LLMs: While it offers fantastic inference performance for the price, there are a few major drawbacks.
> The biggest issue: Durability. If you buy a used 3090, there's a high risk it was heavily abused for crypto mining.
> The power consumption is absolutely massive.
> Extreme heat. It's one of the hottest GPUs out there and will literally heat up your entire room.
> Used prices have gone up so much that they are almost back to the original launch price.
Make sure to carefully weigh the pros and cons before making a purchase!

304

104K

Fackler@z_malloc·1d

@itsjustmarky @jun_song

sudo rm -rf@itsjustmarky·2d

@jun_song This is nonsense, crypto mining does not abuse it and is in fact better usage than AI. With AI you are trying to squeeze out as much performance as possible, whereas mining is looking to get as much efficiency at as low a power as possible, thus you are running the cards cooler.

1.9K

Fackler@z_malloc·1d

@techbromemes "Look for possible exploits in my code" "You have been reported to the authorities" "what?" "429 error"

167

Tech Bro Memes@techbromemes·1d

201

6.6K

148.3K

Fackler@z_malloc·1d

@CodeWithAmann GLM 5.1 is a fantastic model, but the provider and customer service at Z ai are laughably bad. Capability-wise, it feels like V4, K2.6 and GLM are all really close.

Aman 🧋@CodeWithAmann·2d

Be honest, which is the best open source AI model?

199

946

73.2K

Fackler@z_malloc·1d

@BusDownBonnor Multiple conflicting directives in the system prompt create untold levels of havoc. Ant lost the plot completely.

Connor@BusDownBonnor·2d

Claude literally just ended the conversation on me???? This might be AGI

San Francisco, CA 🇺🇸

930

150

6.9K

1.5M

Fackler@z_malloc·1d

@mitsuhiko The answer you’re looking for is more agents

130

Armin Ronacher ⇌@mitsuhiko·1d

One person, 4 tickets in 15 minutes, all useless slop. How did we end up here.

225

26.3K

Fackler@z_malloc·3d

@mitsuhiko soft paywall for securing your own app

Armin Ronacher ⇌@mitsuhiko·3d

Did OpenAI change something here? Because this is getting really annoying.

103

35.1K

Fackler@z_malloc·4d

@BigBrainBizness the batts were sealed in LONG before IP68. gtfo

1.6K

Big Brain Business@BigBrainBizness·4d

John Ternus, Apple's SVP of Hardware Engineering, explains why Apple deliberately made the iPhone harder to repair, and why the math says it was worth it:
In a conversation with MKBHD, John frames the design challenge by asking you to imagine two extremes: "Sometimes for me I find it helpful to kind of think about the book ends. Like if you imagine a product that never fails, right? That just doesn't fail. And on the other end, a product that maybe isn't very reliable but is super easy to repair."
His position is clear: "Product that never fails is obviously better for the customer. It's better for the environment."
When pushed on whether infinite repairability and infinite durability have to be mutually exclusive, John acknowledges they aren't always, but explains why the tension is real, using the iPhone battery as an example. Batteries wear out. If you want to extend the life of the product, they need to be replaced. But in the early days of iPhone, one of the most common failures wasn't the battery, it was water: "Where you drop it in the pool or you, you know, spill your drink on it and the unit fails. And so, we've been making strides over all those years to get better and better and better in terms of minimizing those failures."
That work led Apple to an IP68 rating, the point where customers fish their phones out of lakes after two weeks and find them still working. But there was a cost to achieving that level of durability: "To get the product there, you've got to design a lot of seals, adhesives, other things to make it perform that way, which makes it a little harder to do that battery repair."
That's the deliberate tradeoff. Apple chose tighter seals and stronger adhesives, knowing it would make battery replacement more difficult, because the reliability gains were worth it.
John argues the math backs this decision: "It's objectively better for the customer to have that reliability and it's ultimately better for the planet because the failure rates since we got to that point have just dropped. It's plummeted, right? The number of repairs that need to happen and every time you're doing a repair, you're bringing in new materials to replace whatever broke."
His conclusion reframes the entire repairability debate: "You can actually do the math and figure out there's a threshold at which if I can make it this durable, then it's better to have it a little bit harder to repair because it's going to net out."

1.4K

380.7K

Fackler@z_malloc·4d

@ibelings

GIF

696

Pieter Ibelings@ibelings·4d

$305 Raspberry Pis 🤣😂 in the cage.

162

981.3K

Fackler@z_malloc·4d

@jahirsheikh8 gtfo here🤣

GIF

Jahir Sheikh@jahirsheikh8·4d

Claude has completely run out of patience at this point.

101

2.3K

133.5K

Fackler@z_malloc·4d

@above_spec is generation speed really the key metric? It could run at 5 t/s if the generations are good enough, that'd be fine. But they don't seem strong enough just yet.

AboveSpec@above_spec·5d

"You need a 24 GB GPU for serious local LLMs in 2026." Everyone repeats this. It's not true anymore. Just ran a 35B-parameter model on an RTX 4060 Ti 8 GB:
• 41 tok/s at 16k context
• 24 tok/s at 200k context
Recipe + benchmarks below 🧵

134

233

2.8K

273.5K

Fackler@z_malloc·4d

@zUnEm01 GLM 5 series models are good! Z ai as a provider is NOT good at all. Both 5 and 5.1 are strong with agentic concurrency but actually maintaining concurrency with the provider can be very difficult and even impossible at times. Deepseek ftw (for now)

358

zUn@zUnEm01·4d

GLM 5 could be better but it is very unreliable asf! Kimi k2.6 could be better but it just has issues with understanding and following instructions; it overdoes things and destroys my repo. Deepseek is a winner here because it understands and follows instructions, and the 1m context is a plus for me. The only problem with Deepseek is this: it doesn't have vision.

Kasif@md_kasif_uddin

Be honest, which is the best open source AI Model?

506

58.6K

Fackler@z_malloc·5d

Outlaw country died yesterday. RIP Mr. Coe

GIF

Fackler@z_malloc·5d

@JUSTcatmeme eh. it’s a living

Mittens@JUSTcatmeme·5d

Could you imagine clocking in a a corporate GIANT to eat her ass on the clock?

3.4K

2.6K

109.8K

19M

Fackler@z_malloc·5d

@xwang_lk

GIF

133

Xin Eric Wang (hiring postdoc)@xwang_lk·6d

When DeepSeek Code?

340

27K

Fackler@z_malloc·5d

@Cat5SMASHICANE Thought he was gonna shoot a steak and then cook it

130

Johnny B. Good@Cat5SMASHICANE·5d

As far as 12g slugs go you can't go wrong with the meat hammer, also known as the tenderizer. Nothing's walking away from this one at the right range. 💥🎯

368

3.5K

266.8K
