Casper
@CasperWeb3

1.2K posts
Joined January 2022
278 Following · 1.7K Followers

Casper @CasperWeb3:
@abacaj The problem is that most of the Chinese models are inherently not open and free because they use very restrictive licenses.
0 replies · 0 reposts · 3 likes · 302 views

anton @abacaj:
The LLMs coming out of China are performing really well on benchmarks, but I don't hear much about how they actually perform on real tasks, or whether anyone outside China is actually using them.
21 replies · 5 reposts · 177 likes · 33.1K views

Casper @CasperWeb3:
@ylecun Now we just need to properly open Llama 3. Release it under an Apache 2.0 or MIT license and you can finally call it truly open source. Follow Mistral, they set a great example.
0 replies · 0 reposts · 1 like · 98 views

Yann LeCun @ylecun:
Open source AI foundation models will wipe out closed and proprietary AI models for the same reason Wikipedia wiped out generalist commercial encyclopedias: crowd-sourced human contributions to open platforms can cater to a high diversity of interests, cultures, and languages.
226 replies · 553 reposts · 3.8K likes · 564.9K views

Casper @CasperWeb3:
@aton2006 @teknium @markatgradient @togethercompute There is a general gap in knowledge about how quantized models work. The reason most quantized models are slower is the compute overhead of dequantization at high batch sizes.
1 reply · 0 reposts · 0 likes · 87 views

Anton McGonnell @aton2006:
@teknium @markatgradient @togethercompute I don't understand why this would be the case. Throughput should be a function of batch size times the number of sockets; latency should be a function of the number of sockets for a batch of 1. Quantization should not impact this.
1 reply · 0 reposts · 0 likes · 278 views

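For context on the dequantization overhead described in this exchange, here is a minimal weight-only int8 sketch in PyTorch (illustrative only; real inference kernels fuse dequantization into the matmul, and all names here are placeholders). At batch size 1 the layer is memory-bound, so smaller weights are a net win; at high batch sizes the GPU is compute-bound, and the extra dequantize work surfaces as lower throughput.

    import torch

    def quantize_int8(w: torch.Tensor):
        # Per-output-channel symmetric int8 quantization of a weight matrix.
        scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
        return q, scale

    def quantized_linear(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
        # Dequantize before the matmul: extra elementwise work on every call.
        # Hidden at batch size 1 (bandwidth-bound), visible at high batch
        # sizes (compute-bound) as reduced throughput.
        w = q.to(x.dtype) * scale
        return x @ w.T

    w = torch.randn(4096, 4096)        # fp16 in a real deployment
    q, scale = quantize_int8(w)
    x1 = torch.randn(1, 4096)          # memory-bound regime
    x256 = torch.randn(256, 4096)      # compute-bound regime
    out = quantized_linear(x256, q, scale)
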
Casper @CasperWeb3:
This is incredible competition from AMD. They now beat Nvidia on both speed and price for inference. They just need to catch up with robust software; with time, it looks like they will win the performance/cost trade-off. community.amd.com/t5/instinct-ac…
0 replies · 0 reposts · 0 likes · 201 views

Casper @CasperWeb3:
@BlancheMinerva Open source is nothing without open and free weights. Last time I checked, all the great Chinese models are gated by custom licenses and applications to even use them. Doesn’t quite meet my definition of open or free.
1 reply · 0 reposts · 0 likes · 66 views

Casper @CasperWeb3:
@KyeGomezB They forked their own code and improved it, nice. They trained better models too, double nice.
0 replies · 0 reposts · 0 likes · 137 views

Kye Gomez (swarms) @KyeGomezB:
Mistral's team literally just forked Llama, changed fewer than 40 lines of code, and raised $120M+. Investors, what are you doing 😢 This is horrendous.
87 replies · 62 reposts · 1.1K likes · 846.4K views

Casper @CasperWeb3:
@lukebelmar Tether is the last one to go. Until then, crypto is a ticking time bomb.
0 replies · 0 reposts · 0 likes · 31 views

Luke Belmar 👽 @lukebelmar:
My problem isn't Bitcoin. My problem is USDT inflating the price of Bitcoin. What happens if Tether gets smacked the same way Binance did? Does BTC hold? I'm skeptical.
107 replies · 103 reposts · 1.8K likes · 143.6K views

Casper @CasperWeb3:
@E0M The authors of this post used GPT-4 to evaluate whether a certain phrase was in the text. That bears no resemblance to a real evaluation, and it is certainly not something I would expect OpenAI to be using.
0 replies · 0 reposts · 0 likes · 175 views

Casper @CasperWeb3:
@SteveMoraco @OpenAI @GregKamradt These posts are only made to farm reactions on Twitter. Everyone should know that using GPT-4 as an evaluator says nothing about another model, as GPT-4 is biased towards itself (surprise).
0 replies · 0 reposts · 0 likes · 129 views

stv.eth @SteveMoraco:
This Claude 2.1 vs. GPT-4-Turbo chart absolutely blows my mind. Insane levels of fidelity from @OpenAI. Credit: @GregKamradt
82 replies · 358 reposts · 3.1K likes · 1.7M views

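The evaluation being criticized in these replies reduces to asking GPT-4 whether a target phrase survived in retrieved text. Here is a minimal sketch of the two approaches, assuming the openai v1 Python client; the needle phrase and prompt are placeholders, not the original authors' setup. The deterministic check needs no judge model and carries no self-preference bias.

    from openai import OpenAI

    NEEDLE = "The best thing to do in San Francisco is eat a sandwich."

    def exact_eval(retrieved: str) -> bool:
        # Deterministic check: no judge model, no self-preference bias.
        return NEEDLE.lower() in retrieved.lower()

    def gpt4_judge_eval(retrieved: str) -> str:
        # The criticized approach: scores now depend on the judge model's
        # own biases, including a preference for GPT-4-like text.
        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Does this text contain the statement "
                           f"'{NEEDLE}'? Answer yes or no.\n\n{retrieved}",
            }],
        )
        return resp.choices[0].message.content
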
Casper @CasperWeb3:
Can someone make GPT-4 good again?
1 reply · 0 reposts · 1 like · 207 views

Casper @CasperWeb3:
@TXMCtrades Good luck with this. Nothing gets close to Bloomberg or Refinitiv for real-time data. They collaborate with governments that send data directly to them.
0 replies · 0 reposts · 0 likes · 332 views

Casper @CasperWeb3:
@EMostaque An LLM as good as GPT-4 with an MIT license.
0 replies · 0 reposts · 0 likes · 47 views

Emad @EMostaque:
What is being released tomorrow? Guesses go here 👇
167 replies · 14 reposts · 227 likes · 139.6K views

Casper @CasperWeb3:
@parth007_96 @woosuk_k They do acknowledge this in the blog from what I'm reading - DeepSpeed-MII is faster in the case of large contexts and small outputs. They also note that they will optimize this part of vLLM with the SplitFuse strategy.
0 replies · 0 reposts · 0 likes · 108 views

Parth Thakkar @parth007_96:
@woosuk_k I think the two blogs are comparing different load distributions. The DeepSpeed blog used an average prompt size of 2000 and an output size of 128, whereas the image in your post has prompt length 500. It would be good to see the reproduction in both settings. Btw, long prompts are fairly common.
2 replies · 0 reposts · 4 likes · 1.3K views

Woosuk Kwon @woosuk_k:
We’ve just released a new blog post comparing vLLM with DeepSpeed-FastGen. While we are happy to see the open-source technology advancements from the DeepSpeed team, we’ve got different results with more extensive performance benchmarks. vLLM is actually faster than DeepSpeed in many common scenarios. Details here: blog.vllm.ai/2023/11/14/not… (written with @zhuohan123, @simon_mo_, @eqhylxx)
3 replies · 30 reposts · 205 likes · 45.2K views

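Parth's point is that the two blogs benchmarked different load shapes (average prompt 2000 / output 128 versus prompt 500), so a reproduction should sweep both. Here is a minimal sketch using vLLM's offline API; the model choice, request count, and the second shape's output length are assumptions, and a fair comparison would run the same sweep against DeepSpeed-MII as well.

    import time
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-v0.1")  # example model

    def bench(prompt_len: int, output_len: int, n_requests: int = 64) -> float:
        # Synthetic prompts of roughly prompt_len tokens each.
        prompts = ["hi " * prompt_len] * n_requests
        params = SamplingParams(max_tokens=output_len, ignore_eos=True)
        start = time.perf_counter()
        llm.generate(prompts, params)
        # Report generated tokens per second for this load shape.
        return n_requests * output_len / (time.perf_counter() - start)

    # The two shapes in dispute (output length for the second is assumed):
    print("prompt=2000, output=128:", bench(2000, 128), "tok/s")
    print("prompt= 500, output=128:", bench(500, 128), "tok/s")
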
Casper @CasperWeb3:
@AravSrinivas Unironically, this happened super early with ChatGPT. When security experts who know nothing about search & LLMs start fearmongering about your product - you know that you have made it.
1 reply · 0 reposts · 1 like · 415 views

Aravind Srinivas @AravSrinivas:
Perplexity user: “Microsoft banned your domain, so can't use it on work devices”.
26 replies · 7 reposts · 154 likes · 58.9K views

Casper @CasperWeb3:
That's it, switching from OpenAI to @perplexity_ai. It's simply better in my experience. Now with GPT-4 Turbo, the ChatGPT experience has degraded for my coding tasks, and the effective context search that Perplexity leverages gives me more accurate answers.
0 replies · 0 reposts · 2 likes · 439 views

Casper retweeted
Bravos Research @bravosresearch:
2008 Financial Crisis timeline 👇
69 replies · 379 reposts · 1.5K likes · 318.4K views

Casper @CasperWeb3:
@EMostaque Any chance of a commercially compatible license?
0 replies · 0 reposts · 0 likes · 83 views

Emad @EMostaque:
Got lots of open models to release this month & other announcements. The creation => control => composition pipeline across every modality will evolve rapidly thanks to open models, super exciting. Holodeck & Star Trek future here we come... Open source ftw 🚀
16 replies · 23 reposts · 258 likes · 26.5K views

Casper @CasperWeb3:
@osanseviero Where is the code to reproduce these results?
0 replies · 0 reposts · 0 likes · 42 views

Omar Sanseviero @osanseviero:
NEFTune - an exciting paper many missed 🚀 With a very simple trick, your supervised fine-tunes will get significantly better metrics! Let's learn about it 🧑‍🎓

What's the simple trick? During training, add some noise to the output of the embedding layers. That's it! It sounds too simple, but it helps a lot! Fine-tuning Mistral-7B on Guanaco, @_lewtun obtained a 25% boost! 🤯 (the boost is dataset dependent, of course) Here is another one from @tomgoldsteincs 👀

It's as simple as doing this in your forward pass:

    if training:
        return orig_embed(x) + noise
    else:
        return orig_embed(x)

How to use this? 🚀 For trl, it's as easy as adding neftune_noise_alpha to the trainer's init (check the paper to see what it controls). This method seems to work well independently of model sizes and model types. It also works with QLoRA!

Paper: arxiv.org/abs/2310.05914

Thanks to @tomgoldsteincs @neeljain1717 et al. for the paper, and @younesbelkada @lvwerra and @_lewtun for the quick integration into the trl library.
7 replies · 31 reposts · 119 likes · 19.8K views

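A minimal PyTorch sketch of the NEFTune trick described above, following the paper's alpha / sqrt(seq_len * dim) scaling of uniform noise. In practice you would simply pass neftune_noise_alpha to trl's SFTTrainer as the tweet notes; this wrapper is an illustration, not the library's implementation.

    import torch
    import torch.nn as nn

    class NEFTuneEmbedding(nn.Module):
        # Wraps an embedding layer and adds scaled uniform noise during
        # training only, matching the forward-pass pseudocode above.
        def __init__(self, embed: nn.Embedding, noise_alpha: float = 5.0):
            super().__init__()
            self.embed = embed
            self.noise_alpha = noise_alpha

        def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
            x = self.embed(input_ids)                 # (batch, seq_len, dim)
            if self.training:
                seq_len, dim = x.shape[-2], x.shape[-1]
                scale = self.noise_alpha / (seq_len * dim) ** 0.5
                x = x + torch.empty_like(x).uniform_(-scale, scale)
            return x                                  # noise-free at inference

    # Hypothetical usage on an HF-style model:
    # model.set_input_embeddings(NEFTuneEmbedding(model.get_input_embeddings()))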