Casper
@CasperWeb3

1.2K posts
Joined January 2022
278 Following · 1.7K Followers

Casper @CasperWeb3:
@abacaj The problem is that most of the Chinese models are inherently not open and free because they use very restrictive licenses.
0 replies · 0 reposts · 3 likes · 302 views

anton @abacaj:
The LLMs coming out of China are performing really well on benchmarks, but I don't hear much about how they actually perform on real tasks, or whether anyone outside China is actually using them.
21 replies · 5 reposts · 177 likes · 33.1K views

Casper @CasperWeb3:
@ylecun Now we just need to properly open Llama 3. Release it under an Apache 2.0 or MIT license and you can finally call it truly open source. Follow Mistral, they set a great example.
0 replies · 0 reposts · 1 like · 98 views

Yann LeCun @ylecun:
Open source AI foundation models will wipe out closed and proprietary AI models for the same reason Wikipedia wiped out generalist commercial encyclopedias: crowd-sourced human contributions to open platforms can cater to a high diversity of interests, cultures, and languages.
226 replies · 553 reposts · 3.8K likes · 564.9K views

Casper @CasperWeb3:
@aton2006 @teknium @markatgradient @togethercompute There is a general gap in knowledge about how quantized models work. The reason most quantized models are slower is the compute overhead of dequantization at high batch sizes.
1 reply · 0 reposts · 0 likes · 87 views

Anton McGonnell @aton2006:
@teknium @markatgradient @togethercompute I don't understand why this would be the case. Throughput should be a function of batch size times the number of sockets; latency should be a function of the number of sockets for a batch of 1. Quantization should not impact this.
1 reply · 0 reposts · 0 likes · 278 views

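For context on the dequantization overhead described in this exchange, here is a minimal weight-only int8 sketch in PyTorch (illustrative only; real inference kernels fuse dequantization into the matmul, and all names here are placeholders). At batch size 1 the layer is memory-bound, so smaller weights are a net win; at high batch sizes the GPU is compute-bound, and the extra dequantize work surfaces as lower throughput.

    import torch

    def quantize_int8(w: torch.Tensor):
        # Per-output-channel symmetric int8 quantization of a weight matrix.
        scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
        return q, scale

    def quantized_linear(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
        # Dequantize before the matmul: extra elementwise work on every call.
        # Hidden at batch size 1 (bandwidth-bound), visible at high batch
        # sizes (compute-bound) as reduced throughput.
        w = q.to(x.dtype) * scale
        return x @ w.T

    w = torch.randn(4096, 4096)        # fp16 in a real deployment
    q, scale = quantize_int8(w)
    x1 = torch.randn(1, 4096)          # memory-bound regime
    x256 = torch.randn(256, 4096)      # compute-bound regime
    out = quantized_linear(x256, q, scale)
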
Casper @CasperWeb3:
This is incredible competition from AMD. They now beat Nvidia on both speed and price for inference. They just need to catch up with robust software; with time, it looks like they will win the performance/cost trade-off. community.amd.com/t5/instinct-ac…
0 replies · 0 reposts · 0 likes · 201 views

Casper @CasperWeb3:
@BlancheMinerva Open source is nothing without open and free weights. Last time I checked, all the great Chinese models are gated by custom licenses and applications to even use them. Doesn’t quite meet my definition of open or free.
1 reply · 0 reposts · 0 likes · 66 views

Casper @CasperWeb3:
@KyeGomezB They forked their own code and improved it, nice. They trained better models too, double nice.
0 replies · 0 reposts · 0 likes · 137 views

Kye Gomez (swarms) @KyeGomezB:
Mistral's team literally just forked Llama, changed fewer than 40 lines of code, and raised $120M+. Investors, what are you doing 😢 This is horrendous.
87 replies · 62 reposts · 1.1K likes · 846.4K views

Casper @CasperWeb3:
@lukebelmar Tether is the last one to go. Until then, crypto is a ticking time bomb.
0 replies · 0 reposts · 0 likes · 31 views

Luke Belmar 👽 @lukebelmar:
My problem isn't Bitcoin. My problem is USDT inflating the price of Bitcoin. What happens if Tether gets smacked the same way Binance did? Does BTC hold? I'm skeptical.
107 replies · 103 reposts · 1.8K likes · 143.6K views

Casper @CasperWeb3:
@E0M The authors of this post used GPT-4 to evaluate whether a certain phrase was in the text. That bears no resemblance to a real evaluation, and it is certainly not something I would expect OpenAI to be using.
0 replies · 0 reposts · 0 likes · 175 views

Casper @CasperWeb3:
@SteveMoraco @OpenAI @GregKamradt These posts are only made to farm reactions on Twitter. Everyone should know that using GPT-4 as an evaluator says nothing about another model, as GPT-4 is biased towards itself (surprise).
0 replies · 0 reposts · 0 likes · 129 views

stv.eth @SteveMoraco:
This Claude 2.1 vs. GPT-4-Turbo chart absolutely blows my mind. Insane levels of fidelity from @OpenAI. Credit: @GregKamradt
82 replies · 358 reposts · 3.1K likes · 1.7M views

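The evaluation being criticized in these replies reduces to asking GPT-4 whether a target phrase survived in retrieved text. Here is a minimal sketch of the two approaches, assuming the openai v1 Python client; the needle phrase and prompt are placeholders, not the original authors' setup. The deterministic check needs no judge model and carries no self-preference bias.

    from openai import OpenAI

    NEEDLE = "The best thing to do in San Francisco is eat a sandwich."

    def exact_eval(retrieved: str) -> bool:
        # Deterministic check: no judge model, no self-preference bias.
        return NEEDLE.lower() in retrieved.lower()

    def gpt4_judge_eval(retrieved: str) -> str:
        # The criticized approach: scores now depend on the judge model's
        # own biases, including a preference for GPT-4-like text.
        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Does this text contain the statement "
                           f"'{NEEDLE}'? Answer yes or no.\n\n{retrieved}",
            }],
        )
        return resp.choices[0].message.content
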
Casper @CasperWeb3:
Can someone make GPT-4 good again?
1 reply · 0 reposts · 1 like · 207 views

Casper @CasperWeb3:
@TXMCtrades Good luck with this. Nothing gets close to Bloomberg or Refinitiv for real-time data. They collaborate with governments that send data directly to them.
0 replies · 0 reposts · 0 likes · 332 views

Casper @CasperWeb3:
@EMostaque An LLM as good as GPT-4 with an MIT license.
0 replies · 0 reposts · 0 likes · 47 views

Emad @EMostaque:
What is being released tomorrow? Guesses go here 👇
167 replies · 14 reposts · 227 likes · 139.6K views

Casper @CasperWeb3:
@parth007_96 @woosuk_k They do acknowledge this in the blog from what I'm reading - DeepSpeed-MII is faster in the case of large contexts and small outputs. They also note that they will optimize this part of vLLM with the SplitFuse strategy.
0 replies · 0 reposts · 0 likes · 108 views

Parth Thakkar @parth007_96:
@woosuk_k I think the two blogs are comparing different load distributions. The DeepSpeed blog used an average prompt size of 2000 and an output size of 128, whereas the image in your post has prompt length 500. It would be good to see the reproduction in both settings. Btw, long prompts are fairly common.
2 replies · 0 reposts · 4 likes · 1.3K views

Woosuk Kwon @woosuk_k:
We’ve just released a new blog post comparing vLLM with DeepSpeed-FastGen. While we are happy to see the open-source technology advancements from the DeepSpeed team, we’ve got different results with more extensive performance benchmarks. vLLM is actually faster than DeepSpeed in many common scenarios. Details here: blog.vllm.ai/2023/11/14/not… (written with @zhuohan123, @simon_mo_, @eqhylxx)
3 replies · 30 reposts · 205 likes · 45.2K views

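Parth's point is that the two blogs benchmarked different load shapes (average prompt 2000 / output 128 versus prompt 500), so a reproduction should sweep both. Here is a minimal sketch using vLLM's offline API; the model choice, request count, and the second shape's output length are assumptions, and a fair comparison would run the same sweep against DeepSpeed-MII as well.

    import time
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-v0.1")  # example model

    def bench(prompt_len: int, output_len: int, n_requests: int = 64) -> float:
        # Synthetic prompts of roughly prompt_len tokens each.
        prompts = ["hi " * prompt_len] * n_requests
        params = SamplingParams(max_tokens=output_len, ignore_eos=True)
        start = time.perf_counter()
        llm.generate(prompts, params)
        # Report generated tokens per second for this load shape.
        return n_requests * output_len / (time.perf_counter() - start)

    # The two shapes in dispute (output length for the second is assumed):
    print("prompt=2000, output=128:", bench(2000, 128), "tok/s")
    print("prompt= 500, output=128:", bench(500, 128), "tok/s")
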
Casper @CasperWeb3:
@AravSrinivas Unironically, this happened super early with ChatGPT. When security experts who know nothing about search & LLMs start fearmongering about your product - you know that you have made it.
1 reply · 0 reposts · 1 like · 415 views

Aravind Srinivas @AravSrinivas:
Perplexity user: “Microsoft banned your domain, so can't use it on work devices”.
26 replies · 7 reposts · 154 likes · 58.9K views

Casper @CasperWeb3:
That's it, switching from OpenAI to @perplexity_ai. It's simply better in my experience. Now with GPT-4 Turbo, the ChatGPT experience has degraded for my coding tasks, and the effective context search that Perplexity leverages gives me more accurate answers.
0 replies · 0 reposts · 2 likes · 439 views

Casper retweeted
Bravos Research @bravosresearch:
2008 Financial Crisis timeline 👇
69 replies · 379 reposts · 1.5K likes · 318.4K views

Casper @CasperWeb3:
@EMostaque Any chance of a commercially compatible license?
0 replies · 0 reposts · 0 likes · 83 views

Emad @EMostaque:
Got lots of open models to release this month & other announcements. The creation => control => composition pipeline across every modality will evolve rapidly thanks to open models, super exciting. Holodeck & Star Trek future here we come... Open source ftw 🚀
16 replies · 23 reposts · 258 likes · 26.5K views

Casper @CasperWeb3:
@osanseviero Where is the code to reproduce these results?
0 replies · 0 reposts · 0 likes · 42 views

Omar Sanseviero @osanseviero:
NEFTune - an exciting paper many missed 🚀 With a very simple trick, your supervised fine-tunes will get significantly better metrics! Let's learn about it 🧑‍🎓

What's the simple trick? During training, add some noise to the output of the embedding layers. That's it! It sounds too simple, but it helps a lot! Fine-tuning Mistral-7B on Guanaco, @_lewtun obtained a 25% boost! 🤯 (the boost is dataset dependent, of course) Here is another one from @tomgoldsteincs 👀

It's as simple as doing this in your forward pass:

    if training:
        return orig_embed(x) + noise
    else:
        return orig_embed(x)

How to use this? 🚀 For trl, it's as easy as adding neftune_noise_alpha to the trainer's init (check the paper to see what it controls). This method seems to work well independently of model sizes and model types. It also works with QLoRA!

Paper: arxiv.org/abs/2310.05914

Thanks to @tomgoldsteincs @neeljain1717 et al. for the paper, and @younesbelkada @lvwerra and @_lewtun for the quick integration into the trl library.
7 replies · 31 reposts · 119 likes · 19.8K views

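A minimal PyTorch sketch of the NEFTune trick described above, following the paper's alpha / sqrt(seq_len * dim) scaling of uniform noise. In practice you would simply pass neftune_noise_alpha to trl's SFTTrainer as the tweet notes; this wrapper is an illustration, not the library's implementation.

    import torch
    import torch.nn as nn

    class NEFTuneEmbedding(nn.Module):
        # Wraps an embedding layer and adds scaled uniform noise during
        # training only, matching the forward-pass pseudocode above.
        def __init__(self, embed: nn.Embedding, noise_alpha: float = 5.0):
            super().__init__()
            self.embed = embed
            self.noise_alpha = noise_alpha

        def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
            x = self.embed(input_ids)                 # (batch, seq_len, dim)
            if self.training:
                seq_len, dim = x.shape[-2], x.shape[-1]
                scale = self.noise_alpha / (seq_len * dim) ** 0.5
                x = x + torch.empty_like(x).uniform_(-scale, scale)
            return x                                  # noise-free at inference

    # Hypothetical usage on an HF-style model:
    # model.set_input_embeddings(NEFTuneEmbedding(model.get_input_embeddings()))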