ZF

73 posts

@zffc

Joined January 2010
89 Following · 169 Followers

ZF @zffc
and interleaved global–local attention to deliver high quality with competitive cost on Apple’s Private Cloud Compute platform.

ZF @zffc
through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation,

ZF @zffc
In this report, we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model machinelearning.apple.com/research/apple…

ZF retweeted
Jeff Dean @JeffDean
Not that many systems handle 6B QPS. 😀 "Bigtable has been in continuous production use at Google for more than 15 years now, processing over 6 billion requests per second at peak and with over 10 exabytes of data under management." cloud.google.com/blog/products/…

ZF retweeted
Jian Ma @jmuiuc
I view my talk as a chance to exchange ideas w/ students. Some points I raised:
- Be interdisciplinary, à la A. van Leeuwenhoek
- Read papers older than you
- On fast-evolving topics, my students outpace my knowledge
- My last serious coding session? During Obama's 1st term 😑
Jian Ma @jmuiuc

I'm sharing my @UCLA_CGSI talk slides last week surveying recent #LLM methods in genomics (DNA, scRNA). As exciting as LLM's potential in genomics is, a note of skepticism remains. Hope we maintain vigilance in what we publish. Feedback on slides welcome. cs.cmu.edu/~jianma/talks/…


ZF retweeted
Dmitry (Dima) Lepikhin @lepikhin
arxiv.org/abs/2105.04663 GSPMD aka "Sharding is all you need". Foundational work for Giant Models by Yuanzhong Xu et al! *generalized from GShard backend system

ZF retweeted
Orhan Firat @orf_bnw
This week we will be presenting three papers at #ICLR2021 each exploring a different aspect of multi-task/multilingual models at scale: (1) modeling (2) optimization and (3) large scale systems.

ZF retweeted
Vala Afshar @ValaAfshar
Surround yourself with people who care about and encourage you to reach higher.

ZF retweeted
Harper's Magazine @Harpers
A statement signed by 150 people incl. Bill T. Jones, Wynton Marsalis, Jennifer Finney Boylan, Noam Chomsky, J.K. Rowling, Margaret Atwood, and Salman Rushdie expresses concern over the illiberal trend intensified by our national reckoning. harpers.org/a-letter-on-ju…

ZF retweeted
Jeff Dean @JeffDean
@timnitGebru @kat_heller I absolutely don't condone anyone being personally attacked on social media. I just haven't seen this in my feed here. If anyone reading this is part of attacking you or anyone else, I say this to you: Please stop. Personal attacks have no place in scientific discourse.

ZF retweeted
Jeff Dean @JeffDean
Great work by @GoogleAI researchers @lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 13.5 BLEU point gain is really significant!
Dmitry (Dima) Lepikhin @lepikhin

arxiv.org/abs/2006.16668 We scaled the Transformer model with Sparsely-Gated Mixture-of-Experts using GShard, and trained a 600B multilingual translation model in about 4 days (for 100 languages) achieving 13.5 BLEU gain compared to the baseline.
