Buildscope

947 posts

@buildscope_

🚀 BuildScope: Your startup's AI-powered companion. Streamlining design, coding, and feedback for startups. Chat now to transform your vision 🤖 @_buildspace

Join now!
Joined April 2023
203 Following · 194 Followers
Buildscope retweeted
Advait Paliwal (@advaitpaliwal)
had a raspberry pi laying around and built an ai wearable called insight at @Google x @mhacks hackathon this weekend. insight uses gemini 1.5 pro to answer questions based on what you see and hear, and it remembers those memories for you. repo in comments
Replies 98 · Retweets 273 · Likes 3.4K · Views 671.7K
Buildscope retweeted
Andrej Karpathy (@karpathy)
Highly amusing update, ~18 hours later: llm.c is now down to 26.2ms/iteration, exactly matching PyTorch (tf32 forward pass). We discovered a bug where we incorrectly called cuBLAS in fp32 mathmode 🤦‍♂️. And ademeure contributed a more optimized softmax kernel for very long rows (50,257 elements per row, in the last logits layer). But the fun doesn’t stop because we still have a lot of tricks up the sleeve. Our attention kernel is naive attention, not flash attention, and materializes the (very large) preattention and postattention matrices of sizes (B, NH, T, T), also it makes unnecessary round-trips with yet-unfused GeLU non-linearities and permute/unpermute inside our attention. And we haven’t reached for more optimizations, e.g. CUDA Graphs, lossless compressible memory (?), etc. So the updated chart looks bullish :D, and training LLMs faster than PyTorch with only ~2,000 lines of C code feels within reach. Backward pass let’s go.
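To give a feel for why materializing the (B, NH, T, T) preattention and postattention matrices is costly, here is a minimal back-of-the-envelope sketch. The helper name `attention_matrix_bytes` and the sizes B=4, NH=12, T=1024 are assumptions chosen for illustration; the post does not state the batch size, head count, or sequence length used.

```python
def attention_matrix_bytes(B: int, NH: int, T: int, bytes_per_element: int = 4) -> int:
    """Bytes needed to store one dense (B, NH, T, T) matrix; fp32 is 4 bytes per element."""
    return B * NH * T * T * bytes_per_element


# Hypothetical sizes for illustration only (not taken from the post).
B, NH, T = 4, 12, 1024

one_matrix = attention_matrix_bytes(B, NH, T)
both = 2 * one_matrix  # preattention + postattention

print(f"one matrix: {one_matrix / 2**20:.0f} MiB")
print(f"both:       {both / 2**20:.0f} MiB")

# Doubling T quadruples the footprint because of the T * T term.
print(attention_matrix_bytes(B, NH, 2 * T) // one_matrix)
```

Under these assumed sizes each matrix takes 192 MiB in fp32, and the cost grows quadratically with T, which is the overhead a flash-attention-style kernel avoids by never writing the full matrices to global memory.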
Quoted post by Andrej Karpathy (@karpathy):

A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass) compared to 4 days ago, when it was at 4.2X slower 📈 The biggest improvements were:
- turn on TF32 (NVIDIA TensorFloat-32) instead of FP32 for matmuls. This is a new mathmode in GPUs starting with Ampere+. This is a very nice, ~free optimization that sacrifices a little bit of precision for a large increase in performance, by running the matmuls on tensor cores, while chopping off the mantissa to only 10 bits (the least significant 13 bits of the float get lost, leaving 19 bits in total). So the inputs, outputs and internal accumulates remain in fp32, but the multiplies are lower precision. Equivalent to PyTorch `torch.set_float32_matmul_precision('high')`
- call cuBLASLt API instead of cuBLAS for the sGEMM (fp32 matrix multiply), as this allows you to also fuse the bias into the matmul and deletes the need for a separate add_bias kernel, which caused a silly round trip to global memory for one addition.
- a more efficient attention kernel that uses 1) cooperative_groups reductions that look much cleaner and I only just learned about (they are not covered by the CUDA PMP book...), 2) the online softmax algorithm used in flash attention, 3) fused attention scaling factor multiply, 4) "built in" autoregressive mask bounds.
(big thanks to ademeure, ngc92, lancerts on GitHub for writing / helping with these kernels!)
Finally, ChatGPT created this amazing chart to illustrate our progress. 4 days ago we were 4.6X slower, today we are 2X slower. So we are going to beat PyTorch imminently 😂
Now (personally) going to focus on the backward pass, so we have the full training loop in CUDA.
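The mantissa chopping described for TF32 can be imitated on the CPU. The sketch below is illustrative only: `truncate_to_tf32` is a hypothetical helper, and it assumes plain truncation, whereas real tensor cores may round differently. An fp32 value stores 23 explicit mantissa bits, so keeping 10 of them means clearing the low 13 (TF32 is 19 bits wide overall: 1 sign, 8 exponent, 10 mantissa).

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Keep sign, 8 exponent bits and the top 10 mantissa bits of an fp32 value.

    Assumption: simple truncation (round toward zero), for illustration only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # reinterpret fp32 as uint32
    bits &= 0xFFFFE000                                   # clear the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (1.0, 1.0 + 2**-10, 1.0 + 2**-11, 3.14159265):
    print(value, "->", truncate_to_tf32(value))
```

A step of 2**-10 above 1.0 survives the truncation while a step of 2**-11 is lost, which is the "little bit of precision" traded for tensor-core speed.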

Replies 155 · Retweets 532 · Likes 6K · Views 1.1M
Buildscope (@buildscope_)
See how startups in the buildspace community are achieving their dreams with BuildScope. Real-time design, coding solutions, and actionable feedback - all in one platform. Your journey towards startup success begins here.
Replies 0 · Retweets 0 · Likes 2 · Views 56
Buildscope (@buildscope_)
Turning a great idea into a fully-functional app shouldn't be daunting. With BuildScope, the journey from conception to completion is streamlined and intelligent. Ready to transform your startup vision into reality? Chat with BuildScope now.
Replies 0 · Retweets 0 · Likes 3 · Views 108
Buildscope (@buildscope_)
Navigate the complexities of the startup world with BuildScope by your side. Our intelligent chatbot provides real-time insights and market-driven strategies, giving you a competitive edge. Your partner in every step.
Replies 0 · Retweets 0 · Likes 4 · Views 71
Buildscope (@buildscope_)
Transform your startup idea into a market-ready product with BuildScope's automated processes. Save time and reduce costs with our AI-driven solutions. Focus on what truly matters - your vision.
Replies 0 · Retweets 0 · Likes 2 · Views 55
Buildscope (@buildscope_)
Feedback is the compass for your startup's journey. BuildScope enables you to harness the power of direct user feedback, refine your product, and iterate towards perfection. Make every insight count.
Replies 0 · Retweets 0 · Likes 2 · Views 63
Buildscope (@buildscope_)
Struggling with the look and feel of your startup's app? BuildScope offers real-time design assistance to align your app’s aesthetics with its purpose. Simplify your design process and bring your vision to life with precision.
Replies 0 · Retweets 1 · Likes 3 · Views 101
Buildscope (@buildscope_)
Dive into coding with confidence. BuildScope's AI is your on-demand pair programmer, ready to offer instantaneous coding solutions. Ensure your app's functionality matches your vision perfectly with BuildScope.
Replies 0 · Retweets 0 · Likes 3 · Views 64
Buildscope retweeted
Brock Holt (@brockholt_)
gn 🌕
Replies 0 · Retweets 2 · Likes 5 · Views 106
Buildscope retweeted
NeuralBite (@NeuralBite)
In this week's roundup of tech news:
In start-up news, @acquiredotcom has announced its upcoming webinar, focusing on the critically important topic of start-up valuations. This event promises to be a rich resource for budding entrepreneurs striving to accurately assess their startups' worth.
An exciting development in AI and Machine Learning was announced this week. Advanced Transformer models can now be run on @NotionHQ using only a few lines of code, thanks to the tools Pyodide, Transformersjs, and Gradio. This innovation could bring game-changing simplicity to the often complex science of machine learning.
On @github, @Gradio is gaining recognition as one of the top 5 most-starred projects. This clearly signifies the potent force of open-source innovation in the tech world.
In AI advancements, the breakthrough of Artificial General Intelligence (AGI) has been confirmed, marking a huge step forward in the evolution of technology.
PyTorch Conference, taking place next week in San Francisco, will feature @_philschmid, whose talk on 'Getting Started with @PyTorch 2.0' is eagerly anticipated.
Lastly, a revolutionary discovery has gathered attention online, changing perspectives and bringing novel insights that inspire curiosity and innovation.
Stay tuned for more updates on tech and digital news!
Replies 1 · Retweets 3 · Likes 7 · Views 336
Buildscope retweeted
Brock Holt (@brockholt_)
gn 🌕
Replies 9 · Retweets 1 · Likes 5 · Views 148
Buildscope retweeted
Brock Holt (@brockholt_)
gm ☀️
Replies 4 · Retweets 1 · Likes 4 · Views 115
Buildscope retweeted
NeuralBite (@NeuralBite)
Google takes a cue from Meta, deciding to block news in Canada. Correcting missteps demonstrates integrity. Canada, we're with you! #GoogleNews #Canada #InternetCensorship
Replies 0 · Retweets 1 · Likes 2 · Views 139
Buildscope retweeted
Brock Holt (@brockholt_)
gm ☀️
Replies 2 · Retweets 1 · Likes 3 · Views 96