imcs_dl@lumii.lv

@imcs_dl

Deep Learning at University of Latvia, IMCS

Joined June 2019
74 Following · 60 Followers
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
OpenAI and Anthropic have opposite cultures. OpenAI runs like a modern Bell Labs. 2-3 researchers spin up projects like GPT & Sora, then turn them into products. Maximal ambition, from each kind of model to robotics to AI device. Anthropic is brutally focused. They believe coding is the path to AGI. Everything else is noise. No image models. No video models. No vagueposts. It will be fascinating to see which one wins.
English
334
334
8.2K
2.1M
imcs_dl@lumii.lv
imcs_dl@lumii.lv@imcs_dl·
@elonmusk The Tesla price is not publicly advertised in Latvia. I have been considering buying an electric car for a year, but have not had time to investigate the market, sellers, rebates, state subsidies, public charger and home charging prices, etc. I just see more electric cars on the street,
English
0
0
0
26
Elon Musk
Elon Musk@elonmusk·
Please reply to this post with any difficulties you may have had in trying to buy a Tesla. Our goal is for the purchase and delivery experience to be fast and simple, with accurate answers to your questions. The key test is that you would recommend it to a friend.
English
61.3K
10.8K
128.4K
40.1M
Andrej Karpathy
Andrej Karpathy@karpathy·
+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits.

On top of context engineering itself, an LLM app has to:
- break up problems just right into control flows
- pack the context windows just right
- dispatch calls to LLMs of the right kind and capability
- handle generation-verification UIUX flows
- a lot more - guardrails, security, evals, parallelism, prefetching, ...

So context engineering is just one small piece of an emerging thick layer of non-trivial software that coordinates individual LLM calls (and a lot more) into full LLM apps. The term "ChatGPT wrapper" is tired and really, really wrong.
tobi lutke@tobi

I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.

English
530
2.1K
14.4K
2.4M
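[Editor's note] The context-engineering idea above, packing a context window from several sources under a token budget, can be sketched as a toy pipeline. All function names, section labels, and the 4-characters-per-token heuristic below are hypothetical illustrations, not from any real framework:

```python
# Toy sketch of "context engineering": assemble a context window from
# several sources under a token budget. All names are hypothetical.

def approx_tokens(text: str) -> int:
    # Crude estimate: roughly 4 characters per token (assumption).
    return max(1, len(text) // 4)

def build_context(task: str, few_shot: list[str], retrieved: list[str],
                  history: list[str], budget: int = 2048) -> str:
    """Pack sections in priority order: task first, then examples,
    retrieved documents, and history; skip pieces that overflow."""
    sections = (
        [("task", task)]
        + [("few_shot", ex) for ex in few_shot]
        + [("retrieved", doc) for doc in retrieved]
        + [("history", turn) for turn in history]
    )
    used, parts = 0, []
    for label, text in sections:
        cost = approx_tokens(text)
        if used + cost > budget:
            continue  # too little context hurts, but so does overflow
        used += cost
        parts.append(f"[{label}] {text}")
    return "\n".join(parts)

ctx = build_context(
    task="Summarize the ticket below in one sentence.",
    few_shot=["Example: 'server down' -> 'Outage reported.'"],
    retrieved=["Ticket #42: login page returns 500 since 09:00."],
    history=["User asked for a status update."],
)
print(ctx)
```

The priority ordering and the drop-on-overflow rule stand in for the "pack the context windows just right" step; a real app would also handle compaction, tool specs, and multimodal data.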
vitrupo
vitrupo@vitrupo·
CRISPR pioneer George Church says AI is letting us steer evolution, and biology is becoming a kind of computer. “Evolution might incorporate a few base pair changes in a million years. Now we can make billions of changes in an afternoon.” AI designs the possibilities. Biology executes them. No simulation. No approximation. Just real matter evolving on command.
English
42
109
588
46.7K
vitrupo
vitrupo@vitrupo·
Satya Nadella says he cares far less about AGI benchmarks than about real-world impact. The tech industry “became the place we were celebrating ourselves and I just hate it.” He says what matters is not who built the model, but who used it, and whether something changed.
English
44
67
693
78.7K
imcs_dl@lumii.lv
imcs_dl@lumii.lv@imcs_dl·
At the opening of the SERDE season, led by University of Latvia professor Guntis Bārzdiņš, we got hands-on practice creating ChatGPT characters versed in the Suiti traditions; they have since been featured on Latvijas Radio 1's "Kultūras Rondo" and are also available to registered ChatGPT users. facebook.com/SERDE/posts%2F…
Latvian
0
1
2
73
AI Lab
AI Lab@AiLab_lv·
"Language is what drives our development. Without language, we would not really understand the world around us. Language is, in a way, our eyes." An interview with leading researcher and full member of the LZA Guntis Bārzdiņš in the March issue of "Zinātnes Vēstnesis". lza.lv/images/Zinatne…
Latvian
0
1
7
234
Andrej Karpathy
Andrej Karpathy@karpathy·
This is interesting as a first large diffusion-based LLM. Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different - it doesn't go left to right, but all at once. You start with noise and gradually denoise into a token stream.

Most of the image / video generation AI tools actually work this way and use Diffusion, not Autoregression. It's only text (and sometimes audio!) that have resisted. So it's been a bit of a mystery to me and many others why, for some reason, text prefers Autoregression, but images/videos prefer Diffusion. This turns out to be a fairly deep rabbit hole that has to do with the distribution of information and noise and our own perception of them, in these domains. If you look close enough, a lot of interesting connections emerge between the two as well.

All that to say that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!
Inception@_inception_ai

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.

English
373
1.5K
11.5K
942.6K
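[Editor's note] The contrast Karpathy draws, committing tokens left to right versus denoising a whole sequence in parallel, can be illustrated with a toy sketch. The random choices below stand in for a real model's predictions; the vocabulary, step count, and unmasking schedule are all invented for illustration:

```python
# Toy contrast: autoregressive generation (left to right, one token
# committed at a time) vs. diffusion-style generation (start fully
# masked, fill positions coarse-to-fine over a few parallel steps).
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def autoregressive(length: int) -> list[str]:
    # Each token is sampled once, in order; earlier tokens are fixed.
    out = []
    for _ in range(length):
        out.append(random.choice(VOCAB))  # stand-in for sampling p(x_t | x_<t)
    return out

def diffusion(length: int, steps: int = 3) -> list[str]:
    # Start from pure "noise" (all masks) and unmask a fraction of
    # positions per step, revisiting the sequence as a whole each time.
    seq = [MASK] * length
    masked = list(range(length))
    per_step = max(1, length // steps)
    while masked:
        chosen, masked = masked[:per_step], masked[per_step:]
        for i in chosen:
            seq[i] = random.choice(VOCAB)  # stand-in for the denoiser's guess
    return seq

print(autoregressive(6))
print(diffusion(6))
```

In a real diffusion LM the denoiser would also be free to revise already-filled positions at later steps; this sketch only shows the parallel, coarse-to-fine fill order that distinguishes it from strict left-to-right decoding.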
AI Lab
AI Lab@AiLab_lv·
Need help with audio recognition and transcription? A short guide to using LATE (late.ailab.lv) (video prepared in collaboration with @_LFK_ ):
Latvian
1
12
23
1.9K
Stanford NLP Group
Stanford NLP Group@stanfordnlp·
The final admission that the 2023 strategy of OpenAI, Anthropic, etc. (“simply scaling up model size, data, compute, and dollars spent will get us to AGI/ASI”) is no longer working!
Sam Altman@sama

OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings. We want AI to "just work" for you; we realize how complicated our model and product offerings have gotten. We hate the model picker as much as you do and want to return to magic unified intelligence.

We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model. After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.

In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.

The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds. Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more.

English
55
113
1.3K
310.8K
Andrew Ng
Andrew Ng@AndrewYNg·
Today's "DeepSeek selloff" in the stock market -- attributed to DeepSeek V3/R1 disrupting the tech ecosystem -- is another sign that the application layer is a great place to be. The foundation model layer being hyper-competitive is great for people building applications.
English
238
994
7K
791.4K
Andrej Karpathy
Andrej Karpathy@karpathy·
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent).

I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations.

Data has historically been seen as a separate category from compute, but even data is downstream of compute to a large extent - you can spend compute to create data. Tons of it. You've heard this called synthetic data generation, but less obviously, there is a very deep connection (equivalence even) between "synthetic data generation" and "reinforcement learning". In the trial-and-error learning process in RL, the "trial" is the model generating (synthetic) data, which it then learns from based on the "error" (/reward). Conversely, when you generate synthetic data and then rank or filter it in any way, your filter is straight up equivalent to a 0-1 advantage function - congrats, you're doing crappy RL.

Last thought. Not sure if this is obvious. There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all *magic*, is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol.

And 2 is the "aha moment" when DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.

(Last last thought/reference this time for real is that RL is powerful but RLHF is not. RLHF is not RL. I have a separate rant on that in an earlier tweet x.com/karpathy/statu…)
Andrej Karpathy@karpathy

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs; the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints.

Does this mean you don't need large GPU clusters for frontier LLMs? No, but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms. Very nice & detailed tech report too, reading through.

English
361
2.1K
14.3K
2.4M
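[Editor's note] Karpathy's claim that filtering synthetic data "is straight up equivalent to a 0-1 advantage function" can be made concrete with a toy sketch. The generator, the filter criterion, and the sample counts below are all invented stand-ins, not DeepSeek's or anyone's actual pipeline:

```python
# Toy sketch: rejection-filtered synthetic data as crude RL.
# "Trial" = the model generates samples; "error"/reward = a 0-1 filter.
import random

random.seed(0)

def generate(n: int) -> list[int]:
    # Stand-in for a model producing synthetic samples.
    return [random.randint(0, 9) for _ in range(n)]

def reward(sample: int) -> int:
    # A 0-1 filter (here: keep even numbers). Any accept/reject rule on
    # synthetic data acts as a 0-1 advantage function: accepted samples
    # get gradient weight 1, rejected samples get weight 0.
    return 1 if sample % 2 == 0 else 0

samples = generate(100)
# Training on the filtered set == REINFORCE-style updates where the
# advantage is exactly reward(s); reward-0 samples contribute nothing.
dataset = [s for s in samples if reward(s) == 1]
print(len(dataset), "of", len(samples), "samples kept for training")
```

Ranking instead of hard filtering would correspond to a graded advantage rather than a 0-1 one; the equivalence the post points at is the same either way.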
AI Lab
AI Lab@AiLab_lv·
On 22 January, Madara and Gunta took part in a seminar for teachers of the Ropaži municipality and spoke about AiLab's latest work in language technologies, as well as about the electronic dictionary tēzaurs.lv.
Latvian
0
4
8
484
AI Lab
AI Lab@AiLab_lv·
@LZA_LV has named the study "Robota kognitīvā uztvere un augsta līmeņa instrukciju interpretācija ar dabiskās valodas jēdzieniem" ("Cognitive perception of a robot and interpretation of high-level instructions using natural-language concepts") one of the most significant achievements in science in 2024. We are proud of our @imcs_dl team's contribution to carrying it out in collaboration with @edi_riga.
Latvian
1
2
7
403