Alex Shtoff

11.3K posts

@AlexShtf

Ph.D. Principal Scientist @ TII. Ex @YahooResearch. I do machine learning ∩ numerical methods ∩ SW development.

Israel · Joined August 2012
289 Following · 1.3K Followers
Pinned Tweet
Alex Shtoff
Alex Shtoff@AlexShtf·
New post in my "Eigenvalues as models" series. This one asks a practical question: can eigenvalue-based models be made much cheaper to train and evaluate without collapsing into something too simple to be interesting? Dense matrices are expressive but expensive. Fully diagonal ones are cheap but too restrictive. In this post I explore a middle ground that turned out to be much more useful than I expected. It is probably the most implementation-focused entry in the series so far: structured matrices, PyTorch/SciPy plumbing, and experiments. If you care about spectral methods, differentiable numerical linear algebra, or unusual tabular model classes, this post, and the entire series, is for you: alexshtf.github.io/2026/03/15/Spe…
English
1
11
53
4.6K
teej dv 🔭
teej dv 🔭@teej_dv·
whats your favorite type system feature in a programming language? it can be an unused lang or whatever, just something cool that you think other languages could be doing if they were designed w/ it from the start ... definitely not making my own lang ... do not tell prime
English
166
1
328
50.5K
Alex Shtoff
Alex Shtoff@AlexShtf·
@jlylekim Is there any optimizer work that is not about "optimizing for language models"? :)
English
2
0
1
175
J. Lyle Kim
J. Lyle Kim@jlylekim·
🚨New paper: Anytime Training with Schedule-Free Spectral Optimization🚨 We introduce SF-NorMuon, a schedule-free spectral method that outperforms or matches heavily tuned AdamW across 125M and 772M parameter language models.
J. Lyle Kim tweet media
English
8
21
114
12.5K
Shira Makin
Shira Makin@shiramakin·
A question for people who identify as liberals and intend to vote for Bennett. What is his liberal agenda, if there is one? Asking seriously.
Hebrew
97
3
125
12K
Alex Shtoff
Alex Shtoff@AlexShtf·
@kaiswonderlandd @loic_na Shinkansen may not be the fastest, but its scale is unprecedented, with no equal in Europe or in China: a huge number of lines, phenomenal frequency, safety, and capacity, all at very high speeds. It serves almost the entire country.
English
0
0
0
19
kai!
kai!@kaiswonderlandd·
@AlexShtf @loic_na shinkansen isnt even top 5 high speed trains im sorry
English
1
0
0
21
Priya
Priya@naturedotcom·
Name a skill that pays more than people think
English
45
1
23
2.9K
Beyza
Beyza@hicasamadim·
if you solve this, you're a genius. can you solve it?
Beyza tweet media
Turkish
53.5K
725
9K
5.1M
Mathieu
Mathieu@miniapeur·
Mathieu tweet media
ZXX
3
11
210
4.5K
igalk
igalk@igalk3·
For those who are not on the apps, where do you meet people in real life? Do you have a real life you'd recommend?
Hebrew
52
0
218
31K
Alex Shtoff
Alex Shtoff@AlexShtf·
This is a very interesting approach, but let's take it one step further. There are plenty of integer programming solvers, such as @mosektw, @gurobi, and others, that have exponential worst-case complexity yet find globally optimal solutions fairly quickly on many very large-scale practical problems. Have you tried just solving the integer programming problem directly?
English
0
0
2
100
Tiago Pimentel
Tiago Pimentel@tpimentelms·
Fresh on arXiv! 😁 Our new paper reformulates tokenisation as a linear program (LP), which we solve to get SOTA tokenisers! As a bonus, this LP allows us to know how close to optimal any tokeniser is! Check it out! 👇
Jan Tempus@Jan55028368

In our new paper, we reinterpret tokenisation as a problem in high-dimensional geometry (100M dims to be precise!), which we can solve efficiently to get a globally near-optimal tokeniser! Our method consistently improves language models over BPE. See 🧵for details.

English
2
9
110
15.1K
Alex Shtoff
Alex Shtoff@AlexShtf·
@MikeE_3_14 @RichardSSutton It happens with polynomials too, not only with Fourier series. What doesn't surprise me is that this came out of John Duchi's workshop. The man is a legend :)
Hebrew
1
0
1
29
Mike Erlihson, Math PhD, AI
⚡️ The legendary bitter lesson of @RichardSSutton strikes again 🧲 and this time it is in the realm of data, specifically "quality data". What would you say if I told you that filtering your data actually hurts the model's capabilities? 🤯 💣 A new and rather cute paper from Stanford drops a small bombshell (fairly expected, in my opinion): given enough compute, the best data filter is no filter. Forget expensive filtering pipelines that sift out 99% of the web in search of gold. 🤯 Why does this happen? The explanation rests on pure linear algebra. The researchers demonstrate, via Low-Rank Matrix Factorization, that when the model's capacity (expressiveness) is large enough (that is, the rank is large enough), the noise in the data mathematically "vanishes". While small models "choke" on junk information because of a representational bottleneck, huge models use their parametric redundancy (overparametrization) to route the contaminated information through certain parts of the model. The noise is "injected" into an orthogonal subspace that does not collide with the high-quality knowledge, so the model manages to draw useful information even from text that is completely statistically scrambled (such as Unigram distributions) and extract value from it without a performance penalty. 🧠📉 In the real-world test, when the researchers scaled up the FLOPs and the model size, training on raw Common Crawl was shown to eventually overtake carefully filtered datasets such as RefinedWeb and DCLM-Baseline. To probe the limits, the researchers injected pure "junk" into the corpus: random text strings and sentences whose word order was completely shuffled. The result? The large models not only did not collapse, they managed to extract real value from this chaos. The researchers predict that when the AI industry reaches compute on the order of 1e+30 FLOPs (on a large enough model), training directly on the raw web will simply leave every filtering mechanism that exists today in the dust. 🚀 🧅🧄 The Bitter Lesson strikes a second time. Instead of engineering complex filters based on human biases and criteria, let the math and the compute do the talking. Scale, it turns out, filters best on its own. 💥
Mike Erlihson, Math PhD, AI tweet media
Hebrew
6
5
60
3.7K
Alex Shtoff
Alex Shtoff@AlexShtf·
@zbrandonz @alvarosabu The whole point is not understanding business in its current form, but changing it to a different form.
English
1
0
1
64
Brandon Smith
Brandon Smith@zbrandonz·
@alvarosabu If you can't understand why that won't work then you didn't understand business
English
5
0
5
1.6K
Alvaro アルバロ
Alvaro アルバロ@alvarosabu·
Why not a 100x agentic CEO? Can’t stand corporate bullshit anymore
Zeb Evans@DJ_CURFEW

Today we reduced headcount by 22%. The business is the strongest it's ever been. So I think it's important to be direct about what I'm seeing and why.

First, I made this decision and I own it. I did it because the way to operate at the highest level of productivity is changing, and to win the future, ClickUp needs to change with it.

Second, this wasn't about cutting costs. Most savings from this change will flow directly back into the people who stay. We'll be introducing million-dollar salary bands. If you create outsized impact using AI, you'll be paid outside of traditional bands.

Most importantly, I have the deepest gratitude for those affected. We're doing this from a position of strength specifically so we can take care of people properly. Everyone affected receives a package aimed at honoring their contributions and easing the transition.

I only see two options: wait for this to play out gradually in the market or be honest about what I'm seeing and act proactively.

THE 100X ORGANIZATION

The primary change is that we're restructuring around what I call 100x org. The goal is 100x output. The roles required to build at the highest level are fundamentally different than they were a year ago. Incremental improvements to existing systems won't get us there. We need new ones. That means creating enough disruption to rebuild rather than iterate on what's already broken.

The common narrative is that AI makes everyone more productive. It doesn't. Many of the workflows of today, if left unchanged, create bottlenecks in AI systems. These roles will evolve. But waiting for that to happen naturally means falling behind now.

The 100x org is actually heavily dependent on people - infinitely more than today. This is only possible with 10x people that have embraced and adopted new ways of working.

THE BUILDERS, AGENT MANAGERS, AND FRONT-LINERS

— THE BUILDERS: 10X ENGINEERS

I don't think most companies have internalized what's actually happening with AI in engineering. The common narrative is that AI makes all engineers more productive. That may be true in isolation, but at an organization level - that is the farthest thing from reality.

Here's what we've validated recently at ClickUp: the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They're not writing code. They're directing agents that write code. The skill is judgment. AI makes the best engineers wildly more productive, and everyone else using AI slows these engineers down.

Think about it - the bottlenecks are (1) orchestration - telling AI what to do, and (2) reviewing - what AI did. Everything is leapfrogged and no longer needed. So who do you want orchestrating and reviewing code? And how do you want your best engineers to spend their time? If your best engineers are spending time reviewing other people's code, then this is inherently an inefficient bottleneck. These engineers can review their agent's code much faster than reviewing human code. The new world is about enabling your 10x engineers to become 100x.

The wrong strategy is to push every engineer to use infinite tokens. Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated. I call this the great reckoning of AI coding, and every company will face this soon if not already. More code is just another bottleneck to the best engineers, and ultimately to your company's impact as well.

— THE BUILDERS: 10X PRODUCT MANAGERS

Product management and design roles are merging. Designers that have customer focus, become more like product managers. And product managers that have intuition for UX become more like designers. The bottleneck of user research is gone. It takes us just one mention of an agent to kickoff research and analyze results. The bottleneck of product <> design iteration is also gone. The product builder iterates on their own, along with agents and skills that ensure alignment with quality and strategy.

Also controversial today - I believe that the wrong strategy is to have your PMs shipping code - that just introduces another bottleneck that the best engineers will waste their time on. To be clear, PMs should be coding but they should do this in a playground to iterate, validate, and scope. That code should not go to production.

Everything outside of managing systems, orchestrating AI, and reviewing output becomes a bottleneck. That's why the other roles that are critical along with these are the systems managers (to reduce bottlenecks) along with a bottleneck you can't replace - customer meeting time.

— THE SYSTEM MANAGERS

Ironically, the people that automate their jobs with AI will always have a job. They become owners of the AI systems - agent managers. We have many examples of these people at ClickUp. The underlying systems in which we operate are absolutely critical to get right. I think most companies are delusional to think they can iterate on existing systems and compete in this new world. You must create enough disruption so that old systems are deprecated entirely. If there's any definition for 'AI native' that's what it is.

— THE FRONT-LINERS

In a world that will become saturated with AI communication, the human touch will matter more than anything to customers. This is a bottleneck that you shouldn't replace - even when agents are high enough quality to do video meetings. One-on-one meeting time with customers is something that shouldn't be automated. The systems around the meetings should be - so that front-liners spend nearly 100% of their time with customers.

REWARDING 100X IMPACT

In a world where companies are able to do so much more with less, where does that excess money go? In our case, much of the savings in this new operating model will flow directly back to those that enabled it. We must reward people that create productivity accordingly. This aligns incentives on both sides. Plus, in a world where your best people create 100x impact, you can't afford to lose them. You should aim to retain these employees for decades. The context they have and their ability to efficiently orchestrate and review will be nearly impossible to replace.

Compensation bands of today should be thrown out the door. We're introducing $1 million cash/year salary bands with a path available to nearly everyone in the company if they produce 100x impact by creating or managing AI systems.

THE FUTURE

Nearly every company will make changes like these. The ones that do it proactively will define what comes next. The future is not fewer people. It's different work, new roles, and better rewards for those who embrace it. We're already seeing entirely new roles emerge, like Agent Managers, that didn't exist a year ago. ClickUp is positioning to lead this shift, not just internally, but for our customers too. I've never been more certain about where we're headed.

English
50
74
2.2K
71.8K
Dmitrii Kovanikov
Dmitrii Kovanikov@ChShersh·
If you know what this is, we're automatically friends
Dmitrii Kovanikov tweet media
English
74
8
353
41.4K
OpenAI Developers
OpenAI Developers@OpenAIDevs·
Codex anywhere and everywhere, all the time. Now your Mac doesn’t have to be unlocked for Codex to use your computer. From your phone, Codex can securely use apps on your Mac, even when the screen is off and locked. developers.openai.com/codex/app/comp…
OpenAI Developers tweet media
English
492
552
8.3K
2.5M
Francesco Orabona
Francesco Orabona@bremen79·
@XingyuZhou989 You should publish it. Learn from my mistakes: don't post anything new on a blog post without at least an arxiv report
English
3
0
14
1.6K
Xingyu Zhou
Xingyu Zhou@XingyuZhou989·
[1/n] Yesterday, we were excited about how AI solved the unit distance problem. Following this, I also want to share my own experience where Codex helped with my research. Of course, my problem cannot compare with the popular open problem. But Codex really helps my understanding!
English
4
4
50
7.5K
Alex Shtoff
Alex Shtoff@AlexShtf·
@PelegReuven The truth is that Google's models so far are simply dumb.
Hebrew
0
0
1
77