Mark Simithraaratchy

1.9K posts

Mark Simithraaratchy banner
Mark Simithraaratchy

@marksimi

Machine Learning Engineering mgmt | Meta Alum | Here for growth in people and systems

Brooklyn, NY · Joined May 2010
660 Following · 350 Followers
Pinned Tweet
Mark Simithraaratchy@marksimi·
I love forecasting (like this excellent thread on how a tough market may impact eng adjacent roles). But even great perspective can kick up imposter syndrome or a loss of agency, like "Are less hands-on EMs totally screwed?" Getting back to right action takes woo and wisdom.
Gergely Orosz@GergelyOrosz

Prediction: a lot fewer engineers will consider moving to engineering manager positions over the next few years. Going from engineer to EM was almost a no-brainer until now. Not much downside, but a lot of career upside (and some compensation upside). Now it's a lot more career risk.

1
1
3
659
Michael Zimmermann@zimm3rmann·
Is 16+ GB/mo a normal amount of telemetry? Can you not do any local compute of “get hot” or “get cold” with a multi-core processor and multiple gigabytes of memory? Can't you just repeat the previous night's settings? It's bad enough that you slapped a $200/yr subscription on things; worse that it doesn't work at all without internet.
Michael Zimmermann tweet media
218
178
4.7K
3.1M
Matteo Franceschetti@m_franceschetti·
The AWS outage has impacted some of our users since last night, disrupting their sleep. That is not the experience we want to provide and I want to apologize for it. We are taking two main actions:
1) We are restoring all the features as AWS comes back. All devices are currently working, with some experiencing data processing delays.
2) We are currently outage-proofing your Pod experience and we will be working tonight, 24/7, until that is done.
More updates soon.
673
197
4.8K
7.9M
Mark Simithraaratchy@marksimi·
@KevinEspiritu Super courageous video. Bravo, man. Wanted to mention "The Body Keeps the Score" on the chance of it offering some healing (as it has for me). Wishing you the best for these next steps, Kevin! 💪
0
0
2
238
Kevin → Plant Daddy@KevinEspiritu·
I'm moving out of my urban homestead and turning it into a company office.

I almost never share personal life stuff in my YouTube videos as it's not the point of our channels... But in this video on my 2nd channel, I opened up about the extreme mental & physical struggles I've had the last 12-15mo, as they are part of why I've decided to leave the house I transformed over the last 5 years.

It's not even 12hrs since release and already over 1,000 comments from viewers expressing support, many of whom I've talked to in DM at some point over the years.

It's gratifying to hear back from people you have in some small way helped through a journey of their own... one of the best parts of being a creator 💚

Now I just need to find a new place to live...
Kevin → Plant Daddy tweet media
30
0
215
15.9K
Richard Oliver Bray@RichOBray·
What's stopping you from coding like this?
Richard Oliver Bray tweet media
1
0
5
127
Mark Simithraaratchy@marksimi·
@RichOBray @Dominus_Kelvin @warpdotdev Thanks for LMK! I'm specifically after Warp's editor-like features while using CC. There's no problem getting that while in a Docker container or any subshell. Via their docs I can get the Warpify modal to pop up, but it won't work. Will keep searching or file a feature request/issue 🤞
0
0
1
46
K.O.O@Dominus_Kelvin·
I am curious: if you use Claude Code, why haven't you tried @warpdotdev's ADE? You do know it supports multiple LLMs and already rocks by being the best terminal in
1
0
2
655
Mark Simithraaratchy@marksimi·
@RichOBray @Dominus_Kelvin @warpdotdev did you manage to get the Warpified terminal going when running claude code? every tutorial i've gone thru has been busto and leaves me the vanilla claude code treatment. I need my warpify..!
1
0
1
43
Richard Oliver Bray@RichOBray·
I use opencode with warp mode more than Claude Code. For me, sometimes I just want to use a cheaper model like Kimi K2 that isn't supported by Warp at the moment, or local models if I'm offline. I love using Warp for when I'm debugging a Linux server I've ssh'ed into. Warpify is amazing. But for simple tasks, I jump to cheaper or local models.
1
0
3
122
Mark Simithraaratchy@marksimi·
@TheEthanDing curious if you managed to get the Warpified terminal going when running claude code? every tutorial i've gone thru has been busto and leaves me the vanilla claude code treatment
0
0
0
27
ethan ding 📊@TheEthanDing·
opening up warp on my home computer and asking it to install claude code immediately feels like opening up safari to install chrome feels like cheating
ethan ding 📊 tweet media
3
1
10
1.3K
Mark Simithraaratchy@marksimi·
@ramigh Did you figure out how to 'warpify' your Warp terminal while using Claude Code? Claude Code is great... but I miss the bells and whistles of the Warp terminal.
0
0
0
69
Brian Sunter@Bsunter·
I made the ultimate pizza planner app. I've been making homemade pizza for the past year and this is the result of my research and experiments. I used it to plan a pizza party yesterday and it turned out really well! pizzaplan.briansunter.com
Brian Sunter tweet media
5
0
8
417
Kevin → Plant Daddy@KevinEspiritu·
10 years ago I started a little hydroponic gardening blog.
Today, we're days away from launching our own line of Epic Gardening seeds 🥹
Kevin → Plant Daddy tweet media
36
6
255
10K
Mark Simithraaratchy@marksimi·
@Bsunter Very cool, Brian. Never heard of FFMI until you mentioned it. Always fun to look at projections. As my trainer says: big boi stuff!
0
0
0
63
Brian Sunter@Bsunter·
Running some projections on how long it would take to reach my theoretical "muscular potential" (ffmi of 25) at my current rate of muscle growth (~0.83 lbs/month) - 6 years.
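The projection arithmetic in a tweet like this is easy to sketch. All inputs below are hypothetical stand-ins (the lean mass, height, and function name are invented for illustration, not Brian's actual stats); FFMI is taken here as lean mass in kg divided by height in meters squared.

```python
LBS_PER_KG = 2.2046  # pound-to-kilogram conversion factor

def months_to_target_ffmi(current_lean_lbs, height_m, target_ffmi, gain_lbs_per_month):
    """Months of steady lean-mass gain needed to reach a target FFMI.

    Target lean mass (kg) = target_ffmi * height_m**2; convert to lbs
    and divide the remaining gap by the monthly gain rate.
    """
    target_lean_lbs = target_ffmi * height_m ** 2 * LBS_PER_KG
    return (target_lean_lbs - current_lean_lbs) / gain_lbs_per_month

# Hypothetical example: 150 lbs lean mass at 1.78 m, gaining 0.83 lbs/month.
months = months_to_target_ffmi(150, 1.78, 25, 0.83)
print(f"~{months / 12:.1f} years to FFMI 25")
```

The linear-extrapolation assumption is the weak point, of course: muscle gain slows as you approach your ceiling, so a projection like "6 years" is a lower bound.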
Brian Sunter tweet media
2
0
4
399
Max Woolf@minimaxir·
finally 😄
Max Woolf tweet media
11
0
147
14.1K
Max ⛅@maxisawesome538·
don't ever say databricks doesn't build 😤
4
1
33
1.6K
Brian Sunter@Bsunter·
New DEXA scan results are in! I gained 5.85 lbs (2.65kg) of muscle and 0.79 lbs (0.36kg) fat over the course of 10 months. Even though I gained weight, my bodyfat % went down slightly, from 22.7 to 22.2. Relatively modest results, but happy overall to be consistently gaining ~0.5 lbs of muscle per month, while only training 2x per week in the gym.
Brian Sunter tweet media
2
0
5
336
Kevin → Plant Daddy@KevinEspiritu·
The fact that people still think apples are better than pears is honestly embarrassing. Elevate your palate, peasants 🍐 > 🍏
50
3
123
10K
Sara Hooker@sarahookr·
Seems timely to remind that this is one of the reasons training compute thresholds are limited as a proxy of ability and risk. Inference time techniques can dramatically improve ability but aren't captured by any current policies.
Jim Fan@DrJimFan

OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there are only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to the latter.

1. You don't need a huge model to perform reasoning. Lots of parameters are dedicated to memorizing facts, in order to perform well in benchmarks like trivia QA. It is possible to factor out reasoning from knowledge, i.e. a small "reasoning core" that knows how to call tools like a browser and code verifier. Pre-training compute may be decreased.

2. A huge amount of compute is shifted to serving inference instead of pre/post-training. LLMs are text-based simulators. By rolling out many possible strategies and scenarios in the simulator, the model will eventually converge to good solutions. The process is a well-studied problem like AlphaGo's Monte Carlo tree search (MCTS).

3. OpenAI must have figured out the inference scaling law a long time ago, which academia is just recently discovering. Two papers came out on arXiv a week apart last month:
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. find that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5.
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Snell et al. find that PaLM 2-S beats a 14x larger model on MATH with test-time search.

4. Productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how do you decide when to stop searching? What's the reward function? The success criterion? When to call tools like a code interpreter in the loop? How to factor in the compute cost of those CPU processes? Their research post didn't share much.

5. Strawberry easily becomes a data flywheel. If the answer is correct, the entire search trace becomes a mini dataset of training examples, which contain both positive and negative rewards. This in turn improves the reasoning core for future versions of GPT, similar to how AlphaGo's value network, used to evaluate the quality of each board position, improves as MCTS generates more and more refined training data.
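The repeated-sampling effect cited from Brown et al. can be sketched in a few lines. This is a toy illustration only: `candidate_fn` stands in for drawing one sample from an LLM, `verifier` for a cheap checker (unit tests, a solution validator), and the integer-guessing game is invented for the example.

```python
import random

def solve_with_repeated_sampling(candidate_fn, verifier, n_samples):
    """Draw up to n_samples candidates; return the first that verifies.

    Per-sample success can be low, but the chance that at least one of
    n independent samples passes the verifier approaches 1 as n grows.
    """
    for _ in range(n_samples):
        candidate = candidate_fn()
        if verifier(candidate):
            return candidate
    return None  # sampling budget exhausted

# Toy stand-in for a model: guess an integer root of x^2 - 9 = 0.
model = lambda: random.randint(-10, 10)
verifier = lambda x: x * x == 9

print(solve_with_repeated_sampling(model, verifier, n_samples=100))
```

With roughly 2-in-21 odds per draw, a single sample usually fails, but 100 samples succeed almost every run: the "scale inference compute instead of parameters" effect in miniature, and it only works when a verifier exists to filter the candidates.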

3
19
165
17.4K
Mark Simithraaratchy@marksimi·
@DrJimFan Highlights for building intuition on the impact of scaling up inference from @polynoamial:
- 10:58: "...if you add search to the poker bot, it's equivalent to scaling up the model by 100,000x"
- 13:35: "...it's augmenting your model capacity and amortizing the cost of training."
0
0
0
19
Mark Simithraaratchy@marksimi·
@DrJimFan While the idea hasn't necessarily been popular, it's been discussed in the research community for a while now with huge impacts in gaming domains (with the LLM research path laid out). My fav resource is Noam Brown chatting with Imbue in Feb 2023: imbue.com/podcast/2023-0…
1
0
0
22
Jim Fan@DrJimFan·
OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. …
Jim Fan tweet media
135
1.1K
6.1K
799.2K
Mark Simithraaratchy retweeted
Harry Stebbings@HarryStebbings·
I have been fortunate enough to invest in 13 unicorn founders over the last 10 years and 12 shared one core trait: They excelled at video games in their early years. @aidangomez 👇 on why video games make for better entrepreneurs.
11
10
75
25.5K