JPeg

572 posts

@jpeg729

Joined May 2013
154 Following · 14 Followers
Jonathan Frankle
Jonathan Frankle@jefrankle·
Meet KARL, an RL'd model for document-centric tasks at frontier quality and open source cost/speed. Great for @databricks customers and scientists (77-page tech report!) As usual, this isn't just one model - it's an RL assembly line to churn out models for us and our customers 🧵
Jonathan Frankle tweet media
English
9
46
241
67.9K
JPeg
JPeg@jpeg729·
@HippyMomPhD I was a conceptual learner, maths was easy, so I didn't practice much and I forgot a lot. Practice is important for everybody.
English
0
0
0
11
Claire Honeycutt | ClarifiED 🕊️❤️
I'm starting to wonder if teaching conceptual math heavily is making it harder for a lot of kids - hear me out.

They looked at students who were gifted in math & went "huh, they think conceptually. We should teach everyone conceptually"

My youngest is gifted in math. She just "gets it" It's absolutely conceptual to her - and easy. But my oldest, nope. And no amount of me explaining the concept ever helped her. You know what did? Procedural practice ... over & over & over.

Then, something kinda like magic happened. She looked at me this week and said "oh, I get it!" and she then explained to me the concept I'd tried to teach her a year ago.

Don't get me wrong. Conceptual math for prek-2nd grade is great. But I'm not convinced it's the best path long-term. Conceptual learners - already "get it." The strugglers.... might just need a LOT more practice before that light bulb goes off.

Thoughts?
English
324
95
1.7K
64.7K
Boris Cherny
Boris Cherny@bcherny·
I hope this was helpful! What are your tips for using Claude Code? What do you want to hear about next?
English
281
42
2.6K
327.1K
Boris Cherny
Boris Cherny@bcherny·
I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to use Claude Code: we intentionally build it in a way that you can use it, customize it, and hack it however you like. Each person on the Claude Code team uses it very differently. So, here goes.
English
1.3K
7K
54.2K
8M
JPeg
JPeg@jpeg729·
@alexhillman Especially if you find it easy to find hacky workarounds
English
0
0
0
2
JPeg
JPeg@jpeg729·
@alexhillman Noticing that you aren't sure and realising you should ask is a skill that isn't always easy to learn
English
1
0
0
6
📙 Alex Hillman
📙 Alex Hillman@alexhillman·
One of the ways I struggled when hiring assistants is that I would tell them: "I'd MUCH rather you ask me questions when you aren't 100% sure, than guess. I promise it's not bothering me." And they wouldn't believe me. And so they would guess and be wrong. And THEN I would be bothered cuz they wasted their time and mine.

Now, if I tell my robot the same thing, it actually asks me when it doesn't understand what I want. The upside is that I've gotten a lot better at asking for what I want.

But it is also still very hard to get people to be honest when they aren't sure, and believe me when I say I actually want them to "bother" me with questions and updates instead of waiting until the end.
English
3
2
14
3.2K
JPeg
JPeg@jpeg729·
@burkov Doesn't this remove the model's ability to tell whether some instruction is in the system prompt? Conversation parts are delimited with tokens, so if it can no longer tell what came from where, then prompt injection becomes easier
English
0
0
0
5
BURKOV
BURKOV@burkov·
In transformers, self-attention layers process sequences without built-in notions of order, so positional embeddings—vectors added to token representations to encode their positions—are used to provide that information. Rotary positional embeddings, or RoPE, a common type that applies rotations to queries and keys in attention, turn out to accelerate training by giving gradient descent a helpful starting bias that makes convergence happen faster, as shown through analysis of attention patterns and gradients.

At the same time, keeping RoPE after training restricts the model from handling sequences longer than its original context length without further adjustments, because the rotations become unfamiliar at new positions. Removing these embeddings post-training and running a short recalibration at the original length lets the model adapt while preserving its short-context abilities, enabling it to work on much longer inputs immediately, with experiments across model scales up to billions of parameters demonstrating better retrieval and perplexity than standard scaling techniques.

Read with an AI tutor: chapterpal.com/s/725ed44f/ext…
Read alone: arxiv.org/pdf/2512.12167
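The rotation RoPE applies can be sketched in a few lines of NumPy. This is an illustrative toy (interleaved-pair convention, not the paper's code), but it shows the property the summary relies on: the dot product between a rotated query and key depends only on the relative distance between positions, so positions beyond the trained context produce rotations the model never adapted to.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate pairs of dimensions of x by position-dependent angles.

    x: (seq_len, d) queries or keys, d even; positions: (seq_len,).
    Pair (2i, 2i+1) is rotated by angle pos * base**(-2i/d).
    """
    seq_len, d = x.shape
    freqs = base ** (-np.arange(0, d, 2) / d)        # (d/2,) frequencies
    angles = positions[:, None] * freqs[None, :]     # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated by an angle linear in position, shifting both query and key positions by the same offset leaves their dot product unchanged — attention sees only relative distance.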
BURKOV tweet media
English
16
15
143
8.5K
JPeg
JPeg@jpeg729·
@ChenSun92 It sounds like linear attention over pairs or triples of tokens with a fancy name and maybe more learnable bits than standard linear attention
English
0
0
0
43
Chen Sun 🤖
Chen Sun 🤖@ChenSun92·
DeepSeek's Engram succeeds where others failed in this endeavor to replace a transformer's crappy FFN with a symbolic-ish lookup table. And in the process, it reveals what I think is a truly gorgeous, monumental even, paradigm shift in our understanding of transformer capability 🌹 🚨

To begin the story: explicitly replacing an FFN with a symbolic lookup table fails catastrophically (desirable as it may be, rather than wasting training compute to do this through FFN layers) because language explodes, guaranteeing collisions and polysemy that a rigid lookup cannot resolve. Engram's cool solution to this relies on 3 complementary ingredients:

1) Learnable "Superposition" Embeddings. Because the table is co-trained rather than fixed, the optimizer learns a dense vector that mathematically represents a "superposition" of multiple concepts. It minimizes the global loss for all colliding inputs simultaneously rather than storing a single rigid value. Therefore, even though collisions are guaranteed, you can learn the superposition of the most useful memories.

2) Context-Aware Gating. This seems to be a further "fail-safe" that makes this learned hashing viable, via the dynamic gate $\alpha_t$. Even after you have retrieved the memory, it forces the backbone to check the retrieved memory against the current semantic context; if the hash returns noise (a collision) or irrelevant polysemy, the gate snaps shut ($\alpha_t \approx 0$), effectively filtering the signal.

3) Placement in a middle layer. If you place it at Layer 0, you force the model to decide how much to trust the memory before it has read the rest of the sentence. But if you place it in a middle layer, it can then use its gate ($\alpha_t$) in a way that is not simply a dictionary but rather a context-dependent memory.

And here is the crux: this study reveals a monumental critical inefficiency in modern architecture. Standard Transformers waste valuable sequential depth and attention capacity effectively simulating static lookup tables for local patterns. The authors demonstrate that if one simply offloads these trivial dependencies to the Engram module, the model stops "polluting" its attention heads with basic dictionary work (it really is basic 2-3 token dictionary work) and suddenly becomes able to perform significantly better on very long context tasks. It is almost as if a burden had been relieved!

Crucially, this offloading was achieved not through expensive semantic retrieval, but via "dumb," deterministic hash lookups. This compels us to ask: have we been over-engineering memory by assuming retrieval must be semantic? If a 'fractured' lexical lookup can outperform deep neural computation, should future architectures abandon the expensive vector database paradigm in favor of massive, dumb hash tables (provided we have smart context-aware filtering)?

Let me know, friends, what you think! 🧙‍♂️
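A toy sketch of the mechanism as described in the thread — a deterministic hashed n-gram lookup whose retrieved vector is blended in through a context gate — might look like this. The function name, table size, and sigmoid gate are illustrative assumptions, not DeepSeek's actual architecture.

```python
import numpy as np

def engram_sketch(tokens, hidden, table, n=2):
    """Toy hashed n-gram memory with a context-aware gate.

    tokens: list of token ids; hidden: (seq_len, d) backbone states;
    table:  (num_slots, d) memory embeddings (learned in the real model).
    Each position hashes its last-n token suffix to a slot (O(1),
    collisions allowed), then blends the retrieved vector in with a
    gate alpha computed from the current hidden state.
    """
    num_slots, d = table.shape
    out = np.empty_like(hidden)
    for i in range(len(tokens)):
        key = tuple(tokens[max(0, i - n + 1): i + 1])
        slot = hash(key) % num_slots          # "dumb" deterministic lookup
        mem = table[slot]
        # gate snaps toward 0 when memory doesn't match the context
        alpha = 1.0 / (1.0 + np.exp(-hidden[i] @ mem))
        out[i] = hidden[i] + alpha * mem
    return out
```

Identical n-gram suffixes retrieve the same slot; it is the gate (and, in the real model, the co-trained "superposition" embeddings) that keeps collisions from corrupting the signal.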
Chen Sun 🤖 tweet media
Lisan al Gaib@scaling01

DeepSeek is back! "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" They introduce Engram, a module that adds an O(1) lookup-style memory based on modernized hashed N-gram embeddings Mechanistic analysis suggests Engram reduces the need for early-layer reconstruction of static patterns, making the model effectively "deeper" for the parts that matter (reasoning) Paper: github.com/deepseek-ai/En…

English
8
29
277
24.6K
JPeg
JPeg@jpeg729·
@lemire Studying maths taught me to look for counter-examples. So now when a client says, "it should always work like so", I can easily find an edge case.
English
0
0
0
18
Daniel Lemire
Daniel Lemire@lemire·
My PhD supervisor, Dubuc, once told me « You can do great things with mathematics, as long as you don't become a mathematician. » It took me a long time to appreciate the wisdom of his statement. There is basically an endless stream of narrow topics you can master. But most of it would not help you. Rather, what helps you is to have a broad base of knowledge that you can quickly integrate. For example, most people need relatively little mathematics. Even high-level software developers need little math. But mix in a bit of math, a bit of business, a bit of design, and so forth... and you can achieve great things!
Daniel Lemire tweet media
English
18
49
426
23.3K
JPeg
JPeg@jpeg729·
@PerceptualPeak What happens if a major refactor occurs and past chats are all wrong?
English
0
0
0
9
Zac
Zac@PerceptualPeak·
Guys....really not trying to glaze myself but it's crazy how effective this actually is. I just one-shotted 3 different minor additions to 3 different projects with the Smart Fork feature. Trying to apply these same additions through a fresh context window would have introduced errors and a few cycles of iteration to get right. Efficiency has been vastly enhanced. Repo coming soon.
Zac@PerceptualPeak

holy shit it fucking WORKS. SMART FORKING. My mind is genuinely blown. I HIGHLY RECOMMEND every Claude Code user implement this into their own workflows.

Do you have a feature you want to implement in an existing project without re-explaining things? As we all know, the more relevant context a chat session has, the more effectively it will be able to implement your request. Why not utilize the knowledge gained from your hundreds/thousands of other Claude Code sessions? Don't let that valuable context go to waste!! This is where smart forking comes into play.

Invoke the /fork-detect tool and tell it what you're wanting to do. It will then run your prompt through an embedding model, cross-reference the embedding with a vectorized RAG database containing every single one of your previous chat sessions (which auto-updates as you continue to have more sessions). It will then return a list of the top 5 relevant chat sessions you've had relating to what you're wanting to do, assigning each a relevance score - ordering it from highest to lowest. You then pick which session you prefer to fork from, and it gives you the fork command to copy and paste into a new terminal.

And boom, there you have it. Seamlessly efficient feature implementation. Happy to whip up an implementation plan & share it in a git repo if anyone is interested!
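The retrieval step described here is, at heart, cosine ranking over session embeddings. A minimal sketch, assuming the sessions have already been embedded — the function name `top_sessions` and the data layout are hypothetical stand-ins, not the actual /fork-detect implementation:

```python
import numpy as np

def top_sessions(prompt_vec, session_vecs, k=5):
    """Rank stored session embeddings by cosine similarity to a prompt.

    prompt_vec:   (d,) embedding of the new request.
    session_vecs: dict mapping session name -> (d,) embedding.
    Returns the top-k (name, score) pairs, highest score first.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(name, cos(prompt_vec, v)) for name, v in session_vecs.items()]
    scored.sort(key=lambda item: -item[1])
    return scored[:k]
```

The real workflow would add an embedding model on the input side and an index that updates as new sessions accumulate, but the ranking itself is this simple.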

English
18
8
270
43.8K
JPeg
JPeg@jpeg729·
@iammukeshm Even better, I like using the results returned by Task.WhenAll rather than calling .Result on each task individually. But that is just a stylistic choice.
English
0
0
1
356
Mukesh Murugan
Mukesh Murugan@iammukeshm·
I see this mistake in almost every codebase I review. Developers awaiting async calls one by one:

var user = await GetUserAsync();
var orders = await GetOrdersAsync();
var stats = await GetStatsAsync();

Each await waits for completion before starting the next. 3 calls at 200ms each = 600ms total. You should learn about Task.WhenAll().

var userTask = GetUserAsync();
var ordersTask = GetOrdersAsync();
var statsTask = GetStatsAsync();
await Task.WhenAll(userTask, ordersTask, statsTask);

Now your total wait time = the slowest task. 600ms becomes ~200ms.

I use Task.WhenAll() whenever I have:
- Multiple API calls that don't depend on each other
- Dashboard data fetching from different sources
- Notifications going to multiple channels (email, SMS, push)
- Cache invalidation across multiple keys

Why I love it:
- No extra threads needed, just smarter scheduling
- Async I/O waits for responses, doesn't block
- One simple change, massive performance gain

Some lessons I learned the hard way:
- Only works when tasks are truly independent
- If Task B needs Task A's result, you can't parallelize
- For thousands of tasks, I add SemaphoreSlim for throttling

Trust me, go check your codebase right now. You'll find at least one place where this applies.

Join my free .NET Web API Zero to Hero Course: codewithmukesh.com/courses/dotnet…

Found this useful? Repost it to help a fellow developer.

#dotnet #csharp #aspnetcore #performance #asyncawait
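The same pattern exists outside .NET. A minimal Python sketch using `asyncio.gather` — the stub coroutines and their return values are invented for illustration — shows the sequential awaits replaced by one concurrent await:

```python
import asyncio

# Stub coroutines standing in for GetUserAsync() etc.
async def get_user():
    await asyncio.sleep(0.05)
    return {"id": 1}

async def get_orders():
    await asyncio.sleep(0.05)
    return ["order-1"]

async def get_stats():
    await asyncio.sleep(0.05)
    return {"count": 3}

async def dashboard():
    # Sequential awaits would cost the SUM of the three latencies.
    # gather() starts all three first, then awaits them together,
    # so the cost is roughly the SLOWEST one, and it hands the
    # results straight back in order.
    return await asyncio.gather(get_user(), get_orders(), get_stats())

user, orders, stats = asyncio.run(dashboard())
```

As with Task.WhenAll, this only helps when the calls are truly independent of one another.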
Mukesh Murugan tweet media
English
6
22
199
13.2K
JPeg
JPeg@jpeg729·
@AlexanderTw33ts @UltraLinx It feels a little robotic, probably because all words are shown for the same amount of time regardless of length
English
0
0
1
200
Oliur
Oliur@UltraLinx·
Can you read 900 words per minute? Try it.
English
4.8K
29.6K
212.1K
31.5M
JPeg
JPeg@jpeg729·
It seems strange to me that models must recognise system prompt tokens from positional cues alone. RoPE adds position data to each token, so why not add a categorical value indicating, for each token, what sort of message it comes from? You might get more robust system instruction following.
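The idea resembles BERT-style segment (token-type) embeddings. A minimal NumPy sketch, with the role ids and the table purely illustrative:

```python
import numpy as np

def add_role_embeddings(token_embs, roles, role_table):
    """Add a learned per-role vector to each token embedding.

    token_embs: (seq_len, d) input embeddings.
    roles:      (seq_len,) ints, e.g. 0 = system, 1 = user, 2 = assistant.
    role_table: (num_roles, d) learned role embeddings.
    Gives every token an explicit marker of which message it came
    from, instead of leaving that to position alone.
    """
    return token_embs + role_table[roles]
```

In a trained model the role table would be learned alongside the token embeddings, exactly as BERT learns its two segment vectors.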
English
0
0
0
16
Mariusz Kurman
Mariusz Kurman@mkurman88·
@kerighan2 It seems so. I was also amazed because it indicates that it is strongly prompt-structure-independent.
English
1
0
0
21
Mariusz Kurman
Mariusz Kurman@mkurman88·
164M parameters, 145B tokens seen. Decay phase, lr 3e-4, ctx 2048, 1024 effective bs
Mariusz Kurman tweet media
English
5
0
36
5.2K
Ashpreet Bedi
Ashpreet Bedi@ashpreetbedi·
This is poor man's continuous learning: no fine-tuning, no retraining, just better system design. Code for reference: agno.link/agentos-demo Agno makes this easy. Star the repo if it helps, and follow for more agent patterns.
English
4
1
30
3.1K
Ashpreet Bedi
Ashpreet Bedi@ashpreetbedi·
Poor man's continuous learning: How to make agents better without fine-tuning or retraining. Over the last few months, I've been using a simple pattern that's made my agents noticeably more reliable and useful. It's also been the most fun I've had building in a while.
English
19
20
407
76.7K
JPeg
JPeg@jpeg729·
@nickproud Unless your DB is SQLite, in which case N+1 can be faster than complex joins.
English
0
0
0
19
Nick Proud
Nick Proud@nickproud·
The real performance killer in #dotnet isn’t your code — it’s your database access patterns. Fix your N+1 queries before you touch anything else. #coding #data #sql
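The trade-off in this exchange can be shown with in-process SQLite, where a per-row query costs a function call rather than a network round-trip. A minimal sketch, with schema and data invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# N+1 pattern: one query for the users, then one more query per user.
# Over a network each extra query is a round-trip; with in-process
# SQLite it is just a function call, so the penalty is far smaller.
users = conn.execute("SELECT id, name FROM users").fetchall()
n_plus_1 = {
    name: [t for (t,) in conn.execute(
        "SELECT total FROM orders WHERE user_id = ?", (uid,))]
    for uid, name in users
}

# Single-query alternative: one JOIN, grouped in application code.
joined = {}
for name, total in conn.execute(
        "SELECT u.name, o.total FROM users u "
        "JOIN orders o ON o.user_id = u.id"):
    joined.setdefault(name, []).append(total)

# Same result either way; which is faster depends on where the
# per-query overhead lives.
assert {k: sorted(v) for k, v in n_plus_1.items()} == \
       {k: sorted(v) for k, v in joined.items()}
```

On a client/server database the N+1 version pays a round-trip per user and usually loses badly; in-process, the balance can tip the other way, as the reply above notes.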
English
5
3
34
2.9K
Antonio Tokic
Antonio Tokic@antotoki·
@dustedcodes @MattParkerDev I use Neovim to develop .NET apps and would never go back to Visual Studio, Rider or VS Code. Given that, why do you think .NET is not truly cross-platform?
English
2
0
1
99
Matt Parker
Matt Parker@MattParkerDev·
🚀 Excited to announce SharpIDE - A Modern, Cross-Platform IDE for .NET! I'm thrilled to share my latest open-source project, just in time for .NET 10: SharpIDE, a brand new IDE for .NET, built with .NET and Godot! 🎉 🔗 Check it out on GitHub: github.com/MattParkerDev/… ...
English
35
61
377
86.1K
JPeg
JPeg@jpeg729·
@debasishg This works in C# too. A zero-size struct as a type parameter gets optimised away by the JIT.
English
1
0
1
380
Debasish (দেবাশিস্) Ghosh 🇮🇳
another Rust tip as a design pattern: Zero-Sized "Strategy Types" Instead of Runtime Flags

Idea: Encode strategy/policy as a type parameter (usually a Zero-Sized Type) instead of a runtime enum/flag. The compiler then monomorphizes and optimizes away conditionals.

Why it helps:
• Each `R` produces a separate monomorphization with no runtime branch to pick strategy.
• Compiler can inline through `R::round` and simplify the arithmetic further.
• The ZST strategies (like `Floor`) cost zero runtime space.

When to use: algorithms with a small number of variants (rounding style, hash strategy, retry policy, etc.) decided at compile time.

Note: this idea is very much applicable to C++ as well.
• What we call "ZST strategy types" in Rust is basically policy-based design / tag types in C++.
• C++ doesn't use the term ZST, but you can get the same effect with:
  • Empty structs as policies/strategies
  • Templates to monomorphize per-strategy
  • Empty Base Optimization to make them zero-size in practice
Debasish (দেবাশিস্) Ghosh 🇮🇳 tweet media
English
7
15
190
12.4K
Anton Martyniuk
Anton Martyniuk@AntonMartyniuk·
.NET 10 is one of the best releases ever. Here are the top updates 👇

.NET 10 and C# 14 were released on November 11, 2025. As a Long-Term Support (LTS) release, .NET 10 will receive three years of support until November 14, 2028. This makes it a solid choice for production applications that need long-term stability. Here are the most amazing new features in this release:

𝗖# 𝟭𝟰
• Extension Members
• Null-Conditional Assignment
• The Field Keyword
• Lambda Parameters with Modifiers
• Partial Constructors and Events

𝗙𝗶𝗹𝗲-𝗕𝗮𝘀𝗲𝗱 𝗔𝗽𝗽𝘀
• Starting with .NET 10, you can create a single *.cs file and run it directly, without a solution file (.sln) or project file (.csproj).

𝗔𝗦𝗣.𝗡𝗘𝗧 𝗖𝗼𝗿𝗲
• Validation Support in Minimal APIs
• JSON Patch Support in Minimal APIs
• Server-Sent Events (SSE)
• OpenAPI 3.1 Support

𝗘𝗙 𝗖𝗼𝗿𝗲
• Optional Complex Types
• JSON and struct Support for Complex Types
• LeftJoin and RightJoin Operators
• Named Query Filters
• ExecuteUpdate for JSON Columns
• Regular Lambdas in ExecuteUpdate

📌 In November, more than 𝟭𝟳,𝟬𝟬𝟬 developers explored these new features in my detailed newsletter. If you missed it, you can read the full guide here:
↳ antondevtips.com/blog/new-featu…

👉 Join 𝟭𝟴,𝟬𝟬𝟬+ developers and improve your .NET and Software Architecture skills. Every subscriber also receives a PDF with 650+ exclusive .NET learning resources.

——

I have already migrated most of my commercial projects to .NET 10. Have you already migrated to .NET 10, or do you have any plans for a migration?

——

♻️ Repost to help others learn about .NET 10
➕ Follow me (@AntonMartyniuk) to improve your .NET and Architecture Skills.
Anton Martyniuk tweet media
English
5
20
121
5.9K
JPeg
JPeg@jpeg729·
@Dave_DotNet We have analysers that can tell us when we have forgotten to await something, so we no longer need a clear visual indication of whether a method returns a Task or not.
English
0
0
3
410
Dave Callan | dotnet
Dave Callan | dotnet@Dave_DotNet·
Are you still adding an 'Async' suffix to your async methods in #dotnet? Why / Why not? 🤔
English
45
1
17
25.9K
JPeg
JPeg@jpeg729·
Optional things becoming required is a breaking change for writers, not for readers. Required things becoming optional is a breaking change for readers, not for writers. So you really shouldn't change optionality at all unless you also maintain all of the readers or all of the writers.
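The asymmetry can be made concrete with a toy validator (the `validate` helper and the schema are hypothetical, for illustration only): making an optional field required breaks existing writers, because output they already produce stops validating.

```python
def validate(record, required_fields):
    """A reader that enforces its schema's required fields.

    (Hypothetical helper; stands in for any schema-validating reader.)
    """
    missing = [f for f in required_fields if f not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record

# v1 schema: 'email' is optional, so a writer may legally omit it.
old_writer_output = {"id": 1}

# A v1 reader accepts that output.
validate(old_writer_output, required_fields=["id"])

# If v2 makes 'email' required, every old writer is now broken:
# their perfectly valid v1 output fails v2 readers' validation.
try:
    validate(old_writer_output, required_fields=["id", "email"])
    optional_to_required_broke_writers = False
except ValueError:
    optional_to_required_broke_writers = True
```

The mirror case is symmetric: if a required field becomes optional, writers are unaffected, but any reader that assumed the field is always present breaks the first time a new writer omits it.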
English
0
0
0
13
Milan Jovanović
Milan Jovanović@mjovanovictech·
Was API Versioning a mistake?

I built and maintained a large public API (400+ endpoints). The API has dozens of integrations, serving mainly mobile applications. When your API is serving so many clients, breaking changes are expensive. So, everything I implemented on the public API had to be planned. If I wanted to introduce breaking changes, I had to version the API. This wasn't a standard I decided on, but I had to live with the consequences.

API versioning allows your API to evolve independently from the clients using it - or so they say. What we actually need is change management for APIs.

1. Don't remove anything
2. Don't change business rules
3. Don't make optional things required
4. New things must be optional

I explained this (and many other things) in-depth in Pragmatic REST APIs: milanjovanovic.tech/pragmatic-rest…

What do you think about API versioning?

P.S. I've got a Black Friday offer for my Pragmatic REST APIs course that expires soon. Don't miss it.

---

Sign up for the .NET Weekly with 75K+ other engineers, and get a free Clean Architecture template: milanjovanovic.tech/templates/clea…
English
3
6
56
6.5K
JPeg
JPeg@jpeg729·
@stevekrouse The gold list method seems more interesting and less mechanical than simple spaced repetition
English
0
0
0
18
Steve Krouse
Steve Krouse@stevekrouse·
Spaced repetition gives me the ick

It's like the Soylent of learning. It's a scientist's idealized form of learning, stripped of all the natural messiness that makes learning rich and beautiful

Picture a mom using spaced repetition flashcards on her baby

Now picture that mom speaking lovingly to her baby about whatever's on her mind as they go about their day

Which world do you want to live in? Where do you think the baby is better off?

The humane way to learn something is to be immersed in an environment where learning happens naturally, automatically, as a consequence of natural motivation and play

Think about all your most positive learning experiences. Learning your native language. Learning to move your body at playgrounds as a child. Learning to play video games. Learning how to use a computer by messing about. All natural, without instructions, or idealized, measured doses of learning

Natural learning is a beautiful human process. Spaced repetition loses all of that. We should flip our focus from learning random facts as fast as possible to crafting ENVIRONMENTS where skills and learning happen naturally

Want to learn French? Go to France
Want to learn math? Go to Mathland (Logo or Scratch)
Want to learn programming? Go to Val Town ;)
English
54
18
422
103.6K