
OS Dev

@OSdev_
Senior Engineer @Qualcomm - Performance Engineering | Windows kernel | C/C++ | ARM64 | CPU & Memory Microarchitectures | SoCs


I started reading about Store-to-Load forwarding and now I'm exploring the entire memory subsystem microarchitecture. Currently reading about the ARMv8.1-A Large System Extensions (LSE) and the AMBA CHI protocol (the coherent interconnect protocol used between parts of the memory subsystem, for example for cache-to-cache transfers). Will write about this in detail soon :)








After many years, I finally got a chance to read about clocks and timers. In electrical engineering I studied LC circuits, impedance, resonance, oscillators (feedback, loop gain, inverters, amplifiers) and piezoelectric concepts, but I never actually worked with or understood CPU clocks and timers, because I never went in depth into VLSI or electronics in general. It turns out these are the same concepts used in clocks and timers.

A quartz crystal behaves electrically like an LC (more precisely, RLC) circuit with a very high Q factor, which is what gives precise and stable frequencies. But it's not a conductor! Quartz is piezoelectric: mechanical stress produces a voltage across it, and an applied electric field deforms it. When the AC voltage from the circuit hits the crystal, it physically flexes. If that AC frequency matches the crystal's "natural" mechanical frequency, the crystal vibrates strongly (resonance) and its impedance drops sharply. This is how it selects a stable frequency!

The combination below is what makes a CPU clock:
1. Pierce oscillator (the crystal plus an inverting amplifier in a feedback loop)
2. Phase-locked loop (PLL) circuits: the oscillator only provides a stable, fairly low reference frequency in the MHz range, and the PLL multiplies it up to the high frequencies the CPU runs at
3. Clock trees, which buffer the squared-up digital clock and distribute it across the chip

Think of timers as just counters driven by a stable frequency, which is how the OS gets accurate time. I am still learning, so there are lots of oversimplifications here. I'll share more in upcoming tweets.
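To make the "PLL multiplies, timer counts" idea concrete, here is a minimal C sketch. The function names `pll_output_hz` and `ticks_to_ns` are made up for illustration, and the 24 MHz reference used in the note below is just an assumed example value, not something from a specific SoC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, idealized PLL model: output = reference * multiplier / divider. */
uint64_t pll_output_hz(uint64_t ref_hz, uint32_t mul, uint32_t div)
{
    assert(div != 0);
    return ref_hz * mul / div;
}

/* Convert a free-running counter value into nanoseconds.
 * Whole seconds and the remainder are handled separately so that
 * ticks * 1e9 does not overflow for large tick counts.
 * Assumes freq_hz is well below ~18 GHz so rem * 1e9 fits in 64 bits. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    assert(freq_hz != 0);
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;
    return secs * 1000000000ULL + rem * 1000000000ULL / freq_hz;
}
```

With an assumed 24 MHz reference, `pll_output_hz(24000000, 100, 1)` models a 2.4 GHz core clock, and a timer that has counted 24,000,000 ticks at 24 MHz has measured exactly one second.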



[C] Function Pointers - Everything you need to know! Function pointers are pointers that point to code. When a function is called through one, the bytes at that address are fetched as instructions and the CPU executes them. "When we dereference a function pointer, the PC register in the CPU is set to the address held by the pointer." The full note is available here: pyjamacafe.com/posts/function…
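A small sketch of what this looks like in C. The names `add`, `sub`, `binary_op` and `apply` are hypothetical and only here for illustration; they are not taken from the linked note.

```c
#include <assert.h>
#include <stddef.h>

/* Two functions with the same signature, so one pointer type fits both. */
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

/* binary_op: pointer to a function taking two ints and returning an int. */
typedef int (*binary_op)(int, int);

/* Calling through the pointer makes the CPU branch to the address it holds. */
int apply(binary_op op, int a, int b)
{
    assert(op != NULL);
    return op(a, b);   /* equivalent to (*op)(a, b) */
}
```

`apply(add, 2, 3)` and `apply(sub, 2, 3)` run the same call site but end up executing different code, because only the address stored in `op` changes.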

From here, I went deeper to understand cache internals and architecture. While reading about that I found a weird statement: "Misaligned loads are cheaper than misaligned stores." My mental model of LDR and STR was wrong, and the statement holds true. A misaligned LDR that straddles a line boundary just needs to read 2 cache lines (even better if it can find the data in the store buffer/queue: Store-to-Load forwarding). But an STR that straddles a boundary needs both cache lines to be **read** first. Why read? Because a store only changes some bytes of each line, so the existing data in the rest of the line has to be preserved. The core also needs ownership of both lines (in multi-core, cache coherence comes into the picture) before it can modify the respective bytes. Meanwhile the store data sits in the store buffer, which lets out-of-order execution carry on instead of waiting, and later the store buffer is drained to the cache. I come from an electrical background, so my mental model was very simple: LDR reads from the cache (hit or miss) and STR writes to the cache, and nothing deeper than this.
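A quick way to see when an access touches 2 cache lines is to compare the line index of its first and last byte. This is a minimal sketch that assumes 64-byte cache lines (common, but not universal); `crosses_cache_line` is a made-up helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u   /* assumption: 64-byte cache lines */

/* Returns true if an access of `size` bytes starting at `addr`
 * straddles a cache-line boundary, i.e. touches two lines. */
bool crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE_BYTES);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

An 8-byte access at `0x1038` stays inside one line (bytes 56 to 63 of the line), while the same access at `0x103C` spills 4 bytes into the next line, which is the case where a load reads 2 lines and a store needs ownership of both.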



The more I learn about CPU architecture, the more I start to believe that extraterrestrial technology exists. What do you mean Store-to-Load forwarding exists? :D

People think CPUs run instructions one by one. Wrong. They use a pipeline: Fetch → Decode → Execute → Memory → Writeback. The stages all work at the same time, each on a different instruction. That's how CPUs get fast. Not smarter. Just more parallel. #CPU #ComputerArchitecture
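A back-of-the-envelope sketch of why overlapping stages helps, assuming an idealized pipeline with one cycle per stage and no stalls, hazards or branch mispredictions (real CPUs have all three). The function names are hypothetical.

```c
#include <assert.h>

/* Ideal k-stage pipeline: the first instruction takes `stages` cycles,
 * then one more instruction completes every cycle. */
unsigned pipelined_cycles(unsigned stages, unsigned instructions)
{
    assert(stages > 0);
    if (instructions == 0)
        return 0;
    return stages + instructions - 1;
}

/* No overlap: every instruction walks through all stages on its own. */
unsigned unpipelined_cycles(unsigned stages, unsigned instructions)
{
    return stages * instructions;
}
```

For the 5-stage pipeline above and 4 instructions, the overlapped version needs 5 + 4 - 1 = 8 cycles instead of 5 * 4 = 20. No single instruction finishes any sooner; throughput improves because the stages are kept busy with different instructions.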



