Quote:
|
Posted by zohar
I didn't see that sentence.
|
It's on page 7:
From the load and store buffers, memory operations proceed to access the cache hierarchy, which has been totally redone from top to bottom. As with the P4, both the caches and TLBs are dynamically shared between threads based upon observed behavior. Nehalem’s L1D cache has retained the same size and associativity (check) as the previous generation, but the latency increased from 3 to 4 cycles to accommodate timing constraints. As previously mentioned, each core can support more outstanding misses (up to 16) to take advantage of the extra memory bandwidth.
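The whole article is too much English for me, so I'll just wait for someone across the strait to translate it.
CPU architecture diagram ↓↓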
