PCDVD Digital Technology Forum > Computer Hardware Discussion Group > System Components
ProtoZohar
Master Member
Join Date: May 2009
Posts: 2,405
Quote:
Originally posted by davis0725
Allow me to chime in.

A bigger cache does not guarantee a higher hit rate.

The algorithm is what matters.

I think everyone will agree on that.

Also, about the claim that C2D relies heavily on its cache:
honestly, the test reports we have do compare CPUs at the same clock with different cache sizes,
but I wonder if anyone has noticed that these C2D CPUs do not have the same number of L2 pipelines!!

With a different pipeline count, the cache's effectiveness is completely different!

A bigger cache allows higher associativity, so of course the hit rate goes up too...

because a bigger warehouse holds more stuff.

Are you perhaps confusing hit rate with something else?
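
A minimal C sketch of the associativity point (an editor's illustration, not from the thread; the cache size, access trace, and LRU policy are all made up): two blocks that ping-pong in a direct-mapped cache coexist once the cache is 2-way, so the hit rate jumps with no added capacity.

/* Editor's sketch (not from the thread): hit rate vs. associativity at
 * constant capacity. A tiny cache with LRU replacement; all numbers
 * are invented for illustration. */
#include <stdio.h>

#define LINES 8   /* total cache lines, held constant across configs */
#define NACC  64  /* accesses in the toy trace */

static double hit_rate(int ways, const unsigned *addr, int n) {
    int sets = LINES / ways;
    unsigned tag[LINES];
    int valid[LINES] = {0}, age[LINES] = {0};
    int hits = 0;
    for (int a = 0; a < n; a++) {
        int set = addr[a] % sets;
        unsigned t = addr[a] / sets;
        int base = set * ways, victim = base, hit = 0;
        for (int w = 0; w < ways; w++) {
            int i = base + w;
            age[i]++;                              /* every line in the set ages */
            if (valid[i] && tag[i] == t) { hit = 1; age[i] = 0; }
            if (age[i] > age[victim]) victim = i;  /* oldest line is the LRU victim */
        }
        if (hit) hits++;
        else { tag[victim] = t; valid[victim] = 1; age[victim] = 0; }
    }
    return (double)hits / n;
}

int main(void) {
    /* blocks 0 and LINES land in the same direct-mapped set and
     * ping-pong forever; with 2 ways they simply coexist */
    unsigned addr[NACC];
    for (int i = 0; i < NACC; i++) addr[i] = (i % 2) ? LINES : 0;
    for (int ways = 1; ways <= LINES; ways *= 2)
        printf("%d-way: hit rate %.2f\n", ways, hit_rate(ways, addr, NACC));
    return 0;
}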
     
      
__________________
眞子内親王 / 綠壩娘

This post was last edited by ProtoZohar on 2009-07-14 at 11:13 AM.
Old 2009-07-14, 11:12 AM  #61
K8FX
Advance Member
Join Date: Feb 2005
Posts: 342
This feels like the start of heterogeneous multi-core~

NV's motherboard business is about to go under.
 
Old 2009-07-17, 02:02 AM  #62
chyx741021
*Suspended*
Join Date: Aug 2004
Location: Yilan
Posts: 845
Actually, what I most want to ask is...

so where's AMD's Fusion?
Old 2009-07-17, 02:15 AM  #63
jackalawa
Advance Member
Join Date: Jun 2002
Posts: 365
Quote:
Originally posted by DeepVoice
Uh oh, this thread makes me want to write an essay.

First: what on earth is a cache?
Let's borrow the excellent example from the "abacus" textbook.
Today you're at the library writing a report.
You need a lot of material, so you may have to consult many books.
You find that, while writing,
you can hold one book in your hand (L1, fastest),
you can pile three books on the desk (call it L2, a bit slower),
you've grabbed a cart to park beside your desk, holding twenty books (memory: hmm, slow, slow, slow),
and the whole library building holds N books (that's the hard disk).

When you need something, you first check the book in your hand
(let's assume you can tell from the title whether the material is inside).
If it isn't there, you look over the desk in front of you.
Still no? Well, get up and dig through the cart.
Still no? Congratulations, you get to search the building floor by floor.

An obvious question: wouldn't piling every book on the desk, or holding them all, be fastest?
That runs into the following problems:
1. You'd need a custom-built, enormous desk (a huge die area).
2. Even with that desk, the time spent finding the right book in the pile grows (cache latency rises).
3. And even if the desk were that big, odds are you...

Very well said +++++1111111
But only someone who really understands processor architecture will agree with you...
The average half-knowledgeable user will probably just spin up more twisted arguments...
and leave you unable to talk sense into them. Still, I think it's great...
call it a refresher for me...
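
A minimal C sketch of the quoted library analogy (an editor's illustration, not from the thread; the level names, capacities, and cycle counts are invented): probe each level in order, pay its latency, and stop at the first hit.

/* Editor's sketch of the analogy: check hand, desk, cart, then the
 * whole building, paying each level's (made-up) latency. "Is it here?"
 * is faked by assuming small levels hold the hottest (lowest-numbered)
 * books. */
#include <stdio.h>

struct level { const char *name; int books; int latency; };

int main(void) {
    struct level hier[] = {
        { "L1 (book in hand)",  1,       4 },
        { "L2 (pile on desk)",  3,       12 },
        { "RAM (book cart)",    20,      200 },
        { "disk (the library)", 1000000, 5000000 },
    };
    int want = 17;   /* pretend book #17 holds the data we need */
    int cost = 0;
    for (int i = 0; i < 4; i++) {
        cost += hier[i].latency;        /* you always pay to look */
        if (want < hier[i].books) {     /* found at this level */
            printf("hit in %s, total cost %d cycles\n", hier[i].name, cost);
            return 0;
        }
        printf("miss in %s, going further out...\n", hier[i].name);
    }
    return 0;
}

Piling every book on the desk corresponds to making one level huge, and point 2 of the quote is exactly why the small latencies at the top of this table could not stay small.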
__________________
海闊天空
Old 2009-07-17, 02:18 AM  #64
visionary_pcdvd
*Suspended*
Join Date: Jan 2008
Posts: 1,281
Quote:
Originally posted by chyx741021
Actually, what I most want to ask is...
so where's AMD's Fusion?

We probably won't see a finished product until AMD's 32nm process arrives; maybe in a year and a half.
 
 
Old 2009-07-17, 07:54 AM  #65
visionary_pcdvd
*Suspended*
Join Date: Jan 2008
Posts: 1,281
Quote:
Originally posted by jackalawa
Very well said +++++1111111
But only someone who really understands processor architecture will agree with you...
The average half-knowledgeable user will probably just spin up more twisted arguments...
and leave you unable to talk sense into them. Still, I think it's great...
call it a refresher for me...

Twisted arguments? Right. So clearly you count yourself among those "people who really understand processor architecture." In plain terms, you're just someone seizing the chance to puff yourself up while putting others down.
 
 

This post was last edited by visionary_pcdvd on 2009-07-17 at 08:30 AM.
Old 2009-07-17, 08:29 AM  #66
chou124
New Member
Join Date: Jun 2008
Posts: 3
Quote:
Originally posted by zzz333

A serious answer on cache:

Actually, what matters most for the cache hit rate isn't the hardware design but the OS. An OS typically divides the cache into 3 regions:

1. An OS-dedicated region used as ordinary RAM that is never cached out.
2. A never-cached-out region (this depends on the OS design; usually critical routines that must not be cached out, e.g., the cache management routine).
3. A general cache region.

Normally, as long as an access hits, the hardware handles it automatically,
but the moment it misses, the hardware merely raises an interrupt and tells the cache management routine to work it out itself.
How much to cache out, and how much to move in from which DRAM location, is decided by software, not hardware,
so the quality of the OS design matters enormously.



You're confusing the CPU's cache with the OS's memory management.
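
The distinction can even be observed from software. A minimal C sketch (an editor's illustration, not from the thread; the array size and the ~64-byte-line assumption are illustrative): the program never fields a "cache miss interrupt"; hardware services every miss invisibly, and the only thing software can observe is time. What zzz333 describes, a fault that software handles by deciding what to evict and what to fetch, is the OS's demand paging of virtual memory, not the CPU cache.

/* Editor's sketch: CPU cache misses are serviced by hardware, invisibly.
 * The same number of loads, with a stride that defeats line reuse, simply
 * runs slower; no interrupt, no OS handler. Assumes ~64-byte lines. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M ints = 64 MiB, bigger than typical caches */

static double walk(const int *a, size_t stride) {
    clock_t t0 = clock();
    volatile long sum = 0;                 /* keep the loads alive */
    for (size_t off = 0; off < stride; off++)
        for (size_t i = off; i < N; i += stride)
            sum += a[i];                   /* same total touches per call */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    int *a = malloc((size_t)N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = 1;
    printf("sequential:   %.2fs\n", walk(a, 1));   /* ~1 miss per 16 ints */
    printf("stride of 16: %.2fs\n", walk(a, 16));  /* ~1 miss per int */
    free(a);
    return 0;
}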
Old 2009-07-17, 09:05 AM  #67
Cudacke-Dees
*Suspended*
Join Date: Jun 2008
Posts: 551
Just this short passage on caches from the wiki should be enough to show
that a real comparison is about more than size and speed.

"Multi-level caches

Another issue is the fundamental tradeoff between cache latency and hit rate. Larger caches have better hit rates but longer latency. To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger slower caches.

Multi-level caches generally operate by checking the smallest Level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.

As the latency difference between main memory and the fastest cache has become larger, some processors have begun to utilize as many as three levels of on-chip cache. For example, the Alpha 21164 (1995) had a 96 KB on-die L3 cache, the IBM POWER4 (2001) had a 256 MB L3 cache off-chip, shared among several processors, the Itanium 2 (2003) had a 6 MB unified level 3 (L3) cache on-die, Intel's Xeon MP product code-named "Tulsa" (2006) features 16 MB of on-die L3 cache shared between two processor cores, the AMD Phenom II (2008) has up to 6 MB on-die unified L3 cache and the Intel Core i7 (2008) has an 8 MB on-die unified L3 cache that is inclusive, shared by all cores. The benefits of an L3 cache depend on the application's access patterns.

Finally, at the other end of the memory hierarchy, the CPU register file itself can be considered the smallest, fastest cache in the system, with the special characteristic that it is scheduled in software—typically by a compiler, as it allocates registers to hold values retrieved from main memory. (See especially loop nest optimization.) Register files sometimes also have hierarchy: The Cray-1 (circa 1976) had 8 address "A" and 8 scalar data "S" registers that were generally usable. There was also a set of 64 address "B" and 64 scalar data "T" registers that took longer to access, but were faster than main memory. The "B" and "T" registers were provided because the Cray-1 did not have a data cache. (The Cray-1 did, however, have an instruction cache.)

Exclusive versus inclusive

Multi-level caches introduce new design decisions. For instance, in some processors, all data in the L1 cache must also be somewhere in the L2 cache. These caches are called strictly inclusive. Other processors (like the AMD Athlon) have exclusive caches — data is guaranteed to be in at most one of the L1 and L2 caches, never in both. Still other processors (like the Intel Pentium II, III, and 4), do not require that data in the L1 cache also reside in the L2 cache, although it may often do so. There is no universally accepted name for this intermediate policy, although the term mainly inclusive has been used.[citation needed]

The advantage of exclusive caches is that they store more data. This advantage is larger when the exclusive L1 cache is comparable to the L2 cache, and diminishes if the L2 cache is many times larger than the L1 cache. When the L1 misses and the L2 hits on an access, the hitting cache line in the L2 is exchanged with a line in the L1. This exchange is quite a bit more work than just copying a line from L2 to L1, which is what an inclusive cache does.

One advantage of strictly inclusive caches is that when external devices or other processors in a multiprocessor system wish to remove a cache line from the processor, they need only have the processor check the L2 cache. In cache hierarchies which do not enforce inclusion, the L1 cache must be checked as well. As a drawback, there is a correlation between the associativities of L1 and L2 caches: if the L2 cache does not have at least as many ways as all L1 caches together, the effective associativity of the L1 caches is restricted.

Another advantage of inclusive caches is that the larger cache can use larger cache lines, which reduces the size of the secondary cache tags. (Exclusive caches require both caches to have the same size cache lines, so that cache lines can be swapped on a L1 miss, L2 hit). If the secondary cache is an order of magnitude larger than the primary, and the cache data is an order of magnitude larger than the cache tags, this tag area saved can be comparable to the incremental area needed to store the L1 cache data in the L2.

Example: the K8

To illustrate both specialization and multi-level caching, here is the cache hierarchy of the K8 core in the AMD Athlon 64 CPU.[7]


Example of hierarchy, the K8

The K8 has 4 specialized caches: an instruction cache, an instruction TLB, a data TLB, and a data cache. Each of these caches is specialized:

* The instruction cache keeps copies of 64 byte lines of memory, and fetches 16 bytes each cycle. Each byte in this cache is stored in ten bits rather than 8, with the extra bits marking the boundaries of instructions (this is an example of predecoding). The cache has only parity protection rather than ECC, because parity is smaller and any damaged data can be replaced by fresh data fetched from memory (which always has an up-to-date copy of instructions).

* The instruction TLB keeps copies of page table entries (PTEs). Each cycle's instruction fetch has its virtual address translated through this TLB into a physical address. Each entry is either 4 or 8 bytes in memory. Each of the TLBs is split into two sections, one to keep PTEs that map 4 KiB, and one to keep PTEs that map 4 MiB or 2 MiB. The split allows the fully associative match circuitry in each section to be simpler. The operating system maps different sections of the virtual address space with different size PTEs.

* The data TLB has two copies which keep identical entries. The two copies allow two data accesses per cycle to translate virtual addresses to physical addresses. Like the instruction TLB, this TLB is split into two kinds of entries.

* The data cache keeps copies of 64 byte lines of memory. It is split into 8 banks (each storing 8 KiB of data), and can fetch two 8-byte data each cycle so long as those data are in different banks. There are two copies of the tags, because each 64 byte line is spread among all 8 banks. Each tag copy handles one of the two accesses per cycle.

The K8 also has multiple-level caches. There are second-level instruction and data TLBs, which store only PTEs mapping 4 KiB. Both instruction and data caches, and the various TLBs, can fill from the large unified L2 cache. This cache is exclusive to both the L1 instruction and data caches, which means that any 8-byte line can only be in one of the L1 instruction cache, the L1 data cache, or the L2 cache. It is, however, possible for a line in the data cache to have a PTE which is also in one of the TLBs—the operating system is responsible for keeping the TLBs coherent by flushing portions of them when the page tables in memory are updated.

The K8 also caches information that is never stored in memory—prediction information. These caches are not shown in the above diagram. As is usual for this class of CPU, the K8 has fairly complex branch prediction, with tables that help predict whether branches are taken and other tables which predict the targets of branches and jumps. Some of this information is associated with instructions, in both the level 1 instruction cache and the unified secondary cache.

The K8 uses an interesting trick to store prediction information with instructions in the secondary cache. Lines in the secondary cache are protected from accidental data corruption (e.g. by an alpha particle strike) by either ECC or parity, depending on whether those lines were evicted from the data or instruction primary caches. Since the parity code takes fewer bits than the ECC code, lines from the instruction cache have a few spare bits. These bits are used to cache branch prediction information associated with those instructions. The net result is that the branch predictor has a larger effective history table, and so has better accuracy."

http://en.wikipedia.org/wiki/CPU_ca...ti-level_caches
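
A minimal C sketch tying the "Exclusive versus inclusive" section to code (an editor's illustration, not from the thread or the article; tiny fully-associative "caches" with FIFO eviction stand in for real set-associative ones): on an L1 miss that hits in L2, an inclusive hierarchy copies the line up and L2 keeps it, while an exclusive hierarchy moves the line up and drops the L1 victim down, the swap the article calls "quite a bit more work".

/* Editor's sketch (not from the thread): L1-miss/L2-hit behavior under
 * the two policies in the quoted article. Tiny fully-associative caches
 * with FIFO eviction keep it short; real caches have sets and ways, and
 * a strictly inclusive L2 would also back-invalidate L1 on its own
 * evictions (omitted here). */
#include <stdio.h>

#define L1_WAYS 2
#define L2_WAYS 4

typedef struct { int line[8]; int n; } Cache;

static int find(Cache *c, int addr) {
    for (int i = 0; i < c->n; i++)
        if (c->line[i] == addr) return i;
    return -1;
}

static void removeAt(Cache *c, int i) {
    for (; i < c->n - 1; i++) c->line[i] = c->line[i + 1];
    c->n--;
}

/* insert a line, evicting the oldest if full; returns the victim or -1 */
static int insert(Cache *c, int ways, int addr) {
    int victim = -1;
    if (c->n == ways) { victim = c->line[0]; removeAt(c, 0); }
    c->line[c->n++] = addr;
    return victim;
}

static void touch(Cache *l1, Cache *l2, int addr, int inclusive) {
    if (find(l1, addr) >= 0) { printf("  %d: L1 hit\n", addr); return; }
    int j = find(l2, addr);
    if (j < 0) {                       /* miss everywhere: fill from memory */
        if (inclusive) insert(l2, L2_WAYS, addr);  /* must also live in L2 */
        int v = insert(l1, L1_WAYS, addr);
        if (!inclusive && v >= 0) insert(l2, L2_WAYS, v); /* spill victim */
        printf("  %d: filled from memory\n", addr);
    } else if (inclusive) {
        insert(l1, L1_WAYS, addr);     /* copy up; L2 keeps its copy */
        printf("  %d: L2 hit, copied to L1 (still in L2)\n", addr);
    } else {
        removeAt(l2, j);               /* exclusive: the line moves up... */
        int v = insert(l1, L1_WAYS, addr);
        if (v >= 0) insert(l2, L2_WAYS, v); /* ...and the L1 victim drops */
        printf("  %d: L2 hit, swapped with an L1 line\n", addr);
    }
}

int main(void) {
    int trace[] = { 1, 2, 3, 1 };      /* 3 evicts 1 from L1; re-touch 1 */
    for (int inc = 0; inc <= 1; inc++) {
        Cache l1 = { {0}, 0 }, l2 = { {0}, 0 };
        printf("%s:\n", inc ? "inclusive" : "exclusive");
        for (int i = 0; i < 4; i++) touch(&l1, &l2, trace[i], inc);
    }
    return 0;
}

With these toy sizes the exclusive pair holds L1_WAYS + L2_WAYS = 6 distinct lines against 4 for the inclusive pair, which is the capacity advantage the article attributes to the Athlon's design.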
Old 2009-07-17, 09:21 AM  #68
visionary_pcdvd
*Suspended*
Join Date: Jan 2008
Posts: 1,281
Quote:
Originally posted by Cudacke-Dees
Just this short passage on caches from the wiki should be enough to show
that a real comparison is about more than size and speed...

1. If we're comparing architectures, sure, there's plenty to talk about. But whatever the architecture, AMD and Intel will only ever make caches "faster and faster" and "bigger and bigger"...

2. English makes my head spin, but I still haven't forgotten the L3 TLB bug from AMD's debut. Besides, the "Exclusive versus inclusive" section only proves that AMD, to squeeze the most out of its limited cache capacity, had no choice but to adopt an exclusive multi-level cache design...
 
 

This post was last edited by visionary_pcdvd on 2009-07-17 at 09:58 AM.
Old 2009-07-17, 09:54 AM  #69
Cudacke-Dees
*Suspended*
Join Date: Jun 2008
Posts: 551
Quote:
Originally posted by visionary_pcdvd
1. If we're comparing architectures, sure, there's plenty to talk about. But whatever the architecture, AMD and Intel will only ever make caches "faster and faster" and "bigger and bigger"...

2. English makes my head spin, but I still haven't forgotten the L3 TLB bug from AMD's debut. Besides, the "Exclusive versus inclusive" section only proves that AMD, to squeeze the most out of its limited cache capacity, had no choice but to adopt an exclusive multi-level cache design...


So the conclusion is still the same:
just this short passage on caches from the wiki should be enough to show
that a real comparison is about more than size and speed.
Old 2009-07-17, 10:25 AM  #70