PCDVD Digital Technology Forum
sxs112.tw
Elite Member
Joined: Aug 2001
Posts: 12,393
INQ discloses Merom details: an all-new design combining the strengths of the P-M and P4

Intel Merom is designed from the ground up

Intel Developer Forum Merom wasn't built in a day

By Charlie Demerjian in San Francisco: Tuesday 23 August 2005, 17:39

INTEL IS FINALLY TALKING about its new architecture, the Merom family of chips. These chips are often characterised as Pentium M derived, or tweaked versions thereof. They are not, not even close. Merom is new from the ground up, from the philosophy to the architecture, but it carries on with the familiar technologies from the previous P4 and PMs. It is going to be good.

The first thing you notice is that Intel has abandoned the long pipe, high speed, lower IPC model that was the norm for the last few years. Merom just about halves the pipeline length. It is now 14 stages, but whether that is 14 critical stages, or 14 overall was not stated. Either way, it is going to give up a lot of MHz to the Pentium 4, but will end up faster anyway.

The basic structure is a four issue wide core. Without going into specifics, Intel said the Merom cores can keep up a sustained 4 ops from issue to retire. This probably has a bunch of caveats, addenda and asterisks, but it is clear that wider is the course for the day. Each pipe is a full pipe versus the old P6 derived simple and complex pipe structures. The number of ALU ports is also greatly increased.

The family picks up a lot from the previous Yonah architecture: the dual core, shared L2 and low power architectures. It also picks up some of the baggage, like the long in the tooth FSB and stronger integer performance than floating point. Everything is new, even if it looks similar.

Let's look at these things in a little greater detail. First is the shared L2, something that debuted with Yonah. This carries forward to Merom, but there are some important differences. Since it was not an addition to the architecture, but there from the first day, you can make assumptions based on it. One of those is a direct L1 to L1 link to cut down the time needed to snoop the cache. Since it cuts out two L2s and a bus traversal, it can cut the time down to one third of what it took the 'old way'. It may not do much between sockets (that is what some of the Blackford chipset enhancements are for), but it will make a significant difference.

The cache is fed through two new prefetch algorithms, which are bandwidth aware. It was not stated outright, but it looks like one is for L1 and the other is for L2. They can change their modes depending on how much bandwidth is available; it is the next step of speculative prefetch.
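To make "bandwidth aware" concrete, here is a minimal sketch of a prefetcher that throttles itself as the bus fills up. The article does not describe either algorithm, so everything here is an assumption for illustration: the sequential next-line policy, the linear depth rule, the 64-byte line size, and the names `prefetch_depth` and `lines_to_prefetch` are all hypothetical.

```python
from typing import List


def prefetch_depth(bus_utilization: float, max_depth: int = 4) -> int:
    """Pick how many lines to prefetch from the spare bus bandwidth.

    Hypothetical rule: depth scales linearly with unused bandwidth, so a
    saturated bus (1.0) prefetches nothing and an idle bus (0.0) prefetches
    max_depth lines.
    """
    if not 0.0 <= bus_utilization <= 1.0:
        raise ValueError("bus_utilization must be between 0.0 and 1.0")
    spare = 1.0 - bus_utilization
    return int(spare * max_depth)


def lines_to_prefetch(miss_address: int, bus_utilization: float,
                      line_size: int = 64, max_depth: int = 4) -> List[int]:
    """Return the addresses of the next sequential cache lines to fetch."""
    depth = prefetch_depth(bus_utilization, max_depth)
    base = miss_address - miss_address % line_size  # align to the line start
    return [base + line_size * k for k in range(1, depth + 1)]


if __name__ == "__main__":
    # Half the bus is free, so this toy policy fetches two lines ahead.
    print([hex(a) for a in lines_to_prefetch(0x1004, 0.5)])
```

The point of the sketch is only the feedback loop: prefetching is speculative traffic, so backing off when demand traffic needs the bus avoids making real misses slower.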

Along the same lines, Intel has a technology called Memory Disambiguation. The fancy words can be translated to English as 'we check dependencies on retire, not on load'. Combined with the speculative loading, it can do a lot to keep stalls from happening and to raise IPC.
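The 'check on retire, not on load' idea can be sketched with a toy model: loads run ahead of older stores, and only at retire time is each load checked against the addresses those stores actually wrote. This is a simplified illustration, not Intel's mechanism; the operation format, the `run_with_retire_check` name, and the all-loads-run-first schedule are assumptions, and the article gives no implementation details.

```python
from typing import Dict, List, Tuple

# Toy operations in program order (hypothetical format):
#   ("store", address, value)  or  ("load", address)
Op = Tuple


def run_with_retire_check(ops: List[Op],
                          memory: Dict[int, int]) -> Tuple[List[int], int]:
    """Execute loads early, then verify them against older stores at retire."""
    memory = dict(memory)  # work on a copy so the caller's dict is untouched

    # Phase 1: every load runs ahead speculatively, before any store has
    # written memory, so it may read stale data.
    speculative = {
        i: memory.get(op[1], 0) for i, op in enumerate(ops) if op[0] == "load"
    }

    # Phase 2: retire in program order and check dependencies only now.
    stored_addresses = set()
    loaded_values: List[int] = []
    replays = 0
    for i, op in enumerate(ops):
        if op[0] == "store":
            _, address, value = op
            memory[address] = value
            stored_addresses.add(address)
        else:
            _, address = op
            value = speculative[i]
            if address in stored_addresses:
                # An older store hit the same address: redo the load.
                value = memory[address]
                replays += 1
            loaded_values.append(value)
    return loaded_values, replays


if __name__ == "__main__":
    ops = [("store", 0x10, 7), ("load", 0x20), ("load", 0x10)]
    print(run_with_retire_check(ops, {0x10: 1, 0x20: 2}))
```

In this model the load from `0x20` never waits for the store, which is the win; only the load that really aliases the store pays for a replay.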

One of the tricks that Banias brought to the forefront was Micro-Op Fusion, basically the ability to gang multiple decoded operations into a single one. Merom takes this much further, and adds a more sophisticated version to the mix. In addition, Merom has Macro-Op Fusion, the ability to gang x86 operations before decode. As an example, if you have a multiply followed by an add, Macro-Op Fusion can turn that into a Multiply and Accumulate. Again, this simplifies the complex process of x86 execution and again increases IPC.
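As a rough illustration of the multiply-then-add example, the sketch below runs a toy peephole pass over made-up three-operand instructions and merges an adjacent MUL and ADD into one MAC when the ADD consumes the MUL's result. The instruction tuples, the `fuse_mul_add` and `run` names, and the fusion rule are assumptions for illustration; the article does not say which instruction pairs Merom fuses or how the hardware detects them.

```python
from typing import Dict, List, Tuple

# Toy instruction format (hypothetical): (opcode, dest, src1, src2[, src3])
Instr = Tuple[str, ...]


def fuse_mul_add(program: List[Instr]) -> List[Instr]:
    """Fuse an adjacent MUL/ADD pair into a single MAC where it is safe."""
    fused: List[Instr] = []
    i = 0
    while i < len(program):
        cur = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if nxt is not None and cur[0] == "MUL" and nxt[0] == "ADD":
            _, mul_dst, a, b = cur
            _, add_dst, x, y = nxt
            # Safe only if the ADD overwrites the product (so nobody else
            # needs it) and reads the product exactly once.
            if add_dst == mul_dst and (x == mul_dst) != (y == mul_dst):
                addend = y if x == mul_dst else x
                fused.append(("MAC", mul_dst, a, b, addend))
                i += 2
                continue
        fused.append(cur)
        i += 1
    return fused


def run(program: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Tiny interpreter used to check that fusion preserves the results."""
    regs = dict(regs)
    for ins in program:
        op = ins[0]
        if op == "MUL":
            regs[ins[1]] = regs[ins[2]] * regs[ins[3]]
        elif op == "ADD":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif op == "MAC":
            regs[ins[1]] = regs[ins[2]] * regs[ins[3]] + regs[ins[4]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


if __name__ == "__main__":
    prog = [("MUL", "r1", "r2", "r3"), ("ADD", "r1", "r1", "r4")]
    print(fuse_mul_add(prog))
```

Two instructions become one slot in the pipeline, which is where the IPC gain in the article's description would come from.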

Then comes power savings, which is what this family was designed to do. Pentium 4s are pretty much on all the time they are powered up, and if you need to cook eggs, this is your chip. Banias/Dothan took power savings seriously, and allowed the chip to power down units that were not in use. This was a massive power savings.

Merom goes well beyond this: all units are powered down in the default state. When units are needed, they are powered up, and the chip takes power savings to a new level entirely. The unit power up takes a few clock cycles, and again, while exact numbers are classified, it is more than one, less than 10 in most cases. This depends greatly on processor power state, but it should not be all that noticeable.
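The trade-off described here (units off by default, a few cycles to wake) can be put into a back-of-the-envelope model. The sketch below counts how many cycles a single unit spends powered and how many cycles are lost waiting for it to wake. The burst format, the `gated_unit_stats` name, and the default of 3 wake cycles are hypothetical; the article only says the wake-up takes more than one and fewer than 10 cycles in most cases.

```python
from typing import Dict, List, Tuple


def gated_unit_stats(bursts: List[Tuple[int, int]],
                     wake_cycles: int = 3) -> Dict[str, int]:
    """Count powered and stalled cycles for one default-off execution unit.

    bursts: (idle_gap, busy_len) pairs in time order, in clock cycles.
    wake_cycles: assumed power-up latency paid at the start of each burst.
    """
    total = powered = stalled = 0
    for idle_gap, busy_len in bursts:
        total += idle_gap  # the unit stays powered down while nothing needs it
        if busy_len > 0:
            stalled += wake_cycles              # work waits for the power-up
            powered += wake_cycles + busy_len   # on during wake-up and work
            total += wake_cycles + busy_len
    return {"total": total, "powered": powered, "stalled": stalled}


if __name__ == "__main__":
    stats = gated_unit_stats([(100, 20), (50, 10)])
    print(stats)
    print(f"powered {stats['powered'] / stats['total']:.0%} of the time")
```

Under these made-up numbers the unit is powered for only a small share of the run, at the cost of a handful of stall cycles per burst, which matches the article's claim that the wake-up should not be very noticeable.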

On the baggage side, the lower integer performance is more due to the shorter pipe length, and it looks like Merom cores will be faster than Opteron+'s in int, but lose a little to them in FP, quite the change.

A lot of this is due to bandwidth to the cores, and that is the weakest link for Merom. They keep the current infrastructure, can keep the chipsets, and keep the FSB. The target for Woodcrest, the server version of Merom, is a 1333MHz FSB. The quad core MCM Clovertown will drop down to 1066, and Conroe will sit on 1066 also. I think that Conroe will end up on a 1333, but officially, it isn't. Merom will be lower due to power constraints.

How much power does it take? Merom is listed at 35W TDP, with a 1-2W average consumption. Intel is supposed to be binning on power consumption as well as speed, so the higher speeds may actually end up using less power. Conroe sits at 65W for the desktop, and Woodcrest is at 80W. Conroe and Woodcrest are substantial improvements over their predecessors, and Merom is slightly higher outright, but vastly more efficient as far as performance per watt is concerned. It should end up more efficient overall because it can do more in less time more efficiently, but I will wait for samples before I say that for sure.

How fast are they? Well, as far as raw clock speeds go, Merom will be in the low 2GHz range, Conroe and Woodcrest in the 2.5-3GHz range, and Clovertown a couple of bins down from Woodcrest. Clock for clock, look for a 30% improvement. This chip is going to give AMD quite the run for its money. µ
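The clock-versus-IPC trade-off that runs through the article reduces to simple arithmetic if we assume, as a first-order model only, that performance scales as IPC times clock speed. The sketch below uses the article's 30% clock-for-clock figure; the 2.0GHz and 2.6GHz clocks are hypothetical inputs, and the model ignores the FSB and memory limits the article itself calls the weakest link. The article also does not say which chip the 30% is measured against.

```python
def relative_performance(ipc_ratio: float, clock_ghz: float,
                         rival_clock_ghz: float) -> float:
    """Performance relative to a rival, assuming performance = IPC * clock.

    ipc_ratio: this chip's IPC divided by the rival's (1.3 = 30% better).
    Returns a value above 1.0 when this chip is faster.
    """
    if min(ipc_ratio, clock_ghz, rival_clock_ghz) <= 0:
        raise ValueError("all inputs must be positive")
    return ipc_ratio * clock_ghz / rival_clock_ghz


def break_even_rival_clock(ipc_ratio: float, clock_ghz: float) -> float:
    """Clock the rival needs in order to match this chip under the same model."""
    return ipc_ratio * clock_ghz


if __name__ == "__main__":
    # Hypothetical: a 2.0GHz part with 30% better IPC against a 2.6GHz rival.
    print(relative_performance(1.3, 2.0, 2.6))
    print(break_even_rival_clock(1.3, 2.0))
```

In other words, under this model a 30% clock-for-clock gain lets a chip give up 30% of the rival's clock speed and still break even.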
=================================================================

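Key points:

1. 14-stage pipeline.
2. 4-wide issue; every pipe is a full pipe rather than being split into simple and complex pipes as on the P6, and the number of ALU ports is greatly increased.
3. Dual core with shared L2 and a direct L1-to-L1 link between the cores.
4. Two different prefetch algorithms, applied according to the available bandwidth.
5. Memory Disambiguation plus speculative loads.
6. Enhanced Micro-Op Fusion, plus new Macro-Op Fusion (INQ claims it can turn an adjacent MUL and ADD into a single MAC instruction).
7. All units are powered down by default and powered up only when needed, to cut idle power.
8. Integer performance is expected to beat Opteron+, with floating point slightly behind. The limiting factor is FSB speed: Merom will be at 667MHz and compatible with the Yonah platform, Conroe at 1066MHz, Woodcrest at 1333MHz.
9. Maximum power (TDP) 35W, average 1-2W. Conroe 65W, Woodcrest 80W.
10. Merom clocks start at around 2GHz; Conroe and Woodcrest will be roughly 2.5-3GHz. Clock-for-clock performance is expected to be 30% higher than Yonah.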

It feels like INQ is mythologizing it... Whether it really turns out this way, we will know at next spring's IDF.

INQ
     
      

Last edited by sxs112.tw on 2005-08-24 at 02:25 AM.
2005-08-24, 02:19 AM #1
cpusocket
Master Member
Joined: Dec 2001
Posts: 2,321
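Who knows??? The media, hmph!!

All I have seen from the P4 is HT.
If they just brought the P-M straight to the desktop,
then added multi-CPU support and made a PMMP,
that would seem more worth watching!!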
 
__________________

GV+Club
2005-08-24, 05:30 AM #2
gtr32ae101
Junior Member
Joined: Sep 2003
Posts: 919
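**** news? Ha! Intel's promises always pay out late. This thing may not arrive until 2007. The "fire-breathing dragon" breed will keep selling through 2006.
HT, what is that? Dual core has already taken off. Once this lands, the P4 and K8 crowds will all be fighting it out red-eyed.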
2005-08-24, 08:01 AM #3
sxs112.tw
Elite Member
Joined: Aug 2001
Posts: 12,393
Judging from what has been presented at IDF so far... it all seems to be true...
2005-08-24, 09:23 AM #4
藍鯨
Advance Member
Joined: Jan 2002
Posts: 437
Back when Northwood was doing better than the Athlon XP, I was really looking forward to Prescott too when the first news of it came out... but in the end...
2005-08-24, 10:14 AM #5
sxs112.tw
Elite Member
Joined: Aug 2001
Posts: 12,393
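Quote:
Originally posted by 藍鯨
Back when Northwood was doing better than the Athlon XP, I was really looking forward to Prescott too when the first news of it came out... but in the end...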


If the 90nm leakage problem had been solved... Prescott would not be all that bad.
2005-08-24, 12:10 PM #6