PCDVD Digital Technology Forum - View Single Post - Nvidia found not to support DX12's asynchronous rendering feature

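freaky's Chinese translation reads smoothly, though reading the original makes the meaning clearer.
What it covers was already brought up years ago, in
http://pc.watch.impress.co.jp/docs/...502_598132.html
specifically this part:
http://pc.watch.impress.co.jp/img/p...tml/17.png.html
That is what led AMD to take the approach on the left when it first designed GCN.
The reason, just as the title of that article says, is HSA.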
> GCN tends to leave a lot of units idle from what I can tell, and thus it needs this sort of mechanism the most
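You can clearly see that with this approach, if the ACEs cannot be used, there is a lot more idle time than with the traditional approach (the GPU is waiting on CPU scheduling).
That probably explains part of why AMD's performance is poor.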
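To put rough numbers on the idle-units argument, here is a minimal Python sketch. It is a toy model, not a description of how GCN or any driver actually schedules work: the job list, the "fraction of units busy" figures, and the function names are all made up for illustration. It compares a single queue, where compute waits until graphics is done, against an overlapped case where compute fills whatever units graphics leaves idle.

```python
def serial_time(graphics, compute_work):
    """Single queue: graphics jobs run back to back, then compute gets the whole GPU."""
    return sum(duration for duration, _ in graphics) + compute_work


def overlapped_time(graphics, compute_work):
    """Compute soaks up the units graphics leaves idle; any leftover runs afterwards."""
    graphics_time = sum(duration for duration, _ in graphics)
    # Spare capacity while graphics runs = duration * unused fraction of the units.
    spare = sum(duration * (1.0 - used) for duration, used in graphics)
    leftover = max(0.0, compute_work - spare)
    return graphics_time + leftover


def utilization(graphics, compute_work, total_time):
    """Share of the GPU's capacity that did useful work over total_time."""
    busy = sum(duration * used for duration, used in graphics) + compute_work
    return busy / total_time


# Hypothetical workload: (duration, fraction of shader units kept busy).
graphics = [(4.0, 0.5), (2.0, 0.75)]
# Hypothetical compute work, measured in full-GPU time units.
compute_work = 3.0

t_serial = serial_time(graphics, compute_work)
t_overlap = overlapped_time(graphics, compute_work)
print("serial:    ", t_serial, utilization(graphics, compute_work, t_serial))
print("overlapped:", t_overlap, utilization(graphics, compute_work, t_overlap))
```

With these made-up numbers the single queue takes 9.0 time units and the overlapped case takes 6.5, so the more units graphics leaves idle, the more there is to gain from being able to overlap.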
> Yes, I'm honestly curious what the benefits to having multiple compute kernels in parallel really are (ala AMD's >2 ACEs)...
> This is beneficial if you cannot overlap an independent graphics workload and you have multiple independent compute workloads to run, but I'm not sure how important that is in practice."
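The answer to why you would want more than two ACEs and multiple independent (non-graphics) compute workloads is simple: they may well get used (by HSA, for example).
Kaveri goes up to 8 CUs, or put another way 8 ACEs (one CU per ACE).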

Why 8? The reason is simple, it's this:
http://www.pcdvd.com.tw/showpost.ph...6&postcount=111
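Under the HSA model, the CPU and GPU have to be in a certain proportion to each other for efficiency to be high.
Right now Kaveri's ratio by core count is 1:2 (4 CPU cores to 8 CUs); factor in the difference in execution performance and the ratio is 1:3.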
> ... and if anyone starts talking about numbers of hardware queues and ACEs and whatever else you can pretty safely ignore that as marketing/fanboy nonsense that is just adding more confusion rather than useful information.
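You can look at it that way, sure, because this thing was not created for graphics work.
Explaining ACEs from the graphics side is nonsense to begin with; from the HSA side, the number of CUs (ACEs) is a factor you have to consider.
Too many CUs are a waste; too few and the HSA performance does not materialize.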
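The "too many is wasted, too few is not enough" point can be sketched as a simple bottleneck model in Python. This is only an illustration under assumptions of my own: `feed_per_core` (how many CUs' worth of work one CPU core can hand over) and `rate_per_cu` are hypothetical parameters, picked so that 4 CPU cores balance 8 CUs as in the 1:2 core-count ratio above. Neither comes from AMD or from the HSA specification.

```python
def hsa_throughput(cpu_cores, cus, feed_per_core=2.0, rate_per_cu=1.0):
    """Work finished per tick when the CPU side feeds the GPU side.

    feed_per_core and rate_per_cu are made-up illustrative parameters.
    """
    supply = cpu_cores * feed_per_core  # work the CPU cores can hand over per tick
    demand = cus * rate_per_cu          # work the CUs can absorb per tick
    return min(supply, demand)          # the slower side is the bottleneck


def idle_cus(cpu_cores, cus, feed_per_core=2.0, rate_per_cu=1.0):
    """Number of CUs' worth of capacity left with nothing to do."""
    done = hsa_throughput(cpu_cores, cus, feed_per_core, rate_per_cu)
    return cus - done / rate_per_cu


for cus in (4, 8, 12):
    print(cus, "CUs:", hsa_throughput(4, cus), "throughput,", idle_cus(4, cus), "idle")
```

In this toy model, going from 4 to 8 CUs doubles throughput, while going from 8 to 12 adds nothing and leaves 4 CUs idle. Where the real balance point sits depends on the workload, so the exact ratio here should not be read as a measured figure.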

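The conclusion is that GCN's original design goals were not purely about graphics (they also included HSA), which of course indirectly left GCN underperforming under the traditional APIs.
So AMD built Mantle as a technology demo to make the ACEs actually useful,
which indirectly pushed MS to come up with DX12, heading in a similar direction to Mantle, and also gave OpenGL its next-generation API, Vulkan.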

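NV got dragged into this by the API change. NV originally went with a high-efficiency execution approach, unlike AMD's fine-grained one.
NV's performance gain under DX12 may not be as large as AMD's.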