Reposted from the CHH forum
Original article: http://techreport.com/discussions.x/19216
Nvidia has long promoted its PhysX game physics middleware as an example of a computing problem that benefits greatly from GPU acceleration, and a number of games over the past couple of years have featured PhysX with GPU acceleration. Those games have often included extra physics effects that, when enabled without the benefit of GPU acceleration, slow frame rates to a crawl. With the help of an Nvidia GPU, though, those effects can usually be produced at fluid frame rates.
We have noted in the past that some games implement PhysX using only a single thread, leaving additional cores and hardware threads on today's fastest CPUs sitting idle. That's true despite the fact that physics solvers are inherently parallel and are highly multithreaded by nature when executing on a GPU.
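The "inherently parallel" point comes from the fact that many per-body updates in a physics step are independent of each other. The following is a minimal sketch, assuming a hypothetical `Particle` struct and a simple semi-implicit Euler integrator (neither is PhysX's solver or API). It splits the bodies across POSIX threads so that every core gets a slice of the work. Real solvers also resolve contacts and constraints between bodies, which needs more coordination than this; the sketch covers only the independent integration step.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical particle layout for illustration only; not a PhysX data structure. */
typedef struct {
    float x, y, z;
    float vx, vy, vz;
} Particle;

/* One thread's share of the work: the half-open index range [begin, end). */
typedef struct {
    Particle *particles;
    size_t begin, end;
    float dt;       /* time step in seconds */
    float gravity;  /* downward acceleration along -y */
} Chunk;

/* Semi-implicit (symplectic) Euler step for one slice: update velocity, then
   position using the new velocity. Each update touches only its own particle,
   so slices can run concurrently without locks. */
static void *integrate_chunk(void *arg) {
    Chunk *c = (Chunk *)arg;
    for (size_t i = c->begin; i < c->end; ++i) {
        Particle *p = &c->particles[i];
        p->vy -= c->gravity * c->dt;
        p->x += p->vx * c->dt;
        p->y += p->vy * c->dt;
        p->z += p->vz * c->dt;
    }
    return NULL;
}

#define MAX_THREADS 16

/* Split n particles into nthreads nearly equal slices and integrate them in
   parallel. If a thread cannot be created, its slice runs on the caller. */
void step_parallel(Particle *particles, size_t n, float dt, float gravity,
                   int nthreads) {
    pthread_t tids[MAX_THREADS];
    Chunk chunks[MAX_THREADS];
    int launched[MAX_THREADS];

    assert(particles != NULL || n == 0);
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    for (int t = 0; t < nthreads; ++t) {
        chunks[t].particles = particles;
        chunks[t].begin = n * (size_t)t / (size_t)nthreads;
        chunks[t].end = n * (size_t)(t + 1) / (size_t)nthreads;
        chunks[t].dt = dt;
        chunks[t].gravity = gravity;
        launched[t] = pthread_create(&tids[t], NULL, integrate_chunk,
                                     &chunks[t]) == 0;
        if (!launched[t]) integrate_chunk(&chunks[t]);
    }
    /* Wait for every worker so the whole step is finished before returning. */
    for (int t = 0; t < nthreads; ++t) {
        if (launched[t]) pthread_join(tids[t], NULL);
    }
}
```

Setting `nthreads` to the number of available cores spreads the step across the whole CPU instead of leaving all but one core idle. Build with `-pthread` on GCC or Clang.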
Now, David Kanter at RealWorld Technologies has added a new twist to the story by analyzing the execution of several PhysX games using Intel's VTune profiling tool. Kanter discovered that when GPU acceleration is disabled and PhysX calculations are being handled by the CPU, the vast majority of the code being executed uses x87 floating-point math instructions rather than SSE. Here's Kanter's summation of the problem with that fact:
x87 has been deprecated for many years now, with Intel and AMD recommending the much faster SSE instructions for the last 5 years. On modern CPUs, code using SSE instructions can easily run 1.5-2X faster than similar code using x87. By using x87, PhysX diminishes the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
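Part of SSE's advantage is that one packed instruction operates on four single-precision floats at once, whereas x87 works on one value at a time through an eight-entry register stack. The following is a minimal sketch, assuming a flat array of positions and velocities and using hypothetical function names (this is not PhysX code). It contrasts a scalar loop with a packed SSE loop built from `_mm_mul_ps` and `_mm_add_ps`. On non-x86 targets the SSE path compiles away and the scalar tail handles everything.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Scalar reference: one multiply and one add per element. With GCC on 32-bit
   x86 and its default -mfpmath=387, this loop is compiled to x87 instructions. */
void advance_scalar(float *pos, const float *vel, size_t n, float dt) {
    assert(n == 0 || (pos != NULL && vel != NULL));
    for (size_t i = 0; i < n; ++i) pos[i] += vel[i] * dt;
}

/* Packed SSE version: each _mm_mul_ps / _mm_add_ps handles four floats.
   A scalar tail loop finishes the last n % 4 elements. */
void advance_sse(float *pos, const float *vel, size_t n, float dt) {
    assert(n == 0 || (pos != NULL && vel != NULL));
    size_t i = 0;
#if HAVE_SSE
    const __m128 vdt = _mm_set1_ps(dt);       /* dt broadcast to all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);     /* unaligned loads are fine here */
        __m128 v = _mm_loadu_ps(vel + i);
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));
        _mm_storeu_ps(pos + i, p);
    }
#endif
    /* Remainder, or the whole array when SSE is unavailable. */
    for (; i < n; ++i) pos[i] += vel[i] * dt;
}
```

Scalar code benefits from SSE as well. With GCC on 32-bit x86, scalar float math defaults to x87 (`-mfpmath=387`), and compiling with `-msse2 -mfpmath=sse` switches it to scalar SSE; x86-64 compilers use SSE by default. Actual speedups depend on the workload. The 1.5-2X figure is Kanter's estimate and is not something this sketch measures.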
Kanter notes that there's no technical reason not to use SSE on the PC—no need for additional mathematical precision, no justifiable requirement for x87 backward compatibility among remotely modern CPUs, no apparent technical barrier whatsoever. In fact, as he points out, Nvidia has PhysX layers that run on game consoles using the PowerPC's AltiVec instructions, which are very similar to SSE. Kanter even expects using SSE would ease development: "In the case of PhysX on the CPU, there are no significant extra costs (and frankly supporting SSE is easier than x87 anyway)."
So even single-threaded PhysX code could be roughly twice as fast as it is with very little extra effort.
Between the lack of multithreading and the predominance of x87 instructions, the PC version of Nvidia's PhysX middleware would seem to be, at best, extremely poorly optimized, and at worst, made slow through willful neglect. Nvidia, of course, is free to engage in such neglect, but there are consequences to be paid for doing so. Here's how Kanter sums it up:
The bottom line is that Nvidia is free to hobble PhysX on the CPU by using single threaded x87 code if they wish. That choice, however, does not benefit developers or consumers though, and casts substantial doubts on the purported performance advantages of running PhysX on a GPU, rather than a CPU.
Indeed. The PhysX logo is intended as a selling point for games taking full advantage of Nvidia hardware, but it now may take on a stronger meaning: intentionally slow on everything else.
----------------------------

Is this the reason NV keeps taking shots at Intel?