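Is there any standard approach to optimizing with async compute? It seems it still depends on the architecture...
I've read the article below two or three times and still have some questions.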
Pascal Dynamic scheduling
http://www.anandtech.com/show/10325...dition-review/9
(Ryan Smith is the editor-in-chief of AnandTech, which has quite a few good hardware analysis articles.)
Ryan Smith sounds skeptical: optimizing with dynamic scheduling doesn't seem to be that easy, although Pascal does provide the flexibility. Can anyone add to this?
Dynamic scheduling requires a greater management of hazards that simply weren’t an issue with static scheduling, as now you need to handle everything involved with suddenly switching an SM to a different queue. Meanwhile NVIDIA more than likely paid a die space penalty for implementing dynamic scheduling. GPUs continually sit on the fence between being an ultra-fast staticly scheduled array of ALUs and an ultra-flexible somewhat smaller array of ALUs, and GPU vendors get to sit in the middle trying to figure out which side to lean towards in order to deliver the best performance for workloads that are 2-5 years down the line. It is, if you’ll pardon the pun, a careful balancing act for everyone involved.
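To make the trade-off in the quote more concrete for myself, here is a rough toy model in Python. It is only a sketch under my own assumptions, not how Pascal actually schedules work: the SMs are split between a graphics queue and a compute queue, each SM does one unit of work per unit of time, and in the dynamic case the SMs that go idle pay a made-up `switch_cost` before they can help with the other queue. The function names and all the numbers are hypothetical.

```python
def static_finish(work_gfx, work_cmp, sms_gfx, sms_total):
    """Fixed partition: SMs that finish early stay idle until the other queue is done."""
    sms_cmp = sms_total - sms_gfx
    return max(work_gfx / sms_gfx, work_cmp / sms_cmp)


def dynamic_finish(work_gfx, work_cmp, sms_gfx, sms_total, switch_cost):
    """Same initial split, but idle SMs move to the other queue after switch_cost.

    switch_cost is a hypothetical stand-in for the hazard handling the
    quote mentions; it is not a measured hardware number.
    """
    sms_cmp = sms_total - sms_gfx
    t_gfx = work_gfx / sms_gfx
    t_cmp = work_cmp / sms_cmp
    t_first = min(t_gfx, t_cmp)  # moment the lighter queue drains

    # SMs still busy on the slower queue, and the work left on it at t_first.
    if t_gfx <= t_cmp:
        busy, remaining = sms_cmp, work_cmp - sms_cmp * t_first
    else:
        busy, remaining = sms_gfx, work_gfx - sms_gfx * t_first

    # While the idle SMs are switching, only the busy SMs make progress.
    done_during_switch = busy * switch_cost
    if remaining <= done_during_switch:
        return t_first + remaining / busy
    # After the switch, every SM works on the remaining queue.
    return t_first + switch_cost + (remaining - done_during_switch) / sms_total


if __name__ == "__main__":
    # 20 SMs split 10/10, with compute three times heavier than graphics.
    print("static :", static_finish(100, 300, 10, 20))
    print("dynamic:", dynamic_finish(100, 300, 10, 20, switch_cost=1.0))
```

In this toy the dynamic version only wins when the initial split is unbalanced, and the gain shrinks as `switch_cost` grows, which is roughly how I read the "balancing act" in the quote.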