GasVosky's Blog (芭樂格): Trying out icc 11

Monday, April 20, 2009


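One note up front: before installing, make sure that nothing in the PATH environment variable contains parentheses "()". (That is what I got for carelessly clicking Next all the way through the DirectX SDK installer a while back..)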

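There are two test programs. One is an implementation of the object detection method from the Paul Viola and Michael Jones paper "Rapid Object Detection using a Boosted Cascade of Simple Features" (CVPR 2001). The other is the integral image method used inside that detector, pulled out on its own so it can be benchmarked separately.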
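For readers unfamiliar with the integral image (summed-area table), here is a minimal C++ sketch of the idea. It is not the code that was benchmarked: the names `IntegralImage`, `buildIntegral` and `rectSum` are made up for illustration, and the extra zero row and column are a convenience of this sketch only. The point is that once the table is built, the sum over any rectangle costs four lookups, which is what makes the rectangle features in the detector cheap to evaluate.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: a summed-area table padded with one zero row and column.
struct IntegralImage {
    int width;                 // source image width
    int height;                // source image height
    std::vector<double> sum;   // (width + 1) * (height + 1) entries

    // at(x, y) is the total of all source pixels in [0, x) x [0, y).
    double at(int x, int y) const {
        return sum[static_cast<std::size_t>(y) * (width + 1) + x];
    }
};

// Build the table with the usual recurrence:
// S(x, y) = pixel + S(x-1, y) + S(x, y-1) - S(x-1, y-1)
IntegralImage buildIntegral(const std::vector<float>& pixels, int width, int height) {
    IntegralImage ii{width, height,
                     std::vector<double>(static_cast<std::size_t>(width + 1) * (height + 1), 0.0)};
    const int stride = width + 1;
    for (int y = 1; y <= height; ++y) {
        for (int x = 1; x <= width; ++x) {
            ii.sum[y * stride + x] = pixels[(y - 1) * width + (x - 1)]
                                   + ii.sum[y * stride + (x - 1)]
                                   + ii.sum[(y - 1) * stride + x]
                                   - ii.sum[(y - 1) * stride + (x - 1)];
        }
    }
    return ii;
}

// Sum of the pixels in the rectangle [x0, x1) x [y0, y1), using four lookups.
double rectSum(const IntegralImage& ii, int x0, int y0, int x1, int y1) {
    return ii.at(x1, y1) - ii.at(x0, y1) - ii.at(x1, y0) + ii.at(x0, y0);
}
```

For a 3 x 2 image with rows `1 2 3` and `4 5 6`, `rectSum(ii, 1, 0, 3, 1)` adds up the pixels 2 and 3.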

Options that are the same (or close) between icc and vc9:

/O3 (/Ox on vc9)

/Ob2 (Any Suitable)

/Ot (Favor Fast Code)

/Oy (Omit Frame Pointers)

Extra options on the icc side:

/Og (Global optimizations)

/Qipo (Interprocedural Optimization)

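/GA (Optimize for Windows Application; not Galaxy Angel..)

/Qfp-speculation:fast (Floating-point speculation)

/QaxSSE2 (Intel Core and NetBurst microarchitectures w/ SSE2)

(AMD Barcelona can go as high as /QaxSSE3, but since this is a comparison against vc9, only SSE2 is used here.)

/QxSSE? cannot be used: it first checks whether the CPU is one of Intel's own products, and if it is not, the code compiles but will not run... heh heh...

/Qprof???? (Profile Guided Optimization) has to be turned off, otherwise the run never finishes...

By the way, both VS projects contain brook+ files, and they went through all the same.... :P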

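First up is the second program, i.e. the speed of the integral image computation, with and without SSE. Run 1 uses ordinary floating-point code; run 2 uses the SSE2 instruction set. The test machine is an AMD Phenom 9500 (Agena, which is really just Barcelona) at 2.2 GHz, full speed, and the test image is the big ORL face picture posted further down.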

vc9:

bitch 1 fucking time 266.000000 mS

bitch 2 fucking time 78.000000 mS

icc11:

bitch 1 fucking time 178.000000 mS

bitch 2 fucking time 69.000000 mS

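The plain floating-point run gets quite a bit faster with icc (266 ms down to 178 ms, about 33% less time), while the SSE run improves only a little (78 ms down to 69 ms, about 12%).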
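The post does not show how these times were taken. As a rough sketch of one way to collect millisecond timings like the ones above, assuming a C++11 compiler and the standard `<chrono>` clock (which is not what a 2009 vc9 build would have used), something like the following would do. `elapsedMs` and `report` are hypothetical names, and the printed label is illustrative only.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Print one result line for a numbered run.
void report(int run, double ms) {
    std::printf("run %d time %f ms\n", run, ms);
}
```

A call such as `report(1, elapsedMs([&] { /* integral image pass here */ }));` times a single pass. For numbers as small as these it is usually worth repeating the pass several times and keeping the best or the median.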

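Next is the first program. The object detector looks for human faces, and the test image is all 400 face images from the ORL face database composited into one big 4096 x 4096 picture:

Unlike the integral image benchmark above, this one has OpenMP support, and all four CPU cores sit at 100% load while it runs.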

vc9: 16.75 seconds

icc11: 18.078 seconds.
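The detector's OpenMP code is not shown in the post, so the following is only a sketch of the general pattern, not the author's loop. It assumes the work splits into independent rows; `countBrightRows` is a hypothetical stand-in for whatever is evaluated per row or per window. OpenMP has to be switched on at build time (`/openmp` for vc9, `/Qopenmp` for icc, `-fopenmp` for gcc); without it the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-row detector work:
// count the rows whose pixel sum exceeds a threshold.
int countBrightRows(const std::vector<float>& image, int width, int height, float threshold) {
    int hits = 0;
    // Rows are independent, so iterations can be spread across cores.
    // The reduction clause gives each thread a private counter and
    // adds the counters together at the end.
    #pragma omp parallel for reduction(+:hits)
    for (int y = 0; y < height; ++y) {
        float rowSum = 0.0f;
        for (int x = 0; x < width; ++x) {
            rowSum += image[static_cast<std::size_t>(y) * width + x];
        }
        if (rowSum > threshold) {
            ++hits;
        }
    }
    return hits;
}
```

The loop variable is a signed `int` on purpose: the OpenMP 2.0 implementation in vc9 does not accept unsigned loop indices.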

That said, this program leans on pointers very heavily. Here is a short look at how the two pieces of code differ...

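1. (SSE bitchmark)

{
    int nT = 0;      // index into the row above (top)
    int nB = width;  // index into the current row (bottom)
    // Assumption: the first row and first column already hold running sums.
    for(int y=height-1; y>0; --y, ++nT, ++nB)
    {
        for(int x=width-1; x>0; --x, ++nT, ++nB)
        {
            buf[nB+1] += buf[nB] + buf[nT+1] - buf[nT];
        }
    }
}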

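2. (AdaBoost Object Detection)

{
    float *pdS, *pdT, *pdU, *pdV, *pdW;
    pdT = buf + width + 1;   // current element
    pdV = buf + width;       // left neighbor
    pdU = buf + 1;           // element above
    pdS = buf;               // upper-left neighbor
    pdW = pdV + width - 1;   // where pdV stops in the current row
    // zoe is defined elsewhere; presumably it points to the end of buf.
    for(; pdT<zoe; pdT++, pdU++, pdS++, pdV++, pdW+=width)
    {
        for(; pdV<pdW; pdT++, pdU++, pdV++, pdS++)
        {
            *pdT += *pdU + *pdV - *pdS;
        }
    }
}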
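To check that the two traversal styles really compute the same thing, here is a self-contained sketch that wraps each one in a function. The function names are hypothetical. Both assume, as the snippets appear to, that the first row and first column of `buf` already hold running sums. The pointer version is written in the spirit of snippet 2 but recomputes its row pointers on every row, so no pointer ever moves further than one past the end of the buffer.

```cpp
#include <cstddef>
#include <vector>

// Index-based interior pass, following snippet 1.
void integrateIndexed(std::vector<float>& buf, int width, int height) {
    int nT = 0;      // index into the row above
    int nB = width;  // index into the current row
    for (int y = height - 1; y > 0; --y, ++nT, ++nB) {
        for (int x = width - 1; x > 0; --x, ++nT, ++nB) {
            buf[nB + 1] += buf[nB] + buf[nT + 1] - buf[nT];
        }
    }
}

// Pointer-walking interior pass, in the spirit of snippet 2.
void integratePointer(std::vector<float>& buf, int width, int height) {
    float* base = buf.data();
    for (int y = 1; y < height; ++y) {
        float* pdS = base + static_cast<std::size_t>(y - 1) * width;  // upper-left
        float* pdU = pdS + 1;                                         // above
        float* pdV = pdS + width;                                     // left
        float* pdT = pdV + 1;                                         // current
        float* rowEnd = pdV + width - 1;  // last position of pdV in this row
        for (; pdV < rowEnd; ++pdS, ++pdU, ++pdV, ++pdT) {
            *pdT += *pdU + *pdV - *pdS;
        }
    }
}
```

Running both on the same buffer and comparing the results element by element is a cheap sanity check before timing them against each other. Small integer-valued test data keeps the float sums exact, so the comparison can use plain equality.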

So far, the optimization champion for this program is still the Cygwin build of gcc 3.4.6...

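After about a month of digging through the brook+ sample code and tuning my old code, my takeaway is: if you can avoid pointers, avoid them. On newer compilers they only make things run slower -_-|||b (brook+ does not let you use pointers at all..)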
