정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

programing

정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

prostudy 2022. 4. 16. 09:42

정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

참고: 중복된 것으로 추정되는 질문은 대부분 "<"와 ">" 비교와 관련이 있지만 "=" 비교는 관련이 없으며 따라서 "=" 연산자의 성능에 대한 내 질문에 대답하지 않는다.

오랫동안 나는 정렬된 배열은 정렬되지 않은 배열보다 "처리"가 더 빨라야 한다고 믿어 왔다.처음에는 "=="를 정렬된 배열에서 사용하는 것이 가지 예측이 어떻게 작용하는지 알기 때문에 정렬되지 않은 배열에서 사용하는 것보다 더 빨라야 한다고 생각했다.

정렬되지 않은 상태:

5 == 100 F
43 == 100 F
100 == 100 T
250 == 100 F
6 == 100 F
(other elements to check)

정렬된 배열:

5 == 100 F
6 == 100 F
43 == 100 F
100 == 100 T
(no need to check other elements, so all are F)

그래서 SOREDARY가 UNIGREDARY보다 빨라야 할 것 같은데, 오늘은 코드로 테스트할 헤더에 어레이 2개를 생성했는데, 분기 예측이 생각대로 되지 않는 것 같았다.

테스트할 정렬되지 않은 배열과 정렬된 배열을 생성했다.

srand(time(NULL));
int UNSORTEDARRAY[524288];
int SORTEDARRAY[sizeof(UNSORTEDARRAY)/sizeof(int)];
for(int i=0;i<sizeof(SORTEDARRAY)/sizeof(int);i++){
    SORTEDARRAY[i]=UNSORTEDARRAY[i]=rand();
}
sort(SORTEDARRAY,SORTEDARRAY+sizeof(SORTEDARRAY)/sizeof(int));
string u="const int UNSORTEDARRAY[]={";
string s="const int SORTEDARRAY[]={";
for(int i=0;i<sizeof(UNSORTEDARRAY)/sizeof(int);i++){
    u+=to_string(UNSORTEDARRAY[i])+",";
    s+=to_string(SORTEDARRAY[i])+",";
}
u.erase(u.end()-1);
s.erase(s.end()-1);
u+="};\n";
s+="};\n";
ofstream out("number.h");
string code=u+s;
out << code;
out.close();

따라서 테스트하려면 다음과 같이 값이 == RAND_MAX/2인지 계산하십시오.

#include "number.h"
int main(){
int count;
    clock_t start = clock();
    for(int i=0;i<sizeof(SORTEDARRAY)/sizeof(int);i++){
        if(SORTEDARRAY[i]==RAND_MAX/2){
            count++;
        }
    }
    printf("%f\n",(float)(clock()-start)/CLOCKS_PER_SEC);
}

3회 실행:

정렬되지 않음

0.005376
0.005239
0.005220

정렬된 배열

0.005334
0.005120
0.005223

작은 성능 차이인 것 같아 믿지 못하고 "SOREDARR[i]=RAND_MAX/2"를 "SOREDARR[i]"로 바꾸려고 했다.차이를 만들었는지 확인하기 위한 RAND_MAX/2":

정렬되지 않음

0.008407
0.008363
0.008606

정렬된 배열

0.005306
0.005227
0.005146

이번에는 큰 차이가 있다.

정렬된 배열의 "=="가 정렬되지 않은 배열보다 빠르지 않은가?만일 그렇다면, 정렬된 배열의 ">은 정렬되지 않은 배열보다 빠르지만 "=="는 그렇지 않은 이유는 무엇인가?

당장 떠오르는 것은 CPU의 분기 예측 알고리즘이다.

의 경우>비교, 정렬된 배열에서 분기 동작은 매우 일관적이다: 첫째,if조건은 항상 거짓이다. 그러면 그것은 한결같이 진실이다.이것은 심지어 가장 간단한 가지 예측과도 매우 잘 맞는다.

정렬되지 않은 배열에서>조건은 본질적으로 무작위적이기 때문에 모든 가지 예측을 좌절시킨다.

이것이 정렬된 버전을 더 빨리 만드는 것이다.

의 경우==비교, 대부분의 경우 그 조건은 거짓이며 매우 드물게 사실이다.이는 배열의 정렬 여부와 상관없이 분기 예측과 함께 잘 작동한다.그 시간들은 본질적으로 같다.

^{N.B. 댓글을 달기에는 너무 길어서 대답하는 거야.}

여기서의 효과는 바로 이 질문에 대한 풍부한 대답에서 이미 아주 상세하게 설명되어 있는 것이다.그 경우에는 분기 예측 때문에 정렬된 배열 처리가 더 빨랐다.

여기서 범인은 또 다시 가지 예측이다.그==시험은 거의 사실이 아니므로 가지 예측은 두 가지 모두에 대해 거의 같은 효과를 발휘한다.로 변경했을 때>그리고 나서 당신은 그 질문에서, 그리고 같은 이유로 그 행동을 설명하게 되었다.

도덕:

나는 정렬된 배열의 "처리"가 [ ]정렬되지 않은 배열보다 빨라야 한다고 생각한다.

왜 그런지 알아야 해.이것은 어떤 마법의 법칙도 아니고, 항상 진실도 아니다.

비교==주문과 관련하여 다음보다 작음>하게 예측하는 것. 정확하거나 부정확한 예측==중복된 값의 수와 분포에만 의존할 수 있다.

사용할 수 있다perf stat성능 카운터를 보려면...

jason@io /tmp $ lz4 -d ints | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5226.932577      task-clock (msec)         #    0.953 CPUs utilized
                31      context-switches          #    0.006 K/sec
                24      cpu-migrations            #    0.005 K/sec
             3,479      page-faults               #    0.666 K/sec
    15,763,486,767      cycles                    #    3.016 GHz
     4,238,973,549      stalled-cycles-frontend   #   26.89% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,522,072,416      instructions              #    2.00  insns per cycle
                                                  #    0.13  stalled cycles per insn
     8,515,545,178      branches                  # 1629.167 M/sec
        10,261,743      branch-misses             #    0.12% of all branches

       5.483071045 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5536.031410      task-clock (msec)         #    0.348 CPUs utilized
               198      context-switches          #    0.036 K/sec
                21      cpu-migrations            #    0.004 K/sec
             3,604      page-faults               #    0.651 K/sec
    16,870,541,124      cycles                    #    3.047 GHz
     5,300,218,855      stalled-cycles-frontend   #   31.42% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,526,006,118      instructions              #    1.87  insns per cycle
                                                  #    0.17  stalled cycles per insn
     8,516,336,829      branches                  # 1538.347 M/sec
        10,980,571      branch-misses             #    0.13% of all branches

jason@io /tmp $ lz4 -d ints | perf stat ./proc-gt >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-gt':

       5293.065703      task-clock (msec)         #    0.957 CPUs utilized
                38      context-switches          #    0.007 K/sec
                50      cpu-migrations            #    0.009 K/sec
             3,466      page-faults               #    0.655 K/sec
    15,972,451,322      cycles                    #    3.018 GHz
     4,350,726,606      stalled-cycles-frontend   #   27.24% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,537,365,299      instructions              #    1.97  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,515,606,640      branches                  # 1608.823 M/sec
        15,241,198      branch-misses             #    0.18% of all branches

       5.532285374 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-gt >/dev/null

      15.930144154 seconds time elapsed

 Performance counter stats for './proc-gt':

       5203.873321      task-clock (msec)         #    0.339 CPUs utilized
                 7      context-switches          #    0.001 K/sec
                22      cpu-migrations            #    0.004 K/sec
             3,459      page-faults               #    0.665 K/sec
    15,830,273,846      cycles                    #    3.042 GHz
     4,456,369,958      stalled-cycles-frontend   #   28.15% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,540,409,224      instructions              #    1.99  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,516,186,042      branches                  # 1636.509 M/sec
        10,205,058      branch-misses             #    0.12% of all branches

      15.365528326 seconds time elapsed

참조URL: https://stackoverflow.com/questions/32063235/is-in-sorted-array-not-faster-than-unsorted-array

'programing' 카테고리의 다른 글

활성 항목의 스타일링(vue.js) (0)	2022.04.16
메이븐 공예품이란 무엇인가? (0)	2022.04.16
포인터 붕괴에 대한 배열이란? (0)	2022.04.16
16진수에 대한 선행 0(C) 인쇄 (0)	2022.04.16
Vue JS 제공/주체 및 소품 (0)	2022.04.16

현재글정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

각종 프로그래밍 정보를 다루는 블로그입니다.

python-3, react-router, TypeScript, vuejs2, vue-component, 공연, 전시, VUE, react-native, react-redux, python-2, Python, C, java, vuex, react-hooks, Reactjs, vuetify, Vue-Router, Rxjs,

Today :
Yesterday :

prostudy

정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

'programing' 카테고리의 다른 글

'programing'의 다른글

티스토리툴바

« 2025/02 »
일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

정렬된 배열의 "=="는 정렬되지 않은 배열보다 빠르지 않은가?

'programing' 카테고리의 다른 글

'programing'의 다른글

관련글

티스토리툴바