Why is IPC lower than one on a modern processor?

    7703.572978 task-clock (msec)         #    0.996 CPUs utilized          
          1,575 context-switches          #    0.204 K/sec                  
             18 cpu-migrations            #    0.002 K/sec                  
         65,975 page-faults               #    0.009 M/sec                  
 25,719,058,036 cycles                    #    3.340 GHz                    
<not supported> stalled-cycles-frontend 
<not supported> stalled-cycles-backend  
 12,323,855,909 instructions              #    0.48  insns per cycle        
  2,337,484,352 branches                  #  303.429 M/sec                  
    200,227,908 branch-misses             #    8.57% of all branches        
  3,167,237,318 L1-dcache-loads           #  411.139 M/sec                  
    454,416,650 L1-dcache-load-misses     #   14.35% of all L1-dcache hits  
    326,345,389 LLC-loads                 #   42.363 M/sec                  
<not supported> LLC-load-misses:HG      

I profiled my C code, which uses libCCC, with perf stat. It sorts a doubly linked list, which involves a lot of list traversal, so it probably requests data from many different memory addresses. However, modern processors support multi-stage pipelining, branch prediction, and out-of-order execution, all of which should increase the average number of instructions executed per unit of time. Yet according to the profile, only about one instruction is processed every two cycles. What could cause this?
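
For reference, the ratios quoted above can be recomputed from the raw counters. The following is a minimal sketch: the constants are copied from the perf stat output, while the helper names (`ipc`, `per_kilo_instr`, `print_derived_metrics`) are made up for illustration and are not part of perf or libCCC.

```c
#include <assert.h>
#include <stdio.h>

/* Counter values copied from the perf stat output in the question. */
static const double CYCLES         = 25719058036.0;
static const double INSTRUCTIONS   = 12323855909.0;
static const double L1_LOADS       = 3167237318.0;
static const double L1_LOAD_MISSES = 454416650.0;

/* Instructions retired per cycle (hypothetical helper). */
double ipc(double instructions, double cycles)
{
    assert(cycles > 0.0);
    return instructions / cycles;
}

/* Number of events per 1000 instructions (hypothetical helper). */
double per_kilo_instr(double events, double instructions)
{
    assert(instructions > 0.0);
    return 1000.0 * events / instructions;
}

/* Print the derived ratios for the counters above. */
void print_derived_metrics(void)
{
    printf("IPC:                          %.3f\n", ipc(INSTRUCTIONS, CYCLES));
    printf("L1d load miss ratio:          %.2f%%\n",
           100.0 * L1_LOAD_MISSES / L1_LOADS);
    printf("L1d load misses per 1000 ins: %.1f\n",
           per_kilo_instr(L1_LOAD_MISSES, INSTRUCTIONS));
}
```

Calling `print_derived_metrics()` from `main` prints the ratios. Misses per 1000 instructions is often easier to reason about than the raw miss percentage, because it relates the misses directly to the instruction stream.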

caching | performance | perf | hardware | cpu | 2017-01-03 08:01

Answers ( 1 )

  1. 2017-01-03 09:01

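    Your CPU is simply waiting for memory. Traversing a linked list is pointer chasing: the address of each node comes from the previous node, so out-of-order execution cannot start the next load early, and your output shows 14.35% of L1 data-cache loads missing. It's precisely this effect that justifies Hyper-Threading: one core can work on two threads, executing instructions from one while the other thread is waiting on memory.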
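
A rough way to see the effect is to compare a traversal whose loads depend on each other with one whose addresses are known in advance. The sketch below is only an illustration under assumptions: `struct node`, `sum_list`, `sum_array`, and `build_shuffled_list` are hypothetical names, not libCCC's API, and the shuffled link order merely stands in for a list whose nodes are scattered in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type for illustration; not libCCC's list node. */
struct node {
    struct node *next;
    long value;
};

/* Each iteration must load n->next before the next node can be touched,
 * so a cache miss on one node delays every later load. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Addresses are computable up front, so loads can overlap and the
 * hardware prefetcher can run ahead. */
long sum_array(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Build a list of `count` nodes linked in shuffled order, with values
 * 0..count-1 assigned along the links. Returns the head, or NULL on
 * allocation failure. *storage receives the block the caller must free. */
struct node *build_shuffled_list(size_t count, uint64_t seed,
                                 struct node **storage)
{
    assert(count > 0);
    struct node *nodes = malloc(count * sizeof *nodes);
    size_t *order = malloc(count * sizeof *order);
    if (nodes == NULL || order == NULL) {
        free(nodes);
        free(order);
        return NULL;
    }
    for (size_t i = 0; i < count; i++)
        order[i] = i;

    /* Fisher-Yates shuffle driven by a small 64-bit LCG. */
    uint64_t state = seed;
    for (size_t i = count - 1; i > 0; i--) {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        size_t j = (size_t)((state >> 33) % (i + 1));
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }

    /* Link the nodes in shuffled order so neighbours in the list are
     * usually far apart in memory. */
    for (size_t i = 0; i < count; i++) {
        struct node *n = &nodes[order[i]];
        n->value = (long)i;
        n->next = (i + 1 < count) ? &nodes[order[i + 1]] : NULL;
    }

    struct node *head = &nodes[order[0]];
    free(order);
    *storage = nodes;
    return head;
}
```

To try it, build a list much larger than the last-level cache, then run `sum_list` and `sum_array` in separate programs under perf stat and compare the reported instructions per cycle. The list version would be expected to show the lower figure, though the size of the gap depends on the machine.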
