Abstract: In this paper, we introduce an environment for visualizing the internal activities of superscalar processors, which currently appear to be the dominant class of processors on the market.
A programmer or a compiler can produce optimized code only with a thorough understanding of the processor's internal structures. The usefulness of this environment is demonstrated for two aspects of program optimization: loop unrolling with a cold or a perfectly warmed cache, and instruction ordering. We use matrix multiplication as a representative example of signal processing code.
Keywords: Superscalar processors, signal processing, loop unrolling, data dependencies, instruction ordering.