Instruction scheduling is a code reordering transformation used to hide latencies present in modern day microprocessors. Scheduling is often critical in achieving peak performance from these processors. The designer of a compiler's instruction scheduler has many choices to make, including the scope of scheduling, the underlying scheduling algorithm, and handling interactions between scheduling and other transformations.
List scheduling algorithms, and variants thereof, have been the dominate algorithms used by instruction schedulers for years. In this work we explore the strengths and weaknesses of this algorithm aided by the use of stochastic scheduling techniques. These new techniques we call RBF (randomized backward and forward scheduling) and iterative repair or IR. We examine how the algorithms perform in a variety of contexts, including different scheduling scope, different scheduling problem instances, different architectural features, and scheduling in the presence of register allocation.
IR is a search framework enjoying a lot of attention in the artificial intelligence community. IR scheduling techniques have shown promise on other scheduling problems such as shuttle mission scheduling. In this work we describe how to target the framework for compiler instruction scheduling. We describe the evolution of our algorithm, how to integrate register pressure concerns, and the technique's performance.
We evaluate our alternative algorithms based on a set of real applications and random instruction scheduling problems. Not surprisingly, list scheduling performs very well when scheduling basic blocks of machine instructions. However, there is some opportunity for alternative techniques when scheduling over larger scopes and targeting more complicated architectures. We describe an interesting link between list scheduling efficacy and the amount of parallelism available in a random problem instance. Increasingly we see complex microprocessors being used in embedded systems applications. Often the price of such systems is affected by the amount of on-board memory needed to store executable code for the microprocessor. Much work on instruction scheduling has improved the running time of scheduled code while sacrificing code size. Thus, finding scheduling techniques that do not increase code size is another focus of this work. We also develop techniques to decrease code size without increasing running time using genetic algorithms-another stochastic search method.