This paper presents a technique that exploits the capabilities of superscalar processors to optimize the execution of nested loop structures. By constructing new global and local execution schedules, the technique removes the linear dependencies inherent in the regular execution of the loop and increases the degree of parallelism. New compiler constructs allow instructions to be executed according to the directions of the new schedules.