ECE369

“toupper”: converts any lowercase characters (with ASCII codes between 97 and 122) in the null-terminated argument string to uppercase. Assume that this function is called with a string that contains exactly 100 lowercase letters, followed by the null terminator.

toupper:
    lb   $t2, 0($a0)
    beq  $t2, $0, exit     # Stop at end of string
    blt  $t2, 97, next     # Not lowercase
    bgt  $t2, 122, next    # Not lowercase
    sub  $t2, $t2, 32      # Convert to uppercase
    sb   $t2, 0($a0)       # Store back in string
next:
    addi $a0, $a0, 1
    j    toupper
exit:
    jr   $ra

(a) How many instructions would be executed for this function?

(b) Assume that we implement a single-cycle processor with a cycle time of 8 ns. How much time would be needed to execute the function?

(c) Assume a 5-stage pipeline in which each stage takes one clock cycle, the register file can be read and written in the same cycle, forwarding is done whenever possible, and stalls are inserted otherwise. Branches are resolved in the ID stage and are predicted correctly 90% of the time. Jump instructions are fully pipelined, so no stalls or flushes are needed for them. How many total cycles are needed for “toupper” under these assumptions?

(d) If the cycle time of the pipelined machine is 2 ns, how would its performance compare to that of the single-cycle processor from Part (b)?

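Solution (a) The loop is executed 100 times, and each iteration runs all 8 instructions from “lb” through “j”, for 800 instructions. Three more instructions (“lb”, “beq”, and “jr”) are executed to process the final null character. That’s 803 instructions!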
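The count can be checked with a short trace. The following is only a sketch in Python, not part of the original slides: the function name `count_toupper_instructions` is hypothetical, and it assumes, as the solution does, that every line of the listing counts as exactly one executed instruction.

```python
def count_toupper_instructions(s):
    """Trace the toupper loop over a null-terminated byte string.

    Returns (instructions_executed, resulting_bytes). Each line of the
    assembly listing is counted as one instruction (an assumption that
    matches the slide's solution).
    """
    buf = bytearray(s)
    i = 0
    executed = 0
    while True:
        c = buf[i]
        executed += 1              # lb   $t2, 0($a0)
        executed += 1              # beq  $t2, $0, exit
        if c == 0:
            executed += 1          # jr   $ra
            return executed, bytes(buf)
        executed += 1              # blt  $t2, 97, next
        if c >= 97:
            executed += 1          # bgt  $t2, 122, next
            if c <= 122:
                buf[i] = c - 32
                executed += 2      # sub, sb
        executed += 2              # addi, j
        i += 1


count, result = count_toupper_instructions(b"a" * 100 + b"\0")
print(count)
```

For a string with uppercase or non-letter characters the count would be lower, since “blt” or “bgt” would skip some of the instructions in those iterations.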

Solution (c) The “base” number of cycles needed, assuming no stalls or flushes, would be four cycles to fill the pipeline and 803 more cycles to complete the function, for a total of 807 cycles.

However, the given code also includes 301 branches (three for each of the 100 loop iterations, and one for the end of the string). If 10% of these are mispredicted and each misprediction requires one instruction to be flushed, there will be a total of 0.10 x 301 x 1 = 30.1, or about 30, wasted cycles for flushes.

There is also a hazard between the “lb” instruction and the “beq” right after it. Even with forwarding done whenever possible and registers written and read in the same cycle, we still need to insert a two-cycle stall between these instructions, since branches are resolved in the ID stage and the loaded value is not available until the end of the MEM stage. The “lb/beq” sequence is executed 101 times, so we need a total of 202 stall cycles.

The total number of cycles would then be 807 + 30 + 202 = 1039.
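The cycle count above can be written out as a small calculation. This is a sketch under the slide's stated assumptions. The function name `pipelined_cycles` and its parameter names are illustrative and not from the original. The flush count is rounded down to whole cycles, as the solution does.

```python
def pipelined_cycles(instructions=803, branches=301, mispredict_rate=0.10,
                     flush_penalty=1, load_branch_pairs=101,
                     stall_per_pair=2, fill_cycles=4):
    """Return (base, flush, stall, total) cycle counts for the toupper run."""
    # Ideal pipeline: fill the 5-stage pipe (4 cycles), then one instruction per cycle.
    base = fill_cycles + instructions
    # Expected mispredictions, truncated to whole cycles (30.1 becomes 30).
    flush = int(branches * mispredict_rate * flush_penalty)
    # Every lb immediately followed by beq costs a two-cycle load-use stall.
    stall = load_branch_pairs * stall_per_pair
    return base, flush, stall, base + flush + stall


print(pipelined_cycles())
```

Changing `mispredict_rate` or `stall_per_pair` shows how sensitive the total is to each assumption; for example, the load-use stalls account for far more wasted cycles than the mispredicted branches.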

Solution (d) At 2 ns per cycle, the pipelined system will require 1039 x 2 = 2078 ns. The single-cycle processor from Part (b) requires 803 x 8 = 6424 ns, so this is a performance improvement of 6424/2078 = 3.09, or roughly three times!
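The comparison can be reproduced in a few lines. This is a sketch only: the helper name `execution_time_ns` is hypothetical, and the instruction and cycle counts are the ones derived in Solutions (a) and (c).

```python
def execution_time_ns(cycles, cycle_time_ns):
    """Total execution time is the number of cycles times the cycle time."""
    return cycles * cycle_time_ns


# Single-cycle machine: one 8 ns cycle per instruction, 803 instructions.
single_cycle_ns = execution_time_ns(803, 8)
# Pipelined machine: 1039 cycles (including stalls and flushes) at 2 ns each.
pipelined_ns = execution_time_ns(1039, 2)
speedup = single_cycle_ns / pipelined_ns

print(single_cycle_ns, pipelined_ns, round(speedup, 2))
```

The speedup falls short of the 4x ratio in cycle times because the pipelined machine spends extra cycles on pipeline fill, stalls, and flushes.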