Unaligned memory accesses

I believe real hardware handles unaligned accesses by issuing two loads or stores, if necessary, from the same instruction in the load/store queue. The pieces are combined as part of that load/store once the data comes back, and execution continues. That doesn't fit very well with m5, which assumes aligned accesses and has problems if it receives anything else. ptlsim handles this by having two versions of every memory operation, one for the low portion of an access and one for the high portion, and then gluing the results together. One problem with this method is that instead of a single memory operation "stuttering", there are two operations taking up twice as many resources. Also, if segmentation is applied at the address translation step, it is harder to predict which part of a memory operation should be done in each step, because adding a segment base can make an unaligned address aligned and an aligned address unaligned.
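To make the splitting and the segmentation problem concrete, here is a minimal Python sketch. It is not m5 or ptlsim code: the function names, the 8-byte access granularity, and the little-endian byte order are all assumptions chosen for illustration.

```python
def split_access(addr, size, block_size=8):
    """Split [addr, addr + size) into pieces that never cross a
    block_size boundary. block_size=8 is an assumed granularity."""
    pieces = []
    end = addr + size
    while addr < end:
        # First address past the aligned block that contains addr.
        block_end = (addr // block_size + 1) * block_size
        piece_end = min(end, block_end)
        pieces.append((addr, piece_end - addr))
        addr = piece_end
    return pieces


def load(memory, addr, size, block_size=8):
    """Little-endian load (assumed byte order) built by reading each
    piece separately and gluing the results together."""
    value = 0
    shift = 0
    for piece_addr, piece_size in split_access(addr, size, block_size):
        chunk = memory[piece_addr:piece_addr + piece_size]
        value |= int.from_bytes(chunk, "little") << shift
        shift += 8 * piece_size
    return value


def linear_address(segment_base, offset):
    """Hypothetical segmentation step: the split has to be computed
    on the result of this addition, not on the offset alone."""
    return segment_base + offset


memory = bytes(range(32))

print(split_access(6, 4))        # [(6, 2), (8, 2)]  crosses a boundary
print(split_access(8, 4))        # [(8, 4)]          fits in one block
print(hex(load(memory, 6, 4)))   # 0x9080706

# An aligned offset becomes unaligned once the segment base is added...
print(split_access(linear_address(0x1006, 8), 4))  # [(4110, 2), (4112, 2)]
# ...and an unaligned offset can become aligned.
print(split_access(linear_address(0x1002, 6), 4))  # [(4104, 4)]
```

The last two calls show why the low/high split is hard to decide before translation: the same offset needs one access or two depending on the segment base.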