Loading from &8000 would load the value &02FE05AA. From &8008, it would load the value &FF00E039.
What about if you loaded from &8002?

The result depends not only on the ARM processor family, but also the specific implementation. The most likely result on older (pre-Cortex) processors is the value &05AA02FE (the normal word is loaded, and the result shifted according to the value of the bottom two bits of the address. On later ARMs (Cortex), the request is broken down into non-atomic byte loads which produce the expected result (&05AA3291 - on a little endian system).

However, the situation is considerably more complicated than this.

Contents

ARM family differences

ARM v5 and earlier

The Load Register instruction performs a rotated load where the lower two bits of the address specified are used as an input the the shifter: If the address is %xxx01 the result will be ROR #8, if %xxx10 the result will be ROR #16, and if it is %xxx11 the result is ROR #24.
This can be synthesised with the following code:

Using the Load Halfword instruction (ARMv4 or later) with an address where the bottom bit is set is considered unpredictable.

Others

You will need to refer to the relevant descriptions for instructions such as Load Coprocessor or Load Double; however in general either the lower bits are ignored, or the instruction must work on correctly aligned data.
In the case of LDRD, the specified register must be even numbered, and the address must be doubleword aligned - every other permutation is unpredictable or undefined.

ARM v6

The ARM v6 family supports the ARM v7 behaviour, but incorporates a configuration option in the system control register to select between this and older-style behaviour.

ARM v7

In general, the ARM v7 will perform the logically expected operation by breaking the memory access down into multiple memory accesses in order to load the expected data. There will be time penalties in this, not only in accessing multiple memory locations, but also in the case where cache reloads are required, page boundaries are crossed, etc.

Gotchas!

Plenty.

On older ARMs (ARM v5 and earlier), when a Data Abort happened as a result of the unaligned access being faulted, it was implementation defined as to what exactly would be in the "base register". This means, if the register supplying the address has writeback enabled, it was up to the specific implementation to determine whether the register, at point of abort, would contain the original contents, or the updated contents.

The above does not happen on later ARMs (v6+) as it is specified that the base register will not have been updated if an abort is raised.

While the ARM's behaviour is (varied but) predictable, whether or not unaligned accesses 'work' or get faulted is not up to the ARM. It is up to the MMU. The standard ARM MMU has an option to trap unaligned accesses. Some ARM implementations may contain simpler MMUs which do not raise an abort for an (unsupported) unaligned memory access. You should be fairly safe if you are using an ARM SoC such as one of the OMAP family, however if you are using a simpler lower-specified ARM based microcontroller, you will need to consult the datasheet - assuming you can find out, the (Cortex-M0 based) LPC11U2X datasheet doesn't mention the word "abort", not once. Ditto the ARM9 Mini-SoC LPC2939...

Multiple memory accesses on ARM v6+ are not guaranteed atomic. This means interrupts can occur between the accesses; plus data aborts can occur with either part of the access.

Summary

In short, it is safer to just not use unaligned memory accesses. The pain of needing to know the particular cases of all the different types of ARM your code may run upon is surely greater than using multiple LDRBs for cases where data cannot be held on word aligned boundaries...