There seem to be lots of articles about the development of Diphthongierung, but none of them give reasons why it developed.

Why the Doppelvokale aa and ee are more common:

My intuition as a hobby musician is that it partly has to do with singing. A long-held aa or ee just sounds common and can fit at many points in a song, while a long-held oo, uu, or ii sounds less common/nice and is physically harder to sing. The pitch of aa and ee is nearer to the "average" pitch of the human voice. Have you ever heard babies singing/screaming a long oo, uu, or ii? We are physically not used to these pitches. Imo the Natürlichkeitstheorie mentioned above would also suggest this reasoning. German has many dialects, so a strong influence from local culture/music was likely always a major factor too. Diphthongierung was also a localized phenomenon according to the links above.