Here's the schematic of what I believe you are intending. Add a DPDT switch to disconnect D0/D1 for reliable code downloading.

For 8x8, you need 64 LEDs across, and twice as many rows. I think a Mega has enough IO for that. Or use SPI.transfer() calls into 8 shift registers for the 64 across, plus 4 more IO for the transistors, which will then need to sink 1.5A each. (Just 11 pins!)
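
A rough sketch of the shift register option, in case it helps. This is untested on hardware and makes several assumptions that are mine, not from the schematic: the 8 registers are daisy-chained 74HC595-style parts, the latch is on pin 10, the 4 transistors are on pins 4 to 7 and turn on with a HIGH, and `packRow`, `frame`, `LATCH_PIN` and `ROW_PINS` are just names made up for illustration. Adjust all of these to match your wiring.

```cpp
#include <stdint.h>

// Assumed layout: 8 daisy-chained 8-bit shift registers drive the 64 LEDs
// across, and 4 IO pins switch the transistors.
const uint8_t NUM_REGISTERS = 8;
const uint8_t NUM_ROWS = 4;

// Split a 64-bit pattern into 8 bytes, most significant byte first.
// The first byte shifted out ends up in the last register of the chain.
void packRow(uint64_t columns, uint8_t *out) {
  for (uint8_t i = 0; i < NUM_REGISTERS; i++) {
    out[i] = (uint8_t)(columns >> (8 * (NUM_REGISTERS - 1 - i)));
  }
}

#ifdef ARDUINO
#include <SPI.h>

const uint8_t LATCH_PIN = 10;                     // assumed latch pin
const uint8_t ROW_PINS[NUM_ROWS] = {4, 5, 6, 7};  // assumed transistor pins
uint64_t frame[NUM_ROWS];                         // one 64-bit pattern per row

void setup() {
  pinMode(LATCH_PIN, OUTPUT);
  for (uint8_t r = 0; r < NUM_ROWS; r++) {
    pinMode(ROW_PINS[r], OUTPUT);
    digitalWrite(ROW_PINS[r], LOW);               // all transistors off
  }
  SPI.begin();
}

void loop() {
  for (uint8_t r = 0; r < NUM_ROWS; r++) {
    uint8_t bytes[NUM_REGISTERS];
    packRow(frame[r], bytes);

    // Turn the previous row off before changing the column data.
    digitalWrite(ROW_PINS[(r + NUM_ROWS - 1) % NUM_ROWS], LOW);

    digitalWrite(LATCH_PIN, LOW);
    for (uint8_t i = 0; i < NUM_REGISTERS; i++) {
      SPI.transfer(bytes[i]);                     // 8 transfers = 64 bits
    }
    digitalWrite(LATCH_PIN, HIGH);                // outputs update together

    digitalWrite(ROW_PINS[r], HIGH);              // assumed active-high drive
    delay(2);                                     // hold time, tune to taste
  }
}
#endif
```

The packing is kept separate from the SPI calls so the byte order can be checked on a PC before touching the hardware. If the pattern comes out mirrored on the display, reverse the order in `packRow` or change the bit order of the SPI transfers.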