Our prod database is about 1 TB in size. We currently use external redundancy, and due to storage issues the database has crashed multiple times this year. We are planning to propose normal redundancy at the ASM level.

My assumptions are as follows. Please correct me if I am wrong.

1. We require 3 times the storage space, i.e. approximately 3 TB. (Not sure if it is OK to have 2 failgroups for the data disk group?)

2. The new LUNs should be provided from a different storage array, not from the existing box, and placed in a separate failgroup.

3. If an issue arises on one of the storage arrays, we should not have any problem, as the other two are fine.

But what difficulties are there in adding the disks back to the disk group after the storage issue is fixed?

(We need to discuss this with the storage team, but before that I want to clear my doubts.)

1. We require 3 times the storage space, i.e. approximately 3 TB. (Not sure if it is OK to have 2 failgroups for the data disk group?)

2. The new LUNs should be provided from a different storage array, not from the existing box, and placed in a separate failgroup.

With external redundancy you are relying on disk-level mirroring in the storage array, which also consumes extra capacity there; that mirroring is transparent to the ASM instance. So it is better to take individual (unmirrored) disks from different storage arrays and configure the disk group with normal redundancy, letting ASM do the mirroring.

3. If an issue arises on one of the storage arrays, we should not have any problem, as the other two are fine.

But what difficulties are there in adding the disks back to the disk group after the storage issue is fixed?

All of these operations can be done online; there shouldn't be any difficulty.

Refer to this MOS note:

How To Add Back An ASM Disk or Failgroup (Normal or High Redundancy) After A Transient Failure Occurred Or When The DISK_REPAIR_TIME Attribute Expired (10.1 to 12.1)? (Doc ID 946213.1)
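In outline, the procedure in that note boils down to the commands below. This is only a sketch — the disk group name (DATA), failgroup name (FG2), and disk path are hypothetical, and you should follow the note for your exact version:

```sql
-- Give yourself time to fix transient failures before ASM force-drops
-- the disks (hypothetical disk group name DATA):
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8h';

-- After the storage issue is fixed, bring the failed failgroup back online;
-- ASM resynchronizes only the extents changed during the outage:
ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP fg2;

-- If DISK_REPAIR_TIME expired and the disks were dropped, add them back
-- and let a full rebalance repopulate them:
ALTER DISKGROUP data ADD FAILGROUP fg2 DISK '/dev/mapper/asm_fg2_*'
  REBALANCE POWER 4;
```

The key difference is cost: an ONLINE within the repair window is a fast incremental resync, while re-adding dropped disks triggers a full rebalance of the failgroup's data.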

1. We require 3 times the storage space, i.e. approximately 3 TB. (Not sure if it is OK to have 2 failgroups for the data disk group?)

Not for normal redundancy. High redundancy is a 3-way "mirror"; normal is 2-way. So you need roughly 2 TB, not 3 TB.

Thus 2 failgroups for a single disk group is fine.
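For example, a normal redundancy disk group with one failgroup per storage array could be created like this (disk group name, failgroup names, and device paths are all hypothetical — substitute your own):

```sql
-- Each failgroup holds LUNs from one storage array, so ASM never
-- mirrors an extent within the same array:
CREATE DISKGROUP data_nr NORMAL REDUNDANCY
  FAILGROUP array1 DISK '/dev/mapper/array1_lun1', '/dev/mapper/array1_lun2'
  FAILGROUP array2 DISK '/dev/mapper/array2_lun1', '/dev/mapper/array2_lun2';
```

With 1 TB of data, each failgroup needs roughly 1 TB of usable capacity (plus headroom for rebalance after a disk failure), which is where the ~2 TB total comes from.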

2. The new LUNs should be provided from a different storage array, not from the existing box, and placed in a separate failgroup.

Correct. This is part of the ideal redundant solution. However, cables and the storage (fabric?) switch can also fail. So if 2 different storage arrays are used, there need to be redundant paths to those arrays. That means dual-port HBA/HCA cards per database server, wired to separate switches, with both storage arrays available on both switches.

3. If an issue arises on one of the storage arrays, we should not have any problem, as the other two are fine.

But what difficulties are there in adding the disks back to the disk group after the storage issue is fixed?

You cannot change the redundancy of an existing ASM disk group (unless this is a recent 12c feature). So you need a migration plan to move the existing external redundancy disk group's content to a new normal redundancy disk group. During this time roughly 3x the storage will be needed (1 TB for the existing disk group plus ~2 TB for the new one), before the existing external redundancy disk group's LUNs can be handed back to the storage team as free space.

The migration likely needs an RMAN approach to move the database(s) from the external redundancy disk group to the normal redundancy disk group.
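A common RMAN pattern for this is an image-copy-and-switch. The sketch below assumes the new disk group is called +DATA_NR (a hypothetical name) and only covers the datafiles — control files, spfile, redo logs, and temp files need their own moves, and the database is down during the switch:

```
SQL> ALTER SYSTEM SET db_create_file_dest = '+DATA_NR';

RMAN> BACKUP AS COPY DATABASE FORMAT '+DATA_NR';
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> SWITCH DATABASE TO COPY;
RMAN> RECOVER DATABASE;
RMAN> ALTER DATABASE OPEN;
```

The image copies are taken while the database is up, so the outage is limited to the switch and the recovery of changes made since the copy.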

Before doing this, however, some basic availability tests need to be done on the new normal redundancy disk group. A cable failure, for example, should have no impact. A switch failure should have no impact. A storage array failure should leave exactly one failgroup failed, with its disks marked as missing. Making that storage array available again, and then marking those disks as online, should work, and ASM should perform an automatic rebalance/resync to bring the missing failgroup back in sync.
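During these pull-the-plug tests, the ASM dynamic views show what actually happened. A couple of queries that are useful here (run against the ASM instance):

```sql
-- Per-failgroup disk status: expect MODE_STATUS = 'OFFLINE' for the
-- failed array's disks, 'ONLINE' for everything else:
SELECT failgroup, mode_status, state, COUNT(*) AS disks
FROM   v$asm_disk
GROUP  BY failgroup, mode_status, state;

-- Watch the resync/rebalance progress after bringing the array back:
SELECT operation, state, power, est_minutes
FROM   v$asm_operation;
```

If a single cable or switch failure changes MODE_STATUS on any disk, the multipathing is not doing its job and should be fixed before trusting the setup.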

Only after this is all working should you do the migration (database move) from the external redundancy disk group to the normal redundancy disk group — because if the redundancy does not work, why bother with the migration at all?