I'm training a 3D U-Net on an electron microscopy (EM) dataset of a brain, with the goal of segmenting the neurons in it. Across experiments, I've noticed that different random initializations of the network lead to different performance. I evaluate performance with mean Intersection over Union (IoU), and I observe differences as large as 5% between runs.
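
For concreteness, mean IoU can be sketched roughly as below. This assumes the per-class IoU is averaged over the classes present (background and neuron for a binary mask) and that volumes are flattened to label sequences. The helper name `mean_iou` is illustrative and may differ from the exact evaluation code used in these experiments.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two flat label sequences (illustrative helper)."""
    ious = []
    for c in range(num_classes):
        # Voxels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    # Assumption: return NaN if no class appears at all (e.g. empty input).
    return sum(ious) / len(ious) if ious else float("nan")
```

With this definition, a 5% gap could mean five percentage points or a 5% relative change, so it helps to state which one is being reported.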

I use Xavier initialization with a uniform distribution and a constant learning rate of 1e-4.
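
To make the source of the randomness explicit, here is a minimal standard-library sketch of Xavier uniform initialization for a single 3D convolution kernel, drawn from a seeded generator. It assumes the usual convention that fan-in and fan-out are the channel counts multiplied by the kernel volume. The function names and the `seed` parameter are hypothetical and not taken from the actual training code.

```python
import math
import random


def xavier_uniform_bound(in_channels, out_channels, kernel_size, gain=1.0):
    """Bound a of U(-a, a) for a cubic 3D conv kernel under Xavier uniform init."""
    receptive_field = kernel_size ** 3
    fan_in = in_channels * receptive_field
    fan_out = out_channels * receptive_field
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def init_conv3d_weights(in_channels, out_channels, kernel_size, seed):
    """Draw a flat list of kernel weights from U(-a, a) using a fixed seed."""
    rng = random.Random(seed)  # independent generator, so the seed fully determines the draw
    bound = xavier_uniform_bound(in_channels, out_channels, kernel_size)
    n_weights = out_channels * in_channels * kernel_size ** 3
    return [rng.uniform(-bound, bound) for _ in range(n_weights)]


# Same seed reproduces the same weights; a different seed gives a different start point.
w_a = init_conv3d_weights(1, 8, 3, seed=0)
w_b = init_conv3d_weights(1, 8, 3, seed=0)
w_c = init_conv3d_weights(1, 8, 3, seed=1)
```

Fixing the seed this way only controls the initial weights. Data shuffling, augmentation, and non-deterministic GPU kernels are separate sources of run-to-run variation.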