Information Theory - Shannon's Self-Information units

I'm familiar with information and coding theory, and do know that the units of Shannon information content (-log_2(P(A))) are "bits". Where "bit" is a "binary digit", or a "storage device that has two stable states".

But, can someone rigorously prove that the units are actually "bits"? Or we should only accept it as a definition and then justify it with coding examples.