Why is $A_{\mu}A^{\mu}$ the correct term to include? I guess that it must be gauge and Lorentz invariant, so why wasn't it included in the original Lagrangian? Why is the factor of $\frac{\mu^{2}}{8\pi}$ needed?

Why does the gauge invariance not matter, seeing as the Lagrangian for an EM field has to be gauge invariant in order to conserve charge?
– user1696811, Jan 28 '13 at 19:21

@user1696811: Because $A$ is not a "gauge" field, but a massive vector field. Conservation of "charge" needs not only equations for $A$, but also equations for the "charge".
– Vladimir Kalitvianski, Jan 28 '13 at 19:39

The mass term in any field Lagrangian is always the term that is quadratic in the fields and has the opposite sign with respect to the kinetic term. This is crucial.

Why, you ask? Well, suppose we forget about the current term for a minute (we want to look at the field $A$ on its own, no currents around, to easily identify the mass).

Working out the field equations, you will obtain something like
$$ \partial_\mu F^{\mu\nu} = \partial^2 A^\nu - \partial_\mu \partial^\nu A^\mu = -\mu^2 A^\nu.$$
Imposing the Lorenz condition, $\partial_\mu A^\mu = 0$, this simplifies even further to
$$\partial^2 A^\nu = -\mu^2 A^\nu.$$
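
For reference, here is a sketch of how the first field equation comes out of the Euler-Lagrange equations. It assumes the source-free Proca Lagrangian in Gaussian units with signature $(+,-,-,-)$; that form is my assumption, inferred from the $\frac{\mu^2}{8\pi}$ factor in the question:
$$\mathcal{L} = -\frac{1}{16\pi} F_{\mu\nu}F^{\mu\nu} + \frac{\mu^2}{8\pi} A_\mu A^\mu, \qquad F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu.$$
The two derivatives that enter are
$$\frac{\partial \mathcal{L}}{\partial(\partial_\mu A_\nu)} = -\frac{1}{4\pi} F^{\mu\nu}, \qquad \frac{\partial \mathcal{L}}{\partial A_\nu} = \frac{\mu^2}{4\pi} A^\nu,$$
so that
$$\partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu A_\nu)} - \frac{\partial \mathcal{L}}{\partial A_\nu} = 0 \quad\Longrightarrow\quad \partial_\mu F^{\mu\nu} = -\mu^2 A^\nu.$$
With this normalization the common factor $\frac{1}{4\pi}$ drops out, which is why the mass term carries $\frac{\mu^2}{8\pi}$ when the kinetic term carries $\frac{1}{16\pi}$.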

Now, there are several ways to see from this equation that $\mu$ should be interpreted as the mass of the field $A^\mu$. The most intuitive way is through the usual quantum mechanical viewpoint where the energy-momentum is represented by the derivative $\hat P^\mu = i\partial^\mu$, and hence $\hat P^2 = -\partial^2$. When acting on an energy-momentum eigenstate (i.e., a plane wave $e^{ik \cdot x}$), the equation reduces to
$$p^2 A^\mu = \mu^2 A^\mu.$$

(The signs here depend on your metric convention; the field equation above corresponds to the signature $(+,-,-,-)$.) Now, from special relativity we know that $p^2 = m^2$ in this signature, giving us the interpretation of $\mu$ as a mass.
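
As a quick numerical sanity check of the plane-wave argument, here is a minimal Python sketch. It is restricted to 1+1 dimensions with signature $(+,-)$ and $\hbar = c = 1$; the function names, sample point and finite-difference step are illustrative choices of mine, not part of the argument above. It checks that $e^{-i(\omega t - kx)}$ satisfies $\partial^2 A = -\mu^2 A$ only when $\omega^2 - k^2 = \mu^2$.

```python
import cmath
import math


def plane_wave(t, x, omega, k):
    """Plane wave exp(-i(omega*t - k*x)) in 1+1 dimensions, signature (+,-)."""
    return cmath.exp(-1j * (omega * t - k * x))


def dalembertian(f, t, x, h=1e-3):
    """Central-difference estimate of (d^2/dt^2 - d^2/dx^2) f at the point (t, x)."""
    d2t = (f(t + h, x) - 2 * f(t, x) + f(t - h, x)) / h**2
    d2x = (f(t, x + h) - 2 * f(t, x) + f(t, x - h)) / h**2
    return d2t - d2x


def residual(omega, k, mu, t=0.3, x=0.7):
    """Return |box f + mu^2 f| for the plane wave; close to zero only on the mass shell."""
    f = lambda tt, xx: plane_wave(tt, xx, omega, k)
    return abs(dalembertian(f, t, x) + mu**2 * f(t, x))


mu, k = 1.5, 2.0
on_shell = math.sqrt(k**2 + mu**2)  # omega chosen so that omega^2 - k^2 = mu^2

print(residual(on_shell, k, mu))        # tiny: only finite-difference error remains
print(residual(on_shell + 0.5, k, mu))  # clearly non-zero: off the mass shell
```

The first residual is limited only by the discretization error of the central differences, while shifting $\omega$ away from $\sqrt{k^2 + \mu^2}$ leaves a residual of order one.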

An even more exact procedure is to quantize the field and write down the Hamiltonian in terms of creation and annihilation operators. You will find that every quantum created carries an energy $\omega_{\vec k} = \sqrt{\vec k^2 + \mu^2}$ (in units with $\hbar = c = 1$), so that even at zero momentum it adds $\mu$ to the total energy of the system, i.e., the corresponding mass energy.

If you stop to think about it for a while, you'll see a similar thing will happen for any field Lagrangian having a quadratic term.

PS: Note that, if we had chosen the sign on the mass term differently, the mass would have come out imaginary, i.e. $\mu^2 < 0$, which usually signals big trouble for your field theory.
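
To make the "big trouble" concrete, here is a rough sketch, assuming signature $(+,-,-,-)$ and the condition $\partial_\mu A^\mu = 0$. With the sign of the mass term flipped, the field equation becomes
$$\partial^2 A^\nu = +\mu^2 A^\nu,$$
and a plane wave $e^{-i(\omega t - \vec k \cdot \vec x)}$ must then satisfy
$$\omega^2 = \vec k^2 - \mu^2.$$
For long wavelengths, $|\vec k| < \mu$, the frequency is imaginary, $\omega = \pm i\sqrt{\mu^2 - \vec k^2}$, so one of the two modes grows like $e^{+\sqrt{\mu^2 - \vec k^2}\, t}$ instead of oscillating: the field configuration $A = 0$ is unstable.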

The normalization is really a matter of convention: the prefactor in front of the kinetic term is chosen to reproduce Maxwell's equations with the right prefactors, which de facto determines the factor in the mass term.

Edit: Vladimir makes a good point: I forgot to point out that the Proca Lagrangian is not gauge invariant (in the Maxwell sense: you can't arbitrarily add terms à la $\partial_\mu f$ to $A_\mu$. Try it!). However, I seem to recall that you can show that the original Proca equations of motion can be split up into the joint system of equations $$\partial ^2 A^\mu = -\mu^2 A^\mu , \quad \partial_\mu A^\mu = 0.$$ (This is not trivial: by making this assumption, a priori one might be excluding more general solutions.)
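
For the source-free case with $\mu \neq 0$, the split can be checked directly; the following is a sketch under those two assumptions. Take the divergence $\partial_\nu$ of the field equation $\partial_\mu F^{\mu\nu} = -\mu^2 A^\nu$. Since $F^{\mu\nu}$ is antisymmetric and partial derivatives commute, the left-hand side vanishes:
$$\partial_\nu \partial_\mu F^{\mu\nu} = 0 \quad\Longrightarrow\quad \mu^2\, \partial_\nu A^\nu = 0 \quad\Longrightarrow\quad \partial_\nu A^\nu = 0.$$
Substituting this back into
$$\partial_\mu F^{\mu\nu} = \partial^2 A^\nu - \partial^\nu (\partial_\mu A^\mu)$$
leaves $\partial^2 A^\nu = -\mu^2 A^\nu$. So in this case the condition $\partial_\mu A^\mu = 0$ is a consequence of the field equations rather than a choice, and no solutions are lost. With a current present, the same argument goes through provided the current is conserved.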