I've been reading Shalev-Shwartz & Ben-David's book, "Understanding Machine Learning", which presents PAC theory in Part I. While the theory of PAC learnability strikes me as elegant and remarkable, I'm not sure about its implications for practical machine learning problems. What is the primary utility (or utilities) of PAC learnability and VC dimension?

Is it that PAC learnability tells us whether a hypothesis class $\mathcal H$ is well chosen, in the sense that the test error can be made small with high probability? In other words, if $\mathcal H$ is PAC learnable, are we assured that overfitting does not occur once the training set is large enough? So, is the primary value of PAC theory to provide guidance on choosing the hypothesis class and the training set size? In addition, is PAC theory remarkable because the assertions above hold without knowing the distribution over the inputs and outputs of the learning task?
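
To make the training-set-size part of the question concrete, here is a minimal sketch of how I understand the simplest case: a finite hypothesis class under the realizability assumption, where $m \ge \log(|\mathcal H|/\delta)/\epsilon$ samples suffice for ERM to reach error at most $\epsilon$ with probability at least $1-\delta$. The function name `finite_class_sample_bound` and the example values are my own, not from the book, and the log is the natural log.

```python
import math


def finite_class_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Number of samples sufficient for ERM over a finite hypothesis class
    (realizable case): ceil(ln(|H| / delta) / epsilon).

    Hypothetical helper for illustration only. Note that no data
    distribution appears among the arguments.
    """
    if h_size < 1:
        raise ValueError("h_size must be a positive integer")
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(h_size / delta) / epsilon)


if __name__ == "__main__":
    # How the sufficient sample size grows as the accuracy target tightens,
    # for an assumed class of 1000 hypotheses and confidence 1 - 0.05.
    for eps in (0.1, 0.05, 0.01):
        print(eps, finite_class_sample_bound(1000, eps, 0.05))
```

If this reading is right, the bound grows only logarithmically in $|\mathcal H|$ and $1/\delta$ but linearly in $1/\epsilon$, and it is this kind of distribution-free guarantee whose practical value I am asking about.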

I'm new to machine learning. I'd appreciate any corrections, comments, or answers that would help me see how PAC theory helps in practical machine learning problems.