1. I can't debug your proof from afar. Write the likelihood of the data explicitly (using the pdf of the normal distribution), then drop the constants that do not depend on a; what remains is the expression you are looking for.
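For illustration only (the exact model in the question isn't restated here), if the data were, say, $y_1,\dots,y_n$ with $y_i \sim \mathcal{N}(a\,x_i, \sigma^2)$, the derivation would look like this:

```latex
L(a) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left(-\frac{(y_i - a x_i)^2}{2\sigma^2}\right)

\log L(a) = \underbrace{-\frac{n}{2}\log(2\pi\sigma^2)}_{\text{constant in } a}
            \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - a x_i)^2

% Dropping everything that does not depend on a:
\hat{a} = \arg\max_a \left[-\sum_{i=1}^{n} (y_i - a x_i)^2\right]
        = \arg\min_a \sum_{i=1}^{n} (y_i - a x_i)^2
```

The same steps apply whatever the actual mean model in the question is: write the product of normal pdfs, take the log, and keep only the terms involving a.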

2. You can calculate them by simply applying the algorithms we learned in class (pick some parameters for the points). That year there was slightly more focus on this subject, which made the question easier, but it is completely solvable.

3. If I recall correctly, this is answered in another exam's solutions. Am I wrong?

I disagree about ax. I think it changes the dynamics of "how soft the margin is", similarly to changing C, because the slack variables ξ_i do not scale with the sampled points.
If I take a very large a, the resulting classifier will be almost hard-margin.
If I take a very small a, the resulting classifier will care very little about mistakes.
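A quick numerical sketch of this claim, on made-up toy data (not the exam's data), solving the plain primal objective 0.5‖w‖² + C·Σξ_i with scipy. One way to see the equivalence: if you train on a·X with penalty C, substituting v = a·w shows you solve the same problem as training on X with penalty a²·C (and w = v/a). So large a acts like large C (near hard margin), small a like small C (many violations tolerated).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Two overlapping 2-D classes (hypothetical toy data, means at -1 and +1).
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(20, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)

def svm_objective(params, X, y, C):
    """Primal soft-margin objective: 0.5*||w||^2 + C * sum of hinge losses."""
    w, b = params[:-1], params[-1]
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))  # the xi_i
    return 0.5 * w @ w + C * slack.sum()

def fit_svm(X, y, C):
    """Minimize the primal directly (only 3 parameters, so Nelder-Mead is fine)."""
    res = minimize(svm_objective, x0=np.zeros(X.shape[1] + 1),
                   args=(X, y, C), method='Nelder-Mead',
                   options={'maxiter': 20000, 'xatol': 1e-10, 'fatol': 1e-10})
    return res.x[:-1], res.x[-1]

def n_violations(w, b, X, y):
    """Points with positive slack, i.e. inside or beyond the margin."""
    return int(np.sum(y * (X @ w + b) < 1.0 - 1e-6))

# Equivalence check: training on a*X with C=1 vs training on X with C=a^2.
a = 3.0
w_scaled, b_scaled = fit_svm(a * X, y, C=1.0)
w_equiv, b_equiv = fit_svm(X, y, C=a ** 2)

# Softness: small a tolerates many violations, large a is almost hard-margin.
w_soft, b_soft = fit_svm(0.1 * X, y, C=1.0)
w_hard, b_hard = fit_svm(10.0 * X, y, C=1.0)
soft_viol = n_violations(w_soft, b_soft, 0.1 * X, y)
hard_viol = n_violations(w_hard, b_hard, 10.0 * X, y)
```

The two fitted classifiers in the equivalence check agree on (essentially all of) the training points, and hard_viol comes out no larger than soft_viol, which is exactly the "large a ≈ hard margin" behaviour described above.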