where pX,Y(x,y) gives the joint distribution of X and Y, while pX|Y(x|y) gives the conditional distribution for X given Y.

Why the name 'marginal'? One explanation is to imagine the p(x,y) in a 2D table such as a spreadsheet. The marginals are got by summing the columns (or rows) -- the column sum would then be written in the margin of the table, ie. the column at the side of the table.