Your first task is to propose a model with tuning parameters $\theta$ for generating the data $D$.

This involves specifying the likelihood $p(D|\theta)$ and a prior $p(\theta)$ for the parameters.

You choose the distribution $p(D|\theta)$ based on your physical understanding of the data-generating process.

Note that, for observations $x_n$ that are independent given $\theta$,
$$ p(D|\theta) = \prod_{n=1}^N p(x_n|\theta)$$
so usually you select a model for generating a single observation $x_n$ and then use (in-)dependence assumptions to combine these models into a model for the full data set $D$.
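
As a minimal sketch of this factorization (the Gaussian model, the known variance, and the numbers below are invented purely for illustration), independence given $\theta$ lets us compute $\log p(D|\theta)$ as a sum of per-observation terms:

```python
import numpy as np
from scipy import stats

# Invented data set D = {x_1, ..., x_N}
D = np.array([1.8, 2.1, 1.9, 2.4, 2.0])

# Assumed per-observation model p(x_n | theta): a Gaussian with
# unknown mean theta and, for simplicity, known unit variance.
theta = 2.0

# Independence given theta turns the product over n into a sum:
#   log p(D | theta) = sum_n log p(x_n | theta)
log_lik = np.sum(stats.norm.logpdf(D, loc=theta, scale=1.0))
print(log_lik)
```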

You choose the prior $p(\theta)$ to reflect what you know about the parameter values before you see the data $D$.
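
For instance (a hypothetical coin-flip setting; the Beta(5, 5) choice is just one way to encode a vague belief that the coin is close to fair), a prior can be written down directly as a distribution object:

```python
from scipy import stats

# theta = probability of heads; before seeing data we believe the
# coin is probably near fair, encoded here as a Beta(5, 5) prior.
prior = stats.beta(a=5, b=5)

print(prior.mean())          # 0.5: fair on average
print(prior.interval(0.95))  # roughly (0.21, 0.79): still quite uncertain
```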

Again, there is no need to invent a special prediction algorithm: probability theory takes care of all that. The complexity of prediction is purely computational: how to carry out the marginalization over $\theta$.
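
Concretely, the marginalization in question is the (posterior) predictive distribution, which follows from the sum and product rules under the standard assumption that a new observation $x$ is independent of $D$ given $\theta$:
$$
p(x|D) = \int p(x|\theta)\, p(\theta|D) \,\mathrm{d}\theta
$$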

To carry out this prediction, you need access to the factors $p(x|\theta)$ and $p(\theta|D)$. Where do these factors come from? Are they available?

What did we learn from $D$? Without access to $D$, we would predict new observations through the prior predictive
$$
p(x) = \int p(x,\theta) \,\mathrm{d}\theta = \int p(x|\theta)\, p(\theta) \,\mathrm{d}\theta
$$
Comparing this with the posterior predictive $p(x|D)$ above, we see that learning from $D$ amounts to replacing the prior $p(\theta)$ by the posterior $p(\theta|D)$.
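
To make this contrast concrete, here is a hedged sketch in the conjugate Beta-Bernoulli setting (the coin-flip data and the Beta(5, 5) prior are invented for illustration), where both the prior predictive and the posterior predictive have closed forms:

```python
import numpy as np

# Invented coin-flip data: 5 heads, 1 tails
D = np.array([1, 1, 1, 0, 1, 1])
a0, b0 = 5, 5  # assumed Beta prior on the coin bias theta

# Prior predictive: p(x=1) = E_prior[theta] = a0 / (a0 + b0)
prior_pred = a0 / (a0 + b0)

# Conjugacy: posterior is Beta(a0 + #heads, b0 + #tails)
a_post = a0 + D.sum()
b_post = b0 + len(D) - D.sum()

# Posterior predictive: p(x=1 | D) = E_posterior[theta]
post_pred = a_post / (a_post + b_post)

print(prior_pred)  # 0.5   (before seeing D)
print(post_pred)   # 0.625 (after seeing D)
```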

NB: The application of the learned posterior $p(\theta|D)$ does not necessarily have to be prediction. We use prediction here as an example, but other applications are of course also possible.

There appears to be one remaining problem: how good were our model assumptions $p(x|\theta)$ and $p(\theta)$, really?

Technically, this is a model comparison problem.

[Q.] What if I have multiple candidate models, say $\mathcal{M} = \{m_1,\ldots,m_K\}$, where each model $m_k$ is specified by its own prior $p(\theta|m_k)$ and likelihood $p(D|\theta,m_k)$? Can we evaluate the relative performance of one model against another from the set?
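
As a sketch of what such a comparison could look like (continuing the invented Beta-Bernoulli coin example; the two priors below are arbitrary candidates, not prescribed choices), the model evidence $p(D|m_k)$ is available in closed form for this conjugate pair, and the log Bayes factor compares two candidates directly:

```python
import numpy as np
from scipy.special import betaln

# Invented binary data: 5 heads, 1 tails
D = np.array([1, 1, 1, 0, 1, 1])
h, t = D.sum(), len(D) - D.sum()

def log_evidence(a0, b0):
    # log p(D | m_k) for a Beta(a0, b0)-Bernoulli model: marginalizing
    # theta out gives the closed form B(a0 + h, b0 + t) / B(a0, b0).
    return betaln(a0 + h, b0 + t) - betaln(a0, b0)

# Two candidate models differing only in their prior p(theta | m_k):
log_p_D_m1 = log_evidence(5, 5)  # m1: coin is probably near fair
log_p_D_m2 = log_evidence(1, 1)  # m2: uniform prior on the bias

# Log Bayes factor of m1 against m2; positive values favour m1
print(log_p_D_m1 - log_p_D_m2)
```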