From what I understand, a primitive (triangle) goes into a tessellation shader, and the tessellation shader spits out a bunch of primitives inside the "mother" primitive (triangle).

How does the tessellation shader know where to break the primitive into smaller pieces? Does it just halve it again and again, or does it use a texture? How does the tessellator know how to transform the vertices? Does it use a normal map?

When I watch the Heaven benchmark, the stones on the rocky road stick outwards. How does it know to make them stick out and not in?

I recommend reading the "Tessellation Overview" doc in the DirectX SDK docs; it will give you a good idea of how it all fits together.
Very briefly: your vertex shader stage feeds a primitive/patch to the hull shader.
In the hull shader stage you define the amount of geometry expansion you want (the tessellation factors), the tessellation scheme (the partitioning mode), which determines how your original primitive is split up, and any additional data you want interpolated for the tessellated geometry (you have access to the entire patch data here).
This then goes through the actual tessellator stage, which is fixed-function (no shader).
Its output is just a flat subdivision of the patch, with no displacement.
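
To make the "flat tessellation" step concrete, here is a rough CPU-side sketch in Python. It is not the pattern the hardware tessellator actually produces, just a simple uniform split, and the names tessellate_triangle and interpolate are made up for illustration. It shows that the subdivision is driven only by a factor n, with each new point expressed as barycentric weights on the original triangle, so no texture or normal map is involved at this stage.

```python
def tessellate_triangle(n):
    """Uniformly split a triangle into n*n smaller triangles.

    Returns (points, triangles). Each point is a barycentric (u, v, w)
    tuple with u + v + w == 1; each triangle is a triple of point indices.
    Assumption: a plain uniform grid, not the real hardware pattern.
    """
    index = {}
    points = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            index[(i, j)] = len(points)
            points.append((i / n, j / n, (n - i - j) / n))

    triangles = []
    for i in range(n):
        for j in range(n - i):
            # Triangle pointing the same way as the parent.
            triangles.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            # Flipped triangle that fills the gap next to it, if there is one.
            if i + j < n - 1:
                triangles.append(
                    (index[(i + 1, j)], index[(i + 1, j + 1)], index[(i, j + 1)])
                )
    return points, triangles


def interpolate(patch, bary):
    """Blend the three patch corners using barycentric weights."""
    a, b, c = patch
    u, v, w = bary
    return tuple(u * a[k] + v * b[k] + w * c[k] for k in range(3))
```

With n = 4 this gives 15 points and 16 small triangles, all lying in the plane of the original triangle. Turning the weights into positions, as interpolate does, is roughly the part you write yourself in the domain shader.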
The tessellated output is then fed into the domain shader, where you can apply any displacement algorithm you want. For very simple displacement, for example, read in a height map and push/pull the verts along the normal based on the sampled height value; whether that value is positive or negative is what makes a stone stick out rather than in.
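
As a rough illustration of that very simple displacement, here is a Python sketch of what a domain shader might do per vertex. Everything in it is an assumption for the example: the height map is a plain list of rows, sample_height does a nearest-texel lookup, and scale and bias are hypothetical tuning parameters. A real domain shader would be written in HLSL and sample a texture instead.

```python
def sample_height(height_map, u, v):
    """Nearest-texel lookup into a 2D list of heights; u and v are in [0, 1]."""
    rows = len(height_map)
    cols = len(height_map[0])
    x = min(int(u * cols), cols - 1)
    y = min(int(v * rows), rows - 1)
    return height_map[y][x]


def displace(position, normal, uv, height_map, scale=1.0, bias=0.0):
    """Move a vertex along its normal by the sampled height.

    A positive (height + bias) pushes the vertex outwards along the normal,
    a negative one pulls it inwards. scale and bias are illustrative knobs.
    """
    height = sample_height(height_map, uv[0], uv[1])
    offset = (height + bias) * scale
    return tuple(p + n * offset for p, n in zip(position, normal))
```

For example, with a normal of (0, 0, 1) a sampled height of 1.0 and scale=2.0 moves the vertex two units out of the surface, while a bias of -0.5 on a height of 0.0 moves it one unit inwards. The direction comes entirely from the normal and the sign of the height value.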

The result is then passed on to the GS or PS, just as it would be without the tessellator.

There are plenty of papers on displacement algorithms, issues you are likely to encounter (like seams), smoothing applications, etc., which I recommend you search for. There are also some DirectX SDK samples and NVIDIA SDK samples for this that you should look at.