I am currently reading a paper that is relatively tough for me, since it's a hardcore biology paper and I come from a math background. They have this diagram:

So the x-axis corresponds to ±3 kb around the TSS (transcription start site). It shows the average ChIP-seq enrichment per 50 bp bin for the total population of genes. The y-axis is the log of the enrichment. I have a weak understanding of enrichment, so could someone explain this concept to me?
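To make the binning concrete, here is a minimal Python sketch of how a read position could be assigned to one of the 50 bp bins spanning ±3 kb of a TSS. The function name `bin_index`, the half-open window [-3000, +3000), and the strand flip for minus-strand genes are my assumptions about a typical setup, not details taken from the paper.

```python
# Hypothetical helper: assign a genomic position to a 50 bp bin in a
# +/-3 kb window around a TSS. Window and bin size follow the figure
# description; everything else is an illustrative assumption.
WINDOW = 3000                    # bp on each side of the TSS
BIN_SIZE = 50                    # bp per bin
N_BINS = 2 * WINDOW // BIN_SIZE  # 120 bins in total


def bin_index(pos, tss, strand):
    """Return the bin (0..N_BINS-1) for `pos`, or None if outside the window.

    For minus-strand genes the offset is flipped so that negative offsets
    are always upstream of the TSS and positive offsets downstream.
    """
    offset = pos - tss if strand == "+" else tss - pos
    if not -WINDOW <= offset < WINDOW:
        return None
    return (offset + WINDOW) // BIN_SIZE


print(N_BINS)                       # 120
print(bin_index(1000, 1000, "+"))   # the TSS itself -> bin 60
print(bin_index(-2000, 1000, "+"))  # 3 kb upstream -> bin 0
print(bin_index(4000, 1000, "+"))   # exactly 3 kb downstream is outside -> None
print(bin_index(999, 1000, "-"))    # 1 bp downstream on the minus strand -> bin 60
```

Each bin on the x-axis of the plot then corresponds to one of these 120 indices.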

Furthermore, what does the graph tell me? I am guessing it has something to do with Pol2 binding depending on the specific histone marks.

In order to explain what enrichment is, it's important to consider what a ChIP-seq experiment actually is.

A ChIP-seq experiment is performed to determine where a protein of interest binds on the DNA. To perform this experiment you need some way of immunoprecipitating (IP'ing) the protein of interest. Sometimes you have an antibody against the protein itself, e.g. a subunit of Pol II. Other times, if no antibody exists for the protein of interest, you may engineer the protein to be fused to a tag that you do have an antibody for, e.g. a TAP tag or Myc tag. Either way, the protein that you expect to bind DNA can now be IP'd (hurray!).

When performing the IP, you first incubate the antibody with your cell extract, which contains proteins, nucleic acids, and other cellular components that can go into solution (think of it as the cell's insides minus anything that precipitated out earlier in the experiment, such as large debris or cell-wall material). Now the antibody is bound to the protein of interest, which is hopefully bound to some DNA too, but how do you separate the antibody-protein-DNA complex from the rest of the extract? You add Protein A/G Sepharose beads or magnetic beads, which are relatively heavy and which the antibody binds to. After incubation you spin down the beads (or pull them down with a magnet, for magnetic beads), bringing the antibody-protein-DNA complex with them. You then wash the beads to rinse away everything except your protein, which is still attached to the antibody-bead complex. Finally, you add a solution that releases the complex from the beads, isolate the DNA, and you now have the DNA that the protein of interest was recruited to.

However, you have a problem. The beads you use are porous, and random fragments of DNA and DNA-protein complexes can get trapped in them. So when you isolate the DNA bound to your protein of interest, you also get some background. As a negative control, we therefore perform the ChIP-seq experiment separately without the antibody, or with cells that lack the tagged version of the protein. This will isolate the DNA that is non-specifically isolated. So, to get back to your original question: enrichment is the number of reads from the IP divided by the number of reads from the background IP, sometimes called the mock IP or untagged IP. Having said that, people sometimes instead use genomic DNA prepared during the ChIP-seq experiment that has NOT gone through the IP step as the background control ("input"). Input is therefore a negative control for every step except non-specific interactions with the beads during the IP.
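The enrichment definition above can be written as a small Python function. Two details here are my assumptions rather than anything stated in this answer: each count is scaled by its library's total read count (so a control sequenced twice as deeply does not halve the enrichment), and a small pseudocount is added so that bins with zero control reads do not cause a division by zero. The names `enrichment`, `ip_total`, `ctrl_total` and `pseudocount` are hypothetical, and so is the choice of log2 for the final log transform.

```python
import math


def enrichment(ip_reads, ctrl_reads, ip_total, ctrl_total, pseudocount=1.0):
    """Depth-normalized IP / control ratio for one bin or region.

    ip_reads, ctrl_reads: reads falling in the bin for the IP and for the
        control (mock IP, untagged IP, or input).
    ip_total, ctrl_total: total mapped reads in each library, used to put
        both samples on the same scale (an assumption, see text).
    pseudocount: added to both counts to avoid dividing by zero.
    """
    ip_frac = (ip_reads + pseudocount) / ip_total
    ctrl_frac = (ctrl_reads + pseudocount) / ctrl_total
    return ip_frac / ctrl_frac


# 40 IP reads out of 2 million vs 10 control reads out of 1 million:
# after scaling by depth, the IP bin has 2x the signal of the control bin.
ratio = enrichment(40, 10, 2_000_000, 1_000_000, pseudocount=0)
print(round(ratio, 6))             # 2.0
print(round(math.log2(ratio), 6))  # 1.0 (log2 is an assumed base for the y-axis)
```

Note that the raw counts (40 vs 10) alone would suggest a 4x difference; the depth scaling is what brings it back to 2x.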

I'm not going to go through every figure you just posted, but as far as interpreting these results goes, I will give you some pointers. This paper appears to have performed 8 ChIP-seq experiments for different histone modifications and 1 ChIP-seq of Pol II in myoblasts and myotubes (MT). They had some a priori knowledge of which genes were up-regulated, down-regulated, always expressed, or never expressed in MT. For each composite plot, they used the expression data to select only the genes belonging to each line. The higher the enrichment, the more of that protein is present at that position relative to the TSS. The reason you see broad peaks flanking the TSS for the histone marks is that this is where the -1 and +1 nucleosomes are.
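A composite plot like the one in the question is built by averaging, bin by bin, the per-gene profiles of every gene in a group. Below is a minimal pure-Python sketch, assuming each gene already has a list of log-enrichment values (one per 50 bp bin) and a group label taken from the expression data, such as up-regulated or never expressed. The function `composite_profiles` and the data layout are hypothetical, and averaging log values (rather than taking the log of averaged ratios) is also an assumption; the paper may do it differently.

```python
from collections import defaultdict


def composite_profiles(profiles, groups):
    """Average per-gene binned profiles within each gene group.

    profiles: dict gene -> list of log-enrichment values, one per bin
              (all lists must have the same length).
    groups:   dict gene -> group label, e.g. "up" or "never".
    Returns:  dict group label -> list of per-bin mean values.
    """
    sums = {}
    counts = defaultdict(int)
    for gene, values in profiles.items():
        label = groups.get(gene)
        if label is None:  # gene not assigned to any line of the plot
            continue
        if label not in sums:
            sums[label] = [0.0] * len(values)
        for i, v in enumerate(values):
            sums[label][i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in bins]
            for label, bins in sums.items()}


# Toy data with 3 bins per gene instead of 120.
profiles = {"g1": [0.0, 2.0, 0.0], "g2": [1.0, 3.0, 1.0], "g3": [0.0, 0.0, 0.0]}
groups = {"g1": "up", "g2": "up", "g3": "never"}
print(composite_profiles(profiles, groups))
# {'up': [0.5, 2.5, 0.5], 'never': [0.0, 0.0, 0.0]}
```

Each returned list would become one line of the composite plot, with bins on the x-axis and the mean value on the y-axis.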

I am a bit confused about the negative control part. My understanding is that when you perform the IP you get random fragments of DNA, the background noise, since the porous beads trap these fragments. The negative control experiment also gives random fragments of DNA, because the beads pick up fragments due to being porous. Essentially, you want the number of reads of random fragments from the IP experiment and from the control experiment to be the same. Am I correct?

I am a little confused by: "This will isolate the DNA that is non-specifically isolated."

Assuming the experiment was performed properly, the quality of the DNA from the IP and the control is equal, and everything else during the sequencing run works, the number of reads you get is purely a function of how long (how deeply) you sequence. The reason enrichment is calculated is that, in reality, the DNA in the negative control is not uniformly distributed across the genome. Without a negative control you would never be able to tell a true positive from a false positive (a region with high background). To control for such regions, we divide the IP signal by the control IP or input signal.
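A toy numeric example may make the true-positive/false-positive point concrete. The read counts below are invented purely for illustration, and both libraries are assumed to be sequenced to the same depth, so raw counts can be compared directly. Both regions have the same number of IP reads, so from the IP alone they look identical; dividing by the control separates them.

```python
# Invented counts for two regions; IP and control assumed equally deep.
regions = {
    # name: (IP reads, control reads)
    "true_binding_site": (200, 20),   # little background, strong IP signal
    "high_background": (200, 190),    # "sticky" region: control is high too
}


def enrichment_by_region(regions):
    """Return the IP/control ratio for each region (equal depth assumed)."""
    return {name: ip / ctrl for name, (ip, ctrl) in regions.items()}


ratios = enrichment_by_region(regions)
for name, r in ratios.items():
    print(f"{name}: enrichment={r:.2f}")
# true_binding_site: enrichment=10.00
# high_background: enrichment=1.05
```

A peak caller looking only at the IP would report both regions; after normalizing by the control, only the first stands out.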

Jason, thanks a lot. I have a much more solid understanding of ChIP-seq now. If you don't mind, one last question. I now know what the negative control does and its purpose, but I can't figure out how it helps us distinguish true positives from false positives. I googled this question and found a PDF that talks about controls, and it has this picture: http://i.imgur.com/HjDgstP.png

Could you explain what I see in the picture for the Pol2 protein? Here is my explanation. The way I understand it, if we didn't have the dashed blue line (the control), it would be difficult for us to say that the peak (dark blue line) is where the protein was bound. However, since the dashed blue line peaks much lower (which I guess represents the noise), we have some guarantee that the dark blue line peak is indeed bound DNA. We can quantify this "guarantee" with the formula you gave in your answer.
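The ratio tells you how much higher the IP signal is, but on its own it does not say how likely such a difference is to arise from counting noise: 4 vs 2 reads and 400 vs 200 reads both give a ratio of 2. One common way to attach a probability, roughly the idea used by peak callers such as MACS, is to treat the (depth-scaled) control count as the expected number of background reads λ and ask how likely it is to see at least the observed number of IP reads under a Poisson distribution. The sketch below is my own simplified illustration of that idea, not what the PDF or the paper necessarily did; `poisson_sf` is a hypothetical helper name, the counts are invented, and equal sequencing depth is assumed so the control count can serve directly as λ.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the upper tail.

    Starts at the k-th term (computed in log space to avoid overflow) and
    keeps adding terms until they no longer change the sum.
    """
    if lam <= 0:
        raise ValueError("lam must be positive (add a pseudocount if needed)")
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Past the mode (i > lam) terms only shrink; stop once negligible.
        if i > lam and term <= total * 1e-16:
            break
    return total


# Same 2x ratio, very different evidence (control count used as lambda):
print(round(poisson_sf(4, 2.0), 3))  # 0.143: 4 vs 2 reads could easily be noise
print(poisson_sf(400, 200.0))        # vanishingly small: strong evidence of binding
print(poisson_sf(200, 190.0))        # large probability: high background explains the IP
```

In this framing the control still plays the role described in the answer (it sets the expected background), but the read counts themselves determine how much confidence the difference deserves.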