Hi, all.
I am currently facing an alignment problem and have run out of ideas.
I am trying to align rice DNase-seq reads to the IRGSP build 4 genome. I am using the SRR094111.sra data, which I converted to FASTQ with default parameters.
I used bowtie2 for the alignment (command: bowtie2 -p 8 --local -x /all_format -U SRR094111.fastq -S SRR094111.sam), but I only got the following low alignment rate. I am not sure what is going on. I have tried trimming the last several bases, but I don't know how many base pairs I should trim. Can anyone show me a detailed pipeline that gives a correct alignment? Thanks a lot.

The degree of DNase I digestion was assessed by pulsed-field gel
electrophoresis (PFGE: 20–60 switch time, 18 h, 6 V/cm; Bio-Rad). High
molecular weight (HMW) DNA after DNase I digestion was isolated, blunt
ended with T4 DNA polymerase. Biotinylated adaptor I (5’ Bio
ACAGGTTCAGAGTTCTACAGTCCGAC and 5’ P-GTCGGACTGTAGAACTCTGAAC) was
ligated to the DNA molecules. Dynal M-280 beads (Invitrogen) were used
for enriching DNase I digested DNA ends after MmeI digestion. Adaptor
II (5’ P-TCGTATGCCGTCTTCTGCTTG and 5’ CAAGCAGAAGACGGCATACGANN) was
then ligated to the MmeI treated ends. The DNA sample was amplified by
PCR using linker-specific primers (5’ CAAGCAGAAGACGGCATACGA and
5’AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA), and purified by PAGE
for isolation of DNA fragments with about 90 bp in size. The final
Illumina sequencing was performed using a primer specific to linker I
(5’CCACCGACAGGTTCAGAGTTCTACAGTCCGAC).

*The following are the failed modules from my FastQC report; all the others passed.*

While the Q-scores are not great, they are not atrocious either. Since this is old GAII data, the qualities are likely in Illumina format (phred+64), and I would suggest that you take that into account when aligning. Try bowtie (instead of bowtie2) to see if ungapped alignments improve things. Before you veer off in other directions, first try to replicate the analysis from whatever paper this came from as closely as possible.
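To check the quality encoding rather than guess it, something like the following could be used. It is only a rough sketch: the function names are made up for illustration, and the cut-offs (characters below ASCII 59 imply phred+33, characters above ASCII 74 imply phred+64) are the usual rule of thumb, not an exact test. It also assumes a plain four-line-per-record FASTQ file.

```python
from itertools import islice


def guess_phred_offset(quality_strings):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    """
    lowest, highest = 255, 0
    for qual in quality_strings:
        if not qual:
            continue
        codes = [ord(ch) for ch in qual]
        lowest = min(lowest, min(codes))
        highest = max(highest, max(codes))
    if lowest < 59:    # such low characters do not occur in phred+64 data
        return 33
    if highest > 74:   # such high characters are not expected in phred+33 data
        return 64
    return None        # ambiguous, look at more reads


def read_quality_lines(path, max_reads=10000):
    """Yield the quality line (every 4th line) of the first max_reads records."""
    with open(path) as handle:
        for index, line in enumerate(islice(handle, 4 * max_reads)):
            if index % 4 == 3:
                yield line.rstrip("\n")


# Example (file name taken from the question):
# print(guess_phred_offset(read_quality_lines("SRR094111.fastq")))
```

If this reports 64, bowtie2 can be told so with `--phred64` (bowtie uses `--phred64-quals`). If it reports 33, the conversion from .sra has probably already rescaled the qualities and no extra flag should be needed.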

The results from (what I think is) the original paper are remarkably different:

We obtained a total of 43 million sequence reads from the seedling
libraries and 57 million reads from the callus libraries (Supplemental
Table S1). Approximately 70% of the reads were mapped to unique
positions in the rice genome.

Are you sure that you are aligning to the correct reference? Did you download and index the genome yourself, or was it provided by someone else? Have you successfully aligned other data to that reference before? A wrong or broken reference would be the most obvious explanation, so rule it out before you start chasing ghosts.
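One more thing worth checking, going by the library protocol quoted in the question: adaptor I ends in TCCGAC, which is the MmeI recognition site, and MmeI cuts roughly 20 bp downstream of its site. If that reading of the protocol is right, each read would start with only about 20 bp of genomic sequence and then run into adaptor II (TCGTATGCCGTCTTCTGCTTG), which could explain a low alignment rate on untrimmed reads. Below is a minimal sketch of trimming that adaptor from the 3' end. The function names, the exact-match rule, and the `min_overlap` and `min_len` values are assumptions for illustration; a dedicated trimmer such as cutadapt would handle mismatches better.

```python
# Assumption: the top strand of adaptor II from the quoted protocol is what
# follows the genomic tag in each read.
ADAPTER_II = "TCGTATGCCGTCTTCTGCTTG"


def trim_3prime_adapter(seq, qual, adapter=ADAPTER_II, min_overlap=6):
    """Cut the read at the first exact match to the adapter or its prefix.

    A match at the very end of the read only needs min_overlap bases,
    because the adapter may be truncated by the read length.
    """
    for start in range(len(seq) - min_overlap + 1):
        tail = seq[start:]
        n = min(len(tail), len(adapter))
        if tail[:n] == adapter[:n]:
            return seq[:start], qual[:start]
    return seq, qual  # no adapter found, leave the read untouched


def trim_fastq(in_path, out_path, min_len=16):
    """Trim every record of a 4-line FASTQ file; drop reads shorter than min_len."""
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            fin.readline()  # the '+' separator line
            qual = fin.readline().rstrip("\n")
            seq, qual = trim_3prime_adapter(seq, qual)
            if len(seq) >= min_len:
                fout.write(f"{header}\n{seq}\n+\n{qual}\n")
                kept += 1
    return kept


# Example (output file name is hypothetical):
# trim_fastq("SRR094111.fastq", "SRR094111.trimmed.fastq")
```

After trimming, the reads would be only about 20 bp long, so aligning them end to end (bowtie2 without `--local`, or bowtie as suggested above) is probably more appropriate than local alignment. Check a handful of raw reads by eye first to confirm the adaptor is actually there.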