I'm now six months into NGS and the analysis of sequencing data. I have been working on RNA-Seq data and have recently started to venture into CAGE-Seq data.

I wanted to ask: how do we actually map CAGE-Seq data? We did paired-end sequencing for the CAGE data and got the FASTQ files. After cleaning, I have clean read files for read1 and read2, but the two files are of different sizes. When I ran them through STAR, it said that mapping could not be done because the run had finished for one read file while the other had not.

Is this normal for CAGE-Seq data? Or should we map read1 only, since we are only interested in the TSS, i.e. the reads sequenced from the 5' end?

For paired-end data, my favourite approach is to convert the paired alignments from BAM format, where each mate is on a separate line, to BED12 format, where each pair is on one line, using the pairedBamToBed12 tool. The 5′ end of each BED entry is the CAGE TSS. CAGEr supports loading data in BAM, BED, and other formats. I recommend reading its vignette.
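
To make "the 5′ end of the BED entry is the CAGE TSS" concrete, here is a minimal Python sketch. It is only an illustration, not part of pairedBamToBed12 or CAGEr: the function name `tss_from_bed12` and the example line are made up, and it assumes the strand column of each BED12 entry reflects the strand of the original transcript. BED coordinates are 0-based and half-open, so the 5′ base is `chromStart` on the plus strand and `chromEnd - 1` on the minus strand.

```python
def tss_from_bed12(line):
    """Return (chrom, start, end, strand) for the 1-bp 5' end of one BED12 entry.

    Hypothetical helper for illustration only. Assumes tab-separated BED12
    with the strand in column 6 and 0-based, half-open coordinates.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    chrom_start = int(fields[1])
    chrom_end = int(fields[2])
    strand = fields[5]

    if strand == "+":
        # Plus strand: the 5' end is the first base of the interval.
        pos = chrom_start
    elif strand == "-":
        # Minus strand: the 5' end is the last base of the interval.
        pos = chrom_end - 1
    else:
        raise ValueError(f"Unexpected strand value: {strand!r}")

    # Report the TSS as a 1-bp BED interval.
    return chrom, pos, pos + 1, strand


if __name__ == "__main__":
    # Made-up BED12 line standing in for one aligned read pair.
    example = "chr1\t1000\t1250\tpair1\t60\t+\t1000\t1250\t0,0,0\t2\t50,50\t0,200"
    print(tss_from_bed12(example))
```

In practice CAGEr does this step for you when it loads the data, so a sketch like this is mainly useful for checking a few entries by hand.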