HiSeq 3000 output FastQC parameters are bad, should I ask for resequencing?

Recently we sent a batch of 24 total-RNA samples to a company for sequencing on a HiSeq 3000 platform. Both our own and the company's quality checks showed good RNA integrity and very good purity, so they went ahead with library preparation and sequencing. After the wait we got a hard drive with the data. However, after running FastQC, some of the plots show problems that I'm hesitant to attribute to the samples, since they look more like an issue with the flow cell and/or library preparation. The samples were sequenced for downstream de novo transcriptome assembly with Trinity, and I'm not sure the reads as they stand will be good enough for that, even after trimming off the adapter. I've attached the FastQC reports of two representative samples (both forward and reverse reads, simplified file names). They were sequenced on different flow cells, since not all samples could be run together while still getting enough reads per sample when multiplexing. The most alarming things:

- Poor per-tile quality. Some regions of the flow cells seem to have failed in the later cycles, and the failures are localized, as if something went wrong in one particular spot rather than being a generalized problem. You can also see this in the per-sequence quality plot, where there is a hump in the sample with the worse per-tile quality.

- Adapter content. In some samples adapter starts showing up at around cycle 100, which suggests to me that fragmentation was a bit too aggressive and small fragments made it into the library preparation.
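For what it's worth, the numbers behind the per-tile plot can be pulled straight out of FastQC's `fastqc_data.txt` (the `>>Per tile sequence quality` / `>>END_MODULE` delimiters and tab-separated columns are standard FastQC output; I believe the "Mean" column is the tile's deviation from the per-base average, but check against your own file). The cutoff here is an arbitrary assumption:

```python
def worst_tiles(fastqc_data, cutoff=-5.0):
    """List tile IDs whose average 'Mean' value in FastQC's
    'Per tile sequence quality' module falls below a cutoff.
    Each data line in that module is: tile <TAB> base <TAB> mean."""
    per_tile = {}
    in_module = False
    for line in fastqc_data.splitlines():
        if line.startswith(">>Per tile sequence quality"):
            in_module = True
            continue
        if in_module:
            if line.startswith(">>END_MODULE"):
                break
            if line.startswith("#"):   # column header line
                continue
            tile, base, mean = line.split("\t")
            per_tile.setdefault(tile, []).append(float(mean))
    return sorted(t for t, vals in per_tile.items()
                  if sum(vals) / len(vals) < cutoff)
```

This gives you a concrete blacklist of tiles to look at instead of eyeballing the heatmap.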

We paid quite a bit of money to have this sequenced, and it doesn't feel like the run was up to standard. Should we go back and ask the company to re-do this?

Is this a subset of the data? How many reads did you get in total?
There was indeed an issue with the flow cell; localized low-quality regions are causing the low-quality data. Illumina would likely replace the reagents in this case.
The insert size is a more difficult question. By default, RNA-seq libraries always contain a majority of short fragments. It depends on what you discussed with them beforehand.

The FastQC reports I showed are not subsets; they include all the reads for those samples. On average we got around 30-33 million reads per sample (some up to 43 million). For the entire set of 24 samples we have around 860 million reads.

On the matter of fragment size, I do know Illumina libraries have smaller fragments. We asked for 2 × 150 bp sequencing. The fact that the adapter shows up in a detectable percentage of the reads tells me that the size selection during library preparation let through fragments much shorter than 150 bp; otherwise there would be no read-through into the adapter sequence in the last cycles.
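The arithmetic behind that read-through claim is simple enough to sketch (the 100 bp insert below is hypothetical, chosen to match where adapter appears in the plots):

```python
def adapter_start_cycle(insert_len, read_len=150):
    """Cycle at which adapter sequence first appears in a read.
    If the insert is at least as long as the read, the sequencer
    never reaches the adapter and None is returned."""
    if insert_len >= read_len:
        return None
    # Cycles 1..insert_len cover the insert; adapter begins right after.
    return insert_len + 1

print(adapter_start_cycle(100))  # hypothetical 100 bp insert -> 101
print(adapter_start_cycle(300))  # longer insert -> None (no read-through)
```

So adapter appearing around cycle 100 directly implies inserts of roughly 100 bp in those reads.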

On the matter of fragment size, I do know Illumina libraries have smaller fragments. We asked for 2 × 150 bp sequencing. The fact that the adapter shows up in a detectable percentage of the reads tells me that the size selection during library preparation let through fragments much shorter than 150 bp; otherwise there would be no read-through into the adapter sequence in the last cycles.

Yes, this is certainly correct. However, RNA-seq libraries generated with most protocols have a strong bias towards smaller fragments, in contrast to genomic libraries. Please see the attached examples from Illumina and NEB documentation. Shortening the fragmentation times mostly produces a more prominent tail of long fragments while retaining a majority of short fragments. Thus, moving the insert sizes to 250 bp and above requires severe size-selection measures, which come with some loss of library complexity. We do indeed carry out such size selections for de novo transcriptome assembly purposes, but I believe we are the exception and most places will not do it. Since one throws away the majority of the library with such a size selection, this warrants a discussion in my eyes.
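To put a rough number on that loss, here is a toy calculation with an entirely hypothetical insert-size histogram (the real distribution would come from your Bioanalyzer/TapeStation trace, not from these made-up fractions):

```python
# Hypothetical insert-size histogram: {insert length (bp): fraction of library}
histogram = {100: 0.30, 150: 0.25, 200: 0.20, 250: 0.15, 300: 0.10}

def retained_fraction(hist, min_insert):
    """Fraction of library molecules kept when selecting inserts >= min_insert."""
    return sum(frac for size, frac in hist.items() if size >= min_insert)

print(retained_fraction(histogram, 250))  # 0.25 -> three quarters discarded
```

With a short-biased library like this, selecting for inserts of 250 bp and above really does discard most of the material, which is the complexity trade-off being described.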

I've already contacted the company that did the sequencing. However, in the worst case scenario, how can I proceed to do assembly with these reads? Should I filter all reads coming from the bad tiles or let the assembler evaluate the quality of the base in the read?

I did adapter trimming and a soft quality trimming with Trim_Galore. However, I'm still uneasy about feeding the sequences from the bad tiles into the assembler. You know, trash in, trash out. Is there any tool you would recommend to remove them? Should that be done before or after trimming?
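For completeness, this is the kind of Trim Galore call I mean, assembled here as a dry-run so the exact flags are visible; the file names are hypothetical, and `--quality 20` is the "soft" cutoff:

```python
import shlex

# Hypothetical paired-end input files for one sample.
r1, r2 = "sample01_R1.fastq.gz", "sample01_R2.fastq.gz"

# --paired keeps mates in sync, --quality sets the Phred trimming cutoff,
# --gzip compresses the output.
cmd = ["trim_galore", "--paired", "--quality", "20", "--gzip", r1, r2]
print(shlex.join(cmd))
```

Running the printed command requires Trim Galore (and Cutadapt) on `PATH`, of course.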

Has the sequence provider said anything about the possibility that there was a hardware/software problem with this run? If there was, then they should re-run the samples for you at no charge. Generally, Illumina provides free reagent replacements to providers when they have a maintenance contract on the sequencer (which most do).

Thank you so much! Yes, that looks like it could do the job, and it seems we already have the suite set up on our University's cluster. I'll have to play around with the parameters; I'm not too sure how strict to be, given the per-tile plots I'm getting.

I called the sequencing facility yesterday; the operations team is going over my inquiry, but they haven't gotten back to me yet. I'll get in touch again today. It is a big company, so they should have quality assurances in place for what is clearly a technical problem on their part. On my end it is more about the time it will take to get the new data (if they redo it), as we are already a little behind schedule.

I did try Filter by Tile, even with the aggressive parameters they suggest, and although the number was reduced, I still had some bad tiles carrying over. Well, after some back and forth, the company will be resequencing the samples. I'll use the data I have to start optimizing parameters with Trinity. I've done bioinformatics before, but I have never done an assembly with a dataset this big, so I'll have to read around a bit.
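In case it helps anyone landing here: when some bad tiles survive Filter by Tile, a blunt fallback is to drop reads by the tile field in the Illumina read name (field 5 of the colon-separated header in CASAVA 1.8+ output). The tile IDs below are hypothetical, and note that paired files must be filtered in sync so mates stay matched:

```python
BAD_TILES = {"2104", "2119"}  # hypothetical tile IDs read off the per-tile plot

def read_is_clean(header):
    """True if a FASTQ header's tile field is not blacklisted.
    CASAVA 1.8+ header: @instrument:run:flowcell:lane:tile:x:y ..."""
    return header.split(":")[4] not in BAD_TILES

def filter_fastq(lines):
    """Yield the 4-line FASTQ records whose tile is not blacklisted."""
    it = iter(lines)
    for header in it:
        record = [header, next(it), next(it), next(it)]
        if read_is_clean(header):
            yield from record
```

A sketch only; for real data you would apply the same blacklist to R1 and R2 together (or re-pair afterwards) and read through `gzip.open` for compressed files.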