Challenges in obtaining high-quality data from a custom-made panel for next-generation sequencing (NGS) using the Ion Torrent GeneStudio™ S5 platform

The goal of this part of the study was to optimize the sequencing procedure for 16 human genes and their regulatory regions that might be associated with differential immunological response to COVID-19. The study was performed on 60 COVID-19 patients from the General Hospital of Tešanj, Bosnia and Herzegovina, categorized into three groups (mild, moderate, and severe clinical manifestation) based on the diagnosis by the residential physician. Target coding sequences and their regulatory regions were amplified for the following genes: HLA-A, HLA-B, HLA-C, ACE2, IL-6, IL-4, TMPRSS2, IFITM3, IL-12, RIG-I/DDX58, IRF-7, IRF-9, IL-1B, IL-1A, CD55, and TNF-α. DNA was isolated from whole blood samples stored at -20°C for six months using the QIAamp® DNA Mini Kit according to the manufacturer's instructions. Since NGS analysis of the target genomic regions was performed on the Ion Torrent GeneStudio™ S5 platform, libraries were prepared using the Ion AmpliSeq™ Library Kit Plus according to the manufacturer's instructions, in a protocol optimized for low-quality DNA. Due to unsatisfactory sequencing results, further optimization steps were introduced: the two primer pools were processed separately, the number of PCR cycles was increased, and the annealing temperature was lowered for the primer pool that showed poorer amplification. In the end, 36 samples produced optimal results, while the remaining 24 samples will be re-sequenced following repeated sample collection and DNA isolation, accompanied by additional protocol modifications.


Introduction
Considering the global impact of COVID-19 on healthcare, the economy, and social norms of behavior, multidisciplinary studies of issues related to this pandemic have been the focus of scientific research for the past two years. […] the entire genomes. There are a number of different NGS platforms that perform parallel sequencing of millions of small DNA fragments. Each fragment is sequenced multiple times, which provides high sequencing depth, with the goal of delivering accurate data and revealing unexpected and/or novel gene variants. Potential uses of NGS in clinical practice include capturing a broader spectrum of mutations than Sanger sequencing, highly sensitive detection of mosaic mutations, and detection of rare or low-frequency mutations (for example, in tumor tissues and liquid biopsies) (Behjati & Tarpey, 2013).
Here we report technical issues encountered while sequencing a panel of 16 genes of interest and their regulatory regions in 60 COVID-19 patients from the General Hospital of Tešanj, Bosnia and Herzegovina. The research was performed using the GeneStudio™ S5 semiconductor sequencer (Ion Torrent, Thermo Fisher Scientific, Waltham, MA).

Materials and methods
Whole blood samples were obtained from 60 COVID-19-positive patients examined at the General Hospital of Tešanj, Bosnia and Herzegovina. All collected samples were classified into three groups according to the clinical manifestation of COVID-19: mild, moderate, and severe (the latter including five deceased patients). Patients were assigned to the study groups by residential physicians based on previously published guidelines (Baj et al, 2020; Table 1). Prior to sample collection, ethical clearance for this research was obtained from the Ethics Committee of the Faculty of Engineering and Natural Sciences, International Burch University (Sarajevo, Bosnia and Herzegovina), and from the Joint Ethics Committee of the General Hospital of Tešanj (Tešanj, Bosnia and Herzegovina). All participants signed an informed consent form for voluntary participation in the research.
Samples were collected in November 2020, stored at -20°C, and delivered to the laboratory for analysis six months later. Upon delivery, samples were thawed immediately and DNA was extracted using the QIAamp® DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Isolated DNA was quantified on a Qubit® 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA) according to the manufacturer's instructions, yielding concentrations from 5 ng/µl to 40 ng/µl. Samples with concentrations below 10 ng/µl were re-extracted, which yielded higher concentrations.
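The post-quantification decision can be sketched as a small helper. This is a minimal illustration, not the laboratory's actual workflow: it assumes concentrations are available as ng/µl values per sample, the sample IDs and readings are hypothetical, the 10 ng/µl re-extraction threshold is the one stated above, and the 20 ng library input is the value used in the adjusted library protocol.

```python
# Minimal DNA QC sketch. Sample IDs and concentrations are hypothetical;
# the 10 ng/µl threshold and the 20 ng input come from the Materials and methods.
RE_EXTRACTION_THRESHOLD = 10.0  # ng/µl; samples below this were re-extracted
TARGET_INPUT_NG = 20.0          # ng of DNA used as library input


def needs_re_extraction(conc_ng_per_ul: float) -> bool:
    """Return True if the extract is below the re-extraction threshold."""
    return conc_ng_per_ul < RE_EXTRACTION_THRESHOLD


def input_volume_ul(conc_ng_per_ul: float, target_ng: float = TARGET_INPUT_NG) -> float:
    """Volume of extract (µl) that contains target_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


# Hypothetical Qubit readings (ng/µl) spanning the reported 5-40 ng/µl range.
qubit_readings = {"S01": 5.0, "S02": 12.5, "S03": 40.0}

for sample_id, conc in qubit_readings.items():
    if needs_re_extraction(conc):
        print(f"{sample_id}: {conc} ng/µl -> re-extract")
    else:
        print(f"{sample_id}: {conc} ng/µl -> {input_volume_ul(conc):.2f} µl for 20 ng input")
```

For example, a 12.5 ng/µl extract would need 20 / 12.5 = 1.6 µl to supply 20 ng of input DNA, whereas a 5 ng/µl extract is flagged for re-extraction.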
Libraries were prepared according to the measured DNA concentrations using the Ion AmpliSeq™ Library Kit Plus (Thermo Fisher Scientific). The Ion Torrent GeneStudio™ S5 (Thermo Fisher Scientific) was used as the sequencing platform, with a required minimum of 1 ng of input DNA. Sixteen libraries were first prepared from the mild clinical manifestation group, and the protocol was adjusted from the start because the blood samples used for DNA isolation had not been processed immediately after collection. The protocol was adjusted for low-quality DNA: three cycles were added to the amplification PCR, and the initial DNA input in the reaction was 20 ng. Following amplification, amplicons were partially digested using exonucleases, and the two primer pools were combined into one. After the incubation step, barcodes were ligated to the samples, and the prepared libraries were purified using Agencourt AMPure XP reagent (Beckman Coulter, Brea, CA) and freshly prepared 70% ethanol. Purified libraries were quantified by real-time PCR using the Ion Library TaqMan™ Quantitation Kit (Thermo Fisher Scientific). All library quantification values were higher than 100 pM, and libraries were diluted to 80 pM according to the manufacturer's instructions.
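Diluting a quantified library to the 80 pM loading concentration follows the standard C1V1 = C2V2 relation, and the effect of added amplification cycles can be bounded by assuming one doubling per cycle. The sketch below is illustrative only: it assumes ideal (100%) PCR efficiency, which multiplex reactions rarely reach, and uses a hypothetical 200 pM library and a hypothetical 50 µl final volume rather than the kit's prescribed dilution scheme.

```python
from typing import Tuple


def dilution_volumes(stock_pm: float, target_pm: float, final_volume_ul: float) -> Tuple[float, float]:
    """Return (library volume, diluent volume) in µl, using C1 * V1 = C2 * V2."""
    if stock_pm <= 0 or target_pm <= 0:
        raise ValueError("concentrations must be positive")
    if target_pm > stock_pm:
        raise ValueError("stock is below the target; dilution cannot raise concentration")
    library_ul = target_pm * final_volume_ul / stock_pm
    return library_ul, final_volume_ul - library_ul


def ideal_fold_gain(extra_cycles: int) -> int:
    """Upper-bound product increase from extra PCR cycles, assuming perfect doubling."""
    return 2 ** extra_cycles


# Hypothetical example: a 200 pM library diluted to 80 pM in 50 µl total.
lib_ul, diluent_ul = dilution_volumes(200.0, 80.0, 50.0)
print(f"library: {lib_ul:.1f} µl, diluent: {diluent_ul:.1f} µl")  # library: 20.0 µl, diluent: 30.0 µl
print(ideal_fold_gain(3))  # 8
```

Under these assumptions, the three added cycles correspond to at most an 8-fold (2^3) increase in product; in practice the gain is smaller and can differ between amplicons.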

Results and Discussion
Sequencing results for the first set of 16 samples were satisfactory, and no further optimization of the protocol described above was planned at that point. However, sequencing of the second set of 16 samples produced conflicting results. While some samples had 100% of target bases covered at 100×, others had only 50% of target bases covered at 1×, even though all libraries had the same starting concentration. Library quantification gave values ranging from 100 pM to 250 pM, and all libraries were therefore diluted to a final concentration of 100 pM to achieve a uniform library concentration in the final pool. Since the primer panel was custom-made, subsequent optimization aimed at improving coverage included another round of library preparation in which the two primer pools were prepared separately for each sample in order to test and optimize the primers. With the protocol otherwise unchanged, library quantification showed a large difference between the concentrations of the two primer pools, which partially explained the inconsistent coverage (Table 3). The protocol was further optimized by adding two more PCR cycles for primer pool 2 and, finally, by lowering the annealing temperature for the same pool by 2°C. The results improved but still did not reach the desired quality of the output data.
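Coverage figures such as 100% of bases at 100× or 50% of bases at 1× are breadth-of-coverage metrics: the percentage of target bases read at least a given number of times. The following is a minimal sketch, assuming per-base read depths over the target regions have already been exported from a coverage tool (for example, `samtools depth`); the depth values are hypothetical and chosen only to reproduce the two patterns described above.

```python
from typing import Sequence


def breadth_at_depth(depths: Sequence[int], min_depth: int) -> float:
    """Percentage of target bases covered by at least min_depth reads."""
    if len(depths) == 0:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths over a tiny 4-base "target".
good_sample = [120, 150, 110, 300]
poor_sample = [0, 3, 0, 1]

print(breadth_at_depth(good_sample, 100))  # 100.0
print(breadth_at_depth(poor_sample, 1))    # 50.0
print(breadth_at_depth(poor_sample, 100))  # 0.0
```

On real data, the function would be called once per sample with the depths of all target bases in the panel, which makes samples with poorly amplified regions easy to spot.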
Library preparation for the remaining 28 samples was then initiated to test sample quality and whether the introduced protocol modifications were satisfactory. Library quantification results remained inconsistent; out of a total of 60 samples, 36 gave optimal coverage, while the analyses for the remaining 24 need to be repeated.
As presented above, several experimental issues were encountered while optimizing the NGS protocol for target gene sequencing in a set of COVID-19 patients. First, whole blood samples were collected from patients and kept frozen at -20°C for six months. Although fresh samples generally yield better results, it is not unusual to obtain high-quality DNA from samples that have been stored for several years (Tagliaferro et al, 2021). Furthermore, the QIAamp® DNA Mini Kit used in the present study is valued mainly for its time-effectiveness, lower risk of sample contamination, and ease of use (Chacon-Cortes & Griffiths, 2014). However, previous research has shown that other DNA extraction methods, such as a modified salting-out method, should be used for older samples (Mardan-Nik et al, 2019). Other studies have shown that silica-based DNA isolation using commercial kits, such as the QIAamp® DNA Blood Midi Kit (Qiagen) (Mardan-Nik et al, 2019) and the DNeasy® Blood & Tissue Kit (Qiagen) (Tagliaferro et al, 2021), can produce high-quality DNA extracts suitable for subsequent PCR amplification.
Furthermore, the custom-made primer panel produced for this research proved to be the major experimental obstacle to obtaining satisfactory NGS data. The inconsistent data produced for the two primer pools, as well as the conflicting results for different samples despite comparable starting DNA concentrations, point to this conclusion. In addition, the primers were designed in silico for research-use-only (RUO) application, with the goal of validating the primer panel. To improve primer pool performance, the protocol was modified according to the established practice of increasing the number of PCR cycles to achieve amplification of the target regions. Although a massively parallel sequencing platform was used and 16 genes and their regulatory regions were analyzed at once, amplicons of around 200 bp were still used in this research to achieve better sequence coverage and, therefore, higher accuracy. Since the sequencing issues persisted after these modifications, we suggest that the primer pairs for different genes may differ in their annealing efficiency and that excessive primer-dimer formation may have occurred due to sequence complementarity.
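One way to screen a primer pool for the two problems suggested above is to compare rough melting temperatures across primers and to check whether the 3' end of one primer can pair with another primer. The sketch below is a crude illustration, not the design procedure used for this panel: it uses the Wallace rule (2°C per A/T and 4°C per G/C, a rough approximation intended for short oligonucleotides) and an exact 4-base 3'-end match, and the primer names and sequences are hypothetical. Tools based on nearest-neighbor thermodynamics give far more reliable estimates.

```python
from itertools import combinations
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough Tm in °C by the Wallace rule: 2 per A/T plus 4 per G/C."""
    s = primer.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def three_prime_dimer(primer_a: str, primer_b: str, k: int = 4) -> bool:
    """True if the last k bases (3' end) of primer_a can pair with a stretch of primer_b."""
    tail = primer_a.upper()[-k:]
    # Antiparallel pairing: primer_b must contain the reverse complement of the tail.
    return reverse_complement(tail) in primer_b.upper()


def screen_pool(pool: Dict[str, str], k: int = 4) -> Tuple[int, List[Tuple[str, str]]]:
    """Return the Tm spread across the pool and primer pairs flagged for 3' dimers."""
    tms = [wallace_tm(seq) for seq in pool.values()]
    flagged = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(pool.items(), 2):
        if three_prime_dimer(seq_a, seq_b, k) or three_prime_dimer(seq_b, seq_a, k):
            flagged.append((name_a, name_b))
    return max(tms) - min(tms), flagged


# Hypothetical primers, not sequences from the actual panel.
pool = {
    "P1": "TTAGCCATGCAGGACT",
    "P2": "CCAGTCTTGGCAATGC",
    "P3": "GGACCTTAGGACCTTA",
}
spread, flagged = screen_pool(pool)
print(spread)   # 2
print(flagged)  # [('P1', 'P2')]
```

A wide Tm spread within one pool would be consistent with uneven annealing efficiency at a single annealing temperature, and flagged pairs would be candidates for redesign or for moving into separate pools.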
Previous studies of NGS data accuracy, sensitivity, specificity, and precision have highlighted the importance of using a proper bioinformatics pipeline for variant calling to produce optimal results (Shin et al, 2017), regardless of which NGS platform is chosen to replace Sanger sequencing (Sandmann et al, 2017). In addition, proper library preparation is often emphasized as a crucial step in any successful NGS analysis (Forth & Hoeper, 2019), including adding the proper amount of DNA and converting that DNA into a functional library that can be successfully sequenced. Previously published detailed technical notes emphasize several critical steps in generating high-quality NGS data, such as obtaining an adequate amount and quality of DNA, decontaminating the pre-amplification area, primer design, a gel test to confirm amplification of the expected PCR products, amplicon purification, and checking for potential primer-dimer formation (Wohlhieter et al, 2021).

Conclusion
Since it is generally accepted that the time elapsed between sample collection and analysis does not significantly influence the quality of isolated DNA, and based on the DNA quantification results, we conclude that sample quality was not a significant limiting factor in the sequencing challenges encountered during this research. It is reasonable to assume that the custom-made primer panel did not perform as expected in the NGS protocol. After optimizing the sequencing protocol and consulting the literature on the optimization of other custom-made sequencing panels, it became clear that accurate results cannot be obtained for 24 samples using the protocol established here.
As for future steps aimed at resolving the unsatisfactory library quantification results, the first option is to replace those samples with freshly collected whole blood samples and repeat the analyses. If this approach does not resolve the issue, the primer pools will be processed separately in order to obtain satisfactory library quantification data for the amplicons contained in primer pool 1. Finally, missing amplicons will be generated using the TruSight One Sequencing Panel (Illumina, San Diego, CA), a clinical exome panel covering more than 4,800 genes associated with human disease, including the genes of interest in the present research.

TABLE 1.
Classification of the study participants into three clinical manifestation groups (mild, moderate, and severe) based on observed symptoms (table taken and adapted from Baj et al, 2020).

TABLE 2.
Number of primer pairs used for sequencing the 16 target genes and their regulatory regions.

TABLE 3.
Library concentrations for two primer pools in the second set of 16 samples.