ParseCNV takes CNV calls as input and creates probe-based statistics for CNV occurrence (in cases and
controls, families, or a population with a quantitative trait), then calls CNVRs
based on neighboring SNPs of similar significance. CNV calls may be from aCGH,
SNP array, Exome Sequencing, or Whole Genome Sequencing.

A brief introduction: DNA base call GWAS deals with 3
possible states at each SNP [AA,AB,BB], where A and
B are each one of [A,T,C,G] and A≠B.

CNV association deals with 5 possible states at each
SNP [0,1,2,3,4], where 2 is typically the high-frequency state:
the normal diploid expectation of one maternal and one paternal copy. To leverage 3-state
association, we split the analysis into deletion [0,1,2] and
duplication [2,3,4] association.

Using both pre- and post-association quality control metrics is important
to limit the bias that overly stringent pre-association quality control would introduce.

CNVs are also called Copy Number Aberrations,
Changes, Differences or Polymorphisms (CNA, CNC, CND or CNP) depending on the
discipline, context and the level of specificity. CNVs are the popularly
assessed subset of Structural Variations, which include insertions and
inversions.

See the simple example input files (Cases.rawcnv, Controls.rawcnv,
Cases.fam, ChrSnp0Pos.map, IDToPath.txt) and try to run them as a test of the
software on your system. If you have trouble with your input files, create
similarly simple versions to quickly run the software and isolate the problem.

Dependencies: bash, R,
and plink. Try "which bash", "which R", and "which plink". If any returns "command
not found" instead of a path, try "module avail" and "module load
R"; otherwise download and install the missing dependency, then run "export
PATH=/YourDirectoryPath/R-3.1.2/bin:$PATH".
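
The dependency check above can be scripted. The following is a minimal sketch; the function name missing_tools is illustrative and not part of ParseCNV.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds (and prints the path) only if the tool is available
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools bash R plink)
if [ -n "$missing" ]; then
  echo "Not on PATH:" $missing
  echo "Try 'module avail' and 'module load R', or install the tool and extend PATH."
else
  echo "All dependencies found."
fi
```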

Format conversion
scripts for popular algorithms (QuantiSNP, XHMM, and GenomeStrip) are included. Typically this
reformatting is relatively easy in Excel. Calls for a given sample should be
sorted together in consecutive rows. CNState values: state1,cn=0
state2,cn=1 state5,cn=3 state6,cn=4, that is, PennCNV
states [1,2,5,6] correspond to CN [0,1,3,4]. These states are also referred to in the literature as
homozygous deletion, hemizygous deletion, hemizygous duplication, and
homozygous duplication, respectively. Be careful that there is no blank
line at the end of the file.

QuantiSNP: You have to download from here. The various other websites do not work. Make sure your files have the columns “SNPName Chromosome
Position LRR BAF”. The header is not read, and if you switch LRR and BAF no error
is given.

If you only have a
PED/BED genotype file, you need to use the Illumina Genome Studio Project with the IDATs from
your samples loaded and export BafLrr files. Use column chooser to show B
allele Freq and Log R Ratio along with SNP Name, Chr, and Position. Use the
PennCNV packaged script kcolumn.pl to convert the all-samples text file into
single-sample text files. See detailed instructions here: http://penncnv.openbioinformatics.org/en/latest/user-guide/input/
If you are using Affymetrix, see here:
http://penncnv.openbioinformatics.org/en/latest/user-guide/affy/

Once you have
generated a PennCNV .rawcnv file (using the --conf option), you can convert it into
a Plink CNV file using a simple bash command:

sed 's/:/\t/' Cases_wConf.rawcnv | sed 's/-/\t/' | sed 's/chr//' | sed 's/state.\,cn\=//' | sed 's/conf=//' | sed 's/numsnp=//' | awk '{print "0 "$7" "$1" "$2" "$3" "$6" "$10" "$4}'

or use
ParseCNV, which uses PennCNV .rawcnv as input directly. See the Plink CNV format
specification here: http://pngu.mgh.harvard.edu/~purcell/plink/cnv.shtml
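
As a portable alternative to the sed chain, the same column rearrangement can be sketched in a single awk call. This assumes the .rawcnv field order implied by the command above (location, numsnp=, length=, state/cn, sample file, startsnp=, endsnp=, conf=). The function name rawcnv_to_plink_cnv and the sample line are illustrative only.

```shell
# Convert PennCNV .rawcnv lines (with conf=) on stdin to Plink CNV columns:
# FID IID CHR BP1 BP2 TYPE SCORE SITES
rawcnv_to_plink_cnv() {
  awk '{
    split($1, loc, /[-:]/)                  # chr1:1000-2000 -> chr1, 1000, 2000
    chr = loc[1];  sub(/^chr/, "", chr)     # drop the chr prefix
    sites = $2;    sub(/^numsnp=/, "", sites)
    cn = $4;       sub(/^state[0-9],cn=/, "", cn)
    conf = $8;     sub(/^conf=/, "", conf)
    print 0, $5, chr, loc[2], loc[3], cn, conf, sites
  }'
}

# Hypothetical example call (made-up data)
printf 'chr1:1000-2000 numsnp=5 length=1,001 state2,cn=1 sample1.txt startsnp=rs1 endsnp=rs2 conf=20.5\n' |
  rawcnv_to_plink_cnv
```

Unlike the sed version, this does not depend on GNU sed interpreting \t in the replacement text.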

Otherwise, it is very hard to make any meaningful
inference of CNVs from a SNP genotype PED file. We can infer CNV by intensity (LRR)
alone, but not by genotype alone.

The best you could do is detect runs of homozygosity
using plink:

plink --ped file.ped --map file.map --homozyg

http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml#homo

This would flag regions, which could be deletions
(with intensity drop) or simply runs of homozygosity (which do not have the
intensity drop). You can convert the plink ROH output into a plink CNV input
file using Excel or awk. You can just put Type as 1 to presume they are mostly
single copy deletions.
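
The ROH-to-CNV conversion mentioned above might look like the following awk sketch. It assumes the plink .hom column order FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET (check the header of your own file), sets Type to 1 as suggested, and writes 0 as a placeholder score. The function name and the sample row are hypothetical.

```shell
# Read a plink .hom file on stdin and print Plink CNV columns:
# FID IID CHR BP1 BP2 TYPE SCORE SITES
hom_to_plink_cnv() {
  # NR > 1 skips the header; TYPE=1 presumes single copy deletion; SCORE=0 is a placeholder
  awk 'NR > 1 { print $1, $2, $4, $7, $8, 1, 0, $10 }'
}

# Hypothetical example call (made-up data)
printf '%s\n' \
  'FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET' \
  'F1 S1 2 1 rs1 rs9 1000 90000 89 120 0.742 0.992 0.008' |
  hom_to_plink_cnv
```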

You could similarly look for "runs of
no-calls", which would flag regions of homozygous deletion or duplication,
which again cannot be distinguished without the intensity data.

This is the ambiguity you would have to accept given
the limited data available.

Some papers say you can impute CNVs from SNP data,
but I am skeptical; at best this would detect mostly common CNVs, which are
typically of less interest.

FAM

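The family or FAM file has the family relationships
in the study subjects to track inheritance of CNVs. It is tab delimited with six
columns, as in a Plink .fam file (family ID, individual ID, father ID, mother ID,
sex, affected status).

IndividualID and Affected columns are required; the others
can be zeroed if not applicable.

For example (tab-separated): “1 1.txt 0 0 1 2”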

If you have Cases.rawcnv and the samples are just unrelated
cases, without any families or quantitative traits, you can just
run:

awk '{print $5}' Cases.rawcnv | uniq | sed 's/^/0\t/' | sed 's/$/\t0\t0\t0\t2/' > Cases.fam

MAP

The SNP definition file is four columns:

Chromosome ProbeID PositionCentiMorgan PositionBasePair

This is similar to a Plink map file
(PositionCentiMorgan is fine just zeroed out). A Sorted.map, sorted by position and
chromosome, is created automatically and should be used with the
resulting ped. The map could be an intersection or union of chip types if the study is
across platforms. Be sure that all SNPs used for CNV calling are provided.

For example (tab-separated): “1 rs1739822 0 16196820”

If you are using
sequencing data or are uncertain about the full set of probes, a blank or
incomplete file can be used as the map input and the map will be dynamically
determined by the start and end positions (breakpoints) of the CNVs observed in
the .rawcnv files.

The full .map chr snp pos
information should be available on the Illumina/Affymetrix website, baflrr
"signal intensity files" PennCNV inputs, in Illumina GenomeStudio / Affymetrix
Genotyping Console, or PennCNV .pfb file. You can use the Linux awk command to
rearrange the columns if needed.

Chr X should be X, not 23 as in some plink processed
files.

For example, if you have the PennCNV .pfb, run:

awk '{print $2"\t"$1"\t0\t"$3}' ChipX.pfb > ChipX.map

ParseCNV will automatically define probes missing in
the map according to the .rawcnv definitions but this
should only be utilized for a small number of probes to properly represent the
actual data resolution.

--idToPath <file.txt> Additional
SampleID to BafLrrFilePath file for specific BAF LRR access of all associated
samples for associated CNVRs. The input file should contain 2 columns: IDs as
in column 5 of .rawcnv and the full path to the BAF LRR
(signal intensity) file. In many cases, the 2 columns may be the same.

--batch <file.txt> A
subset of samples which form a discernible batch whose contribution should be monitored,
such as a different chip type/version, processing lab, sample type/source,
or reagent batch.

--mergePVar <1> Change
the default of merging probes with p-values differing by no more than one power of
ten.

--mergeDist <1000000> Change the
default of merging probes with proximal distance no more than 1,000,000 base pairs
(1 Mb). The default is good for 500k arrays, but for 2.5M arrays, which have 5x more markers, use 1/5
of 1 Mb = 200,000.
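
The 2.5M example above can be turned into a small calculation. This sketch assumes the merge distance should scale inversely with marker count relative to the 500k reference, which generalizes the single example given here. The function name is illustrative.

```shell
# Scale the 1 Mb default (suited to ~500k markers) inversely with marker count.
# $1 = number of markers on the array
merge_dist_for_markers() {
  echo $(( 1000000 * 500000 / $1 ))
}

merge_dist_for_markers 2500000   # prints 200000, a candidate value for --mergeDist
```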

--tdt Specify
a family based study with trios to allow TDT deletion and duplication p-values
to drive CNVR boundary determination rather than case:control.

--maxPInclusion <0.05> The maximum p-value of
probe based statistics to include in CNVR determination and thus reported
associations.

- Run ParseCNV with --includePed and --out <myPrefix> (Cases.rawcnv can
have all samples and Controls.rawcnv can just be a blank file with no blank
line at the end, e.g. NullControl_ForQuantitativeTrait.rawcnv).

- Run plink --ped CNVDEL.ped --map myMap_UseWithPed.map --make-bed --out
CNVDEL_Bed and again with CNVDUP.ped (this will compress the data and allow the .fam to be
easily edited; just make sure you keep the sorted order of the .fam the same).

- To
correct for population variation, add “--covar myMDSResult.mds” or “--covar
myPCAResult.pca.evec” to the plink command as detailed below. (Note
--assoc and --tdt do not work with --covar, so use --linear or --logistic
instead.)

- If you want to use other p-value
generating tools such as GEMMA, update the header to be “SNP”, “BETA” or “T” or “OR”,
and “P”; just note the InsertPlinkPvalues dependencies.

A collapsing approach is used to combine non-overlapping
CNVs which all overlap the same gene.

A gene based association approach is now available in
ParseCNV_GeneBased.pl:

perl ParseCNV_GeneBased.pl Cases.rawcnv Controls.rawcnv

Sort by OR>1 for case-enriched
significant genes.

The gene based approach will
give many more potential hits because it is more flexible and sensitive. It has
fewer features than the main program, but it is usable. It should work in a similar way for
METAL with all necessary input columns.

The gene-based association collapses CNVs based on
overlapping any sub-segment of the gene rather than the same sub-segment as in
the case of a CNVR defined by ParseCNV. Thus one case may overlap exon 1 of
gene x while another case may overlap exon 2 of gene x and yield significance
in a gene-based association but not in a traditional CNVR SNP-based statistics
strict overlap association.

I have tested bosTau7 Cow and susScr3 Pig
specifically to work. If you have a different species or build you can try
replacing “bosTau7” in the commands below. Mouse,
chimp, and fly should work. You may need to run the UCSC LiftOver web tool if
definition files are not available in your build version.

wget http://www.nature.com/nature/journal/v491/n7424/extref/nature11622-s3.xls ### xls2csv and xls2txt could work, but the converted file is simply included in the downloads: <susScr3_PigSegDups_GroenenMAetalNature2012SupTable16.txt>

The input is calls in PennCNV format with a confidence
column and an Algorithm column, sorted by ID and algorithm. You also need to provide
a map file for all probes being considered. Make sure the map is sorted by
position and chromosome (sort -n -k 4,4 file.map | sort -s -k 1,1 > file_Sorted.map).

The
program does not currently generate new overlapping-segment
consensus CNV calls from multiple algorithm comparison for further
analysis. I have used PennCNV calls that had good overlap in QuantiSNP rather
than generating an overlapping-segment consensus call. Many CNV calling
algorithms are conservative in the boundary calls, especially if you did not
run the PennCNV clean_cnv.pl script to merge fragmented CNVs.

Make sure you look at either PvsQ or QvsP CallMatching and
MatchSummary; looking at both will be confusing. For the QvsP output, the first
9 columns refer to the QuantiSNP calls. The next 6 columns refer to the overlap
profile of the QuantiSNP call with the PennCNV call. The next 9 columns
starting with RefCallMatchingCN_CNVBoundariesChrStartStop refer to the
overlapping PennCNV call with the same CN contributing to the “hitsnp” column.
The next 9 columns starting with RefCallDiffCN_CNVBoundariesChrStartStop refer
to the overlapping PennCNV call with a different CN contributing to the “hitDiffCN”
column.

The columns with numbers in the QvsP.txtCNVMatchSummary.txt
file are the hitDiffCN mismatches. For example, 01 is the count of QuantiSNP
calls that were CN=0 but PennCNV called as CN=1.

PennCNV_ConcordInh_NoROH.txt

We see full overlap of sample level rawcnv calls of
the three algorithms PennCNV, QuantiSNP, and Nexus with reference to the
PennCNV calls (indicated by “vsP”), except PennCNV chr1:8-10,
where the other algorithms called chr1:8-9, but this is still a consistently detected CNV.

We see full inheritance reporting with inheritance
from mother chr1:1-3, inheritance from father
chr1:4-7, and de novo chr1:8-10 for the proband CNV calls. We see inheritance
to proband chr1:1-3 for the mother CNV call. We see
inheritance to proband chr1:4-7 for the father CNV
call. In the case of a CNV overlapping but with a different CN state between
family members, similar columns are reported with an additional column
specifying the different CN state. Certainly CN 3 and 4 seem the most typical
permissible different CN to allow as a match, but others may indicate abnormal
noise patterns.

InsertPlinkPValues input is a general tab-delimited file with a header containing P, SNP, and one of
BETA/T/OR.

You
can run for each study to get study specific red flag confidence scores, or run
ParseCNV and Plink again on the combined data to use as input for combined
study red flags.

ParseCNV generates the output *CNV_Brief.stats which
is the SNP based CNV statistics which can be used for meta-analysis using
METAL. Since you have different arrays, the intersection SNP set will be low so
I would use the gene name as the marker rather than the SNP name. Using PennCNV
scan_region.pl with input chrX:start-stop for each SNP
will annotate the closest gene to each SNP. Then use sort -g in Unix
to sort by p-value so the best (lowest) p-value is first, since that is the one
METAL will use.

Run ParseCNV for each study to generate
the p-values for each study separately.

You need to rearrange *CNV_Brief.stats into a
scan_region.pl input using:

PennCNV is the motivating input. The PennCNV input is
very human readable, widely used and informative. CNV .vcf is widely used in
sequencing applications, but not human readable. Therefore, the idea is to
convert CNVs called by different algorithms into PennCNV format with a final
column stating the algorithm for tracking. In most cases, the CNV calls file
columns and coding can be edited relatively easily in Excel. The main exception
is BirdSuite and GenomeStrip .vcf matrix formats. For these common yet very
different formats, use ConvertVCFToPennCNV.pl. For the commonly used QuantiSNP
conversion use ConvertQuantiSNPToPennCNV.pl.

Similar settings should be used between algorithms to
ensure comparability such as using GC base content correction.

Nexus “Min Region” should be used, not “Chromosome
Region”, since the latter is the average position between probes. Agilent Genomic Workbench/Cytogenomics
has the sample ID listed only once as a header of each CNV calls subsection.

Consider CallRate, LRR_SD, GCWF, CountCNV, PCA, and
PI_HAT as the most important sample quality metrics. CallRate and LRR_SD are
widely used and critical to evaluate. The problem is that these metrics are derived
from various sources: even if you use SNP arrays with PennCNV, only LRR_SD and
GCWF are provided in the PennCNV .log. CountCNV is automatically determined from
the .rawcnv files. CallRate, PCA, and PI_HAT are based
on A/T/C/G genotyping, not CNVs. For CallRate, you can provide Plink --missing .imiss, Illumina LIMS Project Detail Report .csv, or
Illumina Genome Studio Samples Table. PCA population stratification components
can be Eigenstrat smartPCA .pca.evec or Plink .mds.
PI_HAT is from the Plink --genome .genome file. PCA is not
automatically used to create IDs to remove, which would be done for a study
where almost all samples are tightly clustered and a few outliers are then
removed. Typically the PCA shows admixture and various ethnicities and
population stratification correction without sample removal based on PCA is the
best approach.

See QC_Plot.pdf sample QC distributions with outliers
to remove in blue and corresponding QC_RemoveIDs.txt with ChipIDs failing the
various QC metrics. You may delete some rows from this file if you want a more
aggressive ChipID/Sample inclusion or if the threshold does not look correct on
the plot. Remove thresholds for each quality metric are determined by the R
package extremevalues function getOutliers. You can also use exclusions based
on other PennCNV log metrics provided in QC_Plot_2.pdf and QC_RemoveIDs_LRRmean_BAFmean_BAFSD_BAFDRIFT_WF.txt.

Then run this code, found in the GeneRef folder, to
create a "clean" .rawcnv file to use for ParseCNV
association:

perl FilterCNV.pl QC_RemoveIDs.txt CNVCalls.rawcnv 5 remove

UPDATE: The R getOutliers is not very dynamic in
selecting outliers based on a variety of possible data quality metric
distributions. This typically results in too many samples being excluded.
Again, look for linear mode for inclusion and exponential mode for exclusion.
This can be easily done in R or Excel. Once an appropriate threshold is
determined from reviewing the first run distributions or from a static known
threshold for consistency, ParseCNV_QC.pl can be run again specifying
thresholds to drive the plots and sample removal report.

Run once with no
thresholds specified and review automatically determined thresholds based on
outliers in your dataset.

Then you may
specify some adjusted threshold for a given quality metric based on your review
of the plots.

Removing CNV calls may introduce significant bias and
there are a limited number of metrics to make an informed decision. Probably
the best metric is the confidence score based on the cumulative probability
from the HMM. ParseCNV annotates average numsnp, length, and confidence of each
potential CNVR association which captures this
quality/confidence issue with less bias than upfront CNV call quality control.

Step by Step Processing Description:

CNV calls are mapped into SNP based statistics which are
then merged into CNVRs based on proximity (1Mb) and similar significance (power
of ten P value) of neighboring SNPs.
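
The merging rule described above can be illustrated with a short awk sketch. It is not the ParseCNV implementation, only one reading of the rule: a neighboring SNP joins the current region when it is on the same chromosome, within the distance limit of the previous SNP, and its log10 p-value differs from the previous SNP's by at most the allowed amount. The function name, the three-column input (chr, position, p) and the output layout are assumptions for illustration. Input is assumed to be sorted by chromosome and position and already restricted to the probes of interest.

```shell
# Merge position-sorted SNP rows ("chr pos p") into CNVRs.
# $1 = max gap in bp (default 1000000), $2 = max log10(p) difference (default 1)
# Output: chr:start-stop <tab> SNP count <tab> lowest p-value in the region
merge_snps_to_cnvrs() {
  awk -v dist="${1:-1000000}" -v pvar="${2:-1}" '
    function lg10(x)   { return log(x) / log(10) }
    function absval(x) { return x < 0 ? -x : x }
    function emit()    { if (n) printf "chr%s:%d-%d\t%d\t%g\n", chr, start, stop, n, minp }
    {
      p = $3 + 0; lp = lg10(p)
      if (n && ($1 "") == chr && $2 - stop <= dist && absval(lp - lastlp) <= pvar) {
        stop = $2; n++                       # extend the current region
        if (p < minp) minp = p
      } else {
        emit()                               # close the previous region, if any
        chr = $1 ""; start = $2; stop = $2; n = 1; minp = p
      }
      lastlp = lp
    }
    END { emit() }'
}

# Hypothetical example call (made-up data)
printf '%s\n' '1 1000 0.001' '1 5000 0.002' '1 9000 0.5' '1 3000000 0.0005' '2 100 0.0004' |
  merge_snps_to_cnvrs
```

In this made-up input the first two SNPs merge into one region, the third is split off because its p-value differs by more than one power of ten, the fourth because it is more than 1 Mb away, and the fifth because it is on another chromosome.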

The most significant subregion is presented in the
case of multiple significant proximal CNVRs.

Deletion and Duplication p values are then calculated
by pooling 0 and 1 copy for deletion and 3 and 4 for duplication.

The runtime log lists the processing steps:

scan_range2: Takes SNP based statistics and
collapses into CNVRs based on distance 1Mb and p-value one power of ten in
negative log.

scan_region: Takes CNVR and annotates nearest
UCSC gene (more inclusive than RefSeq genes) and proximal distance until gene
boundary (including introns, 0 is direct overlap). Definition files can be
updated or different builds used by downloading from UCSC Tables.

scan_DescPathway: Takes CNVR with gene and gives full
text description of gene name and pathway (many not_found in current version of
definition file).

scan_rangeJustPos2: Takes SNP based statistics and
gives SNP indexes based only on distance 1 Mb to limit redundancy of small
regions with different significance levels.

CNVToPed: Only run with --includePed since the output files are large. Del
and Dup Ped files are created for more statistics in Plink. In the Del Ped,
state1,cn=0 → 1 1, state2,cn=1 → 1 2, and other → 2 2. In the Dup
Ped, state5,cn=3 → 1 2, state6,cn=4 → 1 1, and other → 2 2. This definition
closely resembles the CNV state frequencies (1 1=rare, 1 2=common, 2 2=very
common). Note many probes will always be diploid (no CNV calls), so NA will
result in Plink. The automatically created Sorted.map should be used with the
resulting ped.
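
The Del and Dup Ped encoding above can be written out as a small lookup. This is only an illustration of the mapping as stated; the function name is hypothetical.

```shell
# Encode a copy number (0-4) as the two-allele genotype used in the Del or Dup Ped.
# $1 = "del" or "dup", $2 = copy number
cn_to_ped_genotype() {
  case "$1:$2" in
    del:0) echo "1 1" ;;   # homozygous deletion
    del:1) echo "1 2" ;;   # hemizygous deletion
    dup:3) echo "1 2" ;;   # CN=3 duplication
    dup:4) echo "1 1" ;;   # CN=4 duplication
    *)     echo "2 2" ;;   # everything else counts as the reference state
  esac
}

for cn in 0 1 2 3 4; do
  echo "CN=$cn del=[$(cn_to_ped_genotype del "$cn")] dup=[$(cn_to_ped_genotype dup "$cn")]"
done
```

Note that a duplication is coded 2 2 in the Del Ped and a deletion is coded 2 2 in the Dup Ped, which is how the 5 CN states are split into two 3-state analyses.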

vlookup SNP to Region ID: Add column Region ID.

countBarcodeOCCURENCE_V2: Count Region ID occurrences.

vlookup Count Sig Regions: Add column Count Region ID.

Sort by p-value: Sort from Chr-Pos to low->high p-value.

Data-filter-advanced-unique records only:
Include the first occurrence of each Region ID to filter out many significant
regions close together, keeping the lowest p-value occurrence.

SpecificBafLrrAccessMany: Assumes all BafLrr files have
ProbeID as the first column and have the same probe sort order, which can be different
from the map. If Probe ID is not the first column, provide a probe order reference file
--probesBafLrr <file.txt>. Needs additional SampleID to BafLrrFilePath
file specified on command line by --idToPath <file.txt>. Highly
recommended if BafLrr files available. Takes CNVRs and contributing sample IDs
and creates files with BAF and LRR values for each CNVR with all samples and a
master file with all CNVRs and signals in all samples with CNV contributing to
association in BafLrr Folder. Used for review of specific association region
across many samples.

Main output: CNVR_ALL_ReviewedCNVRs.txt

Note results sorted by deletion p
are followed by results sorted by duplication p (see CNVType column).

Copy and paste text into
EXCEL to review output! Do not try viewing in shell terminal except perhaps
*_brief.txt.

For
big data, cells may overflow to the next line since the total number of
characters that a cell can contain is 32,767. The Sample
ID columns are usually the problem. You will notice this problem if the first
column is not all chr:start-stop values. To fix this,
add a column with an index, sort by the chr:start-stop
column, delete rows not starting with “chr” at the top and bottom, and sort by
the index. Full details will not be present for very common CNVs, but these are
typically not of interest anyway. Also consider making Sample IDs a simple
index number rather than a lengthy text string. You can use vlookup in Excel to
replace sample IDs with index sample IDs in your input
.rawcnv file.
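
The same sample ID replacement can be done outside Excel. The sketch below assumes the sample ID is column 5 of the .rawcnv, as stated for --idToPath. The function name and file names are hypothetical. Rewriting a field makes awk re-join the line with single spaces, and the FAM and --idToPath files would need the same index IDs, so treat it as a starting point.

```shell
# Replace sample IDs (column 5) with short index numbers.
# $1 = input .rawcnv, $2 = output .rawcnv, $3 = key file (index <tab> original ID)
index_sample_ids() {
  awk -v key="$3" '
    !($5 in idx) { idx[$5] = ++n; print n "\t" $5 > key }   # first time this ID is seen
    { $5 = idx[$5]; print }' "$1" > "$2"
}

# Example use (hypothetical file names):
# index_sample_ids Cases.rawcnv Cases_indexed.rawcnv Cases_idkey.txt
```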

Deletion and Duplication significant
CNVRs with counts in cases and controls and p-value. Sample IDs, CN states, CNV calls
contributing to each association and many other statistics and tracking data
are provided (100 fields in total verbose output and 40 highly informative
fields in brief output).

UCSC custom track bed file with CNV calls.

CNV_ContributingCalls.txt: CNV contributing calls for
each CNVR

CNVDEL.ped CNVDUP.ped: deletion and duplication .ped files
generated for further statistics in Plink.

CNVR_ALL_ReviewedCNVRs: All significant CNVRs with CNVType,
redFlagCount, and redFlagReasons.

People are often confused by the small range and
value of the first 2 columns: CNVR and CountSNPs. This is the most significant
overlapping region, not the entire span of contributing CNVs. CNVRange and
AvgNumSnps columns reflect the entire span of contributing CNVs. CNVs at a
particular locus do not have precisely the same boundaries, especially rare
CNVs (see paper Figure 3).

For querying replication based on genomic coordinate
range use FindSnpsInRange.pl StatsFile Chr Start Stop (much better than grep
cytoband since not specific enough).

AllRes.pdf: Image created with BAF and LRR of
specific association regions across contributing cases with --idToPath option

If you get the error: “-bash: convert: command not
found” run: “sudo apt-get install imagemagick”. This command combines the BAF
LRR plots into a single image to scroll through CNVRs sorted by significance.

If you have a local R install due to lack of
administrator privileges on your system, you need to provide the full path to
the R executable such as "/work/1a/joe/R/R-3.1.2/bin/R". This path
needs to replace "R" in instances of "R CMD" in these 3
scripts:

5x10^-4 is a conservative bar for CNV
genome-wide significance surviving multiple testing correction based on
analysis of Illumina and Affymetrix genome-wide SNP arrays. The typical bar of
5x10^-8 used in GWAS is not appropriate for CNV considering:

- the number of probes with a nominal frequency of CNV occurrence (only
probes with some CNV detected are informative)

- the number of probes with enrichment in cases vs. controls and vice versa
(evidence of more case enriched loci than control enriched loci)

- probes with less than 1% population frequency of CNV (optionally)

- the number of CNVRs (multiple probes are needed to detect a single CNV and
should not count separately for multiple testing correction)

These are done in order of increasing effort per
locus but the number of loci will be filtered down by each step.

Red flag count <= 3 may be considered high
confidence results. More important red flags to fail a CNVR as low confidence
include: AvgProbes<5, DgvEntries>10, PenMaxP>0.5 and high frequency,
SegDups>10, Recurrent, FreqInflated,
AvgConf<10, ABFreqLow.

Standard filter for highly significant and confident
results for SpecificBafLrr visual review: sort RF<=3, and (delP<5x10^-4 and
ORDel>1) or (dupP<5x10^-4 and ORDup>1), (on exon)

DgvEntries (>10): Overlapping multiple Database of Genomic
Variants (DGV) entries, representing CNV signals observed in “healthy”
individuals, suggests that a potential association result in the study at
hand may be false.

TeloCentro (any overlap): Residing at centromere- and telomere-proximal
regions, which often have sparse probe coverage and only a single
flanking diploid reference on which to base CNV calls.

AvgProbes (<5): CNVs captured with a low average number of
probes contribute to the association with low confidence. If an association
depends on a preponderance of small CNVs, the likelihood of a false positive is
high.

Recurrent (any overlap): Locus frequently found in multiple studies,
such as TCR, Ig, HLA, and OR genes. TCRs undergo somatic rearrangement due to
VDJ recombination, causing inter-individual differences in the clonality of
T-cell populations, and thus are not true CNVs, necessitating exclusion.

PopFreq (>0.01): CNV regions with high population frequency
(for rare CNV focused studies) indicate that probe clustering is likely
biased due to a high percentage of samples with CNV used in clustering
definition, thus biasing CNV detection.

PenMaxP_Freq_HighFreq (PenMaxP >0.5, Freq >0.5, HighFreq >0.05): CNV peninsula of common CNV (sparse probe
coverage and nearby high frequency CNV) indicates that within the range of
contributing CNV boundaries there is a non-significant (p>0.5) p-value which is notably different from the CNVR association,
typically due to random extension of common CNVs to neighboring sparse or
noisy probes. PenMaxP is the worst p-value in the span of CNV calls
contributing to the significant CNVR. Freq is the frequency of this PenMaxP
worst p-value. HighFreq is the frequency of any non-nominally significant
p-value (P>0.05).

FreqInflated (>0.5 sids at this locus have
>(maxInflatedSampleCount-2) occurrences in all significant results)

A large gap in probe coverage exists within
the CNV calls, indicating uncertainty in the continuity of a single CNV event,
typically due to dense clusters of copy number (intensity only) probes with
large intervening gaps.

ABFreq (<1% values (0.1,0.4) or (0.6,0.9)): For duplications, AB banding of BAF at 0.33
and 0.66 for CN=3 or 0.25 and 0.75 for CN=4 are very important observations
given the relatively modest gain in intensity observed in duplications.

AvgConf (<10): The HMM confidence score in PennCNV is a
superior indication of CNV call confidence compared to numsnps and length in
studies comparing de novo vs. inherited CNV calls, giving an indication of
the strength of the CNV signal or aggregate difference in probability between
the called CN and the next highest probability CN. Other CNV calling
algorithms give confidence scores in a different range, or lower values might mean
more confidence (i.e. call p-value), so the threshold may need modification. It is
recommended, but not required, to be in the .rawcnv file as column 8, i.e. “conf=20.659”.

AvgLength (<10kb): A classical confidence scoring parameter is
the length of the CNV. If the CNV is too small, it is submicroscopic and even
if many probes are tightly clustered, bias of local DNA regions and probe
overlap make confidence difficult.

If you ran the case:control analysis
(default), look at “DelTwoTail” and “DupTwoTail”.

If you ran --tdt, look at “TDTDel”
and “TDTDup” for the TDT p-value and “NormParDel” and “NormParDup” for the de novo count.