1: These builds can still run on AMD processors, but they're statically linked to Intel MKL, so some linear algebra operations will be slow. We will try to provide an AMD Zen-optimized build as soon as supporting libraries are available.

Source code and build instructions are available on GitHub. (Here's another copy of the
source code.)

3 Jul: .fam/.psam files now load properly when
only the IID column is requested or present.

29 Jun: .bim/.pvar files with more than ~134
million variants load properly again (given sufficient memory).

25 Jun: "--pca approx" eigenvalues should now
be (approximately) correct (they were previously double what they should have
been). Fixed a few odd-sample-count export cases which were broken around 30
May.

22 Jun: Fixed a few log messages which were
broken in the 19-20 Jun builds. Added debug-print code to support an ongoing
multithread-VCF-dosage-import bug investigation. If you are encountering
mysterious "File read failure" errors during VCF import or "Malformed .pgen"
errors when reading the result, adding "--threads 1" to your VCF-import command
will probably solve your immediate problem. If you can also send me a .log
file from the failing multithreaded run (or, even better, test data), that would
be very helpful.

20 Jun: Fix GRM/PCA/score-computation bug
introduced on 30 May. If you used the 30 May or
an early June build for GRM/--pca/--score, you should repeat the operation(s)
with this build; apologies for the error.

19 Jun: Fixed rare --ref-allele/--alt1-allele
corner case which could occur when a missing allele was replaced with a very
long allele.

21 May: --pgen-info command added (displays
basic information about a .pgen file, such as whether it has any phase or
dosage data).

20 May: Unbreak --import-dosage + --map.

17 May: --import-dosage and .gen import were
broken for the last several weeks; this should be fixed now. A1 column added
to --adjust output in preparation for multiallelic variants. --glm 'a0-ref'
modifier renamed to 'omit-ref'.

15 May: Fixed chrX allele frequency
computation bug when dosages are present. --ld modified to be based on major
instead of reference alleles, to play better with multiallelic variants.
--hardy header line and allele columns changed in preparation for multiallelic
variant support.

8 May: --vcf dosage=HDS should now handle
files with no DS field properly.

28 Apr: Fixed a --glm bug which occurred when
autosomes and sex chromosome(s) were both present, or both chrX and chrY were
present. If you performed a whole-genome --glm
run with the 9 Feb 2018 build or later, you should rerun with the latest
build. However, single-chromosome and autosome-only --glm runs were
unaffected by the bug.

22 Apr: --export bgen-1.2/bgen-1.3 should now
work for chrX/chrY/chrM; also fixed import bugs for those chromosomes.

16 Apr: --ref-from-fa contig line parsing
bugfix.

14 Apr: --export bgen-1.2/bgen-1.3 implemented
for autosomal diploid data. Operations like --pca which require decent allele
frequencies now error out when frequencies are being estimated from fewer than
50 samples, unless you add the --bad-freqs flag. Phased dosage support
implemented. Sample missingness rate in exported .sample files is now based on
dosages rather than hardcalls. Non-AVX2 phase subsetting bugfix. --vcf +
--psam bugfix. --vcf dosage= now ignores the hardcall when a dosage is
present; instead, it's regenerated under --hard-call-threshold 0.1 (unless you
specified a different threshold). --bgen 'ref-second' modifier renamed to
'ref-last', to generalize properly to multiallelic variants.

31 Mar: --export haps{legend} should now work
properly when --ref-allele/--ref-from-fa/etc. flips some alleles in the same
run.

23 Mar: Windows builds should work properly
again (the 20-21 Mar Windows builds were badly broken). --glm now supports
log-pvalue output (add the 'log10' modifier), and these remain accurate below
the double-precision floating point limit of p=5e-324.
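
To illustrate why log-scale output helps, here is a minimal Python sketch. It is not PLINK's implementation: the function name `log10_two_sided_p` and the asymptotic-series approach are assumptions chosen for illustration. It computes the log10 p-value of a large normal Z-score entirely in log space, whereas the direct p-value underflows to zero.

```python
import math

def log10_two_sided_p(z):
    """Approximate log10 of the two-sided normal p-value for a large |z|.

    Hypothetical helper: uses the leading terms of the asymptotic expansion
    of the normal tail, Q(z) ~ phi(z)/z * (1 - 1/z^2 + 3/z^4), and works in
    log space so nothing underflows. Only intended for |z| well above ~5.
    """
    z = abs(z)
    ln_density = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)  # ln(phi(z))
    ln_series = math.log(1.0 - 1.0 / z**2 + 3.0 / z**4)
    ln_p = math.log(2.0) + ln_density - math.log(z) + ln_series
    return ln_p / math.log(10.0)

z = 40.0
naive_p = math.erfc(z / math.sqrt(2.0))  # direct two-sided p-value; underflows
print(naive_p)                # 0.0
print(log10_two_sided_p(z))   # finite, roughly -349
```

A p-value of 0.0 cannot be ranked against other extreme hits, while the log10 value still can.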

11 Mar: Fix --pheno segfault in last week's
builds that could occur when the file didn't have a header line.

9 Mar: Fix "File write failure" bug that
occurred when a single write operation was larger than 2 GB (this could occur
when running --make-bed with more than 128k samples). Reduced --make-bed
memory requirement.

FID is now an optional field: if it isn't in the input .psam file, it's
omitted from several output files by default (these now have 'maybefid'
and 'fid' column sets, where the default set includes 'maybefid'), and
treated as always-'0' by any operation which requires FID values (such as
--make-bed). When exporting genomic data files, 'maybefid' also treats
the column as missing if all remaining values are '0'.

Relatedly, when importing sample IDs from a VCF or .bgen file, the
default mode is now "--const-fid 0", and no FID column will be written to
disk at all. --keep, --remove, and similar commands also now have
"--const-fid 0" semantics when an input line contains only one token.
You can now act as if IID is the only sample ID component, if that's what
makes the most sense for your workflow. Conversely, it is now necessary
to explicitly use --id-delim when you want to split the VCF/.bgen sample
IDs into multiple components.

MT is treated as a haploid chromosome again. In PLINK 1.9 and earlier
plink2 builds, MT was treated as diploid-ish to avoid throwing away
information about heteroplasmic mutations; as a consequence, the
--glm(/--linear/--logistic) genotype column and commands like "--freq
counts" used a 0..2 scale. Now that plink2 has proper support for
dosages, this kludge is no longer necessary.

--glm's 't' column set has been renamed to 'tz', to reflect it being a
T-statistic for linear regression but a Wald Z-score for
logistic/Firth. The corresponding column in .glm.logistic{.hybrid} and
.glm.firth files now has 'Z_STAT' in the header line.

Also, --glm now defaults to regressing on minor instead of ALT allele
dosages (this can be overridden with 'a0-ref').

The final alpha 1 build has been tagged in GitHub, and will remain
downloadable from here for the next few months.

11 Feb: .king.cutoff.in/.king.cutoff.out files
now end in .id, for consistency with other output files with sample IDs and no
other information. Similarly, --mind's output file now has the extension
.mindrem.id and defaults to having a header line. You can now use
--no-id-header to suppress the header line (and force the columns to be
FID/IID) in all .id output files.

9 Feb: Forcing .pvar QUAL/FILTER output when
no such values are loaded no longer causes a segfault.

5 Feb: AVX2 phase-subsetting bugfix.

3 Feb: --score 'dominant' and 'recessive'
modifiers added.

30 Jan: Fix .pgen writing bug which occurred
when the number of variants was a multiple of 64 and the number of samples was
large.

24 Jan: "--export oxford" now supports
bgzipped output.

21 Jan: --glm now always reports an additional
'A1' column, indicating which allele(s) correspond to positive genotype column
values. --glm column sets have been changed to revolve around A1 instead of
ALT, so minor script modifications may be necessary when switching to this
build.
In this build, A1 and ALT are still synonymous. This will change in alpha 2:
A1 will default to the minor allele(s) to reduce multicollinearity (imitating
PLINK 1.x's behavior in the absence of --keep-allele-order), though you will
still have the option of forcing A1=ALT.

12 Jan: Fixed "--glm interaction" bug that
occurred when multiple consecutive variants had no missing calls.
We recommend redoing all --glm runs with the
'interaction' modifier which were performed with a build produced between 27
Nov 2017 and 10 Jan 2018 inclusive.

9 Jan: Added 'no-idheader' modifiers to a few
commands, and made that the default for --make-grm-bin/--make-grm-list to avoid
breaking interoperability.

7 Jan: --vcf can now be given a sites-only VCF
when the run doesn't require genotype data. Sample ID files, such as those
produced by --write-samples, now include a header line by default; this will be
necessary to distinguish between FID-IID and IID-SID output in the future.
(With --write-samples, you can suppress the header line by adding the
'noheader' modifier.)

18 Dec: --extract/--exclude can now be used
directly on UCSC interval-BED files (ok for coordinates to be 0-based or for
no 4th column to be present). "--output-chr 26" now causes PAR1/PAR2 to be
rendered as '25' (for humans), to restore interoperability with programs like
ADMIXTURE which can't handle alphabetic chromosome codes. --merge-x
implemented (usually needs to be combined with --sort-vars now). --pvar can
usually handle 'sites-only' VCF files (e.g. those released by the gnomAD project) now.
--thin, --thin-count, --thin-indiv, and --thin-indiv-count implemented.
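
As a reminder of what "0-based" means for interval-BED input, the following Python sketch converts a standard UCSC BED line (0-based start, exclusive end) into a 1-based inclusive interval, and tolerates a missing 4th column. The function name `parse_bed_interval` is hypothetical, and this only illustrates the usual BED convention; it is not PLINK's parser.

```python
def parse_bed_interval(line):
    """Parse one UCSC interval-BED line into (chrom, start, end, name).

    The returned start/end are 1-based and inclusive. The name is None
    when the optional 4th column is absent.
    """
    fields = line.split()
    chrom = fields[0]
    start0 = int(fields[1])  # 0-based, inclusive
    end = int(fields[2])     # 0-based exclusive, which equals 1-based inclusive
    name = fields[3] if len(fields) > 3 else None
    return chrom, start0 + 1, end, name

print(parse_bed_interval("chr1\t999\t2000"))
print(parse_bed_interval("chr2\t0\t500\tregionA"))
```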

15 Dec: Fixed --extract-if-info and
--exclude-if-info's behavior for non-numeric values which start with a number.
Existence-checking flags renamed to --require-info and --require-no-info for
naming consistency.

11 Dec: --king-table-subset flag added. This
makes it straightforward to perform two-stage relationship/duplicate detection:
start with --make-king-table on a small number of higher-MAF variants scattered
across the genome, and then rerun it with --king-table-subset on an appropriate
subset of candidate sample pairs from the first stage. --bp-space implemented
(useful for the first stage above).
The two-stage workflow was first implemented by Wei-Min Chen in a recent
version of KING;
contact him for citation information.

7 Dec: Fixed bug which could occur when
filtering samples from a phased dataset. Windows AVX2 build now available.

28 Nov: --import-dosage 'format=infer' (this
is now the default) and 'id-delim=' (needed for reimport of "--export
A-transpose" data) options added. Fixed --import-dosage bug that caused it to
error out on missing genotypes under format=1. --no-psam-pheno (or
--no-pheno/--no-fam-pheno) can now be used to ignore all phenotypes in the
sample file, while keeping the phenotype(s) in the --pheno file if one was
specified.

14 Nov: Fixed bug that caused --export A{D} to
hang when the number of variants was between 65 and about a thousand.

4 Nov: Linux and OS X prebuilt AVX2 binaries
now available; these should work well on most machines built within the last 4
years. Fixed another Firth regression spurious NA bug. Fixed --score bug that
occurred when sample filter(s) were applied simultaneously. Fixed a --ld
phased-hardcall handling bug. Array-popcount upgrade in progress (thanks to
recent work by Wojciech Muła, Nathan Kurz, Daniel Lemire, and Kim
Walisch).

3 Nov: Fixed multipass --export A{D} bug.
--dummy dosage-freq= now fills in hardcalls with the default
--hard-call-threshold cutoff of 0.1 when --hard-call-threshold is not
explicitly specified.

16 Oct: --ref-from-fa flag implemented, to set
reference alleles from a FASTA file. (Note that this may be unable to
determine which allele is reference when length changes are involved, but it
should always work for SNPs and multi-nucleotide polymorphisms.) --update-name
implemented. Fixed column-set parsing bug in 13 Oct build.

9 Oct: Fixed --ld's handling of some dosage
and haploid cases. Fixed bug which could cause --make-pgen to discard
phase/dosage information when extracting a small variant subset. --geno-counts
no longer double-reports chrY counts.

8 Oct: --ld implemented, with support for
phased genotypes and dosages (try "--ld [var1] [var2] dosage"). Fixed tiny
bgen-1.1 import bug that triggered when the number of threads exceeded the
number of variants. Allele frequency computation no longer crashes on chrX
when dosages are present but only hardcalls are needed.

1 Oct: Fixed GRM computation bug which
sometimes caused segfaults when both dosages and missing values were present.
--glm is now a bit faster when many covariates are present.

2 Jul: Improved multithreading in BGEN
v1.2/1.3 importer. Python writer can now be called with multiple variants at a
time.

25 Jun: Basic BGEN v1.2/1.3 import (unphased
biallelic dosages; suffices for main UK Biobank data release).
--warning-errcode flag added (causes an error code to be returned to the OS on
exit when at least one warning is printed).

20 Jun: --condition-list + variant filter
bugfix.

5 Jun: --make-pgen memory requirement greatly
reduced. End time now printed to console in most situations.

4 Jun: --hwe no longer causes a segfault when
chrX is present and no gender information is available. Fixed --dummy bug.

1 May: VCF dosage import/export, --vcf-min-gq,
and --read-freq implemented. --score can now work with standard errors.
--autosome{-par} now works properly. SNPHWE2 and SNPHWEX functions relicensed
as GPL-2+, to enable inclusion in the HardyWeinberg R package.

20 April: .sample export bugfix (didn't work
if file was over 256 KB and no phenotypes were present). --dummy implemented
(can now generate dosages).

19 April: --hardy/--hwe chrX bugfix (thanks to
Jan Graffelman for catching the problem and validating the fix).
--new-id-max-allele-len now has three modes ('error', 'missing', and
'truncate'), and the default mode is now 'error' (i.e.
--set-missing-var-ids and --set-all-var-ids now error out when an allele code
longer than 23 characters is encountered, instead of silently truncating).
--score implemented, and extended to support variance-normalization and
multiple score columns (these two features provide a simple way to project
new samples onto previously computed principal components).
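
The three --new-id-max-allele-len modes described above can be sketched roughly as follows. This is a hypothetical Python illustration only: the function name `allele_for_id`, the use of `None` to stand for "leave the variant ID missing", and the exception type are all assumptions, not PLINK's code.

```python
def allele_for_id(allele, max_len=23, mode="error"):
    """Return the allele text to embed in a generated variant ID.

    Hypothetical sketch of the three modes:
      'error'    -> refuse to continue when the allele is too long
      'missing'  -> signal that the variant ID should be left missing (None)
      'truncate' -> cut the allele down to max_len characters
    """
    if len(allele) <= max_len:
        return allele
    if mode == "error":
        raise ValueError(f"allele code longer than {max_len} characters")
    if mode == "missing":
        return None
    if mode == "truncate":
        return allele[:max_len]
    raise ValueError(f"unknown mode: {mode}")

long_allele = "ACGT" * 10  # 40 characters
print(allele_for_id("A"))                           # short alleles pass through
print(allele_for_id(long_allele, mode="truncate"))  # first 23 characters
print(allele_for_id(long_allele, mode="missing"))   # None
```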

11 April: --pca var-wts bugfix, and --pca
eigenvalue ordering bugfix. --glm linear regression and --condition{-list}
support added. --geno/--mind/--missing/--genotyping-rate can now refer to
missing dosages instead of just missing hardcalls (note that, when importing
dosage data, dosages in (0.1, 0.9) and (1.1, 1.9) are saved but there usually
won't be associated hardcalls).

Preservation of reference alleles (without requiring constant use of
--keep-allele-order), phase information, and the VCF QUAL, FILTER, and
INFO fields. Use --make-pgen instead of --make-bed when importing a VCF;
the fileset can then be referenced with --pfile. We will provide 1000
Genomes phase 3 downloads in the new fileset format as soon as
multiallelic variants are also supported.

The new .pgen file format incorporates SNPack-style
genotype compression, frequently reducing file sizes by 80+% with
negligible computational cost. To allow users to take advantage of
genotype compression without sacrificing compatibility with scripts
expecting old-style .bim and .fam text files, PLINK 2.0 supports a hybrid
.pgen + .bim + .fam usage mode (--make-bpgen/--bpfile). We've also
provided a Python library for reading and writing .pgen files, to
simplify migration to the new format. (PLINK 1 .bed files are valid
.pgen files, so code written on top of the library is
backward-compatible.)

Firth regression ('--glm firth-fallback', '--glm firth'). Standard
logistic regression fails to converge, yielding 'NA' or nonsense results,
when the 2x2 allele/phenotype contingency table has an empty cell
("quasi-complete separation"); this is common, and especially likely
to happen with the strongest associations. Firth regression can
prevent you from missing these associations. The fast 'firth-fallback'
mode (only use Firth regression when there's either an empty contingency
table cell or regular-logistic-regression convergence failure) gets you
most of the benefit for a fraction of the computational cost.
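
The empty-cell problem can be seen in a small Python sketch. It is not PLINK's --glm code. It relies on the simplification, commonly stated for a single 2x2 table, that Firth's penalty behaves like adding 0.5 to every cell; the function names and the example counts are hypothetical.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Plain maximum-likelihood log odds ratio for the 2x2 table

                  allele present   allele absent
        cases           a                b
        controls        c                d
    """
    def safe_log(x):
        return math.log(x) if x > 0 else -math.inf
    return safe_log(a) + safe_log(d) - safe_log(b) - safe_log(c)

def corrected_log_odds_ratio(a, b, c, d):
    """Same estimate after adding 0.5 to each cell (assumed stand-in for
    the Firth penalty in this single-table setting)."""
    return log_odds_ratio(a + 0.5, b + 0.5, c + 0.5, d + 0.5)

table = (10, 90, 0, 100)  # the allele is never seen in controls
print(log_odds_ratio(*table))            # inf: the plain estimate diverges
print(corrected_log_odds_ratio(*table))  # finite estimate
```

The plain estimate has no finite maximum here, which is what shows up as 'NA' or a convergence failure in an iterative fit; the penalized version stays finite.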

'--pca approx' (equivalent to EIGENSOFT 6 fastmode with default
parameters). If you have more than ten thousand samples, only need the
top principal components, and can tolerate ~0.1% error in the last PC,
this can save you a ton of compute time.

The 64-bit Linux build can handle linear algebra on matrices with
more than 2^31 elements (so regular --pca is no longer limited
to ~46000 samples), as long as your system has enough memory.
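
The ~46000 figure follows from simple arithmetic, sketched below in Python under the assumption that the old limit applied to a square samples-by-samples matrix whose element count had to stay within 2^31.

```python
import math

ELEMENT_LIMIT = 2**31  # assumed cap on matrix elements (signed 32-bit indexing)

# Largest n such that an n x n matrix has no more than ELEMENT_LIMIT elements.
max_samples = math.isqrt(ELEMENT_LIMIT)
print(max_samples)  # 46340
```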

KING-robust kinship coefficients (--make-king, --make-king-table,
--king-cutoff). These remain accurate when good population allele
frequency estimates are unavailable. We have found --king-cutoff to be
much more reliable than the PLINK 1.9 --rel-cutoff flag for removal of
close relations.

Proper support for dosages (decimal allele count expected values).
When .gen/.bgen files are imported, hardcalls and dosages are
saved to the .pgen. Operations which naturally extend to decimals (e.g.
--pca, --glm, --freq, --maf/--mac) use the dosage information when it's
present, while methods that can only make use of hardcalls (e.g.
KING-robust, Hardy-Weinberg exact test) simply ignore the dosages.
--hard-call-threshold can now be used to change the saved hardcalls without changing the dosages.
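
A minimal Python sketch of how a hardcall can be derived from a dosage, assuming the rule is "call the nearest integer if the dosage is within the threshold of it, otherwise leave the hardcall missing" (with the default threshold of 0.1 mentioned elsewhere in these notes). The function name `hardcall_from_dosage` and the exact boundary handling are assumptions, not PLINK's code.

```python
def hardcall_from_dosage(dosage, threshold=0.1):
    """Return 0, 1, or 2 when `dosage` lies within `threshold` of that
    integer; return None (missing hardcall) otherwise."""
    nearest = round(dosage)
    if abs(dosage - nearest) <= threshold:
        return int(nearest)
    return None

dosages = [0.03, 0.5, 1.0, 1.96]
print([hardcall_from_dosage(d) for d in dosages])  # [0, None, 1, 2]

# Changing the threshold changes the hardcall, not the stored dosage.
print(hardcall_from_dosage(0.75))                  # None
print(hardcall_from_dosage(0.75, threshold=0.3))   # 1
```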

Much more multithreaded code.

Most commands let you control which columns appear in the main output
file(s).

Graffelman and Weir's extended chrX
Hardy-Weinberg exact test, which takes male allele frequencies into
account. We've found that this tends to identify quite a few obviously
miscalled chrX variants which were not caught by the usual QC
filters.

Oxford-style haplotype filesets can now be imported and exported
(--haps, '--export haps'/'--export hapslegend').

Sample-major PLINK binary files can now be efficiently exported
('--export ind-major-bed'). This is close to 3 orders of magnitude
faster than the previous implementation (PLINK 1.07 --make-bed +
--ind-major).