Figure S2. MA-plots of the log2-transformed raw CodeLink microarray data. Upper panel: original
data. For each probe, the x-axis shows the average of the mean gene expression in BF infants
and the mean gene expression in FF infants, and the y-axis shows the difference between
the two means. The color bar shows the count density of the plotted data. BF samples
exhibited systematically higher gene expression than FF samples. Lower panel: the same
data after loess normalization, in which the data were adjusted by the loess fit (blue
line) shown in the upper panel. This procedure corrected for the systematic increase
in BF gene expression relative to FF gene expression seen in the upper panel.
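
The MA-plot quantities and the loess adjustment described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the BF minus FF direction of the difference, the smoothing span `frac`, and the simulated data are all assumptions, and the loess here is a plain local linear fit with tricube weights (no robustness iterations).

```python
import math
import random


def ma_values(bf_mean, ff_mean):
    """Per-probe A (average of the two group means) and M (difference, assumed BF - FF)."""
    a = [(b + f) / 2.0 for b, f in zip(bf_mean, ff_mean)]
    m = [b - f for b, f in zip(bf_mean, ff_mean)]
    return a, m


def loess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at every x."""
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))  # number of neighbours in each window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Simulated log2 group means with a hypothetical upward shift in BF.
random.seed(0)
ff = [random.uniform(4.0, 14.0) for _ in range(200)]
bf = [f + 0.8 + random.gauss(0.0, 0.2) for f in ff]

a, m = ma_values(bf, ff)
trend = loess_fit(a, m)                          # analogue of the blue line
m_norm = [mi - ti for mi, ti in zip(m, trend)]   # differences after adjustment
```

After the adjustment, `m_norm` should scatter around zero across the range of `a`. How the correction is distributed between the BF and FF intensities is not stated in the caption, so the sketch only adjusts the differences.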

Figure S4. Examples of canonical correlations for random gene sets, analogous to the random gene
set shown in Figure 4. One thousand random gene sets were sampled and analyzed; the first 5 are shown.

Figure S5. Examples of the best performing genes in random gene sets, analogous to the random
gene set shown in Figure 5. One thousand random gene sets were sampled and analyzed; the first 5 are shown.

Figure S6. A principal components analysis (PCA) of the virulence characteristics combined
with all host gene triples. Top panel: host intestinal biology genes. Middle panel:
immunity and defense genes. Bottom panel: random genes. The plots show the proportion
of variation explained by the first and second principal components together versus the
proportion explained by the second principal component alone. The analyses characterize
the lower-dimensional structure underlying the data. When combined with the virulence
characteristics, the immunity and defense genes (middle panel) generally exhibit a
simpler latent structure compared to the other gene sets (top and bottom panels),
as judged by the slight northeast shift in the point cloud. While the latent structure
identified by PCA need not reflect a relationship between the virulence characteristics
and the host genes, it may, in which case the immunity and defense genes are slightly
more promising as a set with respect to future canonical correlation analysis (CCA)
aimed at uncovering simple and strong relationships between the metagenomic and host
transcriptome data. In this way, PCA may be used as a screening device to identify
promising gene triples for CCA.
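
The two quantities plotted for each gene triple can be sketched as below. This is an illustration under stated assumptions, not the authors' code: PCA is run on the correlation matrix (the caption does not say whether variables were standardized), eigenvalues are approximated by power iteration with deflation, and the sample size, number of virulence characteristics, and simulated values are hypothetical.

```python
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length, non-constant columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / (n - 1)) ** 0.5
        z.append([(v - mean) / sd for v in col])
    p = len(z)
    return [[sum(u * v for u, v in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenvalues(matrix, k=2, iters=500):
    """Approximate the k largest eigenvalues of a symmetric PSD matrix."""
    p = len(matrix)
    m = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(p)]  # deterministic starting vector
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(x * y for x, y in zip(v, mv))  # Rayleigh quotient
        values.append(lam)
        # Deflate: remove the component just found before the next pass.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return values


def variance_explained(columns):
    """Return (share explained by PC1 and PC2 together, share explained by PC2 alone)."""
    corr = correlation_matrix(columns)
    lam1, lam2 = top_eigenvalues(corr, k=2)
    total = float(len(columns))  # trace of a correlation matrix
    return (lam1 + lam2) / total, lam2 / total


# Hypothetical data: 12 samples, 4 virulence characteristics, one host gene triple.
random.seed(1)
n_samples = 12
virulence = [[random.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(4)]
triple = [[random.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(3)]

pc12_share, pc2_share = variance_explained(virulence + triple)
```

Repeating `variance_explained` over every candidate triple yields one point per triple. Triples whose points sit toward the upper right of the resulting cloud would be the ones flagged by this kind of screen as candidates for CCA.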