ABSTRACT

Fermented vegetables are highly popular internationally in part due to their enhanced nutritional properties, cultural history, and desirable sensorial properties. In some instances, fermented foods provide a rich source of the beneficial microbial communities that could promote gastrointestinal health. The indigenous microbiota that colonize fermentation facilities may impact food quality, food safety, and spoilage risks and maintain the nutritive value of the product. Here, microbiomes within sauerkraut production facilities were profiled to characterize variance across surfaces and to determine the sources of these bacteria. Accordingly, we used high-throughput sequencing of the 16S rRNA gene in combination with whole-genome shotgun analyses to explore biogeographical patterns of microbial diversity and assembly within the production facility. Our results indicate that raw cabbage and vegetable handling surfaces exhibit more similar microbiomes relative to the fermentation room, processing area, and dry storage surfaces. We identified biomarker bacterial phyla and families that are likely to originate from the raw cabbage and vegetable handling surfaces. Raw cabbage was identified as the main source of bacteria to seed the facility, with human handling contributing a minor source of inoculation. Leuconostoc and Lactobacillaceae dominated all surfaces where spontaneous fermentation occurs, as these taxa are associated with the process. Wall, floor, ceiling, and barrel surfaces host unique microbial signatures. This study demonstrates that diverse bacterial communities are widely distributed within the production facility and that these communities assemble nonrandomly, depending on the surface type.

IMPORTANCE Fermented vegetables play a major role in global food systems and are widely consumed by various global cultures. In this study, we investigated an industrial facility that produces spontaneous fermented sauerkraut without the aid of starter cultures. This provides a unique system to explore and track the origins of an “in-house” microbiome in an industrial environment. Raw vegetables and the surfaces on which they are handled were identified as the likely source of bacterial communities rather than human contamination. As fermented vegetables increase in popularity on a global scale, understanding their production environment may help maintain quality and safety goals.

INTRODUCTION

Lactic acid bacteria (LAB) are naturally found on vegetable surfaces. As a result of fermentative metabolism, they produce organic acids (1, 2). Lactic acid fermentation could proceed by indigenous bacteria or with the aid of starter cultures to shift the bacterial populations inherent to vegetable microbiomes (3). Although LAB constitute a small fraction of the total bacteria present on raw vegetables, they quickly adapt to conditions when subjected to a purposeful fermentation process. The LAB flourish, whereas other bacteria that are incapable of withstanding low pH and high salinity are inactivated. These include potential spoilage and pathogenic microorganisms that are outcompeted by the LAB. This has lasting benefits to food production, including maintaining safety and stability prior to consumption. In terms of organoleptic properties, fermentations transform raw vegetables to provide distinctive flavors and textures dependent on the desired outcome (e.g., kimchi, sauerkraut, cucumber pickles, etc.). Fermented foods have become increasingly popular in recent years due to purported beneficial properties to enhance gastrointestinal health and overall well-being (4–10).

In order to understand the microbiology of a vegetable fermentation facility, the bacterial communities colonizing the facility were profiled. This is an effort to link the establishment of bacterial microbiomes with desired production outcomes, such as mitigating safety and spoilage risks and standardizing desirable traits for the consumer.

A limited number of food fermentation environments have been profiled previously. Bacterial and fungal communities were linked to the cheese ripening process in a cheese production facility (11). In another study, microbial assemblages were characterized in a brewery environment (12). Interestingly, raw substrates, such as grain, hops, and yeast, were the main source of inoculation of the brewery environment, as opposed to human skin, outdoor air, and soil sources. Fermentative microbes are typically considered deleterious contaminants unless purposely sought in the production of sour beer and other brewing practices. Both the creamery and brewery studies identified starter cultures as a major contributor to surface-colonizing microbiota. The potential for shotgun metagenomics in foods has been examined in cheese, kimchi, Chinese rice wine, and cocoa beans (13).

Standardization of natural sauerkraut fermentation may be hindered, as it relies on several factors with some more controllable than others. To date, the physical location (i.e., where barrels are housed) where fermentation occurs has not been linked to final product quality in a scientifically rigorous manner. On occasion, isolated batches of fermented vegetables spoil although they are located in the same fermentation room as unspoiled batches. This occurs in the absence of a clear explanation that may ultimately be due to microbiological conditions. Briefly, the microbiologies of built environments are typically vastly different depending on their primary purpose. The fermented food production facility studies described herein use different substrates (i.e., dairy versus grain versus vegetables) and employ fermentation approaches that involve a different set of microbes that are unique to the system. Moreover, the spontaneous fermentation (i.e., no starter culture) of this sauerkraut production increases the uncertainty of the microbial composition of the facility, if not the food product itself. This provided the key questions and scientific motivation to profile the microbiology of several surfaces that contact raw cabbage throughout sauerkraut production. The phylogenetic diversity measures of the bacterial communities in the production facility were compared to identify sources of microbial contamination using Bayesian modeling. Moreover, whole-genome shotgun sequencing was performed to provide further granularity to the microbiology of the particular production facility. High-throughput 16S rRNA gene amplicon sequencing provides a deeper resolution of the unique ecosystems that colonize the facility. The objective of the study is to provide a better understanding of the bacterial communities contaminating or inhabiting facilities where vegetables are fermented. 
Given the role of endogenous bacteria in traditional spontaneous vegetable fermentations, this study serves as a model for human-microbe transfers in a food processing system that requires a considerable degree of human handling. Microbes could be transferred from human skin to facility surfaces during human contact experienced in locations such as the processing and storage areas. These include bacterial families, such as Streptococcaceae and Propionibacteriaceae.

RESULTS AND DISCUSSION

Production facility microbiota are distributed relative to sampling location. Amplicon sequencing of the 16S rRNA phylogenetic marker gene was employed to survey bacterial consortia in the fermentation facility environment. In total, 32 swab samples were collected representing storage areas, a processing room, vegetable handling equipment, fermentation rooms, door handles, and a restroom. See Fig. 1 for the detailed scheme of sampling locations within the facility and Fig. S14 in the supplemental material for material flow to the sampling locations.

Facility map with sampling surface key. Facility map is color-coded to show different surface types, sample names, and sampling areas. Raw vegetables enter the facility through a garage door on the left side of the map and are transported on wooden pallets to the processing room and hand-processed for fermentation.

Beta diversity was measured to assess community differences between locations. Bray-Curtis dissimilarity of microbial communities at the phylum level revealed that surfaces generally cluster based on their physical location (Fig. 2). Accordingly, raw vegetable and vegetable handling surfaces are clustered closely with higher proportions of Proteobacteria, whereas environmental surfaces (i.e., processing room, fermentation room, and dry storage surfaces) exhibit an abundance of Firmicutes and Actinobacteria (Fig. 2). Thus, distinctly clustered patterns emerged, indicating phylogenetic variation between bacterial communities at different sampling sites. Two samples from the processing room floor (i.e., floor drain and mopping sink surfaces) exhibited higher concentrations of Actinobacteria and clustered independently from the other collected samples. It has been previously reported that the Gram-negative Proteobacteria are more abundant on moist surfaces, whereas Gram-positive Firmicutes are abundant on dry and cold residential kitchen surfaces (14). This is consistent with the detection of Firmicutes in high abundance on refrigerator surfaces (Fig. 2). The phylogenetic profiles of the floors of the dry storage and fermentation room were more heterogeneous at the phylum level, characterized by various proportions of Proteobacteria, Firmicutes, and Actinobacteria. In general, the phyla Proteobacteria, Firmicutes, and Actinobacteria significantly varied between raw vegetables (i.e., raw cabbage and vegetable handling surfaces) and environmental surfaces (i.e., processing room, fermentation room, and dry storage surfaces) (Fig. 3).
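
As an illustration of the beta diversity measure used above, the following minimal sketch computes Bray-Curtis dissimilarity between phylum-level profiles. The function name, the sample names, and all abundance values are hypothetical and chosen only to show the calculation; they are not data from this study.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles of equal length."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both profiles are empty")
    # Sum of absolute differences divided by the total abundance of both samples.
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical phylum-level relative abundances, ordered as
# (Proteobacteria, Firmicutes, Actinobacteria, other). Not study data.
raw_cabbage = [0.80, 0.05, 0.05, 0.10]
shredder = [0.70, 0.10, 0.10, 0.10]
storage_floor = [0.20, 0.45, 0.30, 0.05]

print(round(bray_curtis(raw_cabbage, shredder), 3))
print(round(bray_curtis(raw_cabbage, storage_floor), 3))
```

In this toy example, the two Proteobacteria-rich profiles are much less dissimilar to each other than either is to the Firmicutes-rich profile, which is the kind of pattern that drives the clustering by location described above.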

At the family level, storage room floors were characterized by high relative abundances of Micrococcaceae, Lactobacillaceae, Leuconostocaceae, and Bacillaceae (Fig. 4). This is expected since Lactobacillaceae and Leuconostocaceae are the primary bacterial actors early in vegetable fermentations and other food microbiomes (2, 15–19). Similarly, high abundances of Leuconostocaceae were observed in the fermentation room, both on the floor and in filled barrels, and on most door handle surfaces (Fig. 4). Interestingly, this taxon was detected in very low abundance (<0.01%) in raw vegetable and food processing equipment, although the more sensitive whole-genome sequence data indicated 1 to 2% abundance levels in swab samples from raw cabbage or sink but not the shredder (Fig. S13). This suggests that appreciable concentrations of Leuconostocaceae are introduced to the facility as a result of the fermentation process. This is consistent with findings reported by Bokulich and Mills, in that substrate contact is the greatest predictor of in-house microbiome composition (11). High abundances of Moraxellaceae and Pseudomonadaceae were detected in raw vegetables and vegetable handling areas, thus distinguishing them from other environmental surfaces. This is consistent with the findings from a previous study that indicated a relatively higher abundance of these families on raw vegetables (20). Raw vegetable and vegetable handling surfaces were significantly different in Moraxellaceae, Leuconostocaceae, Lactobacillales, and Pseudomonadaceae abundances compared to processing room and dry storage areas (Fig. 5). Streptococcaceae and Propionibacteriaceae were significantly higher on fermentation barrels, which is likely due to increased human handling compared to that on other surfaces. Further details of bacterial families that significantly differed between surfaces are depicted in Fig. 5 and Table 1.

Relative abundances of bacteria in different swab surfaces at family level

Raw vegetable microbiomes segregated from all other environmental surfaces using nonmetric multidimensional scaling (NMDS) (Fig. S2a). Analysis of similarity (ANOSIM) using Bray-Curtis distances indicates that these two groups differ significantly (R = 0.69, P < 0.05) from each other compared to within-group differences (Fig. S2b). Raw vegetables had significantly lower Shannon diversity index and species richness (P < 0.05) values than did environmental surfaces, despite similar species evenness in the two groups (Fig. S3). This suggests that diversity index and species richness vary due to distinct microbial community compositions rather than uneven species distribution.
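
The relationship among richness, Shannon diversity, and evenness can be sketched as follows. This example assumes Pielou's evenness (Shannon index divided by the natural log of richness), which may differ from the evenness measure used in this study, and the counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); defined here as 0 when fewer than two taxa are present."""
    s = richness(counts)
    return shannon_index(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical OTU counts: both communities are perfectly even,
# but the second contains more taxa.
low_richness = [25, 25, 25, 25]
high_richness = [10] * 10

for name, counts in [("low", low_richness), ("high", high_richness)]:
    print(name, richness(counts), round(shannon_index(counts), 3),
          round(pielou_evenness(counts), 3))
```

Both toy communities have the same evenness, yet the richer one has the higher Shannon index. This mirrors the interpretation above that the differences reflect which taxa are present rather than how unevenly they are distributed.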

Linear discriminant analysis effect size (LEfSe) analysis at the phylum level (Fig. S4a) identified Proteobacteria and Bacteroidetes as highly abundant in raw vegetables compared to Firmicutes, Actinobacteria, Chloroflexi, and several other phyla that are predominant on facility surfaces. These phyla can also serve as biomarker phyla, as depicted in an odds ratio plot (Fig. S4b). LEfSe analysis at the family level identified Pseudomonadaceae, Enterobacteriaceae, and Comamonadaceae families as highly abundant on raw vegetables compared to Lactobacillaceae, Leuconostocaceae, and other families within the production environment (Fig. S5a). Log odds ratios identified these as biomarker families that can be used to segregate these groups. Thus, the Pseudomonadaceae, Enterobacteriaceae, and Comamonadaceae families are candidate biomarker families for raw vegetables compared to Lactobacillaceae and Leuconostocaceae for fermentation-associated biomarkers. A detailed cataloguing of the 33 abundant bacterial families identified is depicted in Fig. S5b.
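
To make the sign convention of a log odds ratio concrete, the sketch below compares the odds of a family in one group with its odds in another. It is only an assumption about how such a ratio can be computed from relative abundances and is not the LEfSe procedure itself; the function name, the clamping value, and all abundances are hypothetical.

```python
import math


def log_odds_ratio(p_a, p_b, floor=1e-6):
    """Natural log of odds(p_a) / odds(p_b) for two relative abundances.

    Values are clamped to [floor, 1 - floor] so that taxa at 0% or 100%
    do not produce an infinite result.
    """
    p_a = min(max(p_a, floor), 1 - floor)
    p_b = min(max(p_b, floor), 1 - floor)
    return math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))


# Hypothetical mean relative abundances (raw vegetables, facility surfaces).
families = {
    "Pseudomonadaceae": (0.30, 0.03),
    "Leuconostocaceae": (0.0001, 0.15),
}

for family, (raw, facility) in families.items():
    print(family, round(log_odds_ratio(raw, facility), 2))
```

Under this convention, a positive value marks a family as a candidate biomarker for raw vegetables and a negative value marks it as a candidate biomarker for facility surfaces.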

Dispersal patterns of genera associated with spontaneous fermentation. By visually mapping all surface microbiomes to their physical locations within the production facility, a spatial model of bacterial niche colonization emerges (Fig. 6a). The most prominent genus anchored to a specific physical location is Leuconostoc, members of which colonized the main warehouse, dry storage areas, the lobby, and several door handles throughout the facility. Interestingly, this genus is in low abundance in the packing and processing rooms, raw vegetables, and the fermentation room directly above the walk-in refrigerator. The fermentation room is part of a new addition to the factory and had not yet been used to house fermentation barrels at the time of sampling. This is significant, as Leuconostoc spp. permeate the environment as a result of fermentation to potentially influence subsequent processes. Accordingly, Leuconostoc and Lactobacillales spp. dominated the areas where natural fermentation occurred. It is noteworthy that the sole exception is the dry storage area below the walk-in refrigerator where a relatively high abundance of Bacillus spp. was identified. This area experiences a degree of foot traffic, and thus, soil harboring Bacillus spp. was likely deposited here. A similar observation was made on the floors between the barrels in the fermentation room with lower Bacillus representation relative to other floor areas in the same room. Differences in the surface topologies are also known to influence Bacillus colonization and thus may aid survival under both arid and humid conditions (21). Chryseobacterium spp., Pseudomonas spp., and Enterobacteriaceae were detected in a nonrandom colonization pattern. These taxa likely flowed in from the raw cabbage entering the facility, as they were detected on the raw vegetables and processing equipment with which they were in contact. 
Accordingly, fresh fruit and vegetables often carry high levels of Enterobacteriaceae and Pseudomonas spp. as commensal microbiota; Chryseobacterium and Pseudomonas spp. have also been recovered from salads (22–24). These taxa, however, were not observed in high abundance on the floor or doors, suggesting that they are transferred via direct contact with vegetables and not by human handling. This is supported by k-mer species-specific identification, where the cabbage swab samples yielded Pseudomonas protegens (2.99% of bacterial reads) and a low signal for Pseudomonas putida (0.54%). Further evidence includes P. protegens (0.93%) and Pseudomonas fluorescens (0.45%) in the sink sample and a large contribution by P. fluorescens (10.68%), with lesser concentrations of P. putida (4.04%), P. protegens (1.38%), and Pseudomonas synxantha (1.22%) on the swabbed shredder surface (Fig. S13). In addition, and counter to expectations, the raw vegetables exhibited a low relative abundance of LAB (25). Thus, the LAB that colonize the facility were potentially acquired from increased populations as a result of fermentation (17, 25–27). An alternative explanation is that the LAB adhere tightly to the vegetable substratum and thus were not sampled by swabbing. That both occur simultaneously is supported by time-course k-mer analysis, where large amounts of LAB k-mer reads were found in the heavily mixed sauerkraut product at time zero in fermentation. Among the species sequences scored, 55.74% of sequence reads matched Lactococcus lactis subsp. lactis, 11.65% matched Lactobacillus plantarum, and 18.22% matched Leuconostoc mesenteroides (Fig. S13). Another interesting observation is that small concentrations of Staphylococcus spp. (<0.01%) were identified in fermentation barrels and bathroom floor swab surfaces (Fig. 4). These barrels were recently acquired and thus may have Staphylococcus contamination from handling during their manufacture.

Spatial distribution heatmap of bacteria in the fermentation facility environment. (a) Plots indicate relative abundances of genera as a percentage of the entire community. The scale bar on top of each map normalizes the relative abundances of the defined taxa. In the Leuconostoc map, for example, the darkest green color indicates that the walk-in refrigerator community was 19% Leuconostoc. In Staphylococcus, the darkest green color represents 4% of the total community structure. (b and c) Predicted microbial contamination sources within the vegetable fermentation facility. The maps illustrate percentages of predicted sources for members of the community, as estimated by SourceTracker. The scales above the maps represent the percentage of the total community composition for which the source accounts; data are shown for raw vegetables (blue) (b) and unknown origins (orange) (c).

The spatial distribution of bacteria within this enclosed facility is intrinsically related to the particular surface type colonized. This provides information about the microbial community structure of the built environment that may influence the food fermentation process. It is clear that fermentations impact the colonization pattern within the facility. These results may provide insight into the microbial communities that colonize surfaces that are cleaned more regularly than are the walls and ceilings. The food packing and processing room is maintained at good manufacturing practice (GMP) standards for cleanliness, which includes routine floor cleaning, and employees must wash their hands, wear gloves, and wear hair protection before entering. It is possible that the dry storage and fermentation areas accumulate raw vegetable material to maintain environmental lactic acid bacterial populations. This phenomenon is supported by previous studies, which suggest that bacterial communities are established on food production facility surfaces despite routine cleaning (11). This also suggests that cleaning practices are essential for maintenance of this low-diversity environment to prevent the colonization of other species that could disrupt the microbial community structure in this environment (11). Clearly, natural fermentation promotes a particular microbial consortium within the facility, generally in the absence of typical human skin bacteria, such as members of the families Streptococcaceae and Propionibacteriaceae. Some of the literature suggests that human skin is a primary source of bacterial contamination and can be transferred to environmental surfaces (14), such as facility areas that receive the most human contact. Environmental microbiology studies experience a potential bias in operational taxonomic unit (OTU) representation due to cell lysis, sample collection, storage, PCR, and other aspects of sample preparation (28). 
In this study, we have followed the protocol adapted by Bokulich and Mills that has been employed in subsequent studies (11, 13).

Source tracking reveals origins of a diverse set of colonizing microbiota. A Bayesian method (i.e., SourceTracker; see Materials and Methods) was used to further investigate the origins of microbial diversity within the facility (29). The SourceTracker approach is used in studies of the built environment to identify the potential origin of bacterial contamination. This tool predicts the percentage of a reservoir community originating from a given source, which is defined a priori. Model source samples were derived from the raw cabbage and vegetable handling surface data obtained in this study. For human skin profiles, we compared our study to reference human skin profiles that were previously characterized (30). Figure 6b depicts the distribution of raw cabbage-colonizing microbiota on several surfaces near the packing and processing area. Locations farther from the processing area had a greater abundance of bacteria with unknown origins (Fig. 6c). These bacteria likely originate from the fermentation process itself and were not used to train the Bayesian modeling tool. Again, the processing room and vegetable surfaces differ in microbial composition, with human-associated microbiota contributing a very minor source of contamination. This is significant in that hygienic practices appear to be effective, with the fermentation process restricting those human-associated microbes that may be able to colonize environmental surfaces.
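
The idea of apportioning a surface community among predefined sources can be illustrated with a much simpler technique than the one used here. The sketch below is not SourceTracker, which relies on Gibbs sampling and also models an unknown source; it is a plain expectation-maximization estimate of mixing proportions for known sources only. The function name, the taxon profiles, and the sink counts are all hypothetical.

```python
def estimate_source_proportions(sink_counts, source_profiles, n_iter=500):
    """Maximum-likelihood mixing weights of known sources for one sink sample.

    sink_counts: list of read counts per taxon in the sink (surface) sample.
    source_profiles: dict mapping source name to per-taxon relative abundances.
    """
    names = list(source_profiles)
    weights = {name: 1.0 / len(names) for name in names}
    for _ in range(n_iter):
        assigned = {name: 0.0 for name in names}
        for t, count in enumerate(sink_counts):
            if count == 0:
                continue
            denom = sum(weights[n] * source_profiles[n][t] for n in names)
            if denom == 0:
                continue  # taxon cannot be explained by any listed source
            # E step: split this taxon's reads among sources by responsibility.
            for n in names:
                assigned[n] += count * weights[n] * source_profiles[n][t] / denom
        norm = sum(assigned.values())
        if norm == 0:
            break
        # M step: new weights are the fractions of reads assigned to each source.
        weights = {n: a / norm for n, a in assigned.items()}
    return weights


# Hypothetical relative abundances over four taxa for two sources.
sources = {
    "raw_vegetable": [0.70, 0.20, 0.05, 0.05],
    "human_skin": [0.05, 0.05, 0.50, 0.40],
}
# Hypothetical sink counts built as a 90:10 mixture of the two sources.
sink = [635, 185, 95, 85]

print({k: round(v, 3) for k, v in estimate_source_proportions(sink, sources).items()})
```

Because the toy sink was constructed as a 90:10 mixture, the estimate recovers weights close to 0.9 and 0.1. Real surface samples also contain taxa from sources that were never profiled, which is why an explicit unknown category, as reported in Fig. 6c, is needed in practice.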

Microbial succession within a newly established fermentation room. The newly established fermentation room (described above) was resampled (n = 15) ∼5 months subsequent to an initial characterization. This room is part of a new addition to the facility and was not previously used for fermentations. Swabs were obtained from barrels, as well as the walls, floors, and ceiling in order to provide greater resolution within the relatively small environment. These 15 samples were compared with 9 samples collected previously from the same room in order to track changes over time.

Weighted UniFrac distances, visualized as a principal-coordinate analysis (PCoA) plot, indicate the segregation of raw vegetable microbiomes from other community assemblages (Fig. 7a). More than 67% of the variability is captured by the first two principal components (PC1 and PC2). We also identified a single outlier community that was sampled from a chopped cabbage mix rather than a swab of the surface. This suggests that bacteria that strongly adhere or colonize internal structures within the vegetable are more likely to impact the built environment. Similarly, comparisons of these communities of sequence-based clusters, or operational taxonomic units (OTUs), reveal distinct differences between the two sampling points (Fig. 7b). PC1, which explains 30.19% of the variation, segregated the two sampling points when using weighted UniFrac distances. OTUs were identified taxonomically and were examined at the family level (Fig. 8). The pretransition floor, ceiling, and walls exhibited greater community diversity than did posttransition surfaces, as reflected by a high Shannon diversity index and species richness on these surfaces (Fig. S6). These microbial communities were composed of the bacterial families Enterobacteriaceae, Pseudomonadaceae, Streptococcaceae, Micrococcaceae, and Lactobacillaceae, and several other lower-abundance taxa (Fig. 8). In contrast, communities observed on unused fermentation barrels were more homogeneous and hosted significantly lower diversity and evenness than communities in postfermentation barrels (Fig. S6). The pretransition swabs from the barrels were composed mainly of Pseudomonadaceae (20% compared to 7%), Micrococcaceae (22% compared to 1%), Lactobacillaceae (13% compared to 0.1%), Lactobacillales (27% compared to 0.1%), and Leuconostocaceae (8% compared to 0.2%) compared to posttransition barrels (Table 2). 
These bacterial families in prefermentation barrels were most like the built environment surfaces observed in the rest of the food production facility. Postfermentation room barrels were significantly dominated by Oxalobacteraceae (23%), Comamonadaceae (16%), and Enterobacteriaceae (11%) compared to <0.1% abundance in prefermentation room barrels (Fig. 9 and Table 2). Following transition to an active fermentation room for about 5 months, the microbial community compositions shifted appreciably. Diversity indices significantly decreased for posttransition surfaces compared to pretransition surfaces, except for barrels (Fig. S6). Walls, floors, and spaces between the barrels following transition to an active fermentation room were segregated into distinct clusters relative to pretransition surfaces (Fig. S7). At this point, floor surfaces were significantly enriched (P < 0.05) with Oxalobacteraceae (37% compared to 3%), and Bacillaceae (40% compared to 2%) relative to prefermentation floor surfaces, whereas Moraxellaceae (6 to 7%) and Enterobacteriaceae (6 to 8%) were detected in similar concentrations (Fig. 9 and Table 2). The floor between the barrels posttransition was segregated distinctly due to high abundances of Enterobacteriaceae (58%), Moraxellaceae (21%), and Leuconostocaceae (8%) compared to both pre- and posttransition floor surfaces (Fig. 9 and Table 2). Of note, the floor between barrels was not sampled prior to the transition to a fermentation room. In contrast, these posttransition surfaces were distinct from pre- and posttransition floor surfaces. Thus, floor surfaces were distinguishable within the same room based on the sampling location. The low abundances of Oxalobacteraceae (3%) and of Bacillaceae (2%) on the floor between the barrels relative to other floor areas in the room may be due to differences in access to the floors by foot. These bacterial families are generally associated with soil surfaces. 
The swabs collected from the posttransition wall surfaces were significantly enriched (P < 0.05) with Comamonadaceae (53% compared to 3%), and in pretransition wall surfaces, high abundances of Lactobacillaceae (11% compared to <1%) and Micrococcaceae (12% compared to 4%), were detected compared to posttransition wall surfaces, suggesting the bidirectional flow of environmental microbial populations to the newly established fermentation room (Fig. 9 and Table 2). It was somewhat surprising that LAB were found in higher abundances prior to active fermentations. Although a decrease in Lactobacillaceae abundance is likely explained by the accompanying increase in bacterial diversity following the area's transition, a more diverse community may restrict the Lactobacillus populations in this stable ecosystem.

(a) Principal-coordinate analysis of environmental swab surfaces. Weighted UniFrac distances were used to assess beta diversity. Raw vegetable swabs (blue circles) are genetically distinct from environmental swabs (red circles). Approximately 67% of the variability can be explained by the first two principal components (PC1 and PC2). (b) Principal-coordinate analysis among pre- and postfermentation surfaces. A high degree of genetic dissimilarity between the prefermentation (blue circles) and postfermentation (red circles) environmental surfaces was observed. Approximately 50% of the variability can be explained by PC1 and PC2.

Interestingly, the walls and ceiling surfaces sampled posttransition contained between 20% and 70% of an unidentified genus within the family Comamonadaceae (Fig. 8). These bacteria have been detected in low abundance in the raw vegetables and vegetable handling surfaces (3 to 5%) compared to dry storage and processing room surfaces (Fig. 5 and Table 1). This suggests that these bacteria originated from raw cabbage and vegetable handling surfaces rather than another exogenous source. Using nonmetric multidimensional scaling (NMDS), the comparison between the fermentation room pre- and postactive surfaces also revealed that these groups segregate from each other (Fig. S8a). Analysis of similarity (ANOSIM) using Bray-Curtis distances determined that the sampling points are significantly different (R = 0.2, P < 0.05) from each other compared to within-group differences (Fig. S8b). These groups were significantly different in Shannon diversity index and species richness (P < 0.05), with similar levels of species evenness in the two groups (Fig. S9). This suggests that the differences in diversity and species richness were due not to uneven species distribution but rather to distinct microbial community compositions among pre- and postfermentation surfaces.
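
For readers unfamiliar with the ANOSIM R statistic reported above, the following minimal sketch computes it as R = (mean between-group rank − mean within-group rank) / (N(N − 1)/4), where N is the number of samples and the ranks are those of the pairwise dissimilarities. The permutation test that yields the P value is omitted, and the sample labels and distances are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R from pairwise dissimilarities.

    dist: dict keyed by (i, j) with i < j, holding the dissimilarity of samples i and j.
    groups: list of group labels, one per sample.
    """
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[p] for p in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


# Hypothetical Bray-Curtis dissimilarities among four swabs.
dist = {(0, 1): 0.1, (2, 3): 0.2, (0, 2): 0.5, (0, 3): 0.6, (1, 2): 0.7, (1, 3): 0.8}

print(anosim_r(dist, ["pre", "pre", "post", "post"]))
print(anosim_r(dist, ["pre", "post", "pre", "post"]))
```

An R near 1 indicates that all between-group dissimilarities exceed the within-group ones, whereas values near 0 indicate little separation. On this scale, the R = 0.2 reported above reflects a significant but modest separation of the two sampling points.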

Linear discriminant analysis effect size (LEfSe) analysis at the family level identified Lactobacillaceae, Leuconostocaceae, and Micrococcaceae as highly abundant in prefermentation surfaces compared to Paenibacillaceae in postfermentation surfaces (Fig. S10). A log odds ratio plot indicated that these could be defined as biomarker families to be used to segregate groups and that can potentially identify risk to product and microbial ecology of that surface (Fig. S11). As depicted in Fig. S10 and S11, Lactobacillaceae, Leuconostocaceae, and Micrococcaceae families are biomarkers for prefermentation room surfaces compared to Oxalobacteraceae, Comamonadaceae, Bacillaceae, and other families in postfermentation surfaces.

Accordingly, these fermentation-associated families may be monitored within the facility to determine if there is a high abundance of any undesirable bacterial populations. A detailed description of all bacterial families identified is provided in Fig. S11.

Metagenomic analysis of the bacterial succession within the fermentation product. A metagenomic sequence analysis of the postfermentation food product was performed to determine changes of dominant and minority bacterial species during fermentation. This approach also enabled a comparison with the 16S rRNA phylogenetic profile of the facility microbiome. Samples were collected at 0, 5, 8, and 11 weeks from the sauerkraut fermentation initiated on 27 October 2016. The k-mer analysis revealed that species from the genera Lactobacillus, Lactococcus, and Leuconostoc were most abundant in each sample. A substantial decrease in the abundance of Lactococcus lactis subsp. lactis occurred over time. This species was most abundant at time zero (T0) (55% abundance) and decreased to 30% by 11 weeks. In contrast, Lactobacillus plantarum abundance increased from 12% to 31% by 8 weeks and decreased to 24% by 11 weeks. Leuconostoc mesenteroides decreased from 18% at T0 to 11% by 11 weeks. Lactobacillus rhamnosus was surprisingly detected at 11% abundance at 11 weeks, whereas it was detected below 0.1% in all other samples (T0 to 8 weeks). This indicates that L. rhamnosus was not typically found in the production facility and was detected only in the final product. Lactobacillus brevis experienced an ∼4-fold increase in abundance (6% to 23%) from T0 to 5 weeks and decreased slightly (18 to 16%) from 8 to 11 weeks. Of note, other species of Leuconostoc and Lactobacillus did not substantially change over this time course and were low (1 to 3% matched reads) or at the limits of detection (0.12 to 0.16%) at all time points (Fig. S12). We saw surprisingly strong LAB sequence predominance in the T0 samples. From other fermentations, we see anaerobic plate counts (predominantly LAB) of 10⁵ after mixing all fermentation components. The level of LAB is initially low and then increases, with different LAB species increasing in different phases. 
The k-mer-based identification performed on the raw cabbage mix, shredder, and a recently used clean sink revealed that all samples exhibited distinct microbial profiles from each other (Fig. S13). It is noted that fermentation samples yielded millions of sequence reads, with more than 40% of the reads matching the bacterial k-mer database. The sequence determinations from the swab sampling produced a high failure rate for DNA yield (about 50% for 32 samples) with a subsequent 50% failure rate for library and sequence production. The DNA yield for the samples that worked ranged from 1.26 to 6.64 ng/μl. The DNA yield from swabs when the library preparation was not successful was less than 1 ng/μl. An additional consideration for the whole-genome data produced from low-DNA swab samples is that fewer than 2% of the swab-derived sequences matched the k-mer database, compared to 40% bacterial database identification (ID) for the fermentation samples.
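
The notion of a read "matching" a k-mer database can be sketched as follows. This is a toy illustration under stated assumptions and not the identification pipeline used in this study: the k-mer length, the rule that a single shared k-mer counts as a match, and all sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_matching(reads, reference_kmers, k):
    """Fraction of reads sharing at least one k-mer with the reference set."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if kmers(read, k) & reference_kmers)
    return hits / len(reads)


# Toy "database" built from one hypothetical reference sequence.
K = 5
reference = kmers("ATGGCTAGCTAGGATCCA", K)

# Hypothetical reads: the first and third share k-mers with the reference.
reads = ["GCTAGCTAGG", "TTTTTTTTTT", "GGATCCAAAA", "CCCCCGGGGG"]

print(fraction_matching(reads, reference, K))
```

A low matching fraction, such as the value below 2% reported for the swab samples, means that most reads share no k-mers with the bacterial reference set, so species-level percentages for those samples rest on a small subset of the data.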

Summary of study conclusions. Spontaneous fermentation transforms raw vegetables into pickled foods primarily with the aid of lactic acid bacteria. Lactic acid bacteria are acid-tolerant anaerobes, which flourish in this environment and quickly bloom to dominate the pickling vegetable microbiome. However, it was unclear if these microbial communities originate from the raw vegetables or if these bacteria are laterally transferred between the food products within the built environment. Within this facility, microbial communities on raw vegetables and the vegetable handling surfaces are enriched with the phylum Proteobacteria and have a low abundance of Firmicutes. Moreover, these communities are significantly different from those of the fermentation room, processing room, and dry storage surfaces. Multivariate analysis identified that raw vegetable and vegetable handling surfaces were more similar to each other than to the fermentation room, processing room, and dry storage surfaces. A surface-specific microbial profile was characterized among various locations within the facility. Source tracker analyses determined that raw vegetables, rather than human handling, were the main source of microbes and that handling likely constitutes a very minor contaminant within the facility. Raw vegetable surfaces exhibit a low diversity index and low species richness compared to the high microbial diversity on all other environmental surfaces. Biomarker analysis identified that the phylum Proteobacteria is likely to be associated with raw vegetable surfaces, whereas environmental surfaces are linked with Firmicutes. Microbial diversity analysis performed on a new fermentation room before and after its transition to active production revealed that diversity and richness were higher on pretransition than on posttransition surfaces; that is, the room's resident microbiome became less diverse once production began.
The establishment of an active fermentation room reduced the overall microbial diversity to a level resembling that observed on the raw vegetable surfaces rather than on other environmental surfaces. It is possible that new resources introduced to the room influenced the microbial composition. In addition, the room's microbiome structure may have been in a transient state during previous sampling, with the populations shifting toward other more diverse areas of the facility posttransition. This hypothesis could be further tested in the future through judicious sampling of areas undergoing similar transitions. Furthermore, this particular microbial environment could be compared with those involving other vegetable fermentations. In general, the lactic acid bacteria comprise a large portion of the core community established in the production facility. Lactic acid bacteria were not observed in high abundance on raw vegetables, which is consistent with the expected enrichment as a consequence of fermentation. Thus, the high abundance of LAB in fermentation vessels is likely responsible for distributing these fermentative microbes to the rest of the facility environment.

MATERIALS AND METHODS

Facility description. The production facility specializes in spontaneously fermented vegetables and consists of a storage area for processing materials, a processing room where food is prepared, and several fermentation rooms that are maintained at 18 to 27°C, allowing for vegetables to ferment undisturbed in barrels. The facility areas that receive the most human contact are the processing room and the storage room, where there is a large door through which vegetables are delivered to the factory.

Sauerkraut production. Raw cabbages are chopped, shredded with an industrial food shredder, and layered in barrels with salt and freshwater. Full barrels of shredded cabbage are covered with water bags for the duration of the fermentation, creating an airtight seal. After approximately 6 weeks, when a batch passes a taste test and pH test, the finished sauerkraut barrels are brought back to the processing room and hand-packaged into jars for distribution. (Of note, sauerkraut typically ferments for a minimum of 6 weeks but often ferments for several months.) Other fermented vegetable products are prepared in a similar way, with some added ingredients and differing fermentation times.

Sample collection and DNA extraction. In total, 56 samples were collected for further analysis (Table 3). This includes 32 samples collected from different surfaces within the facility (Fig. 1), with another 9 samples from a newly established fermentation room during a single visit (n = 41). A total of 15 new samples were collected ∼5 months later (n = 15). These 15 samples were collected from the newly established fermentation room in order to track changes over time. This room was part of a new addition to the facility and was not previously used for fermentations. These 15 swabs were obtained from barrels, walls, floors, and the ceiling to provide greater resolution within the facility among pre- and postproduction environments. The facility was in full operation at the time of sampling. Surfaces were swabbed using sterile cotton-tipped wooden swabs (Puritan Medical, Guilford, ME), which were moistened with sterile 1× phosphate-buffered saline (PBS) solution (Fisher BioReagents, Fair Lawn, NJ). Swabs were firmly pressed against surfaces and streaked in overlapping S strokes while rotating the swab to ensure full contact. Approximately 25.8 cm² of area was sampled per swab. Wooden swab tips with the cotton applicator were snapped off into sterile 1.5-ml polyethylene tubes containing 0.75 ml of sterile PBS, using the plastic lid to avoid manual contact. Swabs and tubes were stored at −20°C prior to analysis (11). Total DNA was extracted from swab tips with a PowerLyzer PowerSoil DNA isolation kit (Mo Bio, Carlsbad, CA) by placing the entire swab tip along with PBS in the extraction tube and following the manufacturer's protocol, with the addition of a bead-beating step conducted on a FastPrep-24 instrument (MP Biomedicals, Santa Ana, CA) at 4.5 m/s for 45 s. One hundred microliters of purified DNA was eluted in pure H2O.
The eluate was further concentrated to approximately 25 μl with a Vacufuge plus (Eppendorf, Hamburg, Germany) for 30 min at 45°C on the V-AQ setting. Purified DNA was stored at −20°C for further processing.

16S rRNA amplicon sequencing library construction. Purified nucleic acids were used as a template (0.1 to 11.0 ng total DNA) to PCR amplify the V3-V4 region of the 16S rRNA gene and to add the Illumina overhang adapters in preparation for sequencing. The primer set developed by Illumina consisted of FwOvAd_341F (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG) and ReOvAd_785R (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC).

PCR amplification was performed using 2× HiFi HotStart ReadyMix (Kapa Biosystems, Wilmington, MA, USA) under the following conditions: 95°C for 3 min; 25 cycles of 95°C for 3 s, 55°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 5 min. AMPure XP beads (Beckman Coulter, Danvers, MA, USA) were used to purify the PCR products from free primers and other contaminants. A second PCR was performed to attach dual indices and Illumina sequencing adapters using the Nextera XT index kit (Illumina, San Diego, CA, USA), followed by a second round of AMPure XP bead purification. PCR products were quantified using the Qubit double-stranded DNA (dsDNA) BR assay (Life Technologies, Carlsbad, CA, USA). The quality of PCR products was measured by DNA analysis ScreenTape assay on the TapeStation 2200 system (Agilent Technologies, Santa Clara, CA, USA). PCR products were pooled in equimolar concentration (4 nM) and denatured immediately prior to sequencing. Sequencing was performed on an Illumina MiSeq platform (paired-end, MiSeq reagent kit v3, 10% Phi-X) at the Genomics Resource Laboratory, University of Massachusetts Amherst, and at the FDA WEAC laboratories. In total, 14,355,685 paired-end sequences were generated.
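The equimolar pooling step can be illustrated with a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes the standard approximation of 660 g/mol per base pair of double-stranded DNA, and the Qubit reading (2.0 ng/μl), fragment length (600 bp), and final volume (10 μl) are hypothetical values chosen only for illustration.

```python
# Average molar mass of one double-stranded DNA base pair, in g/mol
# (standard approximation; an assumption of this sketch).
AVG_BP_MASS = 660.0


def to_nanomolar(ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/ul to nM for a given fragment length."""
    return ng_per_ul / (AVG_BP_MASS * length_bp) * 1e6


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Return (library_ul, diluent_ul) that dilute a stock to target_nm (C1*V1 = C2*V2)."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical library: 2.0 ng/ul reading and a 600-bp average fragment size.
stock = to_nanomolar(2.0, 600)
library_ul, diluent_ul = dilution_volumes(stock, target_nm=4.0, final_ul=10.0)
print(f"{stock:.2f} nM stock: mix {library_ul:.2f} ul library + {diluent_ul:.2f} ul diluent")
```

Once every library is diluted to the same molarity, equal volumes can be combined so that each sample contributes a similar number of molecules to the sequencing run.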

16S rRNA gene bioinformatic and statistical analyses. Illumina paired-end reads were quality filtered and analyzed using Quantitative Insights into Microbial Ecology (QIIME) v1.9.0 (31). Raw forward and reverse reads were joined using fastq-join (32) and combined into a single fastq file using the split_libraries tool, which truncates reads with three consecutive base calls that exhibit a Phred score below 19. In total, 8,839,451 sequences (61.57% of total) were assembled and deemed passable following quality filtering. Reads were then subjected to open reference operational taxonomic unit (OTU) picking using the QIIME pipeline, implementing the uclust alignment algorithm at 97% identity, against the GreenGenes database, 13_8 release. Step 4 of the pick_open_reference_otus.py pipeline was omitted due to the large size of the data set. OTUs that were identified as chloroplasts or mitochondria and those with fewer than 10 assigned sequences were removed from the OTU table to minimize inflated diversity estimates (33). A phylogenetic tree of final OTU alignments was generated with FastTree. The quality-filtered OTU table contained 4,939,277 sequences (55.87% of input) distributed across 5,388 OTUs, with a table density of 0.2. The table was rarefied to 5,524 sequences per sample for subsequent analyses (Fig. S1). Relative abundance values were calculated by dividing the reads per OTU by the total reads in that sample. Data visualization was performed using Emperor (https://biocore.github.io/emperor/) and RStudio. Statistical analysis for multiple test corrections was carried out using GraphPad Prism (version 7.0). Alpha- and beta-diversity analyses were carried out, with beta diversity based on the weighted UniFrac distance between samples for bacterial 16S rRNA sequences. Principal-coordinate analysis (PCoA) was computed using a UniFrac distance matrix. Hierarchical clustering analysis was performed using the relative abundance of OTUs, and a heat map was generated in R.
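The rarefaction and relative abundance steps can be sketched as follows. This is an illustrative sketch only and is not the QIIME implementation used in the study; the function names and the OTU counts are hypothetical, and only the rarefaction depth of 5,524 sequences is taken from the text.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads, since such
    samples cannot be rarefied to that depth and would be dropped.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one label per read, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for otu in drawn:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def relative_abundance(counts):
    """Divide the reads per OTU by the total reads in the sample."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


# Hypothetical OTU counts for two samples (not data from this study).
sample = {"OTU_1": 6000, "OTU_2": 3000, "OTU_3": 1000}
shallow = {"OTU_1": 1200, "OTU_2": 300}

rarefied = rarefy(sample, depth=5524)
profile = relative_abundance(rarefied)
dropped = rarefy(shallow, depth=5524)  # None: too few reads for this depth
```

Subsampling every sample to a common depth keeps diversity comparisons from being driven by differences in sequencing effort, at the cost of discarding reads from deeper samples.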
Analysis of similarity (ANOSIM) with 999 permutations was used to test for significant differences between surface groups based on weighted UniFrac distance matrices. Similarity percentage analysis (SIMPER) on Bray-Curtis distances was used to identify the bacterial families that contributed most to the differences among the surface types. QIIME OTU table and metadata files were further analyzed with the Calypso web server (http://cgenome.net/wiki/index.php/Calypso) (34) for feature selection and multivariate data analysis on group-based comparisons. A detailed description of the samples acquired in this study is provided in Table 3.
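For readers unfamiliar with ANOSIM, the test statistic and its permutation P value can be sketched in a few lines. This is a generic illustration of the method, not the software used in the study: it takes any precomputed square distance matrix (computing weighted UniFrac itself requires a phylogenetic tree and is not shown), and the toy matrix and group labels are invented for the example.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, P), with P estimated by shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    shuffled = list(groups)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Toy data (hypothetical): two surface groups of three samples each, with
# small within-group and large between-group distances.
groups = ["raw", "raw", "raw", "ferm", "ferm", "ferm"]
dist = [[0.0 if a == b else (0.1 if groups[a] == groups[b] else 0.9)
         for b in range(6)] for a in range(6)]
r_stat, p_value = anosim(dist, groups, permutations=999)
```

An R value near 1 indicates that samples within a group are more similar to each other than to samples from other groups, whereas a value near 0 indicates no group structure.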

Metagenomic k-mer analysis. Illumina DNA sequencing libraries and sequencing runs were designed to generate maximum output from unique small swab samples, or fermentation samples with more DNA, on the order of 1,000,000 to 10,000,000 reads. Only three of the first 20 factory swab samples contained enough DNA to generate adequate whole-genome sequencing libraries. Sequencing reads were analyzed for microbial composition using a custom C++ program called k-mer analysis developed at the FDA (CFSAN, OARSA laboratories, M. K. Mammel). The k-mer database is designed to match only species-specific sequences. The purpose of the k-mer analysis is to detect which species are present and their relative abundances. A species-specific k-mer data set was built using 786 complete GenBank genome entries, averaging 50,899 k-mers for each species. These included 14 Enterobacter, 20 Escherichia, 9 Leuconostoc, 8 Listeria, 24 Pseudomonas, 97 Salmonella, 18 Streptococcus, 15 Vibrio, and two Weissella species entries. For the 171 Lactobacillus species included, there were on average 57,800 k-mers each, for an average coverage of 2,030,000 bp per genome (Tables S1 and S2). The program is designed to match 30-base k-mer sequences from sequence reads to a comprehensive sequence collection, a database of bacterial species. The database was constructed of species-specific 30-bp k-mer signature sequences. For each species of interest, each nonduplicated k-mer from a reference whole-genome sequence was placed into the database. The k-mers not found in at least 2/3 of a set of additional genome sequences of the same species, as well as k-mers found in genomes of other species, were removed. The resulting k-mer database used in this work contains 5,372 target entries, each consisting of approximately 40,000 (range, 255 to 80,000) k-mers unique to each species in the database.
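The database construction rules described above can be sketched as follows. This is a simplified illustration in Python and is not the FDA C++ program: it treats "nonduplicated" as the set of distinct k-mers in the reference (an assumption), ignores reverse complements, and counts a read as a hit when it shares at least one signature k-mer with a species (also an assumption). The toy sequences use k = 4 so the example stays readable, whereas the study used 30-base k-mers.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def species_signature(reference, additional, others, k=30, num=2, den=3):
    """Build a species-specific k-mer set from a reference genome.

    A reference k-mer is kept only if it occurs in at least num/den of the
    additional genomes of the same species and in no genome of another species.
    """
    same_species = [kmers(genome, k) for genome in additional]
    other_species = set()
    for genome in others:
        other_species |= kmers(genome, k)

    signature = set()
    for kmer in kmers(reference, k):
        if kmer in other_species:
            continue
        present = sum(kmer in genome_kmers for genome_kmers in same_species)
        # Integer comparison avoids rounding issues with the 2/3 threshold.
        if same_species and den * present < num * len(same_species):
            continue
        signature.add(kmer)
    return signature


def count_read_hits(reads, database, k=30):
    """Count, per species, the reads sharing at least one signature k-mer."""
    hits = {species: 0 for species in database}
    for read in reads:
        read_kmers = kmers(read, k)
        for species, signature in database.items():
            if read_kmers & signature:
                hits[species] += 1
    return hits


# Toy sequences (hypothetical), with k = 4 for readability.
signature = species_signature(
    reference="ACGTACGGTT",
    additional=["ACGTACGG", "TTACGTAA", "GGGGCGGT"],
    others=["CCGTACCC"],
    k=4,
)
hits = count_read_hits(["GGACGTCC", "TTTTTTTT"], {"species_A": signature}, k=4)
```

Filtering against other species removes shared sequences such as conserved genes, while the 2/3 rule removes strain-specific k-mers that would not be found in most members of the species.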
The result of analysis of a set of sequence reads from a mixed collection of species is a table showing the number of reads found that match each species in the database. To correct for bias due to differing numbers of k-mers used per database entry and genome size, normalization was performed, and the results are tabulated as the percent contribution to the microbial population of identified species. A number of different software packages are available to determine species content; other programs for profiling bacterial communities from short metagenomics shotgun sequencing data containing mixed species include KRAKEN and MetaPhlAn. On the basis of several tests, the majority of the unassigned reads were designated “other” bacterial species. The unassigned reads (reads not matching the k-mer database) were analyzed by several methods. First, when metagenomic data “en masse” were compared against GenBank data sets (i.e., pairwise alignment), no hits to Brassica were obtained. Second, the bacterial content was measured using a panel of 40 single-copy genes as phylogenetic markers to determine the percentage of those that were of bacterial or other origin. For example, the time zero sauerkraut fermentation contained 16,855 bacterial hits, 108 Archaea hits, and 81 Eukaryota hits. The bacterial hits represent 98.89% of the matches, with the Eukaryota 0.47%. Although the Eukaryota category is extremely limited, it does contain Brassica rapa. Third, when 50 randomly selected reads were searched against the nonredundant GenBank data set, 46 reads gave top hits to the expected Lactococcus, Lactobacillus, and Leuconostoc species, and four reads gave no hit. We also mapped the reads against the cabbage genome (Brassica oleracea genome whole-genome sequence [WGS] JJMF01 project). Our analysis suggested that at T0, 0.63% of the reads aligned to the Brassica genome. We have provided additional analysis details in Tables S1 to S3.
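The normalization and percentage calculations can be illustrated with a short sketch. The exact normalization used by the k-mer program is not specified in the text, so the scheme below (dividing each entry's read hits by its number of signature k-mers before converting to percentages) is an assumption made for illustration, and the species names and hit counts in the first example are hypothetical. The second example uses the marker gene hit counts reported above for the time zero sample.

```python
def percent_contribution(read_hits, kmers_per_entry):
    """Scale read hits by signature size, then express each entry as a percentage.

    Assumed scheme: an entry with more signature k-mers offers more chances
    for a read to match, so its raw hit count is divided by its k-mer count.
    """
    scaled = {sp: read_hits[sp] / kmers_per_entry[sp] for sp in read_hits}
    total = sum(scaled.values())
    return {sp: 100.0 * value / total for sp, value in scaled.items()}


def percent_share(counts):
    """Express raw counts as percentages of their total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical example: equal raw hits, but species_B has twice as many
# signature k-mers, so its normalized contribution is halved.
normalized = percent_contribution(
    read_hits={"species_A": 400, "species_B": 400},
    kmers_per_entry={"species_A": 40000, "species_B": 80000},
)

# Marker gene hits reported in the text for the time zero fermentation sample.
domains = percent_share({"Bacteria": 16855, "Archaea": 108, "Eukaryota": 81})
for name, pct in domains.items():
    print(f"{name}: {pct:.2f}%")
```

With the reported counts, the bacterial share works out to approximately 98.89% of the 17,044 marker gene hits, in agreement with the value given in the text.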

ACKNOWLEDGMENTS

J.E.E. acknowledges an American Society for Microbiology Summer Undergraduate Research Fellowship. Furthermore, J.E.E. acknowledges the UMass Amherst Commonwealth Honors College for providing a Research Assistant Fellowship as well as the UMass Amherst Center for Agriculture, Food, and the Environment Summer Scholars Fellowship. We acknowledge the Center for Produce Safety (grant SCB14056) for providing partial funding for this work.

We acknowledge the UMass Amherst Genomics Resource Laboratory for assistance with genome sequencing. We thank Dan Rosenberg and Katie Korby of Real Pickles for access to their production facility and for assistance in sample collection. We acknowledge the support of and helpful discussions with members of the Sela Lab and thank Cindy Kane for technical assistance.

The funding organizations had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The views expressed in this article are those of the authors and do not necessarily reflect the official policy of the Department of Health and Human Services, the U.S. Food and Drug Administration (FDA), or the U.S. Government. Reference to any commercial materials, equipment, or process does not in any way constitute approval, endorsement, or recommendation by the FDA.