Data analysis of GC-MS and LC-MS in Environmental Science

Miao Yu

2018-04-25

Qualitative and quantitative analysis of contaminants is at the core of environmental science. GC/LC-MS is one of the most popular instruments for such analytical methods. Previous packages such as xcms were developed for GC-MS data. However, such packages have limited functions for environmental analysis. In this package, I added functions for various GC/LC-MS data analysis tasks used in environmental analysis. These features can not only reveal certain problems but also help the user find unknown patterns in GC/LC-MS datasets.

Data analysis for single sample

Data structure of GC/LC-MS full scan mode

In a GC/LC-MS system, GC/LC is used for separation and MS is used for detection. The collected data are the intensities of certain masses at different retention times. When we analyze a sample on a given column in full scan mode, the counts of the different masses are collected in each scan. The dwell time for each scan might last only 500 ms or less. Then the next scan begins at a different retention time. We can use a matrix to represent these data: each column stands for a mass and each row stands for the retention time of a scan. Such a matrix can be treated as time series data. In this package, we treat such data as the matrix type.
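The layout described above can be sketched in base R. This is only a toy illustration: the retention times, masses and counts are made up, and the object names (rt, mz, counts) are hypothetical rather than part of enviGCMS.

```r
# Toy full-scan data: 5 scans (rows) x 4 mass bins (columns).
# All values are invented for illustration.
rt <- c(40.0, 40.5, 41.0, 41.5, 42.0)   # retention time of each scan (s)
mz <- c(100, 101, 102, 103)             # mass bins (m/z)
counts <- matrix(
  c(10,  12, 11,  9,
    15, 200, 14, 10,
    20, 850, 60, 11,
    14, 210, 16, 10,
    11,  13, 12,  9),
  nrow = length(rt), byrow = TRUE,
  dimnames = list(rt = as.character(rt), mz = as.character(mz))
)

# Total ion chromatogram: sum over all masses within each scan.
tic <- rowSums(counts)

# Mass spectrum of the scan with the highest total signal.
apex <- which.max(tic)
apex_spectrum <- counts[apex, ]
```

Summing across a row gives one point of the TIC, and reading down a column gives the chromatogram of a single mass.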

For high-resolution MS, building such a matrix is tricky. We might need to bin the raw data so that different scans can be aligned into one matrix. This work can be done by xcms.
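To make the binning idea concrete, here is a minimal base R sketch. It is not the algorithm used by xcms. It only assumes that each scan is a set of (m/z, intensity) pairs and that a fixed bin width (called mzstep here, by analogy with the getmd argument) is acceptable. The peak values are invented.

```r
# Hypothetical centroided peaks from two scans: accurate m/z and intensity.
peaks <- data.frame(
  scan      = c(1, 1, 1, 2, 2, 2),
  mz        = c(100.012, 100.048, 101.52, 100.021, 101.56, 102.33),
  intensity = c(50, 30, 200, 80, 180, 40)
)

mzstep <- 0.1
# Label each peak with the lower edge of its bin of width mzstep.
peaks$bin <- round(floor(peaks$mz / mzstep) * mzstep, 1)

# Sum the intensities that fall into the same scan and bin;
# rows are scans, columns are mass bins.
binned <- tapply(peaks$intensity,
                 list(scan = peaks$scan, mz = peaks$bin), sum)

# Bins with no signal in a scan come back as NA; treat them as zero counts.
binned[is.na(binned)] <- 0
```

After binning, masses that differ slightly between scans (100.012 and 100.021 here) land in the same column, which is what makes the matrix form possible.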

Data structure of GC-MS SIM mode

When you perform a selected ion monitoring (SIM) mode analysis, only a few masses are collected, and each mass has counts and retention times as time series data. In this package, we treat such data as the data.frame type.
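A SIM data set of this kind might look like the following data.frame. The column names (rt, mz, intensity) and the values are assumptions chosen for illustration and may differ from what the package functions expect.

```r
# Hypothetical SIM data: two monitored ions, three scans each.
sim <- data.frame(
  rt        = rep(c(60.0, 60.2, 60.4), times = 2),  # retention time (s)
  mz        = rep(c(326, 328), each = 3),           # monitored ions
  intensity = c(100, 900, 120, 65, 590, 80)         # counts
)

# One time series (retention time vs. counts) per monitored ion.
traces <- split(sim[, c("rt", "intensity")], sim$mz)
```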

Data input

You could use getmd to import mass spectrum data in the formats supported by xcms and get the profile of the GC-MS data as a matrix. mzstep is the bin step for mass:

data <- getmd('data/data1.CDF', mzstep = 0.1)

You could also subset the data by mass (m/z 100-1000) or retention time range (40-100 s) in the getmd function:

data <- getmd(data,mzrange=c(100,1000),rtrange=c(40,100))

You could also combine full-scan data that share the same retention time range with cbmd:

data <- cbmd(data1,data2,data3)

Visualization of mass spectrum data

You could plot the Total Ion Chromatogram (TIC) for a certain RT and mass range.

plottic(data,rt=c(3.1,25),ms=c(100,1000))

You could also plot the mass spectrum for a certain RT range. You could use the returned MSP files for NIST search:

plotrtms(data,rt=c(3.1,25),ms=c(100,1000))

The Extracted Ion Chromatogram (EIC) is also supported by enviGCMS, and the returned data could be analyzed for molecular isotopes:

plotmsrt(data,ms=500,rt=c(3.1,25))

You could use plotms or plotmz to show the heatmap or scatter plot for LC/GC-MS data, which is very useful for exploratory data analysis.

plotms(data)
plotmz(data)

You could convert the retention time into temperature if the temperature rises at a constant rate. But you need to supply the temperature range.

plott(data,temp = c(100,320))
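The conversion behind such a plot is a linear mapping. The sketch below assumes a single linear ramp that starts at the first scan and ends at the last one. It is not the plott implementation, and the retention times are made up.

```r
# Hypothetical retention times (min) and the temperature range of the ramp.
rt_min     <- c(3.1, 10, 25)
temp_range <- c(100, 320)

# Map the first scan to the start temperature and the last scan to the
# end temperature, interpolating linearly in between.
rt_range <- range(rt_min)
temp <- temp_range[1] +
  (rt_min - rt_range[1]) / diff(rt_range) * diff(temp_range)
```

If the oven program contains isothermal holds or several ramps, this simple mapping no longer applies.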

Data analysis for influences from GC-MS

enviGCMS supplies many functions for reducing noise during the analysis process. findline could be used to find the line of the boundary regression model for noise. comparems could be used to make a point-to-point subtraction of two full-scan mass spectrum data sets. plotgroup could be used to convert the data matrix into a 0-1 heatmap according to a threshold. plotsub could be used to show the self background subtraction of full-scan data. plotsms shows the RSD of the intensity of full-scan data. plothist could be used to show the distribution of the intensities of full-scan data as a histogram.
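Some of these operations are easy to picture on small matrices. The following base R sketch illustrates point-to-point subtraction, thresholding into a 0-1 matrix and a per-mass RSD. It uses invented values and hypothetical object names, and it does not reproduce the internals of comparems, plotgroup or plotsms.

```r
# Two hypothetical full-scan matrices with the same scans (rows) and masses (columns).
sample_ms <- matrix(c(120, 15, 900,
                      130, 18,  40), nrow = 2, byrow = TRUE)
blank_ms  <- matrix(c(100, 20,  50,
                      110, 10,  45), nrow = 2, byrow = TRUE)

# Point-to-point subtraction; negative differences are set to zero.
diff_ms <- pmax(sample_ms - blank_ms, 0)

# 0-1 matrix: 1 where the subtracted signal exceeds the threshold.
threshold <- 10
binary_ms <- (diff_ms > threshold) * 1

# Relative standard deviation (%) of each mass across scans.
rsd <- apply(sample_ms, 2, function(x) sd(x) / mean(x) * 100)
```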

Data analysis for molecular isotope ratio

Some functions could be used to calculate the molecular isotope ratio. EIC data could be imported into GetIntergration, which returns information about the peaks found. Getisotoplogues could be used to calculate the molecular isotope ratio of a certain molecule. Shortcut functions such as batch and qbatch could be used to calculate the molecular isotope ratio for multiple molecules or a single molecule in EIC data.
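As a rough picture of what an isotope ratio from EIC data means, the sketch below integrates two invented EIC peaks with the trapezoidal rule and divides the areas. This is a simplification under the assumption of baseline-free, already delimited peaks. The peak detection and integration in the package functions are not reproduced here, and trapz is a hypothetical helper.

```r
# Trapezoidal integration of a signal y sampled at times x.
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Hypothetical EICs of two isotopologues over the same peak (rt in min).
rt        <- c(10.0, 10.1, 10.2, 10.3, 10.4)
eic_light <- c(0, 400, 1000, 400, 0)
eic_heavy <- c(0, 260,  650, 260, 0)

area_light <- trapz(rt, eic_light)
area_heavy <- trapz(rt, eic_heavy)

# Molecular isotope ratio as the ratio of the integrated areas.
ratio <- area_heavy / area_light
```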

Quantitative analysis for short-chain chlorinated paraffins (SCCPs)

enviGCMS supplies a function to perform quantitative analysis of short-chain chlorinated paraffins (SCCPs) with Q-TOF data. Use getsccp for this analysis.

If you want a graphical user interface for SCCPs analysis, this package includes a Shiny application. You could use runsccp() to launch the application in a browser.

Data analysis for multiple samples

In environmental non-target analysis, when multiple samples are collected, problems arise from the heterogeneity among samples. For example, retention time can shift due to the column. In those cases, the xcms package could be used to get a peak list across samples within a certain retention time and m/z range. The enviGCMS package has some wrapper functions to get the peak list. In addition, some specific functions for group comparison, batch correction and visualization are also included.

Wrapper functions for the xcms package

getdata could be used to get the xcmsSet object in one step with optimized methods

getdata2 could be used to get the XCMSnExp object in one step with optimized methods

getupload could get the CSV files to be submitted to MetaboAnalyst from an xcmsSet object

getupload2 could get the CSV files to be submitted to MetaboAnalyst from an XCMSnExp object

getmzrt could get a list or CSV files with the peak list, m/z, retention time and class of samples from an xcmsSet object

getmzrt2 could get a list or CSV files with the peak list, m/z, retention time and class of samples from an XCMSnExp object

getmzrtcsv could read in the CSV files and return a list for the peak list

Data imputation and filtering

getimputation could impute NA values in the peak list

getdoe could filter the data based on DoE, RSD and intensity

getfeaturest could get the features from a t-test, with p value, q value, RSD and power restrictions

getfeaturesanova could get the features from ANOVA, with p value, q value, RSD and power restrictions
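The general idea of these steps can be sketched in base R on a tiny, invented peak list. The imputation rule (half of the smallest observed intensity), the 30% RSD cutoff and all object names are assumptions for illustration. They are not the defaults or the internals of getimputation, getdoe or getfeaturest, and the power restriction is left out.

```r
# Hypothetical peak list: 3 features (rows) x 6 samples (columns).
peaks <- matrix(
  c(100, 110, 105, 300, 310, 290,
     50,  52,  51,  49,  50,  53,
     10,  NA,  12,  11,  13,  10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feature", 1:3), paste0("s", 1:6))
)
group <- factor(rep(c("control", "treated"), each = 3))

# Impute missing values with half of the smallest observed intensity.
imputed <- peaks
imputed[is.na(imputed)] <- min(peaks, na.rm = TRUE) / 2

# Within-group RSD (%) per feature; keep features below 30% in every group.
rsd <- function(x) sd(x) / mean(x) * 100
group_rsd <- t(apply(imputed, 1, function(x) tapply(x, group, rsd)))
keep <- apply(group_rsd, 1, max) < 30

# Welch t-test per remaining feature, then Benjamini-Hochberg adjusted values.
p <- apply(imputed[keep, , drop = FALSE], 1,
           function(x) t.test(x ~ group)$p.value)
q <- p.adjust(p, method = "BH")
```

In this toy data the imputed value inflates the RSD of feature3, so it is removed before testing. That also shows why the order of imputation and filtering matters.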

Visualization of peak list data

plotmr could draw a scatter plot of the peak list with a threshold

plotmrc could draw the differences between two groups of data as a scatter plot of the peak list with a threshold