Research Methodology

I attended a workshop hosted by the UVa Library statistics consulting group. It was useful for me because the instructors gave detailed descriptions and explanations of each line of code, whereas my first experience with R did not include any step-by-step explanation of how to use it. The workshop explained each line of code in the Script window with “#” comments. Pressing Ctrl+Enter ran the code and showed the results in the Console, so I could easily follow what I was doing in R.

I have summarized my notes below:

Ctrl+Enter: sends code from the script to the Console and runs it

Write commands in the R script (upper left) and press Ctrl+Enter to see the results in the Console

F-ratio: The most important part of the ANOVA table is the F-ratio, which compares the variance explained by the model with the variance it leaves unexplained, together with the significance value associated with that F-ratio. For these data, F is 99.59, which is significant at p < .001 (because the value in the significance column is less than .001). This result tells us that there is less than a 0.1% chance that an F-ratio this large would occur if the null hypothesis were true. Therefore, we can conclude that our regression model predicts record sales significantly better than using the mean value of record sales. In short, the regression model overall predicts record sales significantly well.
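To make the F-ratio concrete, here is a minimal sketch in Python (rather than R) that computes it by hand for a simple regression with one predictor. The data are hypothetical and are not the record sales data discussed above; the function name f_ratio is also just an illustration. The F-ratio is the model mean square divided by the residual mean square. The p-value is left out because it needs the F distribution, which is not in the Python standard library.

```python
# Hypothetical data (NOT the data from the notes): advertising budget and record sales
adverts = [10, 20, 30, 40, 50, 60, 70, 80]
sales = [120, 150, 135, 190, 200, 185, 240, 250]

def f_ratio(x, y):
    """Return (F, SS_model, SS_residual) for a one-predictor least-squares regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    # Improvement of the model over simply using the mean of y
    ss_model = sum((p - mean_y) ** 2 for p in predicted)
    # What the model still gets wrong
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    df_model = 1          # one predictor
    df_residual = n - 2   # n observations minus two estimated parameters
    f = (ss_model / df_model) / (ss_residual / df_residual)
    return f, ss_model, ss_residual

f_value, ss_m, ss_r = f_ratio(adverts, sales)
print("F =", round(f_value, 2))
```

A large F means the model explains much more variance than it leaves unexplained, relative to the degrees of freedom.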

Regression: used to predict an outcome when the outcome is a continuous variable

Logistic regression: used when the outcome is categorical (e.g., fireman, doctor, or pimp)

A small standard error tells us that most pairs of samples from a population will have very similar means (i.e. the difference between sample means should normally be very small). A large standard error tells us that sample means can deviate quite a lot from the population mean and so differences between pairs of samples can be quite large by chance alone.

I used MANOVA (Multivariate Analysis of Variance) for the quantitative data analysis in my dissertation.

So, when to use MANOVA?

When you have several dependent variables (DV)

When there is one independent variable or several. With several independent variables, we can also look at interactions between them and run contrasts to see which groups differ from each other.

What are the benefits of using MANOVA?

We can look at interactions between independent variables.

We can even do contrasts to see which groups differ from each other.

MANOVA can tell us about the relationships between the DVs (outcome variables).
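One way to see how MANOVA uses the relationships between DVs is through the sums of squares and cross-products (SSCP) matrices, whose off-diagonal cross-product terms capture how the DVs vary together. The sketch below, in Python rather than R, computes Wilks' lambda, one standard MANOVA test statistic, for two DVs. Wilks' lambda is not named in my workshop notes, and the three groups of scores and all function names are hypothetical. Values near 0 suggest the groups differ on the combination of DVs; a value of 1 means no group difference at all.

```python
def col_means(rows):
    """Mean of each of the two DVs."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def sscp(rows, center):
    """2x2 sums of squares and cross-products around a given center."""
    a = sum((r[0] - center[0]) ** 2 for r in rows)
    b = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)  # cross-product of the DVs
    d = sum((r[1] - center[1]) ** 2 for r in rows)
    return [[a, b], [b, d]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """det(within-group SSCP) / det(total SSCP) for two DVs."""
    all_rows = [r for g in groups for r in g]
    total = sscp(all_rows, col_means(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        w = sscp(g, col_means(g))
        within = [[within[i][j] + w[i][j] for j in range(2)] for i in range(2)]
    return det2(within) / det2(total)

# Hypothetical (DV1, DV2) scores for three groups
group_a = [(2, 3), (3, 4), (4, 4), (3, 5)]
group_b = [(5, 6), (6, 6), (6, 8), (7, 7)]
group_c = [(4, 9), (5, 10), (5, 8), (6, 9)]

lam = wilks_lambda([group_a, group_b, group_c])
print("Wilks' lambda =", round(lam, 3))
```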

Compared to ANOVA, what are the advantages of MANOVA? Why is MANOVA used instead of multiple ANOVAs?

The more tests we conduct on the same data, the more we inflate the familywise error rate; the more dependent variables that have been measured, the more ANOVAs would need to be conducted and the greater the chance of making a Type I error.

MANOVA has greater power to detect an effect, because it can detect whether groups differ along a combination of variables, whereas ANOVA can detect only whether groups differ along a single variable.

ANOVA can tell us only whether groups differ along a single dimension whereas MANOVA has the power to detect whether groups differ along a combination of dimensions.
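The inflation of the familywise error rate mentioned above can be worked out directly. As a minimal sketch in Python (rather than R), assuming the tests are independent and each uses the same alpha, the chance of at least one Type I error across k tests is 1 - (1 - alpha)^k. The function name is my own.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# One ANOVA per dependent variable, each tested at alpha = .05
for k in (1, 2, 3, 5, 10):
    print(k, "tests:", round(familywise_error(0.05, k), 3))
```

With three dependent variables tested separately at .05, the familywise error rate is already about .14 rather than .05, which is the argument for running one MANOVA first.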

I attended a two-day Educational Data Mining (EDM) workshop led by Dr. April Galyardt and offered by the College of Education, University of Georgia, on June 9 and 10, 2014. Before the workshop began, I had several questions, and my goal in taking the workshop was to get clear answers to them.

EDM is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and with using those methods to better understand students and the settings in which they learn.

It is similar to Learning Analytics and Knowledge (LAK). LAK is the measurement, collection, analysis, and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs.

EDM vs. LAK

LAK and EDM share the goals of improving education by improving assessment, how problems in education are understood, and how interventions are planned and selected. EDM is more focused on generalizability, while LAK researchers have placed greater focus on addressing the needs of multiple stakeholders with information drawn from data (see p. 3 of the handout for details).

2. When may EDM be useful for my research based on my program of inquiry?

When I need detailed formative assessment

Compared to regression, I may use EDM when I need more interpretability for my data.

It is useful for design-based research (DBR)

For my research:

When my regression results do not meet my needs, when it looks like something more complicated is going on among my variables, or when I want to interpret my data more deeply.

3. What types of data can I use for EDM design?

The data do not have to be big data; I can use any data I want.

4. What tools may I use for EDM?

R or RapidMiner (R is more common for EDM)

<Day 1>

So far, EDM starts from the concept of regression, but the way of finding the best-fitting model is different. In ordinary regression, we typically keep only the statistically significant variables. In EDM, with the lasso, the significant variables are still critical, but variables that are not significant can also be used as predictors, because the lasso shows how the model improves as each variable is added. Even when a variable is not significant, we can still interpret how it affected the model from the graph.
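As a rough illustration of the lasso idea above, here is a minimal sketch in Python (rather than R) using coordinate descent on simulated data. Nothing here comes from the workshop materials: the data, the function names, and the penalty values are all assumptions made for the example. The penalty is lowered step by step, and the printed coefficients show the variables entering the model one at a time, which is what the lasso graph displays.

```python
import random

def standardize(col):
    """Center a column and scale it to unit variance."""
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within lam of zero become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(columns, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*RSS + lam*sum|beta|, standardized columns, centered y."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # residuals when every coefficient is zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Fit of column j to the residuals with its own contribution added back
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta

# Simulated data: x1 has a strong effect, x2 a weak effect, x3 none
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y_raw = [3 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]
columns = [standardize(c) for c in (x1, x2, x3)]

# Smallest penalty at which every coefficient is still zero
lam_max = max(abs(sum(c * v for c, v in zip(col, y))) / n for col in columns)

path = {}
for frac in (1.01, 0.5, 0.1, 0.01):
    path[frac] = lasso(columns, y, frac * lam_max)
    print(frac, [round(b, 3) for b in path[frac]])
```

Reading the output from top to bottom is like reading the lasso graph from the heavily penalized side: a variable whose coefficient becomes nonzero early matters more to the model than one that enters only when the penalty is nearly gone.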

Need Help with Your Analyses?

The Academic Computing Center (ACC) provides research consultation, data analysis support, and assistance in interpreting statistical analyses for faculty and students in the College of Education. The ACC is associated with the Department of Educational Psychology and Instructional Technology.

Assumptions of the t-test

The sampling distribution is normally distributed. In the dependent t-test, this means that the sampling distribution of the differences between scores should be normal, not the scores themselves.

The independent t-test, because it is used to test different groups of people, also assumes homogeneity of variance (the variances in these populations are roughly equal) and independent scores (because they come from different people).
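The two versions of the t-test described above can be sketched in a few lines of Python (rather than R). This is only an illustration with hypothetical scores and my own function names. The independent version pools the two group variances, which is why it assumes they are roughly equal; the dependent version works only on the differences between paired scores. P-values are left out because they need the t distribution, which is not in the Python standard library.

```python
from statistics import mean, stdev, variance

def independent_t(group1, group2):
    """t and df for two different groups, using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5  # standard error of the difference
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

def dependent_t(before, after):
    """t and df for the same people measured twice, based on their difference scores."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / n ** 0.5  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical scores
control = [12, 15, 11, 14, 13, 16]
treatment = [17, 15, 19, 18, 16, 20]
print("independent:", independent_t(treatment, control))

pre = [10, 12, 9, 14, 11]
post = [13, 12, 12, 15, 14]
print("dependent:", dependent_t(pre, post))
```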

* Standard error: the standard deviation of sample means. It is a measure of how representative a sample is likely to be of the population. As samples get large (greater than 30), the sampling distribution approaches a normal distribution with a mean equal to the population mean (the central limit theorem).
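A small simulation can show what the standard error means in practice. The sketch below is in Python (rather than R) and uses a simulated, deliberately skewed population, so the population, the sample sizes, and the function name are all assumptions for illustration. It draws many samples of each size and compares the standard deviation of the sample means with the usual formula, the population standard deviation divided by the square root of n.

```python
import random
from statistics import mean, stdev

random.seed(1)
# Simulated skewed population (exponential), so the raw scores are clearly not normal
population = [random.expovariate(1.0) for _ in range(20000)]

def sample_means(pop, n, reps=2000):
    """Means of many random samples of size n drawn from the population."""
    return [mean(random.sample(pop, n)) for _ in range(reps)]

results = {n: sample_means(population, n) for n in (5, 30, 100)}

for n, means in results.items():
    print(
        "n =", n,
        "| mean of sample means:", round(mean(means), 3),
        "| SD of sample means:", round(stdev(means), 3),
        "| population SD / sqrt(n):", round(stdev(population) / n ** 0.5, 3),
    )
```

The mean of the sample means stays close to the population mean at every sample size, while their spread (the standard error) shrinks as n grows.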