Monday, March 10, 2014

Data Analysis: Analysis of Variance (ANOVA)

This week we are learning how to use ANOVA (Analysis of Variance) for your statistical analysis, and I want to keep it simple. Let's focus on one-way ANOVA first. When you are comfortable with it, we can move on to two-way ANOVA (used when you have more than one independent variable for one dependent variable). You use this technique when your dependent variable is continuous (e.g., age, income, GPA) and your independent variable is categorical (e.g., gender, ethnicity). Fundamentally, you are using ANOVA to compare the means (averages) of two or more groups defined by your independent variable.

Does this sound like a technique we used last week? Yes, the t-test. A t-test does the same job as ANOVA, but it can only handle an independent variable with a maximum of two groups (or levels), such as gender (male or female), owning a pet (yes or no), or applying for a Ph.D. program (yes or no). When your independent variable has more than two groups or levels, you need ANOVA. Examples would be ethnicity (African American, White, Hispanic, and Asian), type of movie (drama, action, comedy, etc.), or year in college if you prefer to use it as a category (freshman, sophomore, junior, and senior).

There are two common commands for ANOVA: (1) anova income gender and (2) oneway income gender, t . I have created a table of types of tests based on the nature of your variables for your easy reference.


Note that everything highlighted in red must be there; if you miss just one thing, such as a comma, the command will not run. So you have to be careful about every detail of your command. This is why I highlighted these parts in red, to make sure you won't miss any of them.
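
To make the choice between the two tests concrete, here is a minimal sketch in Stata. The variables income, gender, and ethnicity are placeholder names borrowed from the examples above, not from a real dataset, so treat this as a pattern rather than something to run as is.

```stata
* Two groups (e.g., gender): a t-test is enough
ttest income, by(gender)

* The same two-group comparison run as a one-way ANOVA;
* with two groups, F equals the square of the t statistic
oneway income gender, tabulate

* More than two groups (e.g., ethnicity): use ANOVA
oneway income ethnicity, tabulate
```

The "tabulate" option is the full spelling of the ", t" shown above.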

Let's look at this data and run a few ANOVAs. We have questions about students' academic performance (var name is "rank") and about mother's and father's education. The question is whether parent education plays a key role in student performance. In other words, we can ask whether the mean of performance (rank) differs across levels of parent education. Here we are using education as a group or categorical variable. We could of course use it as a continuous variable as well, but for the purpose of this practice, let's use it as a group variable. The first thing you need to do is to tab parent education (var names are edumom and edudad). So type tab edumom and then tab edudad, and Stata will give you the percentage of each category of education level. Why do you need to do the tab first? Because you want to know what the variable looks like in terms of the responses. Here is what it looks like:

Link to data file: http://dl.dropboxusercontent.com/u/60032040/anovapracticedata.dta
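
One possible way to load the file directly from Stata, assuming the link above is still reachable and your Stata has internet access (if not, download the file first and point "use" at the local copy):

```stata
* Load the practice data straight from the link above
use "http://dl.dropboxusercontent.com/u/60032040/anovapracticedata.dta", clear

* Check the variable names and labels before tabulating
describe rank edumom edudad
```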

tab edumom 
tab edudad


The above tables give you the percentage in each category of schooling level. Now it is time to run your ANOVAs. We will use the commands oneway rank edumom, t and oneway rank edudad, t . The ", t" is an option (short for tabulate) that requests a table of means and standard deviations in addition to the test statistics. The variable "rank" is students' academic ranking from 1-50, in which 1 indicates the highest performance (again, you can do tab rank to see what this variable's distribution looks like, just as you did for edumom and edudad).

oneway rank edumom, t 


First of all, look at the circle numbered 1, which shows the significance level of your analysis. Based on the analysis above, your model is significant (at less than the .05 level), and you can say that the average student rank differs meaningfully across levels of mother's education. Now it's time to look at the circle numbered 2. We see that the mean rank for mothers with no schooling is the highest (13.5) (meaning lower performance) and gradually decreases with higher levels of education, down to 5.8 when mothers have above high school education. However, we still do not know which levels differ from one another. If there were only two levels, we would know right away which one is higher or lower. With more than two, we need a follow-up (post-hoc) analysis. There are a few techniques, but we can just use the Bonferroni test (or bon). Here is how the command would look, same as above but adding bon: oneway rank edumom, t bon

oneway rank edumom, t bon   


What we are looking for in the above table is the significance level of each comparison (a total of 10 pairs, since five education levels give 5 x 4 / 2 = 10 pairs). We would normally focus on the pairs that are significant (i.e., below the .05 level). Here, however, all the pairs are above the .05 level, though two pairs are marginally significant, as shown in the red circles.

No Schooling--Secondary: marginally significant at p=.078
Primary--Secondary: marginally significant at p=.060.
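
The count of pairs and the idea behind the Bonferroni adjustment can be checked with a couple of display commands. This is only a sketch of the arithmetic: the unadjusted p-value of .006 is a made-up number for illustration, not a value taken from the output above.

```stata
* Number of pairwise comparisons among 5 education levels
display comb(5, 2)

* Bonferroni: multiply an unadjusted p-value by the number of
* comparisons (capped at 1); .006 is a made-up example value
display min(1, .006 * comb(5, 2))
```

This is why a difference that looks clearly significant on its own can end up only marginal once ten comparisons are taken into account.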

NOW, do your own using father education.
Good luck!

TABLE MAKING

Here is how an ANOVA table should look in your paper in APA style. Note that η² is eta-squared, the effect size based on this method.


REPORTING THE RESULTS 

"Table 1 shows the results based on Analysis of Variance (ANOVA) between mother education and students' academic performance. The results suggest that mother education is significantly associated with students' academic performance, F(4, 609)=3.19, p<.05, η²=.02. Post-hoc analysis using the Bonferroni method shows that mothers with secondary education are marginally different from mothers with no education (p=.078) and mothers with primary education (p=.060) in relation to students' academic performance."

EFFECT SIZE

What is effect size and how do you obtain it? 

By definition, effect size is a simple way of quantifying the difference between two groups, or a way to present the practical significance (rather than the statistical significance) of the results. By convention, an effect size based on eta-squared of .01 is considered small, .06 medium, and .14 large. The interpretation of effect size should be considered in context. Coe (2002) argued that "the effectiveness of a particular intervention can only be interpreted in relation to other interventions that seek to produce the same effect. In education, if it could be shown that making a small and inexpensive change would raise academic achievement by an effect size even as little as 0.01, then this could be very significant improvement, particularly if the improvement applied uniformly to all students, and even more so if the effect were cumulative over time."




Now you will learn how to obtain the effect size. The "oneway" command will not work here; we need to use the "anova" command this time. The effect size option is not pre-installed in your Stata, so you need to install it. To do so, type findit effectsize . A small window will pop up that looks like this:


You can click on any one of them to install. Here is what it looks like:


It will tell you once the installation has been completed. Now it's time to check your effect size. Type the following command:

anova rank edumom 


then 


effectsize edumom 


And here is what it looks like:




So there, you got the effect size. It's .02 as also shown in your ANOVA sample table 1 above. 

R-squared also serves as an effect size, and you could use that as well. In a one-way ANOVA it equals eta-squared, which is why the R-squared above is also .0205. The benefit of using the effectsize command following the anova command is that you have more effect size options, such as omega and Cohen.
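
If you want to see where the number comes from, the sketch below recomputes eta-squared from the results that anova leaves behind. It assumes the practice data are loaded; the last command, estat esize, is a built-in alternative that I believe is only available in Stata 13 or later, so skip it on older versions.

```stata
anova rank edumom

* Eta-squared = model sum of squares / total sum of squares
display e(mss) / (e(mss) + e(rss))

* R-squared stored by anova; in a one-way model it is the same quantity
display e(r2)

* Cross-check from the reported F(4, 609) = 3.19
display (3.19 * 4) / (3.19 * 4 + 609)

* Built-in alternative in Stata 13 or later
estat esize
```

The cross-check line uses only the F value and degrees of freedom from the reporting example, and it works out to about .0205, the same value as the R-squared.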

PRACTICE ON YOUR OWN

Using the anovapracticedata (use the same data above: http://dl.dropboxusercontent.com/u/60032040/anovapracticedata.dta), please answer the following questions:

1. Does type of transportation (var name transport) have any relationship with students' academic performance (var name: rank)? What is the effect size?

2. Do students who prefer to work in a group (var name groupwork) perform better academically compared to those who prefer working alone? What is the effect size?

3. Does mother education impact their involvement with their children's education (var name: parentinvolvement)? What is the effect size?

4. Create a question(s) on your own. The independent variable has to have more than two levels/groups.


***Note. This website does a great job in explaining within and between group variances.
