Monday, April 7, 2014

Correlation

Update (May 12, 2014): there is a website that builds graphs showing correlations between two completely unrelated sets of data. The website's name is Spurious Correlations: Discover a New Correlation. A spurious correlation is a relationship between two variables that appears only because of a third factor (or by coincidence), not because the two variables are directly related. For example, a significant relationship between students dropping out of school and family socioeconomic status (SES) may depend on the students' own academic performance, meaning that family SES alone may not affect dropout if the students themselves are performing well. Therefore, saying that family SES is correlated with student dropout may be misleading; in other words, the relationship between the two variables is spurious. The website posts interesting correlations between two things every day: http://www.tylervigen.com/

This week we are learning how to conduct correlation analysis. Correlation is a statistical technique that allows you to examine the relationship between two variables, both of which are continuous. Correlation does not tell you about a causal relationship, only about a bi-directional relationship between two variables. The correlation is expressed by the r value, or Pearson correlation coefficient. The value of r ranges from -1 to 1: the closer it is to -1 or 1, the stronger the relationship between the two variables, and the closer it is to 0, the weaker. The sign, positive or negative, tells you the direction of the relationship.

For example, you may want to know if the education of parents is correlated with their academic involvement with their children, or if the education of parents is correlated with their children's age at first school enrollment. Correlation gives you an answer in terms of the direction of the relationship. Based on the above questions, you might find that parents with a higher level of education are more likely to be involved in their children's education (positive direction), or that parents with a higher level of education are less likely to enroll their children late in school (negative direction).

Now let's look at our data as an example. We want to know if the education of the mother (var name: edumom) is correlated with the mother's involvement (var name: involvemother). The correlation command is pwcorr, so we type pwcorr edumom involvemother, sig. Note that the IV and DV can be placed in any order after pwcorr, whereas in ANOVA the DV has to come first after oneway.
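For reference, the r value described above has a compact definition. The notation here is an assumption made for illustration and does not come from the course materials: x_i and y_i are the two variables' values for observation i, the barred symbols are their sample means, and n is the number of observations with both values present.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The numerator is positive when the two variables tend to be above or below their means together and negative when one tends to be above its mean while the other is below, which is where the sign of r comes from.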

use http://dl.dropboxusercontent.com/u/60032040/studentdata2013.dta

pwcorr edumom involvemother, sig


The above output shows that there is a correlation between the mother's education and her involvement. For your academic paper, you would say that:

"Mother education is positively correlated with mother involvement, r = .10, p < .01. Specifically, the higher the mother's level of education, the greater her involvement in her children's education."

You can also ask Stata to put a star next to the correlations that are significant by using the option star(#), where # is the significance level. You can also ask Stata to show the number of observations for each pair of variables with the option obs. It looks like this:

pwcorr edumom involvemother, sig obs star(.05) 


The above output now shows a star on the significant pair of variables, along with the number of observations, N = 804.

Now let's try correlation with more than two variables.

pwcorr edumom edudad involvemother involvefather gender ageenroll rank age, sig obs star(.05) 


Now, let's try to put it into APA style on your own. Things that need to be reported include Pearson's r, the significance level, and the number of observations (N). If the numbers of observations differ across pairs, specify the range, from lowest to highest; you don't have to report all of them. The Ns can differ because pwcorr uses pairwise deletion: each correlation is computed from all observations that have non-missing values on that particular pair of variables, so an observation is dropped only for the pairs where it has a missing value. Try it! An APA-style write-up of correlations usually takes the form of a correlation table.
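The pairwise deletion behind those differing Ns can be seen by comparing two commands. This is a minimal sketch, assuming the course dataset is already loaded; correlate and count are standard Stata commands that are not otherwise used in this post, and the four variables are picked only for illustration.

```stata
* Pairwise deletion: each pair uses its own non-missing observations,
* so the N shown by obs can differ from pair to pair
pwcorr edumom edudad involvemother involvefather, sig obs

* Listwise (casewise) deletion: only observations complete on all four
* variables are used, so a single N applies to the whole matrix
correlate edumom edudad involvemother involvefather

* Count the complete cases to compare with the N reported by correlate
count if !missing(edumom, edudad, involvemother, involvefather)
```

If none of the variables has missing values, both commands should report the same coefficients and the same N.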


Correlation examines the relationship between each pair of variables separately, meaning that the relationship between two variables is estimated independently of the other variables (i.e., it does not take into account the influence of other variables). With three variables A, B, and C, it examines A-B, A-C, and B-C, and the A-B result ignores C. For example, it gives you Income-Education, Education-Age, and Income-Age, but it does not tell you whether the relationship between income and education depends on age or on other variables such as gender. When you have other variables that you want to control for, you need multiple regression. Regression allows you to model your outcome variable based on two or more independent variables, all of which are continuous or dummy variables. A categorical variable with more than two categories cannot be entered as-is in a regression or a correlation; it first has to be converted into dummy variables.
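To make the difference between a pairwise correlation and a controlled relationship concrete, here is a minimal sketch, assuming the course dataset is loaded. The commands pcorr and regress are standard Stata commands not covered in this post, and using age as the control variable is only an illustration, not a recommendation from the course.

```stata
* Bivariate correlation: ignores age entirely
pwcorr involvemother edumom, sig obs

* Partial correlations of involvemother with edumom (holding age constant)
* and with age (holding edumom constant)
pcorr involvemother edumom age

* Multiple regression: the coefficient on edumom is adjusted for age
regress involvemother edumom age
```

If the partial correlation for edumom is close to the bivariate r, age has little to do with that relationship; a large change suggests the bivariate correlation partly reflects age.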

You can also ask Stata to run the analysis for a particular group by using the "if" qualifier. "if" must be placed before the comma, not after it. For example,

pwcorr edumom edudad involvemother involvefather ageenroll rank age if gender==1, sig obs star(.05) 
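A related pattern is to repeat the analysis for every group in one step instead of writing one "if" per group. This is a sketch that assumes the course dataset is loaded; the post does not say which gender is coded as 1, so tabulate is used first to check the coding.

```stata
* Check how gender is coded before choosing a value for the if qualifier
tabulate gender

* Same correlations, repeated separately for every value of gender
bysort gender: pwcorr edumom edudad involvemother involvefather ageenroll rank age, sig obs star(.05)
```

The by prefix goes before the command name, and the options still follow the comma at the end.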

For more information on correlation, you can type "help correlate" in your Stata command box. Or visit: http://www.stata.com/manuals13/rcorrelate.pdf

What if your variables are dichotomous, or more specifically binary?

A dichotomous variable is a categorical variable with exactly two categories. A binary variable is a dichotomous variable whose two groups are coded specifically as 0 and 1 (e.g., female=0 and male=1). A binary variable is the same as a dummy variable: it takes the values 0 and 1, representing the absence or presence of a characteristic.

Let's look at an example of the relationships between desk (having a desk at home ("1") or not ("0")), academic engagement (continuous var), and age (continuous var) of the students. We use the command:

pwcorr gender desk engagement age, sig star(.05) 



What we can see from the above output is that the variable desk is significantly correlated (or associated) with academic engagement (r = .09, p < .01) and with the age of the students (r = -.11, p < .001). Specifically, the results suggest that students who have a study desk at home tend to show a higher level of academic engagement than those who do not, and that students who do not have a desk at home tend to be older than those who do.

So how do you read the binary variable "desk"? You know that having a study desk is coded as "1" and not having a study desk is coded as "0". So you look at the sign in front of the correlation coefficient. If it is positive, the group coded "1" (having a study desk) tends to score higher on the other variable. If it is negative (as with the age variable), the group coded "0" (not having a study desk) tends to score higher.
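One way to check this reading of the sign is to compare the two groups' means directly. The sketch below assumes the course dataset is loaded; tabstat and ttest are standard Stata commands not covered in this post. A Pearson correlation between a 0/1 variable and a continuous variable tests the same hypothesis as a two-sample t test with equal variances, so the two p-values should agree.

```stata
* Mean engagement and age for students without (0) and with (1) a desk
tabstat engagement age, by(desk) statistics(mean sd n)

* Correlation between the 0/1 variable and the continuous variable
pwcorr desk engagement, sig obs

* Pooled-variance two-sample t test; its two-sided p-value should match
* the significance level reported by pwcorr for the same pair
ttest engagement, by(desk)
```

A positive r for desk and engagement should correspond to a higher mean engagement in the group coded 1 in the tabstat output.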

Now it's your turn to run your own analysis with a binary variable. Use rank and gender and then try to interpret the findings.


PRACTICE ON YOUR OWN

Examine the correlation among the following variables:

rank gender edumom edudad electricity tv cell desk calculator breakfast engagement genderrole

Then, build an APA-style table based on the results.

Finally, describe your findings in APA style.

use http://dl.dropboxusercontent.com/u/60032040/studentdata2013.dta

Comments:

  1. You can find a correlation between almost anything if you just choose a short enough interval, the right interval, and the right combination of factors. It's not really interesting, other than to hopefully show people that correlation does not imply causation. :)
