Hypothesis Testing Made Simple!

T-Test for Hypothesis Testing

It took me a lot of courage to write on such a niche subject of Statistics. I swear, I used to be quite petrified of statistical terms such as hypothesis testing, T-test, ANOVA, Chi-Square test etc., and I still am, a little bit.

But when I tried it in SAS, I found that actually performing these tests is not that difficult. Let me try to help you understand the same.

Download the data using the following link; we will use it for almost all the tests:

Data for tests

Download the data, save it in a folder, and assign that folder to the library "a" in SAS.
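
As a minimal sketch, the library assignment could look like the following. The folder path is only a placeholder (an assumption, not the real location), so replace it with wherever you saved the file. I am also assuming the download is a SAS dataset named sample_1, which is the name used in the test further down.

```sas
/* Assign the folder holding the downloaded data to library "a".
   The path below is a placeholder; replace it with your own folder. */
libname a "C:\path\to\your\folder";

/* Optional check: list the variables (including "write") in the dataset */
proc contents data=a.sample_1;
run;
```

If the library is assigned correctly, the PROC CONTENTS output should list the variable "write" with the label "writing score".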

In the data, the variable "Write", labelled "writing score", holds the marks secured by several students. Let's assume that, just by looking at the data, you develop a perception that the mean writing score is 50.

Well, this is nothing but your hypothesis; you may be right about it or you may be wrong. We can check it against the data ... and that's called "Hypothesis Testing".

Statisticians have devised a rather complex way of representing the same (complicating things is their habit).

They first state a Null Hypothesis. The Null Hypothesis may be positive or negative sounding.

In this case the Null Hypothesis is "Mean is equal to 50" (positive sounding).

Now, whenever there is a Null Hypothesis, there is an opposite one too, called the Alternative Hypothesis. In this case the Alternative Hypothesis is "Mean is not equal to 50". It is written as:
H0 : Mean = 50
H1 : Mean ≠ 50

To test a hypothesis about the equality of means, we go for a T-test. T-tests come in various types; let's take them on one by one:

1.  One Sample t-test    -- We hypothesize that the mean is equal to a number, as in the above example

Let's perform the test in SAS with its very simple syntax.

/* One-sample t-test of H0: mean of "write" = 50 */
proc ttest data=a.sample_1 h0=50;
var write;
run;

So, looking at the p value in the output, what should I conclude, and what actually is this p value?

I asked one of my hard-core statistician friends about the p value. He said, "It is the most difficult task to define the p value". So for many years I stopped thinking about it. But then I decided to write my own definition of the p value in layman's terms ... here it is:

The p value is the probability of seeing data at least as far away from the null hypothesis as ours, assuming the null hypothesis is actually true. So a small p value means our data would be quite surprising if the null hypothesis were true, which counts as evidence against it. Note that we never say we "accept" the null hypothesis; we either reject it or fail to reject it.

It is tempting to read the p value as the probability of the null hypothesis being true and 1 - p as the probability of it being false. With a p value of 0.3, 1 - p would be 0.7, so should we reject the null hypothesis since the higher probability is on the rejection side? No, the p value is not that kind of probability, and we don't compare it with 1 - p. It is always compared with a cut-off, which is called the Alpha value. Generally the cut-off is 0.05 (it can vary from test to test).

So considering Alpha = 0.05, if the p value is less than 0.05, we reject the null hypothesis. If it is more than 0.05, we are not able to reject the null hypothesis.
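
If you want to work with a cut-off other than 0.05, PROC TTEST has an ALPHA= option. As far as I understand it, this option only changes the confidence limits printed in the output to 100(1 - alpha)% limits; the p value itself stays the same, and you still compare it with your chosen alpha yourself. Here is a sketch, with 0.01 picked purely for illustration:

```sas
/* Same one-sample test, but with 99% confidence limits (alpha = 0.01).
   The value 0.01 is an illustrative choice, not a recommendation. */
proc ttest data=a.sample_1 h0=50 alpha=0.01;
var write;
run;
```

For a two-tailed test like this one, rejecting the null hypothesis at a given alpha goes hand in hand with the hypothesized value (50 here) falling outside the corresponding confidence limits for the mean.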

Coming back to the result of the T-test: as the p value is <0.001, which is much less than alpha = 0.05, we reject the null hypothesis. So we can conclude in this case that the mean is not equal to 50.

Well, there are further variations: you can also test whether a sample's mean is > X or < X. These are called one-tailed tests. In the above example we tested mean = X against mean not equal to X, which is called a two-tailed test. I am not delving much into it.
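
Just for reference, PROC TTEST handles the one-tailed versions through its SIDES= option (U for the upper tail, L for the lower tail, 2 for the default two-tailed test). A minimal sketch, reusing the same dataset and the same hypothesized value of 50:

```sas
/* Upper-tailed test: H0: mean = 50 versus H1: mean > 50 */
proc ttest data=a.sample_1 h0=50 sides=u;
var write;
run;

/* Lower-tailed test: H0: mean = 50 versus H1: mean < 50 */
proc ttest data=a.sample_1 h0=50 sides=l;
var write;
run;
```

The p value is read exactly as before: compare it with alpha and reject the null hypothesis only if it is smaller.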
