Business And Economics Statistics Academic Performance Of University Students.pdf


<span class="text_page_counter">Trang 1</span><div class="page_container" data-page="1">

HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM

Business and Economics Statistics

Academic Performance of University Students

</div><span class="text_page_counter">Trang 2</span><div class="page_container" data-page="2">

I. Scenario

II. Questions

Question 1: What inference technique should be considered for this study? Explain.

Question 2: Produce descriptive statistics for the dataset.

Question 3: Check all assumptions of the inference technique you suggest in Question 1. Are the assumptions satisfied? Explain.

Question 4: Perform the inference technique you suggest in Question 1. Remember to provide all the necessary steps. What are your interpretations and conclusions? Explain. What are your interpretations and conclusions if we use 0.05 level of significance

Question 6: Discuss the credibility of the interpretations and conclusions of Question 4. Is there anything we should be concerned about? Explain.

III. Conclusion

</div><span class="text_page_counter">Trang 3</span><div class="page_container" data-page="3">

Table of Figures

Figure 1: Structure of data

Figure 2: Some first rows of the data

Figure 3: Structure of the data when factors have not been converted yet

Figure 4: Structure of the data when factors have been converted

Figure 5: Frequency table (sample sizes)

Figure 6: Mean of GPA according to Gender and Major

Figure 7: Median of GPA according to Gender and Major

Figure 8: Standard deviation of GPA according to Gender and Major

Figure 9: Summary of GPA according to Gender and Major

Figure 10: Boxplot

Figure 11: Mean plot with 90% CI

Figure 12: Q-Q Plot

Figure 13: Two-way ANOVA output

Figure 14: Interaction Plot between Gender and Major

</div><span class="text_page_counter">Trang 4</span><div class="page_container" data-page="4">

I. Scenario.

A large university in the United States conducted a survey to examine the relationship between major and academic performance (GPA) among its female and male students. Respondents were asked to report their major and their GPA, measured on a 0 to 4.0 scale. The purpose of this study is to test for a significant interaction between major and gender and to check for significant differences in GPA due to these two variables, at the 0.1 level of significance.

<small>Figure 1: Structure of data</small>

Question 1: What inference technique should be considered for this study? Explain.

In this case study, two-way ANOVA is the appropriate inference technique, for two reasons. First, the test evaluates differences in the mean response across the levels of each factor. Second, it tests whether the two factors interact, that is, whether the effect of major on GPA depends on gender. Our team therefore chose two-way ANOVA: it compares mean GPA (the dependent variable) across groups defined by two categorical independent variables (major and gender) and also tests for the interaction between them.
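As a sketch of what this technique assumes, the two-way ANOVA model with interaction can be written as follows. The symbols are standard textbook notation rather than notation taken from this report: here i indexes gender (2 levels), j indexes major (3 levels) and k indexes students within a cell.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)

H_0^{AB}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
\qquad
H_0^{A}: \alpha_1 = \alpha_2 = 0
\qquad
H_0^{B}: \beta_1 = \beta_2 = \beta_3 = 0
```

The interaction hypothesis is examined first; the two main-effect hypotheses are read on their own only when the interaction is not significant.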

</div><span class="text_page_counter">Trang 5</span><div class="page_container" data-page="5">

Question 2: Produce descriptive statistics for the dataset.

We use RStudio to produce the descriptive statistics for this question. To start, we import the file “StudentSurvey 2.csv” into R for further calculation:

⮚ studentsurvey2 <- read.table("StudentSurvey 2.csv", header = TRUE, sep = ",", quote = "\"", stringsAsFactors = FALSE)

The data set contains 234 observations, so we inspect only the first few rows to get a feel for the data, using the head() function in R:

⮚ head(studentsurvey2)

<small>Figure 2: Some first rows of the data</small>

The internal structure of the data can be obtained by:

⮚ str(studentsurvey2)

<small>Figure 3: Structure of the data when factors have not been converted yet</small>

From the above output, it is clear that there are 234 observations of 4 variables: observation, gender, major and GPA. Since gender and major are categorical variables that are not yet stored as factors, we convert them into factors with the following R code:

⮚ studentsurvey2$gender <- factor(studentsurvey2$gender, levels = c("1", "2"), labels = c("Female", "Male"))

⮚ studentsurvey2$major <- factor(studentsurvey2$major, levels = c("1", "2", "3"), labels = c("Administration", "Accounting", "Finance"))

Then we use the R code str(studentsurvey2) to get the new structure of the data file with “gender” and “major” converted into factors:

</div><span class="text_page_counter">Trang 6</span><div class="page_container" data-page="6">

<small>Figure 4: Structure of the data when factors have been converted</small>

A frequency table can be created to see the sample size of each treatment group with the following R code:

⮚ table(studentsurvey2$gender, studentsurvey2$major)

<small>Figure 5: Frequency table (sample sizes)</small>

It can be seen that all 6 treatment groups have the same sample size of 39. This balanced design is well suited to a two-way ANOVA test.
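The balance check described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical data frame `sim` that mimics the layout of the survey data (39 rows per gender-by-major cell); it does not use the real file.

```r
# Hypothetical stand-in for studentsurvey2: 39 rows per gender-by-major cell.
sim <- expand.grid(
  rep    = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)

# Cross-tabulate the two factors, as in the report's table() call.
counts <- table(sim$gender, sim$major)
print(counts)

# A design is balanced when every cell holds the same number of observations.
balanced <- length(unique(as.vector(counts))) == 1
print(balanced)
```

With the real data, replacing `sim` by `studentsurvey2` would run the same check on the actual cell counts.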

Next, we use the by() function in R to find several descriptive statistics (mean, median, standard deviation and summary) for each treatment group defined by the two factors. The commands and their output are as follows:

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), mean)

<small>Figure 6: Mean of GPA according to Gender and Major</small>

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), median)

</div><span class="text_page_counter">Trang 7</span><div class="page_container" data-page="7">

<small>Figure 7: Median of GPA according to Gender and Major</small>

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), sd)

<small>Figure 8: Standard deviation of GPA according to Gender and Major</small>

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), summary)

</div><span class="text_page_counter">Trang 8</span><div class="page_container" data-page="8">

<small>Figure 9: Summary of GPA according to Gender and Major</small>

Each command gives a specific descriptive statistic of the outcome variable (GPA) for each treatment group, listed by gender first and then by major. The final command, with summary, returns six basic statistics for GPA: the minimum, the first quartile, the median, the mean, the third quartile and the maximum.
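As an alternative sketch, the same per-group statistics can be collected into a single table with base R's aggregate(). The data frame `sim` and its GPA values are simulated placeholders, not the survey data, and the name `cell_stats` is illustrative only.

```r
# Simulated placeholder data: 39 observations in each of the 6 cells.
set.seed(1)
sim <- expand.grid(
  rep    = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)
sim$gpa <- rnorm(nrow(sim), mean = 3, sd = 0.3)  # made-up GPA values

# One row per gender-by-major cell, with several statistics side by side.
cell_stats <- aggregate(
  gpa ~ gender + major, data = sim,
  FUN = function(x) c(n = length(x), mean = mean(x),
                      median = median(x), sd = sd(x))
)

# aggregate() returns the statistics as a matrix column; flatten it.
cell_stats <- do.call(data.frame, cell_stats)
print(cell_stats)
```

Compared with four separate by() calls, this keeps the sample size, mean, median and standard deviation of each cell on one line, which makes the groups easier to compare.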

To get further information, we draw a boxplot and a mean plot.

⮚ boxplot(gpa ~ interaction(gender, major), data = studentsurvey2, xlab = "Gender and Major", ylab = "GPA", col = c("red", "blue", "yellow", "grey", "brown", "pink"))

<small>Figure 10: Boxplot</small>

</div><span class="text_page_counter">Trang 9</span><div class="page_container" data-page="9">

First, the boxplot displays several descriptive statistics for each cell: the median, the quartiles, and the minimum and maximum values. Each cell has its own characteristics. The most distinctive is the Male - Accounting group, which appears to have the highest median and the most stable, uniform GPA values: its interquartile range and its overall range between the highest and lowest values are the smallest, indicating the least variation within the group. The Female - Finance group is lowest on almost every measure, including the median, the minimum and the maximum, whereas every other group has a maximum GPA above 3.5; its interquartile range is about average and its variance is large. In contrast, the Male - Finance group has the highest maximum GPA, the largest interquartile range and the largest variance.

The skewness of each group can also be read from the boxplot. A group's distribution may be symmetric, positively skewed or negatively skewed, judging by the distances from the median to the two ends of the box and whiskers. Taking the three male groups as examples, the Male - Administration distribution is left-skewed: the values below the median are spread over a wider range than those above it. By the same reasoning, Male - Accounting is an example of a right-skewed distribution, and the Male - Finance group is approximately symmetric. There are also three outliers, shown as white dots, one each in the Male - Accounting, Female - Finance and Male - Finance groups; 3 outliers out of 234 observations are unlikely to materially affect our test result.

We also use a mean plot to identify the mean of each group and to compare means between groups, with the following code and its output:

⮚ install.packages("gplots")

⮚ library(gplots)

⮚ plotmeans(gpa ~ interaction(gender, major), data = studentsurvey2, p = 0.90, xlab = "Gender and Major", ylab = "GPA", main = "Mean Plot with 90% CI")

</div><span class="text_page_counter">Trang 10</span><div class="page_container" data-page="10">

<small>Figure 11: Mean plot with 90% CI</small>

The mean plot presents the six groups with 90% confidence intervals. The group means shown agree with those produced by the by() function. The Female - Accounting group has the highest mean and the Female - Finance group the lowest. The six group means clearly differ, which motivates a formal test of whether the differences are statistically significant.
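To make the error bars concrete, a 90% confidence interval for one cell mean can be computed by hand from the t distribution. This is a sketch on simulated values; the vector `x` is a hypothetical cell of 39 GPA scores, not data from the survey.

```r
# Hypothetical GPA values for a single gender-by-major cell (n = 39).
set.seed(1)
x <- rnorm(39, mean = 3, sd = 0.3)

n          <- length(x)
conf_level <- 0.90

# Two-sided t critical value with n - 1 degrees of freedom.
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# Half-width of the interval: critical value times the standard error.
half_width <- t_crit * sd(x) / sqrt(n)

ci <- c(lower = mean(x) - half_width, upper = mean(x) + half_width)
print(ci)
```

The result matches what `t.test(x, conf.level = 0.90)$conf.int` returns, which is a convenient cross-check.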

Question 3: Check all assumptions of the inference technique you suggest in Question 1. Are the assumptions satisfied? Explain.

As explained in Question 1, a two-way factorial analysis of variance is the appropriate inference method for this case. However, before carrying out the two-way ANOVA we must check all of its assumptions to ensure that our results are valid. There are three assumptions to check:

● Samples are independent, simple random samples of size $n_{ij}$ from each of $k$ ($= a \times b$) populations.

● All populations are normally distributed.

● All populations have the same standard deviation: $\sigma_1 = \sigma_2 = \dots = \sigma_k$

To check whether the study satisfies these three assumptions for two-way ANOVA, we first define some notation:

</div><span class="text_page_counter">Trang 11</span><div class="page_container" data-page="11">

● $n_{ij}$: Cell (combination of the factors)

● i (Factor A): Gender

Assumption 2: All populations have the same standard deviation

Secondly, we check assumption 2, equal standard deviations. From the output of the by() function for the standard deviations in Question 2, the ratio of the largest sample standard deviation to the smallest (0.765563 / 0.5129712) is about 1.49, which is less than 2. By this rule of thumb, it is reasonable to assume that all populations have the same standard deviation.

Assumption 3: All populations are normally distributed

To check whether all populations are normally distributed, we can use a Q-Q plot of the model residuals, drawn with the qqPlot() function from the car package:

⮚ library(car)

⮚ qqPlot(lm(gpa ~ gender * major, data = studentsurvey2), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)

<small>Figure 12: Q-Q Plot</small>

A normal Q-Q plot is commonly used to assess the normality of residuals: it compares the quantiles of the residuals with those of a perfect normal distribution. The plot shows that the points lie close to the reference line, with no outliers. The Q-Q plot therefore meets both requirements, and the normality assumption appears to be reasonably satisfied.
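The two assumption checks above can also be reproduced numerically. The sketch below first recomputes the standard deviation ratio from the two values quoted in the text, then runs a Shapiro-Wilk test on model residuals as a possible complement to the Q-Q plot. The Shapiro-Wilk step is an addition, not part of the original analysis, and it runs on simulated data (`sim`), so its p-value says nothing about the real survey.

```r
# Part 1: ratio of the largest to the smallest sample standard deviation,
# using the two values reported in the text.
sd_ratio    <- 0.765563 / 0.5129712
equal_sd_ok <- sd_ratio < 2          # rule of thumb used in the report
print(round(sd_ratio, 4))
print(equal_sd_ok)

# Part 2: Shapiro-Wilk test on the residuals of the two-way model.
# 'sim' is simulated placeholder data with 39 observations per cell.
set.seed(1)
sim <- expand.grid(
  rep    = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)
sim$gpa <- rnorm(nrow(sim), mean = 3, sd = 0.3)  # made-up GPA values

fit <- aov(gpa ~ gender * major, data = sim)
sw  <- shapiro.test(residuals(fit))
print(sw$p.value)  # a small p-value would cast doubt on normality
```

A formal test and a Q-Q plot answer the same question in different ways; with 234 observations the test can flag small departures that the plot shows to be harmless, so the two are best read together.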

</div><span class="text_page_counter">Trang 13</span><div class="page_container" data-page="13">

Two-way ANOVA test:

- Step 1: Identify the null and alternative hypotheses:

Ho: There is no significant interaction between major and gender in GPA.

Ha: There is a significant interaction between major and gender in GPA.

- Step 2: Test statistic and p-value

❖ Check assumptions: We use two-way ANOVA to test the hypothesis.

● All populations are normally distributed.

● Samples are independent, simple random samples of 39 from each of 6 populations.

● All populations have the same standard deviation.

❖ Test statistic and p-value:

We used RStudio to run the calculation and obtained the following output:

⮚ studentsurvey2.result <- aov(gpa ~ gender * major, data = studentsurvey2)

⮚ summary(studentsurvey2.result)

<small>Figure 13: Two-way ANOVA output</small>

- Step 3: Level of significance

The level of significance: α = 0.1

- Step 4: Decision rule and conclusion

Reject Ho if p-value < α

As mentioned in Question 1, the primary purpose of a two-way ANOVA is to examine the influence of two categorical independent variables on one continuous dependent variable. We therefore consider the interaction between major and gender first.
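For reference, the interaction test compares the interaction mean square with the error mean square. The notation below is standard textbook notation, assumed here rather than taken from the report, with a = 2 genders, b = 3 majors and N = 234 observations.

```latex
F_{AB} = \frac{MS_{AB}}{MS_E}
       = \frac{SS_{AB} / \big((a-1)(b-1)\big)}{SS_E / (N - ab)},
\qquad
(a-1)(b-1) = 2, \quad N - ab = 234 - 6 = 228

p\text{-value} = P\big(F_{2,\,228} \ge F_{AB}\big)
```

The p-value in the interaction row of the aov() summary is this upper-tail probability.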

● If α = 0.1

From the R output, p-value < α (0.06504 < 0.1). Therefore, following the decision rule, we reject Ho.

</div><span class="text_page_counter">Trang 14</span><div class="page_container" data-page="14">

Conclusion: At the 0.1 level of significance, we have enough evidence to conclude that there is a significant interaction between major and gender in GPA.

● If α = 0.05

For the interaction, p-value > α (0.06504 > 0.05), so we do not reject Ho and go on to test the main effects.

- With the test for Gender: p-value < α (0.00737 < 0.05).

From this result, we have enough evidence to conclude that mean GPA differs between the genders.

- With the test for Major: p-value < α (8.89e-13 < 0.05).

From this result, we have enough evidence to conclude that mean GPA differs for at least one major.
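The decision rule can be applied mechanically to the three p-values quoted above. The following sketch uses only those reported values; the helper `decide()` and the table name `decisions` are illustrative names, not part of the original analysis.

```r
# p-values as reported in the text for the two-way ANOVA.
p_values <- c(interaction = 0.06504, gender = 0.00737, major = 8.89e-13)

# Decision rule: reject H0 when the p-value is below alpha.
decide <- function(p, alpha) {
  ifelse(p < alpha, "reject H0", "do not reject H0")
}

decisions <- data.frame(
  effect     = names(p_values),
  p_value    = unname(p_values),
  alpha_0.10 = unname(decide(p_values, 0.10)),
  alpha_0.05 = unname(decide(p_values, 0.05))
)
print(decisions)
```

Only the interaction row changes between the two significance levels, which is why the conclusion about the interaction depends on the choice of α while the main-effect conclusions do not.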

Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent withthe conclusions made in Question 4?

Another way to see the interaction between Major and Gender in GPA is an interaction plot, drawn with the following R code:

⮚ interaction.plot(studentsurvey2$gender, studentsurvey2$major, studentsurvey2$gpa, type = "b", col = c("red", "blue"), pch = c(16, 18), main = "Interaction between Gender and Major")

</div><span class="text_page_counter">Trang 15</span><div class="page_container" data-page="15">

<small>Figure 14: Interaction Plot between Gender and Major</small>

In theory, the further the lines are from parallel, the stronger the interaction. The interaction plot shows that gender and major do interact. The Accounting and Administration lines show the strongest interaction, while the Administration and Finance lines show a more moderate one. Overall, male students have a higher GPA than female students. Among female students, those studying Accounting perform better than those in the other majors, and their GPA is slightly higher than that of male Accounting students, so the blue line slopes downward. The Administration line slopes upward: male students achieve a higher GPA than female students, approximately equal to that of male Accounting students. These opposite slopes are evidence of a strong interaction between gender and major for these two majors. GPA in Finance is much lower than in the other two majors for both male and female students, and the interaction involving Finance appears weak. Taken together, the three lines are not parallel, so the overall interaction can be described as moderate. This is consistent with the conclusions of Question 4: the interaction is significant at the 0.1 level of significance but not at the 0.05 level.
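The idea of non-parallel lines can be expressed numerically: the lines are parallel exactly when the male-minus-female gap in mean GPA is the same for every major. The sketch below illustrates this on simulated data; the cell means in `true_means` are invented for the illustration and are not the survey's values.

```r
# Simulated placeholder data with made-up cell means (not the survey values).
set.seed(1)
sim <- expand.grid(
  rep    = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)
true_means <- matrix(
  c(3.0, 3.2,   # Administration: Female, Male
    3.3, 3.2,   # Accounting:     Female, Male
    2.6, 2.7),  # Finance:        Female, Male
  nrow = 2,
  dimnames = list(c("Female", "Male"),
                  c("Administration", "Accounting", "Finance"))
)
cell_index <- cbind(as.character(sim$gender), as.character(sim$major))
sim$gpa <- true_means[cell_index] + rnorm(nrow(sim), sd = 0.3)

# Observed cell means: rows are genders, columns are majors.
cell_means <- tapply(sim$gpa, list(sim$gender, sim$major), mean)

# Gender gap within each major; unequal gaps mean non-parallel lines.
gender_gap <- cell_means["Male", ] - cell_means["Female", ]
print(round(gender_gap, 2))

# Spread of the gaps: zero would correspond to perfectly parallel lines.
gap_spread <- diff(range(gender_gap))
print(round(gap_spread, 2))
```

Whether an observed spread is larger than chance would produce is what the interaction F test in the ANOVA table assesses; the plot and the gaps only describe it.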

Question 6: Discuss the credibility of the interpretations and conclusions of Question 4. Is there anything we should be concerned about? Explain.

a. The credibility of the interpretations and conclusions.

In terms of interpretation, the assumptions of the test were checked in Question 3 and found to be reasonably satisfied. At α = 0.1 (α: level of significance), we can conclude that there is an interaction in

</div><span class="text_page_counter">Trang 16</span><div class="page_container" data-page="16">

GPA between gender and major. At this level the probability of a type I error, which occurs when a true null hypothesis is rejected, is 10 percent. Because the interaction is significant at α = 0.1, we set aside the two sets of hypotheses for the main effects. A significant interaction means that the change in the true mean response across the levels of factor major depends on the level of factor gender, so the effect of simultaneous changes cannot be determined by examining the main effects separately. In Question 4 we also compared the p-value with α = 0.05 and concluded that there is not enough evidence of an interaction in GPA between gender and major. At this level the probability of a type I error is 5 percent. Note that failing to reject Ho does not show, with 95% confidence, that there is no interaction; it only means that the evidence is insufficient at this level. The level of significance of a hypothesis test equals the probability of a type I error. Consequently, when α is lowered from 0.1 to 0.05, the probability of a type I error falls by the same amount, while the probability of a type II error, which occurs when we fail to reject a false Ho, increases. The evidence for the interaction is not strong enough to reject Ho at the 0.05 level of significance. We consider the interpretations and conclusions of Question 4 reasonably trustworthy: GPA differs significantly across majors and between genders. When the interaction is not significant, we continue by testing the main effects. The sum of squares for factor major then reflects random variation together with any differences between the true mean responses at the different levels of major. Similarly, the sum of squares for factor gender reflects random variation together with any differences between the true mean responses at the different levels of gender. In short, at the 0.1 level of significance we can conclude that there is an interaction in GPA between gender and major.
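The two error probabilities discussed above can be stated compactly. This is standard textbook notation, given here as a reminder rather than as part of the original report.

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}), \qquad
\beta = P(\text{do not reject } H_0 \mid H_0 \text{ false}), \qquad
\text{power} = 1 - \beta
```

For a fixed sample size, lowering α makes rejection harder, so β tends to rise and power to fall. The interaction p-value of 0.06504 lies between 0.05 and 0.1, which is why the two significance levels lead to different decisions.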

b. Limitations.

Despite the advantages of two-way ANOVA for this case, there are several limitations to consider. First, our samples are quite small, so the sample results might not reflect the GPA of the whole population well, which reduces credibility. Second, we examined only the interaction between two factors and the differences in GPA between majors and genders. However, GPA is affected by many other factors, such as the number of hours students work per week, social environment, place of residence, family income, previous academic performance, learning ability and time spent studying. In addition, there is no documentation to verify that the samples were randomly chosen. The final limitation concerns the conditions for applying the ANOVA test: in this context the data only barely satisfy these requirements. Given these limitations, we draw our conclusions at the 90% confidence level instead of 95%. To sum up, although

</div>