/******logistic regression****/ /****answering questions for pages 25-28. Taken from: http://personalpages.manchester.ac.uk/staff/mark.lunt/stats/7_Binary/text.pdf ****/ /****in this example we took a random sample of 1100 observations from the original data set**/ clear all use "C:\Users\fqeadan\Documents\PH539\hippain.dta" tab sex hip_p, row /***one could get the same result by using:**/ tab hip_p sex, co * i Prevalence is 9.62% in men, 15.79% in women tab sex hip_p, row chi2 * ii The difference in prevalence between men and women is very significant cs sex hip_p, or * iii Confidence interval is (1.22, 2.54) * iv The odds ratio and the relative risk are very similar * v Yes, the confidence interval does not contain 0, which is the null hypothesis risk difference codebook sex /**in our case F=1 nad M=0**/ logistic hip_p sex glm hip_p sex, family(binomial) link(logit) eform /***without eform one gets the betas instead of the ORs***/ glm hip_p sex, family(binomial) link(logit) /****to change the reference category one could use ib#.var***/ logistic hip_p ib1.sex * vi The odds ratio is exactly the same as that produced by cs * vii The confidence intervals are the same to 3 decimal places (the methods used to calculate them differ, but generally give very similar results) egen agegp = cut(age), at(0 30(10)100) label define age 0 "<30" 30 "30-39" 40 "40-49" 50 "50-59" label define age 60 "60-69" 70 "70-79" 80 "80-89" 90 "90+", modify label values agegp age tab hip_p agegp, chi2 * viii Yes: chi2 is very significant logistic hip_p age sex estimates store mr * ix Yes: p = 0.000 * x Odds of hip pain increase by 1.03 for each year increase in age logistic hip_p i.sex##c.age estimates store mf lrtest mr mf * xi No: the interaction term i.sex#c.age is not significant (p=0.749) logistic hip_p sex i.agegp * xii Odds for a man aged 50-59 are 12.63 times the odds for a man aged less than 30 /*****Let's change the reference category***/ logistic hip_p sex ib90.agegp logistic hip_p age sex /***hosmer lemeshow goodness of fit null hypothesis***/ /***H0: model is correct***/ estat gof * 4.1 Yes. However, this is not really appropriate, since there are so many covariate patterns. It would be better to use only 10 groups estat gof, group(10) * 4.1 In this case, there is evidence that the predicted and observed values differ more than can be explained by random variation lroc logistic hip_p i.agegp sex estat gof estat gof, group(10) * 4.3 Yes, this model is adequate lroc gen age2= age*age logistic hip_p age age2 sex estat gof, group(10) * 4.5 Yes, the coefficient for age2 is highly significant, and there is * no longer evidence of lack of fit. lroc * 4.6 The area under the curve with this model is similar to that use age * as a categorical predictor. predict p predict db, dbeta /**dbeta is very similar to Cook's D in ordinary linear regression**/ scatter db p vif, uncentered * 5.1 No, there are no points that are obvious outliers predict d, ddeviance scatter d p, yline(3.84) * 5.2 there are 4 outliers scatter p age * 5.3 the two lines are the prevalences in men and women graph twoway scatter p age || lowess hip_p age if sex == 1 || lowess hip_p age if sex == 0 * 5.4 the fit is good for men, but fits poorly to women over 80 * The quadratic model is reasonable for men, not women