cd "G:\APC COURSE" clear all /*CHAPTER 3*/ /*SECTION 3.1*/ infile A P D Y in 2/l using "lung5-M.txt" /*QUESTION 1*/ ***TABULATE AGE AND PERIOD, AND PERSON-YEARS BY BOTH*** tab A tab P table A P , contents(sum Y) **NICER FORMAT - GIVES RATES PER 1000 PY** gen weight=1/1000 table A P [iweight=weight], contents(sum Y) format(%7.1f) /*QUESTION 2*/ ***Fit standard GLM with factor variables for Age and Period*** glm D i.A i.P, family(poisson) lnoffset(Y) baselevels glm, eform baselevels /*see help fvvarlist*/ /*QUESTION 3*/ ***Suppress the constant term and put Age effects on rate scale** glm D ibn.A i.P, family(poisson) lnoffset(Y) nocons baselevels glm,eform baselevels /*QUESTION 4*/ ***Make 1968 the reference for Period*** glm D ibn.A ib1968.P, family(poisson) lnoffset(Y) nocons baselevels /*QUESTION 5*/ ***Use matrices that are stored by GLM to show estimates** matrix beta=e(b) matrix var=(vecdiag(e(V))) matrix agebeta=beta[1,1..10] matrix agevar=var[1,1..10] matrix list agebeta matrix list agevar /*QUESTION 6*/ ***predictnl makes the prediction and standard error*** ***Could also use partpred - user written command*** partpred agebeta, for(ibn.A) ci(agebeta_lci agebeta_uci) eform predictnl agebeta2= _b[40.A]*40.A+_b[45.A]*45.A+ _b[50.A]*50.A+ _b[55.A]*55.A /// + _b[60.A]*60.A+ _b[65.A]*65.A+ _b[70.A]*70.A+ _b[75.A]*75.A+ _b[80.A]*80.A+ /// _b[85.A]*85.A, se(agebetase) gen agebeta2_uci=agebeta2+1.96*agebetase gen agebeta2_lci=agebeta2-1.96*agebetase ***Exponentiate results - Could have done this directly with predictnl*** ***see help predictnl - Can also make CI straight the way*** gen eagebeta=exp(agebeta2) gen eagebeta_lci=exp(agebeta2_lci) gen eagebeta_uci=exp(agebeta2_uci) **CHECK partpred and predictnl give same results*** su *age* drop eage* gen Amid=A+2.5 ***Put on different scale (per 1000 person-years*** replace agebeta=agebeta*1000 replace agebeta_lci=agebeta_lci*1000 replace agebeta_uci=agebeta_uci*1000 ***Scatter plot of results*** scatter agebeta agebeta_uci agebeta_lci Amid, sort yscale(log) **Line plot of the results*** line agebeta agebeta_uci agebeta_lci Amid, sort yscale(log) **Nicer plot of the results using rarea*** twoway (rarea agebeta_lci agebeta_uci Amid, fint(50) fcolor(orange) /// lcolor(orange_red) lwidth(thin) sort) (line agebeta Amid, sort), /// yscale(log) legend(off) saving(A, replace) name(A) /*Question 7 */ ***predict Period RR term*** partpred periodRR, for(ib1968.P) ci(periodRR_lci periodRR_uci) eform gen Pmid=P+2.5 **Nicer Plot of the results*** twoway (rarea periodRR_lci periodRR_uci Pmid, fint(50) fcolor(ltblue) /// lcolor(blue) lwidth(thin) sort) (line periodRR Pmid, sort), legend(off) /// yscale(log) saving(P, replace) name(P) /*QUESTION 8*/ **Showing that we could have got this from original model*** glm D ibn.A i.P, family(poisson) lnoffset(Y) nocons baselevels partpred agetest, for(ibn.A 1968.P) eform at(1968.P 1) ***Get back our original agebeta from before (we multiplied it by 1000)** gen agebetao=agebeta/1000 su agetest agebetao /*QUESTION 9*/ **GO BACK AND SAVE ESTIMATES** /*QUESTION 10*/ partpred lperiodoriginal, for(i.P) partpred lperiodRRoriginal1968, for(1968.P) egen lperiodRR1968b=max(lperiodRRoriginal1968) gen diff=lperiodoriginal-lperiodRR1968b **Showing that we could have got this from original model*** gen lperiodRR=log(periodRR) su diff lperiodRR /*QUESTION 11 */ **I'm not nerdy enough** **TIDY UP - WORKS TO AN EXTENT*** graph combine A P, ycommon name(comb,replace) /*CHAPTER 3*/ /*SECTION 3.2*/ ***SAME CONCEPTS AND CODE AS ABOVE - NEED TO JUST CREATE THE COHORT TERM *** /*SECTION 3.3*/ cd "G:\APC COURSE" clear infile A P D Y using "lung5-M.txt" drop in 1 **Generate cohort term*** gen C=P-A /*Question 2*/ **Factor model with constant*** glm D i.A P, family(poisson) lnoffset(Y) baselevels eform /*Question 3*/ **Factor model with constant included in Age term*** glm D ibn.A P, family(poisson) lnoffset(Y) baselevels nocons /*Question 4*/ **Fit the Age drift model for the centred period effects** gen P1970_5=P-1970.5 glm D ibn.A c.P1970_5, family(poisson) lnoffset(Y) baselevels nocons eform estimates store period_drift estimates save period_drift, replace partpred age_period, for(ibn.A) /*Question 5*/ **Fit the Age drift model for the centred cohort effects** gen C1908=C-1908 glm D ibn.A C1908, family(poisson) lnoffset(Y) nocons baselevels estimates store cohort_drift estimates save cohort_drift, replace /*Question 6*/ estimates table * estimates stats * /*Question 7 */ glm D ibn.A C1908, family(poisson) lnoffset(Y) nocons baselevels partpred age_cohort, for(ibn.A) ***Transfer the age cohort model to the age effects for age period*** gen transf=_b[C1908]*(62.5-A) gen Acheck=age_cohort+transf ta Acheck ta age_period /*Question 9*/ estimates use period_drift gen periodRR=exp(P1970_5*_b[P1970_5]) estimates use cohort_drift gen cohortRR=exp(_b[C1908]*C1908) twoway (line periodRR P, sort yscale(log)) (line cohortRR C, sort yscale(log)) /*SECTION 3.4*/ cd "G:\APC COURSE" clear infile A P D Y using "lung5-M.txt" drop in 1 gen C=P-A /*Question 2*/ ***Age period cohort factor model*** glm D i.A i.P i.C, family(poisson) lnoffset(Y) baselevels estimates store apcmodel ***Factor Model for just age*** glm D i.A, family(poisson) lnoffset(Y) baselevels estimates store age /*Question 3*/ ***Perform likelihood ratio test*** lrtest age apcmodel **NEED TO DO OTHER COMPARISONS*** /*Question 4*/ ***Apply constraints and references*** glm D ibn.A i.P i.C, family(poisson) lnoffset(Y) nocons baselevels glm D ibn.A i.P ib1908.C, family(poisson) lnoffset(Y) nocons baselevels ***Need to specify a constraint then use collinear option for GLM*** /*Question 5*/ constraint 1 1993.P = 0 glm D ibn.A ib(first).P ib1908.C, family(poisson) lnoffset(Y) /// nocons constraint(1) collinear baselevels partpred Aeff, for(ibn.A) eform ci(Aeff_lci Aeff_uci) partpred Peff, for(ib(first).P) eform ci(Peff_lci Peff_uci) partpred Ceff, for(ib1908.C) eform ci(Ceff_lci Ceff_uci) gen Apt=A+2.5 gen Ppt=P+2.5 gen Cpt=Ppt-Apt line Aeff* Apt, sort yscale(log) line Peff* Ppt, sort yscale(log) line Ceff* Cpt, sort yscale(log) ***see help apc_cglim (user-written package - need to install) *** ***USE CONSTRAINTS AND COLLINEAR TO FIT THE OTHER MODELS*** /*SECTION 3.5*/ clear all /*Question 1*/ infile A5 P5 C5 D Y up Ax Px Cx using "lung5-Mc.txt" drop in 1 /*Question 2*/ ***Quick loop to draw a Lexis Diagram with the triangular subsets in Stata*** ***Could make into a general program if useful?*** **MAKE SURE TO RUN THE LOOP AND THE TWOWAY COMMAND AT THE SAME TIME AS *** ***ONLY USED LOCAL MACROS*** local fn forvalues i=1858(5)1953 { local j=`i'+40 if `j'<1943 { local j=1943 } local k=`i'+85 if `k'>1993 { local k=1993 } local fnt (function y=x-`i', range(`j' `k') lc(bluishgray) lw(thin)) local fn `fn' `fnt' } local aspect=(85-45)/(1993-1943) twoway `fn', legend(off) aspect(`aspect') xlabel(1943(10)1993,grid glc(bluishgray) /// glw(thin)) ylabel(40(5)85,grid glc(bluishgray) glw(thin)) xmtick(##2,grid /// glc(bluishgray) glw(thin)) xtitle(Period) ytitle(Age) title(Lexis Diagram) /*Question 3*/ ***Generate synthetic Cohort term** gen S5=P5-A5 glm D ibn.A5 i.P5 i.S5, family(poisson) lnoffset(Y) nocons baselevels estimates store sapc /*Question 4*/ glm D ibn.A5 i.P5 i.C5, family(poisson) lnoffset(Y) nocons baselevels estimates store rapc estimates stats * /*Question 5*/ ***DO PLOTS**** /*Question 6*/ ***Generate group variables for the levels of the variables*** **The factor variable syntax won't allow non-integer values*** egen grAx=group(Ax) egen grPx=group(Px) egen grCx=group(Cx) glm D i.grA i.grP i.grC, family(poisson) lnoffset(Y) nocons baselevels estimates store triangle **tab Ax, gen(factA) **tab Px, gen(factP) **tab Cx, gen(factC) **glm D factA* factP* factC*, family(poisson) lnoffset(Y) nocons estimates stats * /*Question 9*/ gen diff=P5-A5-C5 tab up diff /*Question 10*/ **Fit separate models for upper and lower triangles*** glm D i.grA i.grP i.grC if up==1, family(poisson) lnoffset(Y) nocons baselevels glm D i.grA i.grP i.grC if up==0, family(poisson) lnoffset(Y) nocons baselevels /*Question 11*/ ***DO PLOTS**** /*Question 12*/ ***Make the spline terms*** capture drop *spl* rcsgen Ax, gen(Aspl) df(6) orthog rcsgen Px, gen(Pspl) df(6) orthog rcsgen Cx, gen(Cspl) df(6) orthog **Make a constant term for the age values*** gen Aspl0=1 **Fit model with splines - no set constraints*** glm D Asp* Pspl* Cspl*, family(poisson) lnoffset(Y) nocons estimates store trianglesplines /*Question 13*/ estimates stats * lrtest triangle trianglesplines /*Question 14*/ ***USE predictnl or xpredict or partpred for the spline terms **** ***MAKE A PLOT OF PREDICTIONS*** /*Question 15*/ **Fit Cohort terms first to change the constraint chosen by GLM*** glm D Asp* Cspl* Pspl*, family(poisson) lnoffset(Y) nocons estimates store trianglesplinesCP /*SECTION 3.6 & SECTION 3.8 - Haven't got the dataset*/ /*SECTION 3.7*/ clear all /*Question 1*/ infile sex A P C Y D A5 P5 C5 using "lung-apc.txt" drop in 1 ***Using apcfit for just males*** apcfit if sex==1, age(A) period(P) poprisktime(Y) cases(D) nper(100000) /// agef(agefmales) perf(perfmales) cohf(cohfmales) /// dfa(9) dfp(9) dfc(15) refc(1930) /* refit glm?*/ estimates store lungmale estimates save lungmale, replace ***Save the reference Cohort** scalar define refc=`r(refcoh)' gen ym=refc gen one=1 ***Using apcfit for just females*** apcfit if sex==2, age(A) period(P) poprisktime(Y) cases(D) nper(100000) /// agef(ageffemales) perf(perffemales) cohf(cohffemales) /// dfa(9) dfp(9) dfc(15) refc(1930) estimates store lungfemale estimates save lungfemale, replace ***Save the reference Cohort** scalar define refc=`r(refcoh)' gen yf=refc ****DRAW THE GRAPHS FOR THE RESULTS**** twoway (rarea perfmales_uci perfmales_lci P, sort pstyle(ci) color(eltgreen) fintensity(inten50)) /// (line perfmales P, sort lc(emerald) clpattern(solid)) /// (rarea cohfmales_uci cohfmales_lci C, sort pstyle(ci) color(eltblue) fintensity(inten50)) /// (line cohfmales C, sort lc(edkblue) clpattern(solid)) /// (scatter one ym, msymbol(Oh) mcolor(edkblue)) /// , name(PCmales,replace) legend(off) /// xtitle("Calendar Time") ytitle("Rate Ratio") yscale(alt) yscale(log) ylabel(,angle(h)) scheme(sj) twoway (rarea agefmales_uci agefmales_lci A, sort pstyle(ci) color(orange) fintensity(inten50) ) /// (line agefmales A, sort lcolor(cranberry) clpattern(solid)) /// , yscale(log) name(Amales,replace) legend(off) /// ylabel( 1 2 5 10 50 200, angle(h)) /// xtitle("Age") ytitle("Rate per 100,000 person-years") scheme(sj) ***Combine the two Graphs** graph combine Amales PCmales, nocopies imargin(3 0 0 3) scheme(sj) title("APC MALES") ***Draw the female graphs*** twoway (rarea perffemales_uci perffemales_lci P, sort pstyle(ci) color(eltgreen) fintensity(inten50)) /// (line perffemales P, sort lc(emerald) clpattern(solid)) /// (rarea cohffemales_uci cohffemales_lci C, sort pstyle(ci) color(eltblue) fintensity(inten50)) /// (line cohffemales C, sort lc(edkblue) clpattern(solid)) /// (scatter one yf, msymbol(Oh) mcolor(edkblue)) /// , name(PCfemales,replace) legend(off) /// xtitle("Calendar Time") ytitle("Rate Ratio") yscale(alt) yscale(log) ylabel(,angle(h)) scheme(sj) twoway (rarea ageffemales_uci ageffemales_lci A, sort pstyle(ci) color(orange) fintensity(inten50) ) /// (line ageffemales A, sort lcolor(cranberry) clpattern(solid)) /// , yscale(log) name(Afemales,replace) legend(off) /// ylabel( 1 2 5 10 50 200, angle(h)) /// xtitle("Age") ytitle("Rate per 100,000 person-years") scheme(sj) ***Combine the two Graphs** graph combine Afemales PCfemales, nocopies imargin(3 0 0 3) scheme(sj) title("APC FEMALES") /*Question 5*/ ***Because Stata saves things as variables the females and males results*** ***are in different observations*** ***This is a quick and ugly fix to ensure that they are not*** ***Must be a better way of doing it*** ***Probably using reshape somehow would be better*** foreach j in age per coh { gen `j'ffemalesshift=`j'ffemales[_n+5400] gen `j'ffemalesshift_uci=`j'ffemales_uci[_n+5400] gen `j'ffemalesshift_lci=`j'ffemales_lci[_n+5400] gen `j'lnRR=log(`j'fmales)-log(`j'ffemalesshift) gen `j'RR=exp(log(`j'fmales)-log(`j'ffemalesshift)) gen `j'diff_se=sqrt(((ln(`j'ffemalesshift_uci)-ln(`j'ffemalesshift_lci))/(3.92))^2 /// +((ln(`j'fmales_uci)-ln(`j'fmales_lci))/(3.92))^2) gen `j'RR_uci=exp(`j'lnRR+1.96*`j'diff_se) gen `j'RR_lci=exp(`j'lnRR-1.96*`j'diff_se) drop `j'diff_se drop `j'lnRR drop *shift* } line ageRR ageRR_lci ageRR_uci A, sort yscale(log) line cohRR cohRR_lci cohRR_uci C, sort yscale(log) line perRR perRR_lci perRR_uci P, sort yscale(log) /*Question 6*/ ***Fit apcfit to the whole dataset*** ***Could define the knots as the Male knot placements if needed?*** apcfit, age(A) period(P) poprisktime(Y) cases(D) nper(100000) /// dfa(8) dfp(8) dfc(15) refc(1930) estimates store lungboth ***Start fitting interactions between sex and the spline terms*** /*Question 7*/ glm D i.sex#c._spA* i.sex#c._spP* i.sex#c._spC*, f(p) lnoffset(Y) nocons baselevels estimates store fullint glm D i.sex#c._spA* i.sex#c._spP* i.sex#c._spC1_ldrft _spC2-_spC15, f(p) lnoffset(Y) nocons estimates store intper glm D i.sex#c._spA* c._spP* i.sex#c._spC*, f(p) lnoffset(Y) nocons baselevels estimates store intcoh ***Check which models are best*** estimates stats * lrtest fullint intper lrtest fullint intcoh ***Fitting different interactions between sex and the spline terms*** glm D c._spA* i.sex#c._spP* i.sex#c._spC* i.sex sex#c.A, f(p) lnoffset(Y) nocons estimates store age glm D c._spA* i.sex#c._spP* i.sex#c._spC* i.sex, f(p) lnoffset(Y) nocons estimates store ageln ***Check which models are best*** lrtest fullint age lrtest fullint ageln lrtest age ageln ***recode sex for convenience*** recode sex (1=0)(2=1) ***Generate interactions by hand to make predictions easier using xpredict*** quietly gen sexC1=sex*_spC1_ldrft forvalues i = 2/15 { quietly gen sexC`i'=sex*_spC`i' } forvalues i = 1/9 { quietly gen sexA`i'=sex*_spA`i' } /*Question 8*/ ***RR1 and RR2 are the same models fitted one using handmade interactions*** ***and the other factor variable specification for the interactions*** glm D c._spA* c._spP* c._spC* i.sex#c._spA* i.sex#c._spC*, f(p) lnoffset(Y) nocons estimates store RR1 partpred rrtester, for(i.sex#c._spA*) eform glm D c._spA* c._spP* c._spC* sexA* sexC*, f(p) lnoffset(Y) nocons estimates store RR2 ***Using partpred is nicer than predictnl*** partpred agemal if sex==0, for(_spA*) partpred agefem if sex==1, for(_spA* sexA*) partpred rrage, for(sexA*) ci(rrage_lci rrage_uci) eform **Test both models give same results - depends if you prefer to clutter** **up the dataset with your own interaction terms or not *** su rrtester rrage **Make it M/F ratio instead of F/M ratio*** gen rrageMF=1/rrage gen rrageMF_uci=1/rrage_lci gen rrageMF_lci=1/rrage_uci ***Plot the results*** line rrageMF rrageMF_lci rrageMF_uci A if sex==1, sort yscale(log) **Estimate log RR using xpredict*** partpred rrCoh, for(sexC*) ci(rrCoh_lci rrCoh_uci) eform **Make it M/F ratio instead of F/M ratio*** gen rrCohMF=1/rrCoh gen rrCohMF_lci=1/rrCoh_uci gen rrCohMF_uci=1/rrCoh_lci ***Plot the results*** line rrCohMF rrCohMF_lci rrCohMF_uci C if sex==1, sort yscale(log) ***QUICK SHIFT to put the results in the right place*** gen rrAges=rrageMF[_n+5400] gen rrCohs=rrCohMF[_n+5400] ***Plot the two different results from the two methods*** line rrAges ageRR A, sort yscale(log) line rrCohs cohRR C, sort yscale(log) ****POSSIBILITY OF REDUCED INTERACTIONS****