TRUNCATION ISSUES
(April 27, 2010)
The patterns by which standard measures of differences between outcome rates tend to be affected by the overall prevalence of an outcome, as described on the Scanlan’s Rule page of jpscanlan.com and in the references made available on the Measuring Health Disparities (MHD) page of the same site, involve normal and near-normal distributions. The approach to measuring differences in a way unaffected by the overall prevalence of an outcome, described on the Solutions sub-page, likewise assumes that the underlying distributions are normal or close to normal.
Often, however, research deals with situations where the underlying distributions are known not to be normal because they are truncated portions of normal or near-normal distributions. An obvious example is research on the rates at which persons needing to control some condition are able to control it, as in the case of rates of hypertension control among persons deemed hypertensive. The population deemed hypertensive is a truncated part of the overall population or, more precisely, consists of truncated parts of the distributions for each group being analyzed. For example, the figures underlying Figure 7 of the 2008 International Conference for Health Policy Statistics presentation (ICHPS 2008) show that the white and black men aged 45-64 likely to be deemed hypertensive (on the basis of systolic blood pressure (SBP) above 139) comprise approximately the highest (in terms of SBP) 21% and 38% of the two groups, respectively. There are many similar situations.
Figures 6 and 8 (with regard to perfectly normal data) and Figures 7 and 10 (with regard to actual data on systolic blood pressure of black and white men) of the referenced presentation illustrate some of the differences between (1) the patterns of correlations of overall prevalence with standard measures of differences between outcome rates in two groups with normal or near-normal distributions and (2) the patterns of correlations of overall prevalence with those measures when the populations examined are truncated parts of those groups.
Generally, relative differences within the truncated population exhibit the same correlations with overall prevalence that they exhibit within the overall population. That is, the rarer an outcome, the greater will tend to be the relative difference in experiencing it and the smaller will tend to be the relative difference in avoiding it. Thus, for example, just as general reductions in SBP will tend to increase relative differences in hypertension rates while reducing relative differences in rates of avoiding hypertension, such reductions will tend to increase relative differences in rates of failing to control hypertension while reducing relative differences in rates of control of hypertension.
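The pattern for relative differences can be illustrated with a minimal Python sketch. It rests on assumptions not taken from the posted tables' underlying calculations: the advantaged group (AG) is standard normal, the disadvantaged group (DG) has a mean .5 standard deviations lower, the truncated population is everyone below the point that 30% of AG falls below, and the favorable outcome is falling above a cutoff inside that range. The function and variable names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal; AG is assumed to be N(0, 1)

def truncated_rates(ag_rate, limiter=0.3, ees=0.5):
    """Return (AG rate, DG rate) of the favorable outcome within the
    truncated population, given the AG rate within it.

    Assumptions: DG is N(-ees, 1); the truncated population is everyone
    below the point that `limiter` of AG falls below.
    """
    ag_in = limiter                                # share of AG inside the truncated population
    dg_in = STD.cdf(STD.inv_cdf(limiter) + ees)    # share of DG inside the truncated population
    cutoff = STD.inv_cdf(ag_in * (1 - ag_rate))    # cutoff that yields the requested AG rate
    dg_rate = (dg_in - STD.cdf(cutoff + ees)) / dg_in
    return ag_rate, dg_rate

def ratios(ag_rate):
    ag, dg = truncated_rates(ag_rate)
    success_ratio = ag / dg                # AG favorable rate / DG favorable rate
    failure_ratio = (1 - dg) / (1 - ag)    # DG adverse rate / AG adverse rate
    return success_ratio, failure_ratio

rows = [(r, *ratios(r)) for r in (0.1, 0.3, 0.5, 0.7, 0.9)]
for r, sr, fr in rows:
    print(f"AG rate {r:.0%}: success ratio {sr:.2f}, failure ratio {fr:.2f}")
```

Under these assumptions, as the favorable outcome becomes more common within the truncated population, the ratio of favorable rates shrinks while the ratio of adverse rates grows, which is the same direction of change seen in the overall population.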
Within the truncated population, absolute differences also exhibit patterns of correlation with overall prevalence similar to those within the overall population. However, there are a few differences. For example, in the overall population the maximum absolute difference will always occur within the range defined by a rate of 50% for either group (for either outcome); within the truncated population there will be some departures from that pattern. It should be recognized, however, that within a truncated population like persons deemed hypertensive, the rates of achieving certain blood pressure levels will be much lower than the rates in the overall population. Thus, as reflected in the aforementioned Figures 7 and 10, bringing the systolic blood pressure of everyone with SBP between 140 and 149 below 140 will tend to reduce absolute differences between rates of hypertension but increase absolute differences between rates of control of hypertension. This occurs because the rates of falling below 140 SBP in the overall population are such that further increases tend to reduce absolute differences between rates, while the rates of falling below 140 SBP in the hypertensive population are such that further increases tend to increase absolute differences between rates.
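The point about where the absolute difference peaks can be explored with a short Python sketch. It is an illustration under stated assumptions, not a reproduction of the referenced figures: both groups are normal with equal variance, the means differ by .5 standard deviations, the truncated population is the part of each group below the point that 30% of the advantaged group falls below, and the favorable outcome is falling above a cutoff. All names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()
EES = 0.5       # assumed gap between group means, in standard deviations
LIMITER = 0.3   # assumed share of AG inside the truncated population
T = STD.inv_cdf(LIMITER)   # truncation point on the AG scale

def overall_rates(cutoff):
    """Favorable-outcome rates (falling above cutoff) in the overall population."""
    return 1 - STD.cdf(cutoff), 1 - STD.cdf(cutoff + EES)

def truncated_rates(cutoff):
    """The same rates computed only among those below the truncation point."""
    ag_in, dg_in = STD.cdf(T), STD.cdf(T + EES)
    ag = (ag_in - STD.cdf(cutoff)) / ag_in
    dg = (dg_in - STD.cdf(cutoff + EES)) / dg_in
    return ag, dg

def peak(rate_fn, cutoffs):
    """Return (AG rate, DG rate) at the cutoff with the largest absolute difference."""
    return max((rate_fn(c) for c in cutoffs), key=lambda r: r[0] - r[1])

grid = [i / 1000 for i in range(-4000, 4001)]   # candidate cutoffs from -4 to 4
overall_peak = peak(overall_rates, grid)
truncated_peak = peak(truncated_rates, [c for c in grid if c < T])

print("overall peak   (AG, DG):", overall_peak)
print("truncated peak (AG, DG):", truncated_peak)
```

With these parameters the sketch places the overall-population peak at a cutoff where the advantaged group's rate is above 50% and the disadvantaged group's rate is below it, while the peak within the truncated population falls at a cutoff where both groups' rates exceed 50%. Other parameter choices would shift the exact location.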
Within the truncated population, the difference measured by the odds ratio does not exhibit a pattern of correlation with overall prevalence similar to that in the overall population.
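A hedged Python sketch can make the contrast concrete. It assumes normal distributions with equal variance, means .5 standard deviations apart, and a truncated population consisting of everyone below the point that 30% of the advantaged group falls below; these parameters and all function names are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()
EES, LIMITER = 0.5, 0.3   # assumed mean gap (in SDs) and truncation level

def odds(p):
    return p / (1 - p)

def overall_or(ag_rate):
    """Odds ratio in the overall population when AG's favorable rate is ag_rate."""
    cutoff = STD.inv_cdf(1 - ag_rate)
    dg_rate = 1 - STD.cdf(cutoff + EES)
    return odds(ag_rate) / odds(dg_rate)

def truncated_or(ag_rate):
    """Odds ratio within the truncated population for the same AG rate there."""
    dg_in = STD.cdf(STD.inv_cdf(LIMITER) + EES)     # share of DG inside the truncated population
    cutoff = STD.inv_cdf(LIMITER * (1 - ag_rate))   # cutoff that yields ag_rate among truncated AG
    dg_rate = (dg_in - STD.cdf(cutoff + EES)) / dg_in
    return odds(ag_rate) / odds(dg_rate)

for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"AG rate {r:.0%}: overall OR {overall_or(r):.2f}, "
          f"truncated OR {truncated_or(r):.2f}")
```

In this sketch the overall-population odds ratio is smallest when the rates are in the middle range and larger toward both extremes, whereas the odds ratio within the truncated population rises steadily as the favorable outcome becomes more common.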
The Solutions page discusses the fact that its methodology does not work for truncated portions of overall populations, as had been previously illustrated in Table 7 of Comment on Boström.
Two extensive tables have been posted on this site to illustrate these issues. Table 1, which provides information more comprehensive than that reflected in Tables 6 and 8 of the ICHPS presentation, shows the way (1) relative differences in experiencing or avoiding an outcome, (2) absolute differences between rates, and (3) differences measured by odds ratios tend to be affected by changes in the overall prevalence of an outcome within truncated populations defined by various proportions of the advantaged group experiencing the adverse outcome (for example, everyone falling below a point that 20%, 30%, or 40% of the advantaged group falls below). Table 1 also provides that information for various differences between means in the overall population (e.g., .5, .4, or .3 standard deviations).
Table 2 shows the values that would be derived from the rates within the truncated population by the approach described on the Solutions page (which, as I explain there, is simply a different way of deriving the results of a probit analysis), for various levels of truncation and various differences between means. Table 2 is thus an expansion of the illustration in Table 7 of Comment on Boström. By way of clarification, I note that I have in various places pointed out that for a measure to effectively appraise the size of a difference between outcome rates, it must remain constant when there occurs a change in overall prevalence akin to that effected by the lowering of a cutoff. Only the Solutions (probit) approach satisfies that criterion. But that approach does not satisfy the criterion within the subpopulation. Further, not only does the value change when prevalence changes, but the value also differs from that which would be derived from the rates in the overall population (the latter being the true value).
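The behavior just described can be sketched in Python. The sketch treats the probit approach as the difference between the inverse-normal transforms of the two rates, and it assumes normal distributions with equal variance, means .5 standard deviations apart, and truncation at the point that 30% of the advantaged group falls below. The names are hypothetical, and because the computation is exact rather than table-based, its results will not necessarily match the posted Table 2 values, which appear to be expressed in hundredths of a standard deviation.

```python
from statistics import NormalDist

STD = NormalDist()
EES, LIMITER = 0.5, 0.3   # assumed true mean gap (in SDs) and truncation level

def probit_gap(ag_rate, dg_rate):
    """Gap between means, in SDs, implied by two rates under normality."""
    return STD.inv_cdf(ag_rate) - STD.inv_cdf(dg_rate)

def overall_gap(cutoff):
    """Gap derived from favorable-outcome rates in the overall population."""
    return probit_gap(1 - STD.cdf(cutoff), 1 - STD.cdf(cutoff + EES))

def truncated_gap(ag_rate):
    """Gap derived from rates within the truncated population."""
    dg_in = STD.cdf(STD.inv_cdf(LIMITER) + EES)     # share of DG inside the truncated population
    cutoff = STD.inv_cdf(LIMITER * (1 - ag_rate))   # cutoff that yields ag_rate among truncated AG
    dg_rate = (dg_in - STD.cdf(cutoff + EES)) / dg_in
    return probit_gap(ag_rate, dg_rate)

overall = [overall_gap(c) for c in (-1.0, 0.0, 1.0)]
truncated = [truncated_gap(r) for r in (0.1, 0.5, 0.9)]

print("overall:  ", [round(g, 3) for g in overall])
print("truncated:", [round(g, 3) for g in truncated])
```

In the overall population the derived gap stays at the assumed .5 whatever the cutoff. Within the truncated population the derived gap is smaller than .5 and changes as the rates change.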
Tables 1 and 2 are based on situations where the distributions for the overall populations are normal. While Figure 10 of ICHPS 2008 and Figure 8 of Comment on Boström provide similar illustrations for NHANES SBP data, there is no particular purpose served in presenting more elaborate versions of those illustrations using actual data. The points are best made with normal data and adding illustrations with irregular data would only add confusion.
Immediately below I provide excerpts from the two tables, setting out as well the meanings of the columns.
In Table 1 the columns are as follows:
EES - estimated effect size for the entire population (difference between means of the distributions of advantaged and disadvantaged groups in terms of percentage of a standard deviation)
Limiter - proportion of the advantaged group population below the point that defines the truncated population (e.g., .3 indicates that the truncated universe is made up of the advantaged and disadvantaged group populations falling below the point that defines the bottom 30% of the advantaged group population)
RefPoint - a rounded reference point defined by the proportion of the advantaged group within the truncated population falling above the point
AGP - proportion of advantaged group within the total population experiencing the favorable outcome
DGP - proportion of disadvantaged group within the total population experiencing the favorable outcome
AGTP - proportion of advantaged group within the truncated population experiencing the favorable outcome (i.e., falling above the point) (closest match to reference point)
DGTP - proportion of disadvantaged group within the truncated population experiencing the favorable outcome corresponding to the rate for the advantaged group
TSR - truncated success ratio (ratio of advantaged group’s favorable outcome rate to disadvantaged group’s favorable outcome rate within the truncated population)
TFR - truncated failure ratio (ratio of disadvantaged group’s adverse outcome rate to advantaged group’s adverse outcome rate within the truncated population)
TAbDf - absolute difference, in percentage points, between outcome rates of advantaged and disadvantaged groups within the truncated population
TOR - truncated odds ratio (ratio of advantaged group’s odds of the favorable outcome to disadvantaged group’s odds of the favorable outcome within the truncated population)
An excerpt from Table 1 follows. It reflects the situation where the means of the two total populations differ by .5 standard deviations and the truncated universe is defined by the point below which 30% of the advantaged group falls.
Table 1 Excerpt: Patterns of Measures of Differences Between Outcome Rates in Truncated Populations (b0417 c 2)
| EES | Limiter | RefPoint | AGP | DGP | AGTP | DGTP | TSR | TFR | TAbDf | TOR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.3 | 1.00% | 70.88% | 51.99% | 1.17% | 0.82% | 1.42 | 1.00 | 0.35 | 1.43 |
| 0.5 | 0.3 | 3.00% | 71.57% | 52.79% | 3.48% | 2.47% | 1.41 | 1.01 | 1.01 | 1.43 |
| 0.5 | 0.3 | 5.00% | 71.91% | 53.19% | 4.63% | 3.29% | 1.41 | 1.01 | 1.34 | 1.43 |
| 0.5 | 0.3 | 10.00% | 73.24% | 54.78% | 9.15% | 6.57% | 1.39 | 1.03 | 2.59 | 1.43 |
| 0.5 | 0.3 | 20.00% | 76.42% | 58.71% | 19.97% | 14.69% | 1.36 | 1.07 | 5.28 | 1.45 |
| 0.5 | 0.3 | 30.00% | 79.39% | 62.55% | 30.04% | 22.63% | 1.33 | 1.11 | 7.40 | 1.47 |
| 0.5 | 0.3 | 40.00% | 82.12% | 66.28% | 39.31% | 30.33% | 1.30 | 1.15 | 8.99 | 1.49 |
| 0.5 | 0.3 | 50.00% | 85.31% | 70.88% | 50.15% | 39.85% | 1.26 | 1.21 | 10.30 | 1.52 |
| 0.5 | 0.3 | 60.00% | 88.30% | 75.49% | 60.28% | 49.37% | 1.22 | 1.27 | 10.91 | 1.56 |
| 0.5 | 0.3 | 70.00% | 91.15% | 80.23% | 69.96% | 59.16% | 1.18 | 1.36 | 10.79 | 1.61 |
| 0.5 | 0.3 | 80.00% | 94.06% | 85.54% | 79.84% | 70.13% | 1.14 | 1.48 | 9.71 | 1.69 |
| 0.5 | 0.3 | 90.00% | 97.06% | 91.77% | 90.03% | 83.01% | 1.08 | 1.70 | 7.02 | 1.85 |
| 0.5 | 0.3 | 95.00% | 98.54% | 95.35% | 95.03% | 90.40% | 1.05 | 1.93 | 4.64 | 2.03 |
| 0.5 | 0.3 | 97.00% | 99.11% | 96.93% | 96.98% | 93.65% | 1.04 | 2.10 | 3.33 | 2.18 |
| 0.5 | 0.3 | 99.00% | 99.70% | 98.78% | 98.99% | 97.48% | 1.02 | 2.50 | 1.51 | 2.53 |
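The derived columns of the Table 1 excerpt can be checked directly from the AGTP and DGTP rates. The following minimal Python sketch does so for the row with RefPoint 50.00%. The function name is hypothetical, and the TOR column is assumed to be the ratio of the advantaged group's odds of the favorable outcome to the disadvantaged group's odds, a reading that the excerpt's figures are consistent with but that the column list does not state.

```python
def derived_measures(agtp, dgtp):
    """Recompute TSR, TFR, TAbDf and TOR from favorable-outcome rates
    given as proportions (e.g. 0.5015 for 50.15%)."""
    tsr = agtp / dgtp                                  # ratio of favorable rates
    tfr = (1 - dgtp) / (1 - agtp)                      # ratio of adverse rates
    tabdf = (agtp - dgtp) * 100                        # absolute difference, percentage points
    tor = (agtp / (1 - agtp)) / (dgtp / (1 - dgtp))    # assumed definition of TOR
    return tsr, tfr, tabdf, tor

row = derived_measures(0.5015, 0.3985)   # AGTP and DGTP from the RefPoint 50.00% row
print([round(v, 2) for v in row])        # prints [1.26, 1.21, 10.3, 1.52]
```

The rounded results agree with the TSR, TFR, TAbDf, and TOR entries shown for that row.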
In Table 2 the columns are as follows:
EES - estimated effect size for the entire population (difference between means of the distributions in terms of percentage of a standard deviation)
TEES - estimated effect size derived from the rates within the truncated population
Limiter - proportion of the advantaged group population below the point that defines the truncated population (e.g., .3 indicates that the truncated universe is made up of the advantaged and disadvantaged group populations falling below the point that defines the bottom 30% of the advantaged group population)
RefPoint - a rounded reference point defined by the proportion of the advantaged group within the truncated population falling above the point
AGTP - proportion of advantaged group within the truncated population experiencing the favorable outcome (i.e., falling above the point) (closest match to reference point)
DGTP - proportion of disadvantaged group within the truncated population experiencing the favorable outcome corresponding to the rate for the advantaged group
An excerpt from Table 2 follows. It reflects the situation where the means of the two total populations differ by .5 standard deviations and the truncated universe is defined by the point below which 30% of the advantaged group falls. That the TEES differs from the EES, and that the TEES changes at each reference point, indicates why the Solutions (probit) approach to measuring differences between rates does not work within a truncated population.
Table 2 Excerpt: Patterns of Estimated Effect Sizes Derived Within a Truncated Population (b0416 c 1)
| EES | TEES | Limiter | RefPoint | AGTP | DGTP |
|---|---|---|---|---|---|
| 0.5 | 19 | 0.3 | 0.1 | 9.15% | 6.57% |
| 0.5 | 21 | 0.3 | 0.2 | 19.97% | 14.69% |
| 0.5 | 23 | 0.3 | 0.3 | 30.04% | 22.63% |
| 0.5 | 24 | 0.3 | 0.4 | 39.31% | 30.33% |
| 0.5 | 28 | 0.3 | 0.5 | 50.15% | 39.85% |
| 0.5 | 31 | 0.3 | 0.6 | 60.28% | 49.37% |
| 0.5 | 30 | 0.3 | 0.7 | 69.96% | 59.16% |
| 0.5 | 32 | 0.3 | 0.8 | 79.84% | 70.13% |
| 0.5 | 34 | 0.3 | 0.9 | 90.03% | 83.01% |