National Education Policy Center Study
(May 10, 2012, rev. May 20, 2012)
Prefatory note added March 6, 2016: This page has not been materially updated since May 20, 2012. Thus, the following subsequent developments warrant mention:
1. This subpage discusses, among other things, the mistaken belief in the subject study that generally reducing discipline rates would tend to reduce relative racial/ethnic differences in discipline rates, and the possibility that the study may influence legislation regarding discipline standards. In 2012 Colorado did enact legislation relaxing discipline standards while mistakenly believing that doing so would tend to reduce relative racial/ethnic differences in discipline rates. As indicated in the Colorado Disparities and Denver Disparities subpages of the Discipline Disparities page, recent general reductions in discipline have been accompanied by increased relative racial/ethnic differences in discipline rates. As reflected in the following subpages (with jurisdiction indicated in the title of the subpage), the same pattern is being observed across the country: California Disparities, Maryland Disparities, Connecticut Disparities, Minnesota Disparities, Beaverton, OR Disparities, Los Angeles SWPBS, Minneapolis Disparities, Montgomery County, MD Disparities, St. Paul Disparities, Henrico County, VA Disparities, Portland, OR Disparities. The DOE Equity Report subpage addresses a Department of Education study showing that relative racial differences in expulsions are smaller in districts with zero tolerance policies than in districts without zero tolerance policies. The Suburban Disparities and Preschool Disparities subpages address the fact that relative racial differences in discipline rates tend to be greater in suburbs than in central cities, and in preschool than in K-12, simply because discipline rates tend to be lower in suburbs than in central cities and in preschool than in K-12.
2. Articles since May 2012 addressing this subject include “Misunderstanding of Statistics Leads to Misguided Law Enforcement Policies,” Amstat News (Dec. 2012), “The Paradox of Lowering Standards,” Baltimore Sun (Aug. 5, 2013), “Things government doesn’t know about racial disparities,” The Hill (Jan. 28, 2014), and “Race and Mortality Revisited,” Society (July/Aug. 2014). University methods workshops addressing this issue are found in a note.[i]
3. Also pertinent to this page are the letters to the following entities advising that, contrary to views expressed by the recipient entities or communicated to the recipient entities by others, reducing the frequency of adverse discipline actions tends to increase (a) relative differences in discipline rates and (b) the proportion more susceptible groups make up of persons disciplined: Houston Independent School District (Jan. 5, 2016), Boston Lawyers’ Committee for Civil Rights and Economic Justice (Nov. 12, 2015), McKinney, Texas Independent School District (Aug. 31, 2015), Department of Health and Human Services and Department of Education (Aug. 24, 2015), Texas Appleseed (Apr. 7, 2015), Senate Committee on Health, Education, Labor and Pensions (Mar. 20, 2015), Vermont Senate Committee on Education (Feb. 26, 2015), Portland, Oregon Board of Education (Feb. 25, 2015), Senate Committee on Health, Education, Labor and Pensions (Apr. 1, 2013), United States Department of Justice (Apr. 23, 2012), and United States Department of Education (Apr. 18, 2012). A letter to the IDEA Data Center (Aug. 11, 2014), while not addressing a mistaken view on the part of the recipient, addresses the entity’s failure to recognize the referenced pattern in guidance it provides on the measurement of disproportionality in suspensions and other matters. Letters to American Statistical Association (Oct. 8, 2015) and Chief Data Scientist of the Office of Science and Technology Policy (Sept. 8, 2015) urge the recipient entities to explain to the federal government that reducing the frequency of adverse discipline actions tends to increase (a) relative differences in discipline rates and (b) the proportion more susceptible groups make up of persons disciplined.
4. On September 28, 2015, I created the Intermediate Outcomes subpage of the Scanlan’s Rule page addressing the subject of Section A of this subpage, though without referring to this subpage.
***
In April 2012, the National Education Policy Center (NEPC) of the School of Education of the University of Colorado Boulder issued a report styled “Colorado Disciplinary Practices 2008-2010; Disciplinary Actions, Student Behaviors, Race and Gender.”
Like a high proportion of the research on demographic differences in rates of experiencing an outcome, the report is fundamentally flawed for reliance on the rate ratio as a measure of association. See generally the Measuring Health Disparities, Scanlan’s Rule, and Measures of Association pages of this site. In addition, the study reflects the misperception, which is a key subject of the main Discipline Disparities page, that reducing discipline rates generally will reduce relative racial differences in discipline rates. In fact, as explained in Section A of that page, reducing discipline rates generally will tend to increase relative racial differences in discipline rates. That is, just as lowering test cutoffs tends to result in smaller relative differences in pass rates but larger relative differences in failure rates, less stringent discipline policies tend to result in smaller relative differences in rates of avoiding discipline but larger relative differences in discipline rates.
Given that the report may influence legislation under consideration in Colorado, it should be withdrawn or modified to take into account the reasons to believe more lenient policies will in fact yield larger relative racial differences in discipline rates.
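The test-cutoff pattern can be illustrated with a minimal sketch. The two score distributions below are hypothetical (normal, with equal spread and means half a standard deviation apart); they are assumptions chosen for illustration and are not taken from the NEPC report or from any table on this site.

```python
from statistics import NormalDist

# Hypothetical score distributions (assumed for illustration only):
# equal spread, means half a standard deviation apart.
advantaged = NormalDist(mu=0.0, sigma=1.0)
disadvantaged = NormalDist(mu=-0.5, sigma=1.0)


def relative_differences(cutoff):
    """Return (failure-rate ratio, pass-rate ratio) at a given cutoff.

    Failing means scoring below the cutoff.
    """
    fail_adv = advantaged.cdf(cutoff)
    fail_dis = disadvantaged.cdf(cutoff)
    fail_ratio = fail_dis / fail_adv              # disadvantaged / advantaged
    pass_ratio = (1 - fail_adv) / (1 - fail_dis)  # advantaged / disadvantaged
    return fail_ratio, pass_ratio


high_cutoff = relative_differences(0.0)   # stricter standard
low_cutoff = relative_differences(-1.0)   # more lenient standard

print("high cutoff: fail ratio %.2f, pass ratio %.2f" % high_cutoff)
print("low cutoff:  fail ratio %.2f, pass ratio %.2f" % low_cutoff)
```

Under these assumed distributions, lowering the cutoff raises the failure-rate ratio (from roughly 1.4 to roughly 1.9) while the pass-rate ratio shrinks (from roughly 1.6 to roughly 1.2). Failing the test plays the role of being disciplined, and passing plays the role of avoiding discipline.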
The report also has certain methodological problems that provide additional reason for its withdrawal or modification. One of these problems causes the report to be misleading concerning an issue of substantial importance to policymakers. That problem is treated in Section B. The matter is easier to understand, however, if one first reads Section A.
A. Rates of Experiencing Particular Outcomes Versus Rates of Experiencing Outcomes Falling Above or Below a Certain Level of Severity
Table 8 of the report (at page 13) shows rates at which different ethnic groups experience certain types of discipline and Table 9 (at page 14) shows the ratios of each minority group’s rate to the white rate for each type of discipline. For example, the rates for in-school suspension are 11.2% for blacks and 3.3% for whites and the rates for out-of-school suspension are 21.0% for blacks and 5.5% for whites. Table 9 then shows black-white ratios as 3.3 and 3.8, and the latter figure forms the basis for the statement on page 14 of the report that black students were almost four times as likely as white students to receive out-of-school suspensions.
The problem with this manner of presentation (which is also touched upon in the first note of the main Discipline Disparities page) may be most easily illustrated with reference to studies of demographic differences in self-rated health. Self-rated health is commonly divided into four or five categories. In the latter case, the categories often are (1) excellent, (2) very good, (3) good, (4) fair, and (5) poor. When issues concerning demographic differences in self-rated health are studied, they are generally studied in terms of a dichotomous variable where categories (1) and (2) or (1), (2), and (3) are grouped together to reflect the favorable outcome and the other categories are considered the adverse outcome. Studies then look, for example, at demographic differences in experiencing “health less than good” or “health less than very good.” Such studies do not look at rates of experiencing, say, good health or fair health, because the proportions of each population falling into those categories are variously influenced by the proportions falling into better and worse categories. Apart from the complexity of drawing conclusions about rates of falling into particular categories, it is not clear that any conclusions would make much sense. (See the Reporting Heterogeneity sub-page of the Measuring Health Disparities page concerning a number of measurement issues that are related to the broad subject of the Discipline Disparities page, but not the particular point of this section).
Similarly, in an analysis of demographic differences in performance on a test, researchers might well look at rates of falling below (or above) a score of 50% correct or 40% correct or 30% correct (or all such rates). But they would not look at rates of falling between 50% correct and 40% correct or between 40% correct and 30% correct. The point might also be illustrated by Table 1 of “Can We Actually Measure Health Disparities” (Chance 2006) (which also appears as Table 1 of the Income Illustrations sub-page of the Scanlan’s Rule page). The method of the NEPC report would be akin to comparing black and white rates of falling between 150% and 125% of the poverty line rather than rates of falling below (or above) either of those figures.
The points of the above paragraphs would apply even if rate ratios were valid measures of association. But of greater importance is the fact that the valid measure of association described on the Solutions sub-page of Measuring Health Disparities and employed in Tables 1 and 2 of the main Discipline Disparities page is only useful when applied to dichotomous variables.
I assume that the categories in Table 8 (apart from “Other action taken”) reflect the following increasing levels of severity: (1) classroom suspensions, (2) in-school suspensions, (3) out-of-school suspension, (4) expulsion, (5) referral to law enforcement. “Other actions taken” could include some actions less severe than classroom suspension or actions akin to referral to law enforcement. But there are few enough such actions (3.4% of the total) that we can ignore them without concern that doing so would affect any point made here.
Thus, in analyzing racial differences one could be interested in the combined rates at which different demographic groups fell into (or out of) categories (1) through (4), (2) through (4), or (3) through (4), as well as (4) alone. I do not include (5) because I assume that instances where students were referred to law enforcement are usually in some other category as well. But (5) could still be analyzed separately.
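The groupings just described can be sketched as follows. The counts and enrollment are hypothetical, and the sketch assumes each student is counted once, in the most severe category of discipline received; neither assumption comes from the NEPC report.

```python
# Severity-ordered categories (1) through (4), least to most severe,
# following the ordering assumed in the text.
CATEGORIES = [
    "classroom suspension",
    "in-school suspension",
    "out-of-school suspension",
    "expulsion",
]


def rates_at_or_above(counts, enrollment):
    """Rate of experiencing each level of discipline or anything more severe.

    Assumes each student appears once, in the most severe category received.
    """
    rates = {}
    running = 0
    for category in reversed(CATEGORIES):
        running += counts[category]
        rates[category + " or more severe"] = running / enrollment
    return rates


# Hypothetical counts for one group of 1,000 students (illustration only).
example_counts = {
    "classroom suspension": 5,
    "in-school suspension": 60,
    "out-of-school suspension": 100,
    "expulsion": 5,
}
example_rates = rates_at_or_above(example_counts, enrollment=1000)

for label, rate in example_rates.items():
    print(f"{label}: {rate:.1%}")
```

Each resulting rate is a dichotomy (the outcome or worse versus everything else), so the rates for two groups can be compared in the manner of Tables 1 and 2 of the Discipline Disparities page rather than category by category.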
An approach along the lines described in the last paragraph is reflected in Tables 1 and 2 of the Discipline Disparities page (and I may eventually treat the figures in Table 8 of the NEPC report in such manner). This is not to say that it is necessary to analyze the data in all the ways suggested, but simply that the categorizations would make sense for purposes of deriving rates of experiencing some outcome (or group of outcomes) that can be used to appraise the comparative status of the various demographic groups with respect to discipline.
The above observations would pertain as well to the gender analysis in Tables 10 and 11. I question, however, whether devoting resources to analyzing gender differences in discipline rates in the same manner as racial differences is a useful undertaking in any event.
B. The Seemingly High Rates of Suspension for Discretionary Conduct – the Conflation of (a) the Proportion Suspensions for Certain Types of Conduct Comprise of all Severe Discipline for Those Types of Conduct with (b) the Rates at which Students Engaging in Certain Types of Conduct are Suspended.
The Report (at 2, 17) makes a point of the fact that a large proportion of the impositions of the studied types of discipline are for what are termed “discretionary behaviors,” which include (a) “detrimental behavior,” (b) “disobedient/defiant,” and (c) “other code of conduct violations.” Presumably, discretionary in this context means that discretion is involved in identifying the conduct and that, more so than in the case of other violations, discretion is involved in determining the type of discipline to impose. More broadly, these types of conduct might be deemed facially less serious than the other types of conduct for which the discipline levels examined in the study were imposed.
In this context, the report states (at 17, emphasis added):
“Student incidents involving drugs, alcohol, and tobacco together comprise about 7% of annual student-discipline-warranting behaviors. Yet, while the percentage of serious behavior infractions is low, evidence shows that students who exhibit behaviors falling into discretionary categories receive in- and out-of-school suspensions at rates comparable to those of students involved in those more serious infractions.”
The highlighted statement suggests that students who engage in discretionary conduct are just as likely to be suspended as those who engage in the more serious conduct. Further, a Denver Post account of the study stated that “among its findings, [the study’s authors] noted that 63.5 percent of disobedient or defiant behavior ends in out-of-school suspension — and out-of-school suspensions make up more than half of all disciplinary actions.”
These are quite important interpretations because a good deal of the sentiment in favor of relaxing school discipline is based on perceptions about high rates of suspensions for conduct that does not on its face appear to involve extremely serious misconduct. But the highlighted statement in the report and the statement in the Denver Post are based on conflating (a) the proportion suspensions for certain types of conduct comprise of all severe discipline for those types of conduct with (b) the rates at which students engaging in certain types of conduct are suspended.[ii]
1. The Proportion Suspensions for Certain Types of Conduct Comprise of all Severe Discipline for Those Types of Conduct
The statement in the report about comparable rates of suspensions for the different types of conduct is evidently based on Table 4 (at page 9). The universe examined in Table 4 is comprised of instances of what we may call severe discipline (SD) for various actions (allowing that we do not know exactly what the 3.5% of total actions termed “other actions taken” typically means). The proportions of each race’s discipline actions falling into each SD category naturally sum to 100%, as shown in the table. As indicated in Section A, the least severe of these discipline types are “classroom suspension” (an extremely small category), “in-school suspension,” and “out-of-school suspension.” And the proportion of total SD comprised of these categories for the less serious conduct, rather than being comparable to that of the more serious conduct, is actually greater than that for the more serious conduct, which is exactly what one should expect. Thus, for example, in the least severe category (classroom suspension), the figures are 5.8% for disobedient/defiant compared with 0.1% for dangerous weapon; in the next least severe (in-school suspension), these figures are 43.7% and 5.0%; in the next least severe (out-of-school suspension), these figures are 45.1% and 46.8%. These figures sum to 94.6% for disobedient/defiant and 51.9% for dangerous weapon. By contrast, the extreme sanction of expulsion comprises 0.5% of the discipline actions for disobedient/defiant and 45.3% of the discipline actions for dangerous weapons.[iii]
The 63.5% out-of-school suspension figure in the Denver Post – which I stress again is the proportion that each type of discipline action comprises of total SD actions for each type of behavior, not, as reported, the rate at which students engaging in the type of behavior are disciplined – is actually for “detrimental” behavior rather than “disobedient/defiant.” The figures for the category are 1.5% for classroom suspension, 30.7% for in-school suspension, and 63.5% for out-of-school suspension, summing to 95.7%, compared with the 51.9% for dangerous weapons. And 1.3% of SD actions for this behavior are expulsions compared with the 45.3% figure for dangerous weapons. Thus, as with “disobedient/defiant,” compared with more serious conduct, the punishments are disproportionately in the less severe of the universe of severe punishments, just as one would expect.
That high figures for in-school suspension and/or out-of-school suspension are not rates at which students engaging in discretionary behavior receive those types of discipline actions, or that they reflect concentrations among the less severe of the discipline actions, might be more evident if the very small category of “classroom suspensions”[iv] were not included and more so if the comparatively small category of “in-school suspension” were not included as well. And if the study were limited to a category “out-of-school suspension or more severe,” or “expulsion,” it would be evident why the figures for the less serious conduct would be comparable to those for the most serious conduct – that is, because they would all be 100%.
From another perspective, if the statement in the Denver Post had, like the statement at page 17 of the report, included figures for in-school suspension as well as out-of-school suspension (hence, summing to 94.2%), it might have been evident that the figures do not reflect the rates at which students engaging in the behavior received one of these forms of suspension. The same might be said of a statement (ignoring again the small number of “other actions”) that 100% of detrimental behavior ended in some form of suspension or more severe action. On the other hand, as reflected in the item referenced in the second endnote, facially implausible statements commonly escape scrutiny.
I add that, while I am not too sure what useful information might be derived from analyzing the breakdown of SD into its components for various offenses by type of offense, I hope that the discussion above makes clear that, in accord with the points made in Section A, such analysis should be conducted by the use of dichotomized variables – e.g., expulsion versus everything else, as done above or, say, expulsion and out-of-school suspension versus everything else. As in the case of self-rated health, comparisons of rates of falling into intermediate categories are problematic whatever the purpose.
2. The Rates at which Students Engaging in Certain Types of Conduct are Suspended
The figures we are concerned with regarding whether students are too readily suspended for less serious conduct, either generally or in comparison with more serious conduct, are the rates at which persons who engage in each type of conduct receive severe discipline. One assumes that students who are found to have brought dangerous weapons to school receive some type of severe discipline in a very high proportion of cases. But the proportion of cases in which behavior that makes teaching more difficult for teachers and learning more difficult for other students is deemed sufficiently serious to result in severe discipline of the type examined in the NEPC study is unknown. That proportion – presumably limited to those situations where one or more teachers and administrators concluded that the situation could not otherwise be addressed – could well be, and probably is, quite small.
And the proportion of such cases where a majority of school administrators and parents possessed of all the facts would conclude that the discipline imposed was justified is anyone’s guess, though there may well be administrators sufficiently familiar with disciplinary patterns in Colorado schools to make an informed guess. But the figures in Table 4 of the NEPC report provide nothing useful regarding either the question of the rates at which the discretionary behaviors receive severe discipline or the question of how often such discipline is justified.
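The difference between the two quantities conflated in the report can be sketched with hypothetical counts. The number of incidents below is invented for illustration, precisely because the NEPC report does not supply it; the severe-action counts are likewise assumptions.

```python
# Hypothetical counts for one discretionary behavior (illustration only).
# The report does not state how many incidents of each behavior occurred.
incidents = 10_000
severe_actions = {
    "in-school suspension": 300,
    "out-of-school suspension": 650,
    "expulsion": 50,
}

total_severe = sum(severe_actions.values())

# (a) The kind of figure Table 4 reports: the proportion out-of-school
# suspensions comprise of all severe discipline for the behavior.
share_of_severe = severe_actions["out-of-school suspension"] / total_severe

# (b) The figure policymakers care about: the rate at which students
# engaging in the behavior receive an out-of-school suspension.
rate_among_incidents = severe_actions["out-of-school suspension"] / incidents

print(f"share of severe discipline: {share_of_severe:.1%}")
print(f"rate among all incidents:   {rate_among_incidents:.1%}")
```

With these assumed counts, figure (a) is 65% while figure (b) is 6.5%. Figure (a) stays at 65% whatever the number of incidents, which is why it says nothing about figure (b).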
As reflected by the Denver Post interpretation, however, the table and its treatment could seriously mislead policy makers concerning these important questions.
[i] “The Mismeasure of Health Disparities in Massachusetts and Less Affluent Places,” Department of Quantitative Health Sciences, University of Massachusetts Medical School (Nov. 18, 2015); “The Mismeasure of Discrimination,” Center for Demographic and Social Analysis, University of California, Irvine (Jan. 20, 2015); “The Mismeasure of Demographic Differences in Outcome Rates,” Public Sociology Association of George Mason University (Oct. 18, 2014); “Rethinking the Measurement of Demographic Differences in Outcome Rates,” Maryland Population Research Center of the University of Maryland (Oct. 10, 2014); “The Mismeasure of Association: The Unsoundness of the Rate Ratio and Other Measures That Are Affected by the Prevalence of an Outcome,” Minnesota Population Center and Division of Epidemiology and Community Health of the School of Public Health of the University of Minnesota (Sept. 5, 2014); “The Mismeasure of Group Differences in the Law and the Social and Medical Sciences,” Institute for Quantitative Social Science at Harvard University (Oct. 17, 2012); “The Mismeasure of Group Differences in the Law and the Social and Medical Sciences,” Department of Mathematics and Statistics of American University (Sept. 25, 2012).
[ii] The conflation of (a) the proportion suspensions for certain types of conduct comprise of all severe discipline for those types of conduct with (b) the rates at which students engaging in certain types of conduct are suspended is somewhat akin to the conflation of (a) the proportion blacks and whites comprise of all college students and (b) the proportion of blacks and whites who attend college that is the subject of the Journalists and Statistics sub-page of the Vignettes page of this site. In the situation discussed there, that black males and white males respectively comprised 4% and 39% of college students was interpreted to mean that white males were almost ten times as likely as black males to attend college. In fact, whites were about 1.4 times as likely as blacks to attend college.
[iii] These figures do not sum to 100% for either type of conduct because “other actions taken” are ignored; for that category the proportions of total SD actions are 4.7% for disobedient/defiant and 1.2% for dangerous weapons. The comparative size of those figures suggests that the other actions taken category commonly involves a less severe type of discipline.
[iv] As one might reasonably expect, an extremely high proportion (96.2%) of these most lenient of the severe discipline categories were imposed for the three discretionary behavior categories. That 96.2% figure is based on the figures in Tables 3 and 4, which show 2,678 of a total of 2,785 classroom suspensions imposed for the three discretionary behaviors. Table 1, however, shows a total of 3,045 classroom suspensions. So there is some inconsistency in the tables.