Predicting Recidivism Risk: New Tool in Philadelphia Shows Great Promise
by Nancy Ritter
Tool uses random forest modeling to identify probationers likely to reoffend within two years of returning to the community.
For some time to come, our cities, counties and states will face the tremendous challenge of trying to do their work with fewer resources. That challenge is perhaps no more pressing than in the nation's corrections system, where fiscal realities demand the downsizing of prison populations.
In 2009, the Pew Center on the States estimated that 1 in 45 adults in the U.S. was under some form of community correctional supervision. As ever-increasing numbers of offenders are supervised in the community — witness the massive "realignment" of prisoners in California — parole and probation departments must find the balance between dwindling dollars and the lowest possible risk to public safety. The good news is that researchers and officials in Philadelphia, Pa., believe they have developed a tool that helps find that balance.
Seven years ago, criminologists from the University of Pennsylvania and officials with Philadelphia's Adult Probation and Parole Department (APPD) teamed up to create a computerized system that predicts — with a high degree of accuracy — which probationers are likely to violently reoffend within two years of returning to the community.
"We were asked to develop a new risk-forecasting tool to help the financially strapped probation department tailor their officers' caseloads to the risk level of probationers," said Geoffrey Barnes, who, with fellow researcher Jordan Hyatt from Penn's Jerry Lee Center of Criminology, created and evaluated the tool. "The goal was to ensure that officers who were supervising probationers with a high risk of recidivating would have a smaller caseload than officers who were supervising folks with a lower risk."
The tool — which has been successfully used in Philadelphia for four years — assesses each new probation case at its outset and assigns the probationer to a high-, moderate- or low-risk category. Although this is not a new concept, what is unique is that the tool uses "random forest modeling," a sophisticated statistical approach that considers the nonlinear effects of a large number of variables with complex interactions. Historically, corrections officials — in Philadelphia and elsewhere around the country — have used simpler statistical methods, such as linear regression models, to try to get a handle on the risk that a probationer may pose to the community.
Random forest modeling, as applied to criminal justice, was pioneered by criminologist Richard Berk, also at Penn, who acted as a consultant on the NIJ-funded project.
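To make the idea of a "forest" concrete, here is a minimal sketch in Python. It is not the Philadelphia model. It is a much-simplified forest of one-split trees ("stumps"), each trained on a bootstrap sample with a random subset of predictors, and the forest's forecast is the majority vote. A production random forest grows far deeper trees and re-draws the predictor subset at every split. The records, labels and function names below are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical toy records: (age, prior_jail_stays, years_since_last_serious_offense).
# Label 1 = committed a serious offense within the horizon, 0 = did not.
ROWS = [(19, 5, 1), (21, 4, 0), (22, 6, 2), (20, 3, 1), (23, 4, 2), (24, 5, 1),
        (45, 0, 12), (52, 1, 15), (38, 0, 9), (60, 1, 20), (41, 2, 10), (48, 0, 14)]
LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, features):
    """Pick the one (feature, threshold) split with the lowest weighted impurity."""
    fallback = majority(labels, 0)
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, fallback), majority(right, fallback))
    return best[1:]  # (feature, threshold, label_if_at_or_below, label_if_above)


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each tree on a bootstrap sample using a random subset of predictors."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]   # draw with replacement
        features = rng.sample(all_features, n_features)     # random predictor subset
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest


def predict(forest, row):
    """Every tree votes; the forest returns the majority label."""
    votes = [below if row[f] <= t else above for f, t, below, above in forest]
    return majority(votes, 0)


forest = fit_forest(ROWS, LABELS)
print(predict(forest, (18, 7, 0)))    # young, many prior jail stays
print(predict(forest, (65, 0, 25)))   # older, no prior jail stays
```

Because each tree sees a slightly different sample and a different set of predictors, no single tree decides the forecast. That is the property that lets a real forest absorb many variables and their interactions.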
Pre-Random Forest Times
Before the creation and implementation of the new risk-forecasting tool, Philadelphia — like many of the nation's parole and probation departments — used a one-size-fits-all supervising strategy. Every offender saw his or her probation officer about once a month for 20–30 minutes.
"Most of APPD's probationers were supervised under a strategy that mandated only two and a half hours of interaction per year," said Barnes. "When they contacted us, the department's leaders expressed a strong desire to reform this policy — to focus more supervision on those with the largest risk of future violence and devote far fewer resources on those who presented little or no risk of reoffending."
To develop the tool they had in mind, Barnes and Hyatt were actually "embedded" in APPD. Over the next few years, they built three iterations of a model that makes virtually instantaneous forecasts regarding offenders who are due to be released to the community.
Since APPD began using on-demand risk forecasting, the agency has handled well over 120,000 new "case starts," referring to the time when an offender begins probation. (Note that about one-third of the offenders have had more than one probation "case start," so this number actually reflects about 72,000 individual offenders.)
In 10-15 seconds, the tool assigns a new probationer to one of three categories. The lowest level of risk is assigned to those who are predicted to not commit any new offense in the next two years. The moderate-risk level identifies those who are likely to commit a crime, but not a serious one. The high-risk level is for those who are most likely to commit a serious crime, which APPD defines as murder, attempted murder, aggravated assault, rape and arson.
Community supervision is based on the determined risk level. Probation officers who are supervising high-risk individuals are given the smallest caseloads.
Although the random forest model developed in Philadelphia can be adapted by other jurisdictions, it is not an off-the-shelf tool. Obviously, the data are unique to the probationers who are under APPD supervision. And the "outcomes," or risk-level assignments, are also unique to Philadelphia because APPD officials set their own parameters based on resources and every manner of policy, operational and political reality that the tool is asked to consider.
Hyatt offers this analogy: You could take the engine out of a custom sports car, but it probably wouldn't work the same way in another car — and it might not even work at all. Therefore, another jurisdiction using random forest modeling to build a risk-prediction tool would need its own statisticians, computer whizzes and agency officials working in concert. Although many of the same questions would be asked, the "answers" — specifically, the outcomes around which the tool would be designed — would be different.
The first thing a jurisdiction interested in creating a random forest risk-forecasting tool must do is determine what data already exist in electronic form. It is very possible, say Barnes and Hyatt, that a jurisdiction will discover it has far more data than it realizes — criminal histories in the court system, local prison records and separate police records.
"Every jurisdiction probably has access to data that they haven't even thought about," said Barnes. "We capture so many types of information as a matter of course, as part of the day-to-day routine. I suspect few people realize how enormously powerful it could be to — with just a few manipulations — convert it into numbers that could forecast future behavior."
As they developed the risk prediction tool in Philadelphia, Barnes and Hyatt mined raw data from six different databases. The team then tested hundreds of different predictors using many different approaches, all the while fine-tuning the delicate balance between APPD's resources and the forecasting accuracy that was achievable. Eventually, three models went live. The third, Model C, has been in operation since November 2011 and uses 12 of the strongest predictors of risk of reoffending, including prior jail stays, the probationer's ZIP code and the number of years since the last serious offense.
Every jurisdiction would be looking at its own unique data set that reflects decisions, made by people who have long since retired, about what should and should not be rolled over into their next system. Therefore, it would be especially helpful for one of the team members to understand what data were taken from an older system (be they paper records from jails, courts or police, or old computer records) and used in newer systems.
"You definitely need a computer professional on the team from the beginning," said Barnes. "Ideally, this would be someone familiar with the way the jurisdiction has kept its records."
Finally, it is important to be mindful of simple geography. It will come as no surprise that data-sharing among jurisdictions in the U.S. is quite limited, particularly in terms of the kind of instantaneous forecasting that this tool is designed to perform. For example, the APPD tool uses criminal history data only from Philadelphia; data from other states, and even other parts of Pennsylvania, are not used, which means that the forecasts do not necessarily indicate each probationer's universal level of risk.
"Offenders who represent a serious danger outside the city of Philadelphia could very easily be forecasted as low risk within these boundaries, particularly if they usually live, work and offend elsewhere," Hyatt explained.
The bottom line is that, as in every scientifically based endeavor, data are paramount.
"The key," added Barnes, "is to ensure that all of the data sources are immediately available through the agency's data network, although it is important to note that the data do not need to be up-to-the-minute accurate to be useful."
Forecast Begin- and End-Points
After dealing with the availability of data, the next step is to determine when the forecasting begins (called the "unit of prediction") and when it ends (the "time horizon"). The beginning point can be any moment in the lifespan of an offender's case — when bail is set, when charges are filed, at sentencing, when the offender enters the correctional system or when the offender first reports for probation.
In Philadelphia, officials chose the start of probation and a time horizon of two years. The APPD tool therefore predicts the likelihood of a probationer committing a violent crime within two years of returning to the community. Although any time period can be used, it is important to understand that the accuracy of forecasting a longer period depends on the depth of data available.
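The link between the time horizon and the depth of data can be sketched in a few lines of Python. The idea is simply that a case can be used to build or validate a model only if its full outcome window has already elapsed. The function names and dates here are hypothetical and are not drawn from the APPD system.

```python
from datetime import date


def latest_usable_start(today, horizon_years):
    """Latest case-start date whose full outcome window has already elapsed."""
    try:
        return today.replace(year=today.year - horizon_years)
    except ValueError:
        # Feb. 29 has no counterpart in a non-leap year; fall back to Feb. 28.
        return today.replace(year=today.year - horizon_years, day=28)


def has_full_outcome_window(case_start, today, horizon_years):
    """True if the case began early enough to observe the whole horizon."""
    return case_start <= latest_usable_start(today, horizon_years)


today = date(2013, 2, 27)  # illustrative "today"
print(latest_usable_start(today, 2))
print(has_full_outcome_window(date(2012, 1, 1), today, 2))
```

With a two-year horizon, a case that started only a year ago cannot yet show whether the forecast was right, so it cannot be used to train or test the model.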
"If, for example, you want to forecast what is going to happen over the next five years, you have to use data from at least five years ago and before," Hyatt said.
Once the unit of prediction and time horizon are determined, the next step is to decide what "forecasting outcomes" the tool should be set up to predict. Researchers such as Barnes and Hyatt can guide practitioners through this process, but the practitioners themselves must ultimately make the decisions because resources, personnel, operational and even political realities must be considered. In Philadelphia — after weeks of examining caseloads and staffing levels — officials decided that approximately 15 percent of their probation population should be classified as high risk, 25–30 percent as moderate risk, and 55–60 percent as low risk.
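One way to keep those targets in view while a model is being tuned is to compare each draft model's category shares against the agreed bands. The Python sketch below assumes the forecasts arrive as a simple list of labels. The bands for moderate and low risk come from the figures above. The tolerance placed around "approximately 15 percent" is an assumption made for illustration, as are the function names.

```python
from collections import Counter

# Target shares per risk category. The (0.13, 0.17) band for "high" is an
# assumed tolerance around "approximately 15 percent".
TARGET_BANDS = {
    "high": (0.13, 0.17),
    "moderate": (0.25, 0.30),
    "low": (0.55, 0.60),
}


def category_shares(forecasts):
    """Fraction of forecasts falling into each risk category."""
    counts = Counter(forecasts)
    return {cat: counts[cat] / len(forecasts) for cat in TARGET_BANDS}


def out_of_band(forecasts):
    """Categories whose share falls outside the agreed band, with that share."""
    return {cat: share for cat, share in category_shares(forecasts).items()
            if not (TARGET_BANDS[cat][0] <= share <= TARGET_BANDS[cat][1])}


# A hypothetical draft model that puts too many cases in the high-risk group.
draft = ["high"] * 35 + ["moderate"] * 25 + ["low"] * 40
print(out_of_band(draft))
```

A check like this does not say which model is better. It only flags when a draft would demand more high-risk supervision than the agency has said it can staff.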
Barnes and Hyatt acknowledge that someone picking up the final report they submitted to NIJ at the end of the grant could be a bit overwhelmed by random forest modeling. The forecasting tool now being used in Philadelphia, for example, looks at 500 decision "trees" (hence random "forest") that contain approximately 8.74 million decision points as it runs a risk assessment of a new probationer. But, they insist, there is no reason that criminal justice practitioners should shy away from the technology.
"If you think about it," said Barnes, "private companies do this every day — they crunch data to decide who's likely to buy peanut butter, for example, and they send coupons to those folks."
Of course, both researchers are quick to point out that forecasting criminal behavior is not coupon clipping, but the principles of data analysis, they say, are the same.
Determining an Acceptable Error Rate
No prediction tool is perfect. Anyone who has watched a weather forecaster predict 8 inches of snow — then dealt with crying children who have to go to school when only a dusting falls — knows that predictions are occasionally wrong.
The key in building a random forest prediction tool for any aspect of the criminal justice system is balancing the risk of getting it wrong. This process involves determining, in advance, an acceptable error rate. And this demands intensive collaboration between researchers and practitioners, one in which agency officials (not statisticians) must make crucial policy decisions. In particular, this means specifying in advance the acceptable ratio of "false positives" to "false negatives."
A false negative is an actual high-risk person who was mistakenly identified as moderate or low risk. A false positive is an actual low- or moderate-risk person who was identified, and therefore supervised, as high risk.
As the practitioners work side-by-side with the researchers to set these parameters, they will inevitably have to make tradeoffs they can live with. The relative weight assigned to the two kinds of error is referred to as the "cost ratio." Before the risk prediction tool can be built, the numerical ratio of these costs must be approximated. It is not enough to simply say that false negatives are generally more costly than false positives. Rather, an actual value must be provided.
Here is how Hyatt explained the process in Philadelphia: "Basically, we had to determine precisely how much more costly it would be to mistakenly classify a probationer in a lower-risk category who then went on to commit a serious crime than it would be to intensely supervise someone who is actually a low-risk probationer because the tool had assessed him as high risk."
Most jurisdictions that contemplate building a random forest risk-prediction tool would likely do what they did in Philadelphia: set a higher relative cost for false negatives than for false positives. Philadelphia's APPD decided on a cost ratio where false negatives were 2.6 times more costly than false positives. But any jurisdiction that wishes to design and implement a similar tool would have to determine its own cost ratio or error rate.
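To see what a cost ratio does in practice, consider a simplified two-way decision (high risk versus not high risk). This is a textbook expected-cost rule, offered as a sketch and not as a description of how the Philadelphia model applies its ratio. If a false negative costs 2.6 units and a false positive costs 1, then flagging a person with probability p of a serious offense is the cheaper choice whenever 2.6 × p exceeds 1 × (1 − p), which works out to p greater than 1 / 3.6, or roughly 0.28. The function names below are hypothetical.

```python
COST_RATIO = 2.6  # cost of a false negative relative to a false positive


def high_risk_threshold(cost_ratio):
    """Probability above which flagging someone as high risk costs less on average."""
    return 1.0 / (1.0 + cost_ratio)


def flag_high_risk(p_serious, cost_ratio=COST_RATIO):
    """Apply the expected-cost rule to one estimated probability."""
    return p_serious > high_risk_threshold(cost_ratio)


def total_cost(false_negatives, false_positives, cost_ratio=COST_RATIO):
    """Weighted error cost, measured in units of one false positive."""
    return cost_ratio * false_negatives + false_positives


print(round(high_risk_threshold(COST_RATIO), 3))
print(flag_high_risk(0.30), flag_high_risk(0.25))
```

With equal costs the threshold would sit at 0.5. Raising the cost of a false negative pushes the threshold down, so more people are supervised as high risk in exchange for fewer missed serious offenses.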
As Barnes and Hyatt noted, there is no single 'right answer' in choosing the unit of prediction, the time horizon, the definition of outcomes or the cost ratio. Every jurisdiction that wants to build a random forest model prediction tool must commit to this very delicate balancing act — one in which researchers can assist, but that, in the end, requires practitioners to do the heavy lifting.
"I cannot emphasize this enough," Barnes added. "Balancing these different types of errors with the model's overall accuracy rate is not the job of the team's statisticians. Because an agency's leadership has to live with the consequences of any error that occurs once the forecasting tool goes live, they must decide what level of accuracy they can live with and the balance of potential errors they prefer."
The model that has been used in Philadelphia for just over a year (Model C) has an accuracy rate of 66 percent when considering all three (high-, moderate- and low-risk) categories. In their final report to NIJ, Barnes and Hyatt offer a detailed account of the development and accuracy of the three generations of risk-prediction models, including much more detail about the separate accuracy rates for the three risk categories; for example, probationers who were categorized as high risk are 13 times more likely to commit a new serious offense within the two-year forecast period than either low- or moderate-risk probationers.
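An overall accuracy figure like this one is computed from a table that crosses each probationer's forecast category with what actually happened. The Python sketch below shows the arithmetic. The counts are invented for illustration so that the total comes to 66 percent; they are not taken from the NIJ report, and the function names are hypothetical.

```python
# Rows are actual outcomes, columns are forecast categories.
# All counts are invented for illustration only.
CONFUSION = {
    "low":      {"low": 450, "moderate": 100, "high": 50},
    "moderate": {"low": 70,  "moderate": 150, "high": 60},
    "high":     {"low": 20,  "moderate": 40,  "high": 60},
}


def overall_accuracy(matrix):
    """Share of all cases whose forecast category matched the actual outcome."""
    correct = sum(matrix[cat][cat] for cat in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def forecast_precision(matrix, category):
    """Of the cases forecast into `category`, the share that truly belonged there."""
    forecast_total = sum(matrix[actual][category] for actual in matrix)
    return matrix[category][category] / forecast_total


print(overall_accuracy(CONFUSION))
print(forecast_precision(CONFUSION, "high"))
```

The two measures answer different questions, which is why a single overall rate can hide very different error patterns in the separate risk categories.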
Read the NIJ report (pdf, 64 pages).
All three iterations of the Philadelphia model were validated using a sample of probation cases from 2001, which gave the researchers a 10-year period in which to assess the long-term offending of the probationers. That said, of course, any forecasting tool, including this one built using random forest modeling, will make mistakes.
But, said the researchers, when it comes to figuring out how to be more effective in using corrections system dollars, everyone should understand that choices will always have to be made — and the goal is to make the most accurate choices in as cost-effective a manner as possible.
As Barnes put it, "The real achievement of the final model in Philadelphia is not that it is right two-thirds of the time but that it produces this accuracy by balancing the relative costs of the different kinds of errors."
"The point," Hyatt added, "is that random forest modeling allows you to add different variables without sacrificing your ability to make accurate predictions. By working hand-in-hand with their practitioner and policymaker partners, researchers can come up with the right ratio of variables that work in their own unique jurisdiction, both from a practical standpoint in terms of the data that are available and from a standpoint of political and policy exigencies which decision-makers are comfortable putting into a forecast tool."
It is important to understand that the NIJ-funded project discussed in this article looked only at the creation and effectiveness of the prediction tool itself — not at the effectiveness of the subsequent supervision or treatment of APPD probationers. In other words, the project did not, for example, consider whether (and to what extent) intense supervision and exposure to more aggressive interventions may have caused a high-risk probationer to not commit another serious crime.
The Benefits of Random Forest Modeling
One of the most compelling attributes of random forest modeling is that — unlike linear regression analyses — it is not necessary to know in advance what data will be useful in predicting behavior or which variables will affect the predictive power. In more traditional statistical procedures, only a limited number of predictors are used to try to forecast future behavior. But random forest modeling does not require users to be so choosy.
The tool can be programmed to simply not consider a factor based on other variables. In other words, data can be "over-included," and the tool will simply filter them out. For example, the tool may say, "I don't see much of a juvenile record for this individual, but I do see, from an earlier branch in the tree, that this person is 60 years old, so I wouldn't expect to find much of a juvenile record; but, regardless, now that he is 60, this is probably not a very important factor now."
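The juvenile-record example can be written out as a single hand-built tree. In this Python sketch an early split on age sends older probationers down a branch that never looks at the juvenile record at all. The thresholds and category labels are hypothetical and are not taken from the APPD model, which learns its splits from data.

```python
def toy_tree(age, juvenile_arrests):
    """One illustrative decision tree with hand-picked, hypothetical thresholds."""
    if age >= 50:
        # Older branch: the juvenile record is never consulted on this path,
        # so including it in the data does no harm here.
        return "low"
    if juvenile_arrests >= 3:
        # Younger branch: the juvenile record now carries weight.
        return "high"
    return "moderate"


print(toy_tree(60, 0), toy_tree(60, 9))   # same forecast either way
print(toy_tree(22, 5), toy_tree(22, 0))   # the record matters for a 22-year-old
```

This is the sense in which data can be "over-included." A variable that is irrelevant for one group is skipped on that group's branch while still being available where it helps.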
This is not the case with regression equations, where every time another variable or predictor is added, something is lost. With random forest modeling, variables can be added without losing predictive capacity. Indeed, it is this feature that can help bring researchers, practitioners and even politicians to the same table while the tool is being developed. It helps garner buy-in, as it were, from skeptics.
"Adding variables that individual stakeholders cared about — even if we, as criminologists, didn't think they would have much predictive power — helped our APPD partners feel that we were hearing them and responding to their concerns," said Barnes. "This feature helped them get behind what we were trying to do as we built the forecasting tool, and, importantly, it helped everyone understand the risks that the policymakers, in particular, faced."
The bottom line is that any data can be used in a random forest tool, depending on the wishes of officials and other key players. Data that may be statistically unimportant — but politically important — can be built into the tool. For example, a jurisdiction might want to consider the number of a probationer's violent co-offenders; although APPD ended up not using that data in its tool, another jurisdiction may find such data predictive.
Another advantage of random forest modeling is its ability to identify highly nonlinear effects for each individual predictor. Consider, for example, the bivariate relationship between a soon-to-be-probationer's age and the likelihood that the tool would forecast him to be high risk. It is not surprising that the youngest probationers in Philadelphia were forecast to present the greatest danger of reoffending with a serious crime. However, the random forest analysis also showed something else.
"A bit more surprising is how quickly the probability of a high-risk forecast dropped as the offender got just a few years older," said Hyatt. "By the time the incoming probationer turns 27, the likelihood of receiving a high-risk forecast is not appreciably different from that of a 40-year-old — and, after the age of 40, the amount of risk seems to drop once again until it reaches a level that is effectively zero at age 50 and beyond."
Resources, Equitability and Fairness
Why should officials in the criminal justice system think about building a risk analysis prediction tool using random forest modeling? Proponents say one of the most compelling reasons is the simple matter of fiscal resources.
"We just do not have the ability to pay for the most intensive level of supervision for every probationer," said Barnes. "We don't have the ability to sentence every prisoner to life. We have to be very careful about how we allocate precious resources and, for public-sector workers — be they probation officers, police officers or corrections officers — the most precious resource is time."
The random forest model prediction tool, he said, allows agencies to base their personnel and policy decisions on a scientifically proven method.
Another reason to consider constructing such a sophisticated prediction tool is that, quite frankly, "prediction," in some form or another, is already occurring. Everyone involved in the criminal justice system — from judges to probation officers, from police leaders to politicians who write the laws and determine budgets — is making judgments, essentially predictions, about the relative risk of an offender.
Researchers Hyatt and Barnes believe that by using random forest modeling to build the actuarial risk-assessment tool for Philadelphia's APPD, they have ensured that those predictions are being made in the fairest, most equitable way possible.
"Using random forest modeling gave us the assurance that we made use of the best science available to identify the most dangerous offenders," said Barnes. "It has ensured that we're preserving resources and that the people who are subject to the policy decisions based on those risk assessments are being treated in a fair and consistent way."
"You may not like being on high-risk probation," he added, "but from a procedural justice standpoint, you at least know that the decision was made the same way for everybody."
Under one-size-fits-all procedures being used in many jurisdictions around the country, probation officers are given an enormous amount of discretion. This means that probationers who actually have a similar risk of reoffending could be — and therefore likely are — treated in disparate ways based on who their probation officer is and any number of other factors.
However, in addition to ensuring that offenders are assessed consistently in terms of their risk level, the tool being used in Philadelphia — and the policy decisions that APPD has put into place to operationalize the results — ensures that offenders who are identified as being at a certain risk level are all treated the same. Every probationer whom the tool scores as high risk is treated under the same high-risk protocol; this standardizes both their reporting requirements and the rules that they have to follow — including, of course, any likelihood that they will be sanctioned for a technical violation.
This equitability is something that researchers Barnes and Hyatt — and the probation professionals who have been successfully using the tool — believe in.
"Because every probationer is put into the same model, the same decision points will be hit as the model produces the risk-category analysis," Barnes said. "Two offenders with the same data values — even if they come from different parts of the city, even if they are different kinds of people — will go through the same scoring process in the same way."
"And that," Barnes argued, "is a far sight more equitable than a probation officer perhaps taking a dislike to you and deciding that you need to come in more frequently because you remind him of somebody who victimized a close relative a few months ago."
This is not to say, however, that human judgments don't play a role.
"Human judgments are important," Hyatt added. "But one thing that has been consistently found every time that this sort of technology has been used to forecast human behavior is that these actuarial decision-making models do a better job — and produce more accuracy in a more consistent fashion — than human gut reactions ever could."
As with any kind of new technology-based tool, however, there is an inevitable intersection of science and human nature — including ethics — that must be grappled with.
For example, some have argued that certain variables, such as an offender's ZIP code, can serve as a proxy for race, particularly in a city as highly segregated as Philadelphia. Others note that individuals who are categorized as high risk and therefore more intensely supervised are probably going to incur more technical violations of the terms of their parole. Certainly, just as with any policy decision that has moral and ethical ramifications (and most do), it is important that these issues are clearly understood and squarely addressed.
The Key: A Strong Partnership
Barnes and Hyatt emphasize that building the random forest prediction tool in Philadelphia was a tremendously iterative process — and one that required day-to-day collaboration with APPD.
"You don't put all the data into the computer the first time and hit the button and say, 'OK, we're done,'" Barnes said. "The model comes out and you look at it. Everyone sits down around a table and discusses it. The statisticians describe the problems they faced. The database guys look at it and say, 'Well, yes, but you are using this variable in the wrong way,' and the practitioners look at it and say, 'We really can't have 35 percent of our caseload on high-risk supervision. It's not going to work. That number has got to come down.'"
This gets at the constantly evolving nature of the random forest tool.
"You constantly are building new things to try to deal with changes in the environment, changes in the data, changes in what people think are predictive, changes in criminological theory over time," Hyatt noted.
Recommendations from the Research
Given the need to balance fiscal realities with an overarching mission to protect public safety, criminal justice professionals are beginning to look — with the same creativity and vigor as private-sector professionals — at sophisticated statistical tools to solve problems. Therefore, it is likely that risk-prediction tools using random forest modeling may play an important role in the future of our criminal justice system.
A tool like the one developed in Philadelphia provides an opportunity to advance the capabilities of the criminal justice system to protect communities, particularly for jurisdictions with large probation populations that must be managed with fewer dollars. Indeed, as the Philadelphia project demonstrated, the random forest-based risk-prediction tool has helped probation officials manage cases more efficiently. For nearly four years now, they have been able to concentrate resources on a small number of probationers who require more active supervision, rather than on those who are unlikely to reoffend regardless of how they are supervised.
In their final report, Barnes and Hyatt recommend 12 steps that could serve as a blueprint for a jurisdiction that is considering building a random forest model risk prediction tool:
- Obtain access to reliable data that are consistently and electronically available.
- Define the unit of prediction and time horizon.
- Define the outcome risk categories.
- Consider the practical implications for a risk-based supervision strategy and ensure adequate resources based on the distribution of risk scores.
- Choose the predictor variables to be used, based on theoretical, practical and policy considerations.
- Build a single database file.
- Estimate the relative costs of false positives and false negatives, keeping in mind that agency leadership must value the relative weight of these inaccuracies.
- Build an initial model and evaluate the results.
- Adjust the model to reflect policy-based concerns regarding accuracy and proportional assignment to risk categories; construct additional test models where required.
- Produce forecasts for offenders already in the agency's caseload.
- Create the user interface and back-end software to produce live forecasts.
- Continuously monitor the results of the live forecasts.
Again, it is important to understand that the Philadelphia tool was based on probationers who live in Philadelphia. Needless to say, people in other jurisdictions may be different in key ways — and crime trends vary in different parts of the country and even in different parts of a state. Therefore, a tool that uses random forest modeling must be based on the best available data about the population whose behavior is being predicted.
Finally, say proponents, because random forest modeling can be tailored to specific needs, researchers and practitioners should not limit their thinking to urban probationers, such as those with whom the team worked in Philadelphia. Random forest modeling may prove useful in managing prison populations, for example. Or, said Barnes, perhaps officials in another jurisdiction are interested in looking at the pretrial behavior of people who have merely been charged with an offense.
These would present entirely different environments, of course.
"But," Barnes noted, "the chances are that a jurisdiction has the data to build other kinds of prediction models."
"You just have to make the contact with somebody with reasonable statistical skills, use the database professionals who you almost certainly have already employed, convert the data into a usable format, and go ahead and build the model," he added. "Give it a shot."
NIJ Journal No. 271, February 2013
About the Author
Nancy Ritter is a writer and editor at NIJ.
[note 1] Pew Center on the States, One in 31: The Long Reach of American Corrections, Washington, D.C.: Author, 2009.
Date Created: February 27, 2013