Real-Time Crime Forecasting Challenge: Questions and Answers

On this page you will find questions and answers related to the Real-Time Crime Forecasting Challenge prior to February 17, 2017.

Q-1: What are the submissions supposed to be like? Specifically what qualifies as a "hot spot"?
A: The term hot spot is an arbitrary term. The simplest definition of hot spots is, “the cells that have the highest number of calls for service compared to the other cells.” There is no single threshold for being a hot spot. The threshold will change depending on the definition of time and place. Additionally, the number of cells forecasted to be hot spots will vary depending on the areal size of the cell and on the restriction placed on the range of the total forecasted area. The “product” to be submitted for the competition is a shapefile, using cells, covering the entire Portland Police Bureau jurisdiction. Within that shapefile, contestants should include a binary variable “hotspot” that indicates whether each cell is forecasted to be a hot spot. Contestants need to follow the requirements outlined in Table 2, specifically that the area of each cell is between 62,500 and 360,000 sq. ft. (except those cells that need to be trimmed due to boundaries) and that the total forecasted area is between 0.25 and 0.75 sq. mi.
Q-2: Can weather forecast data be used as part of a prediction?
A: From the Challenge:
"Contestants should be aware that other entities may make other data available through free or fee-based services (e.g., cloud and data sharing sites) that may or may not also be useful in developing their algorithms. Contestants are permitted, but not required, to use any other data sets or services."
"...other data sets..." could include weather data.
Q-3: What is the projection of the x y coordinate system?
A: From table 2 in the Challenge: Projection: NAD_1983_HARN_StatePlane_Oregon_North_FIPS_3601_Feet_Intl
Q-4: What are the dates that we are predicting for (given each of the duration categories)?
A: The dates for predictions in each category are:
  • One week: March 1 - 7
  • Two weeks: March 1-14
  • One month: March 1-31
  • Two months: March 1-April 30
  • Three months: March 1-May 31
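The forecast windows listed above can be written down as inclusive date ranges. The following is a minimal sketch, not an NIJ artifact: the window keys (such as "1WK") are hypothetical labels, and the year 2017 is taken from the dates quoted elsewhere on this page.

```python
from datetime import date

# Forecast windows from the answer above. The keys are hypothetical labels;
# the year 2017 is an assumption based on dates quoted elsewhere on this page.
FORECAST_WINDOWS = {
    "1WK": (date(2017, 3, 1), date(2017, 3, 7)),
    "2WK": (date(2017, 3, 1), date(2017, 3, 14)),
    "1MO": (date(2017, 3, 1), date(2017, 3, 31)),
    "2MO": (date(2017, 3, 1), date(2017, 4, 30)),
    "3MO": (date(2017, 3, 1), date(2017, 5, 31)),
}


def in_window(call_date, window):
    """Return True if call_date falls inside the named window (inclusive)."""
    start, end = FORECAST_WINDOWS[window]
    return start <= call_date <= end


def window_length_days(window):
    """Number of calendar days covered by the named window (inclusive)."""
    start, end = FORECAST_WINDOWS[window]
    return (end - start).days + 1
```

Because every window starts on March 1, each longer window contains all of the shorter ones.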
Q-5: Who designed the judging/scoring criteria?
A: The PAI was originally proposed to the field by Chainey, Tompson, and Uhlig (2008). They sought to define a measure for testing forecasting accuracy. They said it measures "the hit rate against the areas where crimes are predicted to occur with respect to the size of the study area;" the actual equation is given with the Challenge. The criminology field (specifically those working on forecasting and prediction) has principally relied on this measure since its inception.

The PEI* (and PEI) were proposed by Hunt (2016) as a complementary measure to the PAI. He sought to define a measure for testing forecasting efficiency. It is meant to measure how well a forecast does compared to how well it could have done (post hoc). It was introduced only recently; however, it is the only known plausible measure at this time to complement the PAI. The equation for how it is measured is provided in the Challenge.

References: 

Hunt, J. (2016). Do Crime Hot Spots Move? Exploring the Effects of the Modifiable Areal Unit Problem and Modifiable Temporal Unit Problem on Crime Hot Spot Stability. Archived with ProQuest Dissertations & Theses.
Q-6: Is the longitude and latitude intentionally distorted?
A: The only distortion to the data is moving the CFS a few feet from the building footprint to the street segment directly in front of the address. If other distortions are noticed, ensure that all point files and shapefiles are in the correct projected coordinate system.
Q-7: Can you define the variable 'census_tract'?
A: Census tract refers to the federal U.S. Census tract in which the CFS occurred.
Q-8: Can different cells have different shapes as long as the areas are the same, or do all interior cells have to have the same shape?
A: All cells should be the same shape. The only exception is those cells that are trimmed due to boundaries (they will start as the same shape, but the trimming of the cell will clearly change its shape).
Q-9: Are cells allowed to overlap in any way?
A: No, they are not allowed to overlap. If they overlapped, it would be impossible to have the cells cover the entire police jurisdiction AND have the total cell area equal the area of the police jurisdiction.
Q-10: The PEI has the following in its definition: "Where n* equals the maximum obtainable n for the amount of area forecasted, a." So what grid/schema are the scorers using to determine the maximum for a forecasted area? The calls themselves are points and have no "area." Are you going to use irregular polygons within the competition rules to determine this (62,500 ft2 – 360,000 ft2)?
A: n* is based off of the grid/schema that is used within that forecast. We will overlay the actual CFS for that time period over the cells/shapefile submitted for that forecast and see which cells (for that amount of forecasted area) provide the greatest number of CFS.
Q-11: The airport is not included in the shapefile for Portland Police District, yet it receives many calls for service. Are we to assume that this analysis should be limited to only area contained within a Portland Police District?
A: Analysis should be limited to those areas included in the PPB shapefile provided. Some provided CFS may fall outside of that region; however, cell schemas should only cover the area of the PPB jurisdiction.
Q-12: For purposes of this Challenge, what is the definition of a “resident” of the 50 United States, the District of Columbia, Puerto Rico, the U.S. Virgin Islands, Guam, the Northern Mariana Islands, and American Samoa?
A: A resident is defined as a citizen or resident alien of the United States. A resident alien, for purposes of this Challenge, is someone who meets either the green card test or the substantial presence test as defined by the Internal Revenue Service (IRS). See "Alien Residency Examples" from the IRS to help determine if you qualify as a resident.
Q-13: If I am enrolled in 12 or more credit hours' worth of courses but at multiple universities, would I be considered a full-time student for purposes of the Challenge?
A: You would not be eligible for the student category. The Challenge defines the student category as either "enrolled as a full-time student in high school or as a full-time, degree-seeking student in an undergraduate program (associate or bachelor's degree)."
Q-14: Is it possible to provide a feature for the hour or the time of day that the calls came in?
A: No. Hour or time of day for calls for service is not available from Portland Police Bureau data.
Q-15: Can you provide an example of how the PAI and PEI* scores will be calculated?
A: See Example Calculation of Prediction Accuracy Index and Prediction Efficiency Index (pdf, 2 pages).
Q-16: I am part of a team from a University. Would any prize money be disbursed to the individuals listed on the team roster or would it go to the university?
A: How the money is disbursed depends on which contestant type you submit under.
  • If you submit as a Small Team, any prize money will go to the individuals listed on the team roster that you are required to submit.
  • If the university submits as a Large Business contestant (assuming your university has more than 20 employees), any prize money would be awarded to the university to disburse.
Q-17: Are educational institutions and public safety or other public agencies eligible to compete?
A: Yes. The institution or agency would submit as either a small or large business type depending on the number of employees.
Q-18: I work for a public safety agency that has more than 2,000 employees. Would I and a fellow co-worker be eligible to compete in this competition? Would our team be considered to be part of the “small business” category or the “large business” category?
A: Yes, you are eligible to compete. Your agency should submit under the Large Business contestant type (defined as a business with more than 20 employees) under the name of the agency. Alternatively, you and your co-worker may submit as a Small Team if you are using your own free time to develop your submission.
Q-19: Are all of the individuals who are part of a team, small business, or large business entering the Real-Time Crime Forecasting Challenge required to be U.S. residents?
A: All members of a team entering this Challenge must be U.S. residents. The members of a small business or large business (which includes other legal entities such as educational institutions) need not be U.S. residents, so long as (1) the business itself is the Challenge contestant and (2) the business is legally domiciled in the 50 United States, the District of Columbia, Puerto Rico, the U.S. Virgin Islands, Guam, or American Samoa. Any contest prize awarded to a business will be made to that business, not to any individual members of that business.
Q-20: Hi, can you please provide us with a tool to test the validity of our submissions according to Table 2? It would be great if the tool could validate not only the required variables but also the individual cell area, the total forecasted area, and the total area of all cells. Thank you!
A: We cannot provide such a tool. It is the responsibility of the contestant to ensure that the submission meets the requirements.
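Contestants can still check the numeric constraints themselves. The following is a minimal sketch of such a self-check, not an NIJ tool: it assumes the cell areas (in square feet) and hotspot flags have already been read out of the shapefile into plain dictionaries, and the function name and the `trimmed_ids` argument are hypothetical. The thresholds are the ones quoted on this page (62,500 to 360,000 sq. ft. per untrimmed cell, 0.25 to 0.75 sq. mi. forecasted, and 147.71 sq. mi. +/- 0.02 sq. mi. in total).

```python
SQFT_PER_SQMI = 5280 ** 2  # 27,878,400 square feet in one square mile


def check_submission(cells, trimmed_ids=()):
    """Return a list of problems found; an empty list means no problem found.

    cells: list of dicts with keys 'id', 'area_sqft', and 'hotspot' (0 or 1).
    trimmed_ids: ids of boundary cells that are exempt from the per-cell range.
    """
    problems = []
    trimmed = set(trimmed_ids)
    for cell in cells:
        if cell["hotspot"] not in (0, 1):
            problems.append(f"cell {cell['id']}: hotspot must be 0 or 1")
        if cell["id"] not in trimmed and not 62_500 <= cell["area_sqft"] <= 360_000:
            problems.append(f"cell {cell['id']}: area outside 62,500-360,000 sq. ft.")

    # Total forecasted area: only cells flagged as hot spots count.
    forecast_sqmi = sum(c["area_sqft"] for c in cells if c["hotspot"] == 1) / SQFT_PER_SQMI
    if not 0.25 <= forecast_sqmi <= 0.75:
        problems.append(f"forecasted area {forecast_sqmi:.4f} sq. mi. outside 0.25-0.75")

    # Total area of all cells should match the jurisdiction area.
    total_sqmi = sum(c["area_sqft"] for c in cells) / SQFT_PER_SQMI
    if abs(total_sqmi - 147.71) > 0.02:
        problems.append(f"total area {total_sqmi:.2f} sq. mi. is not 147.71 +/- 0.02")
    return problems
```

This sketch does not check the projection, the cell geometry, or whether cells overlap; those would need GIS software.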
Q-21: Is it possible to provide a feature for the hour or the time of day that the calls came in?
A: No. Hour or time of day for calls for service is not available from Portland Police Bureau data.
Q-22: Why are hot spots included in the submission criteria but crimes per cell are not?
A: Contestants are forecasting where calls for service are likely to cluster (form hot spots). To test the effectiveness and efficiency of these forecasts, NIJ will overlay each forecast on the actual CFS and count the number of CFS that fall within the cells forecasted to be hot spots.
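The overlay idea can be illustrated with a simplified sketch. It assumes an untrimmed, axis-aligned rectangular grid anchored at a hypothetical origin (x0, y0) and CFS locations given as (x, y) pairs in the same projected units; NIJ's actual overlay works on the submitted shapefile, so this is only an approximation of the concept, and the function names are hypothetical.

```python
import math
from collections import Counter


def count_calls_per_cell(points, x0, y0, cell_w, cell_h):
    """Count points per (col, row) cell of a rectangular grid anchored at (x0, y0)."""
    counts = Counter()
    for x, y in points:
        col = math.floor((x - x0) / cell_w)
        row = math.floor((y - y0) / cell_h)
        counts[(col, row)] += 1
    return counts


def calls_captured(counts, hotspot_cells):
    """Sum the CFS that fall inside the cells flagged as hot spots."""
    return sum(counts[cell] for cell in hotspot_cells)
```

For example, with 500 ft by 500 ft cells, a call at (510, 10) falls in cell (1, 0).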
Q-23: The final submission should be a hot spot map (shapefile) that covers the area included in the PPB shapefile provided. Is this right?
A: The shapefile should consist of cells that cover the entire PPB jurisdiction. You are then to indicate which cells are forecasted to be hotspots/have the highest CFS counts.
Q-24: How many hot spot maps can I submit?
A: You may only submit one map for each combination of category and time frame (e.g., only one submission for street crime 1 week, one for street crime 2 weeks, one for all calls-for-service 1 week, one for all calls-for-service 2 weeks).
Q-25: Can I submit one hot-spots map for the first-week (3/1-3/7, 2017) evaluation, and another hot-spots map for the two-weeks (3/1-3/14, 2017) evaluation?
A: Your one-week, two-week, one-month, two-month, and three-month forecasts (for any crime type) can be the same, or they can be different. See also question 24 and the answer provided above.
Q-26: Can I use the data of the first-week (3/1-3/7, 2017) to train my model and then submit another hot spots map for the two-weeks (3/1-3/14, 2017) evaluation after February 22, 2017?
A: All submissions are due by midnight on 02/28/17. All data obtained up to that point (02/28/17) may be used in any/all of the models. Forecasts for different crime types and different time frames are allowed to be different.
Q-27: In the PAI equation, "a equals the forecasted area, and A equals the area of the entire study area". Here the "forecasted area" means the total area of all hotspots in the final submission, and the "area of the entire study area" means the total area included in the PPB shapefile provided, are these right?
A: That is correct. A should equal 147.71 sq. mi. +/- 0.02 sq. mi., and a should be between 0.25 and 0.75 sq. mi.
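As a worked illustration, the PAI can be computed as (n/N) / (a/A), where n is the number of CFS in the forecasted area and N is the number of CFS in the entire study area. The sketch below is not NIJ's scoring program, and the call counts in the usage note are made-up numbers chosen only to show the arithmetic.

```python
def pai(n, N, a, A):
    """Prediction Accuracy Index: hit rate (n/N) divided by area share (a/A).

    n: CFS inside the forecasted area; N: CFS in the whole study area;
    a: forecasted area; A: study area (a and A in the same units).
    """
    return (n / N) / (a / A)
```

With hypothetical counts of n = 50 and N = 1,000, a = 0.5 sq. mi., and A = 147.71 sq. mi., the hit rate is 0.05, the area share is about 0.0034, and the PAI is about 14.77.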
Q-28: In the PEI* equation, does n* include the cells that provide the greatest number of CFS?
A: Yes. See also the response to question 10 above and an Example Calculation of Prediction Accuracy Index and Prediction Efficiency Index (pdf, 2 pages).
Q-29: What software is readily available to work with the various file types used by Portland PD for the challenge data set?
A: The data is provided as a "shapefile," which consists of a collection of files using a common filename prefix stored in the same directory. An internet search for "shapefile" will provide greater detail on the format and on applications that can open shapefiles.
UPDATE: Esri has agreed to provide fully functional evaluation ArcGIS software licenses to Challenge contestants for the duration of the Challenge. If you are interested in using evaluation software from Esri to support your research and development efforts for this Challenge, visit Esri to request a copy.
Q-30: Should the submission include the predicted numbers of CFS per cell?
A: No. Your submission must only indicate which cells are forecasted to have the greatest CFS counts.
Q-31: If I am submitting for the "One Week: March 1-7" category, should the submission simply include the number of CFS for that entire week without differentiating between the days? Or, should I submit a prediction for the CFS in each cell for March 1st, a prediction for the CFS in each cell for March 2nd, a prediction for the CFS in each cell for March 3rd, etc.? i.e. am I predicting for 7 individual days, or for the week as a whole?
A: These are aggregate forecasts. You will not forecast what happens at a day level; you will do a forecast for an entire week, 2 weeks, etc. You do not need to provide your forecasted count; only a variable to indicate if that grid is forecasted to be a hotspot (i.e., have a high CFS count).
Q-32: It is announced in the competition description and the FAQ that all the proposed cells (excluding the border ones) have to have the same shape and area. What do you exactly mean by the "shape"? Must all the cells be congruent in the geometrical sense? Can any cells be rotated and/or reflected with respect to the other ones? Can any cells be more "stretched" than other ones, e.g., can we simultaneously use rectangles with various aspect ratios?
A: The shape of the cells must be congruent. They may not be stretched, rotated, or reflected. The only difference between cells is a translation (horizontal/vertical) equivalent to the corresponding dimension of the cell.
AMENDMENT: Triangular cells may be rotated (posted 12/21/2016)
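For rectangular cells, the translation rule above can be sketched as follows. This is an illustration under stated assumptions, not an NIJ-endorsed grid: the function name is hypothetical, the grid is axis-aligned, and trimming the cells to the jurisdiction boundary is left out.

```python
import math


def rectangular_grid(x_min, y_min, x_max, y_max, cell_w, cell_h):
    """Return cells as (x, y, width, height) tuples covering a bounding box.

    Every cell is the base cell translated by a whole multiple of its own
    width (horizontally) and height (vertically), so all cells are congruent
    and none is stretched, rotated, or reflected.
    """
    n_cols = math.ceil((x_max - x_min) / cell_w)
    n_rows = math.ceil((y_max - y_min) / cell_h)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            cells.append((x_min + col * cell_w, y_min + row * cell_h, cell_w, cell_h))
    return cells
```

Cells produced this way would still need to be clipped to the PPB boundary in GIS software before submission.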
Q-33: Can NIJ validate the grids I created before my final submission?
A: No. NIJ is unable to prevalidate any submissions.
Q-34: (Follow-up to Q-32)​ If I rotate the regular rectangular grid then formally it will not satisfy the above condition since neither horizontal nor vertical translation will be equivalent to the corresponding dimension of cell. Therefore such a rotated grid is not allowed. Am I right?
A: Yes, that would not be allowed.
Q-35: (Follow-up to Q-32) If I shift down a bit one column in the regular rectangular grid then formally it will not meet the above condition since the horizontal translation will be equivalent to the corresponding dimension of cell, but the vertical one will not. Thus such a grid with a shifted column is not allowed. Correct?
A: Correct, that would not be allowed.
Q-36: In table 2, can you clarify what the "total forecasted area = .25-.75 sq. miles" means, and how does it relate to the footnote "the total area of all cells equals 147.71 square miles"? I take it to mean that all of the police districts areas are equal to 147.71 square miles. However, we are not supposed to forecast all of that, we are only supposed to forecast .25-.75 sq. miles. If this is correct, which .25-.75 miles are supposed to be forecasted? I also thought that all of the cells of the 147.71 square miles have a binary hot spot attribute which is the forecast. If that's the case, then we are forecasting 147.71 sq. miles not .25-.75 sq. miles as specified. If this is an incorrect understanding, how is it to be understood?
A: The total area of all cells equals 147.71 square miles, which is the total area of all cells that you should submit in your shapefile. The total forecasted area (.25-.75 sq. miles) is the total area of all cells assigned a hotspot variable of "1." In other words, you need to forecast for a total area of 147.71 square miles but only assign "1" to cells whose areas sum to a value between .25 and .75 sq. miles. The remaining cells must have a value of "0" for the hotspot variable.
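One way to respect the area limit when assigning the hotspot variable is to flag the highest-ranked cells until the next one would push the total past 0.75 sq. mi. The sketch below is only an illustration: the 'score' field stands in for whatever forecast value a contestant's own model produces, and the function name is hypothetical.

```python
SQFT_PER_SQMI = 5280 ** 2  # square feet in one square mile


def flag_hotspots(cells, max_sqmi=0.75):
    """Set 'hotspot' to 1 on the highest-scored cells within the area limit.

    cells: list of dicts with 'area_sqft' and 'score' (a hypothetical model
    output). All other cells get 'hotspot' = 0. Returns the flagged area in
    square miles so the caller can confirm it is at least 0.25.
    """
    budget_sqft = max_sqmi * SQFT_PER_SQMI
    used_sqft = 0.0
    for cell in cells:
        cell["hotspot"] = 0
    for cell in sorted(cells, key=lambda c: c["score"], reverse=True):
        if used_sqft + cell["area_sqft"] > budget_sqft:
            break  # stop at the first cell that would exceed the limit
        cell["hotspot"] = 1
        used_sqft += cell["area_sqft"]
    return used_sqft / SQFT_PER_SQMI
```

With 250,000 sq. ft. cells, for instance, at most 83 cells fit under the 0.75 sq. mi. limit.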
Q-37: Is the area of all cells (147.71 square miles) supposed to be the same as the sum of the areas of the Portland police districts?
A: Yes, that should equal both the sum of the areas of the Portland police districts and the sum of the areas of the cells you use to grid out the districts.
Q-38: Can students work in a team or are student submissions made on an individual basis?
A: The student category is open only to individuals. A student team should enter as a Small Team/Business. If your team represents an educational institution, the institution can enter as a Large Business contestant (assuming it has more than 20 employees).
Q-39: Are you sure that the sample data set is correct and adequate for machine learning purposes?
A: We cannot comment on the correctness or adequacy of the crime data provided by the prospective applicant’s state. The data set provided is correct and adequate for the purposes of this Challenge. The data presented are the calls for service received by the Portland Police Bureau as described in the Challenge. Contestants' submissions will be judged against the calls for service that the Portland Police Bureau actually receives during the relevant period(s) specified in the Challenge.
Q-40: As I understand, I am allowed to submit a grid consisting of very stretched rectangles, for example with height and width equal to 10 and 10000, respectively. It is written in the rules that "The Director of NIJ or their designee will make the final award determination. If the Director of NIJ or their designee determines that no entry is deserving of an award, no prizes will be awarded." Do submitted grids have to be reasonable sized to qualify for an award? If so, can NIJ provide guidance on the minimum size of the grids?
A: We have added the requirement that the minimum cell height or width is 125 feet (that is, the smaller dimension of the cell must be at least 125 feet). Since the area of each cell must be at least 62,500 sq. ft., two example shapes with minimum size would be:
  1. A rectangular cell with the minimum cell height or width of 125ft would have to have a corresponding width or height of at least 500ft.
  2. An equilateral triangle cell would have sides of 416.35 feet (the height of the triangle would be 357.64).
    Note: We have revised an earlier requirement that prohibited the rotation of cells; rotation is now allowed for triangular cells only.
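The rectangular case above can be checked with a few lines of arithmetic. This is a minimal sketch using only the limits quoted on this page (125 ft minimum dimension, 62,500 to 360,000 sq. ft. cell area); the function names are hypothetical.

```python
MIN_AREA_SQFT = 62_500
MAX_AREA_SQFT = 360_000
MIN_SIDE_FT = 125


def rectangle_ok(width_ft, height_ft):
    """True if an untrimmed rectangular cell meets the size requirements."""
    area = width_ft * height_ft
    return (min(width_ft, height_ft) >= MIN_SIDE_FT
            and MIN_AREA_SQFT <= area <= MAX_AREA_SQFT)


def min_long_side(short_side_ft):
    """Smallest other dimension that still reaches the minimum cell area."""
    return MIN_AREA_SQFT / short_side_ft
```

The 10 ft by 10,000 ft rectangle from the question has an area in range (100,000 sq. ft.) but fails the 125 ft minimum dimension, while a 125 ft by 500 ft rectangle meets both limits exactly.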
Q-41: What should a submission include and how should the files be structured?
A: Forecasts must be submitted as a .zip file containing up to four folders (one for each crime type). Each of those four folders may contain up to five folders (one for each time period). Each of those time-period folders must contain a shapefile, which includes all of the files for that forecast. The file structure should look like:
  • Zip file
    • Folder for each crime type
      • Folder for each timeframe
        • Shapefile
View a sample submission prepared by NIJ (zip, 19.2 MB).
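The layout above can be sanity-checked before uploading. The following is a sketch under stated assumptions, not an NIJ validator: it assumes the crime-type folders sit at the top level of the zip (the sample submission may differ), and it only looks for the three mandatory shapefile components (.shp, .shx, .dbf). The function name is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

SHAPEFILE_PARTS = {".shp", ".shx", ".dbf"}  # mandatory shapefile components


def check_zip_layout(zip_bytes):
    """Return a list of layout problems for crime_type/timeframe/file paths."""
    problems = []
    folders = {}  # (crime_type, timeframe) -> set of file suffixes found
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            parts = PurePosixPath(name).parts
            if len(parts) != 3:
                problems.append(f"unexpected path depth: {name}")
                continue
            crime, timeframe, filename = parts
            suffix = PurePosixPath(filename).suffix.lower()
            folders.setdefault((crime, timeframe), set()).add(suffix)

    crime_types = {crime for crime, _ in folders}
    if len(crime_types) > 4:
        problems.append("more than four crime-type folders")
    for crime in sorted(crime_types):
        if sum(1 for c, _ in folders if c == crime) > 5:
            problems.append(f"{crime}: more than five timeframe folders")
    for (crime, timeframe), suffixes in sorted(folders.items()):
        missing = SHAPEFILE_PARTS - suffixes
        if missing:
            problems.append(f"{crime}/{timeframe}: missing {sorted(missing)}")
    return problems
```

A projection file (.prj) is optional in the shapefile format, but including one makes the coordinate system explicit.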
Q-42: Does the relevant area that we must consider extend beyond Multnomah County?
A: The submissions will be judged against how well they forecast actual calls for service in Portland for the crime categories and time periods specified in the Challenge. A few minor parts of the Portland Police Bureau's (PPB's) jurisdiction fall outside of Multnomah County, but within the Multnomah County census tract 41051000100. Also, the PPB does not have jurisdiction in all of the county, or of the entirety of the census tract.

Contestants are free to consider factors outside Multnomah County if they believe that those factors may increase the accuracy of their submissions. It is up to the contestant to best decide how to join any additional data they would like to use.
Q-43: It appears that there are 20 possible categories and that you will evaluate the same submission with respect to both PAI and PEI. Analysis of the example submission shows that, in every fixed category, good PAI and PEI values are achieved for completely different grids. Do we have to decide if we want to win the PAI or PEI category before the submission? May we submit 40 grids, one calibrated to PAI and one to PEI for each category?
A: The question makes the assumption that a model is 100 percent effective or 100 percent efficient, thus creating a situation where they are inversely related to one another. However, we expect that the current state of crime forecasting is neither 100 percent effective nor 100 percent efficient. Additionally, by constraining grid cell size we have also limited the amount of tradeoff that is possible.

We feel comfortable only allowing one submission per combination of crime type and time period.
Q-44: For teams, does each individual get a submission, and the team wins if any of the submissions is the best for a given prize? This would seem to dramatically favor larger teams, since they simply get more entries.
A: No, each individual in a team does not get a submission. The submission is from the team.
Q-45: Is it possible to create a grid using triangles that satisfies the condition "the only difference between cells is a translation (horizontal/vertical) equivalent to the corresponding dimension of the cell" with rotations added? Even for the equilateral triangles, the translation is equal to the half of the respective dimension.
A: Yes, it is, as demonstrated in Image 1 below. Additionally, this could be adjusted by moving every other row by ½ the length of the base such that vertices meet up in pairs and sets of 4 instead of sets of 3.

Image 1

A grid of equilateral triangles with the top vertex of each triangle centered on the bottom side of the triangle above.
Q-46: Would a grid of equilateral triangles in which only vertices of three triangles at a point be allowed?
A: Image 1 shows three vertices in one point. While it is possible to align triangles such that six vertices align at one point, it is unclear how such a result could be replicated over the entire study area such that all of the jurisdiction is covered with no overlap or "ungridded" areas.
Q-47: Given all of the requirements, would only the following two types of grids be allowed? 1. A rectangular grid with 4 vertices at a point and sides parallel/perpendicular to the North-South axis. 2. Equilateral triangular grid with 6 vertices at a point and bases parallel to the East-West axis.
A: No, those are not the only two grid types allowed. Any parallelogram that meets the minimum width/height and size requirements would be allowed, in addition to the two grid types you have listed. The Challenge does not stipulate that cells be oriented along the North-South or East-West axes. In response to an earlier question, we did stipulate that cells "may not be stretched, rotated, or reflected," but that only applies to the cells in relation to one another. For example, a grid of rectangles each rotated off the North-South axis would be allowable as long as each cell is rotated to the same degree.

In addition, a grid of triangular cells (which may be rotated in relation to one another) also would not have to align along the North-South or East-West axes.
Q-48: Why is the only deliverable the shapefile? What about the methodology?
A: Through this Challenge NIJ seeks to gain a better understanding of the potential for crime forecasting in America. The Challenge offers the opportunity for a comprehensive comparative analysis between current "off-the-shelf" forecasting products and innovative forecasting methods to inform NIJ's research investments in this area. Requiring contestants to submit and disclose their intellectual property might discourage some contestants from submitting entries. NIJ may, in fact, contact selected contestants at a later date concerning their methodology.
Q-49: Why could an applicant not simply use Esri's hot-spot analysis tool and call it a day?
A: Contestants may not simply submit an entry that uses Esri’s analysis tool because by entering the Challenge, each contestant warrants that (a) he or she is the author and/or authorized owner of the entry; (b) that the entry is wholly original with the contestant (or is an improved version of an existing solution that the contestant is legally authorized to enter in the Challenge); (c) that the submitted entry does not infringe any copyright, patent, or any other rights of any third party; and (d) that the contestant has the legal authority to assign and transfer to NIJ all necessary rights and interest (past, present, and future) under copyright and other intellectual property law, for all material included in the Challenge proposal that may be held by the contestant and/or the legal holder of those rights. Each contestant agrees to hold the Released Parties harmless for any infringement of copyright, trademark, patent, and/or other real or intellectual property right, which may be caused, directly or indirectly, in whole or in part, from contestant’s participation in the Challenge.

However, if you are using a unique machine learning algorithm that you or your team or business has the intellectual property rights to, that is acceptable.
Q-51: Are we "using" machine learning algorithms to predict crimes or "developing/creating" algorithms?
A: If you hold the intellectual property rights to an existing algorithm, you may use that algorithm. Otherwise, you would have to first develop and then use your own algorithm.
Q-52: Since not all forecast cells have the same area (thinking particularly about trimmed border cells), then following the definition in the original paper, PEI is not equal to n/n*. Using the original formulation, PEI is equal to: [ (n/N) / (a/A) ] / [ (n*/N) / (a*/A) ], where a* is the area of the post-hoc, "ground truth" hotspots. Could you clarify whether we will be judged according to the (n/n*) definition or according to the definition involving a*?
A: Submissions will be judged by PEI* according to the (n/n*) definition as stated in the Challenge. The PEI* is based on measuring n*, where n* is the maximum number of CFS that could have been forecast for that amount of area. The automated program that calculates PEI* does the following. It first calculates the "density" of CFS per cell (including trimmed cells) by dividing the number of CFS by the cell's area. It then sorts the cells in descending order by density. It then takes cells in that order, summing the area and the CFS, until the summed area equals a or until adding the next cell would make the summed area exceed a. In the latter case, it calculates what fraction of that cell can be used (and the corresponding fraction of the CFS in that cell to count). This makes the minor assumption that the CFS in that final cell are evenly distributed. The resulting error is likely to be extremely small relative to the measure as a whole; however, this is the best approximation that still allows for trimming.
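The procedure described in this answer can be sketched as follows. This is a reading of the description above, not NIJ's actual program: the function names are hypothetical, and each cell is reduced to an (area, CFS count) pair.

```python
def max_obtainable_calls(cells, forecast_area):
    """Estimate n*: the most CFS obtainable within forecast_area.

    cells: list of (area, cfs_count) pairs, trimmed cells included, with
    area in the same units as forecast_area and greater than zero.
    """
    # Rank cells by CFS density (count per unit area), densest first.
    ranked = sorted(cells, key=lambda c: c[1] / c[0], reverse=True)
    area_used = 0.0
    n_star = 0.0
    for area, count in ranked:
        remaining = forecast_area - area_used
        if remaining <= 0:
            break
        if area <= remaining:
            area_used += area
            n_star += count
        else:
            # Only part of this cell fits; assume its CFS are evenly spread.
            n_star += count * (remaining / area)
            break
    return n_star


def pei_star(n, n_star):
    """PEI* = n / n*, the share of the obtainable CFS actually forecast."""
    return n / n_star
```

For example, with cells of (area, CFS) = (100, 10), (100, 30), (50, 10), and (100, 5) and a forecasted area of 200, the two densest cells contribute 40 CFS over 150 units of area, half of the next cell contributes 5 more, and n* is 45.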
Q-52: I am a student working with a few other students on the Real-Time Crime Forecasting Challenge. Is it acceptable for each of us to submit independent submissions to the competition if we have worked together on the problem?
A: If you have worked together with other students who want to submit individual entries, there will need to be a decision on who has the intellectual property rights behind the submission(s). If it is determined that the students collectively have the intellectual property rights, then the entry should be submitted as a Small Team. Please refer to the "Intellectual Property" section of the Challenge.
Q-53: Table 1 Crime Category Definitions has several discrepancies from the actual data posted. Specifically, the following Codes are listed as OTHER in the data, but as one of the three named categories in Table 1: BURG, VEHST, ASSLT, DIST, ROB, SHOOT, STAB. For purposes of prediction, should those codes be included in OTHER (as shown in the data) or in one of the three named categories (as indicated by Table 1)?
A: Those codes should be included in the “Other” category. Table 1 has been updated to reflect this change. All codes that include “Cold” in the translation were originally to be included in the corresponding crime type category. However, after discussions with the Portland Police Bureau, NIJ determined that it was not advantageous to include those codes in forecasts of future crimes. Those codes have been re-categorized within the shapefile as “Other.” Applicants may use the cold case locations to aid in their forecasts, but “Cold” calls will not be included in the scoring calculations for the individual categories.
Q-54: The database has several categories that do not appear in Table 1. Namely, the database includes THRETP and THRETW under Street Crimes, yet these are omitted from Table 1. For purposes of prediction, should those codes be included or not? 
A: Yes, THRETP and THRETW calls-for-service should be included in the “Street Crimes” category. Those codes were omitted in error and have been added to Table 1.
Q-55: Is a hexagon an allowable cell shape?
A: As long as the cell shape and resulting grid meet the minimum requirements, that shape is allowed.
Q-56: Do the small partial cells along the trim of the grid need to comply with the minimum cell height requirements?
A: No, the trimmed cells do not need to comport with the minimum cell requirements.
Q-57: In the data, what does the "cold" call mean?
A: A "cold call" is when an officer called in to report that they were working on an old report, for example checking whether a vehicle was stolen or re-interviewing someone about an old case.
Date Modified: July 31, 2017