DENVER (AP) — Just last year, the Denver Police Department reportedly extended its contract with ShotSpotter in a $4.7 million deal covering an additional five years. A new report from The Associated Press reveals how the company’s employees, working behind the scenes, can override its artificial intelligence.
In more than 140 cities across the United States, ShotSpotter’s artificial intelligence algorithm and intricate network of microphones evaluate hundreds of thousands of sounds a year to determine if they are gunfire, generating data now being used in criminal cases nationwide.
But a confidential ShotSpotter document obtained by The Associated Press outlines something the company doesn’t always tout about its “precision policing system” — that human employees can quickly overrule and reverse the algorithm’s determinations, and are given broad discretion to decide if a sound is a gunshot, fireworks, thunder or something else.
Such reversals happen 10% of the time, by a 2021 company account. Experts say that could bring subjectivity into increasingly consequential decisions and conflict with one of the reasons AI is used in law-enforcement tools in the first place: to lessen the role of all-too-fallible humans.
“I’ve listened to a lot of gunshot recordings — and it is not easy to do,” said Robert Maher, a leading national authority on gunshot detection at Montana State University who reviewed the ShotSpotter document. “Sometimes it is obviously a gunshot. Sometimes it is just a ping, ping, ping. … and you can convince yourself it is a gunshot.”
Marked “WARNING: CONFIDENTIAL,” the 19-page operations document spells out how employees in ShotSpotter’s review centers should listen to recordings and assess the algorithm’s finding of likely gunfire based upon a series of factors that may require judgment calls, including whether the sound has the cadence of gunfire, whether the audio pattern looks like “a sideways Christmas tree” and if there is “100% certainty of gunfire in reviewer’s mind.”
ShotSpotter said in a statement to the AP that the human role is a positive check on the algorithm and the “plain-language” document reflects the high standards of accuracy its reviewers must meet.
“Our data, based on the review of millions of incidents, proves that human review adds value, accuracy and consistency to a review process that our customers—and many gunshot victims—depend on,” said Tom Chittum, the company’s vice president of analytics and forensic services.
Chittum added that the company’s expert witnesses have testified in 250 court cases in 22 states, and that its “97% aggregate accuracy rate for real-time detections across all customers” has been verified by an analytics firm the company commissioned.
Another part of the document underscores ShotSpotter’s longstanding emphasis on speed and decisiveness, and its commitment to classify sounds in less than a minute and alert local police and 911 dispatchers so they can send officers to the scene.
Titled “Adopting a New York State of Mind,” it refers to the New York Police Department’s request that ShotSpotter avoid posting alerts of sounds as “probable gunfire” and publish only definitive classifications as gunfire or non-gunfire.
“End result: It trains the reviewer to be decisive and accurate in their classification and attempts to remove a doubtful publication,” the document reads.
Experts say such guidance under tight time pressure could encourage ShotSpotter reviewers to err in favor of categorizing a sound as a gunshot, even if some evidence for it falls short, potentially boosting the number of false positives.
“You’re not giving your humans much time,” said Geoffrey Morrison, a voice-recognition scientist based in Britain who specializes in forensics processes. “And when humans are under great pressure, the possibility of mistakes is higher.”
ShotSpotter says it published 291,726 gunfire alerts to clients in 2021. That same year, in comments to AP appended to a previous story, ShotSpotter said more than 90% of the time its human reviewers agreed with the machine classification but the company invested in its team of reviewers “for the 10% of the time where they disagree with the machine.” ShotSpotter did not respond to questions on whether that ratio still holds true.
ShotSpotter’s operations document, which the company argued in court for more than a year was a trade secret, was recently released from a protective order in a Chicago court case. In that case, police and prosecutors used ShotSpotter data as evidence in charging a Chicago grandfather with murder in 2020 for allegedly shooting a man inside his car. The defendant, Michael Williams, spent nearly a year in jail before a judge dismissed the case because of insufficient evidence.
Evidence in Williams’ pretrial hearings showed ShotSpotter’s algorithm initially classified a noise picked up by microphones as a firecracker, making that determination with 98% confidence. But a ShotSpotter reviewer who assessed the sound quickly relabeled it as a gunshot.
The Cook County Public Defender’s Office says the operations document was the only paperwork ShotSpotter sent in response to multiple subpoenas for any guidelines, manuals or other scientific protocols. The publicly traded company has long resisted calls to open its operations to independent scientific scrutiny.
Fremont, California-based ShotSpotter acknowledged to AP it has other “comprehensive training and operational materials” but deems them “confidential and trade secret.”
ShotSpotter installed its first sensors in Redwood City, California, in 1996, and for years relied solely on local 911 dispatchers and police to review each potential gunshot until adding its own human reviewers in 2011.
Paul Greene, a ShotSpotter employee who testifies frequently about the system, explained in a 2013 evidentiary hearing that staff reviewers addressed issues with a system that “has been known from time to time to give false positives” because “it doesn’t have an ear to listen.”
“Classification is the hardest element of the process,” Greene said in the hearing. “Simply because we do not have … control over the environment in which the shots are fired.”
Greene added that the company likes to hire ex-military and former police officers familiar with firearms, as well as musicians because they “tend to have a more developed ear.” Their training includes listening to hundreds of audio samples of gunfire and even visits to rifle ranges to familiarize themselves with the characteristics of gun blasts.
As cities have weighed the system’s promise against its price tag — which can reach $95,000 per square mile per year — company employees have explained in detail how its acoustic sensors on utility poles and light posts pick up loud pops, booms or bangs and then filter the sounds through an algorithm that automatically classifies whether they’re gunfire or something else.
But until now, little has been known about the next step: how ShotSpotter’s human reviewers in Washington, D.C., and the San Francisco Bay Area decide what is a gunshot versus any other noise, 24 hours a day.
“Listening to the audio downloads are important,” according to the document written by David Valdez, a former police officer and now-retired supervisor of one of ShotSpotter’s review centers. “Sometimes the audio is compelling for gunfire that they may override all other characteristics.”
One part of the decision-making that has changed since the document was written in 2021 is whether reviewers can consider if the algorithm had “high confidence” that the sound was a gunshot. ShotSpotter said the company stopped showing the algorithm’s confidence rating to reviewers in June 2022 “to prioritize other elements that are more highly correlated to accurate human-trained assessment.”
ShotSpotter CEO Ralph Clark has said that the system’s machine classifications are improved by its “real-world feedback loops from humans.”
However, a recent study found humans tend to overestimate their abilities to identify sounds.
The 2022 study, published in the peer-reviewed journal Forensic Science International, looked at how well human listeners identified voices compared to voice-recognition tools. It found all the human listeners performed worse than the voice system alone, and its authors said the findings should lead to the elimination of human listeners in court cases whenever possible.
“Would that be the case with ShotSpotter? Would the ShotSpotter system plus the reviewer outperform the system alone?” asked Morrison, who was one of seven researchers who conducted the study.
“I don’t know. But ShotSpotter should do validation to demonstrate that.”
Burke reported from San Francisco.