A couple of weeks ago, David wrote a piece on Poverty Insights criticizing the Los Angeles Homeless Services Authority (LAHSA) for failing to report margins of error around its estimate of the size of the Los Angeles homeless population.
Most likely, as David argued, LAHSA failed to report the statistical error bounds around its estimate because doing so would have revealed that LAHSA could not statistically determine whether homelessness had increased or decreased since its last estimate, collected in 2009.
LAHSA felt pressure to show that its efforts led to a decrease in homelessness and, therefore, reported a point estimate that did exactly that. In so doing, the organization robbed homeless services providers of valuable data that can be used to provide better programming.
The reality is that reporting 100% accurate statistics on social phenomena is extremely difficult and is only advisable if the planning benefits outweigh the data collection costs. But quantifying our uncertainty can be useful.
In his book, “Identification for Prediction and Decision”, Charles Manski gives a relevant example. He refers to a study that randomly sampled members of the shelter population in Minneapolis one winter.
The study’s objective was to learn the probability that a randomly selected member of the homeless population would have secured housing six months later. Unfortunately, when it came time to do follow-up interviews, the researchers could only locate 64 of the 106 men originally sampled.
The researchers had no idea whether the 42 men who could not be located had exited homelessness or not. Given the situation, what could we say about the probability of exiting homelessness?
Just like the exact number of homeless persons living in Los Angeles County in 2011, we can’t say what that probability is for sure. But we do know that the probability of exiting homelessness would be the lowest possible if none of the 42 men secured housing.
And the probability would be highest if all 42 men exited homelessness. Combined with the knowledge that 21 of the 64 men who were located did leave homelessness, we know with certainty that the true probability of a man in this sample exiting homelessness must lie between 20% (21 of 106) and 59% (63 of 106).
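The arithmetic behind those bounds can be sketched in a few lines of Python. This is only an illustration of the worst-case reasoning described above, not code from Manski's book or the original study; the function name `worst_case_bounds` and its parameter names are made up for this example.

```python
def worst_case_bounds(n_sampled, n_located, n_located_exited):
    """Bound the share of the sample that exited homelessness.

    Lower bound: assume none of the unlocated men secured housing.
    Upper bound: assume all of the unlocated men secured housing.
    """
    n_missing = n_sampled - n_located
    lower = n_located_exited / n_sampled
    upper = (n_located_exited + n_missing) / n_sampled
    return lower, upper


# Figures from the Minneapolis study: 106 men sampled, 64 located
# at follow-up, 21 of those 64 had exited homelessness.
lower, upper = worst_case_bounds(106, 64, 21)
print(f"{lower:.0%} to {upper:.0%}")  # prints "20% to 59%"
```

Note that these bounds make no assumption at all about the 42 missing men, which is why they are wide and also why they cannot be wrong for this sample.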
That’s a pretty big range. Is it useful?
It can be. Subject to sampling error and the time frame of the study, a homeless services agency operating in this area can expect that, under the current service regime, at least 20% of its clients will have exited homelessness in six months. The most that a new initiative can achieve is to ensure that the other 80% secure housing. Bearing this in mind, the agency can weigh the maximum potential impact of the new intervention against its cost.
If the agency decides that it needs a better estimate of the probability of securing housing in six months, it can commission a new study that works harder to locate sample members six months after the initial survey. But if the original study had reported only a point estimate between 20% and 59% (say 40%), as LAHSA did, our agency would not know that it needed a new study.
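To see why better follow-up helps, note that the gap between the lowest and highest possible probability equals the share of the sample that could not be located. The short Python sketch below illustrates this; the function name `bound_width` is invented for this example, and the second scenario (locating 96 of the 106 men) is purely hypothetical, not a result from any study.

```python
def bound_width(n_sampled, n_located):
    """Width of the worst-case bounds: the share of the sample not located."""
    return (n_sampled - n_located) / n_sampled


# Original study: 64 of 106 men located at follow-up.
original = bound_width(106, 64)

# Hypothetical follow-up effort that locates 96 of the 106 men.
# These numbers are an assumption chosen only for illustration.
hypothetical = bound_width(106, 96)

print(f"original width: {original:.0%}")          # prints "original width: 40%"
print(f"hypothetical width: {hypothetical:.0%}")  # prints "hypothetical width: 9%"
```

Every additional man located narrows the range by the same amount, so the agency can judge in advance how much follow-up effort a given level of precision would require.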
This example shows that even simple studies yielding very imprecise estimates can provide useful content for programming and decision making. In order to unleash that content, we have to take statistics out of the realm of propaganda, where the LAHSA study is firmly ensconced, and into the program planning discussion.
Michael Gechter is the co-founder of Idealistics Inc. At Idealistics, Michael focuses on developing quantitative and computational methods to increase outcomes for social sector organizations.