Say I tell you that a parade has 25 floats. You watch 10 floats go by, and I ask you what portion of the parade you’ve seen. Well, that’s easy. 40%.
Say you’re at another parade (we love festivities), and you watch 10 floats go by. Again, I ask you what portion of the parade you’ve seen, but this time I don’t tell you how many floats there are in total. Now you can’t get the exact percentage, unless you get lucky. You can, however, get close, depending on what other information you have to infer from.
For example, say you know that the average parade has 100 floats. Then you might guess 10%. If you know that parades with small floats tend to have more floats than parades with large floats, you might be able to adjust your guess. You might even use the fact that the other parade had 25 floats to inform your decision.
You can do even better if you can keep track of some part of the population whose size you’re guessing. Say we go into the woods one year and capture a bunch of deer: fifty, to be exact. We tag each one we capture and come back the next year. This time, of the fifty deer we capture, 40 have tags and 10 are new to this whole process. We now have information about the deer we haven’t seen at all. We’ve likely seen the majority of the deer there are to see; otherwise we’d have caught far fewer tagged deer this time around.
This is actually a fairly surprising fact, on the face of it. We’ve learned not only about the presence of things we haven’t interacted with whatsoever, but we’ve also learned approximately how many of them there are.
Since 80% of the deer we caught this year are ones we’ve seen before, we might assume that our fifty tagged deer make up about 80% of the whole herd, which puts the total at about 50 / 0.8 = 62.5 deer. The assumptions going into this don’t have to be perfect for the model to be useful. For example, what if the deer we tagged are simply slower, and so more likely to be caught again? We’d also expect some amount of deer shrinkage: tagged deer dying, or just moving to other woods. We can adjust for these in ways that seem reasonable, acknowledge them as flaws in what our model captures, or simply represent them as uncertainty in the outcome.
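For the curious, the arithmetic above is the classic Lincoln-Petersen estimator from capture-recapture studies: if the tagged share of the second catch (R out of C) matches the tagged share of the whole population (M out of N), then N is about M × C / R. Here’s a minimal Python sketch. The function names are my own, and Chapman’s commonly used small-sample correction is included only for comparison; it isn’t something the reasoning above depends on.

```python
def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Estimate population size from one capture-recapture round.

    marked:     deer tagged in the first year (M)
    caught:     deer caught in the second year (C)
    recaptured: deer in the second catch that already had tags (R)
    """
    if recaptured == 0:
        raise ValueError("no recaptures: the estimate is unbounded")
    # Assumes the tagged fraction of the second catch (R / C) equals the
    # tagged fraction of the whole population (M / N), so N = M * C / R.
    return marked * caught / recaptured


def chapman(marked: int, caught: int, recaptured: int) -> float:
    """Chapman's bias-corrected variant; defined even when R == 0."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


print(lincoln_petersen(50, 50, 40))    # 62.5
print(round(chapman(50, 50, 40), 2))   # 62.44
```

The two agree closely here; the correction matters more when recaptures are rare.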
With more years of accumulated data, we can eventually get a very good handle on how many deer live in these woods, even though we never see the entire population in any given year. Some lucky deer we may never see at all, although we can infer with high probability that they exist.
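One standard way to pool several years of tagging is the Schnabel estimator, which divides the sum of (deer caught × deer already tagged) over all years by the total number of recaptures. The sketch below is a minimal version that assumes every new deer is tagged and released and that no tagged deer die or leave (so it ignores shrinkage). The first two years match the example above; the third year’s counts are made up purely for illustration.

```python
def schnabel(sessions: list[tuple[int, int]]) -> float:
    """Schnabel multi-year population estimate.

    sessions: (caught, recaptured) for each year, in order. Every untagged
    deer caught is tagged and released, so the tagged pool grows by
    (caught - recaptured) each year. Assumes no tagged deer die or leave.
    """
    tagged_before = 0  # M_t: tagged deer at large before year t
    numerator = 0      # running sum of C_t * M_t
    recaptures = 0     # running sum of R_t
    for caught, recaptured in sessions:
        numerator += caught * tagged_before
        recaptures += recaptured
        tagged_before += caught - recaptured
    if recaptures == 0:
        raise ValueError("no recaptures yet: cannot estimate")
    return numerator / recaptures


# Year 3 is hypothetical: 50 caught, 48 of them already tagged.
print(schnabel([(50, 0), (50, 40), (50, 48)]))  # 62.5
```

With these numbers we would have tagged 62 distinct deer by the end of year three, so the estimate suggests only about half a deer has never been seen.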
Our yearly deer-counting journey continues without a hitch, until one year we discover something bizarre. We find a camodeer, a certain kind of deer which, like a chameleon, blends in with its environment almost perfectly. Even in the many years after finding this camodeer, we never find another one.
This is an example of the kind of object whose presence we can’t statistically infer. If some entire class of objects, for whatever reason, evades observation, there’s no good way for us to guess at its presence ahead of time. Our estimate of 62.5 deer may have completely ignored thousands of camodeer that we simply never got to observe.
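The camodeer problem can be made concrete with a deliberately simplified calculation. Suppose each deer is caught independently each year with a probability that depends on its kind, and plug the *expected* catch counts into the Lincoln-Petersen formula. This ignores sampling noise and is only meant to show the direction of the bias; the group sizes and catch probabilities are hypothetical.

```python
def expected_estimate(groups: list[tuple[int, float]]) -> float:
    """Lincoln-Petersen estimate implied by expected catch counts.

    groups: (size, catch_probability) for each kind of deer. Each deer is
    caught independently with its group's probability in both years.
    """
    # Expected number caught per year; the same in both years, so it
    # serves as both M (tagged in year one) and C (caught in year two).
    caught_per_year = sum(n * p for n, p in groups)
    # Expected recaptures: deer caught in both years.
    recaptured = sum(n * p * p for n, p in groups)
    return caught_per_year * caught_per_year / recaptured


ordinary = (60, 0.8)  # hypothetical: 60 ordinary deer, 80% catch rate
print(round(expected_estimate([ordinary]), 2))                # 60.0
print(round(expected_estimate([ordinary, (1000, 0.0)]), 2))   # 60.0 (true total: 1060)
print(round(expected_estimate([ordinary, (1000, 0.01)]), 2))  # 87.38 (true total: 1060)
```

A thousand camodeer that can never be caught leave the estimate exactly where it was, and even a thousand that are merely hard to catch only nudge it from 60 to about 87. A related effect is behind the slower-deer worry above: anything that makes some deer much easier to catch than others tends to push the estimate low.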
Even after observing this camodeer, and proving that such things exist, it’s hard to know how many others there might be. Could this be the only one, a genetic fluke? Or might there be herds of thousands, all expertly hidden from view?
So there are things we can’t see but whose presence we can infer, and, of course, things we can’t see and so can’t infer anything about. What about things we can see, but that are uninformative in some special way?
In much the same way that we know roughly what fraction of a deer population we’ve seen, we know roughly what fraction of the asteroids near us in space we’ve seen. As of this writing, we’ve found just over 900 near-Earth asteroids with diameters larger than 1 km.¹ Because we can track them, we also know that any real risk of an asteroid hitting Earth anytime soon would come from the asteroids we don’t know about. While the space near Earth may seem crowded (thousands of objects!), remember that space is big, trajectories are predictable, and an object can only occupy one position along its trajectory at once.
Let’s imagine not a camo-asteroid, but some kind of asteroid we could detect when it’s near Earth, but not when it’s farther away. Perhaps a lone space traveler that, if it doesn’t hit anything in our solar system, zooms off into deep space, never to be seen again.
Unlike the camodeer, one of these deep-space asteroids is likely to be spotted if it comes close to us. However, spotting one doesn’t necessarily tell us anything about the presence or absence of others, since each pass is a one-time event: the asteroid never comes back to be seen a second time, so there’s no equivalent of a tagged deer turning up again.
This (very hypothetical) setup means that there could be some untracked object hurtling towards us, and if we ever became reasonably confident we “knew all the nearby asteroids”, and turned our telescopes to other activities, we might miss it.
I’ve raised the stakes above mere deer-counting for a reason. How can we be sure we’ve seen all there is to see when we haven’t seen everything? What would make us confident enough to turn off the telescopes (assuming there were some at-least-mildly appealing reason to do so)?
In one fundamental sense, we can never know what we don’t know. There could be a freak accident of nearly any kind that simply never showed up in the data before. However, that answer is pretty unsatisfying. Just as there are reasons to suspect the presence of unseen deer, there can be reasons to suspect the absence of other objects based on what we’ve seen. We can’t say the risk is zero, but we can bound it below some reasonably low threshold. There are ways to count on the absence of objects we can never see.