Identification: What is it?
Identification and identifiable are two commonly used words in Statistics and Economics. But, have you ever wondered, what does it really mean to say that a quantity is identifiable from the data? Statisticians seem to agree on a definition in the context of parametric models — calling a parameter identifiable if distinct values of the parameter correspond to distinct members of a parametric family. Intuitively, this means that if we had an infinite amount of data, we could learn the actual model parameters used to generate the data — it’s kinda neat!
So this definition makes sense for a parametric model, but what about nonparametric models? And what if you don’t have an explicit model that you want to use? For instance, in the context of an experiment or an observational study, we might want to ask whether the average treatment effect of an intervention is identifiable from the data. Unfortunately, the simple definition of parametric identifiability does not apply if we take a nonparametric or randomization-based approach.
This post will build on the parametric intuition to describe a more general notion of identifiability that works for all types of settings. Our goal is to explain the definition and provide examples of how to apply it in a range of different contexts. The next few posts in the series will dive deeper into the role of identification in the causal context.
A general notion of identifiability
Basic framework
The identification framework we will consider consists of three elements:
- A statistical universe $\Theta$ that contains all of the objects relevant to a given problem.
- An estimand mapping $g$ that describes what aspect of the statistical universe we are trying to learn about.
- An observation mapping $h$ that tells us what parts of our statistical universe we observe.
We then define identification by studying the inherent relationship between the estimand mapping and the observation mapping using the induced binary relation (defined below). Intuitively, the induced binary relation connects “what we know” to “what we are trying to learn” through the “statistical universe” in which we operate.
Binary relation interlude: If you are not familiar with binary relations, or it’s been a while since you’ve seen them, they are basically a generalization of a function. Mathematically, a binary relation $R$ from a set $A$ to a set $B$ is a subset of the cartesian product $A \times B$. For $a \in A$ and $b \in B$, we say that $a$ is $R$-related to $b$ if $(a, b) \in R$ and write $a R b$.
For example, let $A$ be the set of prime numbers, $B$ be the set of integers, and $R$ the “divides” relation such that $a R b$ if $a$ divides $b$ (e.g., $2 R 6$ and $3 R 9$, but 3 is not in relation with 2).
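To make the interlude concrete, here is a tiny Python sketch of the “divides” relation as an explicit set of pairs (truncating $B$ to a finite range is our own choice, purely so the demo runs):

```python
# The "divides" relation R from primes A to integers B, stored as a set of pairs.
A = [2, 3, 5, 7]
B = range(1, 20)  # truncated to a finite set so the example is computable

R = {(a, b) for a in A for b in B if b % a == 0}

print((3, 9) in R)  # True: 3 divides 9, so 3 is R-related to 9
print((3, 2) in R)  # False: 3 does not divide 2
```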
Definition of Identification
Specifying the three elements of the framework for a given problem gives rise to a special binary relation.
- Definition (Induced binary relation)
- Consider a set $\Theta$ and two functions $g, h \in \mathcal{F}(\Theta)$, where $\mathcal{F}(\Theta)$ is the set of all functions with domain $\Theta$, and let $G = g(\Theta)$ and $H = h(\Theta)$. We define the binary relation induced by $(g, h)$ as the subset $R_{g,h} = \{(g(\theta), h(\theta)) : \theta \in \Theta\} \subseteq G \times H$.
With this in place, we say that $g$, the estimand mapping, is identifiable from $h$, the observation mapping, if the induced binary relation $R_{g,h}$ is injective; that is, each element of $H$ is related to at most one element of $G$. In words, if there is a 1-1 relationship between what we are trying to estimate and what we observe, then the estimand mapping is identifiable from the observation mapping. This can be formalized mathematically as follows:
- Definition (Identifiability)
- Consider $\Theta$ and $g, h \in \mathcal{F}(\Theta)$ as in the previous definition. For a given $\theta_0 \in \Theta$, let $s_0 = h(\theta_0)$. The function $g$ is said to be $h$-identifiable at $\theta_0$ if there exists $b_0 \in G$ such that, for all $\theta \in h^{-1}(s_0)$, we have that $g(\theta) = b_0$.
The function $g$ is said to be identifiable everywhere from $h$ if it is $h$-identifiable at $\theta_0$ for all $\theta_0 \in \Theta$. We will usually simply say that $g$ is identifiable.
This definition might seem really abstract, but it’s exactly the intuitive definition we gave in the introduction. It says that if the part of the statistical universe that is coherent with the observed data uniquely corresponds to a single estimand of interest, i.e., $g(h^{-1}(s_0)) = \{b_0\}$, then that estimand is identifiable!
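For a finite statistical universe, the definition translates directly into code. Here is a minimal Python sketch (the helper `is_identifiable` and the toy universe are our own constructions) that checks identifiability everywhere by brute force:

```python
from itertools import combinations

def is_identifiable(universe, g, h):
    """Brute-force check of h-identifiability of g everywhere on a finite
    universe: any two elements with the same h-value (observationally
    equivalent) must also share the same g-value (the estimand)."""
    return all(g(t1) == g(t2)
               for t1, t2 in combinations(universe, 2)
               if h(t1) == h(t2))

# Toy universe: pairs (a, b); suppose we only get to observe the sum a + b.
universe = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_identifiable(universe, g=lambda t: t[0], h=lambda t: sum(t)))    # False
print(is_identifiable(universe, g=lambda t: sum(t), h=lambda t: sum(t)))  # True
```

The first check fails because $(0, 1)$ and $(1, 0)$ are observationally equivalent under $h$ yet map to different estimands under $g$.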
Examples
Below we describe how the above definition applies to parametric and nonparametric models as well as finite population settings.
Parametric models
Consider a parametric model $\mathcal{P} = \{P_\lambda : \lambda \in \Lambda\}$, where $P_\lambda$ is a distribution indexed by $\lambda \in \Lambda$.
The statistical universe is then $\Theta = \{(\lambda, P_\lambda) : \lambda \in \Lambda\}$. Notice that it contains both the model and the parameters that determine the model.
The estimand mapping is $g(\lambda, P_\lambda) = \lambda$. This defines the quantity of interest: in the simplest case, it is just the parameters of the model, but it can also be any function or subset of the parameters, i.e., $g(\lambda, P_\lambda) = f(\lambda)$ for some function $f$.
The observation mapping is $h(\lambda, P_\lambda) = P_\lambda$. This is simply the model for a given parameter.
Remember, identification is about understanding the limits of our estimation with an infinite amount of data, and so the observations correspond exactly to the full distribution. If $g(\lambda, P_\lambda) = \lambda$, the induced binary relation reduces to a simple function, $\lambda \mapsto P_\lambda$, and the abstract definition we gave above is exactly equivalent to the textbook definition of identification for parametric statistical models: distinct parameters must index distinct distributions, i.e., $\lambda_1 \neq \lambda_2 \implies P_{\lambda_1} \neq P_{\lambda_2}$.
- Linear Regression Example
- Consider a $p$-dimensional random vector $X \sim P_X$ for some distribution $P_X$. Let $Y \mid X \sim N(X^\top \beta, \sigma^2)$, where $N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$, and let $\lambda = (\beta, \sigma^2)$.
Example question: Are the regression parameters $\beta$ and $\sigma^2$ identifiable?
Identification setup: We can study the identifiability of the parameter $\lambda = (\beta, \sigma^2)$ from the joint distribution $P^\lambda_{X,Y}$ using our framework, by letting $\Theta = \{(\lambda, P^\lambda_{X,Y}) : \lambda \in \Lambda\}$, where $\Lambda = \mathbb{R}^p \times \mathbb{R}^+$, $g(\lambda, P^\lambda_{X,Y}) = \lambda$, and $h(\lambda, P^\lambda_{X,Y}) = P^\lambda_{X,Y}$. In this case, the induced binary relation reduces to a function, and so $\lambda$ is identifiable if and only if the mapping $\lambda \mapsto P^\lambda_{X,Y}$ is injective. It is easy to verify that this is the case if and only if the matrix $\mathbb{E}[X X^\top]$ has full rank. If, in contrast, we take $g(\lambda, P^\lambda_{X,Y}) = \beta$ only, then the induced binary relation is no longer a function, but our general definition of identifiability still applies.
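To see the full-rank condition at work, here is a small numpy sketch (the collinear design and the particular coefficient values are our own choices): with perfectly collinear covariates, two distinct values of $\beta$ induce exactly the same distribution of $(X, Y)$, so $\beta$ cannot be identified.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Perfectly collinear design: the second covariate duplicates the first,
# so E[X X^T] has rank 1 < p = 2.
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1])

beta_a = np.array([1.0, 1.0])  # two distinct parameter values ...
beta_b = np.array([2.0, 0.0])

# ... with identical conditional means X @ beta, hence (for equal sigma^2)
# identical joint distributions of (X, Y).
print(np.allclose(X @ beta_a, X @ beta_b))  # True
```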
Nonparametric identification
Applying the mathematical definition of identification to nonparametric models is relatively straightforward.
- Missing Data Example
- If $Y$ is a random variable representing a response of interest, let $R$ be a missing data indicator that is equal to $1$ if the response $Y$ is observed, and $0$ otherwise; that is, the data we actually observe is drawn from the joint distribution of $(R, R \cdot Y)$.
Example question: Is the distribution of the missing outcomes $P_{Y \mid R = 0}$ identifiable from that of the observed outcomes $P_{Y \mid R = 1}$ combined with that of the missing data indicator $P_R$?
Identification setup: Let $\Theta$ be a family of joint distributions $P_{Y,R}$ for $Y$ and $R$, and define $g(P_{Y,R}) = P_{Y \mid R = 0}$ and $h(P_{Y,R}) = (P_{Y \mid R = 1}, P_R)$. The question can be answered by studying the injectivity of the induced binary relation. We leave that question unanswered for now, but will revisit it in a subsequent installment of this series.
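Although we defer the formal answer, a tiny discrete example (the probability tables below are invented for illustration) already hints at the difficulty: two joint distributions of $(Y, R)$ can agree on everything we observe while disagreeing on the estimand.

```python
import numpy as np

# Two joint pmfs over (Y, R) with Y, R in {0, 1}, stored as p[y, r].
p_a = np.array([[0.30, 0.25],   # P(Y=0, R=0), P(Y=0, R=1)
                [0.20, 0.25]])  # P(Y=1, R=0), P(Y=1, R=1)
p_b = np.array([[0.10, 0.25],
                [0.40, 0.25]])

for p in (p_a, p_b):
    p_r = p.sum(axis=0)                        # marginal of R
    print("P(Y | R=1):", p[:, 1] / p_r[1],     # observed: identical for both
          "P(Y | R=0):", p[:, 0] / p_r[0])     # estimand: differs
```

Both tables yield $P(R = 1) = 0.5$ and $P(Y = 1 \mid R = 1) = 0.5$, yet $P(Y = 1 \mid R = 0)$ is $0.4$ under the first and $0.8$ under the second.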
Notice that the same definition worked in both nonparametric and parametric models without having to be adjusted or tweaked.
Finite population identification
In the previous two sections, we assumed an infinite amount of data—that is the usual starting point for identification. But what do you do if you plan to take a finite population perspective? Clearly, the assumption that you have an infinite amount of data doesn’t make sense! It turns out that we can again use the mathematical definition of identification by defining the appropriate statistical universe, estimand mapping, and observation mapping.
- Causal Inference Example
- With $n$ units, let each unit $i$ be assigned to one of two treatment interventions, $W_i = 1$ for treatment and $W_i = 0$ for control. Under the stable unit treatment value assumption, each unit has two potential outcomes $Y_i(1)$ and $Y_i(0)$, corresponding to the outcome of unit $i$ under treatment and control, respectively. For each unit $i$, the observed outcome is $Y_i^{obs} = W_i Y_i(1) + (1 - W_i) Y_i(0)$. Let $\mathbf{Y}(1)$ and $\mathbf{Y}(0)$ be the vectors of potential outcomes, and $\mathbf{W} = (W_1, \ldots, W_n)$ the assignment vector.
Example question: Is the average treatment effect $\tau = \frac{1}{n} \sum_{i=1}^{n} \{Y_i(1) - Y_i(0)\}$ identifiable from the observed data $(\mathbf{W}, \mathbf{Y}^{obs})$?
Identification setup: Let $\mathcal{W}$ be the set of all possible values for $\mathbf{W}$, $\mathcal{Y}$ be the set of all possible values for $(\mathbf{Y}(0), \mathbf{Y}(1))$, and $\Theta = \mathcal{Y} \times \mathcal{W}$.
Take $g(\theta) = \tau$ and $h(\theta) = (\mathbf{W}, \mathbf{Y}^{obs})$ as the estimand and observation mappings, respectively. The question is then answerable by studying the injectivity of the induced binary relation.
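As a preview, here is a minimal numpy sketch (the outcome values are invented for illustration) exhibiting two potential-outcome tables that are observationally equivalent under the same assignment but have different average treatment effects:

```python
import numpy as np

W = np.array([1, 0, 1, 0])  # a fixed treatment assignment for n = 4 units

# Two science tables (Y(0), Y(1)) that agree on every *observed* entry
# but disagree on the unobserved ones.
Y0_a, Y1_a = np.array([5, 3, 2, 4]), np.array([6, 7, 3, 8])
Y0_b, Y1_b = np.array([9, 3, 0, 4]), np.array([6, 1, 3, 2])

for Y0, Y1 in ((Y0_a, Y1_a), (Y0_b, Y1_b)):
    Y_obs = W * Y1 + (1 - W) * Y0  # observation mapping h
    tau = (Y1 - Y0).mean()         # estimand mapping g
    print(Y_obs, tau)              # same observed data, different tau
```

Both tables produce the observed data $(6, 3, 3, 4)$, but $\tau = 2.5$ for the first and $\tau = -1$ for the second, so the induced relation is not injective without further assumptions.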
Once again, we will revisit this example in more detail in a future post — our goal here is simply to show how identification questions can be formulated in a unified fashion within a simple framework.
Further reading
We dive deeper into some of the formalism behind identification in a recent paper. A lot of the pioneering work on identifiability was done by the economist Charles Manski: see for instance his 2007 monograph ‘Identification for Prediction and Decision’ for a very accessible introduction. For a recent in-depth survey of the area, see Lewbel (2019), ‘The Identification Zoo’. Finally, as mentioned in the introduction, this is the first installment in a series dedicated to identifiability — stay tuned for the next installments!
References
Lewbel, A. (2019). The identification zoo: Meanings of identification in econometrics. Journal of Economic Literature, 57(4), 835–903.
Manski, C. F. (2007). Identification for Prediction and Decision. Harvard University Press.