Martin Schneider on Multiple Priors Preferences and Financial Markets
Martin Schneider is Professor of Economics at Stanford University. His research interests lie in Financial and Monetary Economics. Schneider’s RePEc/IDEAS entry.
The Ellsberg paradox suggests that people behave differently in risky situations, when they are given objective probabilities, than in ambiguous situations, when they are not told the odds (as is typical in financial markets). Such behavior is inconsistent with subjective expected utility theory (SEU), the standard model of choice under uncertainty in financial economics. The multiple priors model of Gilboa and Schmeidler (1989) can accommodate ambiguity averse behavior in atemporal choice situations. This article reviews recent work that has extended the multiple priors model to accommodate intertemporal choice and learning, as well as work that has applied the model to portfolio choice and asset pricing. A more formal discussion, including a comparison of alternative models of ambiguity aversion, is available in Epstein and Schneider (2010).
The multiple priors model is attractive for finance applications because it allows uncertainty to have first order effects on portfolio choice and asset pricing. As a result, its qualitative predictions are different from those of subjective expected utility theory in a way that helps understand observed asset positions. In particular, in contrast to SEU, the multiple priors model robustly generates selective participation in asset markets, optimal underdiversification, and portfolio inertia. In heterogeneous agent models, portfolio inertia can give rise to endogenous incompleteness of markets and the “freezing up” of markets in response to an increase in uncertainty.
The multiple priors model can also help to quantitatively account for position and price behavior that is puzzling at low levels of risk aversion. This is because multiple priors agents tend to choose more conservative positions, and, in equilibrium, command additional “ambiguity premia” on uncertain assets. When agents learn under ambiguity, the arrival of new data affects their confidence, with first order implications for asset demand. In equilibrium, changes in confidence — for example due to changes in information quality — will change uncertainty premia observed in markets.
The foundations for dynamic applications of the multiple priors model are by now relatively well understood. We also have tools for tractable modeling of intertemporal choice and learning. So far, however, quantitative dynamic applications of the theory have been largely confined to representative agent asset pricing. This is somewhat unfortunate: first order effects of uncertainty are particularly interesting in models that have nontrivial extensive margins, as discussed below. Building quantitative models of portfolio choice and market participation, as well as equilibrium trading and pricing, can thus be a fruitful area of research in the future.
2. The Ellsberg Paradox and Multiple Priors
Ellsberg’s (1961) classic experiments motivate the study of ambiguity. In a variant of one of his experiments, you are told that there are 100 balls in an urn, and that each ball is either red or blue. You are not given further information about the urn’s composition. Presumably you would be indifferent between bets on drawing either color (take the stakes to be 100 and 0). However, compare these bets with the risky prospect that offers you, regardless of the color drawn, a bet on a fair coin, with the same stakes as above. When you bet on the fair coin, or equivalently on drawing blue from a second risky urn where you are told that there are 50 balls of each color, then you can be completely confident that you have a 50-50 chance of winning. In contrast, in the original “ambiguous” urn, there is no basis for such confidence. This difference motivates a strict preference for betting on the risky urn as opposed to the ambiguous one.
Such a preference is incompatible with expected utility. Indeed, suppose you had in mind a subjective probability of a blue draw from the ambiguous urn. A strict preference for betting on the fair coin over a bet on a blue draw would then reveal that your probability of blue is strictly less than one half. At the same time, a preference for betting on the fair coin over a bet on a red draw reveals a probability of blue that is strictly greater than one half, a contradiction. It follows that Ellsberg’s choices cannot be rationalized by SEU.
When information is scarce and a single probability measure cannot be relied on to guide choice, it is intuitive that the decision maker thinks in terms of a set of probability laws. The multiple-priors model assumes that agents act as if they evaluate plans using a worst case probability belief drawn from a given set. For example, a decision maker might assign the interval [(1/3),(2/3)] to the probability of drawing a red ball from the ambiguous urn in the Ellsberg experiment. Being cautious, he might then evaluate a bet on red by using the minimum probability in the interval, here (1/3), which would lead to the strict preference to bet on the risky urn. Similarly for blue. In this way, the intuitive choices pointed to by Ellsberg can be rationalized.
Ellsberg type behavior violates the independence axiom of SEU. To see this, consider a lottery that promises either a bet on a red draw from the ambiguous urn or a bet on a blue draw from the ambiguous urn, each with probability one half. Such a bet is equivalent to a bet on the risky urn (or a fair coin). Ellsberg’s choices thus require strict preference for randomization between indifferent acts, whereas the independence axiom implies indifference between bets on the risky and ambiguous urns. In the Ellsberg choice situation, randomization can be valuable because it can smooth out, or hedge, ambiguity that is present in the bets on red or blue from the ambiguous urn.
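The worst-case evaluation and the hedging value of randomization described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a decision maker with linear utility over the stakes and the belief interval [1/3, 2/3] from the example; the function name `maxmin_value` and the variable names are illustrative and do not come from the literature.

```python
from fractions import Fraction

def maxmin_value(payoff_if_red, payoff_if_blue, red_probs):
    """Worst-case expected payoff over a set of probabilities of drawing red."""
    return min(p * payoff_if_red + (1 - p) * payoff_if_blue for p in red_probs)

# Assumed belief set: P(red) lies in [1/3, 2/3]. Expected payoffs are linear in p,
# so the minimum over the interval is attained at one of the two endpoints.
belief_set = [Fraction(1, 3), Fraction(2, 3)]

bet_on_red = maxmin_value(100, 0, belief_set)    # evaluated at P(red) = 1/3
bet_on_blue = maxmin_value(0, 100, belief_set)   # evaluated at P(red) = 2/3
risky_bet = Fraction(1, 2) * 100                 # known 50-50 odds, no ambiguity

# A fair-coin randomization between the two ambiguous bets pays 50 in expectation
# whatever colour is drawn, so its value does not depend on P(red).
mixture = maxmin_value(50, 50, belief_set)

print(bet_on_red, bet_on_blue, risky_bet, mixture)
```

Both ambiguous bets are valued below the risky bet, while the randomization between them is valued the same as the risky bet. This is the strict preference for randomization that the independence axiom rules out.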
The axiomatic foundations of the multiple priors model replace the independence axiom of SEU with two alternative axioms. First, uncertainty aversion refers to weak preference for randomization over indifferent plans. Second, it is assumed that randomization can be valuable only if it helps hedge ambiguity. In particular, the certainty independence axiom assumes that randomization with a constant, which provides no hedging, can never be valuable. Gilboa and Schmeidler (1989) show that those two axioms, together with other axioms typically imposed to derive SEU in an Anscombe-Aumann framework, imply a multiple priors representation of preferences. There are a number of other models of ambiguity aversion that also satisfy the uncertainty aversion axiom, but relax certainty independence. Those models do not share the feature that uncertainty is a first order concern of decision makers (see Epstein and Schneider (2010) for a detailed comparison of alternative models and their implications).
3. Intertemporal Choice and Learning
Most applications to financial markets involve sequential choice. This motivated extending the Gilboa-Schmeidler model to intertemporal choice over contingent consumption plans. Epstein and Schneider (2003a) provide axiomatic foundations for a general updating rule for the multiple-priors model, an analog to Bayes’ rule. The key axioms are that (i) conditional preferences at every node in the decision tree satisfy (suitably adapted versions of) the Gilboa-Schmeidler axioms and (ii) conditional preferences at different nodes are connected by dynamic consistency. The main results are that (a) preferences satisfying the axioms can be represented by the multiple-priors model, where the belief set satisfies a restriction called rectangularity, (b) belief sets that represent preferences at later nodes in the decision tree can be updated from those at earlier nodes by applying Bayes’ rule “measure-by-measure” and (c) utility can be represented recursively so standard tools can be applied to solve optimization problems.
Epstein and Schneider (2007) consider a model of learning about a memoryless mechanism, an analog to the Bayesian model of learning from conditionally i.i.d. signals. As a concrete illustration, consider repeated sampling from a sequence of Ellsberg urns. If the decision maker perceives the urns to be identical, then after many draws with replacement he will naturally become confident that the observed empirical frequency of blue draws is close to a “true” fraction of blue balls in the urn that is relevant for forecasting future draws. Thus he will eventually become confident enough to view the data as an i.i.d. process. In this laboratory-style situation, one would expect ambiguity to be resolved over time.
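To see how “measure-by-measure” updating can resolve ambiguity in the identical-urns case, consider a stylized conjugate example. This is only a sketch under assumptions that are not taken from the papers cited: the belief set consists of Beta priors over the fraction of blue balls, all with the same hypothetical strength (pseudo-count) and with prior means ranging over [1/3, 2/3], and each prior is updated by Bayes’ rule on the same data. It is not meant to reproduce the model of Epstein and Schneider (2007).

```python
def posterior_mean_interval(prior_mean_lo, prior_mean_hi, prior_strength, blue, draws):
    """Range of posterior means of the blue fraction, updating each Beta prior by Bayes' rule."""
    lo = (prior_strength * prior_mean_lo + blue) / (prior_strength + draws)
    hi = (prior_strength * prior_mean_hi + blue) / (prior_strength + draws)
    return lo, hi

# Assumed data: 60 percent of the observed draws are blue.
for draws in (0, 10, 100, 1000):
    blue = round(0.6 * draws)
    lo, hi = posterior_mean_interval(1 / 3, 2 / 3, 6.0, blue, draws)
    print(f"draws={draws:4d}  posterior mean of blue fraction in [{lo:.3f}, {hi:.3f}]")
```

In this sketch the interval of posterior means shrinks at rate 1/(prior strength + number of draws) and closes in on the empirical frequency, which is the sense in which ambiguity is resolved when the urns are perceived as identical.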
More generally, suppose the decision maker believes the draws to be independent, but that he has no reason to be sure that the urns are identical. For example, if he is told the same about each urn but very little (or nothing at all) about each, then he would plausibly admit the possibility that the urns are not identical. In particular, there is no longer a compelling reason why data in the future should be i.i.d. with frequency of blue draws equal to the empirical frequency of blue draws observed in the past. Indeed, in contrast to a Bayesian, he may not even be sure whether the empirical frequencies of the data will converge, let alone expect his learning process to settle down at a single i.i.d. process.
One can view our model as an attempt to capture learning in such complicated (or vaguely specified and poorly understood) environments. The learning process has two distinctive properties. First, confidence changes together with beliefs as new data arrives. Formally, this is captured by a set of beliefs that expands or shrinks in response to new information. As a result, behavior that reflects a lack of confidence (such as the willingness to bet on the ambiguous urn) can become weaker or stronger over time. Second, ambiguity averse agents need not expect that they will ever learn a “true” iid process in the long run. Instead, they may reach a state of iid ambiguity, where learning ceases, but the data are still perceived as ambiguous. The transition to this state may resolve some, but not all initial ambiguity. A version of the multiple-priors model that captures iid ambiguity is studied in more detail in Epstein and Schneider (2003b), where we also provide a version of the law of large numbers.
4. Ambiguous information
Epstein and Schneider (2008) consider a special case of learning under ambiguity with uncertain signal quality. The idea is that, when quality is difficult to judge, investors treat signals as ambiguous. They do not update beliefs in standard Bayesian fashion, but behave as if they have multiple likelihoods in mind when processing signals. A thought experiment shows that the standard Bayesian measure of signal quality, precision, is no longer sufficient to describe signal quality. Moreover, ambiguity-averse behavior can be induced by poor information quality alone: an a priori lack of confidence is not needed.
Ambiguous information quality has two key effects. First, after ambiguous information has arrived, agents respond asymmetrically: bad news affects conditional actions, such as portfolio decisions, more than good news does. This is because agents evaluate any action using the conditional probability that minimizes the utility of that action. If an ambiguous signal conveys good (bad) news, the worst case is that the signal is unreliable (very reliable). The second effect is that even before an ambiguous signal arrives, agents who anticipate the arrival of low quality information will dislike consumption plans for which this information may be relevant. This intuitive effect does not obtain in the Bayesian model, which precludes any effect of future information quality on current utility.
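The asymmetric response to news can be illustrated in a normal-normal updating example. The sketch below assumes a fundamental with a normal prior, a signal equal to the fundamental plus normal noise, and a noise variance known only to lie in an interval; it takes the perspective of an agent whose utility increases in the fundamental (for example, one who is long the asset), so that the worst case is the lowest posterior mean. All parameter values and names are hypothetical.

```python
def worst_case_posterior_mean(prior_mean, prior_var, signal, noise_var_lo, noise_var_hi):
    """Lowest posterior mean over the range of signal noise variances.

    For a given noise variance v, Bayes' rule gives
    posterior mean = prior_mean + gain * (signal - prior_mean),
    with gain = prior_var / (prior_var + v). The posterior mean is monotone
    in v, so it is enough to check the two endpoints of the interval.
    """
    gains = [prior_var / (prior_var + v) for v in (noise_var_lo, noise_var_hi)]
    return min(prior_mean + g * (signal - prior_mean) for g in gains)

# Assumed numbers: prior N(0, 1), noise variance somewhere in [0.25, 4].
after_good_news = worst_case_posterior_mean(0.0, 1.0, +1.0, 0.25, 4.0)
after_bad_news = worst_case_posterior_mean(0.0, 1.0, -1.0, 0.25, 4.0)
print(after_good_news, after_bad_news)
```

Good news is processed as if the signal were noisy (small gain), bad news as if it were precise (large gain), so the worst-case estimate moves more after bad news than after good news of the same size.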
5. Portfolio choice and selective participation
In portfolio data, the extensive margin is important. Households and mutual funds do not typically hold diversified portfolios of all assets in the economy, as implied by the typical model with SEU preferences. SEU models that study the extensive margin have relied on ingredients such as per period fixed costs. However, quantitative work has shown that such frictions must be unreasonably large, especially if they are to explain selective participation by wealthy households (for an overview of evidence on participation, see Cao et al. (2007)).
The reason why it is hard to generate nonparticipation in SEU models is that uncertainty is a second order concern. Indeed, expected utility implies local risk neutrality: in a standard frictionless portfolio choice problem with one riskless and one risky asset, it is always optimal to take a (perhaps small) position in the risky asset except if the expected returns on the two assets are exactly equal. In other words, nonparticipation is no more than a knife edge phenomenon.
Dow & Werlang (1992) showed that, with multiple priors, nonparticipation is a robust phenomenon. To see why, consider choice between one certain and one ambiguous asset, where ambiguity is captured by a range of mean expected returns. When an ambiguity averse agent contemplates going long in the ambiguous asset, he will thus evaluate the portfolio using the lowest expected return. In contrast, when contemplating a short position he will use the highest expected return. It follows that, if the interval of expected returns contains the riskless return, then it is optimal to invest all wealth in the riskless asset.
In the Dow-Werlang example, ambiguity averse agents exhibit portfolio inertia at certainty. Indeed, consider the response to a small shift in the range of expected returns. As long as the riskless rate remains inside the range, the portfolio position will not change. This is again in sharp contrast to the risk case, where the derivative of the optimal position with respect to shifts in the return distribution is typically nonzero. The key point here is that an increase in ambiguity can be locally “large” relative to an increase in risk. Indeed, a portfolio that contains only the certain asset is both riskless and unambiguous. Any move away from it makes the agent bear both risk and ambiguity. However, an increase in ambiguity about means is perceived like a change in the mean, and not like an increase in the variance. Ambiguity can thus have a first order effect on portfolio choice that overwhelms the first order effect of a change in the mean, whereas the effect of risk is second order.
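Nonparticipation and portfolio inertia can be reproduced in a small numerical example. The sketch below assumes a mean-variance objective, which is a simplification chosen for tractability and is not the setup of Dow and Werlang (1992): the agent chooses a position theta to maximize the minimum, over mean returns mu in [mu_lo, mu_hi], of theta * (mu - riskless) - 0.5 * risk_aversion * variance * theta^2. A long position is evaluated at mu_lo and a short position at mu_hi, which yields the closed form below. Function and parameter names are illustrative.

```python
def optimal_position(mu_lo, mu_hi, riskless, risk_aversion, variance):
    """Optimal holding of the ambiguous asset under worst-case mean returns."""
    if mu_lo > riskless:   # even the lowest mean beats the riskless rate: go long
        return (mu_lo - riskless) / (risk_aversion * variance)
    if mu_hi < riskless:   # even the highest mean falls short: go short
        return (mu_hi - riskless) / (risk_aversion * variance)
    return 0.0             # riskless rate inside the interval: stay out

# Assumed numbers: riskless rate 2%, risk aversion 2, return variance 0.04.
print(optimal_position(0.010, 0.070, 0.02, 2.0, 0.04))  # interval contains 2%
print(optimal_position(0.015, 0.075, 0.02, 2.0, 0.04))  # small shift, still contains 2%
print(optimal_position(0.040, 0.100, 0.02, 2.0, 0.04))  # interval above 2%
print(optimal_position(0.021, 0.021, 0.02, 2.0, 0.04))  # no ambiguity (SEU case)
```

With a degenerate interval the position is nonzero whenever the mean differs from the riskless rate, however slightly. With a nondegenerate interval there is a whole region of parameters in which the position is zero and does not respond to small shifts in the interval.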
Nonparticipation and portfolio inertia are important features of portfolio choice under ambiguity beyond the simple Dow-Werlang example. Portfolio inertia can arise away from certainty if agents can construct portfolios that hedge a source of ambiguity. Illeditsch (2009) shows how this can naturally arise in a model with inference from past data. Garlappi et al. (2007) characterize portfolio choice with multiple ambiguous assets. In particular, they show how differences in the degree of ambiguity across assets lead to selective participation. Bossaerts et al. (2010) and Ahn et al. (2009) provide experimental evidence that supports first order effects of uncertainty in portfolio choice.
6. Learning and portfolio dynamics
Epstein & Schneider (2007) and Campanale (2010) study dynamic portfolio choice models with learning, using the recursive multiple-priors approach. Investors learn about the mean equity premium, and treat stock returns as a sequence of conditionally independent ambiguous signals. Updating proceeds as in the urn example described above.
Epstein and Schneider (2007) emphasize two new qualitative effects for portfolio choice. First, the optimal policy involves dynamic exit and entry rules. Indeed, updating shifts the interval of equity premia, and such shifts can make agents move in and out of the market. Second, there is a new source of hedging demand. It emerges if return realizations provide news that shifts the interval of equity premia. Portfolio choice optimally takes into account the effects of news on future confidence.
The direction of hedging depends on how news affects confidence. In Epstein & Schneider (2007), learning about premia gives rise to a contrarian hedging demand if the empirical mean equity premium is low. Intuitively, agents with a low empirical estimate know that a further low return realization may push them towards nonparticipation, and hence a low return on wealth (formally this is captured by a U-shaped value function). To insure against this outcome, they short the asset.
The paper also shows that, quantitatively, learning about the equity premium can generate a significant trend towards stock market participation and investment, in contrast to results with Bayesian learning. The reason lies in the first order effect of uncertainty on investment. Roughly, learning about the premium shrinks the interval of possible premia and thus works like an increase in the mean premium, rather than just a reduction in posterior variance, which tends to be second order.
Campanale (2010) builds a multiple priors model of learning over the life cycle. Investors learn from experience by updating from signals received over their life cycle. He calibrates the model and quantitatively evaluates its predictions for participation and investment patterns by age in the US Survey of Consumer Finances. In particular, he shows that the first order effects of uncertainty help rationalize moderate stock market participation rates and conditional shares with reasonable participation costs. In addition, learning from experience helps match conditional shares over the life-cycle.
7. Discipline in quantitative applications
In the portfolio choice examples above as well as in those on asset pricing below, the size of the belief set is critical for the magnitude of the new effects. There are two approaches in the literature to disciplining the belief set. Anderson et al. (2003) propose the use of detection error probabilities (see also Barillas et al. (2009) for an exposition). While those authors use detection error probabilities in the context of multiplier preferences, the idea has come to be used also to constrain the belief set in multiple-priors. For example, Sbuelz & Trojani (2008) derive pricing formulas with entropy-constrained priors. Gagliardini et al. (2008) show how to discipline such sets of priors by applying the idea of detection probabilities. The basic idea is to permit only beliefs that are statistically close to some reference belief, in the sense that they are difficult to distinguish from the reference belief based on historical data.
A second approach to imposing discipline involves using a model of learning. For example, the learning model of Epstein & Schneider (2007) allows the modeler to start with a large set of priors in a learning model (resembling a diffuse prior in Bayesian learning) and then to shrink the set of beliefs via updating. A difference between the learning and detection probability approach is that in the former the modeler does not have to assign special status to a reference model. This is helpful in applications where learning agents start with little information, for example, because of recent structural change. In contrast, the detection probability approach works well for situations where learning has ceased or slowed down, and yet the true model remains unknown.
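As a rough illustration of what “difficult to distinguish based on historical data” means, the following sketch computes a detection error probability in the simplest possible setting. The setting is an assumption made for the example and is not taken from the papers cited: returns are i.i.d. normal with known standard deviation, the reference model and the alternative differ only in the mean, and a likelihood ratio test with equal prior weights is used to choose between them. In that case the test picks the model whose mean is closer to the sample mean, and the probability of picking the wrong one is the same under either model.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def detection_error_probability(mean_gap, std, n_obs):
    """Probability that a likelihood ratio test selects the wrong one of two
    i.i.d. normal models whose means differ by mean_gap (common std)."""
    return normal_cdf(-abs(mean_gap) * math.sqrt(n_obs) / (2.0 * std))

# Hypothetical numbers: quarterly returns with std 8%, alternative mean lower by 1%.
for n_obs in (40, 200, 1000):
    print(n_obs, round(detection_error_probability(0.01, 0.08, n_obs), 3))
```

A modeler following this approach would admit an alternative mean into the belief set only if the detection error probability stays above some chosen threshold for the available sample length. Identical models give a probability of one half, and the probability falls as the gap or the sample grows.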
8. Representative agent asset pricing
Epstein & Wang (1994, 1995) first studied representative agent asset pricing with multiple priors (Chen & Epstein (2002) characterize pricing in continuous time). A key insight of this work is that asset prices under ambiguity can be computed by first finding the most pessimistic beliefs about a claim to aggregate consumption, and then pricing assets under this pessimistic belief. A second point is that prices can be indeterminate. For example, suppose there is an ambiguous parameter in the distribution of asset payoffs, but selecting a worst case belief about the consumption claim does not pin down the parameter. Then a whole family of prices is consistent with equilibrium.
While indeterminacy is an extreme case, the more general point is that a small change in the distribution of consumption can have large asset pricing effects. Ambiguity aversion can thus give rise to powerful amplification effects. For example, Illeditsch (2009) shows how updating from ambiguous signals can give rise to amplification of bad news.
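The two-step pricing recipe and the possibility of indeterminacy can be illustrated in a one-period, two-state example. This is a deliberately simple sketch under assumptions that are not from the papers cited: a representative agent with log utility and discount factor beta consumes an endowment, tomorrow is either a boom or a bust, and the probability of a boom is only known to lie in [p_lo, p_hi]. All names and numbers are hypothetical.

```python
def worst_case_boom_probs(c_boom, c_bust, p_lo, p_hi):
    """Range of boom probabilities that minimize expected utility of consumption."""
    if c_boom > c_bust:
        return (p_lo, p_lo)   # pessimism puts the least weight on the boom
    if c_boom < c_bust:
        return (p_hi, p_hi)
    return (p_lo, p_hi)       # consumption is unaffected: every belief is a worst case

def price(payoff_boom, payoff_bust, c_today, c_boom, c_bust, p_boom, beta=0.96):
    """Price under log utility: beta * E_p[(c_today / c_tomorrow) * payoff]."""
    return beta * (p_boom * (c_today / c_boom) * payoff_boom
                   + (1.0 - p_boom) * (c_today / c_bust) * payoff_bust)

def equilibrium_prices(payoff_boom, payoff_bust, c_today, c_boom, c_bust, p_lo, p_hi):
    """Price at each end of the worst-case belief range, sorted low to high."""
    probs = worst_case_boom_probs(c_boom, c_bust, p_lo, p_hi)
    return tuple(sorted(price(payoff_boom, payoff_bust, c_today, c_boom, c_bust, p)
                        for p in probs))

# Consumption depends on the state: the worst case is unique and so is the price.
print(equilibrium_prices(1.2, 0.8, 1.0, 1.05, 0.97, 0.4, 0.6))
# Consumption does not depend on the state: a whole interval of prices survives.
print(equilibrium_prices(1.2, 0.8, 1.0, 1.00, 1.00, 0.4, 0.6))
```

In the first case the asset is priced as if the boom had probability p_lo, which gives a lower price than any other belief in the set. In the second case aggregate consumption does not pin down a worst-case belief, so every price between the two printed bounds is consistent with the agent holding the endowment. The first case also hints at the amplification effect: an arbitrarily small dependence of consumption on the state selects one end of the price interval.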
Epstein & Schneider (2008) consider the effect of learning, with a focus on the role of signals with ambiguous precision. They show that such signals induce an asymmetric response to news — bad news is taken more seriously than good news — and contribute to premia for idiosyncratic volatility as well as negative skewness in returns. Williams (2009) provides evidence that in times of greater uncertainty in the stock market the reaction to earnings announcements is more asymmetric.
Another key property of ambiguous signals is that the anticipation of poor signal quality lowers utility. As a result, a shock that lowers the quality of future signals can lower asset prices. In contrast, in a Bayesian setting the anticipation of less precise future signals does not change utility or prices as long as the distribution of payoffs has not changed. Epstein & Schneider (2008) use a quantitative model to attribute some of the price drop after 9/11 to the discomfort market participants felt because they had to process unfamiliar signals. These results are related to the literature on “information uncertainty” in accounting. For example, Autore et al. (2009) consider the failure of Arthur Andersen as an increase in (firm-specific) ambiguity about AA’s clients and document how the price effect of this shock depended on the availability of firm-specific information.
There are now a number of quantitative studies that apply the recursive multiple-priors model to different asset markets. Trojani & Vanini (2002) revisit the equity premium puzzle. Sbuelz and Trojani (2008) consider predictability of excess stock returns. Jeong et al. (2009) estimate a model of stock returns, also with an emphasis on time variation in equity premia. Drechsler (2008) studies the joint behavior of equity returns and option prices. Both Jeong et al. and Drechsler use a general specification of RMP with separate parameters for risk aversion and substitution as in Epstein & Zin (1989) and thus allow for the interaction of ambiguity and “long run risk”.
Ilut (2009) addresses the uncovered interest parity puzzle in foreign exchange markets using a model of regime switching under ambiguity. Gagliardini et al. (2008) and Ulrich (2009) consider the term structure of interest rates, focusing on ambiguity about real shocks and monetary policy, respectively. Boyarchenko (2009) studies credit risk in corporate bonds.
9. Heterogeneous agent models of trading and valuation
Recent work has explored heterogeneous agent models where some agents have multiple-priors. Epstein & Miao (2003) consider an equilibrium model in which greater ambiguity about foreign as opposed to domestic securities leads to a home-bias. Several models center on portfolio inertia as discussed above. Mukerji & Tallon (2001) show that ambiguity can endogenously generate an incomplete market structure. Intuitively, if ambiguity is specific to the payoff on a security, then no agent may be willing to take positions in a security with sufficiently ambiguous payoffs. Mukerji & Tallon (2004) build on this idea to explain the scarcity of indexed debt contracts with ambiguity in relative prices. Easley & O’Hara (2009) consider the welfare effects of financial market regulation in models where multiple-priors agents choose in which markets to participate.
A shock to the economy that suddenly increases ambiguity perceived by market participants can drive widespread withdrawal from markets, that is, a “freeze”. This is why the multiple-priors model has been used to capture the increase in uncertainty during financial crises (Caballero & Krishnamurthy 2008, Guidolin & Rinaldi 2009, Routledge & Zin 2009). Uhlig (2010) considers the role of ambiguity aversion in generating bank runs.
In heterogeneous agent models, prices generally depend on the entire distribution of preferences. An important point here is that if only some agents become more ambiguity averse, this may not increase premia observed in the market. The reason is that the more ambiguity averse group might leave the market altogether, leaving the less ambiguity averse agents driving prices (Trojani and Vanini 2004, Cao et al. 2005, Chapman & Polkovnichenko 2009, Ui 2009). Condie (2008) considers conditions under which ambiguity averse agents affect prices in the long run if they interact with SEU agents.
A number of papers have recently studied setups with ambiguity averse traders and asymmetric information. Condie & Ganguli (2009) show that if an ambiguity averse investor has private information, then portfolio inertia can prevent the revelation of information by prices even if there is the same number of uncertain fundamentals and prices. Ozsoylev & Werner (2009) and Caskey (2009) study the response of prices to shocks when ambiguity averse agents interact with SEU traders and noise traders. Mele & Sangiorgi (2009) focus on the incentives for information acquisition in markets under ambiguity.
Q&A: Fabio Canova on the Estimation of Business Cycle Models
Fabio Canova is Professor of Economics at Universitat Pompeu Fabra and Research Professor at ICREA. His research encompasses applied and quantitative macroeconomics, as well as econometrics. Canova’s RePEc/IDEAS entry.
EconomicDynamics: What is wrong with using filtered data to estimate a business cycle model?
Fabio Canova: The filters that are typically used in the literature are statistical in nature and they do not take into account the structure of the model one wants to estimate. In particular, when one employs statistical filters he/she implicitly assumes that cyclical and non-cyclical fluctuations generated by the model are located at different frequencies of the spectrum; this is what has prompted researchers to identify cycles with periodicities of 8-32 quarters with business cycles. But a separation of this type almost never exists in dynamic stochastic general equilibrium (DSGE) models. For example, a business cycle model driven by persistent but stationary shocks will have most of its variability located at the low frequencies of the spectrum, not at business cycle frequencies, and all filters which are currently in use in the literature (HP, growth filter, etc.) wipe out low frequency variability: they throw the baby out with the bathwater. Similarly, if the shocks driving the non-cyclical component have variabilities which are large relative to the shocks driving the cyclical component, the non-cyclical component may display important variability at cyclical frequencies: the trend is the cycle, echoing Aguiar and Gopinath (2007).
Thus, at least two types of misspecification are present when models are estimated on filtered data: important low frequency variability is disregarded, and the variability at cyclical frequencies is typically over-estimated. This misspecification may have serious consequences for estimation and inference. In particular, true income and substitution effects will be mismeasured, introducing distortions in the estimates of many important parameters, such as the risk aversion coefficient, the elasticity of labor supply, and the persistence of shocks.
The size of these distortions clearly depends on the underlying features of the model, in particular, on how persistent the cyclical component of the model is and how strong the relative signal of the shocks driving the cyclical and the non-cyclical components is. One can easily build examples where policy analysis conducted with the estimated parameters obtained with filtered data can be arbitrarily bad.
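The claim that a persistent but stationary process has most of its variability at low frequencies can be checked directly for the simplest case. The sketch below assumes an AR(1) process with coefficient rho on quarterly data, whose spectral density is proportional to 1 / (1 - 2 rho cos(w) + rho^2); integrating this density in closed form gives the share of variance at frequencies below any cutoff. The AR(1) specification and the value rho = 0.95 are illustrative assumptions, not part of the interview.

```python
import math

def share_below(rho, omega):
    """Share of an AR(1)'s variance at frequencies in [0, omega], for 0 <= omega < pi."""
    k = (1.0 + rho) / (1.0 - rho)
    return (2.0 / math.pi) * math.atan(k * math.tan(omega / 2.0))

def variance_shares(rho, period_short=8, period_long=32):
    """Variance shares at periods longer than period_long and within the band."""
    low = share_below(rho, 2.0 * math.pi / period_long)
    band = share_below(rho, 2.0 * math.pi / period_short) - low
    return low, band

for rho in (0.0, 0.5, 0.95):
    low, band = variance_shares(rho)
    print(f"rho={rho:4.2f}  below business cycle frequencies: {low:.3f}  "
          f"at 8-32 quarter periods: {band:.3f}")
```

For white noise the 8-32 quarter band holds exactly 3/16 of the variance, since the spectrum is flat. As rho approaches one, the bulk of the variance moves to periods longer than 32 quarters, which is the part that filters designed to isolate business cycle frequencies remove.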
This point is far from new. Tom Sargent, over 30 years ago, suggested that rational expectations models should not be estimated with seasonally adjusted data, because the seasonal frequencies (which are blanked out by seasonal adjustment filters) may contain a lot of non-seasonal information and, conversely, because the seasonal component of a model may have important cyclical implications (think about Christmas gifts: their production is likely to be spread all over the year rather than lumped just before Christmas). For the same reason we should not estimate models using data filtered with arbitrary statistical devices that do not take into account the underlying cross frequency restrictions the model imposes.
As an alternative, rather than building business cycle models (stationary models which are log-linearized around the steady state) and estimating them on filtered data, several researchers have constructed models which, in principle, can account for both the cyclical and non-cyclical portions of the data. For example, it is now popular in the literature to allow TFP or investment specific shocks to have a unit root while all other shocks are assumed to have a stationary autoregressive structure; solve the model around the balanced growth path implied by the non-stationary shocks; filter the data using the balanced growth path implied by the model; and then estimate the structural parameters of the transformed model using the transformed data. While this procedure imposes some coherence on the approach (a model consistent decomposition into cyclical and non-cyclical components is used) and avoids arbitrariness in the selection of the filter, it is not the solution to the question of how to estimate business cycle models using data which, possibly, has much more than cyclical fluctuations. The reason is that the balanced growth path assumption is broadly inconsistent with the growth experience of both developed and developing countries. In other words, if we take data on consumption, investment and output and filter it using a balanced growth assumption, we will find that some of the filtered series will still display upward or downward trends and/or important low frequency variations. Since in the transformed model these patterns are, by construction, absent, the reverse of the problem mentioned above occurs: the low frequency variability produced by the model is overestimated; the variability at cyclical frequencies is underestimated. Once again income and substitution effects will be mismeasured and policy analyses conducted with estimated parameters may be arbitrarily poor.
ED: Does this mean we should not draw stylized facts from filtered data?
FC: Stylized facts are summary statistics, that is, a simple way to collapse multidimensional information into some economically useful measure that is easy to report. I see no problem with this dimensionality reduction. The problems I can see with collecting stylized facts using filtered data are of two types. First, because filters act on the data differently, stylized facts may be a function of the filter used. I showed this many years ago (Canova, 1998) and I want to reemphasize that it is not simply the quantitative aspects that may be affected but also the qualitative ones. For example, the relative ordering of the variabilities of different variables may change depending on whether an HP or a growth filter is used. Since the notion of statistical filtering is not necessarily connected with the notion of model-based cyclical fluctuations, there is no special reason to prefer one set of stylized facts over another and what one reports is entirely history dependent (we use the HP filter because others have used it before us and we do not want to fight with referees about this). To put this concept in another way: none of the statistical filters one typically uses enjoys any optimality properties given the types of models we use in macroeconomics.
Second, it is not hard to build examples where two different time series with substantially different time paths (say, one produced by a model and one we find in the data) may look alike once they are filtered. Thus, by filtering data, we throw away the possibility of recognizing in which dimensions our models are imperfect descriptions of the data. Harding and Pagan (2002, 2006) have forcefully argued that interesting stylized facts can be computed without any need of filtering. These include the location and clustering of turning points, the length and the amplitude of business cycle phases, concordance measures, etc.
Their suggestion to look at statistics that you can construct from raw data has been ignored by the profession at large (an exception here is some work by Sergio Rebelo) because existing business cycle models have hard time to replicate the asymmetries over business cycle phases that their approach uncovers.
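The filter dependence of volatility rankings described above can be illustrated with a small simulation. The sketch below is not taken from Canova (1998): the two artificial series, the sample size, and the helper names `hp_cycle` and `growth_filter` are assumptions made purely for illustration. It applies an HP filter (smoothing parameter 1600) and a first-difference ("growth") filter to a random walk and to a white-noise series, and compares the standard deviations of the filtered series.

```python
import numpy as np

def hp_cycle(y, lam=1600.0):
    """Cyclical component from a Hodrick-Prescott filter (dense linear solve)."""
    T = len(y)
    D = np.diff(np.eye(T), n=2, axis=0)  # (T-2) x T second-difference matrix
    trend = np.linalg.solve(np.eye(T) + lam * D.T @ D, y)
    return y - trend

def growth_filter(y):
    """First difference; a growth rate if y is measured in logs."""
    return np.diff(y)

# Two hypothetical series (illustrative assumptions, not real data).
rng = np.random.default_rng(0)
T = 1000
series = {
    "random_walk": np.cumsum(rng.normal(0.0, 1.0, T)),  # power mostly at low frequencies
    "white_noise": rng.normal(0.0, 0.9, T),             # power spread over all frequencies
}
filters = {"hp": hp_cycle, "growth": growth_filter}

# Standard deviation of each filtered series, by filter.
results = {
    f_name: {s_name: float(np.std(f(y))) for s_name, y in series.items()}
    for f_name, f in filters.items()
}

for f_name, sds in results.items():
    most_volatile = max(sds, key=sds.get)
    print(f_name, sds, "most volatile:", most_volatile)
```

Under these assumptions the random walk should come out as the more volatile series after HP filtering and as the less volatile one after differencing, so the ordering of variabilities is a property of the filter as much as of the data.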
ED: How should we think about estimating a business cycle model?
FC: Estimation of structural models is a very difficult enterprise because of conceptual, econometric and practical difficulties. Current business cycle models, even in the large scale versions now used in many policy institutions, are not yet suited for full structural estimation. From a classical perspective, to estimate the parameters of the model we need it to be the data generating process (DGP), up to a set of serially uncorrelated measurement errors. Do we really believe that our models are the real world? I do not think so and, I guess, many would think as I do. Even if we take the milder point of view that models are approximations to the real world, the fact that we cannot precisely define the properties of the approximation error makes classical estimation approaches unsuited for the purpose. Bayesian methods are now popular, but my impression is that they are so because they deliver reasonable conclusions, not because the profession truly appreciates the advantages of explicitly using prior information. In other words, what is often called Bayesian estimation is nothing more than interval calibration: we get reasonable conclusions from structural estimation because the prior dominates.
I think one of the most important insights that the calibration literature has brought to the macroeconomic profession is the careful use of (external) prior information. From the estimation point of view, this is important because one can think of calibration as a special case of Bayesian estimation when the likelihood has little information about the parameters and the prior has a more or less dogmatic format. In practice, because the model is often a poor approximation to the DGP, the likelihood of a DSGE model typically has large flat areas (and in these areas any meaningful prior will dominate), sharp multiple peaks, cliffs and a rugged appearance (and a carefully centered prior may knock out many of these pathologies). Thus, while calibrators spend pages discussing their parameter selection, macroeconomists using Bayesian estimation often spend no more than a few lines discussing their priors and how they have chosen them. It is in this framework of analysis that the questions of how to make models and data consistent, whether filtered data should be used, and whether models should be written to account for all fluctuations or only the cyclical ones should be addressed.
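The sense in which the prior dominates when the likelihood is nearly flat can be seen in a textbook normal example. The sketch below is only illustrative: the function name `normal_posterior` and all numbers are hypothetical and do not come from any DSGE estimation. It combines a normal prior with a normal summary of the likelihood (a point estimate and its standard error) by precision weighting.

```python
def normal_posterior(prior_mean, prior_sd, mle, mle_se):
    """Posterior mean and sd for a normal prior and a normal likelihood summary."""
    prior_prec = 1.0 / prior_sd**2   # precision = 1 / variance
    data_prec = 1.0 / mle_se**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * mle) / post_prec
    return post_mean, post_prec ** -0.5

# Hypothetical parameter with a tight prior centered at 0.7.
# Case 1: nearly flat likelihood (large standard error), so the prior dominates.
flat_mean, flat_sd = normal_posterior(prior_mean=0.7, prior_sd=0.05, mle=0.2, mle_se=2.0)
# Case 2: sharply peaked likelihood (small standard error), so the data dominate.
sharp_mean, sharp_sd = normal_posterior(prior_mean=0.7, prior_sd=0.05, mle=0.2, mle_se=0.01)

print("flat likelihood :", flat_mean, flat_sd)
print("sharp likelihood:", sharp_mean, sharp_sd)
```

In the first case the posterior essentially reproduces the prior, which is the "interval calibration" outcome; only in the second does the sample information move the estimate.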
My take on this is that, apart from a few notable exceptions, the theory is largely silent on what drives non-cyclical fluctuations, whether there are interesting mechanisms transforming temporary shocks into medium term fluctuations, and whether and in what way non-cyclical fluctuations are distinct from cyclical fluctuations. Therefore, one should be realistic and start from a business cycle model, since at least in principle we have some idea of what features a good model should display. Then, rather than filtering the data or tacking on to the model an arbitrary unit root process, one should specify a flexible format for the non-cyclical component, jointly estimate the structural parameters and the non-structural parameters of the non-cyclical component using the raw data, and let the data decide what is cyclical and what is not, given the lenses of the model. Absent any additional information, estimation should be conducted using uninformative priors and should be preceded by a careful analysis of the identification properties of the model. If external information of some sort is available, it should be carefully documented and specified, and the trade-off between sample and non-sample information clearly spelled out. The fact that a cyclical model, once it is driven by persistent shocks, implies that simulated time series have power at all frequencies of the spectrum can be formalized into a prior on the coefficients of the decision rules of the model, and that could in turn be transformed into restrictions on the structural parameters using a change of variables approach. Del Negro and Schorfheide (2008) have shown how to do this formally, but any informal approach which takes that into consideration will go a long way in the right direction.
ED: There is a spirited debate about the use of VARs to discriminate between business cycle model classes and shocks driving business cycles. What is your take on this debate?
FC: Since Chris Sims introduced structural VARs (SVARs), now almost 25 years ago, I have seen a paper every 4-5 years showing that SVARs have a hard time recovering underlying structural models because of identification, aggregation, omitted variables, or non-fundamentalness problems. Still, SVARs are as popular as ever in the macroeconomic profession. The reason is, I think, twofold. First, the points made have often been over-emphasized, and generality has been given to pathologies that careful researchers can easily spot. Second, SVAR researchers are now much more careful with what they are doing, many important criticisms have been taken into consideration and, in general, users of SVARs are much more aware of the limitations of their analysis.
Overall, I do think SVARs can be useful tools to discriminate between business cycle models, and recent work by Dedola and Neri (2007) and Pappa (2009) shows how they can be used to discriminate, for example, between RBC and New Keynesian transmission. But for SVARs to play this role, they need to be linked to the classes of dynamic stochastic general equilibrium models currently used in the profession much better than has been done in the past. This means that identification and testing restrictions should be explicitly derived from the models we use to organize our thoughts about a problem and should be robust, in the sense that they hold regardless of the values taken by the structural parameters and the specification of the details of the model. What I mean is the following. Suppose you start from a class of sticky price New Keynesian models and suppose you focus on the response of output to a certain shock. Then I call such a response robust if, for example, the sign of the impact response or the shape of the dynamic response is independent of the value of the Calvo parameter used and of whatever other feature I stick in the New Keynesian model in addition to sticky prices (e.g. habit in consumption, investment adjustment costs, capacity utilization, etc.). Robust restrictions, in general, take the form of qualitative rather than quantitative constraints (the latter do depend on the parameterization of the model and on the details of the specification), and often they involve only contemporaneous responses to shocks, since business cycle theories are often silent about the time path of the dynamic adjustments. Once robust restrictions are found, some of them can be used for identification purposes and some to evaluate the general quality of the class of models. Restrictions which are robust to parameter variations but specific to certain members of the general class (e.g. dependent on which particular feature is introduced in the model) can be used in evaluating the importance of a certain feature.
Note that if such an approach is used, the representation problems discussed in the recent literature (e.g. Ravenna (2007) or Chari, Kehoe and McGrattan (2008)) have no bite, since sign restrictions on the impact response are robust to the misspecification of the decision rules of a structural model discussed in these papers. Furthermore, use of such an approach gives a measure of economic rather than statistical discrepancy and therefore can suggest to theorists in which dimensions the class of models needs to be respecified to provide a better approximation to the data. Finally, the procedure allows users to test business cycle models without directly estimating structural parameters, and I believe this is definitely a plus, given the widespread identification problems existing in the literature and the non-standard form the likelihood function takes in the presence of misspecification. I have used such an approach, for example, to evaluate the general fit of the now standard class of models introduced by Christiano, Eichenbaum and Evans (2005) and Smets and Wouters (2007), to measure the relative importance of sticky prices vs. sticky wages (something which is practically impossible to do with structural estimation because of identification problems) and to evaluate whether the addition of rule of thumb consumers helps to reconcile the predictions of the class of models and the data.
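For readers unfamiliar with how sign restrictions on impact responses are imposed in practice, one common implementation draws random orthogonal rotations of a Cholesky factor of the reduced-form covariance matrix and keeps the draws whose impact responses have the required signs. The sketch below follows that generic recipe and is not necessarily the procedure used in the papers cited above. The covariance matrix, the variable ordering (output, inflation), the shock labels, and the function names are all hypothetical.

```python
import numpy as np

def draw_rotation(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))  # fix column signs so draws are uniformly distributed

def sign_restricted_impacts(sigma, signs, n_draws=2000, seed=0):
    """Keep candidate impact matrices whose entries match the required signs.

    signs[i, j] is +1 or -1 for a restricted response of variable i to shock j,
    and 0 where the response is left unrestricted.
    """
    rng = np.random.default_rng(seed)
    P = np.linalg.cholesky(sigma)  # sigma = P @ P.T
    kept = []
    for _ in range(n_draws):
        impact = P @ draw_rotation(sigma.shape[0], rng)  # still satisfies impact @ impact.T = sigma
        if np.all((signs == 0) | (np.sign(impact) == signs)):
            kept.append(impact)
    return kept

# Hypothetical reduced-form covariance of (output, inflation) innovations.
sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
# Column 0: "demand" shock raises output and inflation on impact.
# Column 1: "supply" shock raises output and lowers inflation on impact.
signs = np.array([[1, 1],
                  [1, -1]])

kept = sign_restricted_impacts(sigma, signs)
print(len(kept), "of 2000 rotations satisfy the impact sign restrictions")
```

Only qualitative information enters the identification here: every retained matrix reproduces the same reduced-form covariance, and the restrictions say nothing about magnitudes or about the dynamics after impact.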