# Valuing New Goods in a Model with Complementarity: Online Newspapers

By Matthew Gentzkow[^author]

*The American Economic Review*, Vol. 97, No. 3 (June 2007)

Many important economic questions hinge on the extent to which new goods either crowd out or complement consumption of existing products. Recent methods for studying new goods rule out complementarity by assumption, so their applicability to these questions has been limited. I develop a new model that relaxes this restriction, and use it to study competition between print and online newspapers. Using new micro data from Washington, DC, I estimate the relationship between the print and online papers in demand, the welfare impact of the online paper’s introduction, and the expected impact of charging positive online prices. (JEL C25, L11, L82)

[^author]: Graduate School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637 (e-mail: gentzkow@chicagogsb.edu). I would like to offer special thanks to Bob Cohen and Jim Collins of Scarborough Research for giving me access to the data for this study. I thank two anonymous referees for insightful comments. I am also grateful to John Asker, Richard Caves, Gary Chamberlain, Karen Clay, Liran Einav, Gautam Gowrisankaran, Ulrich Kaiser, Larry Katz, Julie Mortimer, Jesse Shapiro, Andrei Shleifer, Minjae Song, and especially to Ariel Pakes for advice and encouragement. Jennifer Paniza provided outstanding research assistance. I thank the Social Science Research Council and the Centel Foundation/Robert P. Reuss Faculty Research Fund at the University of Chicago Graduate School of Business for financial support.

The effect of new goods on demand for existing products is often uncertain. Convinced that radio broadcasts were crowding out music sales, record companies in the 1920s waged a series of court battles demanding high royalties for songs, leading some networks to stop playing major-label music altogether (Christopher H. Sterling and John M. Kittross 2001, 214; Paul Starr 2004, 339). It soon became apparent, however, that radio airplay dramatically increased record sales, and by the 1950s record companies were paying large bribes to get their songs onto disk jockeys’ playlists (Sterling and Kittross 2001, 294).[^1] More recently, a much-cited Business Week article anticipated that computers would create a “paperless office.” Instead, the spread of information technology has sharply increased consumption of paper (Abigail J. Sellen and Richard H. R. Harper 2002). Debate continues in the economics literature about the relationships between free file-sharing services and recorded music (Alejandro Zentner 2003; David Blackburn 2004; Felix Oberholzer and Koleman Strumpf 2007; Rafael Rob and Joel Waldfogel 2004), file-sharing services and live concerts (Julie Holland Mortimer and Alan Sorensen 2005), public and private broadcast channels (Steven Berry and Waldfogel 1999; Andrea Prat and David Stromberg 2005), and online and offline retailing (Austan Goolsbee 2001; Todd Sinai and Waldfogel 2004).

[^1]: A similar example concerns the introduction of movies. An 1894 article in Scribner’s predicted that the availability of motion-picture and audio versions of novels would lead to the disappearance of printed books (Octave Uzanne 1894). Today, film adaptations and novels are widely perceived to be complements, and film releases are often accompanied by order-of-magnitude increases in sales of the associated book (Kera Bolonik 2001).

Measuring the impact of new goods in such settings is important for several reasons. First, it directly affects firm decisions. A record company’s decision to start licensing music for sale online, a publisher’s decision to sell the film rights to a novel, a discount retailer’s decision to open a new line of more upscale stores, and many other choices about entry, product positioning, and pricing depend critically on the demand-side relationships between new and old products. Estimating these relationships is thus important for both firms themselves and economists seeking to understand firm behavior. Second, new goods are a major component of increases in the standard of living, and their omission is a leading source of bias in standard price indices (Timothy F. Bresnahan and Robert J. Gordon 1997). Correcting these biases requires accurate estimates of the effect of new goods on consumer welfare, which cannot be constructed without knowing the relevant demand elasticities. Finally, the degree of substitutability between old and new products is an important input to many policy debates, including those surrounding cable price regulation (Goolsbee and Amil Petrin 2004), deregulation of local phone markets (Robert G. Harris and C. Jeffrey Kraft 1997), and the allowability of cross-media mergers (Federal Communications Commission 2001).

This paper has two goals. First, I extend existing techniques for estimating the impact of new goods to allow for the possibility that goods could be either substitutes or complements. Although a large recent literature studies the effect of new goods,[^2] it has been built on discrete-choice demand models whose starting assumption is that consumers choose exactly one product from the set available.[^3] This means that all goods are restricted a priori to be perfect substitutes at the individual level. Although this is a reasonable starting point for looking at demand for automobiles or satellite television, it makes these techniques inappropriate for cases such as those described above, where the degree of substitutability or complementarity among products is a key parameter of interest. The new discrete demand model I develop permits consumers to choose multiple goods simultaneously and allows the demand-side relationship between each pair of products to be freely estimated from the data.

Second, I apply the model to study the impact of online newspapers, a good whose relationship with affiliated print newspapers has been hotly debated.[^4] I estimate the model using new individual-level data on the print and online newspaper readership of consumers in Washington, DC, and look at the interaction among the Washington Post, the Post’s online edition (the post.com), and the city’s competing daily (the Washington Times). I then use the fitted model to ask whether the print and online newspapers are substitutes or complements, and how the introduction of online news has affected the welfare of consumers and newspaper firms. I also address a question of immediate interest to firms: how profits would change if they were to charge positive prices for online content that is currently free.

[^2]: See, for example, Jerry A. Hausman (1997) on the effect of Apple Cinnamon Cheerios, Shane M. Greenstein (1997) on the effect of PCs, Petrin (2002) on the effect of minivans, and Goolsbee and Petrin (2004) on the effect of direct broadcast satellites.

[^3]: Several existing papers do estimate discrete choice models in which consumers can choose multiple goods. The model developed here differs by allowing goods to range freely from substitutes to complements, and also allowing a flexible form of unobserved consumer heterogeneity. Existing models and their relationship to the present model are discussed in detail below.

[^4]: According to the Wall Street Journal, “Newspaper executives are increasingly debating whether free Web access [to their papers’ content] is siphoning off readers from their print operations” (Mike Esterl, “New York Times Sets an Online Fee,” Wall Street Journal, May 17, 2005). See also Leslie Walker, “News Groups Wrestle with Online Fees,” Washington Post, May 26, 2005; Katharine Q. Seelye, “Can Papers End the Free Ride Online?” New York Times, March 14, 2005; and Julia Angwin and Joseph T. Hallinan, “Newspaper Circulation Continues Decline, Forcing Tough Decisions,” Wall Street Journal, May 2, 2005. Others have argued that an online edition need not crowd out its affiliated print edition and could even complement it (Rob Runnett 2001, 2002). The print-online relationship has been central to the debate surrounding online pricing: “A big part of the motivation for newspapers to charge for their online content is not the revenue it will generate, but the revenue it will save, by slowing the erosion of their print subscriptions” (Seelye 2005). The print-online relationship also looms large in the debate about the long-run viability of print newspapers (Dan Okrent 1999; Gates 2000; David Henry, “Is Buffet too Quick to Write off Newspapers?” USA Today, May 4, 2000).

A central empirical challenge in evaluating the impact of a new good is separating true substitutability or complementarity of goods from correlation in consumer preferences. Observing that frequent online readers are also frequent print readers, that file sharers buy more CDs, or that computer users consume large volumes of paper might be evidence that the products in question are complementary. It might also reflect the fact that unobservable tastes for the goods are correlated, for example, because some consumers just have a greater taste for news or music overall. In the first section below, I analyze this identification problem in the context of a simple two-good model. I show that the key elasticities are unidentified with data on consumer choices and characteristics alone. I then point out two natural sources of additional information that can aid identification. The first is variables that can be excluded a priori from the utility of one or more goods. In many settings, price is the obvious candidate. The identification argument is also valid for nonprice variables, however, and so can be applied where prices do not vary or where the variation is not exogenous. This is the case in the newspaper market I study, where the price of the online paper is zero throughout the sample. In the estimation, I exploit variables, such as whether consumers have Internet access at work or a fast connection at home, which shift the utility of the online edition without affecting the utility of the print edition.[^5] The second potential source of identification is panel data. If correlated unobservables such as taste for news are constant for a given consumer over time, observing repeated choices by the same consumer can allow us to separate correlation and complementarity. For example, a consumer who views the content of two papers as complementary would tend to read both of them on some days and neither on other days. A news junkie who views the papers as substitutes, on the other hand, would also read both with high frequency, but would be more likely to read them on alternate days. In the application, I have data on which newspapers consumers read in the last 24 hours, and also in the last five weekdays, a limited form of panel data I exploit in the estimation.

A further challenge is how to translate the utility estimates from the demand model into dollars. Intuitively, data on consumer choices (combined with exclusion restrictions and panel data) allow us to estimate how consuming one good affects the marginal utility of consuming another. To make welfare statements, we also need to know how consumers trade off these utils of news consumption against dollars. This would be straightforward to estimate if we could observe how demand responds to exogenous variation in prices. I propose an alternative strategy that exploits information from the supply side of the market and is valid in the absence of price variation. It is based on a simple observation: the less sensitive consumers are to prices, the higher the price a profit-maximizing firm would set for its products. Given observable data on marginal costs and advertising revenue, and a model of the firm’s objective function, I can therefore calculate the value of the price elasticity that would equate the profit-maximizing price of the print newspaper with the price we actually observe.[^6] This strategy depends on strong assumptions about the form of the firm’s profit function, as well as the accuracy of the observed cost data. But sensitivity analysis confirms that the qualitative conclusions are robust to reasonable alternative assumptions.

[^5]: Zentner (2003) also uses broadband connections as a shifter of Internet use in studying the impact of file sharing on music sales.

[^6]: Howard Smith (2004) uses a related technique in studying consumer shopping behavior.

The results show that properly accounting for consumer heterogeneity changes the conclusions substantially. Both reduced-form OLS regressions and a structural model without heterogeneity suggest that the print and online editions of the Post are strong complements, with the addition of the post.com to the market increasing profits from the Post print edition by \$10.5 million per year. In contrast, when I estimate the full model with both observed and unobserved heterogeneity, I find that the print and online editions are significant substitutes. I estimate that raising the price of the Post by \$.10 would increase post.com readership by about 2 percent, and that removing the post.com from the market entirely would increase readership of the Post by 27,000 readers per day, or 1.5 percent. The estimated \$33.2 million of revenue generated by the post.com comes at a cost of about \$5.5 million in lost Post readership. For consumers, the online edition generated a per-reader surplus of \$.30 per day, implying a total welfare gain of \$45 million per year.

The model also informs the debate about the sustainability of free online content (see footnote 4). I take two approaches to this question.
The first is to assume that the Post Company may be setting the price of the online edition suboptimally, and ask whether profits could be increased by charging positive prices.[^7] I find that, for the period under study, the optimal price is indeed positive, at \$.20 per day, and that the loss from charging the suboptimal price of zero is about \$8.8 million per year. The second approach is to suppose that the zero price is optimal and ask how large transactions costs would have to be to rationalize it. I show that a zero price would be optimal for any transaction cost greater than or equal to \$.13 per day. I also show that because of growth in online advertising demand, the gain to raising online prices was virtually eliminated by 2004. This suggests that the zero price may have been part of a rational forward-looking strategy and is approximately optimal today.

Estimating a structural model of the newspaper market is not, of course, the only possible approach to studying the impact of online newspapers. I show below that valuable information can be gleaned by looking at both time series of aggregate newspaper circulation and reduced-form regressions using micro data.[^8] There are two major benefits to estimating the complete model, however.[^9] First, because the model is derived from utility maximization, it takes on all of the restrictions implied by consumer theory. This means that the estimated parameters can be used to calculate welfare effects. It also allows us to obtain meaningful answers to counterfactual experiments, such as changing the online price, that are outside the variation observed directly in the data. Second, the model allows multiple forms of identification to be brought to bear and combined efficiently in a single estimate. None of the sources of identification I exploit constitutes an ideal natural experiment. Taken together, however, they provide a substantial improvement on the information available in the raw data, lead to sharply different conclusions than would be obtained from naive analysis, and allow us to make progress in understanding a market where the lack of price variation limits the applicability of standard tools.

The next section analyzes the general problem of identifying substitution patterns in a discrete demand model with multiple choices, and provides a brief discussion of related discrete-choice methods. Section II introduces the data and presents reduced-form results on the relationship between print and online demand. Section III specifies the empirical model and estimation strategy, Section IV presents the results, and Section V concludes.

[^7]: Note that the method for calculating the price elasticity described above is based on the assumption that the price of the print edition is set optimally. The alternative assumptions I entertain are then (a) that only the zero online price is suboptimal and (b) that all prices are set optimally. These assumptions are discussed in more detail below.

[^8]: In particular, linear instrumental variables alone provide strong evidence that the print and online papers are substitutes rather than complements (as the raw correlations would suggest).

[^9]: See also Nevo (2000) and Peter C. Reiss and Frank A. Wolak (2005) for a general discussion of the advantages of structural demand models.

## I. Substitution Patterns and Identification

### A. An Illustrative Model

In this section, I use a simple example to examine identification of substitution patterns in a discrete-choice setting where consumers can choose multiple goods. Suppose there are two goods, labeled A and B, and that consumers can choose at most one unit of each. We observe the choices of a large population of consumers. For simplicity, I will not write the dependence of the model on observable characteristics, assuming that all the consumers in the data are ex ante identical from the econometrician’s point of view. The terms below can easily be rewritten as functions of a vector of observables, and the identification arguments interpreted as identification of parameters conditional on this vector.

We can potentially measure three quantities: $P_A$ (the probability of choosing A but not B); $P_B$ (the probability of choosing B but not A); and $P_{AB}$ (the probability of choosing both). The final probability, that of choosing neither good, equals one minus the sum of the other three, so it provides no additional information.

The goal is to estimate the various own- and cross-price elasticities. These may in turn be inputs into the analysis of the welfare from new goods, the effect of a merger, or the change in profits from offering a different mix of products.

Denote the prices of discrete goods A and B by $p_A$ and $p_B$. Income not spent on A or B is used to purchase a continuous composite commodity. Utility from $q$ units of this commodity is $\alpha q$, which enters overall utility linearly. Denote the utility of consuming a bundle $r$ by $u'_r$. A natural quantity to define is the double difference:

$$
\Gamma = (u'_{AB} - u'_B) - (u'_A - u'_0).
$$

This is the discrete analogue of the cross-partial of utility, and measures the extent to which the added utility of consuming good A increases if good B is consumed as well.

Normalizing utility by $u'_0$, we can define:

$$
\begin{aligned}
u_0 &= 0, \\
u_A &= \delta_A - \alpha p_A + \varepsilon_A, \\
u_B &= \delta_B - \alpha p_B + \varepsilon_B, \\
u_{AB} &= u_A + u_B + \Gamma.
\end{aligned}
\tag{1}
$$

Here, $u_r = u'_r - u'_0$, $\delta_A$ and $\delta_B$ are mean utilities, and $\varepsilon_A$ and $\varepsilon_B$ represent unobservable variation in utility. I assume that $\delta_A$, $\delta_B$, and $\Gamma$ are all constant across consumers. Note that these expressions use the fact that the difference between the utility from the composite commodity when good $j$ is purchased, $\alpha(y - p_j)$, and when neither good is purchased, $\alpha y$, is just $-\alpha p_j$.

To make the discussion concrete, I assume the unobservables are distributed as

$$
\begin{bmatrix} \varepsilon_A \\ \varepsilon_B \end{bmatrix} \sim N\left( 0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right).
$$

The normalization of one of the variance terms to one is without loss of generality, since we can divide all utilities by a constant and not change any of the choice probabilities. The normalization of the other is purely to simplify exposition.

### B. Substitution Patterns

Let $F(u)$ be the distribution of $u = (u_A, u_B, u_{AB})$ implied by the assumptions above. Assuming consumers maximize utility, choice probabilities will be given by:

$$
\begin{aligned}
P_A &= \int_u I(u_A \ge 0)\, I(u_A \ge u_B)\, I(u_A \ge u_{AB})\, dF(u), \\
P_B &= \int_u I(u_B \ge 0)\, I(u_B \ge u_A)\, I(u_B \ge u_{AB})\, dF(u), \\
P_{AB} &= \int_u I(u_{AB} \ge 0)\, I(u_{AB} \ge u_A)\, I(u_{AB} \ge u_B)\, dF(u).
\end{aligned}
\tag{2}
$$

A central focus of this paper will be estimating the degree of substitutability or complementarity among products. Throughout the analysis, I will use the standard modern definition of complements (substitutes): a negative (positive) compensated cross-price elasticity of demand. Note that the definition is not based directly on properties of the utility function (see Paul A. Samuelson 1974 for an extended discussion). I show in this section, however, that in the simple model with two goods there is an intuitive relationship between complementarity and the sign of the interaction term, $\Gamma$.

Denote expected demand per consumer for goods A and B by $Q_A = P_A + P_{AB}$ and $Q_B = P_B + P_{AB}$. Because the quasilinear specification of utility causes income to drop out, there are no wealth effects. The elements of the Slutsky matrix are then just the cross-derivatives of demand, and so by the standard definition:

DEFINITION 1: Goods A and B are substitutes if $\partial Q_A/\partial p_B > 0$, independent if $\partial Q_A/\partial p_B = 0$, and complements if $\partial Q_A/\partial p_B < 0$.

Figure 1 shows demand for the goods as regions of $(u_A, u_B)$ space. The first panel shows the case of $\Gamma = 0$, the second panel shows the case of $\Gamma > 0$, and the third panel shows the case of $\Gamma < 0$. To see how the model determines the cross-price derivatives, observe first that increasing $p_B$ is equivalent to shifting probability mass downward. That is, for any point $(a, b)$ in this space, it increases the probability that $u_B \le b$ given that $u_A = a$.

*Figure 1. Illustration of substitution patterns in a model with two goods.* Notes: The figures show the regions of $u_A$–$u_B$ space in which the consumer would choose the bundles A and B, B alone, A alone, or neither good. The first panel shows the case where the interaction between the two goods in utility is zero, the second panel the case where it is positive, and the third panel the case where it is negative.

Consider the first panel. Increasing $p_B$ causes marginal consumers such as *m* to switch from buying the bundle AB to buying A alone. It also causes marginal consumers such as *n* to switch from buying B alone to buying neither good. Neither of these changes has any effect on the demand for good A, however: the increase in $P_A$ is exactly offset by a decrease in $P_{AB}$. This implies that when $\Gamma = 0$, the cross-derivatives of demand for the products will be $\partial Q_A/\partial p_B = 0$, and they are therefore independent.

Next, consider the second panel. Increasing $p_B$ causes consumers *m* and *n* to switch as before. There will now be consumers such as *o*, however, who will switch from buying the bundle AB to buying nothing. This means that the drop in $P_{AB}$ will be larger than the increase in $P_A$, and so $\partial Q_A/\partial p_B < 0$. In the case of $\Gamma > 0$, therefore, the goods are complements.

In the third panel, there are no consumers indifferent between buying AB and buying neither good, but consumers such as *o* are indifferent between buying A alone and buying B alone.
Increasing $p_B$ causes them to switch from buying B to buying A, so that the increase in $P_A$ is larger than the drop in $P_{AB}$. We therefore find that $\Gamma < 0$ implies the goods must be substitutes.

This discussion suggests the quite intuitive result that the interaction term $\Gamma$ is the key parameter for determining the substitutability of goods in a multivariate discrete choice model. Formally, we can substitute into the definition of $Q_A$ and take the derivative with respect to $p_B$ to show that

$$
\frac{\partial Q_A}{\partial p_B} = \int_u \Big[ I(u_A = u_B)\, I(-\Gamma \ge u_A, u_B \ge 0) - I(u_A + u_B = -\Gamma)\, I(u_A \le 0)\, I(u_B \le 0) \Big]\, dF(u).
\tag{3}
$$

The first term inside the integral represents points on the dark diagonal line segment in the third panel of Figure 1, along which consumers are indifferent between buying A alone and B alone. The second term represents points on the dark diagonal segment in the second panel, along which consumers are indifferent between the bundle AB and buying neither good.

Inspection of equation (3) immediately implies the following result.

PROPOSITION 1: Goods A and B are substitutes if $\Gamma < 0$, independent if $\Gamma = 0$, and complements if $\Gamma > 0$.

While I motivate this result in terms of the thought experiment of changing prices, the application below will be to a situation in which the price of one product, the online paper, is fixed at zero. This does not cause any problems in terms of Definition 1, since a price change around zero is well defined. Furthermore, because utility is quasi-linear, the sign of the cross-price derivatives will be the same as the cross-derivatives with respect to other components of utility. This means we could run through the same intuition from Figure 1 for a shift in nonprice dimensions of utility. Suppose, for example, that good A is a print paper and good B is an online paper. Increasing the utility of B by improving connection speed or making the Internet available at work will shift probability mass in the figure upward just as reducing $p_B$ would, and the effect of such a change on $Q_A$ will be determined by $\Gamma$.

For clarity of exposition, this example was restricted to the case of two goods. Gentzkow (2005) shows how the intuition extends to the multi-good case. The situation becomes more complex in a way analogous to standard (continuous) demand theory, but an intuitive link between interaction terms in utility such as $\Gamma$ and substitution patterns continues to hold.

### C. The Outside Option

The interpretation of the outside good in this setting is different from its interpretation in the standard multinomial model. In the standard case, the utility of consuming none of the modeled goods (typically indexed as choice zero) is implicitly maximized over all goods excluded from the model. If we are modeling demand for cars, for example, the utility of good zero for consumer $i$ would capture the utility of that consumer’s best non-car transportation option. It would be the maximum of utility from taking the bus, riding the subway, walking, and so forth.

In a model where choosing multiple goods simultaneously is possible, on the other hand, all choices in the model include such an implicit maximization. In the newspaper application, the data do not include consumers’ consumption of many news sources, such as cable television, radio, Yahoo! news, and so forth. When a consumer in the data is observed to have read the Washington Post on a particular day, it may be that the Washington Post was her only source of news on that day, or it may be that she both read the Washington Post and watched half an hour of CNN. What the econometrician observes is that the maximum utility of bundles that include only the Post is greater for this consumer than the maximum utility of bundles that include any other combination of the observed goods.

One might ask how these unobserved goods will affect the estimated substitution patterns. Suppose, for example, that having watched CNN dramatically reduces the marginal utility of reading the post.com (so that the two are never consumed together) and dramatically increases the marginal utility of reading the Post print edition (so that the two are always consumed together).
720 THE AMERICAN ECONOMIC REVIEW JUNE 2007 Suppose, further, that reading the Post has no D. Identification effect on the marginal utility of reading the post.com. From the discussion above, we Under the assumptions made so far, the know that if the Post and the post.com were model is not identified. There are three observ- the only two goods in the market, they would able data points and five independent parame- be independent in demand. If CNN is present ters: ␦A, ␦B, ⌫, ␣, and . but unobserved, however, we would never see The price coefficient ␣ is identified from the Post and the post.com consumed together, choice data alone if and only if there is variation and so would estimate that they are strong in prices. To see this, note that all predicted substitutes. probabilities would be the same if we replace What is important to recognize is that the the parameters (␦A, ␦B, ␣) by (␦A ⫹ ␣pA, ␦B ⫹ model’s answer in both cases would be correct. ␣pB, 0). With two observed price vectors, on the In a world without CNN, increasing the price of other hand, we gain three additional moments— the Post would have no effect on demand for the any one of these would be sufficient to identify post.com. In a world with CNN, on the other ␣ given the other parameters of the model. hand, increasing the price of the Post would re- In situations where there is no usable varia- duce consumption of both it and CNN, which in tion in prices, ␣ must be inferred by introducing turn would increase consumption of the post.com. an additional moment from some other source. The fact that the true substitutability of a pair of In the application below, this comes from one products will depend on both their direct inter- firm’s first-order condition. 
Although only the action in utility and their indirect interaction via sums ␦A ⫹ ␣pA and ␦B ⫹ ␣pB are identified other goods in the market has long been recog- from demand data, there will be a unique ␣ such nized in classical demand theory (Samuelson that the first-order condition is satisfied at the 1974; Masao Ogaki 1990). The data on con- observed price. sumption of the Post and the post.com will The remaining issue is how to separately allow us to estimate accurately their relation- identify the interaction term, ⌫, and the covari- ship in demand, whether or not we have data on ance of the unobservables, . Intuitively, the consumption of other related goods. These es- mean utilities ␦A and ␦B will be identified by the timates, however, will still be conditional on the marginal probabilities QA and QB. The remain- set of alternative goods available in the market. ing moment in the data will be how often the The estimates provide the correct quantity for goods are consumed together (whether PAB is evaluating the effect of a price change on firm high relative to PA and PB). A high value of PAB profits. The estimated response to removing the can be explained by either a high value of ⌫ or post.com from the choice set will also be cor- a high value of , and there is nothing left in the rect. The effects could change, however, if the data to separate these. prices or characteristics of important unob- Furthermore, Proposition 1 shows that this served goods changed dramatically, and the leaves the substitution patterns in the model data will of course allow us to say nothing about severely unidentified. Without some additional the relationship between the observed and the information, the same data could be fit by as- unobserved products. Note that these latter lim- suming that the goods are nearly perfect substi- itations are shared by all discrete-choice de- tutes (⌫ ⬇ ⫺⬁ and high) or nearly perfect mand models.10 complements (⌫ ⬇ ⬁ and low). 
A model that “solves” the problem by imposing an ad hoc 10 restriction on one of these two parameters will A more subtle issue is how the correct functional form of equation (1) will change in the presence of unobserved third goods. Suppose, for example, that there are three goods A, B, and C, but that only consumption of A and B is information about the functional form u⬘A than we do about observed. If the underlying utilities u⬘A, u⬘B, etc., are linear in max{u⬘A, u⬘AC}. Also, it is equally true in standard discrete price, the terms such as max{u⬘A, u⬘AC} that will actually be choice models that the “true” functional form of utilities estimated will be linear as well. Beyond this, however, there changes in complex ways as we vary the set of outside is no obvious relationship between the functional form of goods. The question in the current setting as always is utility with and without the implicit maximization over whether the functional form is sufficiently flexible to cap- consumption of C. Of course, we really have no more prior ture the important variation in the data.

VOL. 97 NO. 3 GENTZKOW: PRINT AND ONLINE NEWSPAPERS 721

be unlikely to provide a basis for reliable inference about any quantity in which substitutability of the goods plays an important role.

There are, of course, many ways that more moments could be added to the data in order to identify the model. I will briefly discuss two that seem likely to arise frequently in practice and will play a key role in the application. I assume that the necessary technical conditions are satisfied such that the model is identified if and only if the number of moments is greater than or equal to the number of parameters.

The first possible source of identification is exclusion restrictions. Suppose, in particular, that there is some variable x which is allowed to enter the utility of one good, making the mean utility of good A, say, δA(x), but does not enter either δB or Γ. One obvious candidate is the price of good A. In the newspaper application considered in this paper, there is no price variation, but there are consumer-specific observables, such as having Internet access at work, that affect the utility of online but not print newspapers. Having observations at a second value of such an x (call this new vector x′) would add three new moments (PA(x′), PB(x′), and PAB(x′)) but only one new parameter (δA(x′)). The model would therefore be formally identified.

Furthermore, the intuitive basis of the identification is quite strong. Suppose, for example, that the goods are frequently consumed together (PAB is high relative to PA and PB). If this is the result of a high Γ, the goods are complements, and shifting up the utility of good A by moving x should also increase the probability of consuming good B. If Γ is zero and the observed pattern is the result of correlation, the probability of consuming good B should remain unchanged.11

The second possible source of identification is panel data. Extending the model slightly to allow for repeated choices over time, assume that the unobservables are made up of two components: a possibly correlated random-effect term, which is constant within consumers over time, and an additional time-varying component, which is assumed to be i.i.d. across products and time. In the newspaper application, this model would amount to assuming that unobserved correlation in the utilities of different papers is driven by consumer characteristics, such as a general taste for news, that are constant over the course of a week, and that the additional shocks that lead consumers to read on Monday but not Tuesday are uncorrelated.

Now, if we observe each consumer’s choice at two different points in time, we have increased the number of moments from 3 to 15.12 Under the assumption that the random-effect term is constant over time, this is sufficient for formal identification of the model parameters, including the full covariance matrix of the random effects. Intuitively, the argument is just a variant of the usual one for the identification of random effects from panel data. Suppose again that goods A and B are frequently consumed together. If this is the result of correlated random effects, we should see some consumers likely to consume both and some consumers likely to consume neither, but conditional on a consumer’s average propensity to consume each good, the day-to-day variation should be uncorrelated across goods. If it is the result of a high Γ, on the other hand, the day-to-day variation should be strongly correlated: a given consumer might consume both on one day and neither on another day but would be unlikely to consume either one alone.

A special case that will be relevant to the application below is one where the data are not a true panel but include observations on both a single day’s purchases and a summary of purchases over a longer period of time. For example, suppose that consumers in the two-good model make choices on two consecutive days. Suppose we observe the actual choice made on day 1, but not on day 2. We also observe two dummy variables dA and dB, where dj = 1 if product j was chosen at least once over the two days. This clearly contains less information than

11 Michael P. Keane (1992) presents Monte Carlo evidence on the role of this kind of exclusion restriction in identifying the covariance parameters in a multinomial probit model. Since a multinomial probit model defined over bundles effectively nests the model of equation (1), this evidence is relevant. He shows that including exclusion restrictions greatly improves the accuracy of the model.

12 With observations at two points in time, the moments would be the probability of each possible combination of choices over the two periods. When there are 4 choices, this gives 16 possible combinations. The number of moments is one less than this because the probabilities must sum to one.

722 THE AMERICAN ECONOMIC REVIEW JUNE 2007

a true panel would: if both A and B are chosen on day 1, we will have dA = 1 and dB = 1 regardless of the choice on day 2, and the data therefore provide no information on the day 2 choice. On the other hand, if neither good was chosen on day 1, dA and dB will tell us exactly what was chosen on day 2.

Although this is a more limited form of information about choices over time, it can still separately identify the covariance matrix of the random effects and thus distinguish true complementarity from correlation. To see the intuition for this, consider, first, observations on consumers who chose neither product on day 1. The data will allow us to observe exactly what these consumers chose on day 2. If the variance of the random effects is small, conditioning on the fact that they chose neither good on the first day does not change their choice probabilities on day 2; we should expect the latter to be exactly the same as the choice probabilities in the sample as a whole for day 1. If the variance of the random effects is large, on the other hand, the fact that these consumers did not purchase on day 1 would predict that they would also be less likely to purchase on day 2. We can therefore think of these consumers as identifying the variance of the random effects. The correlation term will then be identified by consumers who chose either A or B, but not both, on day 1. For a consumer who chose A only on day 1, we will see dB = 1 if and only if B was chosen on day 2. If the random effects are strongly positively correlated, observing a choice of A on day 1 suggests that the consumer will be relatively more likely to choose B on day 2. If they are negatively correlated, such a consumer should be less likely to choose B on day 2.

E. Relationship to Past Literature

The model of equation (1) provides a useful starting point for understanding the existing approaches in the literature to estimating discrete choices when multiple goods are chosen simultaneously. To make the discussion concrete, suppose we have micro data on demand for two goods, A and B. Suppose that the frequency with which the goods are consumed together is high relative to the frequency with which either is consumed alone. I will discuss what these data would imply for several existing approaches in the literature.

One approach is the multiple-discrete choice model pioneered by Igal Hendel (1999) and applied by Jean-Pierre Dubé (2004). These models assume that the data are generated by an aggregation over a number of individual choice problems, or “tasks.” For example, Hendel (1999) estimates demand for PCs by corporations. In this case, a task might represent a single employee’s computing needs. Each agent chooses a single good for each task, which makes the task-level problem analogous to equation (1) with ΓAB = −∞.13 Because the utility from using a given good in one task does not depend on what goods were chosen for other tasks, aggregating over a large number of these tasks is similar to aggregating over a population of heterogeneous consumers in a standard multinomial discrete choice model. The model therefore restricts the goods to be substitutes.14

A second approach is the multivariate probit (applied, for example, by Angelique Augereau, Shane Greenstein, and Marc Rysman forthcoming). Here, consumption of each good is assumed to be driven by a separate probit equation, with errors possibly correlated across equations. This is exactly equivalent to equation (1) with ΓAB = 0, and so restricts all goods to be

13 Both papers allow consumers to choose multiple units of each good, so the task-level choice is more complicated than a standard multinomial discrete choice problem. But the utility specification implies that consumers will choose at most one type of good for each task.

14 A different parametric restriction on the Γ interaction terms underlies the model of Tat Y. Chan (2006). He defines goods to be a bundle of characteristics, and assumes that the utility of a bundle is a function of the sum of each characteristic across the different goods. The bundle consisting of a bottle of Diet Coke and a bottle of Diet Pepsi, for example, consists of two units of the characteristic “cola,” two units of the characteristic “diet,” and one unit each of the characteristics “Coke” and “Pepsi.” Because utility is assumed in the main specification to be concave in the total of each characteristic, it is subadditive across goods, meaning ΓAB < 0. This would again imply that the products must be substitutes. (Chan does find complementarity among some products in a specification with many goods, which appears to result from indirect substitution effects as described above.)

independent in demand (all cross-elasticities are zero).15

A third approach is to estimate a logit or nested logit model defined over the set of all possible bundles. Papers that take this approach include Charles F. Manski and Leonard Sherman (1980) and Kenneth E. Train, Daniel L. McFadden, and Moshe Ben-Akiva (1987). Because each bundle’s utility is parameterized separately, the ΓAB term could be estimated freely (although both of these papers restrict the interactions as a parametric function of the goods’ characteristics). The unobservables, on the other hand, are either assumed to be uncorrelated (in the case of the logit) or have a correlation structure dictated by the nests, which is too restrictive to allow the kind of correlation in unobservables that equation (1) permits. Given the hypothetical data, we would expect such a model to find ΓAB > 0, implying that the goods would be complements.

The main difference between the current framework and those that exist in the literature is thus a more flexible specification of the way goods interact in utility and the correlation of unobservable tastes. The functional forms for observable and unobservable utility that have been used in the past impose strong restrictions on substitution patterns: for a given set of observations, one could choose models from the literature that would imply that the goods are strong substitutes, independent, or strong complements. In certain settings, such assumptions will be justified, and making them has the obvious benefit of allowing the researcher to analyze larger choice sets than the one considered here. In other settings, the necessary prior information is not available, and it will be critical to allow a more flexible structure and address directly how substitution patterns are identified by the data.

II. A First Look at the Data

A. The Scarborough Survey

The empirical analysis is based on a survey of 16,179 adults in the Washington, DC, Designated Market Area (DMA), conducted between March 2000 and February 2003 by Scarborough Research. The Washington, DC, DMA includes the District of Columbia itself, as well as neighboring counties in Virginia, West Virginia, Pennsylvania, and Maryland. The data include a range of individual and household characteristics of the respondents, as well as information on various consumption decisions. Most importantly for the current application, these include an enumeration of all local print newspapers read over the last 24 hours and 5 weekdays, as well as readership of the major local online newspapers over the same periods.

Washington, DC, has two major daily newspapers, the Washington Post and the Washington Times. The former is dominant: average daily readership of the Post was 1.8 million in 2000–2003, compared to 256,000 for the Times. The two papers also differ in their perceived political stance, with the Times generally thought to be more conservative than the Post. The main online newspaper is the post.com, which had an average of 406,000 area readers per day.16

I will define the goods in the model to be daily editions of the Post, the Times, and the post.com. The outside alternative will include other print and online newspapers, other news sources such as television and radio, and the choice not to consume news at all. As noted above, all choices in the model represent an implicit maximization over these outside goods: the observed choice to read the Post only, for example, includes consumers who

15 The discrete-continuous framework of Jaehwan Kim, Greg M. Allenby, and Peter E. Rossi (2002) also assumes the equivalent of ΓAB = 0 (that the utility of a bundle is simply the sum of the utilities of the underlying goods). The conclusion that the goods must be independent does not hold here, however, because the utility of the outside composite commodity is allowed to be concave rather than linear. This implies that all goods will be substitutes, though with a single curvature parameter governing all the cross-elasticities as well as the elasticity of total expenditure on the inside goods.

16 Readership figures are based on the Scarborough survey. Note that these readership numbers are larger than circulation figures for the same papers, reflecting the fact that multiple consumers read each copy. The Times also has an online edition, the washingtontimes.com, but its readership is very small and there are only 373 readers in my sample. In practice, this turns out to be too few to accurately estimate utility parameters for the washingtontimes.com, and so I omit it from the analysis.

read both the Post and the New York Times, or read the Post and watch TV news.

Table 1 gives summary statistics for the Scarborough data along with corresponding census figures for the Washington, DC, DMA. The survey is approximately a 0.4 percent sample, and is broadly representative, with some overrepresentation of older, more educated, and wealthier individuals, and some underrepresentation of minorities. The survey includes sampling weights to correct for this overrepresentation. I will use the unweighted data for estimation, and use weights when I simulate aggregate effects.17

TABLE 1. SUMMARY STATISTICS

| | Scarborough survey | Washington, DC, DMA (census) |
|---|---|---|
| N | 16,179 | 4,203,621 |
| Median income | $62,500 | $60,774 |
| Black | 20.6% | 23.5% |
| Hispanic | 6.4% | 7.9% |
| Female | 57.9% | 52.1% |
| Age distribution: | | |
| 18–29 | 17.5% | 21.3% |
| 30–39 | 22.6% | 23.4% |
| 40–49 | 22.2% | 21.7% |
| 50–59 | 17.9% | 15.8% |
| 60+ | 19.8% | 17.7% |
| Highest schooling: | | |
| < High school | 7.7% | 14.4% |
| High school | 47.0% | 42.1% |
| College | 27.2% | 26.3% |
| Graduate | 18.0% | 17.2% |

Notes: The Scarborough survey is a randomized sample of residents of the Washington, DC, DMA 18 years of age and older. All census figures refer to the population of individuals 18 years of age and older, except percent black and Hispanic, which are proportions of all residents. Median income is the population-weighted mean of the median incomes of counties in the Washington, DC, DMA.

B. Reduced-Form Results

FIGURE 2. READERSHIP OF NEWSPAPERS IN WASHINGTON, DC (1961–present)
Notes: Scarborough Research readership figures are derived by using historical circulation data and the ratio of readership to circulation in the 2000–2003 Scarborough data.
Source: Audit Bureau of Circulations.

Figure 2 displays the daily readership of Washington, DC’s print and online newspapers since 1961. The first thing to note is that the rapid increase in post.com readership since its introduction in 1996 has been accompanied by a drop in Post readership. A simple OLS regression of Post readership since 1984 on post.com readership and a time trend gives a significantly negative coefficient, and suggests that it takes four post.com readers to reduce Post readership by one. Although it might be tempting to take this as direct evidence that the print and online editions are substitutes, several factors make such a conclusion dubious. First, the downward trend in Post readership begins in 1994, two years before the post.com was introduced, and it does not accelerate significantly thereafter. Second, newspaper readership has been declining for many years nationally and there are many demand-side trends that could account for the downward slide of the Post. Finally, the downward trend in Post readership coincides with a series of increases in the Post’s subscription price, and it would be difficult to separate these price effects from the effect of the post.com using aggregate time series alone. For these reasons, getting a handle on the impact of the post.com will require bringing additional information to bear on the problem.

Figure 2 also provides evidence about the extent of substitutability among different print papers. The exit of the Washington Star in 1981

17 In addition to including weights, the raw Scarborough data also correct for respondents who filled out an initial questionnaire but not the longer survey by filling in a small number of these consumers’ survey responses using the responses of other consumers matched by demographics. These “ascribed” observations are easy to identify because the probability of two respondents with the same sampling weight matching perfectly on all survey responses by random chance is very low. I omit these observations (about 6 percent of the initial sample) in all estimation, but include them in the policy simulations in order to get the correct match to aggregate demographics.

and the Washington News in 1973 led to increases in the readership of the remaining papers, suggesting some substitutability. In both cases, however, the exit led to declines in total readership, and fewer than half of the readers of the exiting paper appear to have switched to one of the remaining papers. In terms of the Post and the Times, the time series provides no evidence of a negative relationship. A linear regression of Post readership on Times readership actually gives a positive coefficient (though insignificant), even when a time trend is included. Of course, these regressions do not distinguish substitutability from changes in demand or characteristics of the products over time.

Turning to the Scarborough micro data, the first thing to note is that readership of multiple papers is common. Forty-eight percent of consumers reported reading at least one of the Post, Times, or post.com in the last 24 hours. Of these consumers, 18 percent reported reading two of the papers, and 1 percent reported reading all three. Over a five-day window, 65 percent of consumers read at least one of the papers; of these, 27 percent read two papers and 3 percent read all three. Table 2 reports the number of consumers reading the Post and the post.com over 24-hour and 5-day windows. It is immediately clear from this table that combined readership of print and online news is common. In fact, the fraction of online readers who also read the print edition is higher than the corresponding fraction among those who do not read online.

TABLE 2. CROSS TABULATION OF POST AND POST.COM READERSHIP

| 24-hour | Didn’t read post.com | Read post.com |
|---|---|---|
| Didn’t read Post | 8,771 | 622 |
| Read Post | 5,829 | 877 |

| 5-day | Didn’t read post.com | Read post.com |
|---|---|---|
| Didn’t read Post | 6,012 | 680 |
| Read Post | 7,203 | 2,204 |

Table 3 reports raw and partial correlation coefficients for each pair of papers. The partial correlations control for age, sex, education, industry of employment, employment status, income, political party, date of survey, location of residence within the DMA, and number of missing values in the survey.18 Readership of the Post and post.com are significantly positively correlated over both 24-hour and 5-day windows. Controlling for observable characteristics reduces this correlation by about two-thirds, but it remains significant at the 0.1 percent level. The correlation between readership of the Post and the Times is also significantly positive in the raw data, but this disappears when controls are added. The partial correlation is essentially zero over a 24-hour window, and significantly negative over a 5-day window. The correlation between the Times and the post.com is never significantly different from zero.

What can we conclude from these results? The basic fact in the raw data is that a consumer who reads any one paper is on average more likely to have also read a second paper. If all heterogeneity in utilities were uncorrelated across papers, this would be strong evidence that all three are complements. An alternative explanation is that the kind of consumers who get a lot of value from reading the Post also get a lot of value from reading the post.com and the Times. The fact that the positive correlation decreases dramatically when we partial out the effect of observables provides direct evidence for this. The question is whether the remaining correlation, in particular the positive correlation between the Post and the post.com, represents true complementarity or additional unobserved correlation in tastes.

To separate these stories, I will exploit variables that should have a strong effect on the utility of reading the online newspaper, but should have no direct effect on the utility from reading in print. First, I include a dummy variable measuring whether the consumer has Internet access at work. Being able to access the Internet at work clearly reduces the time cost of reading online, but should not directly affect the utility from reading in print. Second, I include two dummy variables indicating whether the consumer uses the Internet for either work-related or education-related tasks. Performing these tasks should

18 These correlations drop consumers for whom either print or online papers were excluded from the choice set as discussed in the demand specification below.

lead consumers to be more familiar with the Internet and spend more time at their computers, both of which should decrease the effective cost of reading news online, but not directly affect the utility of print reading. Finally, I include a dummy variable indicating whether the consumer has a high-speed Internet connection at home. This, too, should increase the utility from reading online without directly affecting the utility from reading in print.

TABLE 3. CORRELATION COEFFICIENTS

| | 24-hour, raw | 24-hour, partial | 5-day, raw | 5-day, partial |
|---|---|---|---|---|
| Post–post.com | 0.0989** | 0.0364** | 0.1579** | 0.0673** |
| Post–Times | 0.0632** | 0.0035 | 0.0450** | −0.0623** |
| Times–post.com | 0.0146 | 0.0090 | 0.0184 | 0.0066 |

Notes: The table displays correlation coefficients between dummy variables for reading the Post, post.com, and Times. In the first two columns, the variable is equal to one if a respondent read in the last 24 hours. In the last two columns, the variable is equal to one if a respondent read in the last five weekdays. Partial correlations are correlations in the residuals from regressions of each consumption dummy on controls for age, sex, education (four categories), white-collar worker, computer worker, employment status, income, political party, date of survey, location of residence within the DMA (six categories), and dummy variables for the number of missing values. Observations where either print or online newspapers were not in the choice set (consumer reports that she generally reads no newspaper sections or did not use the Internet in the last 30 days) were dropped.
** Significant at 1 percent.

Note that an important limitation of the data is that I do not have variables that could be assumed to shift the utility of the Post print edition but not the Times print edition, or vice versa. I discuss below how the model is identified despite this limitation. In the robustness section, I also show that the results remain qualitatively unchanged when the Times is excluded from the analysis completely, suggesting that limited identification on the print side does not bias the estimates of the Post-post.com relationship. The reader should bear in mind, however, that the lack of such excluded variables means that the print-print substitution patterns should be interpreted with more caution than the print-online substitution patterns.

One way to see the effect of these excluded variables and perform some checks on their validity is to use them to instrument for online reading in a linear probability model of print reading. There are several problems with such a specification: it does not restrict probabilities to be between zero and one, it restricts the cross-derivative between print and online to be the same for all consumers, and it does not use information from the other choice equations. It shows in an intuitive way, however, how the exclusion restrictions contribute to identification.

The first column of Table 4 shows estimates from a linear probability model of readership over the five-day window, using the same controls as in the partial correlations above. Reflecting the positive correlation noted earlier, the coefficient on reading the post.com is positive and significant in this OLS regression. The second column presents two-stage least squares (2SLS) estimates using the excluded variables as instruments. The coefficient on online reading becomes significantly negative. The magnitude suggests that if we could do an experiment and randomly assign individuals to read the online paper (at zero time cost), they would be on average 40 percent less likely to read the print paper, though the limitations of the linear probability model mean this magnitude must be interpreted with caution. The F statistic on the instruments in the first stage is 33.05, suggesting that weak instruments are not a problem (James H. Stock and Motohiro Yogo 2002). The χ² statistic for the overidentification test in this regression is 3.65, with a p value of 0.302, so the null hypothesis that the instruments are valid cannot be rejected.

A possible concern is that even if the excluded variables do not affect the utility of reading print newspapers directly, they might be correlated with
