The Valuation of Unprotected Works: A Case Study of Public Domain Photographs on Wikipedia

What is the value of works in the public domain? We study the biographical Wikipedia pages of a large data set of authors, composers, and lyricists to determine whether the public domain status of available images leads to a higher rate of inclusion of illustrated supplementary material and whether such inclusion increases visitorship to individual pages. We attempt to objectively place a value on the body of public domain photographs and illustrations which are used in this global resource. We find that the most historically remote subjects are more likely to have images on their web pages because their biographical life-spans pre-date the existence of in-copyright imagery. We find that the large majority of photos and illustrations used on subject pages were obtained from the public domain, and we estimate their value in terms of costs saved to Wikipedia page builders and in terms of increased traffic corresponding to the inclusion of an image. Then, extrapolating from the characteristics of a random sample of a further 300 Wikipedia pages, we estimate a total value of public domain photographs on Wikipedia of between $246 and $270 million per year.

their copyrights purportedly add to the economy and the losses copyright owners suffer from infringement. 2 This type of evidence, while initially compelling, is reductionist and overestimates the role of exclusive copyright in promoting the health of creative industries. One major flaw in such research is that not all of the inputs to creative production attract copyright or are dependent on permission from a rightsholder. For example, alternative valuation schemes have sought to enumerate the 'fair use' economy, citing realms of economic activity (such as software coding) which overlap traditional definitions of the creative industries. 3 Another source of input to creative production is works and ideas residing in the public domain. This paper represents an effort to place a value on the contribution of public domain works to the world's largest digital encyclopedia project, Wikipedia.
Those advocating a robust public domain and resistance to the expansion of copyright law have a harder time estimating the value of the body of works they believe should be kept freely available. 4 We include in our definition of the public domain: 1) works whose copyright term has expired; 2) works whose copyrights were extinguished due to the failure by their owners to observe various legal formalities; 3) works never subject to copyright because their creation pre-dated the legal recognition of copyright; 4) works expressly dedicated for free use 5 by their authors, including U.S. government works; 5) objects without authors, such as facts; and 6) objects dedicated to the public by international agreement, such as ideas and concepts. 6 As a matter of law, items in the above categories may be used by any member of the public without paying a license fee. 7 Putting a monetary value on works in these six categories is difficult. The corpus of Shakespeare's work is in the public domain and few would debate its value, but no one has yet calculated it in economic terms. Many commentators have also made the case that copyrighted works frequently rely on some form of valuable public domain input (think of Disney's use of public domain stories and characters in Cinderella, Snow White, Beauty and the Beast, Pocahontas, etc.), 8 but disentangling the public domain inspiration for those works and putting a monetary value on that input proves elusive. 9 This situation creates a rhetorical imbalance, as copyright expansionists 10 come to policymakers with seemingly hard figures while public domain advocates fight back with anecdotes and intuition. 11
This article is one of the first attempts to quantify in monetary terms a portion of the public domain. Calculating the entire value of all public domain works is likely overly ambitious, but the value of some of its discrete corners may be partially quantified. 12 We believe that the empirical example we provide can demonstrate to policymakers more precisely how the absence of copyright can add economic value to a set of discrete works. In Part I, we discuss the problems economists face when trying to value both copyrighted and public domain works. So far, most attempts to place a monetary value on copyrights have focused on quantifying private value to owners (usually royalty streams) rather than net social welfare, which is the relevant touchstone for policymakers. We also explore the existing literature outlining the circumstances when public domain status increases net social welfare.
[...] the public domain has huge value without attempting to place a monetary figure on any part of it).
5 For the purposes of this paper, therefore, we include works which may be freely used under a Creative Commons license, even though in many cases the author technically retains title.
In Part II, we examine the use of public domain images (mostly photographs) on Wikipedia pages, one of the largest and most popular common-access websites, and form several hypotheses about the value the images add to the web site. We ask: 1. For a given topic which spans a period of time when images will be both in and out of copyright [...]
In Part III, we develop a methodology for estimating the value added by public domain images to Wikipedia pages in terms of costs saved to the page developer and increased traffic to the page. Our study focused on two large samples of Wikipedia pages, one of authors and another of composers and lyricists.
We collected, among other information, data on the birth and death dates of each subject, the date an image (if any) appeared on the subject's Wikipedia page, the legal justification for the inclusion of the image, and changes in page traffic between 2009 and 2014. We also collected pricing data from services like Corbis and Getty Images that license digital reproductions of public domain images for online use. Then, using a random sample of Wikipedia pages, we extrapolate our findings from the author and composer/lyricist data to the entire web site. We conclude that the value of public domain photos on Wikipedia is approximately $246 to $270 million per year. In Part IV, we identify some policy implications of the study with particular reference to proposed orphan works legislation.

I. THEORETICAL CHALLENGES TO VALUING PUBLIC DOMAIN WORKS
Putting a monetary value on a tangible asset, say a copy of the ninth edition of Richard Posner's Economic Analysis of Law (currently $199.50 on Amazon.com), is usually quite straightforward. The market price will serve as an accurate, and sometimes perfect, proxy. Calculating the value of Posner's copyright, a unique and intangible right to exclude others from copying the book, is more challenging. Calculating the value of the free availability of Blackstone's Commentaries on the Laws of England (currently costless on the Project Gutenberg web site 13 and priced at $2.99 on Amazon.com), is even trickier. Before attempting to estimate a value of any item in the public domain, and before we attempt to estimate the value of public domain images on Wikipedia, we examine and account for two basic valuation challenges: the difficulty of differentiating the value of copies and legal rights, and differentiating private value and social welfare. We elucidate these challenges below.

A. Differentiating the Value of Copies and Legal Rights
When valuing a copyright, economists make an essential distinction between the value of tangible works sold by the copyright owner (a book, 10 DVDs, 100 song downloads) and the right to exclude others from copying or performing the work, which is the fundamental characteristic of the copyright. 14 A copyright, being unique, is hard to value, but copyrights are sometimes sold, and one can then use the market price as an accurate proxy for value. 15 In addition, in the absence of evidence from the direct sale of a copyright, econometricians can infer its value, with varying degrees of accuracy, from royalty streams paid by licensees. 16 Alternatively, when a copyright is neither sold nor licensed (imagine the copyright in a self-published book), valuation can sometimes be informed by the income stream generated by sales of the tangible work.
However, a valuation based on the income stream generated by sales of the tangible work protected by copyright presents two problems. First, that income stream can only be used to establish a minimum value because one typically cannot know whether the present copyright owner is the highest-value user. The self-published book referenced above might generate far more revenue in the hands of a large traditional publisher capable of more efficiently exploiting the market. Second, calculating the portion of the profits generated by the copyright itself is difficult. Penguin, for example, publishes both copyrighted and public domain books in its selection of Penguin Classic Editions. 17 A study of these "classics" showed that the public domain editions sold on average for $11.10 while the copyrighted editions sold on average for $14.60, suggesting an additional profit to Penguin of $3.50 for each copyrighted volume. 18 Is $3.50 per volume the value of the Penguin copyrights? Probably not. Penguin is obligated to pay its authors a royalty for each copyrighted volume sold, which diminishes its profit margin. If the authors earn 20% per book, 19 Penguin would earn only about 4% more on sales of the copyrighted editions than on the public domain editions: the $3.50 premium is roughly 24% of the $14.60 price ($3.50/$14.60 = .24), and the 20% royalty absorbs most of it. 20 Nonetheless, the income represented by the 4% premium might serve as a proxy in calculating a minimum value of the copyright to the publisher.
In one important sense, the task of valuing a public domain work is easier than valuing a copyrighted work. As noted above, valuing a copyright requires valuing the legal right to exclude, and sales of tangible copies of the work are problematic proxies for that exclusive right. However, in the absence of any legal rights surrounding a work, value is much more directly a function of the measurable income stream a work generates.
If a public domain work generates $100,000 in profits each year for those selling it, then consumer willingness to pay can establish at least that baseline value. 21 Unfortunately, the nature of the public domain itself complicates matters. Consider the value of the Adventures of Sherlock Holmes, 22 one of the public domain books published by Penguin Classic Editions. 23 Penguin might be coaxed into revealing the amount of profits earned by sales of its edition, but when we queried Amazon for "Sherlock Holmes," we obtained over 5,000 results. Hundreds of different publishers, not to mention movie and television show producers, have taken advantage of the public domain status of the great detective and are currently marketing thousands of versions and adaptations. As a practical matter, it is profoundly difficult to gather the information necessary to directly calculate the total amount of profits earned by a public domain book, song, or fictional character. We offer a more indirect means of measuring value in Part II below.
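The Penguin comparison in Part I.A reduces to a short piece of arithmetic. The sketch below is only an illustration of that calculation, using the prices and the 20% royalty rate quoted in the text; the function and variable names are our own and do not come from the underlying study.

```python
# Figures quoted in the text for Penguin Classic Editions.
PUBLIC_DOMAIN_PRICE = 11.10  # average price of public domain editions (USD)
COPYRIGHTED_PRICE = 14.60    # average price of copyrighted editions (USD)
AUTHOR_ROYALTY_RATE = 0.20   # royalty share assumed in the text


def copyright_premium(pd_price, c_price, royalty_rate):
    """Return (gross premium in USD, premium as a share of price, share net of royalty)."""
    gross = c_price - pd_price
    gross_share = gross / c_price
    net_share = gross_share - royalty_rate
    return gross, gross_share, net_share


gross, gross_share, net_share = copyright_premium(
    PUBLIC_DOMAIN_PRICE, COPYRIGHTED_PRICE, AUTHOR_ROYALTY_RATE
)
print(f"gross premium per volume: ${gross:.2f}")
print(f"premium as a share of price: {gross_share:.0%}")
print(f"share remaining after royalty: {net_share:.0%}")
```

The result is a lower bound at best: it says nothing about whether another publisher could extract more value from the same copyright.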

B. Private Value v. Social Welfare
Changes to copyright law can affect the value of both copyrighted works and works in the public domain. 24 An objective policymaker must determine whether society will be better off after proposed changes to the law come into effect. In doing so, the focus is on the change in the social value of copyrights, not on the value of particular copyrights to private individuals. This point is made concisely by economist Rufus Pollock, who imagines a novel that initially sold for £10 in shops while it was under copyright, but that had its price reduced to £5 when it fell into the public domain and became freely available on the internet:
"Sometimes it is suggested that this results in a reduction in the value of that work for society since before the work was 'worth' £10 but now is 'worth' only £5 or even nothing. [This is] completely false. The value of the work has not changed at all. All that has happened is that the price has dropped. A consumer who previously valued the book at, say, £15 and who paid £10 and was left with £5 of 'consumer surplus', now pays £5 (or £0) and is left with £10 (or £15) of 'surplus'." 25
In Pollock's hypothetical, the copyright owner has suffered a serious loss of profits. At a minimum, it will have to lower its price by £5 in order to compete, and it will surely lose sales. Yet, society is better off because the book is still available and at a lower price to consumers. Although difficult to quantify, the amount of consumer surplus resulting from the change in legal status is the value of the public domain in this instance.
21 Relying solely on the income stream may produce values that are both too low and too high: too low because later entrants may be more efficient at exploiting the work, so its latent value may be underestimated; and too high because some of the revenue may be driven by advantages not inherent in the work itself (the income generated by Sony's choice of a public domain work as a new game platform will be partially driven by the value of the existing network of locked-in PlayStation users).
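Pollock's hypothetical can be restated as a small calculation. The following is a minimal sketch, assuming a single consumer with a fixed £15 valuation as in the quoted passage; the function name and the price labels are illustrative and are not taken from Pollock.

```python
def consumer_surplus(willingness_to_pay, price):
    """Surplus kept by a buyer; zero if the price exceeds what the buyer would pay."""
    if price > willingness_to_pay:
        return 0.0  # no purchase takes place
    return willingness_to_pay - price


VALUATION = 15.0  # what the hypothetical consumer thinks the book is worth (GBP)

scenarios = [
    ("under copyright", 10.0),
    ("public domain, in shops", 5.0),
    ("public domain, free online", 0.0),
]
for label, price in scenarios:
    surplus = consumer_surplus(VALUATION, price)
    print(f"{label}: price £{price:.0f}, surplus £{surplus:.0f}")
```

The valuation stays at £15 in every scenario; only the split between what the consumer pays and what the consumer keeps changes.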
Unfortunately, industry estimates of copyright value provided to lawmakers typically only estimate private value. 26 For example, as Congress considered an additional 20-year extension of the copyright term in 1998, the Congressional Research Service, relying on industry estimates, found that revenue to private copyright owners would decrease $330 million through the year 2017 unless the term extension was passed. 27 The report was in essence finding that the law would increase consumer spending by $330 million over a twenty-year period. The report contained no consideration of what consumers would gain from the legally mandated expenditure. Even copyright skeptics have made the mistake of conflating private value with social value. 28
An illustration might help. Suppose that a property owner owns a strip of land (distant and not visible from his residence) over which people must travel to reach a beach. The property owner charges people $10 to cross his property to get to the beach. We could monetize his right to exclude by looking at the amount of money he gets in payments. Now suppose we want to monetize a public access easement across the property. The property owner might rightfully complain that doing so would decrease the value of his property by the market value of his right to exclude (so if 1,000 people a year crossed his property, by $10,000/yr.). But it would be odd to suggest that creating a public access easement makes $10,000 disappear from the economy. Rather, the $10 stays in the pockets of beachgoers, who then spend it on something else. And in addition, of course, many beachgoers who were not willing or able to pay $10 each for access are now able to go to the beach. So the $10,000 a year is a minimal measure of the amount of public benefit that accrues from the public access easement.
25 See Pollock, supra note 4 at 6.
26 See sources cited in notes 1-2.
27 See Rappaport, supra note 2 at
28 See Computer & Communications Industry Association, Fair Use in the U.S. Economy: Economic Contribution of Industries Relying on Fair Use (2010) (finding that "industries relying on fair use," e.g. educational institutions, software developers, internet search and web hosting providers, etc., generate $4.4 trillion in revenue).
In 2006, two commentators did attempt to directly measure the social value of copyright in terms of consumer surplus. In order to measure the economic effect of illegal music downloading, Rob and Waldfogel surveyed students at Penn on their copying behaviors and how they valued particular musical works. 29 Rob & Waldfogel found that illegal downloading reduced the amount of student expenditures on music on average by $25 per year per student. 30 This constituted revenues lost to the record companies. They also found, however, that consumer surplus increased by over $70 per student due to the inability of the record companies effectively to price discriminate. 31 In other words, when students were willing to pay only $10 for an album that was priced at $15, they illegally downloaded the album for free, generating $10 in consumer surplus without an offsetting revenue loss to the record companies because no sale would have occurred in the absence of the download. This is a rare attempt directly to quantify consumer welfare in the copyright context.
Pollock's hypothetical and Rob and Waldfogel's study may seem to suggest that in every case the public is better off when a work becomes freely available. Such a conclusion must be subject to two major assumptions. First, the initial term of copyright must be set long enough and the scope of protection must be robust enough to stimulate the creation of the work in the first place. If the term of protection is too short, for example, then the work may never be produced and society may be worse off. Second, the work must remain available to the public after it falls into the public domain. If the lack of copyright causes the work to disappear, then the public is worse off and we should prefer prolonged copyright protection. 32
32 [...] and music become more available to the public when they fall into the public domain). In addition, some have also argued for a third caveat, asserting that copyright protection might be necessary to prevent the "tarnishment" of the work. This is not a widely accepted argument and existing empirical work suggests tarnishment is unlikely, even when a work is associated with pornography against the wishes of its owner. See Chris Buccafusco & Paul J. Heald, Testing Theories of Tarnishment (on file with the authors).
In the next section, we show a continuing need to develop techniques for quantifying the value of the public domain even when incentives and availability might be impaired by a change in legal status.

C. Incentives and Availability
The continued conflation by policymakers of private value and social welfare creates an urgent need for improving econometric tools for quantifying the value of the public domain. As long as lobbyists assert that the size of royalty streams to private owners is a proper measure of public welfare, 33 then policymakers will need to be confronted with hard figures on the value of the public domain.
In addition, occasions may arise when copyright owners can show that a gap in protection results in the serious diminishment of incentives to create new works. For example, Rob and Waldfogel showed that on average student subjects spent $25 less per year because of their illegal downloading activity. 34 The recording industry might be able to link that revenue loss to the public release of fewer songs and then quantify the value of the missing works. 35 In order for a policymaker to evaluate the wisdom of a change in copyright law designed to re-balance incentives, the offsetting consumer benefit from illegal downloading quantified by Rob and Waldfogel would provide additional relevant data (and may explain why a copyright-friendly Congress has failed to adequately address illegal file sharing). Since illegal downloading essentially nullifies the copyright status of a work, the study provides an instructive example of the usefulness of valuing the public domain.
Finally, copyright owners have claimed that bad things happen when works fall into the public domain, 36 asserting that works will disappear when they no longer have an owner. 37 Lack of availability of works to the public would present a social welfare problem that could be quantified. 38 In fact, even a vague estimate might suffice to convince policymakers to extend the term of copyright protection indefinitely because the countervailing public domain value of works that have gone missing would presumably be zero.
33 See Rappaport, supra note 2.
[...] invest in them due to the problem of free riding. Items which retain enough value for future use should be given indefinite copyrights to maintain their value.").
So far, copyright owners have been unable to demonstrate any negative effect on availability caused by the transition to public domain status. 39 In fact, the opposite seems to be true. Several studies have shown that works become more available when they fall into the public domain:

Figure 1

The 1998 term extension implicitly relied on the notion that the absence of protection would result in diminished distribution and dissemination. 42 Two other studies have shown that public domain music is just as likely to appear in movies as copyrighted music from the same era (1913-1932). 43 Public domain status does not seem to cause an availability problem. As copyright owners continue to push for term extensions, one can see two distinct valuation issues present themselves in Figure 1. First, how great is the consumer surplus (the public domain value) embodied in the large volume of pre-1923 works that are now being offered? Second, by how much is consumer welfare diminished by the absence of post-1923 books that have gone out of print? An answer to the first question has not been attempted to our knowledge, while Smith, Telang, and Zhang suggest a figure of $860 million in unrealized consumer surplus represented by books that are currently out of print and unexploited in eBook format by their copyright owners. 44 Finding hard numbers to better answer these questions may help Congress more accurately predict the effect on social welfare of another round of term extensions.

II. VALUING PUBLIC DOMAIN IMAGES ON WIKIPEDIA: METHODOLOGY
Given the Adventures of Sherlock Holmes experience described above, 45 we do not attempt to measure the value that public domain status adds to books. In addition to the measurement problems caused by the existence of multiple publishers of most public domain book titles, publishers keep their revenue and sales data secret, 46 frustrating outsider attempts at valuation. Similar challenges exist when attempting to quantify the value of public domain music or film. Instead, we focus on quantifying the value of public domain images on Wikipedia, a forum which is exceedingly transparent and amenable to data mining. Notably, the value that public domain images add to Wikipedia is not based on their transfer value (Wikipedia is not a market for the sale of images), so revenue streams to or from page builders, users, or the Wiki community need not be considered.
The valuation task, however, is hardly straightforward. Surely a Wikipedia page is more valuable if it contains an image illustrating its subject, but how much value is added? One could attempt to survey users' stated valuations, as did Rob and Waldfogel with popular music, 47 but we doubt subjects could do anything more than guess at the value added by images. Most people are not familiar with market prices for online images, nor are they used to paying for access to online resources like Wikipedia. Instead, we posit that the public domain stock of images could indirectly add value in two ways. First, page builders save transaction costs and, potentially, licensing fees by using free images rather than negotiating with the copyright owner of an image. 48 Second, the Google search engine, which directs a majority of traffic on the worldwide web, prioritizes web pages with images over pages without images; 49 therefore, Wikipedia pages with public domain images should experience more average views than pages without images. Since page visits can be valued according to the equivalent average advertising revenue generated per visitor (and a page view on Wikipedia has an estimated potential value of $.0053), 50 the value of any extra traffic driven by the image should be calculable.
Our first research question considers the scope of the effect of the reservoir of public domain images on page building: 1) Does the existence of an image on a Wikipedia page correlate with the historical scope of available public domain images? Further research questions focus on the value of an image on authors', composers', and lyricists' pages and the value of the set of all public domain images on Wikipedia: 2) How much does the availability of public domain images lower the cost of web page building? 3) How much does the addition of an image to a web page increase traffic to that page? 4) Could the total value of cost savings and increased traffic on Wikipedia be calculated by reference to the characteristics of our sample of Wikipedia pages?
47 See supra notes 29-32 and accompanying text.
48 Wikipedia itself does not pay for permission to include images on pages, but an individual page builder can pay a copyright owner for a license to include an image.
Influenced by the unpublished attempt of Abhishek Nagaraj to value images of baseball and basketball players on Wikipedia, 51 we addressed our first research question by identifying the pages of 362 authors who had at least one bestseller on the New York Times bestseller list from 1895-1969. 52 These authors were born between 1829 and 1942, and constituted a wide mix of subjects. In the United States, all works published before 1923 are in the public domain, 53 so one group of authors could be represented only by a public domain image (those who died before 1923), while a second group could only be represented by a copyright-eligible photo (those born after 1923), and a third group could be represented by either a public domain or protected image (the subset of authors whose lives spanned the 1923 date). If author age were the only relevant factor, one might expect authors with earlier birth dates to have fewer images. After all, photographs disappear over time, so the more recent authors should have the highest percentage of images. We propose the opposite: the older the author, the more likely a public domain image will be available and the more likely an image will be used for the subject.
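The three-way grouping of authors around the 1923 date can be written down as a simple classification rule. This is a sketch under stated assumptions rather than the procedure actually used in the study: the `Author` record, the `image_pool` function, and the three sample entries are hypothetical, and an author born or deceased in 1923 itself is treated here as spanning the date.

```python
from dataclasses import dataclass

PUBLIC_DOMAIN_CUTOFF = 1923  # U.S. publication cutoff cited in the text


@dataclass
class Author:
    name: str
    born: int
    died: int


def image_pool(author, cutoff=PUBLIC_DOMAIN_CUTOFF):
    """Classify which pool of images could depict the author."""
    if author.died < cutoff:
        return "public domain only"
    if author.born > cutoff:
        return "copyright-eligible only"
    return "either"  # the author's life spans the cutoff


# Hypothetical entries for illustration only; not drawn from the study's data.
sample = [
    Author("Author A", 1829, 1900),
    Author("Author B", 1880, 1950),
    Author("Author C", 1930, 2000),
]
for author in sample:
    print(author.name, "->", image_pool(author))
```

With real data, the share of pages carrying an image could then be tabulated per group and compared against the birth-date trend discussed in Part III.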
To this end we collected data on the birth and death dates of each author; the year of his or her first bestseller; the number of his or her bestsellers; the date (if any) an image of the author was added to his or her Wikipedia page; the source of the image; the legal status of the image; and the legal justification offered by the web page builder for the use of the image.
As a measure of potential costs saved by the availability of public domain images, we searched the two largest photo licensing agencies, Corbis 54 and Getty Images, 55 for all the photographs we found on the Wikipedia pages, and calculated the average licensing fee they charged for digital copies of photos which could be obtained freely from other sources.
In order to measure the value of potential increases in viewership due to image presence, we also counted the number of page views [...]. To isolate the effect of image presence on traffic, we also collected data on changes in word count in all authors' pages from June 2009 to June 2014. We also counted the number of book reviews for each author's most-reviewed book on Amazon.com.
To augment our findings for author web pages, we collected much of the same data for 792 composers and lyricists. Finally, we used the Wikipedia random page search function to generate a list of 300 random web pages in order to estimate image use and traffic on Wikipedia as a whole for the purposes of extrapolating the findings from our research on the author, composer, and lyricist web pages.

III. VALUING PUBLIC DOMAIN IMAGES ON WIKIPEDIA: FINDINGS
We discuss below the answers to our four primary research questions. We find that the existence of a large public domain reservoir of photographs increases the likelihood that a web page will contain an image and then proceed to estimate the value added by those images.

A. The Public Domain and Author Pages
The reservoir of free public domain works increases the likelihood that an author web page will contain an image. This is seen most clearly when considering the birth dates of the authors in our sample. All things being equal, one would assume that authors with earlier birthdates would have relatively fewer images of them on their web pages. After all, a person born in 1830 should be less likely to be represented in a photograph than someone born in 1900. Photography has become cheaper and more popular over time, while the older a photograph, the less likely it is to survive. Our data, however, show the opposite trend in terms of inclusion of photographs on Wikipedia:

Figure 2: Percent with Image on Wiki Page
As the figure above clearly shows, the earlier the author's birth date, the more likely a searcher will find an image of that author on his or her Wikipedia page. The most likely reason for this trend is the reduced availability of public domain images for the newer authors. 56 Only half of the 112 authors born after 1910 have images on their Wikipedia pages. The image shortage almost certainly does not stem from a lack of photos of more recent authors, but rather from higher acquisition costs associated with the copyrighted status of the later pool of photos.
Using the date of death of our sample authors, rather than their date of birth, reveals much the same trend.

Figure 3
Page builders' reliance on the public domain is borne out by the legal status of the photos used on the author Wikipedia pages. Wikipedia requires that page builders document the source of each image and provide a legal justification of its use. 57 The vast majority are public domain images, although some fair use is claimed. 58
[...] domain images in the sense we use the term: works that may be freely used by Wikipedia page builders. 59

Figure 4
Web page builders typically justify their use of an image in five different ways.

Figure 5

59 About 10% of the images we found on the authors' pages were used freely by permission of an author who has used a Creative Commons license. In such a case, the author technically retains title but grants permission to the world to freely use the image under certain circumstances, for example, with attribution given to the author. Approximately 75% of images in the Wikimedia library, an image source frequently used by page builders, are used subject to some sort of Creative Commons license. The other 25% are in the public domain due to the expiration of copyright or the failure of the copyright owner to observe some sort of formality like notice, registration, or renewal.

Most commonly, the copyright on the image has simply expired (PD-Expiry), 60 while in other cases the person taking the photograph has dedicated it for free public use, usually by referencing a form of Creative Commons license (PD-Dedicated). 61 Some page builders take advantage of photographers who fell afoul of U.S. formalities that at one time required authors to register or renew their works or publish them with certain notice requirements (PD-Other). 62 Within the smaller realm of copyrighted images, the page builders typically claim fair use or obtain permission from the rights holder; otherwise they are not supposed to use a copyrighted image at all. The existence of a large and vibrant public domain clearly increases the number of images available on author web pages. Data from random page searches support this conclusion. Fifty percent of 300 pages collected through Wikipedia's random search function contained images. 63 Approximately 87% of the time, web builders cited the public domain as the source of an image. Approximately 8% of the time, the web builder relied on fair use of a copyrighted image, while 5% of the pages contained both copyrighted and public domain images.

B. Costs Saved by Page Builders
Web page builders on Wikipedia save a significant amount of resources by using free public domain images. Sixty-six percent (240/362) of the author Wikipedia pages sampled contained images of the author, and 79% of those images were in the public domain. The cost savings to page builders can be estimated by examining the prices for equivalent photos charged by the two largest licensors of images to web pages: Corbis (library of 100 million images) and Getty Images (library of 80 million images). 64 Both Corbis and Getty license images of many of the authors in this study, and sometimes they license exactly the same public domain image as used by Wikipedia page builders. 65 Based on price information gathered in December 2014, Corbis typically charged $105 to license an image of a historically important author for online use for a single year, while Getty regularly charged $117 per image for a year's use on a non-commercial web site. Curiously, in our sample, functionally identical digital versions of more than 10% (25/240) of the public domain images used on author Wikipedia pages are currently being licensed by Corbis or Getty at these rates. For 104 other public domain author images, Corbis or Getty license similar, but not identical, images of the authors. The average charge for all images was approximately $120 per year for online usage. For the tiny slice of Wikipedia that constitutes our sample of historical authors, page builders saved approximately $77,400 over a five-year period (129 public domain images x $120/year x 5 years) over the cost of licensing. Of course, this is presented as a thought exercise, since the Wikimedia Foundation's entire annual budget amounts to $45.9 million, 66 and an equivalent commercial information service provider would likely seek to avoid paying full retail price for licensed digital images by negotiating a blanket deal.
60 See supra note 49.
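The five-year savings figure for the author sample can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are our own, and the inputs are the counts and the approximate $120 average fee reported in the text.

```python
# Inputs as reported in the text (names are illustrative, not from the study).
identical_images = 25     # public domain images also licensed by Corbis or Getty
similar_images = 104      # public domain images with a similar licensed counterpart
avg_annual_fee_usd = 120  # approximate average yearly online license fee
years = 5


def licensing_cost_saved(n_images: int, annual_fee: int, n_years: int) -> int:
    """Cost of licensing n_images at annual_fee per year for n_years."""
    return n_images * annual_fee * n_years


priced_images = identical_images + similar_images
five_year_savings = licensing_cost_saved(priced_images, avg_annual_fee_usd, years)

print(f"{priced_images} images -> ${five_year_savings:,} over {years} years")
```

The calculation covers only the 129 images for which a commercial counterpart was found, which is why it uses fewer images than the 240 illustrated author pages.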
Moreover, costs saved by page builders are not a direct measure of the value that an image creates for consumers, but they might serve as a decent proxy. Using the traffic statistics function of Wikipedia, we estimate that our 240 authors with images received approximately 28 million page views in 2014. What was the value to consumers of seeing images on these pages? What would they be willing to pay for an image-enhanced page? If page builders had to obtain licenses from Corbis or Getty to use these images, the total cost for the year 2014 would have been approximately $28,800 (240 x $120/year). This means that the per page view cost would have been about 1/10 of a penny. Would users of Wikipedia be willing to pay a penny for every 10 images they encountered on its web pages? A human subjects experiment might best be able to elicit an answer to this question, but the view from an advertiser's perspective may support the reasonableness of the proxy. We conclude in Part III that the presence of an image increases page views by approximately 19%. If this is correct, then images drove 5,320,000 of our authors' page views in 2014. If the WebInDetail estimate of a $.0053 value for each Wikipedia page view is also correct, then the advertising value of the images on our author web pages is $28,196, almost the same as our "cost savings" estimate above. Finally, we note that in a world without public domain photos, Wikipedia might be willing to enter into a blanket licensing agreement with Getty and Corbis, which might significantly lower the $120 per year per image. If this hypothetically lower price could be calculated, it might serve as a more accurate proxy for cost savings.
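The comparison between the licensing-cost proxy and the advertising-value proxy can be checked with a short calculation. This is a minimal sketch using the figures stated in the text; the variable names are our own, and the 19% image effect and the $.0053 per-view value are taken as given rather than derived here.

```python
# Figures as stated in the text; names are illustrative.
pages_with_images = 240
avg_annual_fee_usd = 120
page_views_2014 = 28_000_000
image_traffic_share = 0.19   # share of views credited to images (Part III estimate)
value_per_view_usd = 0.0053  # per-view advertising value attributed to WebInDetail

# Licensing-cost proxy: what one year of licenses would cost, in total and per view.
annual_license_cost = pages_with_images * avg_annual_fee_usd
cost_per_view = annual_license_cost / page_views_2014

# Advertising proxy: views credited to images times the per-view value.
image_driven_views = page_views_2014 * image_traffic_share
advertising_value = image_driven_views * value_per_view_usd

print(f"annual license cost: ${annual_license_cost:,}")
print(f"cost per page view:  ${cost_per_view:.5f}")
print(f"image-driven views:  {image_driven_views:,.0f}")
print(f"advertising value:   ${advertising_value:,.0f}")
```

The two proxies land within a few hundred dollars of each other, which is the coincidence the paragraph relies on.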

C. Increased Traffic to Wikipedia Pages
The presence of a public domain image is commonly believed to increase traffic to a web page, 67 but the magnitude of the effect is difficult to estimate. In general, author pages with images generate substantially more traffic than pages without images. A number of adjustments were made in order to isolate the effect of the presence of an image from the relative popularity of the authors in the study.

Adjusting for Popularity Based on Amazon Book Review Statistics
Author popularity was measured in terms of the number of reviews for the author's most reviewed book on Amazon.com. More popular books should garner more reviews, and the market response to an author was hypothesized to be a reasonable proxy for public stature. 68 Authors with and without images were grouped into four categories: those with 0-9 reviews, 10-29 reviews, 30-99 reviews, and 100-199 reviews. The results show a very robust increase associated with the presence of an image.

Figure 6
For example, we compared 76 authors with images who had fewer than 9 reviews with 57 authors without images who had fewer than 9 reviews. The author pages with images had on average 75% more page views in March, April, and May of 2014. This large increase associated with image use struck us as implausible and suggested that the number of Amazon reviews may be a poor proxy for author popularity. Gertrude Stein's most frequently reviewed book, for example, has only 49 reviews on Amazon, while Betty Smith's has 1140. The Amazon adjustment for author popularity suggests that images may have some influence on page traffic, but we decided to employ several more sophisticated matched pairs analyses in order to better distinguish the impact of image presence from the impact of differential author popularity.
68 Using revenue data would be ideal, but those figures are proprietary. Using sales rank on Amazon as a proxy for revenue is made impossible because many of the most popular works of the authors studied are in the public domain and therefore are represented by dozens and sometimes hundreds of different editions on Amazon, stymying the estimation of overall sales. See Heald, How Copyright Keeps Works Disappeared, supra note 32, at 840-41.

Matched Pairs Treatment #1 Shows 6% Traffic Increase for Authors
As a more precise measure of popularity, we identified a set of authors whose Wikipedia pages initially received an image after June 1, 2009, and we counted the number of views for these authors' pages for the three months immediately prior to June 1, 2009. 69 Each author was paired with another author of similar popularity whose pages never contained an image. The popularity pairings were based on a Over the five-year period studied, the pages with images saw an increase in traffic of 32%, while the pages without images saw a net increase of only 26%. (The increase in overall traffic on Wikipedia during this time period was 22%.) The matched pairs analysis therefore showed a significantly lower net image effect (+6%) than the popularity groupings based on Amazon data presented above.

Matched Pairs Treatment #2 Shows 17% Traffic Increase for Authors
A second matched pairs analysis was conducted to better account for variations in web traffic caused by factors other than the addition of an image to a page. The first set of matched pairs demonstrated substantial volatility in month-to-month web traffic, indicating that a variety of exogenous factors might have affected traffic levels. For example, in April 2009, Vladimir Nabokov's page was viewed 41,891 times, while the next month it was viewed 56,552 times. Schools assigning an author's book or the release of a film could easily result in short-term spikes in page visits. As a method of minimizing the impact of external factors, the lowest month of page views for the year preceding June 2009 was identified for each author. The slowest month of traffic for any author was used as a measure of the author's ambient popularity, less likely to be affected by exogenous spikes in interest. As in the earlier matched pairs analysis, authors without images as of June 2009 were selected, and those authors with images added after June 2009 were paired with similar authors whose Wikipedia pages never contained an image. For example, Michael Gold (123 views in the lowest month preceding June 2009) was paired with Harvey Wheeler (123 views in the lowest month during the same period).
The lowest page-view month in the year preceding June 2009 was compared with the lowest page-view month for the year preceding June 2014. A comparison of 42 tightly-matched pairs saw a 36% increase in traffic to the author pages containing an image, while traffic to pages without an image increased only 19% over the same five-year period. This matched pairs analysis netted a 17% increase in traffic associated with the presence of an image.
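The lowest-month matching procedure can be sketched roughly as follows. This is not the authors' code: the `Page` structure, the 5% matching tolerance, the greedy pairing rule, and the use of aggregate (rather than per-pair average) growth are all assumptions made for illustration, and the two sample pages are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    name: str
    gained_image: bool
    views_before: List[int]  # monthly views in the year before June 2009
    views_after: List[int]   # monthly views in the year before June 2014


def baseline(monthly_views: List[int]) -> int:
    """Lowest-traffic month, used as the measure of ambient popularity."""
    return min(monthly_views)


def match_pairs(pages: List[Page], tolerance: float = 0.05) -> List[Tuple[Page, Page]]:
    """Greedily pair each page that gained an image with the closest unused control."""
    treated = [p for p in pages if p.gained_image]
    controls = [p for p in pages if not p.gained_image]
    pairs = []
    for t in treated:
        if not controls:
            break
        t_base = baseline(t.views_before)
        best = min(controls, key=lambda c: abs(baseline(c.views_before) - t_base))
        if abs(baseline(best.views_before) - t_base) <= tolerance * max(t_base, 1):
            pairs.append((t, best))
            controls.remove(best)
    return pairs


def group_growth(pages: List[Page]) -> float:
    """Aggregate growth in lowest-month views between the two periods."""
    before = sum(baseline(p.views_before) for p in pages)
    after = sum(baseline(p.views_after) for p in pages)
    return (after - before) / before


def net_image_effect(pairs: List[Tuple[Page, Page]]) -> float:
    """Growth of pages with images minus growth of their matched controls."""
    return group_growth([t for t, _ in pairs]) - group_growth([c for _, c in pairs])


# Invented sample data, chosen to mirror the 36% vs. 19% pattern in the text.
sample = [
    Page("with_image", True, [100, 140, 180], [136, 200, 210]),
    Page("no_image", False, [100, 120, 300], [119, 150, 160]),
]
pairs = match_pairs(sample)
print(f"pairs: {len(pairs)}, net effect: {net_image_effect(pairs):.2f}")
```

Using the minimum month as the baseline means a one-off spike, such as a film release, affects neither the matching nor the growth figure unless it lasts the whole year.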

Matched Pairs Treatment #3 Shows 22% Traffic Increase for Composers and Lyricists
The analysis of changes in traffic to authors' web pages after making adjustments for relative author popularity netted three different increases: 100%, 17%, and 6%. Although the 17% figure generated by the second matched pairs analysis struck us as the least affected by non-image-related exogenous factors, we decided to mine a large database of well-known composers and lyricists from approximately the same era as a robustness check and to increase the number of data points. 70 We repeated both of the matched pairs techniques we used with our data set of authors.
We established 77 pairs and compared the number of page views during the period of March, April, and May 2009, before any composer or lyricist page acquired an image, with the number of page views in March, April, and May of 2014, after half of the pages acquired an image. The pairs were very tightly matched. Pages that never acquired an image had 209,116 aggregate page views in March, April, and May of 2009, while pages that later acquired an image had 209,294 aggregate page views over the same three-month period. Between 2009 and 2014, the traffic to pages with images increased 56% while the traffic to pages without images increased only 34%, resulting in a net increase in traffic to pages with images of 22%.
Interestingly, we observed a lower level of month-to-month volatility in this data set and speculate that the lower variation in month-to-month traffic may have been due to the fact that our composers and lyricists were less famous. For example, they averaged only half as many page views during March, April, and May of 2014 as did our list of authors. Although the March, April, and May comparisons of page traffic on composer and lyricist web pages showed less volatility than the parallel comparison made on the author web pages, we proceeded to engage in a comparison of the lowest traffic months in 2009 and 2014 that had earlier resulted in the 17% net traffic increase figure for the authors. We were able to assemble 68 tightly matched pairs based on the lowest traffic month for each composer and lyricist in 2009 before any sample page contained an image. 71 Over the five-year period, traffic to pages with images increased 40% while the traffic to pages without images increased only 21%, resulting in a net increase of 19%.

Controlling for Changes in Verbiage
In order to control for the possible effect of increased verbiage on the web pages over the five-year period studied, the number of words present on all author web pages in June 2009 was compared to the number of words present on the same pages in June 2014. The change was virtually identical for the set of web pages with images and without images. Over five years, the pages with images saw an increase in word count of 66% while the pages without images saw an increase in word count of 67%. Any increase in traffic to the web pages with images does not seem to be driven by the growth of text as opposed to the addition of the image. 72

D. Extrapolating the Data to Estimate the Value of Public Domain Photographs on Wikipedia as a Whole
The analysis of a sample of 300 Wikipedia pages collected through its random search function enables us to extrapolate the author, composer, and lyricist data to Wikipedia as a whole. We offer a rough estimate of the total value of public domain photographs on Wikipedia.

Cost Savings on Wikipedia as a Whole
Public domain photographs surely enable all sorts of page builders to add images without incurring the cost of negotiating or paying licensing fees. A random sample we collected of 300 Wikipedia pages shows that 50% contain images, and 87% of those page builders cite the public domain as the source of the image. If the random sample is representative of Wikipedia as a whole, 73 then public domain images can be seen on 1,983,609 Wikipedia pages (4,560,201 [total Wikipedia pages as of July 18, 2014] x .50 x .87). Given that Corbis and Getty routinely charge $105 and $117 respectively to license a photographic image for a year on the internet, this suggests a net savings to page builders of between $208 million and $232 million per year. We recognize that these figures are only a proxy for consumer surplus, but as discussed above, 74 it may be a plausible proxy. As we noted, it is based on what the two largest players in the market believe they can extract from consumers, and on a per-image-viewed basis it represents only a fraction of a cent per view.
71 Due to a clerical error, the year-long period measured was 6/2009 to 5/2010, which caused us to have 9 fewer pairs than in the prior analysis, which had included 77 pairs.
72 It is possible that the number of links inserted in a page or the frequency of editing also affects web traffic to a page, but we were unable to measure these elements.
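The range of Wikipedia-wide licensing savings follows from multiplying the estimated number of pages carrying a public domain image by each licensor's annual fee. The sketch below takes the page count of 1,983,609 as given from the text; the names are our own.

```python
# Figures as stated in the text; names are illustrative.
pd_image_pages = 1_983_609  # estimated pages displaying a public domain image
corbis_annual_fee_usd = 105
getty_annual_fee_usd = 117

low_estimate = pd_image_pages * corbis_annual_fee_usd
high_estimate = pd_image_pages * getty_annual_fee_usd

print(f"low:  ${low_estimate:,} (about ${low_estimate / 1e6:.0f} million)")
print(f"high: ${high_estimate:,} (about ${high_estimate / 1e6:.0f} million)")
```

The estimate assumes one licensed image per page at the full retail rate, so it inherits both the representativeness of the 300-page sample and the retail-price assumption.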
Nonetheless, the estimate is rough for several reasons. In many circumstances, neither Corbis nor Getty may have an appropriate stock photo available for use on a particular page. In that case, the savings accruing to the page builder who uses a satisfactory public domain photo would best be measured in terms of the cost saved by not having to take the photo. This would vary. For example, one of the random pages is about "Netley Heath," a rural location in England. 75 If the page builder can walk out his front door and snap a picture of the heath, then the costs saved by the existence of an easy-to-locate public domain photo would be quite small. On the other hand, if the page builder for "Netley Heath" is in the USA, the savings could be substantially larger.
Additionally, it should be noted that active photographic opportunities present themselves most frequently in the context of the 25% of Wiki pages about "places," like "Netley Heath" or "Ely Place" 76 or the "Shudehill Interchange" 77 (all pages from the random sample). Images for biographical pages or pages about events are often impossible for a page builder to photograph. People and past events are often not available to be photographed, no matter how much the page builder is willing to spend. Among the random pages, 27% were biographical and 5% were about events in the past (for example the "Taiyo Department Store Fire in 1973" 78 ). For the one-third of Wikipedia pages that consist of biographical or event entries, the cost savings from using a public domain photograph are best estimated in reference to saved licensing fees for existing photos. A final category of random pages, "things" (43% of the total), represents a mixed bag of accessibility to photographers. If one is in north Texas, it would be relatively easy to snap a photo of the "Denton County Transportation Authority." 79 On the other hand, finding a south Asian "Banded Kingfisher" 80 willing to pose for a photograph raises greater difficulties. 81 Whether using a measure based on saved licensing fees or costs saved in locating and shooting photos, we are comfortable with estimating a cost savings in the neighborhood of $208 to $232 million per year based on the saved fees rationale and using that as a rough proxy for consumer surplus.

Increased Traffic Due to Public Domain Images on Wikipedia as a Whole
Estimating the value added by increased traffic due to public domain images is complicated by the difficulty of isolating the effect of author popularity and other exogenous factors from the effect of the addition of an image. Using the number of Amazon reviews to control for author popularity generated an increased traffic figure that seemed extremely high (over 100%), while the month-to-month volatility of the March, April, and May 2009 figures for authors also rendered its 6% finding suspect.
Our final three matched pairs analyses converge much more closely. Using a lowest-month technique, we believe we were able to better control for exogenous effects on page views and we found a traffic increase for authors of 17% and an increase for composers and lyricists of 19%. Furthermore, the March, April, and May comparison for composers and lyricists obtained a similar result, a net 22% increase in traffic. We use the average of these three figures (19%) in estimating the net value of increased traffic to Wikipedia as a whole due to the widespread use of public domain images.
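The 19% figure is the rounded mean of the three net effects. The short sketch below recomputes each net effect from the with-image and without-image growth rates reported in the text; the dictionary keys are our own labels.

```python
# (growth with image, growth without image) in percent, as reported in the text.
analyses = {
    "authors, lowest month": (36, 19),
    "composers and lyricists, lowest month": (40, 21),
    "composers and lyricists, March-May": (56, 34),
}

# Net effect is the difference in percentage points between the two groups.
net_effects = {label: with_img - without_img
               for label, (with_img, without_img) in analyses.items()}
average_net_effect = sum(net_effects.values()) / len(net_effects)

for label, effect in net_effects.items():
    print(f"{label}: +{effect} points")
print(f"average: {average_net_effect:.1f} points")
```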
In order to derive a total value for increased traffic associated with the use of public domain images on Wikipedia, we multiply the total number of Wikipedia pages by .5 (the percentage of pages in the random sample with images) and then by .87 (the percentage of random pages with images that rely on public domain works). We then calculate the average number of annual page views for each page with an image (18,966) 82 and credit .19 of those views to the presence of the public domain image. Finally, we multiply by the value assigned to a Wikipedia page view by Webindetail.com, which finds that Wikipedia is currently averaging 413,270,000 page views per day with an overall unrealized advertising value of $2,210,000. This works out to $.0053 per page view. 83 In total, therefore, we estimated the value of the increased traffic at almost $38 million per year.
Again, we emphasize that the figure of almost $38 million is a proxy for consumer surplus. It is more directly a measure of a premium that advertisers would be willing to pay due to traffic increases caused by the inclusion of public domain images. Nonetheless, it captures a monetary value that Wikipedia would be able to realize were it willing to accept advertising. Given the free access and non-profit nature of Wikipedia, it is not too fanciful to see that surplus as inuring primarily to its consumers. We note that for some works unconsidered here, advertising revenue might well be the best proxy. Before cable television, programming was monetized exclusively through advertisements sold to those promoting consumer goods. The best way to value a television show during its run in the 1970s would be the advertising revenue it generated. Of course, the cost to consumers would have been spread nearly invisibly in terms of slightly higher prices (plus the value of their time spent watching the commercial, where that cost exceeded the information value of the commercial), but one might fairly impute a consumer willingness to pay.
79 http://en.wikipedia.org/wiki/Denton_County_Transportation_Authority
80 http://en.wikipedia.org/wiki/Banded_kingfisher
81 Of course, the size of licensing fees is probably affected by the work of amateur photographers who make their work available for free.
82 We identified each random page with an image and counted page views for the most recent 90-day period and multiplied by four to estimate an annual viewership for each page. The 18,966 figure is the average number of annual views per page.
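The traffic-value calculation described above can be laid out step by step. This is a sketch under the text's own figures: it takes the estimated 1,983,609 pages with public domain images as given and uses the per-view value rounded to $.0053, as the text does. The variable names are our own.

```python
# Figures as stated in the text; names are illustrative.
pd_image_pages = 1_983_609         # estimated pages displaying a public domain image
avg_annual_views_per_page = 18_966
image_effect = 0.19                # share of views credited to the image
daily_page_views = 413_270_000
daily_ad_value_usd = 2_210_000

# Per-view advertising value, rounded to four decimals as in the text.
value_per_view_usd = round(daily_ad_value_usd / daily_page_views, 4)

image_driven_views = pd_image_pages * avg_annual_views_per_page * image_effect
traffic_value_usd = image_driven_views * value_per_view_usd

print(f"value per view:    ${value_per_view_usd}")
print(f"image-driven views: {image_driven_views:,.0f}")
print(f"traffic value:     ${traffic_value_usd:,.0f}")
```

Because the result is a product of four estimated quantities, small changes in any one of them (for example using 17% or 22% instead of 19%) move the total by several million dollars.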

IV. POLICY IMPLICATIONS
Public domain photos on Wikipedia clearly increase net social welfare under either a cost-savings measure ($208-$232 million) or a traffic-increasing measure ($38 million). We believe, however, that it is proper to aggregate the two figures to better estimate a net value for the images. Our estimate of cost savings is based on a per-website license price charged by Corbis and Getty Images. Since neither Corbis nor Getty adjusts its price to account for web traffic, the savings we hypothesize do not vary with the number of page views. The savings are the same whether 100 or 100 million people visit an author site. This means that the value of any increased traffic is independent of the cost savings and can be added to it in our estimate of the overall value of public domain photographs on Wikipedia, resulting in a final estimate of between $246 million and $270 million per year. 84
Since the primary purpose of this paper has been to demonstrate one possible method for estimating the value of public domain works, we will only briefly note some possible policy implications. First, we believe that the time has come for evidence-based policymaking in U.S. copyright law. The Intellectual Property Office of the United Kingdom has already endorsed the Hargreaves Report, which concludes that no further changes to U.K. intellectual property law should be made in the absence of sound empirical evidence. 85 Econometric tools exist to help inform legislators of the social welfare effects of copyright legislation they consider enacting, and the time has come for a more objective legislative process to emerge.
84 Imagine a 20-store bakery chain that invents a process to make a new and delicious gluten-free donut. The invention has two benefits for the bakery. First, it allows the chain to avoid the state gluten tax, which saves it $1000 per year per store. Second, the new donuts are delicious and revenues increase by $1 million per year. The value of the invention to the bakery chain is $20,000 + $1 million per year. On the other hand, if the bakery's savings varied with the number of extra donuts sold, then aggregating the two figures would not be appropriate. For example, if the invention makes it $1 cheaper to make each donut, which enables the bakeries to lower their prices and sell more donuts, one cannot merely add the increased revenues to the cost savings and calculate the value of the invention to the bakery chain's bottom line because the total savings are a function of the number of extra donuts sold.
85 See HARGREAVES REVIEW OF INTELLECTUAL PROPERTY LAW AND GROWTH: GOVERNMENT RESPONSE 3 (Aug. 1, 2011) ("Fundamentally, the Government agrees with not only the Review's headline conclusion but also with its underlying critique: too many past decisions on IP have been supported by poor evidence, or indeed poorly supported by evidence."), available at https://www.gov.uk/government/publications/hargreaves-review-of-intellectualproperty-and-growth-government-response. See also Ian Hargreaves, DIGITAL OPPORTUNITY: A REVIEW OF INTELLECTUAL PROPERTY AND GROWTH 8 (May 2011) ("Government should ensure that development of the IP System is driven as far as possible by objective evidence. Policy should balance measurable economic objectives against social goals and potential benefits for rights holders against impacts on consumers and other interests. These concerns will be of particular importance in assessing future claims to extend rights or in determining desirable limits to rights."), available at https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/32563/ipreview-finalreport.pdf.
More specifically, our study suggests that massive social harm was done by the most recent copyright term extension that has prevented millions of works from falling into the public domain since 1998. 86 Public domain works have a quantifiable monetary value which can be used to estimate consumer surplus. As long as the transition to public domain status does not lessen the availability of works to the public, then we find no economic justification for further retroactive extensions of copyright protection for existing works.
Finally, we conclude that our study provides a strong justification for the enactment of orphan works legislation that has languished in Congress for years. 87 The proposed legislation would limit remedies to injunctive relief and a reasonable royalty when the unauthorized user of a copyrighted work cannot locate its owner after engaging in a diligent search. Similar legislation has already been passed in the U.K., freeing access to an estimated 91 million works. 88 Orphan works are works that are technically protected by copyright, but their owners are unfindable by ordinary means. Around the world, the difficulty of sourcing photographs presents the most urgent case for orphan works reform. 89 Even a cursory examination of photographs in older books and magazines reveals the problem. Often no credit is given at all for a photograph, or the photographer listed is long dead or cannot be located. Even when a copyright is clearly claimed, it is often impossible for a potential user/licensee to determine whether it was properly renewed and is therefore still protected. Many, probably the vast majority, of copyrights in photographs were never registered or renewed at all. Imagine a photograph of President Franklin Roosevelt in a 1935 newspaper. In order to be protected by copyright the photo must have been registered and then renewed in 1963, but the Copyright Office web site does not allow users to limit their searches to only photographs. And finding a file entry entitled "Franklin Roosevelt" on the web site does not ensure that the newspaper photo sought to be used is the one referenced in the registration. Even a visit to the copyright office in Washington, D.C. (which will be necessary because renewal records are not online) will not