The Huanan Seafood Wholesale Market in Wuhan was the early epicenter of the COVID-19 pandemic

Understanding how severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in 2019 is critical to preventing zoonotic outbreaks before they become the next pandemic. The Huanan Seafood Wholesale Market in Wuhan, China, was identified as a likely source of cases in early reports but later this conclusion became controversial. We show the earliest known COVID-19 cases from December 2019, including those without reported direct links, were geographically centered on this market. We report that live SARS-CoV-2 susceptible mammals were sold at the market in late 2019 and, within the market, SARS-CoV-2-positive environmental samples were spatially associated with vendors selling live mammals. While there is insufficient evidence to define upstream events, and exact circumstances remain obscure, our analyses indicate that the emergence of SARS-CoV-2 occurred via the live wildlife trade in China, and show that the Huanan market was the epicenter of the COVID-19 pandemic.

influence to outliers like those that can be seen in fig. S8. The lineage A cases were analyzed both as a pair and as a single case, because the non-hotel case is known not to be subject to ascertainment bias: it was recognized by Dr. Zhang Jixian at Xinhua Hospital before any notion of a connection of COVID-19 to the Huanan market existed (5) (see Lineage 'A' and 'B' case locations, below).
For the age-matched Wuhan population density data, the weighted median haversine distance to the Huanan market was calculated, with the weights assigned to each point being the pixel value of each point, which represents the population density for the 100 m by 100 m square with that point at its center. We also calculated the median haversine distance of the 737 Weibo help seekers from 8 January to 10 February 2020 (26).
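The two quantities used in this paragraph, the Haversine distance and a weighted median, can be sketched as follows. This is a minimal illustration rather than the code used for the study: the Earth-radius constant, the function names, and the convention of returning the smallest value at which cumulative weight reaches half of the total are all assumptions of the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total weight."""
    if not values:
        raise ValueError("values must not be empty")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # only reached through floating-point rounding
```

In this sketch, `values` would hold the distance from each 100 m by 100 m pixel center to the market and `weights` the corresponding pixel values.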
To test whether the December cases were closer to the Huanan market than expected, null distributions were generated from the population density data and the Weibo data, with sample sizes matching the six groups of cases listed above for the December cases. For the population density data, sampling weights were assigned to each point, the weight being the pixel value, such that points representing higher population densities had a correspondingly higher probability of being sampled. For the population density nulls, 5.75% of the cases were sampled from the 80 and older age group population density data, 12.64% were sampled from the 70-80 age group population density data, 23.56% were sampled from the 60-70 age group population density data, 25.29% were sampled from the 50-60 age group population density data, 21.84% were sampled from the 40-50 age group population density data, 7.47% were sampled from the 30-40 age group population density data and 3.45% were sampled from the 20-30 age group population density data. As noted, these percentages correspond to the percentage of December COVID-19 cases in each age group among the WHO cases we analyze in this study.
For the eleven lineage B cases, the two lineage A cases, and the single lineage A case, the sample size was too small to accurately sample the corresponding percentages of cases from each age group population density data set, so an additional population density data set was generated by merging the seven age groups' population density data sets. To merge them, the pixel values at each point were summed across all age groups; before summing, each pixel value was assigned a weight corresponding to the proportion of cases from that age group that we would expect to sample. This weight was simply the proportion of early December cases that came from that age group. For each point in each pseudoreplicate the Haversine distance to Huanan was calculated, and the median or mean distance to Huanan was calculated for each pseudoreplicate. The median (or mean for n=2) distances to the Huanan market for all the early December cases (n=155), the cases epidemiologically linked to the Huanan market (n=35), the cases not epidemiologically linked to the market (n=120), the confirmed cases (n=65), the clinically diagnosed cases (n=79), the lineage B cases linked to the market (n=11), and the lineage A cases (n=2) (as well as one of those lineage A cases analyzed on its own as a special case because it could not possibly have been subject to ascertainment bias) were compared to these null distributions. In the case of the single lineage A case and its corresponding null, instead of comparing median distances of each pseudoreplicate, this single distance was compared to the distance to Huanan from 1,000 single points drawn at random from the population density dataset null created with the methodology described above.
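The construction of an age-stratified null distribution and its comparison with an observed median can be sketched as below. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, per-stratum counts are obtained by simple rounding (so they may not sum exactly to the number of cases), and the empirical one-sided p-value is taken as the fraction of pseudoreplicates whose median is at least as small as the observed one.

```python
import random
import statistics


def null_medians(strata, shares, n_cases, n_reps, rng):
    """Median distance per pseudoreplicate, drawing points from age strata.

    strata: list of (distances_km, weights) pairs, one per age group;
            weights are the population-density pixel values
    shares: fraction of cases to draw from each stratum
    """
    # Simple rounding is an assumption; counts may differ slightly from n_cases.
    counts = [round(share * n_cases) for share in shares]
    medians = []
    for _ in range(n_reps):
        sample = []
        for (distances, weights), count in zip(strata, counts):
            # Weighted sampling with replacement: denser pixels are drawn more often.
            sample.extend(rng.choices(distances, weights=weights, k=count))
        medians.append(statistics.median(sample))
    return medians


def one_sided_p(observed, null_values):
    """Fraction of null statistics at least as small as the observed statistic."""
    hits = sum(1 for value in null_values if value <= observed)
    return hits / len(null_values)
```

A call such as `one_sided_p(observed_median_km, null_medians(strata, shares, 155, 1000, random.Random(1)))` would then give the empirical p-value for one grouping of cases; the seed and the number of pseudoreplicates here are placeholders.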
To test whether there was a difference in distance to the market between market-linked versus unlinked cases, we employed a Wilcoxon rank sum test.

Analyses of centering on the Huanan market
As a central tendency measure for groups of locations (e.g. the 155 COVID-19 case locations from December 2019) we chose the coordinate-wise median latitude and longitude in order to reduce the influence of outliers (for convenience we refer to these as 'center-points'). Center-points were determined for the eight groupings of December COVID-19 cases listed in the section above, as well as for the population density data (worldpop.org) and the Weibo data (26). The Haversine distance between each resulting center-point and Huanan market was obtained. The population density center-point was estimated by taking the weighted median latitude and weighted median longitude of the population density dataset, assigning the pixel values as the weights, while the center-point of the Weibo data (26) was defined by the median of the 737 latitudes and the median of the 737 longitudes.
Significance testing of the distance between the December center-points and the Huanan market compared to the null distributions was based on random samples (with replacement) of 1,000,000 points from the population density data (weighted). The 1,000,000 points were sampled from each population density age group, in the percentages mentioned above, which match the percentage of early December cases in each age group. 1,000,000 points were also sampled with replacement from the Weibo data (26). The Haversine distance to the Huanan market was calculated for each of these 1,000,000 points and these distances were compared to the distance to Huanan from the different center-points of each set of December cases (all cases, Huanan market-linked cases, or Huanan market-unlinked cases). These center-points represent plausible starting points of the COVID-19 epidemic in Wuhan, insofar as the center-point of early cases might reflect the starting point of the epidemic.
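The center-point and its comparison against sampled null points can be sketched as follows. The function names are hypothetical, and summarizing the comparison as the share of null points at least as close to the market as the center-point is an assumption of this sketch rather than a statement of the exact computation used.

```python
import statistics


def center_point(latitudes, longitudes):
    """Coordinate-wise median: median latitude paired with median longitude."""
    return statistics.median(latitudes), statistics.median(longitudes)


def fraction_closer(center_distance_km, null_distances_km):
    """Share of null points lying at least as close to the market as the center-point."""
    closer = sum(1 for d in null_distances_km if d <= center_distance_km)
    return closer / len(null_distances_km)
```

Because each coordinate is summarized by its median, a single distant location (for example, one case far outside the city) moves the center-point far less than it would move a mean.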
To further examine the distribution of December cases in relation to the Huanan market, kernel density estimates (KDEs) were generated for the market-linked cases, unlinked-cases and all cases, to infer a probability density function (PDF) from which the cases could have been drawn (function 'kde', package 'ks' version 1.13.4, R version 4.1.3, using default bandwidth selector). Contours representing specific probability masses (0.5, 0.25, 0.1, 0.05, and 0.01) were inferred and the location of the market compared to these. These contours contain the peak density with an increasingly small proportion of the total probability mass. Thus for the P=0.01 contour, a location drawn at random from the PDF is nearly 100-fold more likely to fall outside this contour than inside it and thus, if it does fall inside it is very unlikely to be this close to the peak density by chance.
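To illustrate how a probability-mass contour relates to a location of interest, the sketch below uses an isotropic Gaussian KDE with a single fixed bandwidth and approximates the contour level by a quantile of the density evaluated at the sample points. Both choices are simplifying assumptions of this sketch; the analysis itself used the 'kde' function of the R package 'ks' with its default bandwidth selector, and the function names here are hypothetical.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return density(x, y) for an isotropic Gaussian KDE with one fixed bandwidth."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            squared = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-squared / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def contour_level(points, density, mass):
    """Density threshold whose upper level set holds roughly `mass` of the probability."""
    heights = sorted(density(x, y) for x, y in points)
    # About a fraction `mass` of the sample points should sit at or above the threshold.
    index = min(len(heights) - 1, int(math.floor((1.0 - mass) * len(heights))))
    return heights[index]


def inside_contour(location, points, density, mass):
    """True if the estimated density at `location` reaches the contour threshold."""
    return density(*location) >= contour_level(points, density, mass)
```

Under these assumptions, `inside_contour(market_xy, case_xy, density, 0.01)` would ask whether a hypothetical market location `market_xy` falls within the contour enclosing the top 1% of probability mass around the peak.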

December COVID-19 cases sensitivity analyses
We implemented an approach to investigate how robust the results of our geocoding analyses were to possible errors in locations extracted from the WHO mission report maps. As shown in figs. S1 to S7, we were able to reliably ascertain coordinates of points marked on maps in the WHO mission report, likely to within approximately 50 m. To explore how robust the results of our statistical analyses were to uncertainty in geocoded locations, we resampled each point randomly from anywhere within a circle of radius 1000 m, centered on our geocoded location. In other words, for each of the 155 points in our December COVID-19 case location data set, we introduced noise such that its location in sensitivity analyses could be up to 1 km from where we had placed it. We performed 1,000 iterations of adding noise in this way and ran the sensitivity analyses for all cases (n=155), for the Huanan market-linked cases only (n=35) and for the Huanan-market-unlinked cases only (n=120) (fig. S9).
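One way to resample a location from within a circle of a given radius is sketched below. It assumes the draw is uniform over the area of the circle and converts meters to degrees with a flat-Earth approximation that is adequate over 1 km; the conversion constant and the function name are assumptions of this sketch.

```python
import math
import random

METERS_PER_DEGREE_LAT = 111_320.0  # approximate; assumed constant for this sketch


def jitter_within_circle(lat, lon, radius_m, rng):
    """Return a point drawn uniformly from the disc of radius_m around (lat, lon)."""
    # The square root makes the draw uniform over area rather than over radius.
    r = radius_m * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = (r * math.sin(theta)) / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = (r * math.cos(theta)) / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Repeating this for every case location, and recomputing the test statistics each time, gives one iteration of the sensitivity analysis described above.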
We further examined the sensitivity of our analyses to both location "noise" and, simultaneously, to missing data. There were nine cases for which we were not able to extract a location, and these were likely from the region closest to the Huanan market, with a high density of plotted (and therefore likely overlapping) locations. To assess the impact of the missing data, we uniformly sampled nine additional locations from the 25% kernel density estimate of the 155 December cases in 1,000 iterations (reflecting the fact that our location extraction procedure likely missed cases in the high-density area) (see fig. S10).
Finally, we generated 1000 replicates sampling 104 cases from the set of 120 cases not linked to the Huanan market, to assess whether our results were robust to mis-assignment of some 'linked' cases as 'not linked' (fig. S11). As described above, we computed the median distance and the center-point distance to the Huanan market for each iteration and compared the distribution of these statistics to critical values of the distributions drawn from the Weibo locations and the age-matched worldpop.org samples. These analyses were performed (i) for the resampled locations, (ii) for the resampled locations with the additional samples representing missing cases, in both cases for all locations, for the locations of the cases linked to the Huanan market, and for the cases not linked to the Huanan market, and (iii) for the 104 sub-sampled cases not linked to the Huanan market.

Tolerance contour analysis
We studied variations in relative risk, r(z) = f(z)/g(z), at each position z, where f(z) is the density of the test distribution, and g(z) is the density of the Weibo data distribution. We explicitly tested the null hypothesis H0: r(z) = 1, against an alternative hypothesis of increased relative risk, H1: r(z) > 1. Using an asymptotic p-value computation (shown to be more robust than a Monte Carlo approach) (55), we calculated and plotted contours corresponding to significant values of P(z), a pointwise estimate of statistical significance. For bandwidth estimation, we performed least squares cross validation on points within the same rectangular region as in Fig. 1C.

Lineage 'A' and 'B' case locations
We linked SARS-CoV-2 genome lineage information ('A' or 'B') (30) and residential location for twelve COVID-19 cases among the 155 from December 2019 for which we extracted locations: eleven lineage B (ten linked to the Huanan market and one unlinked) and one lineage A (fig. S8). Because we had earlier determined the location of the single lineage A virus (5) among the thirteen December cases for which genomic information was reported by the WHO mission report (7) (of which we were able to link twelve to particular cases), we were certain that the remaining eleven cases had been infected by lineage B. Location information was also available for one additional lineage A case with COVID-19 onset in 2019: virus 'Wuhan/WH04/2020'. This individual had stayed at a hotel near the Huanan market for the five days before fever onset (32). These are the two earliest-onset lineage A cases currently reported, and the eleven lineage B genomes are among the twelve earliest lineage B genomes. To investigate the distribution of hotels near the Huanan market, we plotted the distance from the market to the twenty nearest hotels using Bing Maps (table S1) and found that all were closer than 500 m. However, to be conservative, we assigned a distance-to-market for the hotel lineage A case of 2.31 km, equal to that of the other lineage A case in our data set. We treated this as comparable to a distance from a residential location to the Huanan market given the timing of the stay prior to symptom onset.

Robustness of statistical test results to possible ascertainment bias
To test the robustness of our results to the possibility of ascertainment bias, we took the following approach. For December 2019 COVID-19 cases (both 'all cases' and 'unlinked cases') we tested how many cases, starting with the one closest to the Huanan market and proceeding to the next closest, could be eliminated before losing statistical significance at the α=0.05 level. Null distributions were generated from the population density data sets for the different age groups. Each age group population density data set was sampled in the same proportion as the ages of the early December cases reported by the WHO mission report. The new median distance to Huanan after removing the cases nearest to the market was tested against these null distributions to obtain p-values. For the 'all cases' data set (n=155) the 98 nearest cases to the market could be removed without losing statistical significance (p=0.024). Significance was lost when removing the 99 nearest cases to the market (p=0.069). For the 'unlinked cases' (n=120) the 81 cases nearest to the market could be removed without losing statistical significance (p=0.045). Significance was lost when removing the 82 nearest cases to the market (p=0.088). The center-point of the cases was also recalculated after removing the cases nearest to Huanan. From the 155 cases, up to 38 cases could be removed without losing statistical significance (p=0.048); significance was lost when removing 39 cases (p=0.052). For the 120 cases unlinked to the market, up to 36 cases could be removed without losing statistical significance (p=0.05); significance was lost when removing 37 cases (p=0.051).
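The removal procedure can be sketched as a loop that drops the nearest cases one at a time and re-tests the median of the remainder. This is a simplified illustration with hypothetical names: it draws the null from a single pooled, weighted set of distances rather than from age-stratified data sets, and it uses an empirical one-sided p-value.

```python
import random
import statistics


def max_removable(case_distances, null_distances, null_weights, n_reps, alpha, rng):
    """Largest k such that dropping the k nearest cases keeps p < alpha.

    Returns None if the full set of cases is not significant to begin with.
    """
    ordered = sorted(case_distances)
    removable = None
    for k in range(len(ordered)):
        remaining = ordered[k:]  # cases left after dropping the k nearest
        observed = statistics.median(remaining)
        hits = 0
        for _ in range(n_reps):
            # The null sample size always matches the number of remaining cases.
            draw = rng.choices(null_distances, weights=null_weights, k=len(remaining))
            if statistics.median(draw) <= observed:
                hits += 1
        if hits / n_reps >= alpha:
            break
        removable = k
    return removable
```

The same loop applies to the center-point analysis if the median distance is replaced by the distance from the recomputed center-point to the market.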

Floorplan of the Huanan market
Two maps of the internal arrangement of stalls in the Huanan Market are available: The CCDC map (12) (data S1) and the WHO map (7). While the two maps agree on key features, including the general arrangement of the stalls, neither map is drawn to scale. The internal location of stalls in the Huanan Market for our study was discerned through detailed analysis of satellite photographs (Google Maps, Google Earth, Baidu Maps), aerial photographs, and other images of the market. Use of Baidu Total View, which displays interactive panoramic images at street level, allowed for construction of an accurate detailed internal map of the locations of individual stalls. The outlines of the eastern and western sections of the market were traced from Google Maps using Adobe Illustrator. Locations of the key internal structural elements of the main areas of the Market, the vertical pillars, were determined from Total View images and other photographs then mapped onto the external walls of the eastern and western sections of the market. The western part of the market is approximately 70 meters wide, with pillars spaced an average of 8.75 meters apart (9 pillars per street), and the pillars along the main market building span about 126 meters, with pillars spaced about 9 meters apart (15 pillars total). Stalls on the north side of the canopy are supported by smaller metal pillars and are spread over the full market width of 70 meters and are each 6 meters deep. Additional stalls and storage units in the West market parking lot were not included, as they were not studied by either the CCDC or the WHO mission report. The eastern part of the market is approximately 92 meters at its widest, and the main building is approximately 90 meters long (supported by a series of 15 pillars). Distances were all confirmed using Google Earth's measurement tool. 
The panoramic images of Baidu Total View were then used to map the location of individual numbered walkways ("streets") throughout the market; major vehicle passages; and the dimensions of other major structures, such as the canopy in the eastern section of the market.
We also cross-checked our conclusions with an independent researcher's collation covering much of the same information (http://babarlelephant.free-hoster.net/visiting-the-wuhan-seafood-market courtesy of @babarlelephant). The map was then converted into geojson format for spatial analyses.

Analysis of environmental samples in the Huanan market
The location and quantity of positive environmental samples were taken directly from the CCDC map (12) (data S1), with the exception of two positive businesses in the southwest corner of the market that were only noted on the WHO map (7). Locations of known live animal vendors, cases, as well as businesses with no positive environmental samples were obtained from the WHO map (7). Testing of the significance of live animal vendors and/or human SARS-CoV-2 cases on the number of positive environmental samples found in a business was performed using a binomial GLM available in the 'stats' package in R. Distances between businesses were defined as the distance between their respective center-points. Spatial relative risk analysis was performed using the 'sparr' package in R, using linear boundary kernels for edge correction (52), with bandwidth selection performed using least squares cross-validation. Tolerance contours were calculated using the same robust asymptotic approximation used in Fig. 2C (55). Kernel density estimates of positive environmental sample density as well as sampled businesses were performed using the same bandwidth identified from cross-validation in the relative risk analysis.
Market stalls were assigned to categories by the types of goods sold through integration of several different sources. The WHO mission report (7) identified 10 stalls involved in the trade of live animals (Appendix F, Table 3). We further obtained names and descriptions of stalls from the TianYanCha.com business directory (table S8) based on stall addresses within the market. In many cases, business names and meat products sold were further confirmed through photographic evidence of stalls. Three stalls were identified as involved in domesticated wildlife sales from an official local forestry bureau fine for their registered owners for illegal hedgehog sales in summer 2019 (36). One of these stalls (street 8, 25) yielded an environmental positive from the interior of a freezer, but was not noted as selling domesticated wildlife in Figure 2 from the "ANIMAL AND ENVIRONMENT STUDIES" section of the WHO mission report (7).

Analysis of human cases in Huanan Seafood Market
The location and timing of human cases was taken from the WHO mission report, with the exception of one case identified by media as well as in recent work. Analysis was performed using the 'sparr' package in R (52). Bandwidth selection for each kernel density estimate was determined by likelihood cross-validation.

Mobility analysis of the Huanan Seafood Market and potential superspreader sites within the city of Wuhan
To estimate the relative amount of intra-urban human traffic to the Huanan Seafood Market compared to other locations within the city of Wuhan, we utilized a location-specific dataset of social media check-ins in the Sina Visitor System as shared by Li et al. 2015 (33). This dataset is based on 1,491,499 individual check-in events across the city of Wuhan from the years 2013-2014 (5-6 years before the start of the COVID-19 pandemic), and 770,521 visits are associated with 312,190 unique user identifiers. We translated location names and categories to English using a Python API for Google Translate (see "Data and code availability"). Of the four markets reported to be selling a significant number of SARS-CoV-2 susceptible animals during 2019 (8), we found there had been 120 visitor check-ins to the Huanan Seafood Market during this time period, at least 81 to the Baishazhou Agricultural Products Market, 4 to Dijiao Flower and Bird Pet World, and 0 to Qiyimen Fresh Market. Compared to other markets across Wuhan (translated categories: 'supermarket', 'Convenience Store/Convenience Store', and 'shopping mall'), we found at least 70 other markets throughout the city of Wuhan that received more visitors than the Huanan Seafood Market (see "Data and code availability") (fig. S12).
Beyond markets, we found at least 1,676 total locations in Wuhan with more visitors than the Huanan Seafood Market. However, some high traffic locations may be less predisposed to COVID-19 superspreader events or substantial spread over a longer period than others. To quantify this, we utilized a list of known SARS-CoV-2 superspreader locations/events (34) to subset the following categories of locations that may serve as potential high-risk locations for superspreader events: 'Residential area', 'College', 'Building', 'shopping mall', 'Hospital', 'Middle school', 'supermarket', 'bar', 'Convenience Store/Convenience Store', 'Sports place', 'Comprehensive Stadium', 'church', 'Temple', 'primary school', 'company'. This subset identified a further 430 locations which may be at higher risk for superspreader events and which received more human visitors than the Huanan Seafood Market. As a fraction of all social media check-ins to the set of 70 markets described above, the Huanan market represented 120/98,146 visits, or 0.12%; as a fraction of all social media check-ins to the set of 430 locations similar to those of known superspreader events, the Huanan market represented 120/262,233 visits, or 0.046%. For all four wet markets selling wild animals in Wuhan, these numbers were 206/98,146 (0.21%) and 206/262,233 (0.079%), respectively.
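The percentages quoted in this paragraph follow directly from the check-in counts, as the short check below shows. The counts are those given in the text; the helper name is hypothetical.

```python
def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# Check-in counts as reported in the text.
HUANAN_CHECKINS = 120
FOUR_MARKET_CHECKINS = 206
MARKET_SET_CHECKINS = 98_146          # check-ins to the set of 70 markets
SUPERSPREADER_SET_CHECKINS = 262_233  # check-ins to the set of 430 locations

huanan_vs_markets = percent(HUANAN_CHECKINS, MARKET_SET_CHECKINS)
huanan_vs_sites = percent(HUANAN_CHECKINS, SUPERSPREADER_SET_CHECKINS)
four_vs_markets = percent(FOUR_MARKET_CHECKINS, MARKET_SET_CHECKINS)
four_vs_sites = percent(FOUR_MARKET_CHECKINS, SUPERSPREADER_SET_CHECKINS)
```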
While the potential risk of a location to be the site of an ascertained COVID-19 superspreader event (SSE) undoubtedly depends on many factors beyond number of visitors, there are no reasons to believe the Huanan Seafood Market is at an unusually high risk of an SSE compared with several other locations in Wuhan. COVID-19 cases associated with the Huanan market were not older (and actually leaned slightly younger) than all December 2019 COVID-19 cases on average (7), indicating the market population was not excessively elderly. Further, the main entryways to the market were large and open to the street, indicating a significant degree of airflow through the main thoroughfares.
While the association of social media check-ins and true visitor number likely varies across different types of sites and is likely subject to demographic biases, for the Huanan market to be even a remotely likely random location for a superspreader event within the city of Wuhan would require it to be extremely under-reported in the social media data. The fraction of Huanan market social media visitors out of social media visitors to all markets was 0.12%, or slightly higher than the number of visitors per day officials reported to the WHO mission report (10,000) as a fraction of the general Wuhan population of approximately 11 million (0.09%). Further, the Huanan market specifically received fewer social media visitors than 2 Walmart stores, 2 Carrefour stores, and 1 RT-Mart store, and does not stand out among other large wholesale markets in the city.
Furthermore, in light of the strong evidence for independent introductions of lineages A and B, and the association of both with the Huanan market, these check-in data suggest that independent introductions of these lineages at the same, relatively seldom visited location in Wuhan, would be extremely unlikely to occur by happenstance.

Results comparing to Weibo empirical distribution, and comparing laboratory confirmed vs clinically-diagnosed cases
In addition to comparisons of the spatial distribution of the early COVID-19 cases to the age-matched Wuhan population, we further compared these cases with the empirical distribution of Weibo COVID-19 help seekers from 8 January to 10 February 2020 (26), as a proxy for the distribution of cases within the city during the later phase of the pandemic. We found that early COVID-19 cases had a significantly closer median distance to the Huanan market than Weibo help seekers later in the pandemic. This was true considering all cases (median 4.28 km, p<0.001), Huanan-unlinked cases (median 4 km; p<0.001), Huanan-linked cases (median 8.3 km; p<0.031), and the two early lineage A cases (median 1.33 km; p<0.002). Further, the early cases also had significantly closer geographic center-points to the Huanan market than Weibo help seekers later in the pandemic. This was also true considering all cases (center-point distance 1.02 km; p<0.015), Huanan-unlinked cases (center-point distance 0.91 km; p<0.005), and the two early lineage A cases (center-point distance 1.12 km; p<0.009).
The WHO mission report on the origins of SARS-CoV-2 noted that early COVID-19 cases were either laboratory confirmed (by sequencing, PCR, or serology) or clinically confirmed based on clinical characteristics. We further tested cases based on each case definition separately and found that both laboratory confirmed and clinically confirmed cases had significantly closer median and center-point distances to the Huanan market than the Wuhan general population or the Weibo help seekers.
Laboratory-confirmed cases resided a median of 2.91 km from the Huanan market, while clinically-diagnosed cases resided a median of 4.67 km away. Both were closer to the market than the Wuhan population distribution and Weibo distribution (p<0.001 for all four combinations). The center-point of the laboratory-confirmed cases was 0.32 km from the Huanan market, closer than the Wuhan population distribution (p=0.0007) and the Weibo cases (p=0.002). The center-point of the clinically-diagnosed cases was 1.80 km from the Huanan market, closer than the Wuhan population null distribution (p=0.022) and Weibo data (p=0.038). All p-values significant at the 0.05 level remained significant after applying the Benjamini-Hochberg procedure for multiple hypothesis correction (table S4).
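For reference, the Benjamini-Hochberg step-up procedure can be sketched as below. The function name and return format are choices of this sketch, and the full set of p-values actually corrected is the one listed in table S4, not the four-value example mentioned afterwards.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value falls under its step-up threshold.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected
```

Applied only to the four center-point p-values quoted above (0.0007, 0.002, 0.022, 0.038), the thresholds are 0.0125, 0.025, 0.0375 and 0.05, so all four would remain significant in this reduced illustration.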

No other location except the Huanan market clearly epidemiologically linked to early COVID-19 cases
No other location in Wuhan was, at the time or retrospectively, identified as being clearly epidemiologically linked to the December 2019 COVID-19 cases (42). For example, a major transit hub, the Hankou Railway Station, is located near the Huanan market; but if mass transit was the route of entry into Wuhan, it would still require the establishment of infection at the Huanan market to be consistent with the results presented here, and, while numerous early COVID-19 cases were detected among workers at the Huanan market, none were reported among workers at this station (or any other major transit hub in Wuhan). Likewise, the Wuhan CDC (WCDC) is very close to the market's location, but there was "no storage nor laboratory activities on CoVs or other bat viruses preceding the outbreak" at that site (7) and no epidemiological evidence of COVID-19 cases linked to the WCDC in December 2019 or earlier (7). Beyond the Huanan market, no other proposed or hypothesized origin has been supported by data (42,56).

Animal screening for SARS-CoV-2 in China
WHO mission members were told (despite known susceptible live animals including raccoon dogs being mentioned in the agreed terms of reference for the study's scope) that no unlicensed or live-trapped wild animals had been for sale at the Huanan market and that "no verified reports of live mammals being sold around 2019 were found" (7). Notably, none of the live (known to be susceptible) mammals from species we identify here as present at the Huanan market in November 2019 have been reported to have been tested for evidence of SARS-CoV-2 infection. The only live mammals 'from' the market among the 188 appear to have been animals such as stray cats, dogs, snakes, rabbits, and mice. Indeed, of the 188 animals tested, 167 were those species or hedgehogs, pigs, chickens, salamanders, crocodiles, turtles, fish or sheep (7). Only 21 individuals from traded mammalian species likely susceptible to SARS-CoV-2 infection were tested, but these were dead, of unknown procurement date, and refrigerated or frozen, and therefore unlikely to be a source of human infection: six "bamboo rat", six "muntjac", six "badger", two "wild boar" and one "weasel" (7). Some of the wild mammals were screened (negative) for active SARS-CoV-2 infection by qRT-PCR, but a very low number have been sampled from Hubei province (57). In addition, bamboo rats, porcupines, and wild boar from farms in Hubei supplying the Huanan market, sampled during February and March 2020, also showed no sign of active infection (7). Unfortunately, to our knowledge, no serological testing was conducted on animals from within the market or from farms supplying it, nor on farm workers or animal traders in those supply chains. Indeed, both the Huanan market and the farms in Hubei province that supply it (48) were rapidly shut down before such sampling occurred.
Hence, it is apparent that by the time the Huanan market was closed, and animal sampling at the market began, the SARS-CoV-2-susceptible live mammals that had been on sale there in the preceding months (Tables 1 and S5) were no longer present (7).
While >80,000 animal samples were tested and no evidence was found for presence of, or exposure to, SARS-CoV-2 (7), this is far from conclusive evidence for most animals. The negative predictive value of the surveys depends on the assumed prevalence. For serology, for instance, if one assumes a reservoir host in which the virus is endemic, seroprevalence likely is high, but will depend on the age of the animals and infection dynamics in that particular species. In addition, China is a vast country, and animals could have been sourced over long distances. The catchment area for surveillance studies is therefore huge (58), and tracking back along the market supply chain is crucial for a more targeted effort, as recommended in the WHO mission report (7). It is also possible that a one-off spillover event occurred on a farm that supplied the market. Such events would be needles in a haystack that are virtually unresolvable.
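The dependence on assumed prevalence can be made concrete with a simple calculation: if animals are sampled independently from a population in which a fraction of them would test positive, the chance that every sampled animal tests negative falls off geometrically with sample size. The sketch below is illustrative only; the independence assumption, the `sensitivity` parameter, and any values passed to the function are hypothetical and are not estimates from this study.

```python
def prob_all_negative(prevalence, n_sampled, sensitivity=1.0):
    """Probability that all n independently sampled animals test negative."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie between 0 and 1")
    # Each animal tests positive with probability prevalence * sensitivity.
    return (1.0 - prevalence * sensitivity) ** n_sampled
```

At a low assumed prevalence, even a moderately large all-negative sample remains fairly likely, which is why negative results from animals sampled far from the relevant supply chain carry limited weight.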

A summary of data on the size of the early epidemic within Wuhan
Not until the second week of December 2019 was there a detectable increase in the number of reported influenza-like illness (ILI) cases in Wuhan (WHO mission report, Epidemiology, Figure 1). However, even this increase in ILI cases was consistent with an increase in laboratory confirmed Influenza A and B cases that occurred primarily in children (see WHO mission report, Epidemiology, Figure 2C) until the last two weeks of December, when an increase in adult ILI cases was also observed. Consistent with this, retrospective testing of 640 throat swabs collected at the Children's Hospital of Wuhan and Wuhan No. 1 Hospital from Oct 6 2019 to January 21 2020 only identified nine positive SARS-CoV-2 samples, the earliest of which had reported symptom onset of January 4th, 2020 (59). Meanwhile, 22% of these cases tested positive for Influenza. A further 2334 throat swabs from the Wuhan Tongji hospital system and 218 throat swabs from Wuhan Union Hospital, collected in Oct-Dec 2019, were additionally retrospectively tested and found to all be RT-PCR negative for SARS-CoV-2 (see WHO mission report, Review of Stored Biological Samples Testing). Furthermore, travel cases exported from Wuhan can also be used to estimate the number of infections within the city during the early phase of the epidemic. Out of 117 influenza-like illness cases who traveled from Wuhan to Hong Kong from Dec-31-2019 to Jan-21-2020, only 1 (0.9%) was positive for SARS-CoV-2, while 36.3% were positive for seasonal influenza (60). Of 2164 blood donations within the city of Wuhan that were collected before January 23 and retrospectively tested for SARS-CoV-2 antibodies, only one sample, from January 20, 2020, was confirmed to be SARS-CoV-2 seropositive (44), providing further evidence that the epidemic was still limited in size by early to mid January 2020.
Similarly, 2058 plasma samples collected from patients at Tongji Hospital in Wuhan between July and December 2019 were tested for SARS-CoV-2 antibodies, and all were found to be negative (see WHO mission report, Review of Stored Biological Samples Testing). Excess all-mortality and pneumonia deaths, a lagging but robust signal of COVID-19 cases resilient to possible reporting issues, only exceeded rates of previous years in Wuhan by the week of Jan-14-2020, consistent with an epidemic growing substantially in size only from the end of December 2019, and a 17 day median time from onset of illness to death for COVID-19 (see WHO mission report, Epidemiology, Figure 12).

Additional data related to case ascertainment biases
We investigated whether ascertainment biases in case identification could alternatively have shaped the geographic distribution of early cases throughout the city of Wuhan. In particular, three possible ascertainment biases are (i) a direct ascertainment bias towards the Huanan market, where cases were directly diagnosed based on their reported direct contact with the market, (ii) a path-dependent contact tracing ascertainment bias, where cases unlinked to the market were identified primarily through contact tracing of cases linked to the market, and (iii) a geographic ascertainment bias, where cases unlinked to the market were identified primarily through selective reporting by hospitals or by selective case reporting based on residential addresses. First, we note that the strong geographic association between unlinked early cases and the Huanan seafood market, which cannot be explained by the demographics of the city of Wuhan, also cannot be explained by a direct market ascertainment bias, as no market link for these cases was ever ascertained. We further identified several data points inconsistent with a significant impact of either contact tracing or geographic ascertainment biases on the early case data, summarized below.
Districts near the Huanan seafood market had elevated seropositivity in a retrospective analysis. If COVID-19 cases were overreported in neighborhoods close to the Huanan seafood market, we would expect a discrepancy between the number of reported cases and seropositivity by neighborhood (neighborhoods around Huanan market would have low seropositivity despite their high reported case numbers). However, a retrospective serosurveillance study (40) in April 2020 reported that the district where the Huanan seafood market resides (Jianghan) and the adjacent districts north of the Yangtze river (Jiang'an and Qiaokou) had higher rates of seropositivity compared to districts south of the Yangtze river (Hongshan, Wuchang, and Qingshan), consistent with the districts adjacent to the Huanan seafood market being characterized by earlier and therefore larger epidemic curves (table S3). Seropositivity by district correlated well with the number of reported cases per capita for each district by March 2020, giving no indication of case overreporting in districts near the Huanan seafood market or underreporting in districts south of the Yangtze river or elsewhere in the city (40) (61). Deaths due to all pneumonia (WHO mission report, Figure 21) were also first elevated in the districts north of the Yangtze river near the Huanan seafood market by mid-January 2020.
Retrospective ILI-testing for SARS-CoV-2 also identified early patients living near Huanan market. A retrospective analysis of 640 throat swabs collected at two hospitals in Wuhan from Oct 6 2019 to Jan 21, 2020 identified 9 SARS-CoV-2 positive samples (59), with dates of onset Jan 4 to Jan 20th. Seven of these patients lived within the boundaries of Wuhan city, and all 7 of the early January SARS-CoV-2 positive patients lived north of the Yangtze river in Jianghan district or adjacent districts.
Reported cases not directly linked to Huanan market lived significantly closer to the market than linked cases. If cases with no identified link to Huanan market were primarily ascertained through contact tracing of personal contacts of Huanan-linked cases (a path-dependent ascertainment bias), they would be expected to have a similar geographic distribution to cases linked to the market. On the contrary, we note that on average they lived closer to the market (4.00 vs 5.74 km, p=0.029). This pattern is thus better explained if the primary risk factor for linked cases was their place of work, while the primary risk factor for unlinked cases was proximity to the market and community spread starting at the market.
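A comparison of this kind can be sketched as follows. This is a minimal illustration only: the statistical test behind the reported p-value is not named in this section, so the two-sided permutation test on the difference in mean distance is a stand-in, and the market coordinates, function names and any case coordinates supplied are assumptions rather than the study's data.

```python
import math
import random

# Approximate latitude/longitude of the Huanan market (assumed for illustration).
MARKET = (30.6196, 114.2576)


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the linked/unlinked labels
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

In use, each case's residential coordinates would be converted to a distance with `haversine_km(case, MARKET)`, and the lists of distances for unlinked and linked cases passed to `permutation_p_value`.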
Publications reporting on early cases do not utilize case criteria dependent upon market links. We identified four studies that first reported on cohorts of cases with timing of onset in December 2019 (table S2). Only one of these papers (62) reported an epidemiological criterion in their case definition. This criterion was secondary to clinical characteristics, as they included patients with only 3/4 clinical characteristics if they also had a link to the Huanan Seafood Market. None of the studies reported case definitions or patient inclusion based on their neighborhood of origin or patient area of residence within the city of Wuhan, and thus none of these case definitions could directly influence a geographic association of officially unlinked cases with the market. Chen et al. (39), describing a cohort of patients moved to Jinyintan Hospital for isolation, reported that "adult patients were admitted centrally to the hospital from the whole of Wuhan without selectivity" [emphasis ours] and also affirmed that their data had been shared with the WHO.
Both clinically confirmed cases and laboratory confirmed cases were geographically centered around the market. If direct or indirect market-associated epidemiological criteria heavily influenced the "clinical diagnosis" case definition used in the WHO mission report, then we might expect that the laboratory confirmed cases would have a strikingly different spatial distribution. However, we have updated our analysis to confirm that cases identified under both definitions are geographically associated with the Huanan seafood market. (See section "Results comparing to Weibo empirical distribution, and comparing laboratory confirmed versus clinically-diagnosed cases", above.)

Both early viral lineages A and B were observed in proximity and directly connected to the Huanan market. If the majority of sequenced early SARS-CoV-2 cases were subject to a strong path dependency ascertainment bias in relation to the Huanan market, then we would not expect to observe the ancestral viral genotypes in patients and environmental samples with connection to the market. Rather, we would expect to observe these genotypes and their descendants in at least a few patients without connection to the market. We would primarily expect samples with connection to the market to be descendant lineages. However, genomes of SARS-CoV-2 lineages A and B, the two lineages that were likely introduced into humans separately (38), have been epidemiologically linked to the market, both in this work, and via direct environmental sequencing (24).
Hospitals reporting early cases were distributed throughout the city of Wuhan. One mechanism by which a geographic ascertainment bias could arise is if there was significant selective reporting of cases in hospitals adjacent to the Huanan seafood market, and patients tended only to visit hospitals near their home. However, early cases were reported by several hospitals scattered throughout the city (fig. S11). Some of these hospitals were further from the Huanan market than the mean distance of case residential addresses to the market. In addition, Chen et al. (39) reported that cases were sent to Jinyintan Hospital from the whole of Wuhan "without selectivity". Therefore, the geographic association of the residential addresses of reported cases with the Huanan seafood market was not due to hospital proximity.

On 29 December 2019, the first day public health officials learned of a possible association of COVID-19 cases with the Huanan market, they simultaneously learned of cases not only at Xinhua Hospital, near the Huanan market, but also at Tongji and Union Hospitals, far to the south. There appears to have been no period of time when public health officials could have selectively searched only in hospitals near the market for cases linked to the Huanan market; they knew linked cases were being found further afield as well. Indeed, Wuhan CDC was alerted to the outbreak of unexplained pneumonia and its association with the Huanan market by a vice president of Xinhua Hospital (fairly near the market) whose tipping point appears to have involved learning about Huanan market-linked cases at hospitals many kilometers from the market: "Upon learning of similar patients, also linked to Huanan Market, at Tongji and Union (Xiehe) Hospitals, Xia alerted the Wuhan and Hubei CDCs on 29 December" (5). This 'all-hospital' search, moreover, is reflected in an internal communication from the Wuhan Municipal Health Commission to all hospitals in Wuhan the very next day, on Dec 30, and in the follow-up announcement on 3 January 2020 for hospitals to be on the lookout for unexplained pneumonia in any patient, not just ones with a link to the Huanan market. Accordingly, by 3 January, it is clear that the major hospital in Wuchang district, Zhongnan Hospital, was detecting both Huanan-linked and unlinked cases (5). And even though it was far from the market, of the first trickle of cases identified at this hospital, two were linked to the Huanan market, with the other three being a family cluster. The facts that initial cases at this major hospital, distal from the market, (1) lagged considerably behind those at hospitals closer to the Huanan market (7 at Xinhua Hospital by 29 December and 7 at Wuhan Central Hospital Houhu Branch by 28 December (5)) and (2) nevertheless involved a large proportion linked to the Huanan market, provide further evidence that the outbreak indeed started at the Huanan market then spread into the wider Wuhan community from there.
All the cases we consider in this study were hospitalized and, therefore, were likely ascertained in hospitals rather than in the community. It is possible that public health officials could have identified previously undiagnosed individuals in hospitals via contact tracing from Huanan-linked cases (i.e., patients that clinicians had not yet identified as unexplained pneumonia cases linked to the outbreak, but which public health officials had traced from other, previously identified, pneumonia patients). However, as pointed out above, the fact that cases unlinked to the market had a different geographic distribution from those linked to the market (they lived closer to the market) argues persuasively against ascertainment bias in the detection of those unlinked cases; we would expect the unlinked cases to be drawn from a similar geographic distribution as the linked cases if they had been ascertained through contact tracing. So, could those unlinked cases have been detected via biased case-finding involving searching for cases only in neighborhoods near the market but not in other parts of Wuhan? Not likely. Remembering that all the cases were hospitalized and that no diagnostic test was available to identify mild cases, it is likely that most or all Huanan market-unlinked cases were ascertained while in hospitals.

Robustness of the December COVID-19 case analyses using center-point and median distances to the Huanan market
We performed three sets of sensitivity analyses for the tests using center-point and median distances to the Huanan market (cfr. Materials and Methods). Adding considerable random noise to the extracted locations results in distance statistics that are similar to those inferred from the actual extracted locations relative to the critical values of the null distribution (fig. S13). The same holds true when we add noise as well as missing locations from relatively close to the Huanan market (from the 25% kernel density estimate of the 155 December cases, cfr. Materials and Methods) (fig. S14). Finally, sub-sampling the cases not linked to the Huanan market also indicates that the test approach is robust to some degree of mis-assignment of cases (fig. S15).
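The first of these analyses, adding random noise to case locations and recomputing a distance statistic, can be sketched as below. The noise model (a uniform random displacement within a disc of radius `max_km`, using a flat-earth approximation), the function names and the default parameters are assumptions made for illustration; the noise model actually used is described in Materials and Methods, not in this section.

```python
import math
import random
import statistics

KM_PER_DEGREE = 111.195  # km per degree of latitude for an Earth radius of 6371 km


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def jitter(point, max_km, rng):
    """Displace a (lat, lon) point by a random offset of at most max_km."""
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dist = max_km * math.sqrt(rng.random())  # sqrt gives uniform density over the disc
    dlat = dist * math.cos(bearing) / KM_PER_DEGREE
    dlon = dist * math.sin(bearing) / (KM_PER_DEGREE * math.cos(math.radians(point[0])))
    return (point[0] + dlat, point[1] + dlon)


def median_distance_with_noise(cases, market, max_km, n_rep=200, seed=0):
    """Median distance to the market for n_rep independently jittered copies of the cases."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_rep):
        noisy = [jitter(p, max_km, rng) for p in cases]
        medians.append(statistics.median(haversine_km(p, market) for p in noisy))
    return medians
```

The spread of the returned medians can then be compared with the critical value of the relevant null distribution to judge whether the conclusion depends on the precision of the extracted locations.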

Supplementary Data

Data S1. Translations and URLs for relevant articles and reports
All translations were done using the premium service at TheWordPoint (https://thewordpoint.com/). Originals can be accessed via the original URLs or WayBackMachine URLs provided in the "documents_index.csv". File is available from https://doi.org/10.5281/zenodo.6291868.

Data S2. Business registry screenshots from Tianyancha
File is available from https://doi.org/10.5281/zenodo.6791326.

Figure S1. COVID-19 cases in Wuhan in December.
This is Fig. 4 on page 148, Annex E, of the WHO mission report.

Figure S11. The locations of key hospitals reporting early COVID-19 patients with respect to the Huanan market.
On the day that local public health authorities learned of an association of unexplained pneumonia cases with individuals connected to the Huanan market, they learned simultaneously about cases not just at hospitals near the market (e.g. Wuhan Central Hospital Houhu Branch and Xinhua Hospital) but also hospitals much further away (e.g. Tongji Hospital and Wuhan Union Hospital) (5). Importantly, major hospitals in districts many kilometers from the Huanan market, just like hospitals close to the market, found that their earliest cases (up to 3 January 2020) were dominated by individuals linked to the Huanan market, strong evidence that the virus was not cryptically widespread throughout the city but that cases were still, at that point, largely connected to the Huanan market (5).

Figure S12. Effect of elimination of cases nearest the market on statistical results.
To test the possibility of ascertainment bias, the nearest cases to the market were removed consecutively and the distance from center-point of the remaining cases to the Huanan market ("Center-point_distance"), and median distance of the remaining cases to the Huanan market ("Median_distance") were subsequently calculated. The x-axis indicates the number of cases closest to the market that have been removed from the dataset. The y-axis indicates the resulting median distance to the Huanan market and center-point distance to the market in kilometers. The center-point null plotted above is the distance between the Huanan market and the center-point of the 1,000,000 points sampled from the age-matched, weighted population density dataset. The median null distribution was generated by sampling the corresponding number of points shown in the x axis (155, 154, 153, and so on) from the population density dataset and calculating the median distance to the Huanan market. A: all cases. B: unlinked cases only.

Methodology: Business names were obtained from the TianYanCha.com business directory (see Table S8) based on addresses. In many cases, business names were further confirmed through photographic evidence of stalls. Two stalls were identified as involved in domesticated wildlife sales from an official local forestry bureau fine for their registered owners for illegal hedgehog sales in summer 2019 (36).
† The "Operators" of these businesses listed in TianYanCha.com business directory were fined for illegal hedgehog sales in summer 2019, in an official government report (36).
†† Reported in WHO mission report Appendix F, Table 3, contains "game" in the business name, lists game retail as a service in the business directory, and was fined for illegal hedgehog sales in 2019 (see †) (7).
††† Store selling meat products inferred from photographic evidence.
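The case-removal procedure described in the Figure S12 caption can be sketched for the median-distance statistic as follows. This is a minimal illustration with a hypothetical function name; it takes precomputed distances to the market as input and omits the center-point statistic, whose definition is not given in this section.

```python
import statistics


def nearest_removal_curve(distances_km, max_removed):
    """Median distance to the market after removing the k nearest cases.

    Returns one value for each k = 0, 1, ..., max_removed, where k is the
    number of cases closest to the market that have been dropped.
    max_removed must be smaller than the number of cases.
    """
    ordered = sorted(distances_km)
    return [statistics.median(ordered[k:]) for k in range(max_removed + 1)]
```

Each value on the resulting curve would be compared with a null median computed from the same number of remaining points, as the caption describes.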