Revision Jenny 20100310

When is a sampled network a good enough descriptor for epidemic predictions?

Jenny Lennartsson1, Annie Jonsson1, Nina Håkansson1 and Uno Wennergren2,*

1 Systems Biology Research Centre, Skövde University, Box 408, 541 28, Skövde, Sweden
2 IFM Theory and Modeling, Linköping University, 581 83 Linköping, Sweden
*Corresponding author

ABSTRACT

The lack of complete data sets can be a limitation when using network analysis. In this paper, we analyse how link density affects a property of a network. The methodology is general, yet here it is applied to the property spread of disease. We used networks with weighted links to run scenarios with distance dependent probabilities of transmission, and compared them with scenarios with randomly drawn probabilities of transmission, i.e. non-distance dependence. In these scenarios, we also included link sampling procedures equivalent to the two transmission scenarios, one that samples by distance dependence and one that samples at random. We could then study the effect of link density in sampled networks on spread of disease under different link sampling procedures and transmission scenarios. We conclude that, when both the link sampling procedure and the disease transmission are distance dependent, predictions about the extent of an epidemic can be drawn from a network even with low link density, although the required density is still higher than in most empirical studies. In reality, neither sampling procedures nor disease transmission perfectly fit a distance dependency. Our results show how this requires a yet higher level of link density in sampled networks to achieve reasonable predictions.

Keywords: network, missing links, link density, infectious diseases, disease transmission

1. INTRODUCTION

The use of, and interest in, network analysis has been growing in many different scientific areas during the last decade, for example in biology, epidemiology, economics and social science.
A network consists of interacting units, here denoted nodes, and these units connect to each other through relations we term links. Nodes could for example be individual animals or persons, animal holdings, species habitats, or schools, and links could be visits, animal transports or links between web pages. The pattern of links between the nodes gives rise to networks with different contact structures. These structures depend on both the number of links and how they are organized. Here we classify networks into three categories: first, the complete network (Wasserman & Faust 1994) (figure 1a), where all theoretically possible links are included; second, the real world network (figure 1b), where all realizations of links during a specified time period are included; and third, the sampled network (figure 1c), where all the links measured during the sample period are included. In addition to estimating the link structure of networks, one can measure probabilities of occurrence or of disease transmission per time unit for the individual links. The network is then called a weighted network (Barrat et al. 2004). In contrast to classical epidemiological models such as SI/SIR/SEIR models, network models relax the assumption of homogeneous mixing (mass-action type of assumptions), since not all nodes are linked to all other nodes, or the links are weighted, for example depending on distance. Network analysis, on the other hand, requires huge amounts of data, which today's computational power makes possible to handle. The sampled network can be estimated through sample surveys, literature studies, contact tracing, or databases such as national databases for animal movements. The estimation is cumbersome and one may expect that estimated networks will most probably lack some links and even some of the nodes (Clauset et al. 2008). The real world network is the network one would like to find, but one has to represent it with the sampled network.
Hence, there is a need to evaluate the effect of missing links in order to assess, or possibly reduce, errors when networks are applied. The focus of this study is on how the number of links in sampled networks affects predictions of epidemic size. We simulate spread of disease in networks with different link densities and in different scenarios that mimic sampling procedures. A real world weighted contact network consists of all contacts, with their probabilities, during a specific time period. Another time period is yet another event with its specific contacts and may very well result in a different network with another set of links. The question then arises whether the properties of the two real networks will differ. Furthermore, will the properties of the sampled networks for these two events differ? May a property, such as spread of disease, of the first sampled network be valid as an approximation of the spread of disease during the second time period? It is obvious that too short a period will result in a bad approximation of the two events. A measured period with a very large time frame, on the other hand, will result in a nearly complete network and will be an almost perfect approximation. Beyond reality is the infinite sampling procedure that results in a complete network where all links exist and have their specified probabilities. Somewhere in between the short time period and the excessive sampling, there is a sampled network with enough links and sufficiently well estimated probabilities to generate an adequate approximation. In the present study, we focus on networks with weighted links, since one may expect a higher probability of contact for some links than for others. We run scenarios with distance dependent probabilities and compare them with scenarios with randomly drawn probabilities. The distance dependence is tested for disease transmission and link sampling separately and in combination.
In a worst-case scenario, there is a mismatch, with distance dependent transmission probabilities yet a random link sampling procedure. We thus combine our study of the necessary number of measured links with how that number depends on the mismatch between the real world network and the sampling procedure. We will refer to networks in veterinary medicine, with spread of infectious diseases among animal holdings, since network analysis is an increasingly applied tool in this area (Barthélemy et al. 2005; Ortiz-Pelaez et al. 2006). The potential use of network analysis and modeling in veterinary medicine is to predict spread of disease and epidemic size and to examine effects of different intervention methods such as vaccination, stand still and stamping out. For example, Corner et al. (2003) studied the transmission of Mycobacterium bovis among a network of wild brushtail possums and the social contacts between them. In another study, Kiss et al. (2006) analysed networks of sheep movements within Great Britain. They showed that during an epidemic it is most efficient to concentrate control interventions on highly connected nodes. Despite the increased use of network analysis in epidemiology, there are shortcomings in the analysis of missing links as well as in how to represent a structure given a single sample. Collected network data are often incomplete (Christley et al. 2005; Ortiz-Pelaez et al. 2006; Clauset et al. 2008; Heath et al. 2008; Eames et al. 2009; Guimerà & Sales-Pardo 2009). There could, for example, be missing animal movements or unknown locations of herds in databases. Accordingly, Perkins et al. (2009) demonstrated that network structures are only approximations of contacts and that it is almost impossible to identify all contacts when collecting data. Depending on the structure of the network, properties such as spread of disease can vary (Newman et al. 2001; Keeling 2005; Shirley & Rushton 2005; Kiss et al. 2006).
Since disease transmission depends on the network's structure, results based on networks with missing links may be misleading. In practice, this means that there will be problems with missing data resulting in lost links in the representation of a network. These lost links are the result of errors during the sampling period or a consequence of the finite length of the sampling period. Guimerà and Sales-Pardo (2009) introduce a method that uses a single measure of a network, a sampled network, to generate a more correct representation, i.e. an approximation of the real world network. Their method focuses on a network that is reduced because of errors during sampling. By measuring and classifying the structure of the sampled network, they could identify both missing and spurious links. In our study, the focus is more general and concerns the relation between link density and estimates of properties such as spread of disease and specific network measures. Yet our results may be combined with their study by indicating when to expect missing links. During a survey to obtain a sampled network, it is important to consider the time window of the sampling period. For example, Kao et al. (2007) studied the relation between the UK livestock movement network and disease dynamics over different time-scales. They simulated transmission of two diseases, foot-and-mouth disease and scrapie, which have very different time-scales regarding incubation time as well as infectious period. They concluded that for network analysis to be a valuable tool in epidemiological modeling, it is important to consider the time-scale as well as the potentially infectious contacts. In another study, Robinson et al. (2007) investigated animal movement networks evolving over time in Great Britain, and their findings point out the importance of temporal scale. With increasing time period, the networks became more and more connected and in that way fueled the disease transmission.
They also found a seasonal pattern with peaks in spring and August. Thus, depending on the question to be examined, or when comparing different networks, it is important to choose the appropriate temporal scale (Vernon & Keeling 2009). Our study relates this time window problem to the difficulty of obtaining enough links during a chosen period.

2. METHODS

2.1 The model

Figure 2 illustrates the network generation and simulation process. The first step is the placement of animal holdings in the landscape and the second is the link forming process. Link forming is here related to empirical sampling. The third step is the disease transmission with simulation runs. The network generation algorithm, simulations and calculations were implemented and run in MATLAB (version R2009a).

2.1.1 Landscape of animal holdings

The number of animal holdings was arbitrarily set to 500 and these were randomly placed in a landscape of size 34 x 34 (see figure 2). The holding density was chosen according to farm density in southern Sweden. Each animal holding was considered as a node, which implies that individual animals were not modeled.

2.1.2 Link sampling process and link density

Animal holdings were connected by two different processes: either by distance dependence between the holdings, Dl (eq. 1 and Håkansson et al. 2010; Lindström et al. 2008), or completely at random, Rl.

P(l_{ij}) = K e^{-(d_{ij}/a)^{b}} \qquad (1)

Here P(l_{ij}) is the probability that a link between nodes i and j is formed, and d_{ij} is the Euclidean distance between holdings i and j. Parameters a and b are set by the kurtosis, κ, and the standard deviation, σ (see Lindström et al. 2008). The constant K normalizes the distribution such that the probabilities of all possible links sum to one. For distance dependent link sampling, Dl, we use a kurtosis value of 10/3, meaning an exponential distribution, and a standard deviation of one.
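As an illustration of eq. 1, the following minimal Python sketch computes normalized link probabilities for randomly placed holdings. It is not the authors' MATLAB implementation. The function name `link_probabilities` is hypothetical, the shape b = 1 is taken from the statement that the kernel is exponential, and the scale a = 1.0 is a placeholder that has not been derived from σ.

```python
import numpy as np

def link_probabilities(coords, a, b):
    """Return an n x n matrix with P(l_ij) = K * exp(-(d_ij / a)**b) for i < j.

    Entries on and below the diagonal are zero, so each undirected pair
    is counted once and the matrix sums to one.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distances d_ij
    weights = np.exp(-(dist / a) ** b)            # unnormalized kernel values
    upper = np.triu_indices(len(coords), k=1)     # index pairs with i < j
    probs = np.zeros_like(weights)
    probs[upper] = weights[upper] / weights[upper].sum()  # division plays the role of K
    return probs

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 34.0, size=(500, 2))    # 500 holdings in a 34 x 34 landscape
P = link_probabilities(coords, a=1.0, b=1.0)      # a is an assumed placeholder value
```

Only pairs with i < j carry probability because the links are undirected; the division by the summed weights stands in for the constant K.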
The links were randomly sampled from this probability distribution (eq. 1), one at a time, until the desired link density was achieved. Since this method is stochastic, links between holdings that are more distant from each other can also be sampled, even at a low link density. For random distribution of links, Rl, the links were drawn one at a time, with the same probability for all links to be sampled. To avoid edge effects, periodic boundaries were used (Lindström et al. 2008) along the edges of the 34 x 34 landscape. Link density is the number of actual connections in the network as a proportion of all theoretically possible links in the network (Wasserman & Faust 1994, eq. 2).

C_n = \frac{n(n-1)}{2} \qquad (2)

Here n is the number of animal holdings in the network, so that C_n counts all theoretically possible links. In the simulations we varied link density from 0.001 to 1.0. A link density of 1.0 means a complete network (figure 1a) where all theoretical connections are included. Because the link density of the networks was set when generating the networks, the mean link degree was also given from the start (Table 1).

2.1.3 Disease transmission

As for the link sampling process, we assume two different processes for the transmission probabilities of a disease, one distance dependent, Dt, and one completely random, Rt. The two processes could represent two diseases with different behaviour. Transmission rates were determined with the same processes as for the link sampling (section 2.1.2). Hence, Dt is set by equation 1 with the same parameter values as Dl, and the transmission probabilities of Rt were arbitrarily set to 0.01.

2.1.4 Model scenarios

The combination of two link sampling processes and two disease transmission processes leads to four different scenarios: DlDt, DlRt, RlDt and RlRt (figure 2). The RlRt scenario is an example of a mass action mixing model (Keeling 2005), which assumes that all links have the same probability of transmitting the disease, combined with a matching random link sampling procedure.
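The link sampling of section 2.1.2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the names `torus_distances` and `sample_links` are hypothetical, the kernel parameters a = b = 1.0 are placeholders, and weighted sampling without replacement is assumed to stand in for drawing links one at a time and skipping links that were already drawn.

```python
import numpy as np

def torus_distances(coords, side):
    """Pairwise Euclidean distances with periodic boundaries on a side x side square."""
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    diff = np.minimum(diff, side - diff)          # wrap around the edges
    return np.sqrt((diff ** 2).sum(axis=-1))

def sample_links(coords, density, side=34.0, a=1.0, b=1.0,
                 distance_dependent=True, rng=None):
    """Sample undirected links until the requested link density is reached."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(coords)
    i, j = np.triu_indices(n, k=1)                # all pairs with i < j
    n_possible = n * (n - 1) // 2                 # eq. 2
    n_links = int(round(density * n_possible))
    if distance_dependent:                        # Dl: weights follow eq. 1
        w = np.exp(-(torus_distances(coords, side)[i, j] / a) ** b)
        p = w / w.sum()
    else:                                         # Rl: every link equally likely
        p = None
    chosen = rng.choice(n_possible, size=n_links, replace=False, p=p)
    return np.column_stack((i[chosen], j[chosen]))
```

For example, `sample_links(coords, 0.04, distance_dependent=False)` would draw a random (Rl) network at link density 0.04 for a given array of holding coordinates.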
Here, matching refers to the process and not necessarily to the occurrence of events, i.e. link sampling and transmission are two different realizations of the randomization from the same process. The DlDt scenario, with linking and transmission probabilities for each link, is the distance dependent scenario, where the process that sets the probability of measuring a link matches the process that sets the probability of transmission. The remaining two scenarios, DlRt and RlDt, are combinations where the link sampling procedure does not match the actual process that generates the probability of transmission. For example, RlDt is a scenario where transmission is distance dependent yet the link sampling procedure is random and hence is expected to be ineffective. The link sampling will be random and, regardless of distance, some of the first connections detected will have low probabilities, while some of the connections with high probability of transmission will not be detected within the sampling time frame.

2.1.5 Simulation model

To simulate disease transmission in the sampled networks, we used a general and very simple epidemiological model, where the holdings could be in one of two phases: susceptible (S) or infectious (I) (eq. 3).

\frac{dS}{dt} = -\lambda S I, \qquad \frac{dI}{dt} = \lambda S I \qquad (3)

Parameter λ is the probability of disease transmission from an infectious holding through a link to a susceptible holding. The variables S(t) and I(t) are the numbers of holdings in the respective phase, susceptible or infectious, at time t. We did not incorporate incubation time, and hence animal holdings in contact with an infected holding could infect other holdings as early as the next time step. Since no recovery phase was included in the model either, an infected holding could never return to the susceptible phase. That is, an infected holding remained in the infectious phase for the rest of the simulation time. Undirected links were used and diseases could thus be transmitted in both directions along the links.
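A discrete-time, link-based reading of the SI model in eq. 3 can be sketched as below. The sketch is illustrative and is not the authors' MATLAB code: the function names `simulate_si` and `time_to_fraction` are hypothetical, and the synchronous update (all transmissions in a step depend on the state at the start of that step) is an assumption chosen to match the statement that newly infected holdings can infect others from the next time step.

```python
import numpy as np

def simulate_si(links, trans_prob, n_nodes, n_steps=300, rng=None):
    """Simulate SI spread on an undirected network; return infected counts per step.

    links      : sequence of (i, j) node index pairs
    trans_prob : scalar or per-link array of transmission probabilities (lambda)
    """
    rng = np.random.default_rng() if rng is None else rng
    links = np.asarray(links, dtype=int).reshape(-1, 2)
    trans_prob = np.broadcast_to(np.asarray(trans_prob, dtype=float), (len(links),))
    infected = np.zeros(n_nodes, dtype=bool)
    infected[rng.integers(n_nodes)] = True        # randomly picked index holding
    counts = np.empty(n_steps + 1, dtype=int)
    counts[0] = 1
    a, b = links[:, 0], links[:, 1]
    for t in range(1, n_steps + 1):
        active = infected[a] != infected[b]       # links joining an S and an I holding
        hit = active & (rng.random(len(links)) < trans_prob)
        infected[a[hit]] = True                   # no recovery: holdings stay infectious
        infected[b[hit]] = True
        counts[t] = infected.sum()
    return counts

def time_to_fraction(counts, n_nodes, fraction):
    """First time step at which at least `fraction` of holdings are infected, or None."""
    reached = np.nonzero(counts >= fraction * n_nodes)[0]
    return int(reached[0]) if reached.size else None
```

With a scalar `trans_prob` of 0.01 this corresponds to the Rt setting; a per-link array would correspond to a Dt-type setting.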
Disease transmission could only occur between animal holdings that were connected by a link. Note that the probability of a link in the sampled network is given by Rl or Dl, while the probability of transmission is given by Rt or Dt.

2.2 Simulation runs

For each link density (table 1), simulations were run separately for all four scenarios (figure 2). 100 different networks were generated per link sampling process, Dl and Rl, and link density (figure 2). For each link density and each of the two link sampling procedures, 10 replicates of randomly distributed holdings were created, and for each of these landscapes of holdings 10 replicate networks were made using one of the two sampling processes (see section 2.1.2). For each of these sampled networks, 10 simulations were performed per transmission process, Dt and Rt, by initiating the spread from a randomly picked animal holding. In total, 1000 simulations were run per scenario and link density. Simulation time was set to 300 time steps and the number of infected animal holdings was calculated at each time step.

2.3 Analysis

To compare the different scenarios and the prediction power depending on link density, the extent of the spread of disease is analysed as the mean number of infected holdings per time step and as the time until a specified proportion of holdings is infected (here 10%, 50% and 90%). To characterize the networks and to see how a change in link density affects the structure and function of the networks, we used three network measures: degree assortativity, clustering coefficient and fragmentation index. Degree assortativity (Newman 2002) measures whether nodes with equal degree are connected to each other or not. Values range from minus one to one. A value near one indicates that a higher proportion of holdings with equal degree are linked to each other. Assortativity near minus one means that holdings with different degree have a higher probability of being connected.
A value of zero means that the connections between holdings do not depend on node degree. The clustering coefficient (Watts & Strogatz 1998) for a holding is the number of links that exist between neighbors of that holding, divided by the number of all possible links that could exist between the neighbors. Here we have used the average clustering coefficient for the whole network. This measure ranges between zero and one, where one indicates that the network is highly clustered. The fragmentation index (Borgatti 2003; Webb 2005) measures to what extent the network is disconnected and it ranges from zero to one. A low value indicates that the network is highly connected and a high value means that the network is very fragmented.

3. RESULTS

The results show that for the scenario with distance dependent link sampling and distance dependent disease transmission (DlDt), a link density of around 0.04 gives the same number of infected animal holdings as networks with a higher proportion of connections (figure 3). Under the assumptions of our model, the results indicate that such low proportions of links in the network could be enough to examine the extent of the disease transmission. The scenario with random link sampling and distance dependent disease transmission (RlDt) requires a higher link density before it reaches a limit where more links have no influence. For the scenarios with random transmission (DlRt and RlRt), the number of infected animal holdings increases with increased link density and no limit was reached. In figure 4, the median values of the 1000 replicates of the simulations of the DlDt scenario are plotted with the first and third quartiles on each side of the median line. For a link density as low as 0.001 (figure 4a), the median is one for the whole time period because the disease is transmitted to other holdings only in some cases. When link density increases to 0.01 (figure 4b), the difference between the first and third quartiles also increases.
The difference is small at the beginning of the simulation time, when few holdings are infected, but increases during the time period. If link density increases further, up to 0.02 (figure 4c) and 0.03 (figure 4d), the difference between the first and third quartiles decreases. For link densities of 0.04 and higher (figures 4e and 4f), the shape of the curves and the distances between them are almost the same. Here too, the difference between the quartiles is small at the beginning of the simulation time and then increases. In the last part of the simulation time, when almost all holdings are infected, the difference between the first and third quartiles decreases to almost nothing. The time until a given proportion of the holdings is infected differs depending on the link sampling scenario as well as on the disease transmission scenario (figure 5). The random disease transmission scenarios (DlRt and RlRt) require almost the same time to reach a given proportion of infected animal holdings. In addition, they have a much faster disease spread than the distance dependent disease transmission scenarios (DlDt and RlDt) (figures 3 and 5). Scenario RlDt, i.e. random link sampling and distance dependent transmission, has the slowest transmission rate of all scenarios. The numbers of infected holdings at a given link density are compared between the four scenarios (figure 6). At low link densities, all scenarios gave different results. When link density increases, the two distance dependent disease transmission scenarios (DlDt and RlDt) approach each other, as do the two random disease transmission scenarios (DlRt and RlRt). The higher the link density, the more similar are the results of the two distance dependent disease transmission scenarios. As mentioned before, random transmission gives a much faster disease spread than distance dependent transmission.
The average assortativity of the networks depends on the link creation method that is used (figure 7a). Distance dependent link creation ends up with higher values of assortativity compared to random link creation. The networks made by random linking have, as expected, assortativity around zero for all link densities. The average clustering coefficient for all networks increases with increasing link density (figure 7b). The clustering coefficients for the networks generated by distance dependent link sampling are higher than the values for the networks made by random link creation. When link density increases, the random link sampling approaches the distance dependent link sampling. The networks generated by random link sampling give clustering coefficients that are equal to the link density in question. Of course, the clustering coefficients for all networks are one when the link density is one and all animal holdings are connected to each other. The fragmentation index for the networks shows that for both link sampling scenarios the index is close to one when link density is 0.001 (table 2). When link density increases to 0.01, the fragmentation index decreases dramatically. With both link sampling scenarios, the index has reached zero when link density is 0.03 or higher.

4. DISCUSSION

Our aim with this study was to investigate the effects of using a network with missing links for predictions of disease spread. We investigated whether it was possible to predict anything about the size of the spread of disease with only some proportion of all theoretically possible links realized. Our results showed that a link density of 0.04 gave the same number of infected animal holdings as a higher link density when simulating spread of disease in a scenario where the probability of identifying a link, as well as the probability of disease transmission, is distance dependent (DlDt scenario).
For distance dependent disease transmission in a randomly link sampled network (RlDt), the numbers of infected animal holdings converge to the same numbers as for DlDt, as expected. However, with random link sampling many more links are needed to reach the rate of DlDt. With random link sampling, a larger number of less probable (longer distance) links are included than with distance dependent link sampling. For random disease transmission (scenarios DlRt and RlRt), the number of infected holdings increased with increased link density. This implies that a much higher link density is needed to reach relevant approximations of spread of disease. Below we discuss implications of our results in relation to sampling procedures and the effects of using networks with missing links. Empirical data have shown that only a small fraction of all connections in a network actually takes place (Webb 2006; Eames et al. 2009). When sampling data it is almost impossible to trace all connections between nodes, even if they are a small fraction, and this often leads to incomplete data sets. Therefore, it is important to consider link density when working with network modeling. Simulations in a DlDt scenario network with a link density of 0.04 or higher result in the same number of infected holdings as simulations in a complete network. This implies that a link density of 0.04 is sufficient and that sampling beyond that is unnecessary. Another important issue to consider when using empirically sampled networks is the time window of the sampling period. Using the “wrong” time window can lead to missing links or unnecessary sampling. The period used affects how complete the network will be: a longer time window could result in a more connected network than one based on a very short time window. Different studies use different lengths of time window. For example, Kiss et al. (2006) used a 4-week time scale in their study of sheep movements in Great Britain.
In another study of animal transports, Robinson and Christley (2007) used periods of 10 weeks. Such studies may very well describe the network during the actual time period, yet it is not clear whether they can be used for predictions. Combining our results with the time frame of a study emphasizes the problem and puts the focus on the number of actually measured links. A shorter period will by definition result in fewer measured links. The question is whether such a sample is enough to test for spread of the disease during a longer period than the measured one. In the DlDt case, a link density of 0.04 guarantees a correct measure of the spread of disease for any time period. If the link density is below 0.02, our results indicate that the measure of spread of disease is erroneous even for short time periods. A density of 0.03 may hold up to a time period of 50-100 time steps, which is when the 0.03 curve diverges from the curves with higher density (see figure 3). These conclusions are only true for a perfect link sampling procedure, as in the DlDt scenario. If, on the other hand, the sampling procedure is not as perfect as in our DlDt scenario, we have to include even more links. Our RlDt scenario is the extreme at the other end, i.e. a completely random link sampling procedure that is not at all related to the probability of contacts. In such a case almost all links have to be sampled, even for estimations over short time periods. In real life, link sampling procedures are somewhere in between these two extremes. The better the link sampling procedure, the fewer links are needed, yet a density below 0.02 is not recommendable for any time period. Of course, this conclusion does depend on our setup: how we model the spread of disease, what distance dependence we have used (eq. 1), and the spatial configuration of holdings. Still, our results show that to achieve reliable measures one may have to include a higher link density than expected.
Furthermore, our methodology can be applied to any specific system to assess the necessary level of link density. The link density can be achieved by one long measurement or by repeated sampling procedures. Measured link densities in empirically investigated networks are often only some parts per million, or just a few percent, of the total number of theoretical connections in the networks. An example is Ortiz-Pelaez et al. (2006), who studied animal movements during the initial phase of the epidemic of foot-and-mouth disease in Great Britain in 2001. Their network has an average link degree of 1.22, which corresponds to a link density as low as about 0.0019. A low link density is also measured in the Swedish animal transport network (Nöremark et al, submitted). It is important to remember that the measured contacts in an empirical network are only subsets of the realizations of all possible contacts. That means that the links in these networks are a subset of the ones that were realized during the time of data collection. In fact, there are probabilities for a huge number of additional connections, but these were not realized during the current time period. For example, when a link density of 0.01 is used in our study, all theoretical connections are possible but only 1% of them are realized, and these could differ between the replicates. When modeling virtual networks it is also important to consider the link density. Kiss et al. (2005) used virtual networks with different mean degree in their epidemiological modeling. They varied the mean degree between 5 and 20 to see how it affects the final epidemic size. These values correspond to link density values from 0.0025 to 0.01, that is, rather low densities. Apart from the study of Kiss et al. (2005), network studies focusing on link densities and missing links are rare and more studies are needed in this field.
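Conversions between link density and link degree, as used in the comparisons above, follow from eq. 2. The sketch below assumes that "mean link degree" is counted as links per holding (number of links divided by n); this assumption reproduces the value of almost 7.5 that the discussion quotes for a link density of 0.03 with 500 holdings. The function names are hypothetical. Under the other common convention, where each link contributes to the degree of both its end nodes, all degree values would be twice as large.

```python
def possible_links(n):
    """Number of theoretically possible undirected links among n holdings (eq. 2)."""
    return n * (n - 1) // 2

def links_per_holding(density, n):
    """Links per holding for a given link density (assumed meaning of mean link degree)."""
    return density * possible_links(n) / n

def density_from_links_per_holding(k, n):
    """Inverse conversion: link density for k links per holding."""
    return k * n / possible_links(n)

# 500 holdings at link density 0.03: 0.03 * 124750 / 500 = 7.485 links per holding
print(links_per_holding(0.03, 500))
```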
It is well known that a network with random disease transmission will spread diseases faster than a spatially clustered network (Watts & Strogatz 1998; Kiss et al. 2005). Such random networks are generated in our study when we apply random transmission probabilities (RlRt and DlRt). For the DlRt scenario, with random transmission and distance dependent link sampling, the transmission rate is slightly slower at any given link density. The link sampling procedure of DlRt assumes distance dependent contact, yet the contact structure is random. In such a case, the actual rate is higher than the rate in the sampled network, since the link sampling procedure will miss some important long distance links. Hence an even higher link density has to be sampled to reach the right rate of spread of disease. Lindström et al. (2009) showed that the spatial kernel explaining the distance dependence of contacts between holdings due to transport is a mix of distance independence, i.e. mass action mixing, and distance dependence. The mass action mixing is represented by Rt in our setup, and once again reality is somewhere in between these two extremes, the DlRt and the DlDt scenarios. Consequently, our study of the RlRt and DlRt scenarios implies that the link density levels of 0.03 and 0.04 found here are expected to be too low, since the mass action component in the contact structure creates even higher demands on link density. It is recognized that random networks have a low level of clustering compared to other kinds of networks, for example small-world networks (Watts & Strogatz 1998; Shirley & Rushton 2005). We have measured the clustering coefficient of our networks and, as expected, the clustering coefficient was lower in the random networks than in the networks generated by distance dependent link sampling. How fragmented a network is influences how well diseases can spread between the holdings. The fragmentation index measures to what extent the networks are disconnected.
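One common component-based formulation of a fragmentation index is the proportion of node pairs that cannot reach each other, F = 1 - Σ_k s_k(s_k - 1) / (n(n - 1)), where s_k is the size of component k. The sketch below assumes this formulation; whether it is exactly the variant computed in this study is not stated in the text. The function name `fragmentation_index` is hypothetical, and the components are found with a simple union-find.

```python
def fragmentation_index(n_nodes, links):
    """Proportion of node pairs that are not connected by any path (0 = connected)."""
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the component root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j               # merge the two components

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1

    reachable_pairs = sum(s * (s - 1) for s in sizes.values())
    return 1.0 - reachable_pairs / (n_nodes * (n_nodes - 1))
```

A network without links gives an index of one and a connected network gives zero, which agrees with the range and interpretation described in section 2.3.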
Here, only the networks with link density below 0.03 resulted in disconnected networks. Link densities of 0.03 or higher give rise to a connected graph, and it is then possible for a disease to spread between all animal holdings in the network. Considering that a link density of 0.03 corresponds to an average link degree of almost 7.5, the values of the fragmentation index are reasonable. A disconnection may reduce the spread of the disease immensely. Hence, any disconnection that is apparent after a link sampling procedure should be scrutinized. If the disconnection is a result of the specific realization, and hence would not necessarily exist in the same manner in another realization, that is, a new time period, such a disconnection will jeopardize any conclusions drawn from the study. This points out the difference between a network that represents one time period with all its measures and a network that can be used for predictions and estimations of rates for any time period of the same length. We were interested in how many animal holdings may become infected and in the rate of the disease transmission. That is why no incubation time is included in the model. This is a simplification, since the incubation time differs between diseases. Some diseases have an incubation time of only a few days, while for others it may be as long as a couple of years. The model can easily be extended to a more complex model by including a recovery phase and incubation time. We used the number of infected animal holdings as a measure of the spread of a possible epidemic. In practice, this is perhaps not that relevant, because it is not desirable to let the disease transmission go on for such a long time. Instead, it is of course desirable that control strategies are adopted as soon as possible after an infection is identified. Still, our study has implications for what link density one ought to achieve when testing different strategies.
4.1 Conclusion

Our results indicate that, to estimate network properties such as the spread of disease, one may have to construct link sampling procedures that reach high link densities. More specifically, our scenarios show that with a perfect sampling procedure a link density of 0.02 is enough to estimate the spread of disease over shorter time periods, while 0.04 is necessary for longer periods. Yet in reality link sampling procedures are not perfect, and we also expect some mass action mixing component in the contacts between holdings. Our results show that these two components of reality demand an even higher link density to achieve a relevant measure of the spread of disease.

Acknowledgements

References

Barrat, A., Barthélemy, M., Pastor-Satorras, R. and Vespignani, A., 2004. The architecture of complex weighted networks. PNAS 101, 3747-3752. (doi:10.1073/pnas.0400087101)
Barthélemy, M., Barrat, A., Pastor-Satorras, R. and Vespignani, A., 2005. Dynamic patterns of epidemic outbreaks in complex heterogeneous networks. Journal of Theoretical Biology 235, 275-288. (doi:10.1016/j.jtbi.2005.01.011)
Borgatti, S., 2003. The key player problem. In: Breiger, R., Carley, K. and Pattison, P. (Eds), Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers. National Academy of Sciences Press.
Christley, R.M., Robinson, S.E., Lysons, R. and French, N.P., 2005. Network analysis of cattle movement in Great Britain. Proceedings of the Society for Veterinary Epidemiology and Preventive Medicine (2005), 234-243.
Clauset, A., Moore, C. and Newman, M.E.J., 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98-101. (doi:10.1038/nature06830)
Corner, L.A.L., Pfeiffer, D.U. and Morris, R.S., 2003. Social-network analysis of Mycobacterium bovis transmission among captive brushtail possums (Trichosurus vulpecula). Preventive Veterinary Medicine 59, 147-167.
(doi:10.1016/S0167-5877(03)00075-8)
Eames, K.T.D., Read, J.M. and Edmunds, W.J., 2009. Epidemic prediction and control in weighted networks. Epidemics 1, 70-76. (doi:10.1098/rspb.2003.2554)
Guimerà, R. and Sales-Pardo, M., 2009. Missing and spurious interactions and the reconstruction of complex networks. PNAS 106, 22073-22078. (doi:10.1073/pnas.0908366106)
Heath, M.F., Vernon, M.C. and Webb, C.R., 2008. Construction of networks with intrinsic temporal structure from UK cattle movement data. BMC Veterinary Research 4:11. (doi:10.1186/1746-6148-4-11)
Håkansson, N., Jonsson, A., Lennartsson, J., Lindström, T. and Wennergren, U., 2010. Generating structure specific networks. Accepted for publication in Advances in Complex Systems.
Kao, R.R., Green, D.M., Johnson, J. and Kiss, I.Z., 2007. Disease dynamics over very different timescales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface 4, 907-916. (doi:10.1098/rsif.2007.1129)
Keeling, M., 2005. The implication of network structure for epidemic dynamics. Theoretical Population Biology 67, 1-8. (doi:10.1016/j.tpb.2004.08.002)
Kiss, I.Z., Green, D.M. and Kao, R.R., 2005. Disease contact tracing in random and clustered networks. Proc. R. Soc. B 272, 1407-1414. (doi:10.1098/rspb.2005.3092)
Kiss, I.Z., Green, D.M. and Kao, R.R., 2006. The network of sheep movements within Great Britain: network properties and their implications for infectious disease spread. J. R. Soc. Interface 3, 669-677. (doi:10.1098/rsif.2006.0129)
Lindström, T., Håkansson, N., Westerberg, L. and Wennergren, U., 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784-1790. (doi:10.1890/07-1363.1)
Lindström, T., Sisson, S.A., Nöremark, M., Jonsson, A. and Wennergren, U., 2009. Estimation of distance related probability of animal movements between holdings and implications for disease spread modeling.
Preventive Veterinary Medicine 91, 85-94. (doi:10.1016/j.prevetmed.2009.05.022)
Nöremark, M., Håkansson, N., Sternberg Lewerin, S., Lindberg, A. and Jonsson, A. Network analysis of cattle and pig movements in Sweden: measures relevant for disease control and risk based surveillance. Submitted to Preventive Veterinary Medicine.
Newman, M.E.J., Strogatz, S.H. and Watts, D.J., 2001. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118. (doi:10.1103/PhysRevE.64.026118)
Newman, M.E.J., 2002. Assortative mixing in networks. Phys. Rev. Lett. 89 (20). (doi:10.1103/PhysRevLett.89.208701)
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J. and Guitian, F.J., 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 76, 40-55. (doi:10.1016/j.prevetmed.2006.04.007)
Perkins, S.E., Cagnacci, F., Straditto, A., Arnoldi, D. and Hudson, P.J., 2009. Comparison of social networks derived from ecological data: implications for inferring infectious disease dynamics. Journal of Animal Ecology 78, 1015-1022. (doi:10.1111/j.1365-2656.2009.01557.x)
Robinson, S.E. and Christley, R.M., 2007. Exploring the role of auction markets in cattle movements within Great Britain. Preventive Veterinary Medicine 81, 21-37. (doi:10.1016/j.prevetmed.2007.04.011)
Shirley, M.D.F. and Rushton, S.P., 2005. The impacts of network topology on disease spread. Ecological Complexity 2, 287-299. (doi:10.1016/j.ecocom.2005.04.005)
Vernon, M.C. and Keeling, M.J., 2009. Representing the UK's cattle herd as static and dynamic networks. Proc. R. Soc. B 276, 469-476. (doi:10.1098/rspb.2008.1009)
Wasserman, S. and Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge.
Watts, D.J. and Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440-442.
(doi:10.1038/30918)
Webb, C.R., 2005. Farm animal networks: unraveling the contact structure of the British sheep population. Preventive Veterinary Medicine 68, 3-17. (doi:10.1016/j.prevetmed.2005.01.003)
Webb, C.R., 2006. Investigating the potential spread of infectious diseases of sheep via agricultural shows in Great Britain. Epidemiology and Infection 134, 31-40. (doi:10.1017/S095026880500467X)

Table captions

Table 1. Link densities used in simulations and the corresponding mean link degree for the networks.

link density    mean degree (nr of links/node)
0.001           0.250
0.005           1.248
0.01            2.495
0.02            4.990
0.03            7.485
0.04            9.980
0.05            12.48
0.07            17.47
0.10            24.95
0.25            62.38
0.50            124.8
0.75            187.1
1.00            249.5

Table 2. Fragmentation index depending on link density and the link forming method used.

link density    distance dependence    random
0.001           0.9983                 0.9981
0.005           0.9065                 0.1976
0.01            0.0385                 0.0133
0.02            0.0002                 0.0001
0.03            0.0000                 0.0000

Figure captions

Figure 1. Network categories: a) complete network, b) real world network, c) sampled network.

[Figure 2 flow chart: number of animal holdings; random placement; distance dependent linking or random linking; distance dependent transmission or random transmission; resulting scenarios DlDt, DlRt, RlDt and RlRt.]

Figure 2. Flow chart showing the different parts of the model and how these relate to each other.

Figure 3. Mean number of infected holdings per time step depending on linking and disease transmission scenarios. Scenarios DlDt (a) and RlDt (b) have distance dependent disease transmission, while scenarios DlRt (c) and RlRt (d) have random transmission. In scenarios DlDt (a) and DlRt (c) distance dependent link creation is used. In scenarios RlDt (b) and RlRt (d) random link creation is used.
Link densities used: 0.001 (---), 0.005 (…), 0.01 (--.--), 0.02 (__), 0.03 (-○-), 0.04 (-*-), 0.05 (-□), 0.1 (-♦-), 0.25 (-◦-), 0.5 (-▼-), 0.75 (-x-) and 1.0 (-+-).

Figure 4. The solid line shows the median number of infected holdings per time step for the DlDt scenario. Dashed lines represent the first and the third quartiles of the replicates. Link densities in the sub graphs: a=0.001, b=0.01, c=0.02, d=0.03, e=0.04 and f=1.0. Notice that the scales of the y-axes are not the same in all sub graphs.

Figure 5. Number of time steps until (a) 10%, (b) 50% and (c) 90% of all holdings in the network are infected. The time depends on which of the four scenarios is used. Dashed line = DlDt, dotted line = RlDt, solid line = DlRt and dash-dot line = RlRt. For scenario RlDt the number of infected holdings did not reach any of the given proportions during the simulation time.

Figure 6. Mean number of infected holdings per time step for a given link density and the four scenarios. Dashed line = DlDt, dotted line = RlDt, solid line = DlRt and dash-dot line = RlRt. Here, eight link densities are used and compared, one at a time. Link densities in the sub graphs: a=0.001, b=0.01, c=0.03, d=0.05, e=0.07, f=0.1, g=0.5 and h=1.0. Notice that the scales of the y-axes are not the same in all sub graphs.

[Figure 7 plot: (a) assortativity and (b) clustering coefficient on the y-axes (0 to 1) against link density on the x-axes (0 to 1).]

Figure 7. Average (a) assortativity and (b) clustering coefficient for the networks, depending on the way the holdings are connected to each other. Distance dependent linking = dashed line and random linking = solid line.

Short title for page headings: Network sampling and epidemic predictions