Improving network response times using social information
Publication Type: Journal Article
Source: Social Network Analysis and Mining, Springer Wien, pp. 1-12 (2012)
Social networks and discussion boards have become significant outlets for people to communicate and freely express their opinions. Although the social networks themselves are usually well provisioned, participating users frequently post links to external websites to substantiate their discussions. Unfortunately, the sudden surge of traffic that these links direct to the external websites leaves them unresponsive, a phenomenon known as the “flash crowd effect.” Flash crowds pose a real challenge because their intensity and timing are impossible to predict. Moreover, most present-day web hosting servers and caching systems, although increasingly capable, are provisioned for only a nominal request load and become unresponsive beyond it because of the limited bandwidth or processing power allocated to the hosting site. In this paper, we quantify the prevalence of flash crowd events for a popular social discussion board (Digg). Using PlanetLab, we measured the response times of 1,289 unique popular websites and found that 89% of the popular URLs experienced variations in their response times. In an effort to identify flash crowds in advance, we evaluated and compared traffic forecasting mechanisms. We show that predicting network traffic from network measurements has very limited success and cannot be used for large-scale prediction. However, by analyzing the content and structure of social discussions, we were able to accurately forecast popularity for 86% of the websites within 5 min of a story’s submission, and for 95% of the sites once more social content (5 h worth) became available. Our work indicates that social activity can be effectively leveraged to forecast network events that would otherwise be infeasible to anticipate.