Thursday, 30 January 2020

Forecasting the number of potential coronavirus cases

Chinese health authorities reported a cluster of viral pneumonia cases to the World Health Organization (WHO) in late December 2019, and the new virus responsible is now referred to as 2019-nCoV. The number of confirmed cases is almost eight thousand and is growing at an alarming rate of 40% per day. A total of 170 people have died as a result of the virus. The coronavirus appears to be contagious before symptoms appear, with an incubation period of 10 to 14 days, making cases incredibly difficult to identify. The case numbers reported by China's National Health Commission and the WHO are crucial for facilitating decision-making about the virus. A Wikipedia page is dedicated to collecting the timeline statistics for the outbreak [1]. By plotting the number of confirmed cases on a log scale, it is apparent by eye that the numbers are growing exponentially, as demonstrated by an approximately straight line. If the virus maintains this trajectory, the number of cases could surpass 100,000 by 04-Feb-2020. While this projection demonstrates the virus's potential based on the existing cases, a coordinated response should reduce the growth rate and eventually halt the progression of the virus. Nevertheless, the situation will likely get worse before the virus is brought under control.
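As a back-of-envelope check on that trajectory, constant exponential growth gives the time to reach any case count; this sketch uses the post's rounded figures rather than a fit to the full case series:

```python
import math

def days_to_reach(current, target, daily_growth):
    """Days until `current` cases reach `target`, assuming constant
    exponential growth (daily_growth = 0.40 means +40% per day)."""
    return math.ceil(math.log(target / current) / math.log(1 + daily_growth))

# Roughly 8,000 cases growing at 40% per day (figures from the post).
print(days_to_reach(8_000, 100_000, 0.40))  # 8 days at this rate
```

A slightly higher fitted growth rate, or a larger starting count, brings the 100,000 threshold a few days closer.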

Investigations suggested that the source of the virus is a seafood and animal market in Wuhan, a large city of eleven million inhabitants in central China. It was initially believed that people were at risk only through exposure to animal markets, but the dramatic increase in confirmed cases clearly demonstrates that person-to-person transmission is occurring. Chinese authorities have announced that the virus can spread through touch and close contact, and it is likely that it can also spread via fine airborne particles, which are inhaled into the lungs of the recipient [2].

On 23-Jan-2020, the WHO Director-General Tedros Adhanom Ghebreyesus stated “This is an emergency in China, but it is not yet a global health emergency”. Apparently the WHO emergency meeting struggled to reach a decision, explaining that at that time all the deaths, and 575 of the 584 reported illnesses, took place in China.

The response to the threat of the virus has been considerable, especially within China. The Chinese President, Xi Jinping, told the head of the WHO on 28-Jan-2020 that the new coronavirus is a “devil” and that China is confident of winning the battle against it. China has imposed the largest quarantine in human history, with 16 cities in lockdown (approximately 46 million people) in an attempt to contain the virus. Cases have now been confirmed in 19 countries outside mainland China, including those as far away as the US, Canada, France, Germany and Australia. In recognition of the severity of the situation, the US, UK, European Union and Australia are airlifting their citizens from the city of Wuhan.

On 29-Jan-2020, the WHO announced that the "whole world must take action" and Tedros Adhanom Ghebreyesus, added that the WHO "deeply regrets" referring to the worldwide risk from the virus as "moderate" in three reports last week instead of "high" [3].  The initial decision by the WHO on 23-Jan-2020 to not declare 2019-nCoV a global health emergency seems to have ignored the obvious and alarming trend and demonstrated a remarkable level of optimism that the virus could be managed and contained within China.

The 2019-nCoV virus has a mortality rate of approximately 2% based on confirmed cases, making it less deadly than two previous epidemics. The 2002/03 SARS (Severe Acute Respiratory Syndrome) outbreak started in Guangdong Province of China and killed 774 people out of a total of 8,096 infected [4]. The 2012 MERS (Middle East Respiratory Syndrome) outbreak killed 858 people out of the 2,494 infected [5]. These numbers imply mortality rates of 9.6% and 34.4% for SARS and MERS respectively.
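The quoted mortality rates follow directly from the reported deaths and confirmed cases:

```python
def mortality_rate_pct(deaths, cases):
    """Case fatality rate as a percentage of confirmed cases."""
    return 100.0 * deaths / cases

print(round(mortality_rate_pct(774, 8_096), 1))   # SARS: 9.6
print(round(mortality_rate_pct(858, 2_494), 1))   # MERS: 34.4
```

Note that rates based on confirmed cases can shift as an outbreak unfolds, since both deaths and case counts are still accumulating.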

Seasonal flu typically kills 290,000 to 650,000 people per year, according to the WHO [6]. Two-thirds (67%) of seasonal influenza deaths occurred in those greater than or equal to 65 years of age [7]. Unlike seasonal flu, however, there is no vaccine for the new coronavirus, which means that vulnerable members of the population are at risk. Chinese officials have already released the genetic code of the virus,  which will enable scientists to identify its origin, understand potential mutations and most importantly how to protect people against it. While research is underway, estimates for the time taken to obtain a vaccine range from several months to a year. 

Given the concerning increase in confirmed cases, it is important to acknowledge the substantial risk posed by this new coronavirus. History provides some chilling evidence of previous pandemics. In the 20th century, three influenza pandemics occurred: Spanish influenza in 1918 (40–50 million deaths), Asian influenza in 1957 (two million deaths), and Hong Kong influenza in 1968 (one million deaths) [8]. Surely the risk of an emerging pandemic is enough to warrant concerted action. It may finally be time for the WHO to recognise this virus as a high worldwide threat and start coordinating a rapid global response.

[1] Timeline of the Wuhan coronavirus outbreak, Wikipedia. 
[2] How contagious is the Wuhan coronavirus and can you spread it before symptoms start? The Conversation 28-Jan-2020.
[3] Coronavirus: Whole world 'must take action', warns WHO. BBC News 28-Jan-2020.
[4] Severe Acute Respiratory Syndrome (SARS), World Health Organization. 
[5] Middle East Respiratory Syndrome (MERS), World Health Organization.
[6] Influenza (Seasonal), World Health Organization.
[7] Global mortality associated with seasonal influenza epidemics: New burden estimates and predictors from the GLaMOR Project. J Glob Health. 2019 Dec; 9(2): 020421.
[8] Influenza, Wikipedia.

Wednesday, 14 February 2018

A racing heart for St Valentine's day

Happy St Valentine's Day

A racing heart might be a sign of love or fear ...

Using a mathematical model of the electrocardiogram produced by a human heart, we speed up the heart to produce some interesting music.

Synthetic ECG generator: ECGSYN

McSharry PE, Clifford GD, Tarassenko L, Smith L. (2003). A dynamical model for generating synthetic electrocardiogram signals. IEEE Transactions on Biomedical Engineering 50(3): 289-294; March 2003.
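For readers curious how ECGSYN works, here is a minimal sketch of the dynamical model from the paper: a trajectory circles the unit limit cycle in (x, y), and each pass through five fixed angles (the P, Q, R, S and T events) pushes z up or down to trace out the ECG. This simplified version uses Euler integration with the paper's standard morphology parameters and omits ECGSYN's respiratory baseline wander and RR-interval variability:

```python
import math

# PQRST event angles (radians), amplitudes and widths, using the
# standard parameter values reported in the paper.
THETA = [-math.pi / 3, -math.pi / 12, 0.0, math.pi / 12, math.pi / 2]
A     = [1.2, -5.0, 30.0, -7.5, 0.75]
B     = [0.25, 0.1, 0.1, 0.1, 0.4]

def ecg_signal(duration=2.0, dt=1 / 256, bpm=60.0):
    """Euler-integrate the three coupled ODEs and return z (the ECG)."""
    omega = 2 * math.pi * bpm / 60.0   # angular frequency sets heart rate
    x, y, z = 1.0, 0.0, 0.0
    out = []
    for _ in range(int(duration / dt)):
        alpha = 1.0 - math.hypot(x, y)  # pulls trajectory onto unit circle
        theta = math.atan2(y, x)
        dx = alpha * x - omega * y
        dy = alpha * y + omega * x
        dz = -z                         # relax towards a flat baseline
        for th_i, a_i, b_i in zip(THETA, A, B):
            dth = math.remainder(theta - th_i, 2 * math.pi)
            dz -= a_i * dth * math.exp(-dth * dth / (2 * b_i * b_i))
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        out.append(z)
    return out
```

Raising `bpm` compresses the waveform in time, which gives the "racing heart" effect.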

Wednesday, 9 November 2016

Forecasting tea productivity using local weather conditions

Tea plantation in the western province of Rwanda.

The global tea market is estimated to be worth about 38.2 billion U.S. dollars in 2016 and is projected to continue growing at 2.8% annually until 2020 [1]. Tea is Rwanda's second most valuable export after coffee, in a country where agriculture accounts for one third of economic output [2]. Tea productivity depends on various factors such as weather, fertiliser and management practices. Hot, dry conditions are generally detrimental to tea productivity, and daily rainfall and temperatures are therefore important variables. A recent drought in Rwanda, claimed to be the worst in 60 years, has generated losses of up to 40% for some tea producers.

Climate variability is already a concern for the agricultural sector.  Uncertainty about the impact of future climate change and the frequency and severity of weather extremes such as droughts poses substantial challenges for policymakers. Information about the current climate relies on 30-year averages. Rwanda faces a unique challenge due to the genocide in 1994 and consequent disruption in its weather observation network between 1994 and 2009.

Tea is unusual in that, unlike most agricultural crops, it is grown and produced continuously throughout the year. Whereas other crops are harvested once or twice a year, green tea leaves can be picked daily and brought to the factory to be weighed and processed. Rwanda produces high-quality tea because the leaves grow relatively slowly in the country of a thousand hills, where tea factories are currently situated at relatively high altitudes ranging from 1757m to 2357m above sea level.

Our study of several tea factories in Rwanda found that using ground-based weather observations was problematic due to missing data, sensor failure, relatively large distances between stations and tea plantations and the existence of microclimates due to the hilly terrain [3]. An alternative is to use satellite imagery as a means of estimating weather variables. Big data is driving innovation in the agriculture sector and the ability to measure both weather and productivity more precisely has great potential [4]. Machine learning techniques were used for feature extraction and the identification of a parsimonious model. Alongside rainfall estimates, a number of additional satellite products were found to be relevant for predicting tea productivity such as surface, soil and root moisture content, vegetation greenness and the enhanced vegetation index.
Performance of tea productivity forecasts using weather information at the factory station and satellite imagery. The performance improves with temporal aggregation and the satellite information provides superior performance.

The forecast evaluation used the coefficient of determination to measure predictability.  Three time scales were considered for aggregating productivity: daily, weekly and monthly.  The results demonstrate that satellite information is superior to ground-based weather estimates for all three time scales and that predictability was highest for the monthly aggregated productivity. The temporal smoothing of productivity is expected to improve predictability since the specific hours and days worked by tea pickers become less relevant.
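The coefficient of determination can be computed directly from actual and forecast values; this is the generic definition, not the study's own code:

```python
def r_squared(actual, predicted):
    """Fraction of variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Illustrative monthly productivity values versus forecasts.
print(round(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]), 2))  # 0.98
```

An R-squared of 0.64, as reported below for the monthly satellite forecasts, means the forecasts account for 64% of the variance in observed productivity.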

Monthly forecasts based on satellite imagery explained 64% of the variability on average, outperforming local ground weather stations by 33%. The ability to forecast tea productivity using satellite imagery suggests that it should be possible to use similar machine learning approaches to predict the yield of a number of other important crops such as maize, beans, rice and potatoes.

[1] Transparency Market Research (2016) "Tea Market - Global Industry Analysis, Trend, Size, Share and Forecast, 2014 - 2020". New York, US.

[2] Rwanda Development Board (2016). Kigali, Rwanda.

[3] McSharry, P.E., Swartz, T. & Spray, J. (2016). "Index based insurance for Rwandan tea". International Growth Centre, London, UK.

[4] Thomas, R. & McSharry, P.E. (2015). "Big Data Revolution". Wiley & Sons, London, UK.

Thursday, 19 November 2015

Using big data to turn global catastrophic risks into opportunities

Big data is currently transforming both the public and private sectors by increasing efficiency, transparency and productivity whilst also promoting sustainability. As the ability to utilise intelligent data analytics distinguishes today’s winners, data is fast becoming the oil of the 21st century. Organisations and countries that manage to harness this new commodity will ensure sustainable economic growth in the same way that those with access to cheap fossil fuel resources have been in an advantageous position in the past.  

The proliferation of mobile technology, wireless sensors, social media and the Internet of Things provides a means of monitoring socio-economic activity, consumption of resources, transactions, human mobility and environmental change. Recent advances in data science make it possible to cope with the technical challenges of collecting, managing and developing actionable insights from big data. Much of the exciting research has focused on the three V's that define big data (volume, velocity and variety); the volume of data is growing at 40% per year (Figure 1). The sheer size and complexity of the data being created by internet devices (Figure 2) implies a need to move beyond simple linear models and embrace sophisticated modelling approaches. Many organisations sit on a treasure chest of data which, when combined with external data, offers enormous potential.

Measuring and monitoring the UN’s sustainable development goals will require better processes to utilise big data. The UN Statistical Commission has established a global working group to provide strategic vision, direction and coordination of a global programme on Big Data for official statistics. There are numerous challenges ahead that will require multidisciplinary teams to process raw data, extract insights and produce dashboards to enable intelligent decision-making. Fortunately, this revolution has already started in the insurance sector.
Figure 1: Amount of big data created each year.

There are many contenders when it comes to identifying the most threatening global catastrophic risks. Over the centuries, epidemics, earthquakes, floods and windstorms have competed for the position of deadliest disaster. Those with the highest death tolls include the Black Death of 1348 that wiped out up to 60% of Europe’s population and the Spanish Influenza of 1918 that killed between 40 and 100 million people. The costliest catastrophe, with estimated economic losses now exceeding $235 billion, is the earthquake and tsunami that hit Tōhoku, Japan in 2011, resulting in meltdowns at the Fukushima nuclear power plant.  

Reinsurance organisations quantify and compare catastrophic risks in terms of potential financial losses. Since 1987, when AIR Worldwide released the first catastrophe model, reinsurers have benefited from the scientific rigour of catastrophe models to assess risk. The financial losses associated with a particular peril are simulated by combining the hazard, exposure and vulnerability. While impact is clearly important, the frequency of catastrophic events must also be calculated to determine how to develop adequate risk management systems. Big data comprising historical events, crowd-sourced data and computer-simulated output forms the ingredients of a CAT model. As the science matures and both practitioners and academics seek to cooperate, the growing need for a collaborative platform has been met by the Oasis Loss Modelling Framework.

There are many opportunities to use big data to improve the assessment and management of global catastrophic risks. At present, risk assessment is largely a backward-looking exercise in which a catalogue of historical extreme events forms the basis of the analysis. In many cases, an assumption is made that the risk has not changed during the historical period. This approach is defensible if the hazard, exposure and vulnerability are not changing over time. In reality, all three can vary, and both data and advanced modelling techniques are required to understand the complex interactions.

Emerging risks, such as terrorism, lack a historical catalogue and forward-looking predictive models are required. Natural disasters such as windstorm and flood are affected by climate change and overreliance on the past may underestimate future risk. Satellites and drones are helping to collect data to better understand exposure and vulnerability. Crowd-sourcing can also be used effectively to encourage people to build resilience to disasters and develop disaster risk management strategies. 

Scientific models allow insurers to evaluate the risk associated with natural disasters. The probability of exceeding a specific financial loss is calculated using advanced quantitative modelling. At the core of this risk modelling is the need to determine the relationship between a particular measure of the hazard, such as wind speed or rainfall and the resulting financial losses. Catastrophe models involve the computationally intense process of using geographical information systems (GIS) to describe the spatial variation of exposure and vulnerability for a particular portfolio of buildings. By running numerous simulations of extreme events that vary in time and space, the catastrophe model assesses the chances of experiencing losses of different magnitudes. These models can be broken down into modules describing the hazard, exposure, vulnerability and financial components. The development of these modules relies on access to a skilled team of scientists, engineers, statisticians and actuaries. 
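The modular structure described above can be illustrated with a toy simulation. Every distribution and parameter below is invented for illustration; a real CAT model would draw on event catalogues, GIS-resolved portfolios and engineering vulnerability curves:

```python
import math
import random

def poisson(lam, rng):
    """Sample a Poisson count using Knuth's algorithm."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def annual_loss_exceedance(threshold, n_years=20_000, seed=7):
    """Estimate P(annual loss > threshold) for one portfolio.
    Hazard: lognormal peak wind speed per storm; vulnerability:
    quadratic damage ratio above 40 m/s; exposure: a single
    $1bn portfolio. All values are illustrative."""
    rng = random.Random(seed)
    exposure = 1e9
    exceed = 0
    for _ in range(n_years):
        loss = 0.0
        for _ in range(poisson(0.8, rng)):           # ~0.8 storms per year
            wind = rng.lognormvariate(math.log(30), 0.35)   # m/s
            damage = min(1.0, max(0.0, (wind - 40.0) / 40.0) ** 2)
            loss += exposure * damage
        if loss > threshold:
            exceed += 1
    return exceed / n_years

print(annual_loss_exceedance(50e6))  # chance of a $50m+ loss year
```

Sweeping `threshold` over a range of values traces out the exceedance-probability curve that underpins pricing and capital decisions.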


Opportunities arise at the interface of novel data, advanced modelling and a willingness to innovate business practices. The transition to using quantitative models to automate decision-making, remove inefficiencies and prioritise resources is already taking place in many organisations.  

Big data is providing the ability to offer weather insurance for farmers. Data from weather stations or satellites can be used to construct an index that tracks the losses that have arisen due to extreme weather events. With the availability of low-cost wireless sensors and higher resolution information, the accuracy and feasibility of this innovative type of insurance is improving. 
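A minimal sketch of how such an index policy pays out (all parameter names and values hypothetical):

```python
def index_payout(rainfall_mm, strike_mm=200.0, rate_per_mm=10.0,
                 max_payout=1_000.0):
    """Pay in proportion to the rainfall shortfall below the strike,
    capped at the sum insured. No loss adjuster is needed: the payout
    depends only on the measured index, not on the farmer's actual loss."""
    shortfall = max(0.0, strike_mm - rainfall_mm)
    return min(max_payout, shortfall * rate_per_mm)

print(index_payout(150.0))  # 50 mm below the strike -> 500.0
print(index_payout(250.0))  # above the strike -> 0.0
```

The design choice is basis risk: the cheaper and more objective the index, the greater the chance that payouts diverge from the losses actually suffered on a given farm.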

Many countries are now using public-private partnerships as a means of structuring national catastrophe programs to protect against natural disasters. New Zealand’s Earthquake Commission (EQC) provided primary natural disaster insurance that protected the owners of residential properties from the 2010 and 2011 Christchurch earthquakes. Early warning systems are another innovation and these rely on timely access to information – social media is playing an important role in communicating alerts. 

Figure 2: Number of internet devices being used each year.

Great opportunities exist for the private sector to use big data to monitor business activities and interactions with customers. This information reveals what works and what does not, and is helping to increase efficiency in many sectors. Success relies on being profitable while also managing risk when making decisions; big data is helping to provide actionable insights for both.  

Data is also becoming a valuable source of information about the preparedness of firms to cope with shocks that might arise from regulation, technology and climate change. Pension funds are consumers of such information in order to make long-term decisions about companies. Novel datasets and surveys are available to assess the true value of firms and to better understand how their activities are likely to be aligned with future opportunities in an effort to strengthen resilience. Key decisions in the face of future uncertainty can be supported by data and those that understand how to utilize big data are more likely to prosper. 

Risk reduction strategies tend to be reactive as it is easier to justify allocating resources in the aftermath of a disaster. As the lifespan of politicians and business leaders is relatively short, they rarely have the stamina to support long-term strategies that will not reap a reward until the next disaster. Furthermore, many responses involve incremental solutions that fail to grasp long-term opportunities.  

Talk of strengthening resilience is a growing trend that is replacing less optimistic discussions about risk management. Resilience implies more than risk reduction and can be viewed as the capacity to adapt, recover and transform in response to adverse events. There is an important role here for big data to encourage proactive, transformative solutions as opposed to incremental changes. New sources of data and innovative decision-support tools could identify strategic actions and allow companies to be rewarded for transforming early. Performance metrics could help investors identify the companies that are already transforming and positioning themselves to make the most of future opportunities; such companies deserve to be rewarded now for their foresight.


Wednesday, 1 April 2015

Forecasting demand using Big Data

As we walk, cycle or drive around Oxford, make telephone calls, send texts or emails and do our shopping, many of us are unaware of exactly how much data is being generated by our activities. "Big data" is a catch-phrase for describing the overwhelming volume, velocity and variety of this stream of information. Big data has the potential to provide many opportunities for the public and private sectors, offering a means of fusing different sources of information and supporting decision-making in real-time.

Perhaps the most interesting aspect of big data is how it deepens our understanding of human behaviour seen through the collective actions of many individuals. We tend to consume services following the temporal cycles in our everyday lives. There are three evident cyclical patterns based around the hour of day, the day of the week and the season of the year. All of these patterns can be seen in electricity consumption, call centre activity, internet usage, financial transactions, traffic flow and the use of healthcare services.

Fortunately the repetition of these patterns offers potential for accurate demand forecasting. Services can be delivered with greater efficiency if staff and limited resources are scheduled in order to meet forecasted demand. The National Grid has been balancing supply and demand for years and knows the value of accurate forecasts. If they get it wrong, the lights go out and everybody notices. While power outages still happen in many countries, we take it for granted in the UK that we have reliable access to electricity at all times. Amazingly, we are relatively tolerant of imbalances in supply and demand in other sectors and this may explain why sophisticated demand forecasting is not widely utilised.

Take healthcare, for example. The NHS has a target of seeing 95% of patients arriving in A&E within four hours. Until recently there was little information about the performance of our local hospitals or indeed how they compare with the rest of the country. Now weekly A&E data about the percentage of patients seen in four hours is available. This week the John Radcliffe Hospital A&E scored 87.1%, slightly less than the national average of 91.5%. Here is a chance for Oxford City to become smarter.

There are many opportunities to use big data and quantitative models to forecast demand, develop early warning systems and improve staff scheduling. The graph below shows the average hourly A&E arrivals at the John Radcliffe for different days of the week. We immediately see the hour-of-day effect, with low demand during the night and two peaks at 12:00 and 18:00. Most striking is the near doubling of arrivals in the early hours of Saturday and Sunday, which can be attributed to weekend partying and pubs closing at 11 pm on Friday and Saturday nights. While A&E staff are well aware of the additional burden caused by weekend festivities, the data analysis paints a clear picture of its impact on arrivals.

Big data can facilitate a better understanding of social behaviour and the effect of the environment. Arrivals increase on bank holidays. Temperature is another important factor, with arrivals increasing in warm weather. But this is just the start. Forecasts of extreme weather events and information about social events could be used to construct an accurate model.
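A weekly profile like the one described above can be computed from raw arrival timestamps. This is a simplified sketch: hours in which no patient arrived on a given date are not counted as zeros, which slightly inflates the averages for quiet cells:

```python
from collections import Counter, defaultdict
from datetime import datetime

def hourly_profile(arrivals):
    """Average arrivals per (weekday, hour) cell; weekday 0 is Monday.
    `arrivals` is a list of datetime objects, one per A&E arrival."""
    per_date_hour = Counter((t.date(), t.hour) for t in arrivals)
    cells = defaultdict(list)
    for (date, hour), n in per_date_hour.items():
        cells[(date.weekday(), hour)].append(n)
    return {cell: sum(v) / len(v) for cell, v in cells.items()}
```

Plotting the resulting 7-by-24 grid would reveal the midday and evening peaks, and the early-hours weekend surge, described above.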

New book: