Neural network taught to predict floods from Flickr posts

Modern early-warning systems for natural disasters rely largely on specialized equipment and professional analysts. At the same time, there are services that let anyone report changes in certain climatic parameters, such as the amount of precipitation or the water level, to the authorities in real time. In addition, the US Geological Survey (USGS) has previously acknowledged that analyzing user content and location data on Twitter can be a useful complement to high-tech methods, acting as a “social barometer”.
Earlier studies have shown that the photo-sharing service Flickr can serve a similar purpose. The dynamics of posts, along with the nature of their descriptions and tags, correlated with fluctuations in atmospheric pressure in the state of New Jersey on the eve of and during Hurricane Sandy in 2012, which in theory made it possible to forecast weather changes in the affected areas. However, existing methods for analyzing social media content often depend on keywords and phrases that match a particular type or name of natural disaster (“flood”, “Katrina”). According to the authors of the new study, this approach can be effective for operational tasks, but its capabilities are severely limited.
To fill this gap, scientists from the University of Warwick developed an algorithm for the semantic analysis of tags, trained using the Deconstructed Cascade Correlation Matrix (DCCM). This method trains an artificial neural network on the target problem by “freezing” the input-side weights of the hidden units; as a result, the estimate remains relatively stable despite variability in the parameters. In addition, DCCM allows variables to be deconstructed vertically and horizontally and can work online. The method is interdisciplinary and is used, among other things, for weather forecasting.
The team trained the new algorithm on photographs and videos from the Yahoo Flickr Creative Commons 100M (YFCC100M) dataset that were published between April 2004 and August 2014. As input, the computer analyzed materials by four general tags (“nature”, “landscape”, “river”, “water”) and two combined tags (“RW”, from “river” and “water”, and “NL”, from “nature” and “landscape”), each of which was associated at the output with specific tags (“flood”, “flooding”, “floodplain”) without specifying weight attributes. The tags were matched to the risk of a natural disaster using three parameters: the scale of the event, the number of posts five days before and five days after the flood peak, and the pattern of behavior during the peak flood period.
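The counting step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input layout (a list of date and tag-set pairs) and, in particular, the rule that a post counts toward a combined tag when it carries at least one of its component tags are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: a post counts toward a combined tag if it carries
# at least one of the component tags. The article does not say
# how "RW" and "NL" were actually built.
COMBINED = {"RW": {"river", "water"}, "NL": {"nature", "landscape"}}

def daily_counts(posts, combined=COMBINED):
    """Count posts per (day, combined tag).

    posts: iterable of (date, set_of_tags) pairs (hypothetical layout).
    """
    counts = Counter()
    for day, tags in posts:
        for name, parts in combined.items():
            if parts & tags:
                counts[(day, name)] += 1
    return counts

def window(counts, name, peak, days=5):
    """Daily counts for one tag, from `days` before the peak to `days` after."""
    return [counts[(peak + timedelta(days=d), name)]
            for d in range(-days, days + 1)]
```

Calling `window(counts, "RW", peak_day)` returns eleven daily values centered on the peak, which is the kind of series the before, during and after comparison works with.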

Comparison of the number of “RW” and “flooding” tags (a) and the number of “RW” and “NL” tags (b) before and after (1) and during (2) the peak flood period / © Nataliya Tkachenko et al., PLOS ONE, 2017

The results showed that the appearance of flood-related tags on Flickr correlates statistically significantly with the occurrence of the individual (“water”, “river”) and combined (“RW”) tags. At the same time, the threat of a natural disaster was almost unrelated to growth in the number of tags such as “landscape” and “nature”. Notably, the tags “water” and “river” occupied an intermediate position between the disaster markers and the nature-themed tags and correlated about equally with the rest of the tags. Combined tags occurred more often one day before the peak flood period; the “RW” tag was used more and more often as the peak approached, while the “NL” tag, by contrast, lost popularity.
In addition, the scientists retrospectively tested the model's ability to predict floods from the number of posts per day five days before the event. The strongest correlation was found for the “RW” and “water” tags. The threat of disaster was indicated by a rise in the number of uploads tagged “RW” to 100 or more five days before the flood, followed by a smooth decline. When the number of posts tagged “RW” rose to 125 or more per day, the correlation increased. Similar dynamics were typical for uploads tagged “water”, which rose to 125 or more per day with a peak three days before the flood, followed by a decline.
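The pattern reported above (a rise to a threshold, then a decline before the flood) can be written as a simple rule. The sketch below is only an illustration, not the trained model from the study: the function name and parameters are hypothetical, and a “smooth decline” is assumed to mean that the daily counts never rise again after the peak day.

```python
def matches_pattern(daily, threshold=100, peak_index=0):
    """Check a series of daily post counts against the reported pattern.

    daily: counts ordered oldest to newest, ending on the flood day
           (for example six values for days -5 through 0).
    threshold: minimum count required on the peak day.
    peak_index: position of the expected peak in `daily`.

    Assumption: "decline" means non-increasing counts after the peak.
    """
    if len(daily) <= peak_index or daily[peak_index] < threshold:
        return False
    tail = daily[peak_index:]
    return all(later <= earlier for earlier, later in zip(tail, tail[1:]))
```

For the “RW” tag the peak is expected five days before the flood, so `peak_index=0` with `threshold=100`; for the “water” tag the article gives a peak three days before the flood, which corresponds to `peak_index=2` and `threshold=125` in a six-day series.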
According to the authors, their research indicates that social networks are a resource that can be used in combination with professional sources of meteorological data. In the future, such warning systems, based on the analysis of user content, could have unprecedented accuracy and efficiency, scientists believe.

This is not the first time that the behavior of social network users has been the object of study. Earlier, psychologists linked activity on such sites to a sense of social isolation, and physicists compared the spread of memes on social networks with statistical models that describe epidemics and financial markets.