Data Analytics

The approach put forth here proposes that the recurring nature of certain international political and economic crises can be detected using two basic ingredients: 1) historical data and 2) local knowledge. Historical data allows us to see that certain developments occur repeatedly over time. Those developments, in turn, are neither random nor the product of one individual mind; they are the product of the culture that bears them and the ideology that rationalizes them. An understanding of the social groups involved can therefore help us see why certain events repeat themselves.

The first step in this methodology is to collect large amounts of information concerning past cycles and apply Data Analytics technologies to it. Data Analytics has grown into a dense field of diverse methods and applications. At its core, however, it refers to the process of gathering, transforming and modeling large amounts of data that might otherwise be too difficult to manage. With the use of modern computing power, salient information can be highlighted and put to practical use by the analyst.

Here, the technology is used to draw on thousands of online media sources in order to extract the rising and falling levels of attention toward a particular issue over time. Ethnographic Edge uses web intelligence provided by Recorded Future. Recorded Future’s software can quantify the level of positive or negative sentiment expressed in large volumes of news articles using techniques such as natural language processing. This technology transforms unstructured data (i.e., sentiment expressed in news articles) into structured data (i.e., ongoing numerical scores).
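
To make that transformation concrete, here is a minimal sketch of how raw article text might be converted into a numerical score. It is emphatically not Recorded Future’s software: the tiny word lists and the scoring rule are illustrative assumptions only.

```python
# Toy lexicon-based scorer illustrating the unstructured -> structured
# transformation. NOT Recorded Future's pipeline; word lists are invented.

POSITIVE = {"agreement", "cooperation", "growth", "stability", "trust"}
NEGATIVE = {"crisis", "conflict", "sanctions", "protest", "collapse"}

def sentiment_score(article_text: str) -> float:
    """Return a score in [-1, 1]; negative values signal negative sentiment."""
    words = article_text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

articles = [
    "Talks ended in agreement, raising hopes of regional stability.",
    "New sanctions deepen the crisis as protest spreads.",
]
print([sentiment_score(a) for a in articles])  # [1.0, -1.0]
```

A production system would rely on full natural language processing rather than word counting, but the output is the same in kind: each article becomes a number that can be tracked over time.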

If we accept the premise that abnormally high levels of negative sentiment in the media correspond to some form of political tension, and that positive sentiment indicates the presence of trust, then we can appraise the rising and falling levels of tension and trust around a given issue by measuring media sentiment over time. Organizing this numerical information in the form of a line chart allows us to see, for example, what was happening in the media at any given point in the past. There are, however, many different types of media “milieus” on the Internet. The methodology proposed here focuses on four of them: blogs, mainstream media, government publications, and social media.
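
As a rough illustration, per-article scores might be organized into one daily time series per milieu, ready for a line chart. The field names and sample values below are invented for the example; they are not a real schema.

```python
# Hedged sketch: aggregate per-article sentiment into daily series,
# one column per media milieu. Requires pandas (and matplotlib to plot).
import pandas as pd

records = [
    {"date": "2014-03-01", "milieu": "mainstream", "score": -0.4},
    {"date": "2014-03-01", "milieu": "blogs",      "score": -0.7},
    {"date": "2014-03-02", "milieu": "mainstream", "score": -0.6},
    {"date": "2014-03-02", "milieu": "social",     "score": -0.9},
]
df = pd.DataFrame(records)
df["date"] = pd.to_datetime(df["date"])

# Mean daily sentiment per milieu.
series = df.pivot_table(index="date", columns="milieu",
                        values="score", aggfunc="mean")
series.plot()  # line chart: levels of tension/trust over time
```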

Data is collected for the past five years. This information is then fed into machine learning software that looks for recurring combinations in past sentiment fluctuations across the four media milieus. Combinations are categorized according to their repeated effect on future mainstream media sentiment scores. Real-time data is then collected and monitored for these same combinations. When they occur, a signal is created predicting future mainstream media sentiment scores, and thus potential future political tension or trust. See the diagram below.
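
The sketch below suggests one way such a pattern-finding step could be set up. It is an assumption about the general shape of the task, not the actual Ethnographic Edge or Recorded Future implementation: lagged sentiment scores from the four milieus serve as features for predicting the mainstream score several days ahead, and random data stands in for the five years of real scores.

```python
# Illustrative setup only: learn which past combinations of milieu
# sentiment precede moves in future mainstream sentiment.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
days, lag, horizon = 5 * 365, 7, 5          # five years of daily scores
milieus = rng.normal(size=(days, 4))        # blogs, mainstream, gov, social

X, y = [], []
for t in range(lag, days - horizon):
    X.append(milieus[t - lag:t].ravel())    # last 7 days, all 4 milieus
    y.append(milieus[t + horizon, 1])       # mainstream score 5 days ahead
X, y = np.array(X), np.array(y)

model = RandomForestRegressor(n_estimators=100).fit(X, y)

# In production, the latest window of real-time data would be fed in;
# a strong predicted swing becomes the signal of future tension or trust.
latest = milieus[-lag:].ravel().reshape(1, -1)
print(model.predict(latest))
```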

This is all done mechanically, through a set of algorithms that automatically search for recurring patterns. Machines, however, are just enablers. What this stage of the methodology does is identify significant increases in the likelihood of a development following certain conditions, based on the simple fact that the same development has repeatedly occurred in the past following those very same conditions. In other words, it puts us a step ahead of a strategy that relies primarily on the subjective analysis of information assimilated by human means alone.
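
That underlying logic can be expressed in a few lines: compare the base rate of a development with its rate conditional on the preceding conditions. The data below is fabricated purely to illustrate the computation.

```python
# Minimal sketch: does a development become significantly more likely
# after certain conditions? Compare base rate vs. conditional rate.
import numpy as np

rng = np.random.default_rng(1)
condition = rng.random(2000) < 0.2            # days on which the pattern held
# Toy data: the development is more frequent when the condition held.
development = np.where(condition, rng.random(2000) < 0.5,
                                  rng.random(2000) < 0.1)

base_rate = development.mean()
conditional_rate = development[condition].mean()
lift = conditional_rate / base_rate
print(f"base {base_rate:.2f}, after condition {conditional_rate:.2f}, "
      f"lift {lift:.1f}x")  # a lift well above 1 is what generates a signal
```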

[Diagram: graph2]