The role of the social sciences in data analytics
By: Bear Braumoeller

To the considerable surprise of those of us who cut our teeth on Minitab and had to derive the OLS estimator by hand, data analytics is a sexy topic these days. Led by the likes of Nate Silver at FiveThirtyEight, Ezra Klein at Vox, and Nate Cohn at the New York Times’ Upshot, data nerds have made unprecedented forays into both the public sphere and the business world. At a time in which “truthiness” seems to have run amok, an increasingly widespread acceptance of the careful use of data in argumentation and analysis can only be welcome.

That’s the good news.

The bad news is that drawing meaningful inferences from data requires quite a bit more than simply looking at the data. It requires a rigorous understanding of the role of chance in producing outcomes, and it requires an understanding of how to bridge the gap between correlation and causation when working with observational data.
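To see how easily chance alone can masquerade as signal, consider a minimal simulation, sketched here in Python with numbers invented purely for illustration: screen enough unrelated variables against an outcome, and a few of them will correlate with it anyway.

```python
# A sketch of how chance manufactures "findings": 200 predictors of
# pure noise, screened against an outcome that is also pure noise.
# All numbers here are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars = 100, 200

outcome = rng.normal(size=n_obs)               # outcome: pure noise
predictors = rng.normal(size=(n_obs, n_vars))  # 200 unrelated "predictors"

# Correlation of each predictor with the outcome
corrs = np.array([np.corrcoef(predictors[:, j], outcome)[0, 1]
                  for j in range(n_vars)])

strongest = corrs[np.abs(corrs).argmax()]
print(f"strongest correlation found by chance: r = {strongest:.2f}")
print(f"predictors with |r| > 0.2: {(np.abs(corrs) > 0.2).sum()}")
```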

These are areas in which the social sciences can contribute greatly to data analytics. Social scientists are avid consumers and, increasingly, producers of statistical methodologies for deriving coherent conclusions from noisy data. And because of reasonable prohibitions on experimentation on human subjects, social scientists have long had to make do with observational data. In short, while many disciplines have encountered these issues, the social sciences have been plagued by them, and their practitioners have developed a comparative advantage in dealing with them. For that reason, the social sciences are uniquely well positioned to contribute to the further evolution of data analytics.

Three examples help to illustrate this point.

Vote fraud is a common problem in democratizing countries, but the efficacy of election monitors is difficult to gauge: because monitors tend to be sent where vote fraud is a major issue, the raw data could well show that elections with monitors are more corruption-prone than those without, even if the monitors are succeeding in reducing corruption. To deal with this problem, political scientist Susan Hyde took advantage of the fact that, in Armenia’s 2003 presidential elections, monitors from the Organization for Security and Co-operation in Europe were assigned to polling stations effectively at random. By examining the differences between results from polling stations with monitors and those without, Hyde was able to demonstrate that candidates who engage in fraud receive a significantly lower share of the vote in monitored polling stations than in unmonitored ones[1].
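For the analytically inclined, the logic of Hyde’s design can be sketched in a few lines of code. The data below are simulated, not hers; the point is only that random assignment licenses a simple difference in means, plus a permutation test to gauge how large a gap chance alone would produce.

```python
# A stylized sketch of Hyde's design, on simulated data (not her actual
# numbers). Because monitors were assigned essentially at random, a
# simple difference in means between monitored and unmonitored stations
# estimates the effect of monitoring.
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical polling stations

monitored = rng.integers(0, 2, size=n).astype(bool)  # random assignment
# Simulated vote share for a fraud-prone candidate, lower where monitored
vote_share = rng.normal(0.55, 0.08, size=n) - 0.04 * monitored

effect = vote_share[monitored].mean() - vote_share[~monitored].mean()

# Permutation test: shuffle the monitor labels to see how big a gap
# chance alone produces under random assignment
null = [vote_share[perm].mean() - vote_share[~perm].mean()
        for perm in (rng.permutation(monitored) for _ in range(5000))]
p = (np.abs(null) >= abs(effect)).mean()
print(f"estimated effect of monitoring: {effect:.3f} (permutation p = {p:.3f})")
```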

Other examples have to do with political attitudes. Data analysts, especially in the media, often attribute fluctuations in public opinion to current events or recent statements by politicians. In contrast, a small but growing literature argues that political attitudes are remarkably stable over time, even across decades or centuries. Political scientists Avidit Acharya, Matthew Blackwell, and Maya Sen demonstrate that the prevalence of slavery in a county 150 years ago still has an impact on contemporary political attitudes[2], while economists Irena Grosfeld and Ekaterina Zhuravskaya demonstrate that the partitions of Poland in the late 18th century produced changes in political attitudes that persist to this day. Grosfeld and Zhuravskaya reached this conclusion by examining the spatial distribution of public opinion and discovering abrupt, significant discontinuities along the old lines of partition[3].
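The discontinuity logic lends itself to a similar sketch. This toy version uses simulated data and a hypothetical border; it simply compares average attitudes within a narrow band on either side of the line.

```python
# A toy version of the discontinuity logic, on simulated data: if
# attitudes jump abruptly exactly at an old partition line, the jump is
# hard to attribute to anything but the partition itself. All variable
# names and numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Distance (km) from the old partition border; negative = one side
dist = rng.uniform(-100, 100, size=n)
# Smooth geographic trend plus a discontinuous jump of 0.3 at the border
attitude = 0.5 + 0.001 * dist + 0.3 * (dist > 0) + rng.normal(0, 0.2, size=n)

# Local comparison: mean attitudes within a narrow band around the border
band = 10
left = attitude[(dist < 0) & (dist > -band)].mean()
right = attitude[(dist > 0) & (dist < band)].mean()
print(f"estimated jump at the border: {right - left:.2f}")
```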

Even when data analysts do use sophisticated methods, they tend to see them as a collection of useful tools rather than as parts of a coherent body of knowledge. For that reason, they fail to realize that applying those tools without the necessary background can do more harm than good. A recent example from my daily commute was an episode of the Data Skeptic podcast on the subject of Bayesian A/B testing (or split-sample hypothesis testing, for those of us not in the business world)[4]. The podcast’s host expressed excitement at the possible applications of the test and asked whether there were general principles guiding its use, to which the guest replied, “Test as much as possible,” apparently unaware that doing so is a recipe for false-positive results, as the science-savvy web cartoon xkcd once pointed out. A business that followed this advice would end up designing its strategy around statistical flukes rather than meaningful results.
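A quick simulation shows why. Run many A/B tests in which the two variants are truly identical, and at the conventional 5% threshold roughly one in twenty will come back “significant” anyway. All parameters below are invented for the sketch.

```python
# Why "test as much as possible" backfires: run many A/B tests where
# the variants are identical by construction, and at a 5% threshold
# roughly 1 in 20 will come up "significant" by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_tests, n_users = 100, 1000
false_positives = 0

for _ in range(n_tests):
    # A and B drawn from the SAME distribution: no real effect exists
    a = rng.binomial(1, 0.10, size=n_users)  # 10% conversion rate
    b = rng.binomial(1, 0.10, size=n_users)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} no-effect tests were 'significant'")
```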

I certainly don’t mean to overstate the ability of social scientists to figure out what makes the world go around using only observational data: there will always be caveats, and no methodology or research design is totally ironclad. But the more I observe the incredible proliferation of data analytics in both the business world and the public sphere, the more convinced I become that its main shortcomings are exactly those areas in which the social sciences excel.

[1] Hyde, Susan (2011) “The Pseudo-Democrat’s Dilemma: Why Election Monitoring Became an International Norm.” Ithaca: Cornell University Press.

[2] Acharya, Avidit, Matthew Blackwell, and Maya Sen (2014) “The Political Legacy of American Slavery.” Harvard Kennedy School Faculty Research Working Paper Series RWP14-057.

[3] Grosfeld, Irena, and Ekaterina Zhuravskaya (2013) “Persistent Effects of Empires: Evidence from the Partitions of Poland.” CEPR Discussion Paper 9371.

[4] Data Skeptic podcast, episode on Bayesian A/B testing.


About The Author

Dr. Braumoeller’s research is in international security, especially systemic theories of international relations and the politics of the Great Powers, and political methodology, with an emphasis on complexity. He is currently involved in projects on evaluating the end-of-war thesis and on addressing the problem of endogeneity when estimating the impact of political institutions. Dr. Braumoeller co-leads TDAI’s Computational Social Sciences research community of practice.
