User-generated data is a social science goldmine
By: Robert Bond

As increasing amounts of digital data are produced and stored online, it is important to remember that humans produce much of that data. In an era in which people express themselves on Facebook, follow companies on Twitter, and allow their phones’ GPS to track their movements and their online retailers to track their buying decisions, social scientists have a tremendous opportunity to help shape the analysis of large-scale data sources to understand human attitudes and behavior.

A particular area of strength among social scientists concerns measurement. In contrast to researchers in many other disciplines, social scientists are frequently concerned with a latent variable that is not easily observed, and must estimate it using some other, more easily observed quantity. For instance, we may be interested in studying how an individual’s political ideology affects their political behavior, such as which candidates they support. It is quite difficult to observe something as nebulous as ideology, but social scientists have developed computational methods that we can apply to data to generate an estimate, drawing on simple measures from surveys or on behavioral measures like voting records for members of Congress. My own work has shown that digital traces in social media, such as “likes” for a politician on Facebook, can also be used to measure ideology at a very large scale.

A second area of strength among social scientists concerns the development of causal theories and tests of causal inference. While causal inference is not unique to the social sciences, the problems inherent to developing causal models when considering people’s behavior are often different from those in other domains. For instance, humans select their environments, which makes causal inference difficult. To deal with this, social scientists develop theories that rely on assumptions about the world, along with a wide range of methodological tools, to make causal inference more tractable. In the case of big data, social scientists should play the role of advocating for well-defined theories of human behavior, and for making the assumptions underlying causal tests clear. If we fail to do so, we are likely to understand what the world looks like without having a clear understanding of why it came to be so.

Large-scale data sources also create opportunities for social scientists to conduct research at a scale that was not previously feasible. The digital traces humans leave behind through their interaction with computers, phones, smart watches, and other digital tools create enormous quantities of data that previously would have been cost-prohibitive or impossible to collect. Further, with more people conducting more of their daily lives online, it is possible for social scientific studies to include millions of individuals at once. Through the use of large-scale sources, social scientists are able to study more subtle causal effects through increased statistical power and also to characterize the behavior of ever-larger proportions of the population, thereby using big data both to “zoom in” on small changes and to “zoom out” to examine the effects these small changes have at a societal level.
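
To make the point about statistical power concrete, here is a minimal sketch of how the smallest detectable effect shrinks as sample size grows. It is an illustration under stated assumptions, not a method taken from any particular study: it assumes a simple comparison of two equal-sized groups on a binary outcome (for example, whether a person voted), uses the standard normal approximation, and the function name `minimum_detectable_effect` and the 50% baseline rate are hypothetical choices made for the example.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, baseline_rate, alpha=0.05, power=0.8):
    """Approximate smallest difference in proportions that a two-group
    comparison can detect, using the normal approximation.

    All parameter names and defaults are illustrative assumptions.
    """
    # Critical value for a two-sided test at significance level alpha
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power
    z_power = NormalDist().inv_cdf(power)
    # Standard error of a difference in proportions, assuming both
    # groups have roughly the baseline rate and the same size
    standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    return (z_alpha + z_power) * standard_error


if __name__ == "__main__":
    # Hypothetical group sizes, from a typical survey to a social media study
    for n in (1_000, 100_000, 10_000_000):
        mde = minimum_detectable_effect(n, baseline_rate=0.5)
        print(f"n per group = {n:>10,}: detectable difference of about {mde:.4f}")
```

Under these assumptions the detectable difference falls with the square root of the sample size, so a hundredfold increase in participants allows effects roughly ten times smaller to be distinguished from noise.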

My particular area of expertise—the study of social networks—has benefited greatly from big data. Social network analysis requires the use of data that traditionally would have been difficult to collect and analyze due to its complexity. Big data and computational tools, however, have largely changed both of these processes. While we have always lived in a network, the ties between individuals have now become more explicit and are more easily tracked and quantified through online interaction, particularly social media. Each friend request we accept, comment we make, Twitter account we follow, or Snapchat we send potentially provides researchers with important information about the social environment we are in. Further, computational tools have advanced such that describing and analyzing a network of millions of individuals is a tractable problem. Not many years ago, either of these would have been impossible.

As our world becomes more computational, and as that change ushers in vast troves of new data about humans, it is critical that social scientists influence how these data are analyzed and the conclusions that are subsequently drawn. Such data offer abundant opportunities to study phenomena of interest at new scale and with increased precision. However, doing so will require careful thought about the processes that have created these data—not only the mechanical processes that translate data from a server to a monitor screen but also the processes through which humans create such data in the first place. If the methods and models we use to understand the data created by humans fail to account for how and why such data were created, we are unlikely to fully appreciate what this kind of data can tell us about human nature.


About The Author

Dr. Bond’s program of research covers political communication and behavior, particularly social influence processes. Frequently his work involves using large-scale data sources from social media to study political engagement, ideology, and turnout. In addition to these substantive areas, Dr. Bond works on methodological tools that help social science researchers analyze large-scale data sources.

