The past several years have seen a growing focus on big data, data science, and data analytics that spans a spectrum—from fundamental issues related to the management of large-scale and heterogeneous data to the formulation of methods for the systematic manipulation of data sets, to the implementation and evaluation of those methods toward solving driving, real-world problems.
It is this solving of real-world problems—and the need for a responsive framework that captures challenges and yields viable solutions—that inspired the creation of Translational Data Analytics @ Ohio State.
In navigating the spectrum mentioned above, we encounter three major concepts:
- Big data, representing collections of data that are of sufficient volume, velocity, or variability that traditional methods of management and analysis are not able to generate actionable insights;
- Data science, a broad field that explores and develops the collective processes, theories, concepts, tools, and technologies that enable the review, analysis, and extraction of valuable insights from raw data; and
- Data analytics, a process of sifting, organizing, and examining vast amounts of information and then drawing conclusions based on that analysis.
These concepts are the essential components of a data-centric ecosystem that consists of sources (e.g., big data); theories and methods that can be applied to such sources (e.g., data science); and the use of such data, theories, and methods to meet the needs of a variety of applications (e.g., data analytics). However, when strategizing how to create and sustain such a data-centric ecosystem, a recurring theme emerges: How do we ensure we are asking and answering the right questions in our use of big data resources, particularly given the high costs associated with their assembly, analysis, and dissemination?
More specifically, once mechanistic issues such as how best to collect, store, integrate, and analyze big data have been adequately addressed, how do we derive demonstrable value from it?
At The Ohio State University, under the auspices of our paradigm-shifting Discovery Themes initiative, Translational Data Analytics @ Ohio State (known as TDA@OhioState) is pursuing a transformative effort to address this fundamental question. Our effort involves creating a network of people, places, and programming that support and generate innovative solutions from the translation of data analytics theory into practice: innovations that generate demonstrable value and are driven by real-world problems.
It is a crucial step toward devising the big-picture framework and approach to basic and applied data science that are necessary to ensure we are asking and answering the questions of greatest value.
Figure 1: Conceptual framework for the solution-oriented application of big data to create demonstrable value.
A conceptual framework for generating value from big data involves a variety of interacting components.
- At the core is the interplay between four major types of stakeholders: 1) “upstream” analytics consumers (e.g., researchers, strategic decision makers) who define driving problems that require data-centric solutions employing the integrative analyses of collections of large-scale, heterogeneous data; 2) “downstream” analytics consumers who may not necessarily define driving problems but have a research or clinical need to apply information products generated via big data analytics to advance hypothesis generation, testing, and decision making; 3) developers of processes and tools that enable systematic and tractable understanding of source data, where such understanding may have to be inferred using numerous computational techniques given the scarcity of well-characterized data; and 4) developers of processes and approaches that ensure stakeholders understand what big data analytics tools exist, how they are appropriately used, and how to interpret their outputs.
- The primary outcome is “actionable knowledge”—a term used purposely to indicate that such knowledge should both contextualize data products and deliver them in the correct format to the correct individual(s) at the correct time so as to optimally support or influence decision-making and other data-driven workflows.
- Finally, the overarching and unifying objective is to connect the dots between the driving question and contextualizing factors articulated by “upstream” data analytics consumers and the interventions, processes, and outcome measures associated with the decision-making and other data-driven workflows of “downstream” data analytics consumers.
Experience has shown that it is only possible to ask and answer meaningful questions—and thus generate value from big data resources—if the relationships between these components are uniformly strong and well understood. Further, such questions must be informed and motivated by an overarching and unifying framework that links the information needs of “upstream” and “downstream” consumers in synergistic and quantifiably measurable ways.
Achieving this vision will require significant work by the relevant basic and applied science communities at Ohio State and throughout the region, nation, and world, including:
- Creating and delivering tailored workforce and knowledge development programs to ensure the stakeholders described herein are informed with respect to identifying, interacting with, and generating value from big data;
- Developing and applying methods to ensure big data platforms are capable of empowering knowledge workers and stakeholders to leverage their domain knowledge and contribute to the discovery, integration, harmonization, and analysis of big data; and
- Creating collaborative research commons for big data resources, tools, and best practice documentation to promote an efficient and transparent innovation and knowledge ecosystem around big data analytics.
Ultimately, this framework and its associated action items represent a path forward that ensures we are asking questions that matter by situating the critical assessment of big data analytics within a value-generating ecosystem. Realizing this vision is at the core of TDA@OhioState’s efforts and is indicative of the unique role Ohio State can serve as an innovation engine capable of addressing problems of global importance.