TDAI affiliates Yusu Wang and Sebastian Kurtek are co-principal investigators on a new interdisciplinary project that has been awarded $500,000 through Phase 1 of the NSF’s Transdisciplinary Research in Principles of Data Science (TRIPODS) program. The project, entitled “Topology, Geometry, and Data Analysis (TGDA@OSU): Discovering Structure, Shape, and Dynamics in Data,” is led by PI Tamal Dey (computer science and engineering), along with co-PIs David Sivakoff (statistics, math) and Facundo Memoli (computer science and engineering, math).
This project will advance the methodological and theoretical foundations of data analytics by considering the geometric and topological aspects of complex data from mathematical, statistical, and algorithmic perspectives, thus enhancing the synergy among the Computer Science, Mathematics, and Statistics communities. Furthermore, the project will benefit a range of scientific areas, including medicine, neuroanatomy, machine learning, geographic information systems, mechanical engineering design, and political science. The research products will be implemented and disseminated through software packages and tutorials, allowing widespread application by industrial and academic practitioners. Through this project, the PIs will develop curricula for cross-disciplinary undergraduate and graduate education. The Ohio State University already offers a data science curriculum jointly between Statistics and Computer Science and Engineering, including the recent Data Analytics undergraduate major, which provides a platform to develop new courses and an opportunity to engage future industry leaders in basic research. Additionally, the project aims to develop partnerships with the Translational Data Analytics Institute and the Mathematical Biosciences Institute at OSU, as well as other internal and external research and education centers. Plans for workshops and summer schools are included for outreach and training purposes.
In the past few decades, a large number of models, methods, and algorithmic frameworks have been developed for data science. However, as data become increasingly complex, the field faces new challenges. In particular, the non-Euclidean nature of the data, their higher-order connectivity, their hidden global cues, and the dynamics regulating them strain existing methods. This project will explore and leverage the geometric and topological structures inherent in the data to tackle some of these problems. The main aims are to discover, model, and reveal information in the form of (i) structures in data, (ii) shapes from data, and (iii) dynamics underlying data. The project draws on concepts from the mathematical areas of differential and algebraic topology and geometry, from applied statistics and combinatorics, and from the computational areas of algorithms, graph theory, and statistical/machine learning. Research in geometric and topological data analysis has brought forth the need to recast and reinvestigate classical concepts in statistics and mathematics in the context of finite data, approximations, and noise. This project investigates explicit or hidden structures behind data, such as cluster trees, which are the basis for understanding and efficiently processing data. Additionally, the PIs aim to model the precise shape behind data, globally or locally, which is essential for providing a platform on which various statistical analyses can be carried out. Particular examples include the shape space of surface models and the tree space of phylogenetic trees. Finally, this project will consider dynamics in the data, where the interplay between temporal and topological/geometric features can lead to deeper insights. All of these areas will inevitably be enriched by new applications.