Seminar: Next-Generation of Big Multi-Dimensional Data Analytics

October 25, 2023
12:00PM - 1:00PM
301 Pomerene Hall


TDAI's Computational Health & Life Sciences community of practice is pleased to present Dr. John Paparrizos (Computer Science & Engineering) on Wednesday, Oct. 25, at 12:00 p.m. in 301* Pomerene Hall. Dr. Paparrizos will give a talk entitled "Next-Generation of Big Multi-Dimensional Data Analytics."

Register for in-person | Register for Zoom

*This is an updated location.

Talk Abstract

Today, automated processes, Internet-of-Things deployments, and Web and mobile applications generate an overwhelming amount of high-dimensional data. Meanwhile, computational resources remain limited, and advances in machine learning (ML) create a pressing need to support increasingly expensive and complex analytical tasks. Unfortunately, traditional data management techniques offer limited support for high-dimensional data, ML tasks, and adaptation to data properties, often resulting in reduced performance. Similarly, due to the difficulty of providing invariances to specific data distortions, applications often resort to inadequate ML methods, reducing their effectiveness.

In my work, I ask how we can address the lack of task-aware and data-driven adaptations in data management and ML methods. Specifically, I will discuss two solutions, one for (i) data representations and one for (ii) computational methods, that use techniques to exploit similarities, shapes, densities, and distributions in data. Motivated by the ubiquity of high-dimensional time series, I will first present a method for anomaly detection in streaming data that accounts for distribution drift. Then, I will discuss a variance-aware quantization method for indexing high-dimensional data that enables similarity search queries at scale. In both examples, the proposed methods substantially improve performance and accuracy, demonstrating the benefit of designing task-aware and data-driven solutions for large-scale data science applications.


 
