Current projects
We are currently working on the following projects:
- Analysis of density-based algorithms for clustering and anomaly detection used in CS and data science (and lacking statistical guarantees)
- DBSCAN, OPTICS, local outlier factor, isolation forests.
- We can use these algorithms to get, e.g., cluster trees.
- Streaming/online density estimation with applications to TDA
- Maybe sliding windows can work.
- Use of concave hulls and alpha-concave hulls for support recovery, shape estimation, clustering, etc.
- This project would be more geometrical.
- Confidence sets in the persistent homology framework described in this paper:
- Omer Bobrowski, Sayan Mukherjee, Jonathan Taylor, “Topological Consistency via Kernel Estimation” [ arXiv ].
- The filtration in this framework is constructed differently.
- Investigate the statistical use/advantages of alpha-witness complexes
- Come up with and study continuous geometric signatures (as opposed to the discrete ones used in TDA)
- Such as the interpoint distance distribution studied in the paper “Shape classification based on interpoint distance distributions” [ link ].
- Continuous features would be easier to analyze.
- Bifiltrations for persistent homology.
- In particular, the bifiltration arising from letting the level and the bandwidth change in density estimation.
- Intensity functions for handling persistence diagrams and their usage.
- Darren and Yen-Chi wrote a nice conference paper as a first step, but this can be further developed.
- It would be useful when the dataset consists of many replications.
- Mode trees.
- Jussi Klemela, Mode Trees for Multivariate Data [ link ].
- Mapper.
- Mapper is now available on CRAN as the R package TDAmapper [ CRAN ].
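
For the streaming/online density estimation project listed above, the sliding-window idea could look roughly like the sketch below. It is only an illustration under assumptions of ours: one-dimensional data, a Gaussian kernel, a fixed bandwidth, and a fixed window size. The class name `SlidingWindowKDE` and its parameters are hypothetical and not taken from any of the papers or packages mentioned on this page.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """1-D Gaussian kernel density estimate over the most recent `window` points.

    Hypothetical helper for illustration only; fixed bandwidth, no forgetting weights.
    """

    def __init__(self, window, bandwidth):
        if window <= 0 or bandwidth <= 0:
            raise ValueError("window and bandwidth must be positive")
        self.bandwidth = bandwidth
        # A deque with maxlen drops the oldest observation automatically.
        self.points = deque(maxlen=window)

    def update(self, x):
        """Add one new observation from the stream."""
        self.points.append(float(x))

    def density(self, x):
        """Evaluate the estimate at x using only the points currently in the window."""
        n = len(self.points)
        if n == 0:
            return 0.0
        h = self.bandwidth
        norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in self.points)


# Toy stream (assumed for the example): the mode moves from 0 to 5 halfway through.
kde = SlidingWindowKDE(window=50, bandwidth=0.5)
for i in range(200):
    kde.update(0.0 if i < 100 else 5.0)
```

After the toy stream, the window holds only the most recent 50 points, so the estimate has forgotten the old mode at 0 and concentrates around 5. Each evaluation costs time linear in the window size; whether such an estimator comes with useful statistical guarantees, or feeds well into TDA, is exactly the open question of the project.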
Papers to be considered
Tyrus Berry and Timothy Sauer, Consistent Manifold Representation for Topological Data Analysis [ arXiv ]
Mikhail Belkin, Partha Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation [ pdf ]
Miguel A. Carreira-Perpinan, The Elastic Embedding Algorithm for Dimensionality Reduction [ pdf ]
Miguel A. Carreira-Perpinan, Max Vladymyrov, A Fast, Universal Algorithm to Learn Parametric Nonlinear Embeddings [ pdf ]
Andreas Krause, Volkmar Liebscher, Multimodal Projection Pursuit using the Dip Statistic [ pdf ]
Vivien Seguy, Marco Cuturi, Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric [ pdf ]
Levi Lelis, Jorg Sander, Semi-Supervised Density-Based Clustering [ pdf ]
Samuel Maurus, Claudia Plant, Skinny-dip: Clustering in a Sea of Noise [ pdf ]
Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou, Isolation Forest [ pdf ]
Tarn Duong, Gael Beck, Hanene Azzag, Mustapha Lebbah, Nearest neighbour estimators of density derivatives, with application to mean shift clustering [ pdf ]
Ery Arias-Castro, Beatriz Pateiro-Lopez, Alberto Rodriguez-Casal, Minimax Estimation of the Volume of a Set with Smooth Boundary [ arXiv ]
Several other things to be considered