Roadmap for learning Topological Data Analysis?
aligass2004yi
Answered question
2022-07-01
I'm a math major who has recently graduated and I will be starting full time work in 'data analysis'.

Having finished with decent marks and still being incredibly interested in mathematics, I was thinking of pursuing graduate study/research at some point in the future. I was reading up about possible areas of study for this when I came across topological data analysis, which (as I understand it) is an application of algebraic topology to data analysis.

Given my situation, I was intrigued by the concept and I would like to do some self study so I can have a working understanding of the subject. I have only done basic undergraduate abstract algebra, analysis and point set topology, and I am currently reading Munkres' Topology (Chapter 9 onwards). How do I get from where I am now to understanding the theory behind TDA and being able to apply it?

My knowledge of further mathematics is far from extensive and I would appreciate any advice on links/texts which I could use to learn the relevant material.
Answer & Explanation
Anika Stevenson
2022-07-02
Before answering your question, I would like to discuss some points:

Topological data analysis is roughly, as you write, (algebraic) topology applied to the study of data. While you will certainly need to learn some topology, the type of topology that you should learn really depends on the type of applications you are interested in. For this reason I will not give you a roadmap, but a suggestion on how to draw your own roadmap.

You should also not forget the second part in the definition of topological data analysis, namely that you are studying data. For this it would be good to learn some general facts about data analysis, and in particular statistics (more about this below). For a statistician’s viewpoint on topological data analysis, there is a nice series of columns by Robert Adler on what he calls TOPOS, available here.

You have to know your data. This might go without saying, but too often I have seen people throw some method at data to see what comes out of it, without even asking themselves why they are using that specific method. Depending on your job conditions you might be given more or less time to work on a specific project, but I think that you should really try to understand the data and the context as well as possible before even starting to think about which method you want to use. While topology gives a wealth of different methods that can be applied to the study of data, these might not always be the best tools to use, and there might be other techniques which are better suited. The bottom line is: there is no method or set of methods that fits all problems.

And here comes my suggestion for how to draw your own roadmap:

Topology.
Robert Ghrist’s book Elementary Applied Topology gives a succinct overview of the main methods and ideas from topology that are used in applications. Every chapter covers a certain topic in topology and then gives examples of its applications.
While there are other texts on applied topology that go into more mathematical detail, I would suggest using Ghrist’s book to get an idea of the applications and the set of ideas, and then drawing your own roadmap of topics that you would like to cover from there. Since the text is succinct, you might also need other texts to learn more about the mathematics covered in each chapter. For example, to learn more about (smooth) manifolds (Chapter 1) you might want to read parts of Lee’s Introduction to Smooth Manifolds, and to learn more about cohomology (Chapter 6) you might want to consult Hatcher’s Algebraic Topology. Again, I don’t think that there is a "one size fits all" answer to which texts you should use for this, but once you have a good grasp of what exactly you would like to understand better, you could again ask people with more experience for advice.

Statistics.
A book that, analogously to Ghrist’s book, could help you in designing your own roadmap is Larry Wasserman’s All of Statistics. Also, note that the application of statistical methods to techniques from topological data analysis is an active area of research, and while there are some tools and libraries that can be used for applications, this area is still in its infancy. I list here the libraries and relevant references for statistical tools for topological data analysis that I know off the top of my head (these are all related to persistent homology):

Persistence Landscapes and the corresponding toolbox
The TDA package tutorial and the package
Persistence images and library

Data science.
Finally, as for data science more broadly, I don’t know any good text, but you might get an idea of some of the general themes from the book Mathematical Problems in Data Science.

Aside: to finish off, I give some additional references to books/papers and software packages.

References for topological data analysis and computational topology:
Topology and Data, Carlsson
Computational Topology, Edelsbrunner and Harer
Topology for Computing, Zomorodian
Persistence Theory, Oudot (this might be too specific, but it would be useful if you want to learn more about the theory behind persistent homology)
Computational Homology, Kaczynski, Mischaikow, Mrozek

Open source libraries that implement some of the methods from topological data analysis:
Mapper: Python Mapper
Persistent homology: a few of the most recent (and best performing) libraries are Ripser, GUDHI, and DIPHA. Note that there is also an overview of the different libraries for persistent homology available here. (Disclaimer: I am one of the authors of this paper. Also, the version on the arXiv is outdated and will be replaced by an up-to-date version in the coming weeks, so it might be better to look at it once it is updated.)
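To give a concrete feel for what the persistent homology libraries above compute, here is a minimal sketch in plain Python of the simplest case only: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration. It is not how Ripser, GUDHI, or DIPHA are implemented, and it does not use their APIs. The function name h0_persistence and the sample points are made up for illustration, and the sketch assumes an edge enters the filtration at scale equal to the Euclidean distance between its endpoints (some libraries scale this differently).

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for the connected components of a
    Vietoris-Rips filtration built on `points` (hypothetical helper)."""
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: an edge appears at scale = Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge, so one of them dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))

    # One component never dies.
    pairs.append((0.0, math.inf))
    return pairs


# Two well-separated clusters: three points near the origin, two far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(points)
for birth, death in diagram:
    print(birth, death)
```

Every point is born at scale 0. The short bars correspond to points merging within a cluster, the one long finite bar records the scale at which the two clusters join, and the infinite bar is the component that survives. Higher-dimensional features (loops, voids) need the boundary-matrix machinery covered in the references above, which is what the libraries are for.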
taghdh9
2022-07-03
Geometric and Topological Inference is an excellent introduction to persistent homology. If you haven't taken an algebraic topology course, it should be easier than Edelsbrunner and Harer's book. I also found it more approachable, since it has more exercises and gives more detail on the construction of complexes.