Data Science Stack Exchange is a question and answer site for data science professionals, machine learning specialists, and those interested in learning more about the field.

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 5 years ago.

What books cover the science and mathematics behind data science? It feels like so many "data science" books are programming tutorials and don't touch on things like data generating processes and statistical inference. I can already code; what I am weak on is the math/stats/theory behind what I am doing.

If I am ready to burn $1000 on books (so around 10 books... sigh), what could I buy?

Asking about "good" books will attract opinion-based answers and so this is off-topic. Flagged.
– Spacedman, Jun 11 '14 at 14:32

I've changed it so I am just looking for books. Nothing opinion-based.
– Anton, Jun 11 '14 at 14:34

It's spelled S-t-a-t-i-s-t-i-c-s :) Stick with something pragmatic that focuses on prediction rather than inference. Both Elements of Statistical Learning and An Introduction to Statistical Learning are on most people's lists.
– Dirk Eddelbuettel, Jun 11 '14 at 15:17

Statistical Inference by Casella and Berger is a good graduate-level textbook on the theoretical foundation of statistics. This book does require a pretty high level of comfort with math (probability theory is based on measure theory, which is not trivial to understand).

With respect to data generating processes, I don't have a book to recommend. What I can say is that understanding the assumptions of the techniques you use, and ensuring that the data was collected or generated in a manner that does not violate those assumptions, go a long way towards a good analysis.

Other answers recommended a good set of books about the mathematics behind data science. But as you mentioned, it's not just mathematics: activities like data collection and inference from data have their own rules and theories, even if they are not (yet) as rigorous as the mathematical foundations.

For these parts, I suggest the book Beautiful Data: The Stories Behind Elegant Data Solutions, which contains twenty case-study-like chapters written by people deeply engaged with real-world data analysis problems. It does not contain any mathematics, but it explores areas like collecting data, finding practical ways of using data in analyses, scaling, and selecting the best solutions very well.

I like Amir Ali Akbari's suggestions, and I'll add a few of my own, focusing on topics and skills that are not adequately covered in most machine learning and data analysis books that focus on math and/or programming.