I had been looking for a good book to recommend to my “Introduction to Data Science” classes at UCLA as a text to use once my class completes … sort of the next step after learning the basics. That’s why I was looking forward to reviewing the new 3rd edition of the widely acclaimed title “Python Machine Learning” by Sebastian Raschka and Vahid Mirjalili. The book is a comprehensive guide to machine learning and deep learning with Python. It acts as both a step-by-step tutorial and a useful resource you’ll keep coming back to as you fill up your data science toolbox.
I knew I was going to like it the minute I started thumbing through the pages and saw some mathematics. I had been warning my students early on that eventually they’d have to break down and engage the mathematical foundations of machine learning to become down-in-the-trenches data scientists, so this book fits that bill nicely. Many of the chapters start off with some theoretical aspects of the topic being discussed, including some math, followed by plenty of nicely written Python code. It should be noted that this book is not for beginners, and if you don’t know the Python language, you’ll have to find another learning resource before taking on this book.
I appreciated Chapter 2, “Training Simple Machine Learning Algorithms for Classification,” which goes all the way back to the beginning of machine learning and defines the “perceptron” algorithm (circa 1957, from Frank Rosenblatt’s seminal paper), and includes the code for implementing this simple model. I think it is a great learning experience to play around with this code to fully understand how this field got started.
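For readers who want a feel for the idea before opening the book, here is a minimal sketch of the classic perceptron learning rule in plain Python. It is not the book’s implementation; the function names, the learning rate `eta`, the epoch limit, and the toy logical-AND data are all my own assumptions, chosen only for illustration.

```python
def predict(weights, bias, x):
    """Return class label 1 or -1 by thresholding the net input at zero."""
    net_input = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net_input >= 0.0 else -1


def train_perceptron(samples, labels, eta=0.5, epochs=20):
    """Fit weights and bias with the perceptron learning rule.

    eta and epochs are illustrative defaults, not values from the book.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            # The update is zero when the sample is already classified correctly.
            update = eta * (target - predict(weights, bias, x))
            if update != 0.0:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:
            # Every sample was classified correctly in this pass, so stop early.
            break
    return weights, bias


# Toy, linearly separable data (logical AND), made up for this sketch.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, -1, -1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

Because the toy data is linearly separable, the rule settles on a separating line after a handful of passes. Swapping in data that is not separable (XOR, say) is a quick way to see the limitation that motivated later models.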
The balance of the chapters represents a tour de force of the field of machine learning, with few stones left unturned. Here is a list of topics covered in the book, which should give you a good impression of the broad scope addressed for data scientists of varying levels of expertise:
- Using scikit-learn for solving classification problems
- Data prep
- Dimensionality reduction with PCA
- Model evaluation and hyperparameter tuning
- Ensemble learning
- Sentiment analysis
- Adding an ML model to a web app
- Implementing a multilayer ANN from scratch
- Parallelizing NN training with TensorFlow
- Mechanics of TensorFlow
- Deep convolutional neural networks
- Recurrent neural networks
- Reinforcement learning
Wow! Impressive, right? You could feasibly get introduced to most of the hot areas of machine learning by using this book. The book is accompanied by a series of Jupyter notebooks with all the code from the text, so you can quickly get deep into the content and advance your knowledge of this growing area of technology. I’ve already added this book to my Data Science Bibliography, which I hand out to my students as a pathway to obtaining data science “superpowers.”
Another great thing about this book is that it doesn’t presume to be the last and final word on any of the topics covered. Every chapter has a liberal number of sidebars containing citations to additional learning resources, including the authors’ own course notes, blog articles, research papers, lecture slides, textbooks, etc. This effort to fill in the gaps also includes compelling notes on the historical context of important concepts. For example, Chapter 17 on GANs has a sidebar on why BatchNorm helps optimization, clearly laying out its genesis and referencing the time frame and motivations of a group of researchers who were instrumental in carrying this technique forward. This side benefit is significant since it makes this book the starting point (but not the ending point) for study on the subject. You needn’t look beyond this book for guidance on where to go next. Nice touch!
I highly recommend this book for any advancing data scientist who needs a complete, state-of-the-technology picture of our field. I’ve been going carefully through the book myself as a refresher course for the theory, math and code related to machine learning. Very enjoyable!
Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideBIGDATA. In addition to being a tech journalist, Daniel is also a data science consultant, author, and educator, and sits on a number of advisory boards for various start-up companies.