ISBN-10: 1801819319
ISBN-13: 978-1801819312
Paperback: 770 pages
Publisher: Packt Publishing Ltd. (February 25, 2022)

About this book

Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title.
So, what’s new? There are many new and expanded topics, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more that I will detail in a separate blog post.
For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the de facto library for working with tabular datasets. The second half focuses on deep learning, including applications to natural language processing and computer vision.
While basic knowledge of Python is required, this book takes readers from understanding machine learning from the ground up to training advanced deep learning models by the end.

More info

Reddit AMA
My blog post explaining the contents
A Twitter Review Thread by @radekosmulski
A short YouTube review by Bhavesh Bhatt

Table of Contents (short version)

Machine Learning – Giving Computers the Ability to Learn from Data
Training Simple Machine Learning Algorithms for Classification
A Tour of Machine Learning Classifiers Using Scikit-Learn
Building Good Training Datasets – Data Preprocessing
Compressing Data via Dimensionality Reduction
Learning Best Practices for Model Evaluation and Hyperparameter Tuning
Combining Different Models for Ensemble Learning
Applying Machine Learning to Sentiment Analysis
Predicting Continuous Target Variables with Regression Analysis
Working with Unlabeled Data – Clustering Analysis
Implementing a Multilayer Artificial Neural Network from Scratch
Parallelizing Neural Network Training with PyTorch
Going Deeper – The Mechanics of PyTorch
Classifying Images with Deep Convolutional Neural Networks
Modeling Sequential Data Using Recurrent Neural Networks
Transformers – Improving Natural Language Processing with Attention Mechanisms
Generative Adversarial Networks for Synthesizing New Data
Graph Neural Networks for Capturing Dependencies in Graph Structured Data
Reinforcement Learning for Decision Making in Complex Environments

================

Detailed Table of Contents

Preface

Chapter 1: Giving Computers the Ability to Learn from Data

Building intelligent machines to transform data into knowledge
The three different types of machine learning
- Making predictions about the future with supervised learning
  - Classification for predicting class labels
  - Regression for predicting continuous outcomes
- Solving interactive problems with reinforcement learning
- Discovering hidden structures with unsupervised learning
  - Finding subgroups with clustering
  - Dimensionality reduction for data compression
Introduction to the basic terminology and notations
- Notation and conventions used in this book
- Machine learning terminology
A roadmap for building machine learning systems
- Preprocessing – getting data into shape
- Training and selecting a predictive model
- Evaluating models and predicting unseen data instances
Using Python for machine learning
- Installing Python and packages from the Python Package Index
- Using the Anaconda Python distribution and package manager
- Packages for scientific computing, data science, and machine learning
Summary

Chapter 2: Training Simple Machine Learning Algorithms for Classification

Artificial neurons – a brief glimpse into the early history of machine learning
- The formal definition of an artificial neuron
- The perceptron learning rule
Implementing a perceptron learning algorithm in Python
- An object-oriented perceptron API
- Training a perceptron model on the Iris dataset
Adaptive linear neurons and the convergence of learning
- Minimizing loss functions with gradient descent
- Implementing Adaline in Python
- Improving gradient descent through feature scaling
- Large-scale machine learning and stochastic gradient descent
Summary

Chapter 3: A Tour of Machine Learning Classifiers Using Scikit-Learn

Choosing a classification algorithm
First steps with scikit-learn – training a perceptron
Modeling class probabilities via logistic regression
- Logistic regression and conditional probabilities
- Learning the model weights via the logistic loss function
- Converting an Adaline implementation into an algorithm for logistic regression
- Training a logistic regression model with scikit-learn
- Tackling overfitting via regularization
Maximum margin classification with support vector machines
- Maximum margin intuition
- Dealing with a nonlinearly separable case using slack variables
- Alternative implementations in scikit-learn
Solving nonlinear problems using a kernel SVM
- Kernel methods for linearly inseparable data
- Using the kernel trick to find separating hyperplanes in a high-dimensional space
Decision tree learning
- Maximizing information gain (IG) – getting the most bang for your buck
- Building a decision tree
- Combining multiple decision trees via random forests
K-nearest neighbors – a lazy learning algorithm
Summary

Chapter 4: Building Good Training Datasets – Data Preprocessing

Dealing with missing data
- Identifying missing values in tabular data
- Eliminating training examples or features with missing values
- Imputing missing values
- Understanding the scikit-learn estimator API
Handling categorical data
- Categorical data encoding with pandas
- Mapping ordinal features
- Encoding class labels
- Performing one-hot encoding on nominal features
  - Optional: encoding ordinal features
Partitioning a dataset into separate training and test datasets
Bringing features onto the same scale
Selecting meaningful features
- L1 and L2 regularization as penalties against model complexity
- A geometric interpretation of L2 regularization
- Sparse solutions with L1 regularization
- Sequential feature selection algorithms
Assessing feature importance with random forests
Summary

Chapter 5: Compressing Data via Dimensionality Reduction

Unsupervised dimensionality reduction via principal component analysis
- The main steps in principal component analysis
- Extracting the principal components step by step
- Total and explained variance
- Feature transformation
- Principal component analysis in scikit-learn
- Assessing feature contributions
Supervised data compression via linear discriminant analysis
- Principal component analysis versus linear discriminant analysis
- The inner workings of linear discriminant analysis
- Computing the scatter matrices
- Selecting linear discriminants for the new feature subspace
- Projecting examples onto the new feature space
- LDA via scikit-learn
Nonlinear dimensionality reduction and visualization
- Why consider nonlinear dimensionality reduction?
- Visualizing data via t-distributed stochastic neighbor embedding
Summary

Chapter 6: Learning Best Practices for Model Evaluation and Hyperparameter Tuning

Streamlining workflows with pipelines
- Loading the Breast Cancer Wisconsin dataset
- Combining transformers and estimators in a pipeline
Using k-fold cross-validation to assess model performance
- The holdout method
- K-fold cross-validation
Debugging algorithms with learning and validation curves
- Diagnosing bias and variance problems with learning curves
- Addressing over- and underfitting with validation curves
Fine-tuning machine learning models via grid search
- Tuning hyperparameters via grid search
- Exploring hyperparameter configurations more widely with randomized search
- More resource-efficient hyperparameter search with successive halving
- Algorithm selection with nested cross-validation
Looking at different performance evaluation metrics
- Reading a confusion matrix
- Optimizing the precision and recall of a classification model
- Plotting a receiver operating characteristic
- Scoring metrics for multiclass classification
- Dealing with class imbalance
Summary

Chapter 7: Combining Different Models for Ensemble Learning

Learning with ensembles
- Combining classifiers via majority vote
- Implementing a simple majority vote classifier
- Using the majority voting principle to make predictions
- Evaluating and tuning the ensemble classifier
Bagging – building an ensemble of classifiers from bootstrap samples
- Bagging in a nutshell
- Applying bagging to classify examples in the Wine dataset
Leveraging weak learners via adaptive boosting
- How adaptive boosting works
- Applying AdaBoost using scikit-learn
Gradient boosting – training an ensemble based on loss gradients
- Comparing AdaBoost with gradient boosting
- Outlining the general gradient boosting algorithm
- Explaining the gradient boosting algorithm for classification
- Illustrating gradient boosting for classification
- Using XGBoost
Summary

Chapter 8: Applying Machine Learning to Sentiment Analysis

Preparing the IMDb movie review data for text processing
- Obtaining the movie review dataset
- Preprocessing the movie dataset into a more convenient format
Introducing the bag-of-words model
- Transforming words into feature vectors
- Assessing word relevancy via term frequency-inverse document frequency
- Cleaning text data
- Processing documents into tokens
Training a logistic regression model for document classification
Working with bigger data – online algorithms and out-of-core learning
Topic modeling with latent Dirichlet allocation
- Decomposing text documents with LDA
- LDA with scikit-learn
Summary

Chapter 9: Predicting Continuous Target Variables with Regression Analysis

Introducing linear regression
- Simple linear regression
- Multiple linear regression
Exploring the Ames Housing dataset
- Loading the Ames Housing dataset into a DataFrame
- Visualizing the important characteristics of a dataset
- Looking at relationships using a correlation matrix
Implementing an ordinary least squares linear regression model
- Solving regression for regression parameters with gradient descent
- Estimating the coefficient of a regression model via scikit-learn
Fitting a robust regression model using RANSAC
Evaluating the performance of linear regression models
Using regularized methods for regression
Turning a linear regression model into a curve – polynomial regression
- Adding polynomial terms using scikit-learn
- Modeling nonlinear relationships in the Ames Housing dataset
Dealing with nonlinear relationships using random forests
- Decision tree regression
- Random forest regression
Summary

Chapter 10: Working with Unlabeled Data – Clustering Analysis

Grouping objects by similarity using k-means
- k-means clustering using scikit-learn
- A smarter way of placing the initial cluster centroids using k-means++
- Hard versus soft clustering
- Using the elbow method to find the optimal number of clusters
- Quantifying the quality of clustering via silhouette plots
Organizing clusters as a hierarchical tree
- Grouping clusters in a bottom-up fashion
- Performing hierarchical clustering on a distance matrix
- Attaching dendrograms to a heat map
- Applying agglomerative clustering via scikit-learn
Locating regions of high density via DBSCAN
Summary

Chapter 11: Implementing a Multilayer Artificial Neural Network from Scratch

Modeling complex functions with artificial neural networks
- Single-layer neural network recap
- Introducing the multilayer neural network architecture
- Activating a neural network via forward propagation
Classifying handwritten digits
- Obtaining and preparing the MNIST dataset
- Implementing a multilayer perceptron
- Coding the neural network training loop
- Evaluating the neural network performance
Training an artificial neural network
- Computing the loss function
- Developing your understanding of backpropagation
- Training neural networks via backpropagation
About convergence in neural networks
A few last words about the neural network implementation
Summary

Chapter 12: Parallelizing Neural Network Training with PyTorch

PyTorch and training performance
- Performance challenges
- What is PyTorch?
- How we will learn PyTorch
First steps with PyTorch
- Installing PyTorch
- Creating tensors in PyTorch
- Manipulating the data type and shape of a tensor
- Applying mathematical operations to tensors
- Split, stack, and concatenate tensors
Building input pipelines in PyTorch
- Creating a PyTorch DataLoader from existing tensors
- Combining two tensors into a joint dataset
- Shuffle, batch, and repeat
- Creating a dataset from files on your local storage disk
- Fetching available datasets from the torchvision.datasets library
Building an NN model in PyTorch
- The PyTorch neural network module (torch.nn)
- Building a linear regression model
- Model training via the torch.nn and torch.optim modules
- Building a multilayer perceptron for classifying flowers in the Iris dataset
- Evaluating the trained model on the test dataset
- Saving and reloading the trained model
Choosing activation functions for multilayer neural networks
- Logistic function recap
- Estimating class probabilities in multiclass classification via the softmax function
- Broadening the output spectrum using a hyperbolic tangent
- Rectified linear unit activation
Summary

Chapter 13: Going Deeper – The Mechanics of PyTorch

The key features of PyTorch
PyTorch’s computation graphs
- Understanding computation graphs
- Creating a graph in PyTorch
PyTorch tensor objects for storing and updating model parameters
Computing gradients via automatic differentiation
- Computing the gradients of the loss with respect to trainable variables
- Understanding automatic differentiation
- Adversarial examples
Simplifying implementations of common architectures via the torch.nn module
- Implementing models based on nn.Sequential
- Choosing a loss function
- Solving an XOR classification problem
- Making model building more flexible with nn.Module
- Writing custom layers in PyTorch
Project one – predicting the fuel efficiency of a car
- Working with feature columns
- Training a DNN regression model
Project two – classifying MNIST handwritten digits
Higher-level PyTorch APIs: a short introduction to PyTorch-Lightning
- Setting up the PyTorch Lightning model
- Setting up the data loaders for Lightning
- Training the model using the PyTorch Lightning Trainer class
- Evaluating the model using TensorBoard
Summary

Chapter 14: Classifying Images with Deep Convolutional Neural Networks

The building blocks of CNNs
- Understanding CNNs and feature hierarchies
- Performing discrete convolutions
  - Discrete convolutions in one dimension
  - Padding inputs to control the size of the output feature maps
  - Determining the size of the convolution output
  - Performing a discrete convolution in 2D
- Subsampling layers
Putting everything together – implementing a CNN
- Working with multiple input or color channels
- Regularizing an NN with L2 regularization and dropout
- Loss functions for classification
Implementing a deep CNN using PyTorch
- The multilayer CNN architecture
- Loading and preprocessing the data
- Implementing a CNN using the torch.nn module
  - Configuring CNN layers in PyTorch
  - Constructing a CNN in PyTorch
Smile classification from face images using a CNN
- Loading the CelebA dataset
- Image transformation and data augmentation
- Training a CNN smile classifier
Summary

Chapter 15: Modeling Sequential Data Using Recurrent Neural Networks

Introducing sequential data
- Modeling sequential data – order matters
- Sequential data versus time series data
- Representing sequences
- The different categories of sequence modeling
RNNs for modeling sequences
- Understanding the dataflow in RNNs
- Computing activations in an RNN
- Hidden recurrence versus output recurrence
- The challenges of learning long-range interactions
- Long short-term memory cells
Implementing RNNs for sequence modeling in PyTorch
- Project one – predicting the sentiment of IMDb movie reviews
  - Preparing the movie review data
  - Embedding layers for sentence encoding
  - Building an RNN model
  - Building an RNN model for the sentiment analysis task
- Project two – character-level language modeling in PyTorch
  - Preprocessing the dataset
  - Building a character-level RNN model
  - Evaluation phase – generating new text passages
Summary

Chapter 16: Transformers – Improving Natural Language Processing with Attention Mechanisms

Adding an attention mechanism to RNNs
- Attention helps RNNs with accessing information
- The original attention mechanism for RNNs
- Processing the inputs using a bidirectional RNN
- Generating outputs from context vectors
- Computing the attention weights
Introducing the self-attention mechanism
- Starting with a basic form of self-attention
- Parameterizing the self-attention mechanism: scaled dot-product attention
Attention is all we need: introducing the original transformer architecture
- Encoding context embeddings via multi-head attention
- Learning a language model: decoder and masked multi-head attention
- Implementation details: positional encodings and layer normalization
Building large-scale language models by leveraging unlabeled data
- Pre-training and fine-tuning transformer models
- Leveraging unlabeled data with GPT
- Using GPT-2 to generate new text
- Bidirectional pre-training with BERT
- The best of both worlds: BART
Fine-tuning a BERT model in PyTorch
- Loading the IMDb movie review dataset
- Tokenizing the dataset
- Loading and fine-tuning a pre-trained BERT model
- Fine-tuning a transformer more conveniently using the Trainer API
Summary

Chapter 17: Generative Adversarial Networks for Synthesizing New Data

Introducing generative adversarial networks
- Starting with autoencoders
- Generative models for synthesizing new data
- Generating new samples with GANs
- Understanding the loss functions of the generator and discriminator networks in a GAN model
Implementing a GAN from scratch
- Training GAN models on Google Colab
- Implementing the generator and the discriminator networks
- Defining the training dataset
- Training the GAN model
Improving the quality of synthesized images using a convolutional and Wasserstein GAN
- Transposed convolution
- Batch normalization
- Implementing the generator and discriminator
- Dissimilarity measures between two distributions
- Using EM distance in practice for GANs
- Gradient penalty
- Implementing WGAN-GP to train the DCGAN model
- Mode collapse
Other GAN applications
Summary

Chapter 18: Graph Neural Networks for Capturing Dependencies in Graph Structured Data

Introduction to graph data
- Undirected graphs
- Directed graphs
- Labeled graphs
- Representing molecules as graphs
Understanding graph convolutions
- The motivation behind using graph convolutions
- Implementing a basic graph convolution
Implementing a GNN in PyTorch from scratch
- Defining the NodeNetwork model
- Coding the NodeNetwork’s graph convolution layer
- Adding a global pooling layer to deal with varying graph sizes
- Preparing the DataLoader
- Using the NodeNetwork to make predictions
Implementing a GNN using the PyTorch Geometric library
Other GNN layers and recent developments
- Spectral graph convolutions
- Pooling
- Normalization
- Pointers to advanced graph neural network literature
Summary

Chapter 19: Reinforcement Learning for Decision Making in Complex Environments

Introduction – learning from experience
- Understanding reinforcement learning
- Defining the agent-environment interface of a reinforcement learning system
The theoretical foundations of RL
- Markov decision processes
  - The mathematical formulation of Markov decision processes
  - Visualization of a Markov process
- Episodic versus continuing tasks
- RL terminology: return, policy, and value function
  - The return
  - Policy
  - Value function
- Dynamic programming using the Bellman equation
Reinforcement learning algorithms
- Dynamic programming
  - Policy evaluation – predicting the value function with dynamic programming
  - Improving the policy using the estimated value function
  - Policy iteration
  - Value iteration
- Reinforcement learning with Monte Carlo
  - State-value function estimation using MC
  - Action-value function estimation using MC
  - Finding an optimal policy using MC control
  - Policy improvement – computing the greedy policy from the action-value function
- Temporal difference learning
  - TD prediction
  - On-policy TD control (SARSA)
  - Off-policy TD control (Q-learning)
Implementing our first RL algorithm
- Introducing the OpenAI Gym toolkit
  - Working with the existing environments in OpenAI Gym
  - A grid world example
  - Implementing the grid world environment in OpenAI Gym
- Solving the grid world problem with Q-learning
A glance at deep Q-learning
- Training a DQN model according to the Q-learning algorithm
  - Replay memory
  - Determining the target values for computing the loss
- Implementing a deep Q-learning algorithm
Chapter and book summary

Machine Learning with PyTorch and Scikit-Learn

Links