Machine Learning with PyTorch and Scikit-Learn

ISBN-10: 1801819319
ISBN-13: 978-1801819312
Paperback: 770 pages
Publisher: Packt Publishing Ltd. (February 25, 2022)

About this book

Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title.
So, what’s new? There is a lot of new content, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more additions that I will detail in a separate blog post.
For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the de facto approach for working with tabular datasets. Then, the second half of this book focuses on deep learning, including applications to natural language processing and computer vision.
While basic knowledge of Python is required, this book takes readers on a journey that starts with understanding machine learning from the ground up and ends with training advanced deep learning models.

Table of Contents (short version)

  1. Machine Learning - Giving Computers the Ability to Learn from Data
  2. Training Simple Machine Learning Algorithms for Classification
  3. A Tour of Machine Learning Classifiers Using Scikit-Learn
  4. Building Good Training Datasets – Data Preprocessing
  5. Compressing Data via Dimensionality Reduction
  6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning
  7. Combining Different Models for Ensemble Learning
  8. Applying Machine Learning to Sentiment Analysis
  9. Predicting Continuous Target Variables with Regression Analysis
  10. Working with Unlabeled Data – Clustering Analysis
  11. Implementing a Multilayer Artificial Neural Network from Scratch
  12. Parallelizing Neural Network Training with PyTorch
  13. Going Deeper – The Mechanics of PyTorch
  14. Classifying Images with Deep Convolutional Neural Networks
  15. Modeling Sequential Data Using Recurrent Neural Networks
  16. Transformers – Improving Natural Language Processing with Attention Mechanisms
  17. Generative Adversarial Networks for Synthesizing New Data
  18. Graph Neural Networks for Capturing Dependencies in Graph Structured Data
  19. Reinforcement Learning for Decision Making in Complex Environments

================

Detailed Table of Contents

Preface

Chapter 1: Giving Computers the Ability to Learn from Data

  • Building intelligent machines to transform data into knowledge
  • The three different types of machine learning
    • Making predictions about the future with supervised learning
      • Classification for predicting class labels
      • Regression for predicting continuous outcomes
    • Solving interactive problems with reinforcement learning
    • Discovering hidden structures with unsupervised learning
      • Finding subgroups with clustering
      • Dimensionality reduction for data compression
  • Introduction to the basic terminology and notations
    • Notation and conventions used in this book
    • Machine learning terminology
  • A roadmap for building machine learning systems
    • Preprocessing – getting data into shape
    • Training and selecting a predictive model
    • Evaluating models and predicting unseen data instances
  • Using Python for machine learning
    • Installing Python and packages from the Python Package Index
    • Using the Anaconda Python distribution and package manager
    • Packages for scientific computing, data science, and machine learning
  • Summary

Chapter 2: Training Simple Machine Learning Algorithms for Classification

  • Artificial neurons – a brief glimpse into the early history of machine learning
    • The formal definition of an artificial neuron
    • The perceptron learning rule
  • Implementing a perceptron learning algorithm in Python
    • An object-oriented perceptron API
    • Training a perceptron model on the Iris dataset
  • Adaptive linear neurons and the convergence of learning
    • Minimizing loss functions with gradient descent
    • Implementing Adaline in Python
    • Improving gradient descent through feature scaling
    • Large-scale machine learning and stochastic gradient descent
  • Summary

Chapter 3: A Tour of Machine Learning Classifiers Using Scikit-Learn

  • Choosing a classification algorithm
  • First steps with scikit-learn – training a perceptron
  • Modeling class probabilities via logistic regression
    • Logistic regression and conditional probabilities
    • Learning the model weights via the logistic loss function
    • Converting an Adaline implementation into an algorithm for logistic regression
    • Training a logistic regression model with scikit-learn
    • Tackling overfitting via regularization
  • Maximum margin classification with support vector machines
    • Maximum margin intuition
    • Dealing with a nonlinearly separable case using slack variables
    • Alternative implementations in scikit-learn
  • Solving nonlinear problems using a kernel SVM
    • Kernel methods for linearly inseparable data
    • Using the kernel trick to find separating hyperplanes in a high-dimensional space
  • Decision tree learning
    • Maximizing IG – getting the most bang for your buck
    • Building a decision tree
    • Combining multiple decision trees via random forests
  • K-nearest neighbors – a lazy learning algorithm
  • Summary

Chapter 4: Building Good Training Datasets – Data Preprocessing

  • Dealing with missing data
    • Identifying missing values in tabular data
    • Eliminating training examples or features with missing values
    • Imputing missing values
    • Understanding the scikit-learn estimator API
  • Handling categorical data
    • Categorical data encoding with pandas
    • Mapping ordinal features
    • Encoding class labels
    • Performing one-hot encoding on nominal features
      • Optional: encoding ordinal features
  • Partitioning a dataset into separate training and test datasets
  • Bringing features onto the same scale
  • Selecting meaningful features
    • L1 and L2 regularization as penalties against model complexity
    • A geometric interpretation of L2 regularization
    • Sparse solutions with L1 regularization
    • Sequential feature selection algorithms
  • Assessing feature importance with random forests
  • Summary

Chapter 5: Compressing Data via Dimensionality Reduction

  • Unsupervised dimensionality reduction via principal component analysis
    • The main steps in principal component analysis
    • Extracting the principal components step by step
    • Total and explained variance
    • Feature transformation
    • Principal component analysis in scikit-learn
    • Assessing feature contributions
  • Supervised data compression via linear discriminant analysis
    • Principal component analysis versus linear discriminant analysis
    • The inner workings of linear discriminant analysis
    • Computing the scatter matrices
    • Selecting linear discriminants for the new feature subspace
    • Projecting examples onto the new feature space
    • LDA via scikit-learn
  • Nonlinear dimensionality reduction and visualization
    • Why consider nonlinear dimensionality reduction?
    • Visualizing data via t-distributed stochastic neighbor embedding
  • Summary

Chapter 6: Learning Best Practices for Model Evaluation and Hyperparameter Tuning

  • Streamlining workflows with pipelines
    • Loading the Breast Cancer Wisconsin dataset
    • Combining transformers and estimators in a pipeline
  • Using k-fold cross-validation to assess model performance
    • The holdout method
    • K-fold cross-validation
  • Debugging algorithms with learning and validation curves
    • Diagnosing bias and variance problems with learning curves
    • Addressing over- and underfitting with validation curves
  • Fine-tuning machine learning models via grid search
    • Tuning hyperparameters via grid search
    • Exploring hyperparameter configurations more widely with randomized search
    • More resource-efficient hyperparameter search with successive halving
    • Algorithm selection with nested cross-validation
  • Looking at different performance evaluation metrics
    • Reading a confusion matrix
    • Optimizing the precision and recall of a classification model
    • Plotting a receiver operating characteristic
    • Scoring metrics for multiclass classification
    • Dealing with class imbalance
  • Summary

Chapter 7: Combining Different Models for Ensemble Learning

  • Learning with ensembles
    • Combining classifiers via majority vote
    • Implementing a simple majority vote classifier
    • Using the majority voting principle to make predictions
    • Evaluating and tuning the ensemble classifier
  • Bagging – building an ensemble of classifiers from bootstrap samples
    • Bagging in a nutshell
    • Applying bagging to classify examples in the Wine dataset
  • Leveraging weak learners via adaptive boosting
    • How adaptive boosting works
    • Applying AdaBoost using scikit-learn
  • Gradient boosting – training an ensemble based on loss gradients
    • Comparing AdaBoost with gradient boosting
    • Outlining the general gradient boosting algorithm
    • Explaining the gradient boosting algorithm for classification
    • Illustrating gradient boosting for classification
    • Using XGBoost
  • Summary

Chapter 8: Applying Machine Learning to Sentiment Analysis

  • Preparing the IMDb movie review data for text processing
    • Obtaining the movie review dataset
    • Preprocessing the movie dataset into a more convenient format
  • Introducing the bag-of-words model
    • Transforming words into feature vectors
    • Assessing word relevancy via term frequency-inverse document frequency
    • Cleaning text data
    • Processing documents into tokens
  • Training a logistic regression model for document classification
  • Working with bigger data – online algorithms and out-of-core learning
  • Topic modeling with latent Dirichlet allocation
    • Decomposing text documents with LDA
    • LDA with scikit-learn
  • Summary

Chapter 9: Predicting Continuous Target Variables with Regression Analysis

  • Introducing linear regression
    • Simple linear regression
    • Multiple linear regression
  • Exploring the Ames Housing dataset
    • Loading the Ames Housing dataset into a DataFrame
    • Visualizing the important characteristics of a dataset
    • Looking at relationships using a correlation matrix
  • Implementing an ordinary least squares linear regression model
    • Solving regression for regression parameters with gradient descent
    • Estimating the coefficient of a regression model via scikit-learn
  • Fitting a robust regression model using RANSAC
  • Evaluating the performance of linear regression models
  • Using regularized methods for regression
  • Turning a linear regression model into a curve – polynomial regression
    • Adding polynomial terms using scikit-learn
    • Modeling nonlinear relationships in the Ames Housing dataset
  • Dealing with nonlinear relationships using random forests
    • Decision tree regression
    • Random forest regression
  • Summary

Chapter 10: Working with Unlabeled Data – Clustering Analysis

  • Grouping objects by similarity using k-means
    • k-means clustering using scikit-learn
    • A smarter way of placing the initial cluster centroids using k-means++
    • Hard versus soft clustering
    • Using the elbow method to find the optimal number of clusters
    • Quantifying the quality of clustering via silhouette plots
  • Organizing clusters as a hierarchical tree
    • Grouping clusters in a bottom-up fashion
    • Performing hierarchical clustering on a distance matrix
    • Attaching dendrograms to a heat map
    • Applying agglomerative clustering via scikit-learn
  • Locating regions of high density via DBSCAN
  • Summary

Chapter 11: Implementing a Multilayer Artificial Neural Network from Scratch

  • Modeling complex functions with artificial neural networks
    • Single-layer neural network recap
    • Introducing the multilayer neural network architecture
    • Activating a neural network via forward propagation
  • Classifying handwritten digits
    • Obtaining and preparing the MNIST dataset
    • Implementing a multilayer perceptron
    • Coding the neural network training loop
    • Evaluating the neural network performance
  • Training an artificial neural network
    • Computing the loss function
    • Developing your understanding of backpropagation
    • Training neural networks via backpropagation
  • About convergence in neural networks
  • A few last words about the neural network implementation
  • Summary

Chapter 12: Parallelizing Neural Network Training with PyTorch

  • PyTorch and training performance
    • Performance challenges
    • What is PyTorch?
    • How we will learn PyTorch
  • First steps with PyTorch
    • Installing PyTorch
    • Creating tensors in PyTorch
    • Manipulating the data type and shape of a tensor
    • Applying mathematical operations to tensors
    • Split, stack, and concatenate tensors
  • Building input pipelines in PyTorch
    • Creating a PyTorch DataLoader from existing tensors
    • Combining two tensors into a joint dataset
    • Shuffle, batch, and repeat
    • Creating a dataset from files on your local storage disk
    • Fetching available datasets from the torchvision.datasets library
  • Building an NN model in PyTorch
    • The PyTorch neural network module (torch.nn)
    • Building a linear regression model
    • Model training via the torch.nn and torch.optim modules
    • Building a multilayer perceptron for classifying flowers in the Iris dataset
    • Evaluating the trained model on the test dataset
    • Saving and reloading the trained model
  • Choosing activation functions for multilayer neural networks
    • Logistic function recap
    • Estimating class probabilities in multiclass classification via the softmax function
    • Broadening the output spectrum using a hyperbolic tangent
    • Rectified linear unit activation
  • Summary

Chapter 13: Going Deeper – The Mechanics of PyTorch

  • The key features of PyTorch
  • PyTorch’s computation graphs
    • Understanding computation graphs
    • Creating a graph in PyTorch
  • PyTorch tensor objects for storing and updating model parameters
  • Computing gradients via automatic differentiation
    • Computing the gradients of the loss with respect to trainable variables
    • Understanding automatic differentiation
    • Adversarial examples
  • Simplifying implementations of common architectures via the torch.nn module
    • Implementing models based on nn.Sequential
    • Choosing a loss function
    • Solving an XOR classification problem
    • Making model building more flexible with nn.Module
    • Writing custom layers in PyTorch
  • Project one – predicting the fuel efficiency of a car
    • Working with feature columns
    • Training a DNN regression model
  • Project two – classifying MNIST handwritten digits
  • Higher-level PyTorch APIs: a short introduction to PyTorch Lightning
    • Setting up the PyTorch Lightning model
    • Setting up the data loaders for Lightning
    • Training the model using the PyTorch Lightning Trainer class
    • Evaluating the model using TensorBoard
  • Summary

Chapter 14: Classifying Images with Deep Convolutional Neural Networks

  • The building blocks of CNNs
    • Understanding CNNs and feature hierarchies
    • Performing discrete convolutions
      • Discrete convolutions in one dimension
      • Padding inputs to control the size of the output feature maps
      • Determining the size of the convolution output
      • Performing a discrete convolution in 2D
    • Subsampling layers
  • Putting everything together – implementing a CNN
    • Working with multiple input or color channels
    • Regularizing an NN with L2 regularization and dropout
    • Loss functions for classification
  • Implementing a deep CNN using PyTorch
    • The multilayer CNN architecture
    • Loading and preprocessing the data
    • Implementing a CNN using the torch.nn module
      • Configuring CNN layers in PyTorch
      • Constructing a CNN in PyTorch
  • Smile classification from face images using a CNN
    • Loading the CelebA dataset
    • Image transformation and data augmentation
    • Training a CNN smile classifier
  • Summary

Chapter 15: Modeling Sequential Data Using Recurrent Neural Networks

  • Introducing sequential data
    • Modeling sequential data – order matters
    • Sequential data versus time series data
    • Representing sequences
    • The different categories of sequence modeling
  • RNNs for modeling sequences
    • Understanding the dataflow in RNNs
    • Computing activations in an RNN
    • Hidden recurrence versus output recurrence
    • The challenges of learning long-range interactions
    • Long short-term memory cells
  • Implementing RNNs for sequence modeling in PyTorch
    • Project one – predicting the sentiment of IMDb movie reviews
      • Preparing the movie review data
      • Embedding layers for sentence encoding
      • Building an RNN model
      • Building an RNN model for the sentiment analysis task
    • Project two – character-level language modeling in PyTorch
      • Preprocessing the dataset
      • Building a character-level RNN model
      • Evaluation phase – generating new text passages
  • Summary

Chapter 16: Transformers – Improving Natural Language Processing with Attention Mechanisms

  • Adding an attention mechanism to RNNs
    • Attention helps RNNs with accessing information
    • The original attention mechanism for RNNs
    • Processing the inputs using a bidirectional RNN
    • Generating outputs from context vectors
    • Computing the attention weights
  • Introducing the self-attention mechanism
    • Starting with a basic form of self-attention
    • Parameterizing the self-attention mechanism: scaled dot-product attention
  • Attention is all we need: introducing the original transformer architecture
    • Encoding context embeddings via multi-head attention
    • Learning a language model: decoder and masked multi-head attention
    • Implementation details: positional encodings and layer normalization
  • Building large-scale language models by leveraging unlabeled data
    • Pre-training and fine-tuning transformer models
    • Leveraging unlabeled data with GPT
    • Using GPT-2 to generate new text
    • Bidirectional pre-training with BERT
    • The best of both worlds: BART
  • Fine-tuning a BERT model in PyTorch
    • Loading the IMDb movie review dataset
    • Tokenizing the dataset
    • Loading and fine-tuning a pre-trained BERT model
    • Fine-tuning a transformer more conveniently using the Trainer API
  • Summary

Chapter 17: Generative Adversarial Networks for Synthesizing New Data

  • Introducing generative adversarial networks
    • Starting with autoencoders
    • Generative models for synthesizing new data
    • Generating new samples with GANs
    • Understanding the loss functions of the generator and discriminator networks in a GAN model
  • Implementing a GAN from scratch
    • Training GAN models on Google Colab
    • Implementing the generator and the discriminator networks
    • Defining the training dataset
    • Training the GAN model
  • Improving the quality of synthesized images using a convolutional and Wasserstein GAN
    • Transposed convolution
    • Batch normalization
    • Implementing the generator and discriminator
    • Dissimilarity measures between two distributions
    • Using EM distance in practice for GANs
    • Gradient penalty
    • Implementing WGAN-GP to train the DCGAN model
    • Mode collapse
  • Other GAN applications
  • Summary

Chapter 18: Graph Neural Networks for Capturing Dependencies in Graph Structured Data

  • Introduction to graph data
    • Undirected graphs
    • Directed graphs
    • Labeled graphs
    • Representing molecules as graphs
  • Understanding graph convolutions
    • The motivation behind using graph convolutions
    • Implementing a basic graph convolution
  • Implementing a GNN in PyTorch from scratch
    • Defining the NodeNetwork model
    • Coding the NodeNetwork’s graph convolution layer
    • Adding a global pooling layer to deal with varying graph sizes
    • Preparing the DataLoader
    • Using the NodeNetwork to make predictions
  • Implementing a GNN using the PyTorch Geometric library
  • Other GNN layers and recent developments
    • Spectral graph convolutions
    • Pooling
    • Normalization
    • Pointers to advanced graph neural network literature
  • Summary

Chapter 19: Reinforcement Learning for Decision Making in Complex Environments

  • Introduction – learning from experience
    • Understanding reinforcement learning
    • Defining the agent-environment interface of a reinforcement learning system
  • The theoretical foundations of RL
    • Markov decision processes
      • The mathematical formulation of Markov decision processes
      • Visualization of a Markov process
    • Episodic versus continuing tasks
    • RL terminology: return, policy, and value function
      • The return
      • Policy
      • Value function
    • Dynamic programming using the Bellman equation
  • Reinforcement learning algorithms
    • Dynamic programming
      • Policy evaluation – predicting the value function with dynamic programming
      • Improving the policy using the estimated value function
      • Policy iteration
      • Value iteration
    • Reinforcement learning with Monte Carlo
      • State-value function estimation using MC
      • Action-value function estimation using MC
      • Finding an optimal policy using MC control
      • Policy improvement – computing the greedy policy from the action-value function
    • Temporal difference learning
      • TD prediction
      • On-policy TD control (SARSA)
      • Off-policy TD control (Q-learning)
  • Implementing our first RL algorithm
    • Introducing the OpenAI Gym toolkit
      • Working with the existing environments in OpenAI Gym
      • A grid world example
      • Implementing the grid world environment in OpenAI Gym
    • Solving the grid world problem with Q-learning
  • A glance at deep Q-learning
    • Training a DQN model according to the Q-learning algorithm
      • Replay memory
      • Determining the target values for computing the loss
    • Implementing a deep Q-learning algorithm
  • Chapter and book summary