Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Pattern Classification

by Richard O. Duda; Peter E. Hart; David G. Stork

  • ISBN13: 9780471056690
  • ISBN10: 0471056693
  • Edition: 2nd
  • Format: Hardcover
  • Copyright: 2000-11-09
  • Publisher: Wiley-Interscience

Note: Supplemental materials are not guaranteed with Rental or Used book purchases.

Purchase Benefits

  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now
List Price: $210.07 (save up to $96.63)

  • Rent Book: $113.44 (free shipping)
    Usually ships in 24-48 hours.
    *This item is part of an exclusive publisher rental program and requires an additional convenience fee. This fee will be reflected in the shopping cart.

Summary

The first edition, published in 1973, has become a classic reference in the field. In the second edition, readers will find coverage of key new topics such as neural networks, statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, expanded exercises, and computer project topics. An Instructor's Manual presenting detailed solutions to all the problems in the book is available from the Wiley editorial department.

Author Biography

RICHARD O. DUDA, PhD, is Professor in the Electrical Engineering Department at San Jose State University, San Jose, California.

PETER E. HART, PhD, is Chief Executive Officer and President of Ricoh Innovations, Inc. in Menlo Park, California.

DAVID G. STORK, PhD, is Chief Scientist, also at Ricoh Innovations, Inc.

Table of Contents

Preface
xvii
Introduction
1(19)
Machine Perception
1(1)
An Example
1(8)
Related Fields
8(1)
Pattern Recognition Systems
9(5)
Sensing
9(1)
Segmentation and Grouping
9(2)
Feature Extraction
11(1)
Classification
12(1)
Post Processing
13(1)
The Design Cycle
14(2)
Data Collection
14(1)
Feature Choice
14(1)
Model Choice
15(1)
Training
15(1)
Evaluation
15(1)
Computational Complexity
16(1)
Learning and Adaptation
16(1)
Supervised Learning
16(1)
Unsupervised Learning
17(1)
Reinforcement Learning
17(1)
Conclusion
17(3)
Summary by chapters
17(1)
Bibliographical and Historical Remarks
18(1)
Bibliography
19(1)
Bayesian Decision Theory
20(64)
Introduction
20(4)
Bayesian Decision Theory---Continuous Features
24(2)
Two-Category Classification
25(1)
Minimum-Error-Rate Classification
26(3)
Minimax Criterion
27(1)
Neyman-Pearson Criterion
28(1)
Classifiers, Discriminant Functions, and Decision Surfaces
29(2)
The Multicategory Case
29(1)
The Two-Category Case
30(1)
The Normal Density
31(5)
Univariate Density
32(1)
Multivariate Density
33(3)
Discriminant Functions for the Normal Density
36(9)
Case 1: Σᵢ = σ²I
36(3)
Case 2: Σᵢ = Σ
39(2)
Case 3: Σᵢ = arbitrary
41(1)
Decision Regions for Two-Dimensional Gaussian Data
41(4)
Error Probabilities and Integrals
45(1)
Error Bounds for Normal Densities
46(5)
Chernoff Bound
46(1)
Bhattacharyya Bound
47(1)
Error Bounds for Gaussian Distribution
48(1)
Signal Detection Theory and Operating Characteristics
48(3)
Bayes Decision Theory---Discrete Features
51(3)
Independent Binary Features
52(1)
Bayesian Decisions for Three-Dimensional Binary Data
53(1)
Missing and Noisy Features
54(2)
Missing Features
54(1)
Noisy Features
55(1)
Bayesian Belief Networks
56(6)
Belief Network for Fish
59(3)
Compound Bayesian Decision Theory and Context
62(22)
Summary
63(1)
Bibliographical and Historical Remarks
64(1)
Problems
65(15)
Computer exercises
80(2)
Bibliography
82(2)
Maximum-Likelihood and Bayesian Parameter Estimation
84(77)
Introduction
84(1)
Maximum-Likelihood Estimation
85(5)
The General Principle
85(3)
The Gaussian Case: Unknown μ
88(1)
The Gaussian Case: Unknown μ and Σ
88(1)
Bias
89(1)
Bayesian Estimation
90(2)
The Class-Conditional Densities
91(1)
The Parameter Distribution
91(1)
Bayesian Parameter Estimation: Gaussian Case
92(5)
The Univariate Case: p(μ|D)
92(3)
The Univariate Case: p(x|D)
95(1)
The Multivariate Case
95(2)
Bayesian Parameter Estimation: General Theory
97(5)
Recursive Bayes Learning
98(2)
When Do Maximum-Likelihood and Bayes Methods Differ?
100(1)
Noninformative Priors and Invariance
101(1)
Gibbs Algorithm
102(1)
Sufficient Statistics
102(5)
Sufficient Statistics and the Exponential Family
106(1)
Problems of Dimensionality
107(7)
Accuracy, Dimension, and Training Sample Size
107(4)
Computational Complexity
111(2)
Overfitting
113(1)
Component Analysis and Discriminants
114(10)
Principal Component Analysis (PCA)
115(2)
Fisher Linear Discriminant
117(4)
Multiple Discriminant Analysis
121(3)
Expectation-Maximization (EM)
124(4)
Expectation-Maximization for a 2D Normal Model
126(2)
Hidden Markov Models
128(33)
First-Order Markov Models
128(1)
First-Order Hidden Markov Models
129(1)
Hidden Markov Model Computation
129(2)
Evaluation
131(2)
Hidden Markov Model
133(2)
Decoding
135(1)
HMM Decoding
136(1)
Learning
137(2)
Summary
139(1)
Bibliographical and Historical Remarks
139(1)
Problems
140(15)
Computer exercises
155(4)
Bibliography
159(2)
Nonparametric Techniques
161(54)
Introduction
161(1)
Density Estimation
161(3)
Parzen Windows
164(10)
Convergence of the Mean
167(1)
Convergence of the Variance
167(1)
Illustrations
168(1)
Classification Example
168(4)
Probabilistic Neural Networks (PNNs)
172(2)
Choosing the Window Function
174(1)
kₙ-Nearest-Neighbor Estimation
174(3)
kₙ-Nearest-Neighbor and Parzen-Window Estimation
176(1)
Estimation of A Posteriori Probabilities
177(1)
The Nearest-Neighbor Rule
177(10)
Convergence of the Nearest Neighbor
179(1)
Error Rate for the Nearest-Neighbor Rule
180(1)
Error Bounds
180(2)
The k-Nearest-Neighbor Rule
182(2)
Computational Complexity of the k-Nearest-Neighbor Rule
184(3)
Metrics and Nearest-Neighbor Classification
187(5)
Properties of Metrics
187(1)
Tangent Distance
188(4)
Fuzzy Classification
192(3)
Reduced Coulomb Energy Networks
195(2)
Approximations by Series Expansions
197(18)
Summary
199(1)
Bibliographical and Historical Remarks
200(1)
Problems
201(8)
Computer exercises
209(4)
Bibliography
213(2)
Linear Discriminant Functions
215(67)
Introduction
215(1)
Linear Discriminant Functions and Decision Surfaces
216(3)
The Two-Category Case
216(2)
The Multicategory Case
218(1)
Generalized Linear Discriminant Functions
219(4)
The Two-Category Linearly Separable Case
223(4)
Geometry and Terminology
224(1)
Gradient Descent Procedures
224(3)
Minimizing the Perceptron Criterion Function
227(8)
The Perceptron Criterion Function
227(2)
Convergence Proof for Single-Sample Correction
229(3)
Some Direct Generalizations
232(3)
Relaxation Procedures
235(3)
The Descent Algorithm
235(2)
Convergence Proof
237(1)
Nonseparable Behavior
238(1)
Minimum Squared-Error Procedures
239(10)
Minimum Squared-Error and the Pseudoinverse
240(1)
Constructing a Linear Classifier by Matrix Pseudoinverse
241(1)
Relation to Fisher's Linear Discriminant
242(1)
Asymptotic Approximation to an Optimal Discriminant
243(2)
The Widrow-Hoff or LMS Procedure
245(1)
Stochastic Approximation Methods
246(3)
The Ho-Kashyap Procedures
249(7)
The Descent Procedure
250(1)
Convergence Proof
251(2)
Nonseparable Behavior
253(1)
Some Related Procedures
253(3)
Linear Programming Algorithms
256(3)
Linear Programming
256(1)
The Linearly Separable Case
257(1)
Minimizing the Perceptron Criterion Function
258(1)
Support Vector Machines
259(6)
SVM Training
263(1)
SVM for the XOR Problem
264(1)
Multicategory Generalizations
265(17)
Kesler's Construction
266(1)
Convergence of the Fixed-Increment Rule
266(2)
Generalizations for MSE Procedures
268(1)
Summary
269(1)
Bibliographical and Historical Remarks
270(1)
Problems
271(7)
Computer exercises
278(3)
Bibliography
281(1)
Multilayer Neural Networks
282(68)
Introduction
282(2)
Feedforward Operation and Classification
284(4)
General Feedforward Operation
286(1)
Expressive Power of Multilayer Networks
287(1)
Backpropagation Algorithm
288(8)
Network Learning
289(4)
Training Protocols
293(2)
Learning Curves
295(1)
Error Surfaces
296(3)
Some Small Networks
296(2)
The Exclusive-OR (XOR)
298(1)
Larger Networks
298(1)
How Important Are Multiple Minima?
299(1)
Backpropagation as Feature Mapping
299(4)
Representations at the Hidden Layer---Weights
302(1)
Backpropagation, Bayes Theory and Probability
303(2)
Bayes Discriminants and Neural Networks
303(1)
Outputs as Probabilities
304(1)
Related Statistical Techniques
305(1)
Practical Techniques for Improving Backpropagation
306(12)
Activation Function
307(1)
Parameters for the Sigmoid
308(1)
Scaling Input
308(1)
Target Values
309(1)
Training with Noise
310(1)
Manufacturing Data
310(1)
Number of Hidden Units
310(1)
Initializing Weights
311(1)
Learning Rates
312(1)
Momentum
313(1)
Weight Decay
314(1)
Hints
315(1)
On-Line, Stochastic or Batch Training?
316(1)
Stopped Training
316(1)
Number of Hidden Layers
317(1)
Criterion Function
318(1)
Second-Order Methods
318(6)
Hessian Matrix
318(1)
Newton's Method
319(1)
Quickprop
320(1)
Conjugate Gradient Descent
321(1)
Conjugate Gradient Descent
322(2)
Additional Networks and Training Methods
324(6)
Radial Basis Function Networks (RBFs)
324(1)
Special Bases
325(1)
Matched Filters
325(1)
Convolutional Networks
326(2)
Recurrent Networks
328(1)
Cascade-Correlation
329(1)
Regularization, Complexity Adjustment and Pruning
330(20)
Summary
333(1)
Bibliographical and Historical Remarks
333(2)
Problems
335(8)
Computer exercises
343(4)
Bibliography
347(3)
Stochastic Methods
350(44)
Introduction
350(1)
Stochastic Search
351(9)
Simulated Annealing
351(1)
The Boltzmann Factor
352(5)
Deterministic Simulated Annealing
357(3)
Boltzmann Learning
360(10)
Stochastic Boltzmann Learning of Visible States
360(5)
Missing Features and Category Constraints
365(1)
Deterministic Boltzmann Learning
366(1)
Initialization and Setting Parameters
367(3)
Boltzmann Networks and Graphical Models
370(3)
Other Graphical Models
372(1)
Evolutionary Methods
373(5)
Genetic Algorithms
373(4)
Further Heuristics
377(1)
Why Do They Work?
378(1)
Genetic Programming
378(16)
Summary
381(1)
Bibliographical and Historical Remarks
381(2)
Problems
383(5)
Computer exercises
388(3)
Bibliography
391(3)
Nonmetric Methods
394(59)
Introduction
394(1)
Decision Trees
395(1)
CART
396(15)
Number of Splits
397(1)
Query Selection and Node Impurity
398(4)
When to Stop Splitting
402(1)
Pruning
403(1)
Assignment of Leaf Node Labels
404(1)
A Simple Tree
404(2)
Computational Complexity
406(1)
Feature Choice
407(1)
Multivariate Decision Trees
408(1)
Priors and Costs
409(1)
Missing Attributes
409(1)
Surrogate Splits and Missing Attributes
410(1)
Other Tree Methods
411(2)
ID3
411(1)
C4.5
411(1)
Which Tree Classifier Is Best?
412(1)
Recognition with Strings
413(8)
String Matching
415(3)
Edit Distance
418(2)
Computational Complexity
420(1)
String Matching with Errors
420(1)
String Matching with the "Don't-Care" Symbol
421(1)
Grammatical Methods
421(8)
Grammars
422(2)
Types of String Grammars
424(1)
A Grammar for Pronouncing Numbers
425(1)
Recognition Using Grammars
426(3)
Grammatical Inference
429(2)
Grammatical Inference
431(1)
Rule-Based Methods
431(22)
Learning Rules
433(1)
Summary
434(1)
Bibliographical and Historical Remarks
435(2)
Problems
437(9)
Computer exercises
446(4)
Bibliography
450(3)
Algorithm-Independent Machine Learning
453(64)
Introduction
453(1)
Lack of Inherent Superiority of Any Classifier
454(11)
No Free Lunch Theorem
454(3)
No Free Lunch for Binary Data
457(1)
Ugly Duckling Theorem
458(3)
Minimum Description Length (MDL)
461(2)
Minimum Description Length Principle
463(1)
Overfitting Avoidance and Occam's Razor
464(1)
Bias and Variance
465(6)
Bias and Variance for Regression
466(2)
Bias and Variance for Classification
468(3)
Resampling for Estimating Statistics
471(4)
Jackknife
472(1)
Jackknife Estimate of Bias and Variance of the Mode
473(1)
Bootstrap
474(1)
Resampling for Classifier Design
475(7)
Bagging
475(1)
Boosting
476(4)
Learning with Queries
480(2)
Arcing, Learning with Queries, Bias and Variance
482(1)
Estimating and Comparing Classifiers
482(13)
Parametric Models
483(1)
Cross-Validation
483(2)
Jackknife and Bootstrap Estimation of Classification Accuracy
485(1)
Maximum-Likelihood Model Comparison
486(1)
Bayesian Model Comparison
487(2)
The Problem-Average Error Rate
489(3)
Predicting Final Performance from Learning Curves
492(2)
The Capacity of a Separating Plane
494(1)
Combining Classifiers
495(22)
Component Classifiers with Discriminant Functions
496(2)
Component Classifiers without Discriminant Functions
498(1)
Summary
499(1)
Bibliographical and Historical Remarks
500(2)
Problems
502(6)
Computer exercises
508(5)
Bibliography
513(4)
Unsupervised Learning and Clustering
517(84)
Introduction
517(1)
Mixture Densities and Identifiability
518(1)
Maximum-Likelihood Estimates
519(2)
Application to Normal Mixtures
521(9)
Case 1: Unknown Mean Vectors
522(2)
Case 2: All Parameters Unknown
524(2)
k-Means Clustering
526(2)
Fuzzy k-Means Clustering
528(2)
Unsupervised Bayesian Learning
530(7)
The Bayes Classifier
530(1)
Learning the Parameter Vector
531(3)
Unsupervised Learning of Gaussian Data
534(2)
Decision-Directed Approximation
536(1)
Data Description and Clustering
537(5)
Similarity Measures
538(4)
Criterion Functions for Clustering
542(6)
The Sum-of-Squared-Error Criterion
542(1)
Related Minimum Variance Criteria
543(1)
Scatter Criteria
544(2)
Clustering Criteria
546(2)
Iterative Optimization
548(2)
Hierarchical Clustering
550(7)
Definitions
551(1)
Agglomerative Hierarchical Clustering
552(3)
Stepwise-Optimal Hierarchical Clustering
555(1)
Hierarchical Clustering and Induced Metrics
556(1)
The Problem of Validity
557(2)
On-Line Clustering
559(7)
Unknown Number of Clusters
561(2)
Adaptive Resonance
563(2)
Learning with a Critic
565(1)
Graph-Theoretic Methods
566(2)
Component Analysis
568(5)
Principal Component Analysis (PCA)
568(1)
Nonlinear Component Analysis (NLCA)
569(1)
Independent Component Analysis (ICA)
570(3)
Low-Dimensional Representations and Multidimensional Scaling (MDS)
573(28)
Self-Organizing Feature Maps
576(4)
Clustering and Dimensionality Reduction
580(1)
Summary
581(1)
Bibliographical and Historical Remarks
582(1)
Problems
583(10)
Computer exercises
593(5)
Bibliography
598(3)
A Mathematical Foundations
601(36)
A.1 Notation
601(3)
A.2 Linear Algebra
604(6)
A.2.1 Notation and Preliminaries
604(1)
A.2.2 Inner Product
605(1)
A.2.3 Outer Product
606(1)
A.2.4 Derivatives of Matrices
606(2)
A.2.5 Determinant and Trace
608(1)
A.2.6 Matrix Inversion
609(1)
A.2.7 Eigenvectors and Eigenvalues
609(1)
A.3 Lagrange Optimization
610(1)
A.4 Probability Theory
611(12)
A.4.1 Discrete Random Variables
611(1)
A.4.2 Expected Values
611(1)
A.4.3 Pairs of Discrete Random Variables
612(1)
A.4.4 Statistical Independence
613(1)
A.4.5 Expected Values of Functions of Two Variables
613(1)
A.4.6 Conditional Probability
614(1)
A.4.7 The Law of Total Probability and Bayes' Rule
615(1)
A.4.8 Vector Random Variables
616(1)
A.4.9 Expectations, Mean Vectors and Covariance Matrices
617(1)
A.4.10 Continuous Random Variables
618(2)
A.4.11 Distributions of Sums of Independent Random Variables
620(1)
A.4.12 Normal Distributions
621(2)
A.5 Gaussian Derivatives and Integrals
623(5)
A.5.1 Multivariate Normal Densities
624(2)
A.5.2 Bivariate Normal Densities
626(2)
A.6 Hypothesis Testing
628(2)
A.6.1 Chi-Squared Test
629(1)
A.7 Information Theory
630(3)
A.7.1 Entropy and Information
630(2)
A.7.2 Relative Entropy
632(1)
A.7.3 Mutual Information
632(1)
A.8 Computational Complexity
633(4)
Bibliography
635(2)
Index
637

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
