Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines

Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, and Honglak Lee

 

Overview

Unsupervised feature learning has emerged as a promising tool for learning representations from unlabeled data. However, it is still challenging to learn useful high-level features when the data contains a significant amount of irrelevant patterns. Although feature selection can be used for such complex data, it may fail when we have to build a learning system from scratch, i.e., when there are no useful raw features to select from.

To address this problem, we propose the point-wise gated Boltzmann machine, a unified generative model that combines feature learning and feature selection. Our model performs not only feature selection on learned high-level features (i.e., hidden units) but also dynamic feature selection on raw features (i.e., visible units) through a gating mechanism. For each example, the model can adaptively focus on the subset of visible units corresponding to task-relevant patterns while ignoring those corresponding to task-irrelevant patterns. In our experiments, the method improves on the state of the art on several visual recognition benchmarks.

 

Model: Point-wise gated Boltzmann machine
Figure: Supervised point-wise gated Boltzmann machine.

Figure: Convolutional point-wise gated Boltzmann machine.

When we deal with complex data, it is desirable for a learning algorithm to distinguish semantically distinct patterns. For example, an object recognition algorithm may improve its performance if it can separate foreground object patterns from background clutter. To model this, we formulate a generative feature learning algorithm called the point-wise gated Boltzmann machine (PGBM). Our model performs feature selection not only on learned high-level features (i.e., hidden units) but also on raw features (i.e., visible units) through a gating mechanism based on stochastic “switch units”. The switch units let the model estimate where the task-relevant patterns occur and, through a multiplicative interaction, allow only the corresponding visible units to contribute to the final prediction. The task-irrelevant portion of the raw features is ignored, so the model performs dynamic feature selection: it chooses a variable subset of raw features depending on the semantic interpretation of each individual example. A schematic sketch of the gating is given below.
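To make the gating concrete, the two-group case can be written schematically as follows. This is our paraphrase of the construction described above, with bias terms omitted; the full model generalizes to K groups of hidden units:

$$
E(\mathbf{v}, \mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \mathbf{z})
= -\sum_{i,j} z_i \, v_i \, W^{(1)}_{ij} h^{(1)}_j
\;-\; \sum_{i,j} (1 - z_i) \, v_i \, W^{(2)}_{ij} h^{(2)}_j
\;-\; (\text{bias terms}),
$$

where the switch unit $z_i \in \{0,1\}$ routes visible unit $v_i$ to the task-relevant hidden group $\mathbf{h}^{(1)}$ (when $z_i = 1$) or to the task-irrelevant group $\mathbf{h}^{(2)}$ (when $z_i = 0$).

The conditionals implied by this energy suggest a simple block-Gibbs inference loop, sketched below in NumPy under the same two-group assumption. This is a minimal illustration, not the released Matlab implementation; the function and variable names (pgbm_gibbs_step, W1, W2, and so on) are ours.

```python
# Minimal NumPy sketch of one block-Gibbs sweep in a two-group PGBM,
# following the schematic energy above. Illustrative only; this is not
# the authors' released Matlab code.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pgbm_gibbs_step(v, z, W1, b1, W2, b2, c, rng):
    """v: (D,) binary visibles; z: (D,) binary switch units;
    W1: (D, H1) task-relevant weights; W2: (D, H2) task-irrelevant weights;
    b1, b2: hidden biases; c: (D,) visible biases."""
    # Each hidden group sees only the visible units its switches assign
    # to it -- the multiplicative interaction described above.
    h1 = rng.binomial(1, sigmoid((z * v) @ W1 + b1))
    h2 = rng.binomial(1, sigmoid(((1 - z) * v) @ W2 + b2))

    # Re-sample each switch from per-pixel top-down evidence; with two
    # groups, the softmax reduces to a logistic of the difference.
    evidence1 = v * (W1 @ h1)
    evidence2 = v * (W2 @ h2)
    z_new = rng.binomial(1, sigmoid(evidence1 - evidence2))

    # Reconstruct each pixel from the hidden group its switch selected.
    v_mean = sigmoid(z_new * (W1 @ h1) + (1 - z_new) * (W2 @ h2) + c)
    return rng.binomial(1, v_mean), z_new, h1, h2

# Hypothetical usage with small random parameters:
rng = np.random.default_rng(0)
D, H1, H2 = 64, 16, 16
W1 = 0.01 * rng.standard_normal((D, H1))
W2 = 0.01 * rng.standard_normal((D, H2))
b1, b2, c = np.zeros(H1), np.zeros(H2), np.zeros(D)
v = rng.binomial(1, 0.5, D)
z = rng.binomial(1, 0.5, D)
v, z, h1, h2 = pgbm_gibbs_step(v, z, W1, b1, W2, b2, c, rng)
```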

 

Evaluation 1: Handwritten digit recognition in the presence of noise
Figure: Visualization of learned filters and switch unit activations on MNIST-back-image.

Figure: Error rates on MNIST variation datasets.


Evaluation 2: Weakly supervised foreground object segmentation
Figure: (top) Learned filters from the Caltech-101 Face dataset, with task-relevant filters on the left and task-irrelevant filters on the right; (middle) switch unit activation maps; (bottom) corresponding examples from the Caltech-101 Face dataset overlaid with predicted (red) and ground-truth (green) bounding boxes.

Figure: (top) Learned filters from the Caltech-101 Car dataset, with task-relevant filters on the left and task-irrelevant filters on the right; (middle) switch unit activation maps; (bottom) corresponding examples from the Caltech-101 Car dataset overlaid with predicted (red) and ground-truth (green) bounding boxes.

Figure: More examples from other object categories of the Caltech-101 dataset.



Publication

Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines.

Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, and Honglak Lee.

In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.


Download

[Paper] [Supplementary Material] [Presentation Slides]

[Matlab code]


Feedback

Please email me if you have any questions.


Acknowledgments

This work was supported in part by NSF IIS 1247414 and a Google Faculty Research Award.