Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

We introduce a method to train Quantized Neural Networks (QNNs)
— neural networks with extremely low precision (e.g., 1-bit)
weights and activations, at run-time. At train-time the
quantized weights and activations are used for computing the
parameter gradients. During the forward pass, QNNs drastically
reduce memory size and accesses, and replace most arithmetic
operations with bit-wise operations. As a result, power
consumption is expected to be drastically reduced. We trained
QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The
resulting QNNs achieve prediction accuracy comparable to their
32-bit counterparts. For example, our quantized version of
AlexNet with 1-bit weights and 2-bit activations achieves $51\%$
top-1 accuracy. Moreover, we quantize the parameter gradients
to 6 bits as well, which enables gradient computation using
only bit-wise operations. Quantized recurrent neural networks
were tested over the Penn Treebank dataset and achieved
accuracy comparable to their 32-bit counterparts using only
4 bits. Last
but not least, we programmed a binary matrix multiplication GPU
kernel with which it is possible to run our MNIST QNN 7 times
faster than with an unoptimized GPU kernel, without suffering
any loss in classification accuracy. The QNN code is available
online.
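As a rough illustration of the training scheme the abstract describes (not the authors' released code), the NumPy sketch below keeps real-valued master weights, binarizes weights and activations with a sign function for the forward pass, and backpropagates through the binarization with a straight-through estimator. The layer shapes, learning rate, and helper names are assumptions made for this toy example.

```python
import numpy as np

def binarize(x):
    # Deterministic binarization: map real values to {-1, +1}.
    return np.where(x >= 0, 1.0, -1.0)

def straight_through(x, grad_out):
    # Straight-through estimator: pass the gradient through where
    # |x| <= 1 and cancel it elsewhere.
    return grad_out * (np.abs(x) <= 1.0)

# Toy forward/backward step for one fully connected layer with
# binary weights and activations (shapes are arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))           # real-valued input activations
W = 0.1 * rng.standard_normal((8, 16))    # real-valued "master" weights

xb, Wb = binarize(x), binarize(W)         # quantize for the forward pass
y = xb @ Wb                               # forward uses only binary operands

grad_y = rng.standard_normal(y.shape)     # stand-in for the upstream gradient
grad_Wb = xb.T @ grad_y                   # gradient w.r.t. the binary weights
grad_W = straight_through(W, grad_Wb)     # STE back to the real-valued weights
W -= 0.01 * grad_W                        # update the real-valued copy
```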
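The speed-up from a binary matrix multiplication kernel comes from packing {-1, +1} operands into bit masks and replacing multiply-accumulate with XNOR (or XOR) and popcount. The pure-Python sketch below only demonstrates that identity for a single dot product; it is not the authors' GPU kernel, and the helper names are assumptions.

```python
import numpy as np

def pack_signs(v):
    # Encode a {-1, +1} vector as an integer bit mask (+1 -> 1, -1 -> 0).
    return int("".join("1" if s > 0 else "0" for s in v), 2)

def binary_dot(a_bits, b_bits, n):
    # Equal bits contribute +1 and differing bits -1, so the dot product
    # is n - 2 * popcount(a XOR b); an XNOR-based kernel uses the
    # equivalent 2 * popcount(a XNOR b) - n.
    return n - 2 * bin(a_bits ^ b_bits).count("1")

# Sanity check against an ordinary dot product on random sign vectors.
rng = np.random.default_rng(0)
n = 64
a = np.where(rng.standard_normal(n) >= 0, 1, -1)
b = np.where(rng.standard_normal(n) >= 0, 1, -1)
assert binary_dot(pack_signs(a), pack_signs(b), n) == int(a @ b)
```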
