
Quantization and Pruning

Compress a deep neural network by performing quantization or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

  • Quantizing the weights, biases, and activations of layers to reduced precision scaled integer data types. You can then generate C/C++, CUDA®, or HDL code from this quantized network.

  • Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA code from this pruned network.
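The quantization workflow combines the functions listed below. The following is a minimal sketch, not a complete program: the pretrained network `net` and the calibration/validation image datastores `calDS` and `valDS` are placeholders you would supply.

```matlab
% Quantization sketch -- `net`, `calDS`, and `valDS` are assumed to exist.
quantObj = dlquantizer(net, ExecutionEnvironment="GPU");

% Exercise the network on calibration data to collect the dynamic
% ranges of weights, biases, and activations.
calResults = calibrate(quantObj, calDS);

% Quantize to 8-bit scaled integers and check accuracy on validation data.
quantOpts  = dlquantizationOptions;
valResults = validate(quantObj, valDS, quantOpts);

% Obtain the quantized network object for simulation or code generation.
qNet     = quantize(quantObj);
qDetails = quantizationDetails(qNet);
```

After validation, the quantized network can be passed to the C/C++, CUDA, or HDL code generation workflows mentioned above.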

Functions


dlquantizer Quantize a deep neural network to 8-bit scaled integer data types
dlquantizationOptions Options for quantizing a trained deep neural network
calibrate Simulate and collect ranges of a deep neural network
validate Quantize and validate a deep neural network
quantize Create quantized deep neural network
estimateNetworkMetrics Estimate metrics for specific layers of a neural network
quantizationDetails Display quantization details of a quantized network
taylorPrunableNetwork Network that can be pruned by using first-order Taylor approximation
forward Compute deep learning network output for training
predict Compute deep learning network output for inference
updatePrunables Remove filters from prunable layers based on importance scores
updateScore Compute and accumulate Taylor-based importance scores for pruning
dlnetwork Deep learning network for custom training loops
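The pruning functions above are used inside a custom training loop. The following is a rough sketch of the pruning phase only, under stated assumptions: `net` is an existing dlnetwork, `X` and `T` are a dlarray mini-batch and its targets, and `modelLoss` is a hypothetical helper that returns the loss together with the activations and gradients of the prunable layers.

```matlab
% Taylor pruning sketch -- `net`, `X`, `T`, `numPruningIterations`, and
% `modelLoss` are placeholders; this shows only the shape of the loop.
prunableNet = taylorPrunableNetwork(net);

for iteration = 1:numPruningIterations
    % Forward/backward pass; `modelLoss` is assumed to return the
    % activations and gradients of the prunable convolution layers.
    [loss, pruningActivations, pruningGradients] = ...
        dlfeval(@modelLoss, prunableNet, X, T);

    % Accumulate first-order Taylor importance scores ...
    prunableNet = updateScore(prunableNet, pruningActivations, ...
        pruningGradients);

    % ... then remove the lowest-scoring convolution filters.
    prunableNet = updatePrunables(prunableNet);
end
```

Once pruning converges, the pruned network can be fine-tuned and then passed to C/C++ or CUDA code generation.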

Apps

Deep Network Quantizer Quantize a deep neural network to 8-bit scaled integer data types

Topics

Deep Learning Quantization

Quantization for GPU Target

Quantization for FPGA Target

Quantization for CPU Target

Pruning