Simd Library Release Notes (2020).

2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

April 1, 2020 (version 4.6.88)

Algorithms

New features
  • AVX-512VNNI extension support.
  • AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
  • Base implementation and SSE4.1, AVX2, AVX-512BW and NEON optimizations of function SynetPoolingForwardMax8u.
Renaming
  • SynetPoolingForwardMax to SynetPoolingForwardMax32f.
Improving
  • SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
Bug fixing
  • Microsoft Visual Studio 2015 compiler error in function SynetConvert32fTo8u.
  • Performance degradation of AVX2 code.
  • Microsoft Visual Studio compiler error in function Extract64i (32-bit mode).
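
For readers unfamiliar with the operation behind SynetPoolingForwardMax8u, the following is a minimal scalar sketch of max pooling over 8-bit data. It is not the library's code or signature: the name MaxPool8u, its parameters, the single-channel row-major layout and the absence of padding are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine (not the Simd API): max pooling over a
// single-channel 8-bit image stored row by row, with no padding.
std::vector<uint8_t> MaxPool8u(const std::vector<uint8_t>& src, size_t srcH, size_t srcW,
                               size_t kernel, size_t stride)
{
    assert(src.size() == srcH * srcW && kernel > 0 && stride > 0);
    assert(srcH >= kernel && srcW >= kernel);
    // Output size for "valid" pooling (window always inside the image).
    const size_t dstH = (srcH - kernel) / stride + 1;
    const size_t dstW = (srcW - kernel) / stride + 1;
    std::vector<uint8_t> dst(dstH * dstW);
    for (size_t dy = 0; dy < dstH; ++dy)
    {
        for (size_t dx = 0; dx < dstW; ++dx)
        {
            uint8_t best = 0; // 0 is the smallest uint8_t value
            for (size_t ky = 0; ky < kernel; ++ky)
                for (size_t kx = 0; kx < kernel; ++kx)
                    best = std::max(best, src[(dy * stride + ky) * srcW + dx * stride + kx]);
            dst[dy * dstW + dx] = best;
        }
    }
    return dst;
}
```

A scalar routine of this kind is mainly useful as a reference against which vectorized variants can be compared.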

Test framework

New features
  • Tests for verifying functionality of function SynetPoolingForwardMax8u.

March 2, 2020 (version 4.5.87)

Algorithms

New features
  • Added a parameter to function SynetScaleLayerForward for bitwise compatibility with Inference Engine.
  • Added parameter 'type' to function SynetShuffleLayerForward.
  • Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
  • SimdSynetCompatibilityType enumeration.
  • Base implementation of SynetConvolution8iGemmNN class.
  • Base implementation and SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
Renaming
  • SimdSynetConvertImage to SimdSynetReorderImage.
  • SimdSynetConvertFilter to SimdSynetReorderFilter.
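
As background for the 32-bit float to 8-bit conversion mentioned above, here is a minimal scalar sketch of one common scheme: scale, shift, round and saturate to [0, 255]. Whether SynetConvert32fTo8u uses exactly this formula is an assumption; the name QuantizeTo8u and its parameters are hypothetical and are not the library's signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the Simd API): affine quantization of floats
// to unsigned 8-bit values with saturation.
std::vector<uint8_t> QuantizeTo8u(const std::vector<float>& src, float scale, float shift)
{
    assert(std::isfinite(scale) && std::isfinite(shift));
    std::vector<uint8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i)
    {
        // Round to the nearest integer, then clamp into the uint8_t range.
        float value = std::round(src[i] * scale + shift);
        dst[i] = static_cast<uint8_t>(std::min(std::max(value, 0.0f), 255.0f));
    }
    return dst;
}
```

Values that fall outside the representable range are clamped rather than wrapped, which is the usual choice for quantized inference.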

Test framework

New features
  • A new command-line test parameter -c: the number of channels in the test image for performance testing.
  • A new command-line test parameter -mt: the minimal test execution time (in milliseconds).
  • Tests for verifying functionality of SynetConvolution8i framework.
  • Tests for verifying functionality of function SynetConvert32fTo8u.

Documentation

Bug fixing
  • Error in description of method Detection::LoadStringXml.

February 3, 2020 (version 4.5.86)

Algorithms

New features
  • SimdResizeMethodInferenceEngineInterp method in Resizer framework.
Improving
  • Performance of Convolution32f framework (NHWC format, kernel=3x3, stride=1x1, large H and W).
  • Performance of AVX-512F and NEON optimizations of function GemmPackA.
  • Performance of Convolution32f framework (NHWC format, GemmNN method).
  • Performance of SSE2, AVX, AVX2, AVX-512F and NEON optimizations of Convolution32f framework (NHWC format, NhwcDirect method, kernel=1x1).
  • Performance of AVX-512F optimization of MergedConvolution32f framework (input convolution).
  • Performance of AVX2 and AVX-512F optimizations of MergedConvolution32f framework (output convolution).
  • Performance of Convolution32f framework (stride > 1).
  • Performance of AVX-512F optimization of Gemm32fNN function (added 6x64 and 6x48 micro kernels).
Bug fixing
  • Error in AVX-512F optimization of function WinogradKernel3x3Block2x2SetOutput (NCHW format).
  • Error in SSE, AVX, AVX-512F and NEON optimizations of function SynetPoolingForwardAverage (NHWC format).
  • Error in AVX-512F optimization of function SynetInnerProductLayerForward.
  • Error in AVX, AVX2 and AVX-512F optimizations of function Gemm32fNT.
  • Error in function WinogradKernel3x3Block4x4SetInput (padX != padY != padW != padH).
  • Error in debug FLOPS annotation of Deconvolution32f framework.
  • MergedConvolution32f framework doesn't work with stride == 3.

January 3, 2020 (version 4.5.85)

Algorithms

New features
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetUnaryOperation32fLayerForward.
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetSoftplus32f.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetOutput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetOutput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetOutput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetOutput.
Improving
  • Performance of Convolution32f framework (NHWC format, kernel=1x1x1).
  • Performance of Convolution32f framework (NHWC format, kernel=2x2).
  • Performance of Convolution32f framework (NHWC format, kernel=1x3).
  • Performance of Convolution32f framework (NHWC format, kernel=1x5).
Renaming
  • NeuralSigmoid to SynetSigmoid32f.
  • NeuralTanh to SynetTanh32f.
  • NeuralRelu to SynetRelu32f.
  • Winograd2x3SetFilter to WinogradKernel3x3Block2x2SetFilter.
  • Winograd2x3SetInput to WinogradKernel3x3Block2x2SetInput.
  • Winograd2x3SetOutput to WinogradKernel3x3Block2x2SetOutput.
  • Winograd3x3SetFilter to WinogradKernel3x3Block3x3SetFilter.
  • Winograd3x3SetInput to WinogradKernel3x3Block3x3SetInput.
  • Winograd3x3SetOutput to WinogradKernel3x3Block3x3SetOutput.
  • Winograd4x4SetFilter to WinogradKernel3x3Block4x4SetFilter.
  • Winograd4x4SetInput to WinogradKernel3x3Block4x4SetInput.
  • Winograd4x4SetOutput to WinogradKernel3x3Block4x4SetOutput.
Bug fixing
  • Error in Convolution32f framework (kernel greater than input size, NHWC format).
  • Potential crash in ContourDetector.
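
For reference, the softplus activation named by SynetSoftplus32f is conventionally defined as log(1 + exp(beta * x)) / beta. The scalar sketch below illustrates that definition only. The function name Softplus32f, the beta and threshold parameters, and the linear shortcut for large inputs are assumptions and do not reproduce the library's signature or its numerical details.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not the Simd API): element-wise softplus.
// For beta * x above the threshold the result is approximated by x itself,
// which avoids overflow in exp().
std::vector<float> Softplus32f(const std::vector<float>& src, float beta, float threshold)
{
    assert(beta > 0.0f);
    std::vector<float> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i)
    {
        float x = src[i] * beta;
        dst[i] = x > threshold ? src[i] : std::log1p(std::exp(x)) / beta;
    }
    return dst;
}
```

With beta = 1 the function returns ln 2 at zero, approaches zero for large negative inputs, and approaches the identity for large positive inputs.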

Test framework

New features
  • Tests for verifying functionality of function SynetUnaryOperation32fLayerForward.
  • Tests for verifying functionality of function SynetSoftplus32f.
  • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetFilter.
  • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetInput.
  • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetOutput.
  • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetFilter.
  • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetInput.
  • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetOutput.
  • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetFilter.
  • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetInput.
  • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetOutput.
  • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetFilter.
  • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetInput.
  • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetOutput.