I have a buffer of bytes, and I need to multiply every byte by another byte, such as 0x20. One method is to simply iterate over the buffer and multiply every byte. That is clearly suboptimal; SIMD should be able to do this much faster. However, using SIMD in Swift is far slower.
On a MacBook Pro M1 Max:
SIMD: 180ms for 100k iterations (working on 64 bytes at a time)
Loop: 35ms for 6.4M iterations (working on a single byte at a time)
Here is the code:
import Foundation

let inBytes = Data(repeating: 0x20, count: 6400000).withUnsafeBytes { bufferPointer in
    var iteration = 0
    // 100K iterations of the outer loop
    // Empty while loop takes about 2ms
    while iteration < 6_400_000 / SIMD64<UInt8>.scalarCount {
        let assumed = bufferPointer.assumingMemoryBound(to: SIMD64<UInt8>.self)
        let batch = assumed[0] // Will use the same batch all the time for testing purposes
        // This takes 180ms for 100k iterations (6_400_000 bytes / 64 bytes size of the SIMD)
        let spaceMask = batch &* 0x20
        /*
        Trying to do all these operations much faster; they are all slow
        let spaceMask = batch .== 0x20
        let result = batch &* 0x20
        let tabMask = batch .== 0x09
        let combinedMask = (spaceMask .| tabMask)._storage
        */
        // Using this loop, it takes 35ms total, running 6.4 million iterations in total
        var i = 0
        while i < 64 {
            let batchNumber = batch[i] &* 0x20
            i += 1
        }
        iteration += 1
    }
}
I'd expect the SIMD version to be at least 10x faster than a while loop; instead it is 5 times slower.
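For reference, here is a minimal sketch of the two approaches applied to a whole buffer rather than to the same batch over and over. It assumes Swift 5.7 or later (for `loadUnaligned` and unaligned `storeBytes`) and wrapping multiplication via `&*`. The function names `multiplyScalar` and `multiplySIMD` are hypothetical and are not part of the benchmark above.

```swift
import Foundation

// Scalar approach: multiply every byte by `factor`, wrapping on overflow.
// (Hypothetical helper, for illustration only.)
func multiplyScalar(_ input: [UInt8], by factor: UInt8) -> [UInt8] {
    var output = [UInt8](repeating: 0, count: input.count)
    for i in 0..<input.count {
        output[i] = input[i] &* factor
    }
    return output
}

// SIMD approach: handle 64 bytes per step, then finish the tail byte by byte.
// (Hypothetical helper, for illustration only.)
func multiplySIMD(_ input: [UInt8], by factor: UInt8) -> [UInt8] {
    let width = SIMD64<UInt8>.scalarCount
    let factors = SIMD64<UInt8>(repeating: factor)
    var output = [UInt8](repeating: 0, count: input.count)
    input.withUnsafeBytes { src in
        output.withUnsafeMutableBytes { dst in
            var offset = 0
            while offset + width <= src.count {
                // loadUnaligned does not assume the buffer is 64-byte aligned.
                let batch = src.loadUnaligned(fromByteOffset: offset,
                                              as: SIMD64<UInt8>.self)
                dst.storeBytes(of: batch &* factors,
                               toByteOffset: offset,
                               as: SIMD64<UInt8>.self)
                offset += width
            }
            // Remaining bytes that do not fill a whole 64-byte vector.
            while offset < src.count {
                dst[offset] = src[offset] &* factor
                offset += 1
            }
        }
    }
    return output
}
```

Both functions should produce identical output for any input, which makes it possible to check the SIMD path against the scalar one before timing either of them.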