I am using the front-facing TrueDepth camera together with Vision to recognize points in the image and run some measurements. I understand that Vision coordinates are normalized, so I am converting the Vision normalized points to CGPoints corresponding to the view, then attempting to match those to the depthData in dataOutputSynchronizer to get the z value. Then, using the camera intrinsics, I am attempting to get the distance between two points in 3D space.
I have successfully found the points and (I believe) converted them to screen points. My thinking here is that these CGPoints should be no different than if I had tapped them on the screen.
My issue is that even though the converted CGPoints remain largely similar (my hand does move around a little during testing but stays mostly planar to the camera), and I am calculating the depth position the same way for both, the depths can be wildly different, especially point 2. Depth point 2 seems more accurate in terms of calculated distance (my hand is about 1 foot from the camera), but it varies a lot and still is not accurate.
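For reference, this is how I intend to turn the two depth samples into a 3D distance once the lookups are reliable. It is only a sketch of my plan, not the code I am currently running: the fx/fy/cx/cy values are read from the intrinsic matrix, and it assumes the pixel coordinates are in the same resolution the intrinsics refer to and that the depth values are in meters.

import simd
import CoreGraphics

// Sketch: back-project two depth-map pixels into camera space using the intrinsics,
// then take the straight-line distance between the resulting 3D points.
func distanceBetween(_ p1: CGPoint, depth1: Float,
                     _ p2: CGPoint, depth2: Float,
                     intrinsics: simd_float3x3) -> Float {
    // Column-major intrinsic matrix: focal lengths on the diagonal, principal point in the last column
    let fx = intrinsics[0][0]
    let fy = intrinsics[1][1]
    let cx = intrinsics[2][0]
    let cy = intrinsics[2][1]

    let point1 = simd_float3((Float(p1.x) - cx) * depth1 / fx,
                             (Float(p1.y) - cy) * depth1 / fy,
                             depth1)
    let point2 = simd_float3((Float(p2.x) - cx) * depth2 / fx,
                             (Float(p2.y) - cy) * depth2 / fy,
                             depth2)

    return simd_distance(point1, point2)
}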
Here is a console print with the relevant data:
there are 2 points found
recognized points
[(499.08930909633636, 634.0807711283367), (543.7462849617004, 1061.8824380238852)]
DEPTH POINT 1 = 3.6312041
DEPTH POINT 2 = 0.2998223
there are 2 points found
recognized points
[(498.33644700050354, 681.3769372304281), (602.3667773008347, 1130.4955183664956)]
DEPTH POINT 1 = 3.6276162
DEPTH POINT 2 = 0.560331
Here is some of the relevant code.
dataOutputSynchronizer
func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                            didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {

    var handPoints: [CGPoint] = []

    // Read all outputs; only work on synced pairs
    guard renderingEnabled,
          let syncedDepthData: AVCaptureSynchronizedDepthData =
            synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
          let syncedVideoData: AVCaptureSynchronizedSampleBufferData =
            synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
        return
    }

    if syncedDepthData.depthDataWasDropped || syncedVideoData.sampleBufferWasDropped {
        return
    }

    let depthPixelBuffer = syncedDepthData.depthData.depthDataMap
    guard let videoPixelBuffer = CMSampleBufferGetImageBuffer(syncedVideoData.sampleBuffer) else {
        return
    }

    // Get the camera intrinsics
    guard let cameraIntrinsics = syncedDepthData.depthData.cameraCalibrationData?.intrinsicMatrix else {
        return
    }

    let image = CIImage(cvPixelBuffer: videoPixelBuffer)

    let handler = VNImageRequestHandler(
        cmSampleBuffer: syncedVideoData.sampleBuffer,
        orientation: .up,
        options: [:]
    )

    do {
        try handler.perform([handPoseRequest])

        guard
            let results = handPoseRequest.results?.prefix(2),
            !results.isEmpty
        else {
            return
        }

        var recognizedPoints: [VNRecognizedPoint] = []

        try results.forEach { observation in
            let fingers = try observation.recognizedPoints(.all)
            if let middleTipPoint = fingers[.middleDIP] {
                recognizedPoints.append(middleTipPoint)
            }
            if let wristPoint = fingers[.wrist] {
                recognizedPoints.append(wristPoint)
            }
        }

        // Store the points in handPoints if they are high-confidence points
        handPoints = recognizedPoints.filter {
            $0.confidence > 0.90
        }
        .map {
            // Adjust the Y (Vision's origin is bottom-left)
            CGPoint(x: $0.location.x, y: 1 - $0.location.y)
        }

        // Process the points found
        DispatchQueue.main.sync {
            self.processPoints(handPoints, depthPixelBuffer, videoPixelBuffer, cameraIntrinsics)
        }
    } catch {
        // Be more graceful here
    }
}
processPoints
func processPoints(_ handPoints: [CGPoint], _ depthPixelBuffer: CVImageBuffer, _ videoPixelBuffer: CVImageBuffer, _ cameraIntrinsics: simd_float3x3) {

    // This converts the normalized points to screen points
    // cameraView.previewLayer is an AVCaptureVideoPreviewLayer inside a UIView
    let convertedPoints = handPoints.map {
        cameraView.previewLayer.layerPointConverted(fromCaptureDevicePoint: $0)
    }

    // We need 2 hand points to get the distance
    if handPoints.count == 2 {
        print("there are 2 points found")
        print("recognized points")
        print(convertedPoints)

        let handVisionPoint1 = convertedPoints[0]
        let handVisionPoint2 = convertedPoints[1]

        // Ratio of depth-map width to video-frame width, used to scale the screen points down
        let scaleFactor = CGFloat(CVPixelBufferGetWidth(depthPixelBuffer)) / CGFloat(CVPixelBufferGetWidth(videoPixelBuffer))

        CVPixelBufferLockBaseAddress(depthPixelBuffer, .readOnly)
        let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthPixelBuffer), to: UnsafeMutablePointer<Float32>.self)

        let width = CVPixelBufferGetWidth(depthPixelBuffer)
        let height = CVPixelBufferGetHeight(depthPixelBuffer)

        let handVisionPixelX = Int((handVisionPoint1.x * scaleFactor).rounded())
        let handVisionPixelY = Int((handVisionPoint1.y * scaleFactor).rounded())
        let handVisionPixe2X = Int((handVisionPoint2.x * scaleFactor).rounded())
        let handVisionPixe2Y = Int((handVisionPoint2.y * scaleFactor).rounded())

        CVPixelBufferLockBaseAddress(depthPixelBuffer, .readOnly)

        let rowDataPoint1 = CVPixelBufferGetBaseAddress(depthPixelBuffer)! + handVisionPixelY * CVPixelBufferGetBytesPerRow(depthPixelBuffer)
        let handVisionPoint1Depth = rowDataPoint1.assumingMemoryBound(to: Float32.self)[handVisionPixelX]

        print("DEPTH POINT 1 = ", handVisionPoint1Depth)

        let rowDataPoint2 = CVPixelBufferGetBaseAddress(depthPixelBuffer)! + handVisionPixe2Y * CVPixelBufferGetBytesPerRow(depthPixelBuffer)
        let handVisionPoint2Depth = rowDataPoint2.assumingMemoryBound(to: Float32.self)[handVisionPixelX]

        print("DEPTH POINT 2 = ", handVisionPoint2Depth)

        //Int((width - touchPoint.x) * (height - touchPoint.y))
    }
}
Right now I am thinking my logic for finding the correct pixel in the depth map is incorrect. If that is not the case, then I am wondering whether the data streams are out of sync. But honestly, I am just a little lost at the moment. Thanks for any help!
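In case it clarifies what I mean by "finding the correct pixel", this is the alternative lookup I have been considering: scaling the normalized Vision point (after the Y flip) directly to the depth map's dimensions instead of going through the preview layer first. It is only a sketch of the indexing I think might be right, not something I have verified, and it ignores any rotation difference between the buffer and Vision coordinates:

import CoreVideo
import CoreGraphics

// Sketch: sample the depth map directly from a normalized Vision point
// (origin already flipped to top-left), skipping the preview-layer conversion.
func depthAt(normalizedPoint point: CGPoint, in depthPixelBuffer: CVPixelBuffer) -> Float32? {
    CVPixelBufferLockBaseAddress(depthPixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthPixelBuffer, .readOnly) }

    let width = CVPixelBufferGetWidth(depthPixelBuffer)
    let height = CVPixelBufferGetHeight(depthPixelBuffer)

    // Map the normalized point to depth-map pixel coordinates and clamp to the buffer bounds
    let x = min(max(Int(point.x * CGFloat(width)), 0), width - 1)
    let y = min(max(Int(point.y * CGFloat(height)), 0), height - 1)

    guard let baseAddress = CVPixelBufferGetBaseAddress(depthPixelBuffer) else { return nil }
    let rowData = baseAddress + y * CVPixelBufferGetBytesPerRow(depthPixelBuffer)
    return rowData.assumingMemoryBound(to: Float32.self)[x]
}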