How to get the Y component from a CMSampleBuffer produced by an AVCaptureSession?

https://stackoverflow.com/questions/4085474

28-09-2019
Question

Hey, I am trying to access the raw data from the iPhone camera using AVCaptureSession. I am following the guide provided by Apple (link here).

The raw data in the sample buffer is in YUV format (am I correct about the raw video frame format?). How can I directly get the data for the Y component from the raw data stored in the sample buffer?

Solution

When you set up the AVCaptureVideoDataOutput that returns the raw frames, you can set the format of the frames using code like the following:

[videoOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];

In this case a BGRA pixel format is specified (I used this to match a color format for an OpenGL ES texture). Each pixel in that format has one byte each for blue, green, red, and alpha, in that order. Going with this makes it easy to pull out color components, but you sacrifice a little performance because the frame has to be converted from the camera-native YUV colorspace.
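To make the BGRA byte order concrete, here is a minimal sketch in plain C (which compiles inside an Objective-C file). The `BGRAPixel` type and the `bgra_pixel_at` helper are hypothetical names introduced for illustration. The sketch assumes the buffer is locked, `base` comes from CVPixelBufferGetBaseAddress, and `bytesPerRow` comes from CVPixelBufferGetBytesPerRow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pixel's components in the order kCVPixelFormatType_32BGRA stores them. */
typedef struct {
    uint8_t blue, green, red, alpha;
} BGRAPixel;

/* Hypothetical helper: read the pixel at (x, y) from a locked BGRA buffer.
   base        - assumed to come from CVPixelBufferGetBaseAddress
   bytesPerRow - assumed to come from CVPixelBufferGetBytesPerRow */
static BGRAPixel bgra_pixel_at(const uint8_t *base, size_t bytesPerRow,
                               size_t x, size_t y)
{
    assert(base != NULL);
    /* Each pixel occupies 4 bytes; rows are bytesPerRow bytes apart. */
    const uint8_t *p = base + y * bytesPerRow + x * 4;
    BGRAPixel px = { p[0], p[1], p[2], p[3] };
    return px;
}
```

Using the row stride rather than `width * 4` keeps the lookup correct if rows carry padding.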

Other supported colorspaces are kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and kCVPixelFormatType_420YpCbCr8BiPlanarFullRange on newer devices and kCVPixelFormatType_422YpCbCr8 on the iPhone 3G. The VideoRange or FullRange suffix simply indicates whether the bytes are returned between 16 - 235 for Y and 16 - 240 for UV, or the full 0 - 255 for each component.
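If you capture in the VideoRange format but want luma values spanning 0 - 255, one option is to rescale them yourself. The sketch below is plain C with a hypothetical helper name, `expand_luma_range`. It assumes a simple linear mapping of 16 - 235 onto 0 - 255 with clamping, which is a common convention and not something the capture API does for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stretch video-range luma (16..235) to full range
   (0..255) in place. Values outside 16..235 are clamped. */
static void expand_luma_range(uint8_t *luma, size_t count)
{
    assert(luma != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int y = luma[i];
        if (y <= 16) {
            luma[i] = 0;
        } else if (y >= 235) {
            luma[i] = 255;
        } else {
            /* 219 = 235 - 16; adding 109 rounds to the nearest integer */
            luma[i] = (uint8_t)(((y - 16) * 255 + 109) / 219);
        }
    }
}
```

Requesting the FullRange format, where the device supports it, avoids this extra pass.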

I believe the default colorspace used by an AVCaptureVideoDataOutput instance is the YUV 4:2:0 planar colorspace (except on the iPhone 3G, where it is YUV 4:2:2 interleaved). This means that there are two planes of image data contained within the video frame, with the Y plane coming first. For every pixel in the resulting image, there is one byte for the Y value at that pixel.

You could get at this raw Y data by implementing something like this in your delegate callback:
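To illustrate the two-plane layout, here is a rough C sketch of how the second plane relates to pixel coordinates. It assumes the bi-planar 4:2:0 formats store interleaved Cb, Cr byte pairs in plane 1, each pair shared by a 2x2 block of pixels. `ChromaSample` and `chroma_at` are hypothetical names; `cbcrBase` and `cbcrBytesPerRow` are assumed to come from CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane with plane index 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chroma pair shared by a 2x2 block of pixels. */
typedef struct {
    uint8_t cb, cr;
} ChromaSample;

/* Hypothetical helper: look up the chroma pair covering pixel (x, y).
   Assumes plane 1 holds interleaved Cb,Cr bytes at half resolution in
   both directions. */
static ChromaSample chroma_at(const uint8_t *cbcrBase, size_t cbcrBytesPerRow,
                              size_t x, size_t y)
{
    assert(cbcrBase != NULL);
    const uint8_t *p = cbcrBase + (y / 2) * cbcrBytesPerRow + (x / 2) * 2;
    ChromaSample s = { p[0], p[1] };
    return s;
}
```

For the luminance-only use case in the question, plane 1 can simply be ignored.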

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);

    unsigned char *rawPixelBase = (unsigned char *)CVPixelBufferGetBaseAddress(pixelBuffer);

    // Do something with the raw pixels here

    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
}

You could then figure out the location in the frame data for each X, Y coordinate on the image and pull out the byte that corresponds to the Y component at that coordinate.

Apple's FindMyiCone sample from WWDC 2010 (accessible along with the videos) shows how to process raw BGRA data from each frame. I also created a sample application, which you can download the code for here, that performs color-based object tracking using the live video from the iPhone's camera. Both show how to process raw pixel data, but neither of these works in the YUV colorspace.

Other tips

In addition to Brad's answer and your own code, you will want to consider the following:

Since your image has two separate planes, the function CVPixelBufferGetBaseAddress will not return the base address of the plane but rather the base address of an additional data structure. It is probably due to the current implementation that you get an address close enough to the first plane that you can see the image. But that is the reason it is shifted and has garbage at the top left. The correct way to get the first plane is:
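The coordinate lookup can be sketched in plain C as below. `luma_at` is a hypothetical helper. It assumes `lumaBase` is the start of the Y plane itself (for the bi-planar formats, the address returned by CVPixelBufferGetBaseAddressOfPlane for plane 0) and that `bytesPerRow` is that plane's row stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the Y (luma) byte for pixel (x, y).
   lumaBase    - assumed to point at the first byte of the Y plane
   bytesPerRow - assumed to be the Y plane's row stride in bytes */
static uint8_t luma_at(const uint8_t *lumaBase, size_t bytesPerRow,
                       size_t x, size_t y)
{
    assert(lumaBase != NULL);
    assert(x < bytesPerRow);
    /* One byte per pixel, rows bytesPerRow bytes apart. */
    return lumaBase[y * bytesPerRow + x];
}
```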

unsigned char *rowBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);

Each row in the image might be longer than the width of the image (due to rounding). That is why there are separate functions for getting the width and the number of bytes per row. You do not have this problem at the moment, but that might change with the next version of iOS. So your code should be:

int bufferHeight = CVPixelBufferGetHeight(pixelBuffer);
int bufferWidth = CVPixelBufferGetWidth(pixelBuffer);
int bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
int size = bufferHeight * bytesPerRow ;

unsigned char *pixel = (unsigned char*)malloc(size);

unsigned char *rowBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy (pixel, rowBase, size);

Please also note that this code will fail miserably on an iPhone 3G.

If you only need the luminance channel, I recommend against using the BGRA format, as it comes with a conversion overhead. Apple suggests using BGRA if you are doing rendering, but you do not need it for extracting the luminance information. As Brad already mentioned, the most efficient format is the camera-native YUV format.

However, extracting the right bytes from the sample buffer is a bit tricky, especially on the iPhone 3G with its interleaved YUV 422 format. So here is my code, which works fine with the iPhone 3G, 3GS, iPod Touch 4 and iPhone 4S.
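If rows ever do carry padding and you want a tightly packed width x height luminance buffer, you can copy row by row. This is a sketch in plain C; `copy_plane_packed` is a hypothetical helper, and `src` and `bytesPerRow` are assumed to come from CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane for plane 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a plane whose rows may be padded into a new
   buffer with exactly `width` bytes per row. The caller frees the result.
   Returns NULL if the allocation fails. */
static uint8_t *copy_plane_packed(const uint8_t *src, size_t width,
                                  size_t height, size_t bytesPerRow)
{
    assert(src != NULL);
    assert(bytesPerRow >= width);
    uint8_t *dst = malloc(width * height);
    if (dst == NULL) {
        return NULL;
    }
    for (size_t row = 0; row < height; row++) {
        /* Skip the padding at the end of each source row. */
        memcpy(dst + row * width, src + row * bytesPerRow, width);
    }
    return dst;
}
```

When `bytesPerRow` equals `width`, this produces the same bytes as a single memcpy of the whole plane.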

#pragma mark -
#pragma mark AVCaptureVideoDataOutputSampleBufferDelegate Methods
#if !(TARGET_IPHONE_SIMULATOR)
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    // get image buffer reference
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

    // extract needed information from image buffer
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    size_t bufferSize = CVPixelBufferGetDataSize(imageBuffer);
    void *baseAddress = CVPixelBufferGetBaseAddress(imageBuffer);
    CGSize resolution = CGSizeMake(CVPixelBufferGetWidth(imageBuffer), CVPixelBufferGetHeight(imageBuffer));

    // variables for grayscaleBuffer 
    void *grayscaleBuffer = 0;
    size_t grayscaleBufferSize = 0;

    // the pixelFormat differs between iPhone 3G and later models
    OSType pixelFormat = CVPixelBufferGetPixelFormatType(imageBuffer);

    if (pixelFormat == '2vuy') { // iPhone 3G
        // kCVPixelFormatType_422YpCbCr8     = '2vuy',    
        /* Component Y'CbCr 8-bit 4:2:2, ordered Cb Y'0 Cr Y'1 */

        // copy every second byte (luminance bytes form Y-channel) to new buffer
        grayscaleBufferSize = bufferSize/2;
        grayscaleBuffer = malloc(grayscaleBufferSize);
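        if (grayscaleBuffer == NULL) {
            NSLog(@"ERROR in %@:%@:%d: couldn't allocate memory for grayscaleBuffer!", NSStringFromClass([self class]), NSStringFromSelector(_cmd), __LINE__);
            // this method returns void, so unlock the buffer and bail out
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            return;
        }
        memset(grayscaleBuffer, 0, grayscaleBufferSize);
        // byte pointers keep the pointer arithmetic well defined
        unsigned char *sourceMemPos = (unsigned char *)baseAddress + 1;
        unsigned char *destinationMemPos = (unsigned char *)grayscaleBuffer;
        unsigned char *destinationEnd = destinationMemPos + grayscaleBufferSize;
        // '<' rather than '<=': writing at destinationEnd would overrun the buffer
        while (destinationMemPos < destinationEnd) {
            *destinationMemPos = *sourceMemPos;
            destinationMemPos += 1;
            sourceMemPos += 2;
        }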
    }

    if (pixelFormat == '420v' || pixelFormat == '420f') {
        // kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v', 
        // kCVPixelFormatType_420YpCbCr8BiPlanarFullRange  = '420f',
        // Bi-Planar Component Y'CbCr 8-bit 4:2:0, video-range (luma=[16,235] chroma=[16,240]).  
        // Bi-Planar Component Y'CbCr 8-bit 4:2:0, full-range (luma=[0,255] chroma=[1,255]).
        // baseAddress points to a big-endian CVPlanarPixelBufferInfo_YCbCrBiPlanar struct
        // i.e.: the Y-channel is plane 0 of the buffer and has to be fetched separately
        int bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
        baseAddress = CVPixelBufferGetBaseAddressOfPlane(imageBuffer,0);
        grayscaleBufferSize = resolution.height * bytesPerRow ;
        grayscaleBuffer = malloc(grayscaleBufferSize);
        if (grayscaleBuffer == NULL) {
            NSLog(@"ERROR in %@:%@:%d: couldn't allocate memory for grayscaleBuffer!", NSStringFromClass([self class]), NSStringFromSelector(_cmd), __LINE__);
            // this method returns void, so unlock the buffer and bail out
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            return;
        }
        memset(grayscaleBuffer, 0, grayscaleBufferSize);
        memcpy (grayscaleBuffer, baseAddress, grayscaleBufferSize); 
    }

    // do whatever you want with the grayscale buffer
    ...

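    // clean-up
    free(grayscaleBuffer);
    // balance the CVPixelBufferLockBaseAddress call made above
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);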
}
#endif
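As a variation on the '2vuy' branch above, here is a sketch in plain C that walks the image row by row, so it does not depend on the total data size or on rows being unpadded. `extract_luma_2vuy` is a hypothetical helper. It relies on the Cb Y'0 Cr Y'1 byte order quoted in the code comment, and assumes `bytesPerRow` comes from CVPixelBufferGetBytesPerRow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: pull the luma bytes out of an interleaved 4:2:2
   buffer ordered Cb Y'0 Cr Y'1. Returns a packed width x height buffer
   that the caller must free, or NULL if the allocation fails. */
static uint8_t *extract_luma_2vuy(const uint8_t *src, size_t width,
                                  size_t height, size_t bytesPerRow)
{
    assert(src != NULL);
    assert(bytesPerRow >= width * 2);   /* two bytes per pixel */
    uint8_t *luma = malloc(width * height);
    if (luma == NULL) {
        return NULL;
    }
    for (size_t row = 0; row < height; row++) {
        const uint8_t *in = src + row * bytesPerRow;
        uint8_t *out = luma + row * width;
        for (size_t x = 0; x < width; x++) {
            out[x] = in[2 * x + 1];     /* every second byte is luma */
        }
    }
    return luma;
}
```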

This is simply the culmination of everyone else's hard work, above and in other threads, converted to Swift 3 for anyone who finds it useful.

func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
        CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)

        let pixelFormatType = CVPixelBufferGetPixelFormatType(pixelBuffer)
        if pixelFormatType == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
           || pixelFormatType == kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange {

            let bufferHeight = CVPixelBufferGetHeight(pixelBuffer)
            let bufferWidth = CVPixelBufferGetWidth(pixelBuffer)

            let lumaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
            let size = bufferHeight * lumaBytesPerRow
            let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
            let lumaByteBuffer = unsafeBitCast(lumaBaseAddress, to:UnsafeMutablePointer<UInt8>.self)

            let releaseDataCallback: CGDataProviderReleaseDataCallback = { (info: UnsafeMutableRawPointer?, data: UnsafeRawPointer, size: Int) -> () in
                // https://developer.apple.com/reference/coregraphics/cgdataproviderreleasedatacallback
                // N.B. 'CGDataProviderRelease' is unavailable: Core Foundation objects are automatically memory managed
                return
            }

            if let dataProvider = CGDataProvider(dataInfo: nil, data: lumaByteBuffer, size: size, releaseData: releaseDataCallback) {
                let colorSpace = CGColorSpaceCreateDeviceGray()
                // 8-bit grayscale has no alpha channel
                let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue)

                // the CGImage initializer is failable, so avoid force-unwrapping its result
                if let cgImage = CGImage(width: bufferWidth, height: bufferHeight, bitsPerComponent: 8, bitsPerPixel: 8, bytesPerRow: lumaBytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo, provider: dataProvider, decode: nil, shouldInterpolate: false, intent: CGColorRenderingIntent.defaultIntent) {
                    let greyscaleImage = UIImage(cgImage: cgImage)
                    // do what you want with the greyscale image.
                }
            }
        }

        CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
    }
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow