Question

I want to do some pattern recognition on my screen and will use the Quartz/PyObjC libraries to get the screenshots.

I get the screenshot as a CGImage. I want to search for a pattern in it using the OpenCV library, but I can't find out how to convert the data into a form OpenCV can read.

So what I want to do is this:

import cv2

# get screenshot and reference pattern
img = getScreenshot()  # returns a CGImage instance; custom function using Quartz
reference = cv2.imread('ref/reference_start.png')  # the reference pattern

# search for the pattern using the OpenCV library
# (this is the step that fails: img is still a CGImage, not an array)
result = cv2.matchTemplate(img, reference, cv2.TM_CCOEFF_NORMED)

# this is what I need
minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(result)

I have no idea how to do this and can't find any information through Google.


Solution 2

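All these answers ignore Tom Gangemi's comment on this answer: pictures whose widths are not multiples of 64 come out garbled. The code below uses the bytes-per-row value as the row stride instead of assuming that each row is exactly width * 4 bytes long. I wrote an efficient approach using NumPy strides: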

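import numpy as np
import Quartz.CoreGraphics as CG

# wnd_id must already hold the ID of the window to capture
cg_img = CG.CGWindowListCreateImage(
    CG.CGRectNull,
    CG.kCGWindowListOptionIncludingWindow,
    wnd_id,
    CG.kCGWindowImageBoundsIgnoreFraming | CG.kCGWindowImageNominalResolution
)

bpr = CG.CGImageGetBytesPerRow(cg_img)  # row stride in bytes, including any padding
width = CG.CGImageGetWidth(cg_img)
height = CG.CGImageGetHeight(cg_img)

cg_dataprovider = CG.CGImageGetDataProvider(cg_img)
cg_data = CG.CGDataProviderCopyData(cg_dataprovider)

np_raw_data = np.frombuffer(cg_data, dtype=np.uint8)

# View the buffer as (height, width, 3): rows are bpr bytes apart, pixels are
# 4 bytes apart, and only the first 3 of each pixel's 4 bytes are kept.
np_data = np.lib.stride_tricks.as_strided(np_raw_data,
                                          shape=(height, width, 3),
                                          strides=(bpr, 4, 1),
                                          writeable=False)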
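
To see why the row stride matters, here is a small sketch that runs without Quartz. It builds a synthetic BGRA buffer with padded rows and compares a naive reshape with the strided view used above. The image size, the 32-byte row length, and the helper `make_padded_bgra` are all made up for illustration.

```python
import numpy as np

def make_padded_bgra(width, height, bytes_per_row):
    """Build a fake BGRA buffer with padded rows (hypothetical helper).

    Pixel (y, x) is stored as B=x, G=y, R=7, A=255; padding bytes are 0xEE.
    """
    buf = np.full(height * bytes_per_row, 0xEE, dtype=np.uint8)
    for y in range(height):
        for x in range(width):
            off = y * bytes_per_row + x * 4
            buf[off:off + 4] = (x, y, 7, 255)
    return buf

width, height = 5, 3
bytes_per_row = 32  # assumed padding: 5 * 4 = 20 bytes used, 12 unused per row
raw = make_padded_bgra(width, height, bytes_per_row)

# Naive approach: pretend the rows are tightly packed.
# Row 1 then starts inside the padding of row 0.
naive = raw[:height * width * 4].reshape(height, width, 4)

# Strided view: bytes_per_row between rows, 4 between pixels, 1 between channels.
strided = np.lib.stride_tricks.as_strided(
    raw,
    shape=(height, width, 3),  # keep B, G, R and skip alpha
    strides=(bytes_per_row, 4, 1),
    writeable=False,
)

print(naive[1, 0])    # padding bytes, not a real pixel
print(strided[1, 0])  # the real first pixel of row 1
```

The strided view copies nothing; it only tells NumPy how far to step through the existing buffer.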

Other tips

To add to Arqu's answer: if your ultimate goal is to use OpenCV or NumPy, you may find it faster to use np.frombuffer instead of creating a PIL Image first. np.frombuffer takes about the same time as Image.frombuffer, but it saves you the step of converting from an Image to a NumPy array. That conversion takes about 100 ms on my machine, while everything else takes about 50 ms.

import Quartz.CoreGraphics as CG
import time
import numpy as np

ct = time.time()
region = CG.CGRectInfinite

# Create screenshot as CGImage
image = CG.CGWindowListCreateImage(
    region,
    CG.kCGWindowListOptionOnScreenOnly,
    CG.kCGNullWindowID,
    CG.kCGWindowImageDefault)

width = CG.CGImageGetWidth(image)
height = CG.CGImageGetHeight(image)
bytesperrow = CG.CGImageGetBytesPerRow(image)

pixeldata = CG.CGDataProviderCopyData(CG.CGImageGetDataProvider(image))
image = np.frombuffer(pixeldata, dtype=np.uint8)
# Rows may be padded, so reshape using bytes per row, then crop to the real width
image = image.reshape((height, bytesperrow//4, 4))
image = image[:,:width,:]

print('elapsed:', time.time() - ct)
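
For the original question, note that `cv2.imread` returns a 3-channel BGR array by default, while the screenshot array above has 4 channels. A minimal sketch of dropping the alpha channel follows, using a synthetic buffer in place of the `CGDataProviderCopyData` result. The function name `bgra_buffer_to_bgr` is invented here, and the sketch assumes bytes per row is a multiple of 4.

```python
import numpy as np

def bgra_buffer_to_bgr(buf, width, height, bytes_per_row):
    """Turn a padded BGRA byte buffer into a contiguous (height, width, 3) BGR array."""
    pixels = np.frombuffer(buf, dtype=np.uint8)
    # Each row holds bytes_per_row // 4 pixel slots; only the first `width` are real.
    pixels = pixels[:height * bytes_per_row].reshape(height, bytes_per_row // 4, 4)
    # Crop the padding, drop alpha, and copy so the result is contiguous in memory.
    return np.ascontiguousarray(pixels[:, :width, :3])

# Synthetic stand-in for a screenshot: 2 rows of 3 pixels, padded to 16 bytes per row.
width, height, bytes_per_row = 3, 2, 16
row = bytes([10, 20, 30, 255] * width) + bytes(bytes_per_row - width * 4)
buf = row * height

bgr = bgra_buffer_to_bgr(buf, width, height, bytes_per_row)
print(bgr.shape, bgr.dtype, bgr.flags["C_CONTIGUOUS"])
```

With a real screenshot, the resulting array should be usable as the first argument to `cv2.matchTemplate`, with the array from `cv2.imread` as the template.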

I've been playing around with this as well, but I needed a bit more performance, and saving to a file and then reading it back was too slow. After a lot of searching and fiddling around I came up with this:

# get_pixels returns an image reference from CG.CGWindowListCreateImage
imageRef = self.get_pixels()
pixeldata = CG.CGDataProviderCopyData(CG.CGImageGetDataProvider(imageRef))
image = Image.frombuffer("RGBA", (self.width, self.height), pixeldata, "raw", "RGBA", self.stride, 1)
# Color correction from BGRA to RGBA
b, g, r, a = image.split()
image = Image.merge("RGBA", (r, g, b, a))

Also note that my image was not of a standard size (it had to be padded), which caused some weird behavior, so I had to adapt the stride of the buffer. If you are taking full-screen screenshots at standard screen widths, you can use a stride of 0 and it will be calculated automatically.
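
The stride here is the number of bytes from the start of one row to the start of the next. As a rough illustration in plain Python (no Quartz or PIL), the sketch below removes the padding from a buffer whose rows are longer than `width * 4` bytes. The sizes and the helper name `strip_row_padding` are invented for the example.

```python
def strip_row_padding(data, width, height, stride, bytes_per_pixel=4):
    """Return tightly packed pixel bytes from a buffer with padded rows."""
    row_bytes = width * bytes_per_pixel
    if stride < row_bytes:
        raise ValueError("stride is smaller than one row of pixels")
    view = memoryview(data)
    # Take the first row_bytes of every stride-sized row and discard the rest.
    return b"".join(view[y * stride : y * stride + row_bytes] for y in range(height))

# Two rows of two pixels (8 bytes each), padded to a stride of 12 bytes.
padded = bytes(range(8)) + b"\xff" * 4 + bytes(range(8, 16)) + b"\xff" * 4
packed = strip_row_padding(padded, width=2, height=2, stride=12)
print(len(padded), len(packed))
```

Passing the real stride to Image.frombuffer, as in the code above, achieves the same effect without an extra copy in Python.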

You can now convert from PIL format to a numpy array to make it easier to work with in OpenCV with:

image = np.array(image)

Here is code that will take a screenshot and save it to a file. To read that into PIL, just use the standard Image.open(path). This code is surprisingly fast if you keep the size of the region small. For an 800x800 pixel region, each shot takes less than 50 ms on my i7. For the full resolution of a dual monitor setup (2880x1800 + 2560x1440), each shot takes about 1.9 seconds.

Source: https://github.com/troq/flappy-bird-player/blob/master/screenshot.py

import Quartz
import LaunchServices
from Cocoa import NSURL
import Quartz.CoreGraphics as CG

def screenshot(path, region=None):
    """saves screenshot of given region to path
    :path: string path to save to
    :region: CGRect, e.g. CG.CGRectMake(x, y, width, height); full screen if None
    :returns: nothing
    """
    if region is None:
        region = CG.CGRectInfinite

    # Create screenshot as CGImage
    image = CG.CGWindowListCreateImage(
        region,
        CG.kCGWindowListOptionOnScreenOnly,
        CG.kCGNullWindowID,
        CG.kCGWindowImageDefault)

    dpi = 72 # FIXME: Should query this from somewhere, e.g for retina displays

    url = NSURL.fileURLWithPath_(path)

    dest = Quartz.CGImageDestinationCreateWithURL(
        url,
        LaunchServices.kUTTypePNG, # file type
        1, # 1 image in file
        None
        )

    properties = {
        Quartz.kCGImagePropertyDPIWidth: dpi,
        Quartz.kCGImagePropertyDPIHeight: dpi,
        }

    # Add the image to the destination, characterizing the image with
    # the properties dictionary.
    Quartz.CGImageDestinationAddImage(dest, image, properties)

    # When all the images (only 1 in this example) are added to the destination,
    # finalize the CGImageDestination object.
    Quartz.CGImageDestinationFinalize(dest)


if __name__ == '__main__':
    # Capture full screen
    screenshot("testscreenshot_full.png")

    # Capture region (100x100 box from top-left)
    region = CG.CGRectMake(0, 0, 100, 100)
    screenshot("testscreenshot_partial.png", region=region)

Here's an enhanced version of Arqu's answer. PIL (at least Pillow) can load BGRA data directly, without the need to split and merge.

import Quartz
from PIL import Image

# cgimg is the CGImage returned by e.g. Quartz.CGWindowListCreateImage
width = Quartz.CGImageGetWidth(cgimg)
height = Quartz.CGImageGetHeight(cgimg)
pixeldata = Quartz.CGDataProviderCopyData(Quartz.CGImageGetDataProvider(cgimg))
bpr = Quartz.CGImageGetBytesPerRow(cgimg)
# Convert to PIL Image.  Note: CGImage's pixeldata is BGRA
image = Image.frombuffer("RGBA", (width, height), pixeldata, "raw", "BGRA", bpr, 1)
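
To make the channel order concrete, here is a plain-Python sketch of the BGRA to RGBA swap. It only illustrates the byte layout and is not how Pillow implements the "BGRA" raw mode. The helper name `bgra_to_rgba` is made up, and the input is assumed to be tightly packed 4-byte pixels without row padding.

```python
def bgra_to_rgba(data):
    """Swap the blue and red bytes of every 4-byte pixel."""
    if len(data) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(data)
    out[0::4] = data[2::4]  # red moves into the first slot
    out[2::4] = data[0::4]  # blue moves into the third slot
    return bytes(out)

# Two pixels stored as B, G, R, A.
bgra = bytes([1, 2, 3, 4, 5, 6, 7, 8])
rgba = bgra_to_rgba(bgra)
print(list(rgba))
```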
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow