Detecting a Specific Watermark in a Photo with Python (without SciPy)

Question 1

Another possibility is to use machine learning. My background is natural language processing (not computer vision), but I tried creating a training and testing set using the description of your problem, and it seems to work: the classifier got 10 out of 10 unseen test images correct after training on 14 images.

Training set

The training set consists of the same images with the watermark (positive examples) and without the watermark (negative examples).

Testing set

The testing set consists of images that were not in the training set.

Example data

If interested, you can try it with the example training and testing images.

Code:

Full version available as a gist. Excerpt below:

import glob

from classify import MultinomialNB
from PIL import Image


TRAINING_POSITIVE = 'training-positive/*.jpg'
TRAINING_NEGATIVE = 'training-negative/*.jpg'
TEST_POSITIVE = 'test-positive/*.jpg'
TEST_NEGATIVE = 'test-negative/*.jpg'

# How many pixels to grab from the top-right of image.
CROP_WIDTH, CROP_HEIGHT = 100, 100
RESIZED = (16, 16)


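def get_image_data(infile):
    """Return the top-right corner of an image as a bag of pixel locations."""
    # Force RGB so every pixel is a 3-tuple, even for grayscale files.
    image = Image.open(infile).convert('RGB')
    width, height = image.size
    # Crop box is (left, upper, right, lower).
    box = width - CROP_WIDTH, 0, width, CROP_HEIGHT
    region = image.crop(box)
    resized = region.resize(RESIZED)
    data = resized.getdata()
    # Convert RGB to a simple averaged value; floor division keeps it an int.
    data = [sum(pixel) // 3 for pixel in data]
    # Combine location and value: repeat each pixel index once per unit of
    # brightness, so brighter pixels contribute more "occurrences".
    values = []
    for location, value in enumerate(data):
        values.extend([location] * value)
    return values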


def main():
    watermark = MultinomialNB()
    # Training
    count = 0
    for infile in glob.glob(TRAINING_POSITIVE):
        data = get_image_data(infile)
        watermark.train((data, 'positive'))
        count += 1
        print('Training {0}'.format(count))
    for infile in glob.glob(TRAINING_NEGATIVE):
        data = get_image_data(infile)
        watermark.train((data, 'negative'))
        count += 1
        print('Training {0}'.format(count))
    # Testing
    correct, total = 0, 0
    for infile in glob.glob(TEST_POSITIVE):
        data = get_image_data(infile)
        prediction = watermark.classify(data)
        if prediction.label == 'positive':
            correct += 1
        total += 1
        print('Testing ({0} / {1})'.format(correct, total))
    for infile in glob.glob(TEST_NEGATIVE):
        data = get_image_data(infile)
        prediction = watermark.classify(data)
        if prediction.label == 'negative':
            correct += 1
        total += 1
        print('Testing ({0} / {1})'.format(correct, total))
    print('Got {0} out of {1} correct'.format(correct, total))


if __name__ == '__main__':
    main()

Example output

Training 1
Training 2
Training 3
Training 4
Training 5
Training 6
Training 7
Training 8
Training 9
Training 10
Training 11
Training 12
Training 13
Training 14
Testing (1 / 1)
Testing (2 / 2)
Testing (3 / 3)
Testing (4 / 4)
Testing (5 / 5)
Testing (6 / 6)
Testing (7 / 7)
Testing (8 / 8)
Testing (9 / 9)
Testing (10 / 10)
Got 10 out of 10 correct
[Finished in 3.5s]
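
The excerpt depends on the classify module, whose internals are not shown here. To illustrate why repeating a pixel index once per unit of brightness lets a text-style classifier handle images, here is a minimal sketch of a multinomial naive Bayes with Laplace smoothing. The names encode and TinyMultinomialNB are hypothetical stand-ins rather than the classify API, and the 2x2 "images" are made-up toy data.

```python
import math
from collections import Counter, defaultdict


def encode(intensities):
    """Repeat each pixel index once per unit of brightness (bag of locations)."""
    values = []
    for location, value in enumerate(intensities):
        values.extend([location] * value)
    return values


class TinyMultinomialNB(object):
    """Hypothetical stand-in for the classifier; not the `classify` API."""

    def __init__(self, vocabulary_size):
        self.vocabulary_size = vocabulary_size
        self.feature_counts = defaultdict(Counter)  # label -> location -> count
        self.document_counts = Counter()            # label -> training examples

    def train(self, features, label):
        self.feature_counts[label].update(features)
        self.document_counts[label] += 1

    def classify(self, features):
        total_documents = sum(self.document_counts.values())
        observed = Counter(features)
        best_label, best_score = None, float('-inf')
        for label, counts in self.feature_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each location.
            score = math.log(self.document_counts[label] / float(total_documents))
            denominator = float(sum(counts.values()) + self.vocabulary_size)
            for location, occurrences in observed.items():
                score += occurrences * math.log((counts[location] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy 2x2 "corners": pixel 1 is bright only when the watermark is present.
model = TinyMultinomialNB(vocabulary_size=4)
model.train(encode([20, 250, 30, 25]), 'positive')
model.train(encode([90, 240, 80, 85]), 'positive')
model.train(encode([20, 30, 30, 25]), 'negative')
model.train(encode([90, 80, 80, 85]), 'negative')

print(model.classify(encode([50, 245, 60, 55])))
print(model.classify(encode([50, 55, 60, 55])))
```

In this toy setup the first unseen corner is labeled positive and the second negative, because the bright pixel at location 1 dominates the positive class counts. Real photos are far noisier, so treat this only as an explanation of the encoding, not as a tuned detector.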

Question 2

Is the position of the watermark exact? How is the watermark being applied to the background image?

I'll assume the watermark is a partial add or multiply function. The watermarked image is probably calculated as such:

resultPixel = imagePixel + (watermarkPixel*mixinValue)

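mixinValue would be in the range 0.0-1.0, so you could complete the mix by reapplying the watermark with a multiplier of (1-mixinValue). Under the formula above this yields imagePixel + watermarkPixel, which, once clipped to the valid pixel range, matches the watermark wherever the watermark pixel is at full intensity. Just test the color of the result image against the original watermark.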

testPixel = resultPixel + (watermarkPixel*(1-mixinValue))
assert testPixel == watermarkPixel

Of course, compression of the watermarked image will probably cause some variance in your testPixel, so compare within a small tolerance rather than for exact equality.
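
The check described above could be sketched roughly as follows, working on flat lists of 8-bit grayscale values so that it needs nothing beyond the standard library. The function names, the tolerance of 8, and the 95% threshold are illustrative assumptions, as is the additive blend itself. Only watermark pixels at full intensity (255) are compared, since those are the ones guaranteed to saturate after the mix is completed.

```python
def clip(value, low=0, high=255):
    """Clamp a value to the valid 8-bit pixel range."""
    return max(low, min(high, value))


def blend(image_px, watermark_px, mixin):
    """Assumed additive model: image + watermark * mixin, clipped."""
    return clip(int(round(image_px + watermark_px * mixin)))


def complete_mix(result_px, watermark_px, mixin):
    """Reapply the watermark with the remaining (1 - mixin) weight."""
    return clip(int(round(result_px + watermark_px * (1.0 - mixin))))


def matches_watermark(result, watermark, mixin, tolerance=8, min_fraction=0.95):
    """Return True if enough saturated watermark pixels are reproduced."""
    checked = hits = 0
    for result_px, watermark_px in zip(result, watermark):
        if watermark_px < 255:
            continue  # only full-intensity watermark pixels are informative
        checked += 1
        if abs(complete_mix(result_px, watermark_px, mixin) - watermark_px) <= tolerance:
            hits += 1
    return checked > 0 and hits / float(checked) >= min_fraction


# Made-up sample data: a small watermark mask and a fairly dark image strip.
watermark = [255, 0, 255, 255, 0, 255]
image = [10, 200, 90, 40, 120, 100]
marked = [blend(i, w, 0.5) for i, w in zip(image, watermark)]

print(matches_watermark(marked, watermark, 0.5))
print(matches_watermark(image, watermark, 0.5))
```

With this sample data the watermarked strip passes and the untouched strip does not. A region that is already bright can saturate without any watermark, so this test is more trustworthy over darker backgrounds or when combined with another check.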

Question 3

You can always use the Specialized Image Recognition API of restb.ai to automate the watermark detection process.

import requests

url = "https://api.restb.ai/segmentation"

# Query parameters: API key, model to run, and the image to analyze.
querystring = {
    "client_key": "your-free-key-here",
    "model_id": "re_logo",
    "image_url": "http://demo.restb.ai/img/gallery/realestate/logos-watermarks/re_logo-1.jpg",
}

response = requests.get(url, params=querystring, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors

print(response.text)

Screenshot of Logo & Watermark detection demo