Question

I have a large number of images that I need to classify for training a clustering algorithm, and I would like to do so offline (the data is proprietary). Basically, I'd like to build a desktop survey tool that enables me to rapidly place each image into one or two categories. Ideally, the tool would:

  1. Search in a pre-specified desktop folder for an image;
  2. Display the image and a static list of categories, allowing me to click on one;
  3. Upon clicking, record the category associated with the image;
  4. Store the image filename and associated category in a dataset somewhere;
  5. Display the next untagged image in the folder and repeat the process.

Is there an easy way to build this kind of tool in Python, or some other pre-built utility that I could use for free offline?


Solution

I would recommend building your own database-backed Web app, since you have proprietary data and few (only two?) classes. I would create tables for the images, users, and labels.

user: (id, name)
image: (id, url)
label: (user.id, image.id, time, class)

The label class can be an enum. If you don't want to let users rate the same image multiple times, you can drop the time column and make the two ID columns (user.id, image.id) the compound primary key.
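As a rough illustration of the tables above, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the variant without the time column, so (user_id, image_id) is the compound primary key. SQLite has no enum type, so a CHECK constraint stands in for it; the class names 'cat_a' and 'cat_b' and the helper functions are hypothetical placeholders.

```python
import sqlite3

# Schema mirroring the user / image / label tables described above.
# 'cat_a' and 'cat_b' are placeholder class names (assumption).
SCHEMA = """
CREATE TABLE IF NOT EXISTS user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS image (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS label (
    user_id  INTEGER NOT NULL REFERENCES user(id),
    image_id INTEGER NOT NULL REFERENCES image(id),
    class    TEXT NOT NULL CHECK (class IN ('cat_a', 'cat_b')),
    PRIMARY KEY (user_id, image_id)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def next_untagged(conn, user_id):
    """Return (id, url) of the first image this user has not labelled, or None."""
    return conn.execute(
        "SELECT id, url FROM image WHERE id NOT IN "
        "(SELECT image_id FROM label WHERE user_id = ?) "
        "ORDER BY id LIMIT 1",
        (user_id,),
    ).fetchone()


def record_label(conn, user_id, image_id, cls):
    """Store one label; the primary key rejects a second label for the same pair."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO label (user_id, image_id, class) VALUES (?, ?, ?)",
            (user_id, image_id, cls),
        )
```

Passing a file path instead of ":memory:" to open_db keeps the labels on disk, which is what lets a later session resume where the previous one stopped.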

If you've never dealt with Web applications and databases, it will seem complicated, but it is easy once you get the hang of it. Here is a tutorial. The benefit of this approach is persistence: you can turn off your computer and then start where you left off, thanks to the database.

A simpler alternative is to collect all your data in one session using GUI components such as ipywidgets for Jupyter, writing the labels to a file. With this approach you do not get persistence.
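The file-writing half of this alternative can be sketched without any GUI code. The following is a minimal sketch, assuming a CSV file with one `filename,category` row per image; the function names and the set of image suffixes are illustrative choices, not part of ipywidgets or any other library. Appending each row as soon as a category is chosen, and re-reading the file at startup, is one way to skip images that were already labelled in an earlier session.

```python
import csv
from pathlib import Path

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def load_labels(label_file):
    """Read existing labels as {filename: category}; empty if no file yet."""
    path = Path(label_file)
    if not path.exists():
        return {}
    with path.open(newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


def untagged_images(folder, label_file):
    """List image filenames in `folder` that have no row in the label file."""
    done = load_labels(label_file)
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.name not in done
    )


def append_label(label_file, filename, category):
    """Append one (filename, category) row to the label file."""
    with Path(label_file).open("a", newline="") as f:
        csv.writer(f).writerow([filename, category])
```

A button callback in the GUI would then call append_label for the image on screen and display the first entry of untagged_images next.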

OTHER TIPS

One great online service is Dataturks. It has a streamlined UX and is easy to use. It also supports your private data on our internal cloud.


It also supports polygons, segmentation, etc.


Here is a demo you can try (with no signup required):

Demo Image Classification

P.S: Since browsers are not allowed to access files on your local disk directly, you might need to run a dummy web server to get local URLs to files.
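One way to get such local URLs is Python's built-in http.server module. This is a minimal sketch, assuming the images live in a single folder; the function name serve_folder and port 8000 are arbitrary choices. Binding to 127.0.0.1 keeps the files reachable only from the same machine.

```python
import functools
import http.server
import threading


def serve_folder(folder, port=8000):
    """Serve `folder` over HTTP on localhost in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(folder)
    )
    # 127.0.0.1 means only this machine can connect to the server.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling serve_folder("images"), a file such as images/cat_001.jpg (a made-up name) would be available at http://127.0.0.1:8000/cat_001.jpg until server.shutdown() is called. Running `python -m http.server 8000 --bind 127.0.0.1 --directory images` from a terminal should have the same effect.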

I'd suggest using Labelbox (https://www.labelbox.io/).

With Labelbox, you can easily classify or segment images with your expert labeling team. Pre-installed labeling interfaces support customizable single- or multiple-choice forms, bounding boxes, and polygon, point, and line tools.

Labelbox can also be used with your data hosted on premise or in a private cloud.


I have written a script that fulfills your five requirements; it is available on GitHub as image-sorter2. Compared to the other tools suggested here, image-sorter2 is completely free of charge, and you don't need to spend time drawing bounding boxes: the script simply opens a GUI, you click one of several buttons, and each image is sorted into the corresponding class folder, e.g. "cats", "dogs", "trucks", and so on.
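The folder-sorting idea itself is simple to illustrate. The snippet below is not taken from image-sorter2; it is a minimal sketch of what a button click could do, with a hypothetical function name, assuming one subfolder per class under a common root.

```python
import shutil
from pathlib import Path


def sort_into_class_folder(image_path, category, root):
    """Move one image into root/<category>/, creating the folder if needed."""
    dest_dir = Path(root) / category
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(image_path).name
    if dest.exists():
        # Refuse to overwrite an image that was already sorted.
        raise FileExistsError(dest)
    shutil.move(str(image_path), str(dest))
    return dest
```

With this layout the folder name is the label, so no separate dataset file is needed, and any image still in the source folder is by definition untagged.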


Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange