Question

I want to build a database of image data for machine learning. But how should this be done? I'm assuming people don't just dump all of their image data into a folder? Do they use a relational database management system, like MySQL? Or do they use a NoSQL database, like MongoDB? Is there a textbook that explores this part of machine learning in particular? Is this what "data warehouse" refers to?


Solution

There are several approaches. You need to store the input (the images) and, if your problem is a classification one, you also need to store the labels reliably. You might also have additional information about the images that could be useful for your problem:

  • You can store the images so that the permanent store itself carries all the information, for instance one folder per label that you want to learn, with every image of that class inside its folder. Keras provides tf.keras.preprocessing.image_dataset_from_directory (exposed as tf.keras.utils.image_dataset_from_directory in more recent TensorFlow releases), which creates a dataset from such a directory.

  • Another way, which I prefer, is to store all of the metadata in a (SQL) database, for instance a table holding the label and the image URL for each image. This is more flexible because you can change a label or add a new category without having to move images around. It also lets you change the format and add additional data related to each image.
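To make the two options concrete, here is a minimal sketch using only the Python standard library. It assumes a folder-per-label layout (root/<label>/<image>) as a starting point, indexes it into a SQLite table, and then changes a label without touching any file. The table name `images`, its columns, and the helper functions are hypothetical choices for illustration, not part of any library, and SQLite merely stands in for whichever SQL database you prefer.

```python
import sqlite3
import tempfile
from pathlib import Path

# Assumption: only these extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def scan_labeled_folders(root):
    """Return sorted (path, label) pairs from a root/<label>/<image> layout."""
    pairs = []
    for path in sorted(Path(root).glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((str(path), path.parent.name))
    return pairs


def build_metadata_db(conn, pairs):
    """Create the (hypothetical) images table and insert one row per image."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL UNIQUE,"
        " label TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO images (path, label) VALUES (?, ?)", pairs
    )
    conn.commit()


def relabel(conn, path, new_label):
    """Change an image's label in the database; the file itself stays put."""
    conn.execute("UPDATE images SET label = ? WHERE path = ?", (new_label, path))
    conn.commit()


def label_counts(conn):
    """Return {label: number of images} from the metadata table."""
    rows = conn.execute(
        "SELECT label, COUNT(*) FROM images GROUP BY label ORDER BY label"
    )
    return dict(rows.fetchall())


def demo():
    """Build a tiny fake dataset, index it, then relabel one image."""
    with tempfile.TemporaryDirectory() as root:
        # Empty placeholder files stand in for real images.
        for label, name in [("cats", "a.jpg"), ("cats", "b.jpg"), ("dogs", "c.jpg")]:
            folder = Path(root) / label
            folder.mkdir(exist_ok=True)
            (folder / name).touch()

        conn = sqlite3.connect(":memory:")
        pairs = scan_labeled_folders(root)
        build_metadata_db(conn, pairs)
        before = label_counts(conn)

        # Fix a mislabeled image without moving it on disk.
        target = next(p for p, _ in pairs if p.endswith("b.jpg"))
        relabel(conn, target, "dogs")
        after = label_counts(conn)
        conn.close()
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after:", after)
```

In this sketch the relabeled file still sits in its original folder, so the database, not the directory name, is the source of truth for labels. Extra per-image information would be added as further columns or tables.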

Licensed under: CC-BY-SA with attribution