I want to build a database of image data for machine learning, but how should this be done? I assume people don't just dump all of their images into a folder. Do they use a relational database management system, like MySQL? Or a NoSQL database, like MongoDB? Is there a textbook that covers this part of machine learning specifically? Is this what the term "data warehouse" refers to?


Solution

There are several approaches. You need to store the inputs (the images) and, if your problem is a classification one, you also need to store the labels reliably. You might also have additional information about the images that could be useful for your problem:

  • you can store the images so that all of the information is contained in the file storage itself, for instance by using folder names as the labels you want to learn and putting all the images of a given class in that folder. Keras can create a dataset directly from such a directory with tf.keras.preprocessing.image_dataset_from_directory.

  • another way (which I prefer) is to store all of the metadata (for instance, the label and the image URL) in a table in a (SQL) database. This is more flexible because you can easily change a label or add a new category without having to move images around. It also lets you change the storage format and attach additional data to each image.
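
As a rough illustration of the folder-per-label layout from the first bullet (e.g. data/cat/001.jpg, data/dog/002.jpg, hypothetical paths), the sketch below walks such a directory and builds (path, label_index) pairs using only the Python standard library. It is not the Keras implementation; it only mimics the assumed behaviour of inferring labels from subdirectory names in alphanumeric order. The function name index_image_folder and the set of accepted extensions are hypothetical choices.

```python
from pathlib import Path

# Hypothetical set of file extensions treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def index_image_folder(root):
    """Index a root/<label>/<image> layout (hypothetical helper).

    Returns (class_names, samples), where samples is a list of
    (path, label_index) tuples. Class indices follow the alphanumeric
    order of the subdirectory names. Only one level below each class
    folder is scanned; nested subfolders are ignored in this sketch.
    """
    root = Path(root)
    # Each subdirectory name is a label; files directly under root are skipped.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            # Keep only files whose extension looks like an image.
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((str(path), index))
    return class_names, samples
```

Because the label lives only in the directory name, relabeling an image in this layout means moving the file, which is exactly the inflexibility the second approach avoids.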

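The metadata-table approach from the second bullet can be sketched with Python's built-in sqlite3 module standing in for a SQL server such as MySQL. The schema (labels and images tables, the width/height columns), the helper functions and any URLs passed to them are hypothetical, chosen only to show that changing a label or adding a category is a single row update while the image files stay where they are.

```python
import sqlite3

# Hypothetical schema: one row per label, one row per image pointing at its file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS labels (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS images (
    id       INTEGER PRIMARY KEY,
    url      TEXT NOT NULL UNIQUE,            -- where the image file lives
    label_id INTEGER REFERENCES labels(id),
    width    INTEGER,                         -- example of extra per-image metadata
    height   INTEGER
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) the hypothetical metadata catalog."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_label(conn, name):
    """Return the id of a label, creating the category if it does not exist yet."""
    conn.execute("INSERT OR IGNORE INTO labels (name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM labels WHERE name = ?", (name,)).fetchone()[0]


def add_image(conn, url, label, width=None, height=None):
    """Register an image by reference; the file itself is not copied."""
    with conn:  # commits the transaction on success
        label_id = add_label(conn, label)
        conn.execute(
            "INSERT INTO images (url, label_id, width, height) VALUES (?, ?, ?, ?)",
            (url, label_id, width, height),
        )


def relabel(conn, url, new_label):
    """Change an image's label by updating one row; no file is moved."""
    with conn:
        label_id = add_label(conn, new_label)
        conn.execute("UPDATE images SET label_id = ? WHERE url = ?", (label_id, url))


def training_samples(conn):
    """Return (url, label_name) pairs that a data loader could consume."""
    return conn.execute(
        "SELECT images.url, labels.name FROM images "
        "JOIN labels ON labels.id = images.label_id ORDER BY images.id"
    ).fetchall()
```

For example, add_image(conn, "s3://bucket/img_001.jpg", "cat") followed by relabel(conn, "s3://bucket/img_001.jpg", "fox") creates the new fox category on the fly; the URL is a made-up placeholder.
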
Licensed under: CC-BY-SA with attribution