Question

I posted the same question on the main Stack site before finding the right place, sorry.

A friend of mine is working with more than 100 videos as samples for his neural network. Each video lasts more than a couple of minutes at around 24 frames per second. The objective is to detect movement across all the samples using deep learning.

The problem for him is the quantity of data he is dealing with: training takes too much time. I'm no expert in data preparation, but I thought he could turn every frame into a dataframe, remove single-color images (fully black or white), convert them from full RGB to grayscale, and compress them. I'm not sure that will be enough.

Can you think of a better method to reduce the training sample?

Solution

  • Reduce the image size, e.g. using cv2.resize()
  • Compress the image, e.g. using cv2.imencode() (JPEG encoding is lossy)
  • Lower the frame rate
  • Use lower precision: images are uint8 when loaded, but deep learning frameworks use float32 by default. You could try float16 or mixed precision.

JPEG compression has been shown to reduce memory use considerably with minimal loss of model performance. Have a look at this research.

You could also drop the frame rate to, say, 10 FPS. The actual value could be computed from the expected velocity of the moving objects: do you really require 24 FPS for the task?
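One way to reason about this is sketched below, under the assumption that the object speed can be estimated in pixels per second and that there is a maximum displacement between consecutive frames the model can tolerate. Both function names and all the numbers are hypothetical.

```python
def min_fps(speed_px_per_s, max_shift_px):
    """Lowest frame rate at which an object moves at most max_shift_px per frame."""
    return speed_px_per_s / max_shift_px


def frame_step(source_fps, target_fps):
    """Keep every `step`-th frame; the effective rate is source_fps / step."""
    return max(1, int(source_fps // target_fps))


# Assumed example: objects move about 50 px/s and up to 5 px of motion
# between kept frames is acceptable.
needed = min_fps(50, 5)
step = frame_step(24, needed)
kept = list(range(0, 24, step))  # frame indices kept from one second of video
print(needed, step, len(kept))
```

With an integer step the effective rate is 24 / 2 = 12 FPS rather than exactly 10. When reading a video frame by frame, the same idea amounts to keeping only the frames whose index is a multiple of `step`.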

Otherwise, the hardware you are using will determine which steps to take afterwards. Memory, number of operations, inference speed etc. will change how you optimise the process.


You mentioned "dataframe", so I will just point out that using Pandas DataFrames to hold raw image data, whilst it looks easy, is generally very inefficient because of the number of data points involved (pixels) and because Pandas DataFrames are essentially annotated NumPy arrays: the annotations take a lot of space. It is better to load the frames into pure NumPy arrays and use OpenCV for things such as converting RGB images to grayscale, resizing them, normalising pixel values, and so on.
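As a small sketch of working with plain NumPy arrays, the snippet below stacks grayscale frames in a single uint8 array and drops the near-uniform (all black or all white) frames mentioned in the question. The function name `drop_uniform_frames`, the standard-deviation threshold, and the synthetic frames are assumptions for the example.

```python
import numpy as np


def drop_uniform_frames(frames, min_std=2.0):
    """Keep frames whose pixel standard deviation exceeds min_std.

    frames: uint8 array of shape (n, height, width).
    min_std is an arbitrary threshold chosen for this sketch.
    """
    stds = frames.reshape(len(frames), -1).std(axis=1)
    return frames[stds > min_std]


rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(4, 120, 160), dtype=np.uint8)
frames[1] = 0    # fully black frame
frames[3] = 255  # fully white frame

kept = drop_uniform_frames(frames)
print(kept.shape, kept.dtype, kept.nbytes, "bytes")
```

Each grayscale pixel costs one byte in this layout, so the memory footprint is simply frames x height x width.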

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange