Spark/Databricks: GPU does not appear to be utilized for ML regression (cross-validation, prediction) notebook

https://datascience.stackexchange.com/questions/70125

10-12-2020

Question

I have created and attached a notebook to a GPU-enabled Databricks cluster (6.4 ML (includes Apache Spark 2.4.5, GPU, Scala 2.11), EC2 type: p2.xlarge).

I have started running the notebook that includes cells with PySpark/MLlib code for performing cross-validation and prediction using a Pipeline consisting of a VectorAssembler, MinMaxScaler, and GBTRegressor.

When I run this job, it appears to use only the CPU (the Ganglia UI shows no GPU activity whatsoever, but plenty of CPU usage). Is there PySpark code I need to add to my notebook, or configuration settings I need to change on the cluster, so that this code runs with the help of my cluster's GPU?

I am new to Spark/MLlib, so it's very possible that I am missing something obvious. Thanks in advance for any suggestions!

Solution

Spark itself does not use GPUs at all, so this is not surprising.

The operations it performs are at best level-3 BLAS operations (matrix-matrix) of moderate size, and most are small level-1 operations (vector-vector), so a GPU generally isn't a win. Spark does accelerate those operations through BLAS if a native BLAS library such as MKL or OpenBLAS is present.
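To make the BLAS-level point concrete, here is a rough sketch in plain Python that estimates the arithmetic intensity (floating-point operations per value moved through memory) of one representative operation per BLAS level. The function names are hypothetical and the counts are idealized: they ignore caches and the extra cost of copying data to and from a GPU, so treat the numbers as an illustration rather than a measurement.

```python
# Idealized arithmetic intensity: flops per floating-point value read or written.
# One representative operation per BLAS level; counts are textbook estimates.

def axpy_intensity(n):
    """Level 1: y = a*x + y on vectors of length n."""
    flops = 2 * n            # one multiply and one add per element
    moved = 3 * n            # read x, read y, write y
    return flops / moved

def gemv_intensity(n):
    """Level 2: y = A @ x with an n-by-n matrix A."""
    flops = 2 * n * n        # one multiply and one add per matrix entry
    moved = n * n + 2 * n    # read A, read x, write y
    return flops / moved

def gemm_intensity(n):
    """Level 3: C = A @ B with n-by-n matrices."""
    flops = 2 * n ** 3       # n multiply-adds for each of the n*n outputs
    moved = 3 * n * n        # read A, read B, write C
    return flops / moved

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n,
              round(axpy_intensity(n), 2),
              round(gemv_intensity(n), 2),
              round(gemm_intensity(n), 2))
```

Under these assumptions, the level-1 and level-2 ratios stay below 2 no matter how large `n` gets, while the level-3 ratio grows linearly with `n`. Only that last case does enough work per value moved to have a chance of paying for a transfer to the GPU, and only when the matrices are large.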

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange