EDIT2: The method described in this video requires the use of the Fortran "thunking library" bindings for cublas. These steps worked for me:
Download octave 3.6.3 from here:
wget ftp://ftp.gnu.org/gnu/octave/octave-3.6.3.tar.gz
extract all files from the archive:
tar -xzvf octave-3.6.3.tar.gz
change into the octave directory just created:
cd octave-3.6.3
make a directory for your "thunking cublas library"
mkdir mycublas
change into that directory
cd mycublas
build the "thunking cublas library"
g++ -c -fPIC -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -o fortran_thunking.o /usr/local/cuda/src/fortran_thunking.c
ar rvs libmycublas.a fortran_thunking.o
switch back to the main build directory
cd ..
run octave's configure with additional options:
./configure --disable-docs LDFLAGS="-L/usr/local/cuda/lib64 -lcublas -lcudart -L/home/user2/octave/octave-3.6.3/mycublas -lmycublas"
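One way to avoid hard-coding the mycublas path in that configure line is to build the LDFLAGS string from the directory you are in. This is only a sketch: the helper name cuda_ldflags is hypothetical (not part of Octave or CUDA), and it assumes the same /usr/local/cuda/lib64 location used above.

```shell
# Hypothetical helper: print the LDFLAGS string for a given Octave source dir.
# Assumes CUDA libraries live in /usr/local/cuda/lib64, as in the steps above.
cuda_ldflags() {
    printf '%s\n' "-L/usr/local/cuda/lib64 -lcublas -lcudart -L$1/mycublas -lmycublas"
}
```

From inside the octave-3.6.3 directory it could then be used like this:
./configure --disable-docs LDFLAGS="$(cuda_ldflags "$PWD")"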
Note that in the above command line, you will need to change the directory for the second -L switch to match the path to the mycublas directory that you created above.
Now edit octave-3.6.3/liboctave/dMatrix.cc according to the instructions given in the video. It should be sufficient to replace every instance of dgemm with cublas_dgemm and every instance of DGEMM with CUBLAS_DGEMM. In the octave 3.6.3 version I used, there were 3 such instances of each (lower case and upper case).
Now you can build octave:
make
(make sure you are in the octave-3.6.3 directory)
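The dMatrix.cc edit described above (which has to be done before running make) could also be scripted rather than done by hand. The following is a sketch, not the procedure from the video: the helper name patch_dgemm is hypothetical, and it assumes GNU sed, whose \b word-boundary match keeps an already-renamed cublas_dgemm from being renamed twice.

```shell
# Hypothetical helper: rename dgemm/DGEMM to the cublas thunking names in one file.
patch_dgemm() {
    # -i.bak edits in place and keeps the original as <file>.bak.
    # \b (GNU sed) limits the match to whole words, so running this twice is harmless.
    sed -i.bak \
        -e 's/\bdgemm\b/cublas_dgemm/g' \
        -e 's/\bDGEMM\b/CUBLAS_DGEMM/g' \
        "$1"
}
```

From the octave-3.6.3 directory, the call and a quick count of the renamed lines would be:
patch_dgemm liboctave/dMatrix.cc
grep -c cublas_dgemm liboctave/dMatrix.cc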
At this point, for me, Octave built successfully. I did not pursue make install, although I assume that would work. I simply ran octave using the ./run-octave script in the octave-3.6.3 directory.
The above steps assume a proper and standard CUDA 5.0 install. I will try to respond to CUDA-specific questions or issues, but there are any number of problems that may arise with a general Octave install on your platform. I'm not an octave expert and I won't be able to respond to those. I used CentOS 6.2 for this test.
This method, as indicated, involves modification of the C++ source files of octave.
Another method was covered in some detail in the S3527 session at the GTC 2013 GPU Tech Conference. This session was actually a hands-on laboratory exercise. Unfortunately, the materials from it are not conveniently available. However, the method there did not involve any modification of GNU Octave source; instead it uses the LD_PRELOAD capability of Linux to intercept the BLAS library calls and redirect the appropriate ones to the cublas library.
A newer, better method (using the NVBLAS intercept library) is discussed in this blog article.
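For orientation, the NVBLAS approach is driven by a small configuration file plus LD_PRELOAD, with no changes to the Octave source. The sketch below writes a minimal nvblas.conf; the CPU BLAS path in it is an assumption and must be changed to the BLAS library your Octave actually uses, and NVBLAS itself needs a newer CUDA toolkit than the 5.0 install used above.

```shell
# Sketch of a minimal nvblas.conf. The libopenblas path is an assumption:
# point NVBLAS_CPU_BLAS_LIB at the CPU BLAS your Octave build normally uses.
cat > nvblas.conf <<'EOF'
NVBLAS_CPU_BLAS_LIB /usr/lib64/libopenblas.so
NVBLAS_GPU_LIST ALL
NVBLAS_LOGFILE nvblas.log
EOF

# Tell NVBLAS where to find the configuration file.
export NVBLAS_CONFIG_FILE="$PWD/nvblas.conf"
```

Octave would then be started with the intercept library preloaded (assuming libnvblas.so is on the library search path):
LD_PRELOAD=libnvblas.so octave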