Converting Octave to Use CuBLAS

Question 1

EDIT2: The method described in this video requires the use of the Fortran "thunking library" bindings for cuBLAS. These steps worked for me:

Download Octave 3.6.3:
```
wget ftp://ftp.gnu.org/gnu/octave/octave-3.6.3.tar.gz
```

Extract all files from the archive:
```
tar -xzvf octave-3.6.3.tar.gz
```
Change into the newly created Octave directory:
```
cd octave-3.6.3
```
Make a directory for your "thunking cublas library":
```
mkdir mycublas
```
Change into that directory:
```
cd mycublas
```

Build the "thunking cublas library":
```
g++ -c -fPIC -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -o fortran_thunking.o /usr/local/cuda/src/fortran_thunking.c
ar rvs libmycublas.a fortran_thunking.o
```
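A quick sanity check at this point is to confirm that the archive exports the entry point the edited dMatrix.cc will end up calling. This is a sketch that assumes binutils nm is installed and that, with -DCUBLAS_GFORTRAN, the DGEMM wrapper is emitted under the gfortran-style name cublas_dgemm_ (lowercase with a trailing underscore); has_thunk_dgemm is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the given archive or object file
# defines the (assumed) gfortran-style symbol cublas_dgemm_.
has_thunk_dgemm() {
    # In nm output, 'T' marks a symbol defined in the text (code) section.
    nm "$1" 2>/dev/null | grep -q ' T cublas_dgemm_$'
}
```

Run it from inside mycublas as `has_thunk_dgemm libmycublas.a && echo ok`. If it fails, `nm libmycublas.a | grep -i dgemm` shows which name was actually emitted (for instance a C++-mangled one).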

Switch back to the main build directory:
```
cd ..
```
Run Octave's configure with additional options:
```
./configure --disable-docs LDFLAGS="-L/usr/local/cuda/lib64 -lcublas -lcudart -L/home/user2/octave/octave-3.6.3/mycublas -lmycublas"
```
Note that in the above command line, you will need to change the directory given to the second -L switch so that it matches the path of the mycublas directory you created in step 4 (mkdir mycublas).
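Rather than editing the hard-coded /home/user2/... path by hand, the LDFLAGS value can be built from the current directory. This is a sketch assuming you run it from the top-level octave-3.6.3 directory and that CUDA's libraries are in /usr/local/cuda/lib64; mycublas_ldflags and the CUDA_LIB override variable are hypothetical names introduced here.

```shell
# Hypothetical helper: prints the LDFLAGS value used in the configure
# command above, with the mycublas directory passed as $1.
mycublas_ldflags() {
    # CUDA_LIB is an illustrative override; defaults to the path used above.
    local cuda_lib="${CUDA_LIB:-/usr/local/cuda/lib64}"
    printf '%s\n' "-L${cuda_lib} -lcublas -lcudart -L$1 -lmycublas"
}
```

Usage: `./configure --disable-docs LDFLAGS="$(mycublas_ldflags "$(pwd)/mycublas")"`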
Now edit octave-3.6.3/liboctave/dMatrix.cc according to the instructions given in the video. It should be sufficient to replace every instance of dgemm with cublas_dgemm and every instance of DGEMM with CUBLAS_DGEMM. In the Octave 3.6.3 version I used, there were 3 instances of each (lowercase and uppercase).
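The renaming can be done mechanically with sed. This is a minimal sketch assuming GNU sed and GNU grep (for -i and the \b word-boundary escape); cublasify_dgemm is a hypothetical helper name. The word boundaries keep it from touching identifiers that merely contain dgemm, and because the underscore before dgemm in cublas_dgemm is a word character, running it a second time changes nothing.

```shell
# Hypothetical helper: rename dgemm -> cublas_dgemm and DGEMM -> CUBLAS_DGEMM
# in place (GNU sed assumed), then report how many occurrences now exist.
cublasify_dgemm() {
    local file="$1"
    sed -i -e 's/\bdgemm\b/cublas_dgemm/g' \
           -e 's/\bDGEMM\b/CUBLAS_DGEMM/g' "$file" || return 1
    # For Octave 3.6.3's dMatrix.cc, 3 of each are expected.
    echo "cublas_dgemm: $(grep -o '\bcublas_dgemm\b' "$file" | wc -l)"
    echo "CUBLAS_DGEMM: $(grep -o '\bCUBLAS_DGEMM\b' "$file" | wc -l)"
}
```

Usage, from the octave-3.6.3 directory: `cublasify_dgemm liboctave/dMatrix.cc`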
Now you can build Octave:
```
make
```
(make sure you are in the octave-3.6.3 directory)

At this point, Octave built successfully for me. I did not try make install, although I assume it would work; I simply ran Octave using the ./run-octave script in the octave-3.6.3 directory.
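Before launching it, one way to check that the build actually picked up cuBLAS is to look at the shared-library dependencies. This sketch assumes glibc's ldd; uses_cublas is a hypothetical helper name, and which library ends up carrying the libcublas dependency (liboctave/.libs/liboctave.so is a reasonable guess, since that is where dMatrix.cc is compiled) is an assumption to verify on your own build.

```shell
# Hypothetical helper: succeeds if ldd reports a libcublas dependency
# for the given executable or shared library.
uses_cublas() {
    ldd "$1" 2>/dev/null | grep -q 'libcublas'
}
```

Usage: `uses_cublas liboctave/.libs/liboctave.so && ./run-octave`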

The above steps assume a proper, standard CUDA 5.0 install. I will try to respond to CUDA-specific questions or issues, but any number of problems may arise with a general Octave install on your platform; I'm not an Octave expert and won't be able to respond to those. I used CentOS 6.2 for this test.

This method, as indicated, involves modifying Octave's C++ source (dMatrix.cc).

Another method was covered in some detail in the S3527 session at the GTC 2013 GPU Technology Conference. That session was a hands-on laboratory exercise, and unfortunately its materials are not conveniently available. However, that method did not involve any modification of the GNU Octave source; instead it used the LD_PRELOAD capability of Linux to intercept the BLAS library calls and redirect the appropriate ones to the cuBLAS library.
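LD_PRELOAD interception works because Octave resolves its BLAS routines from a shared library at run time, so it helps to know which BLAS library a preloaded library would be standing in front of. A sketch assuming glibc's ldd; blas_libs is a hypothetical helper name. If octave on your system is only a launcher, pointing it at octave-cli (where available) may be more informative.

```shell
# Hypothetical helper: list the shared libraries whose name contains
# "blas" (case-insensitive) that the given executable pulls in.
blas_libs() {
    ldd "$1" 2>/dev/null | awk 'tolower($1) ~ /blas/ { print $1 }'
}
```

Usage: `blas_libs "$(command -v octave)"`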

A newer, better method (using the NVBLAS intercept library) is discussed in this blog article.

Question 2

I was able to produce a compiled executable using the information supplied. It's a horrible hack, but it works.

The process looks like this:

First, produce an object file for fortran_thunking.c. Run this in /usr/local/cuda-5.0/src, where the source file lives (hence sudo):
```
sudo /usr/local/cuda-5.0/bin/nvcc -O3 -c -DCUBLAS_GFORTRAN fortran_thunking.c
```
Then copy that object file to the src subdirectory in octave:
```
cp /usr/local/cuda-5.0/src/fortran_thunking.o ./octave/src
```
Run make. The compile will fail on the last step (the final link). Change to the src directory:
```
cd src
```

Then execute the failing final line with the addition of ./fortran_thunking.o -lcudart -lcublas just after octave-main.o. This produces the following command:
```
g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast -Wformat \
    -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual \
    -I/usr/local/cuda/include -o .libs/octave octave-main.o \
    ./fortran_thunking.o -lcudart -lcublas -L/usr/local/cuda/lib64 \
    ../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so \
    ../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath \
    -Wl,/usr/local/lib/octave/3.7.5
```
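If you would rather not retype the long link line, the edit described above (inserting ./fortran_thunking.o -lcudart -lcublas right after octave-main.o) can be applied mechanically to the failing command as printed by make. A sketch: add_cublas_to_link is a hypothetical helper name, and it assumes the failing command is passed as one string containing " octave-main.o ". The command shown above also carries -L/usr/local/cuda/lib64; add it yourself if your failing line lacks it.

```shell
# Hypothetical helper: given the failing link command as one string,
# print it with the thunking object and CUDA libraries spliced in
# immediately after octave-main.o.
add_cublas_to_link() {
    printf '%s\n' "$1" |
        sed 's# octave-main\.o # octave-main.o ./fortran_thunking.o -lcudart -lcublas #'
}
```

Usage: `cmd=$(add_cublas_to_link '<failing line pasted here>')`, inspect `echo "$cmd"`, then run it with `eval "$cmd"` from the src directory.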

An octave binary will be created in the src/.libs directory. This is your octave executable.

Question 3

In more recent versions of CUDA you don't have to recompile anything, at least in my experience on Debian. First, create a configuration file for NVBLAS (a cuBLAS wrapper); NVBLAS will not work at all without it.

```
tee nvblas.conf <<EOF
NVBLAS_CPU_BLAS_LIB $(dpkg -L libopenblas-base | grep libblas)
NVBLAS_GPU_LIST ALL
EOF
```
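NVBLAS_CPU_BLAS_LIB expects a single library path, while the dpkg -L ... | grep libblas pipeline can, depending on the release and package layout, yield zero or several matches. A slightly more defensive variant checks the path before writing the file. A sketch; write_nvblas_conf is a hypothetical helper name.

```shell
# Hypothetical helper: write an nvblas.conf pointing NVBLAS at one CPU BLAS
# library (used for calls NVBLAS does not run on the GPU).
write_nvblas_conf() {
    local conf="$1" cpu_blas="$2"
    if [ ! -f "$cpu_blas" ]; then
        echo "CPU BLAS library not found: $cpu_blas" >&2
        return 1
    fi
    cat > "$conf" <<EOF
NVBLAS_CPU_BLAS_LIB $cpu_blas
NVBLAS_GPU_LIST ALL
EOF
}
```

Usage: `write_nvblas_conf nvblas.conf "$(dpkg -L libopenblas-base | grep libblas | head -n 1)"`, then check that the first match is the library you intended.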

Then run Octave as usual, but launch it with:
```
LD_PRELOAD=libnvblas.so octave
```

NVBLAS will do what it can on a GPU while relaying everything else to OpenBLAS.
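NVBLAS reads nvblas.conf from the current working directory unless the environment variable NVBLAS_CONFIG_FILE names another path, so the launch command only works when Octave is started next to the file. A small wrapper avoids that. A sketch; the function name octave_nvblas and the default location $HOME/nvblas.conf are assumptions, not anything NVBLAS requires.

```shell
# Hypothetical wrapper: start Octave with NVBLAS preloaded and an explicit
# config file, so it works from any working directory.
octave_nvblas() {
    local conf="${NVBLAS_CONFIG_FILE:-$HOME/nvblas.conf}"
    if [ ! -r "$conf" ]; then
        echo "nvblas config not readable: $conf" >&2
        return 1
    fi
    NVBLAS_CONFIG_FILE="$conf" LD_PRELOAD=libnvblas.so octave "$@"
}
```

Usage: `octave_nvblas` (or `NVBLAS_CONFIG_FILE=/path/to/nvblas.conf octave_nvblas`).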
