Introduction

RaijinCL is a library of matrix operations for OpenCL. GPU architectures vary widely, so it is difficult to provide a single implementation of each kernel that works well everywhere. RaijinCL is therefore an autotuning library. Instead of shipping a single optimized implementation of each kernel, it generates many different kernels, tests them on the user's machine and records the best-performing one.
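
As a rough illustration of the autotuning idea above, the C++ sketch below times several variants of one operation on the current machine and keeps the fastest. It is not RaijinCL's code or API: the names `KernelConfig`, `tiled_gemm` and `autotune` are hypothetical, a plain CPU loop nest stands in for generated OpenCL kernels, and a single tile size stands in for the much larger parameter space a real GPU tuner would search.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical kernel configuration. A real GPU tuner would vary many more
// parameters; a single tile size stands in for them here.
struct KernelConfig {
    std::size_t tile;  // must be positive
};

// CPU stand-in for one generated GEMM variant: C = A * B for n x n
// row-major matrices, computed block by block in tile x tile pieces.
void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t n, std::size_t tile) {
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

// Run every candidate once on an n x n problem on this machine and return
// the configuration with the shortest wall-clock time.
// Precondition: candidates is non-empty.
KernelConfig autotune(const std::vector<KernelConfig>& candidates, std::size_t n) {
    std::vector<float> A(n * n, 1.0f), B(n * n, 2.0f), C(n * n);
    KernelConfig best = candidates.front();
    double best_seconds = std::numeric_limits<double>::infinity();
    for (const KernelConfig& cfg : candidates) {
        const auto start = std::chrono::steady_clock::now();
        tiled_gemm(A, B, C, n, cfg.tile);
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() < best_seconds) {
            best_seconds = elapsed.count();
            best = cfg;
        }
    }
    return best;
}
```

Calling `autotune` with candidates such as tile sizes 8, 16, 32 and 64 returns whichever ran fastest on the machine at hand. A production tuner would typically repeat each timing to reduce noise, check each variant's output for correctness, and save the winner so that the search does not rerun on every launch.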

Initial results are very encouraging. For example, RaijinCL is competitive with the GEMM routines in AMD's OpenCL BLAS on AMD GPUs, and with CUBLAS on the Nvidia GPUs I tested. A detailed description and the results can be found in this technical report.

Author Info

The library was written by Rahul Garg and is part of his ongoing PhD thesis at McGill University on compiling array-based languages to GPUs. The work was done under the supervision of Prof. Laurie Hendren at the SABLE research group. You can email me at firstname.lastname, with the domain mail.mcgill.ca.

Download

You can clone the repository from Bitbucket. The library is now considered stable. However, an experimental extension to the library (related to the use of proper FMA instructions) that improves performance on some GPUs is available on private request. Device profiles, which contain pretuned information for specific devices, are also available on request for some devices such as the AMD Radeon 7970, Nvidia Tesla C2050 and Intel HD 4000. Device profiles will be made public soon.

Supported platforms

RaijinCL is written in C++, is currently supported on Linux, and is usually tested against the AMD and Nvidia SDKs. If you would like my help porting it to different hardware or another OS, contact me.

Status

Provides basic SGEMM, DGEMM, CGEMM and ZGEMM implementations with good performance.
Matrix transpose kernels
Matrix-vector multiply
Provides initial support for sum and product reduction routines for float, double and complex types. These routines are quite general: they can reduce along any single axis of a multi-dimensional strided matrix. The common one-dimensional reduction is supported as a special case.
Element-wise unary operations (such as sin, cos, exponentiation, etc.) on multi-dimensional matrices.
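
To make "reduction along one axis of a multi-dimensional strided matrix" concrete, the C++ sketch below gives a plain CPU reference for that operation. It is not RaijinCL's OpenCL implementation or interface; the function name `reduce_along_axis` and its parameters are hypothetical. Strides are counted in elements, and the result is stored contiguously in row-major order with the reduced axis removed.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CPU reference: reduce a strided N-dimensional array along one axis.
//   data    : pointer to the element at multi-index (0, 0, ..., 0)
//   shape   : extent of each dimension
//   strides : distance in elements between neighbours along each dimension
//   axis    : dimension to reduce away
//   init/op : identity and binary operation, e.g. 0 with std::plus,
//             1 with std::multiplies
// The result holds one value per combination of the remaining indices,
// in row-major order (last remaining dimension varies fastest).
template <typename T, typename Op>
std::vector<T> reduce_along_axis(const T* data,
                                 const std::vector<std::size_t>& shape,
                                 const std::vector<std::ptrdiff_t>& strides,
                                 std::size_t axis, T init, Op op) {
    const std::size_t ndim = shape.size();
    std::size_t out_size = 1;
    for (std::size_t d = 0; d < ndim; ++d)
        if (d != axis) out_size *= shape[d];

    std::vector<T> out(out_size, init);
    std::vector<std::size_t> idx(ndim, 0);  // idx[axis] stays 0 throughout
    for (std::size_t o = 0; o < out_size; ++o) {
        // Offset of the first element of the line being reduced.
        std::ptrdiff_t base = 0;
        for (std::size_t d = 0; d < ndim; ++d)
            base += static_cast<std::ptrdiff_t>(idx[d]) * strides[d];

        // Walk along the reduced axis using its stride.
        T acc = init;
        for (std::size_t k = 0; k < shape[axis]; ++k)
            acc = op(acc, data[base + static_cast<std::ptrdiff_t>(k) * strides[axis]]);
        out[o] = acc;

        // Advance the multi-index over the non-reduced dimensions (odometer order).
        for (std::size_t d = ndim; d-- > 0;) {
            if (d == axis) continue;
            if (++idx[d] < shape[d]) break;
            idx[d] = 0;
        }
    }
    return out;
}
```

For a 2 x 3 row-major array holding 1 to 6 (shape {2, 3}, strides {3, 1}), summing along axis 0 with `std::plus<double>()` and initial value 0 gives {5, 7, 9}, and a product along axis 1 with `std::multiplies<double>()` and initial value 1 gives {6, 120}. A one-dimensional reduction is simply the case with a single axis, and the same template also accepts `std::complex<double>` elements.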

Long-term plans

The long-term plan is to also support some matrix decomposition operations. However, no timeline can be guaranteed.

License

Licensed under the Apache License, Version 2.0. Have fun!