This post is about how to use petsc4py and PyCUDA together to write applications that use several GPUs in parallel.

PETSc supports the use of CUDA GPUs via the CUSP C++ library. The PETSc-provided `VECCUSP` and `AIJCUSP` classes are used to store vectors and matrices, respectively, on the GPU. Vector operations and matrix-vector products with these classes are performed on the GPU, so all of PETSc's linear solvers (except BiCG) can run entirely on the GPU.

Sometimes, you'll want to perform computations with vectors that can't be expressed using the available vector operations or matrix-vector products. In this case, you'll need access to the underlying GPU (device) buffer, which you pass to your own CUDA kernel that performs the computation.

In C++, this is done using the PETSc functions `VecCUSPGetArrayRead` and `VecCUSPGetArrayWrite`, which expose the underlying CUSP vectors. From the CUSP vector, you can get a pointer to the raw device memory using the `thrust::raw_pointer_cast` function. This device pointer can be passed to your own custom kernels, to functions from other GPU libraries, and so on. See a few examples of this here.

Here's how to do it with `petsc4py` and PyCUDA. We'll multiply two vectors elementwise using a custom kernel. The following bits of code all go in a single Python script, `gpumult.py`.
First, we'll create the input and output vectors:
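A minimal sketch of this step. The vector size `N = 8`, the variable names, and the constant entries 2 and 3 are my choices for this example, not requirements of the API; the `'cusp'` vector type selects GPU storage:

```python
import numpy as np
from petsc4py import PETSc

N = 8

# Create a CUSP-type vector: its storage lives on the GPU.
x = PETSc.Vec().create()
x.setSizes(N)
x.setType('cusp')
x.set(2.0)          # first input vector, all entries 2

y = x.duplicate()   # second input vector, all entries 3
y.set(3.0)

z = x.duplicate()   # output vector
```

When run under `mpiexec`, each process owns a contiguous chunk of these distributed vectors.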

Next, we'll use the `getCUDAHandle` method to get the raw CUDA pointers of the underlying GPU buffers:
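Continuing the script (assuming the input vectors `x` and `y` and the output vector `z` created earlier), the handles are plain integers holding device addresses:

```python
# Raw device pointers (as integers) to each vector's GPU buffer
xhandle = x.getCUDAHandle()
yhandle = y.getCUDAHandle()
zhandle = z.getCUDAHandle()
```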

Next, we'll write a CUDA kernel implementing the elementwise product, and use PyCUDA to interface with it:
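A sketch of the kernel and its PyCUDA wrapper. The kernel name `gpu_mult` and the argument layout are my choices; `prepare("PPPi")` tells PyCUDA to expect three pointers and an `int`, so we can pass the integer handles directly later:

```python
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context on this process
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void gpu_mult(double *x, double *y, double *z, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        z[i] = x[i] * y[i];
}
""")
gpu_mult = mod.get_function("gpu_mult")
gpu_mult.prepare("PPPi")  # three device pointers and an int
```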

Now, we'll perform the multiplication:
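Each process launches the kernel on its local chunk only. A sketch, assuming the `gpu_mult` kernel and the handles from the previous steps; the block size of 256 threads is an arbitrary choice:

```python
n = x.getLocalSize()            # number of entries owned by this process
nblocks = (n + 255) // 256      # enough blocks of 256 threads to cover them

gpu_mult.prepared_call((nblocks, 1), (256, 1, 1),
                       xhandle, yhandle, zhandle, n)
```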

The API requires us to "restore" the CUDA handles.
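Assuming the handles obtained above:

```python
x.restoreCUDAHandle(xhandle)
y.restoreCUDAHandle(yhandle)
z.restoreCUDAHandle(zhandle)
```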

As it happens, the argument passed to `restoreCUDAHandle` is ignored - but the call is still required.

Now look at the output vector:
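Assuming the output vector `z` from the steps above, PETSc's viewer prints each process's entries in turn:

```python
z.view()  # prints the distributed vector to stdout
```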

Here's a run of the above program on 2 processes:

```
$ mpiexec -n 2 python gpumult.py
Vec Object: 2 MPI processes
type: mpicusp
Process [0]
6.
6.
6.
6.
Process [1]
6.
6.
6.
6.
```

You can also use the raw pointer to construct a PyCUDA `GPUArray`, pass it to a function from the `scikits.cuda` library, and so on.
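For instance, something like the following should wrap a vector's buffer in a `GPUArray` without copying (a sketch, assuming the output vector `z` from above; `GPUArray` accepts an integer device address for `gpudata` in the PyCUDA versions I'm aware of):

```python
import numpy as np
import pycuda.gpuarray as gpuarray

n = z.getLocalSize()
handle = z.getCUDAHandle()

# Wrap the raw device pointer -- no copy is made; the GPUArray
# is a view of PETSc's GPU buffer for this process's entries.
z_gpu = gpuarray.GPUArray(shape=(n,), dtype=np.float64, gpudata=handle)
print(z_gpu.get())  # copy the local entries to the host

z.restoreCUDAHandle(handle)
```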

I wanted to look up all the top-level, grid-related functions available in NumPy. On the IPython shell, on a whim, I typed:

```
import numpy as np
np.*grid*?
```

not knowing what to expect. It did just what I wanted:

```
np.meshgrid
np.mgrid
np.ogrid
```

That made my day. Thanks, IPython!
