xnd

xnd is a package for general typed containers. xnd relies on the libndtypes library for typing and memory layout information.

Libxnd

C library.

libxnd

libxnd implements support for typed memory blocks using the libndtypes type library.

Types include ndarrays, ragged arrays (compatible with the Arrow list type), optional data (bitmaps are compatible with Arrow), tuples, records (structs), strings, bytes and categorical values.

Data structures

libxnd is a lightweight container library that leaves most of the work to libndtypes. The underlying idea is to have a small struct that contains bitmaps, a linear index, a type and a data pointer.

The type contains all low level access information.

Since the struct is small, it can easily be passed around by value when it serves as a view on the data.

The owner of the data is a master buffer with some additional bookkeeping fields.

Bitmaps
typedef struct xnd_bitmap xnd_bitmap_t;

struct xnd_bitmap {
    uint8_t *data;      /* bitmap */
    int64_t size;       /* number of subtree bitmaps in the "next" array */
    xnd_bitmap_t *next; /* array of bitmaps for subtrees */
};

libxnd supports data with optional values. Any type can have a bitmap for its data. Bitmaps are addressed using the linear index in the xnd_t struct.

For a scalar, the bitmap contains just one addressable bit.

For fixed and variable arrays, the bitmap contains one bit for each item.

Container types like tuples and records have new bitmaps for each of their fields if any of the field subtrees contains optional data.

These field bitmaps are in the next array.

View
/* Typed memory block, usually a view. */
typedef struct xnd {
    xnd_bitmap_t bitmap; /* bitmap tree */
    int64_t index;       /* linear index for var dims */
    const ndt_t *type;   /* type of the data */
    char *ptr;           /* data */
} xnd_t;

This is the xnd view, with a typed data pointer, the current linear index and a bitmap.

When passing the view around, the linear index needs to be maintained because it is required to determine if a value is missing (NA).

The convention is to apply the linear index (adjust the ptr and set the index to 0) only when a non-optional element at ndim == 0 is actually accessed.

This happens for example when getting the value of an element or when descending into a record.

Flags
#define XND_OWN_TYPE     0x00000001U /* type pointer */
#define XND_OWN_DATA     0x00000002U /* data pointer */
#define XND_OWN_STRINGS  0x00000004U /* embedded string pointers */
#define XND_OWN_BYTES    0x00000008U /* embedded bytes pointers */
#define XND_OWN_POINTERS 0x00000010U /* embedded pointers */

#define XND_OWN_ALL (XND_OWN_TYPE |    \
                     XND_OWN_DATA |    \
                     XND_OWN_STRINGS | \
                     XND_OWN_BYTES |   \
                     XND_OWN_POINTERS)

#define XND_OWN_EMBEDDED (XND_OWN_DATA |    \
                          XND_OWN_STRINGS | \
                          XND_OWN_BYTES |   \
                          XND_OWN_POINTERS)

The ownership flags for the xnd master buffer (see below). Like libndtypes, libxnd itself has no notion of how many exported views a master buffer has.

This is deliberately done in order to prevent two different memory management schemes from getting in each other’s way.

However, for deallocating a master buffer the flags must be set correctly.

XND_OWN_TYPE is set if the master buffer owns the ndt_t.

XND_OWN_DATA is set if the master buffer owns the data pointer.

The string, bytes and ref types have pointers that are embedded in the data. Usually, these are owned and deallocated by libxnd.

For strings, the Python bindings use the convention that NULL strings are interpreted as the empty string. Once a string pointer is initialized it belongs to the master buffer.

Macros
/* Convenience macros to extract embedded values. */
#define XND_POINTER_DATA(ptr) (*((char **)ptr))
#define XND_BYTES_SIZE(ptr) (((ndt_bytes_t *)ptr)->size)
#define XND_BYTES_DATA(ptr) (((ndt_bytes_t *)ptr)->data)

These macros should be used to extract embedded ref, string and bytes data.
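
For illustration, here is a minimal sketch (not part of the libxnd API) that dumps an embedded bytes value. It assumes a view whose dtype is bytes, so that ptr points to an ndt_bytes_t as described above:

#include <stdio.h>

#include "xnd.h"

/* Sketch (not part of the libxnd API): dump an embedded bytes value.
   Assumes that x is a view whose dtype is "bytes", so that x->ptr points
   to an ndt_bytes_t. */
static void
dump_bytes(const xnd_t *x)
{
    int64_t size = XND_BYTES_SIZE(x->ptr);
    uint8_t *data = XND_BYTES_DATA(x->ptr);

    for (int64_t i = 0; i < size; i++) {
        printf("%02x ", data[i]);
    }
    printf("\n");
}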

Master buffer
/* Master memory block. */
typedef struct xnd_master {
    uint32_t flags; /* ownership flags */
    xnd_t master;   /* typed memory */
} xnd_master_t;

This is the master buffer. The flags are explained above; the master buffer itself should be considered constant.

For traversing memory, copy the master view by value and work on the copy.

Slice and index keys
enum xnd_key { Index, FieldName, Slice };
typedef struct {
  enum xnd_key tag;
  union {
    int64_t Index;
    const char *FieldName;
    ndt_slice_t Slice;
  };
} xnd_index_t;

Slicing and indexing use the same model as Python. Indices are usually integers, but record fields may also be indexed by field name.

ndt_slice_t has start, stop, step fields that must be filled in with normalized values following the same protocol as PySlice_Unpack.
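
As an illustration, the following sketch fills in one key of each kind. The values are placeholders, and the slice values are assumed to be already normalized (as PySlice_Unpack would produce):

/* Sketch: one key of each kind (values are placeholders). */
xnd_index_t keys[3];

keys[0].tag = Index;        /* integer index */
keys[0].Index = 2;

keys[1].tag = FieldName;    /* record field name */
keys[1].FieldName = "name";

keys[2].tag = Slice;        /* slice with normalized start, stop, step */
keys[2].Slice.start = 0;
keys[2].Slice.stop = 10;
keys[2].Slice.step = 1;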

Functions

Create typed memory blocks

The main use case for libxnd is to create and manage typed memory blocks. These blocks are fully initialized to 0. References to additional memory blocks are allocated and initialized recursively.

bytes and string types are initialized to NULL, since their actual length is not known yet.

xnd_master_t *xnd_empty_from_string(const char *s, uint32_t flags, ndt_context_t *ctx);

Return a new master buffer according to the type string in s. flags must include XND_OWN_TYPE.

xnd_master_t *xnd_empty_from_type(const ndt_t *t, uint32_t flags, ndt_context_t *ctx);

Return a new master buffer according to the type t. flags must not include XND_OWN_TYPE, i.e. the type is externally managed.

This is the case in the Python bindings, where the ndtypes module creates and manages types.

Delete typed memory blocks
void xnd_del(xnd_master_t *x);

Delete the master buffer according to its flags. x may be NULL. x->master.ptr and x->master.type may be NULL.

The latter situation should only arise when breaking up reference cycles. This is used in the Python module.
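
To illustrate the lifecycle, here is a minimal sketch that creates an empty master buffer from a type string and deletes it again. ndt_context_new(), ndt_context_msg() and ndt_context_del() are assumed here to be the usual libndtypes context functions; error handling is kept minimal:

#include <stdio.h>

#include "ndtypes.h"
#include "xnd.h"

int main(void)
{
    ndt_context_t *ctx = ndt_context_new();   /* assumed libndtypes call */
    if (ctx == NULL) {
        return 1;
    }

    /* The buffer owns both the type and the data. */
    xnd_master_t *x = xnd_empty_from_string("2 * 3 * float64",
                                            XND_OWN_ALL, ctx);
    if (x == NULL) {
        fprintf(stderr, "%s\n", ndt_context_msg(ctx));
        ndt_context_del(ctx);
        return 1;
    }

    /* ... fill or read x->master here ... */

    xnd_del(x);              /* deallocates according to the flags */
    ndt_context_del(ctx);
    return 0;
}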

Bitmaps
xnd_bitmap_t xnd_bitmap_next(const xnd_t *x, int64_t i, ndt_context_t *ctx);

Get the next bitmap for the Tuple, Record, Ref and Constr types.

This is a convenience function that checks if the types have optional subtrees.

If so, return the bitmap at index i. If not, return an empty bitmap that must not be accessed.

void xnd_set_valid(xnd_t *x);

Set the validity bit at x->index. x must have an optional type.

void xnd_set_na(xnd_t *x);

Clear the validity bit at x->index. x must have an optional type.

int xnd_is_valid(const xnd_t *x);

Check if the element at x->index is valid. If x does not have an optional type, return 1. Otherwise, return the validity bit (zero or nonzero).

int xnd_is_na(const xnd_t *x);

Check if the element at x->index is missing (NA). If x does not have an optional type, return 0. Otherwise, return the negation of the validity bit.

xnd_t xnd_subtree(const xnd_t *x, const xnd_index_t indices[], int len,
                  ndt_context_t *ctx);

Apply zero or more indices to the input x and return a typed view. Valid indices are integers or strings for record fields.

This function is more general than pure array indexing, hence the name. For example, it is possible to index into nested records that in turn contain arrays.

xnd_t xnd_multikey(const xnd_t *x, const xnd_index_t indices[], int len,
                   ndt_context_t *ctx);

Apply zero or more keys to the input x and return a typed view. Valid keys are integers or slices.

This function differs from xnd_subtree in that it allows mixed indexing and slicing for fixed dimensions. Records and tuples cannot be sliced.

Variable dimensions can be sliced, but do not support mixed indexing and slicing.
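
For illustration, a sketch that descends into a single element and checks whether it is missing. It assumes a master buffer x of type "2 * 3 * ?float64" and a context ctx created as in the earlier sketch; checking for errors via the data pointer is an assumption, not the documented protocol:

/* Sketch: get a typed view of element [1][2] and check whether it is NA. */
xnd_index_t keys[2] = {
    { .tag = Index, .Index = 1 },
    { .tag = Index, .Index = 2 }
};

xnd_t elem = xnd_subtree(&x->master, keys, 2, ctx);
if (elem.ptr == NULL) {      /* assumed error convention */
    fprintf(stderr, "%s\n", ndt_context_msg(ctx));
}
else {
    /* Empty buffers start out with all bits cleared, i.e. NA. */
    printf("element [1][2] is %s\n", xnd_is_na(&elem) ? "NA" : "valid");
}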

Xnd

Python bindings for libxnd.

xnd

The xnd module implements a container type that maps most Python values relevant for scientific computing directly to typed memory.

Whenever possible, a single, pointer-free memory block is used.

xnd supports ragged arrays, categorical types, indexing, slicing, aligned memory blocks and type inference.

Operations like indexing and slicing return zero-copy typed views on the data.

Importing PEP-3118 buffers is supported.

Types

The xnd object is a container that maps a wide range of Python values directly to memory. xnd unpacks complex types of arbitrary nesting depth to a single memory block.

Pointers only occur in explicit pointer types like Ref (reference), Bytes and String, but not in the general case.

Type inference

If no explicit type is given, xnd supports type inference by assuming types for the most common Python values.

Fixed arrays
>>> from xnd import *
>>> x = xnd([[0, 1, 2], [3, 4, 5]])
>>> x
xnd([[0, 1, 2], [3, 4, 5]], type='2 * 3 * int64')

As expected, lists are mapped to ndarrays and integers to int64. Indexing and slicing work the usual way. For performance reasons these operations return zero-copy views whenever possible:

>>> x[0][1] # Indexing returns views, even for scalars.
xnd(1, type='int64')
>>>
>>> y = x[:, ::-1] # Containers are returned as views.
>>> y
xnd([[2, 1, 0], [5, 4, 3]], type='2 * 3 * int64')

Subarrays are views and properly typed:

>>> x[1]
xnd([3, 4, 5], type='3 * int64')

The representation of large values is abbreviated:

>>> x = xnd(10 * [200 * [1]])
>>> x
xnd([[1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, ...],
     ...],
    type='10 * 200 * int64')

Values can be accessed in full using the value property:

>>> x = xnd(11 * [1])
>>> x
xnd([1, 1, 1, 1, 1, 1, 1, 1, 1, ...], type='11 * int64')
>>> x.value
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Types can be accessed using the type property:

>>> x = xnd(11 * [1])
>>> x.type
ndt("11 * int64")
Ragged arrays

Ragged arrays are compatible with the Arrow list representation. The data is pointer-free; elements are addressed using one offset array per dimension.

>>> xnd([[0.1j], [3+2j, 4+5j, 10j]])
xnd([[0.1j], [(3+2j), (4+5j), 10j]], type='var * var * complex128')

Indexing and slicing works as usual, returning properly typed views or values in the case of scalars:

>>> x = xnd([[0.1j], [3+2j, 4+5j, 10j]])
>>> x[1, 2]
xnd(10j, type='complex128')

>>> x[1]
xnd([(3+2j), (4+5j), 10j], type='var * complex128')

Eliminating dimensions through mixed slicing and indexing is not supported because it would require copying and adjusting potentially huge offset arrays:

>>> y = x[:, 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: mixed indexing and slicing is not supported for var dimensions
Records (structs)

From Python 3.6 on, dicts retain their order, so they can be used directly for initializing C structs.

>>> xnd({'a': 'foo', 'b': 10.2})
xnd({'a': 'foo', 'b': 10.2}, type='{a : string, b : float64}')
Tuples

Python tuples are directly translated to the libndtypes tuple type:

>>> xnd(('foo', b'bar', [None, 10.0, 20.0]))
xnd(('foo', b'bar', [None, 10.0, 20.0]), type='(string, bytes, 3 * ?float64)')
Nested arrays in structs

xnd seamlessly supports nested values of arbitrary depth:

>>> lst = [{'name': 'John', 'internet_points': [1, 2, 3]},
...        {'name': 'Jane', 'internet_points': [4, 5, 6]}]
>>> xnd(lst)
xnd([{'name': 'John', 'internet_points': [1, 2, 3]}, {'name': 'Jane', 'internet_points': [4, 5, 6]}],
    type='2 * {name : string, internet_points : 3 * int64}')
Optional data (missing values)

Optional data is currently specified using None. It is under debate whether a separate NA singleton object would be more suitable.

>>> lst = [0, 1, None, 2, 3, None, 5, 10]
>>> xnd(lst)
xnd([0, 1, None, 2, 3, None, 5, 10], type='8 * ?int64')
Categorical data

For categorical data, type inference would be ambiguous, so it cannot be used directly. xnd supports the levels argument, which is internally translated to the categorical type.

>>> levels = ['January', 'August', 'December', None]
>>> x = xnd(['January', 'January', None, 'December', 'August', 'December', 'December'], levels=levels)
>>> x.value
['January', 'January', None, 'December', 'August', 'December', 'December']
>>> x.type
ndt("7 * categorical('January', 'August', 'December', NA)")

The above is equivalent to specifying the type directly:

>>> from ndtypes import *
>>> t = ndt("7 * categorical('January', 'August', 'December', NA)")
>>> x = xnd(['January', 'January', None, 'December', 'August', 'December', 'December'], type=t)
>>> x.value
['January', 'January', None, 'December', 'August', 'December', 'December']
>>> x.type
ndt("7 * categorical('January', 'August', 'December', NA)")
Explicit types

While type inference is well-defined, it necessarily makes assumptions about the programmer’s intent.

There are two cases where types should be given:

Different types are intended
>>> xnd([[0,1,2], [3,4,5]], type="2 * 3 * uint8")
xnd([[0, 1, 2], [3, 4, 5]], type='2 * 3 * uint8')

Here, type inference would deduce int64, so uint8 needs to be passed explicitly.

All supported types
Fixed arrays

Fixed arrays are similar to NumPy’s ndarray. One difference is that internally xnd uses steps instead of strides. A step is the number of items (not bytes) by which the linear index moves from one element of a dimension to the next.

This facilitates optional data, whose bitmaps need to be addressed by the linear index. The equation stride = step * itemsize always holds.
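
For example, with a float64 dtype (itemsize 8), a step of 3 corresponds to a stride of 3 * 8 = 24 bytes.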

>>> xnd([[[1,2], [None, 3]], [[4, None], [5, 6]]])
xnd([[[1, 2], [None, 3]], [[4, None], [5, 6]]], type='2 * 2 * 2 * ?int64')

This is a fixed array with optional data.

>>> xnd([(1,2.0,3j), (4,5.0,6j)])
xnd([(1, 2.0, 3j), (4, 5.0, 6j)], type='2 * (int64, float64, complex128)')

An array with tuple elements.

Fortran order

Fortran order is specified by prefixing the dimensions with an exclamation mark:

>>> lst = [[1, 2, 3], [4, 5, 6]]
>>> x = xnd(lst, type='!2 * 3 * uint16')
>>>
>>> x.type.shape
(2, 3)
>>> x.type.strides
(2, 4)

Alternatively, steps can be passed as arguments to the fixed dimension type:

>>> from ndtypes import *
>>> lst = [[1, 2, 3], [4, 5, 6]]
>>> t = ndt("fixed(shape=2, step=1) * fixed(shape=3, step=2) * uint16")
>>> x = xnd(lst, type=t)
>>> x.type.shape
(2, 3)
>>> x.type.strides
(2, 4)
Ragged arrays

Ragged arrays with explicit types are easiest to construct using the dtype argument to the xnd constructor.

>>> lst = [[0], [1, 2], [3, 4, 5]]
>>> xnd(lst, dtype="int32")
xnd([[0], [1, 2], [3, 4, 5]], type='var * var * int32')

Alternatively, offsets can be passed as arguments to the var dimension type:

>>> from ndtypes import ndt
>>> t = ndt("var(offsets=[0,3]) * var(offsets=[0,1,3,6]) * int32")
>>> xnd(lst, type=t)
xnd([[0], [1, 2], [3, 4, 5]], type='var * var * int32')
Tuples

In memory, tuples are the same as C structs.

>>> xnd(("foo", 1.0))
xnd(('foo', 1.0), type='(string, float64)')

Indexing works the same as for arrays:

>>> x = xnd(("foo", 1.0))
>>> x[0]
xnd('foo', type='string')

Nested tuples are more general than ragged arrays: they can a) hold different data types and b) represent unbalanced trees.

They do not support slicing, though, and are probably less efficient.

This is an example of an unbalanced tree that cannot be represented as a ragged array:

>>> unbalanced_tree = (((1.0, 2.0), (3.0)), 4.0, ((5.0, 6.0, 7.0), ()))
>>> x = xnd(unbalanced_tree)
>>> x.value
(((1.0, 2.0), 3.0), 4.0, ((5.0, 6.0, 7.0), ()))
>>> x.type
ndt("(((float64, float64), float64), float64, ((float64, float64, float64), ()))")
>>>
>>> x[0]
xnd(((1.0, 2.0), 3.0), type='((float64, float64), float64)')
>>> x[0][0]
xnd((1.0, 2.0), type='(float64, float64)')

Note that the data in the above tree example is packed into a single contiguous memory block.

Records

In memory, records are C structs. The field names are only stored in the type.

The following examples use Python 3.6, which keeps the dict initialization order.

>>> x = xnd({'a': b'123', 'b': {'x': 1.2, 'y': 100+3j}})
>>> x.value
{'a': b'123', 'b': {'x': 1.2, 'y': (100+3j)}}
>>> x.type
ndt("{a : bytes, b : {x : float64, y : complex128}}")

Indexing works the same as for arrays. Additionally, fields can be indexed by name:

>>> x[0]
xnd(b'123', type='bytes')
>>> x['a']
xnd(b'123', type='bytes')
>>> x['b']
xnd({'x': 1.2, 'y': (100+3j)}, type='{x : float64, y : complex128}')

The nesting depth is arbitrary. In the following example, the data – except for strings, which are pointers – is packed into a single contiguous memory block:

>>> from pprint import pprint
>>> item = {
...   "id": 1001,
...   "name": "cyclotron",
...   "price": 5998321.99,
...   "tags": ["connoisseur", "luxury"],
...   "stock": {
...     "warehouse": 722,
...     "retail": 20
...   }
... }
>>> x = xnd(item)
>>>
>>> pprint(x.value)
{'id': 1001,
 'name': 'cyclotron',
 'price': 5998321.99,
 'stock': {'retail': 20, 'warehouse': 722},
 'tags': ['connoisseur', 'luxury']}
>>>
>>> x.type.pprint()
{
  id : int64,
  name : string,
  price : float64,
  tags : 2 * string,
  stock : {
    warehouse : int64,
    retail : int64
  }
}

Strings can be embedded into the array by specifying the fixed string type. In this case, the memory block is pointer-free.

>>> from ndtypes import ndt
>>>
>>> t = """
...   { id : int64,
...     name : fixed_string(30),
...     price : float64,
...     tags : 2 * fixed_string(30),
...     stock : {warehouse : int64, retail : int64}
...   }
... """
>>>
>>> x = xnd(item, type=t)
>>> x.type.pprint()
{
  id : int64,
  name : fixed_string(30),
  price : float64,
  tags : 2 * fixed_string(30),
  stock : {
    warehouse : int64,
    retail : int64
  }
}
Record of arrays

Often it is more memory efficient to store an array of records as a record of arrays. This example with columnar data is from the Arrow homepage:

>>> data = {'session_id': [1331247700, 1331247702, 1331247709, 1331247799],
...         'timestamp': [1515529735.4895875, 1515529746.2128427, 1515529756.4485607, 1515529766.2181058],
...         'source_ip': ['8.8.8.100', '100.2.0.11', '99.101.22.222', '12.100.111.200']}
>>> x = xnd(data)
>>> x.type
ndt("{session_id : 4 * int64, timestamp : 4 * float64, source_ip : 4 * string}")
References

References are transparent pointers to new memory blocks (meaning a new data pointer, not a whole new xnd buffer).

For example, this is an array of pointers to arrays:

>>> t = ndt("3 * ref(4 * uint64)")
>>> lst = [[0,1,2,3], [4,5,6,7], [8,9,10,11]]
>>> xnd(lst, type=t)
xnd([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], type='3 * ref(4 * uint64)')

The user sees no difference to a regular 3 by 4 array, but internally the outer dimension consists of three pointers to the inner arrays.

For memory blocks generated by xnd itself the feature is not so useful – after all, it is usually better to have a single memory block than one with additional pointers.

However, suppose that in the above columnar data example another application represents the arrays inside the record with pointers. Using the ref type, data structures borrowed from such an application can be properly typed:

>>> t = ndt("{session_id : &4 * int64, timestamp : &4 * float64, source_ip : &4 * string}")
>>> x = xnd(data, type=t)
>>> x.type
ndt("{session_id : ref(4 * int64), timestamp : ref(4 * float64), source_ip : ref(4 * string)}")

The ampersand is the shorthand for “ref”.

Constructors

Constructors are xnd’s way of creating distinct named types. The constructor argument is a regular type.

Constructors open up a new dtype, so named arrays can be the dtype of other arrays. Type inference currently isn’t aware of constructors, so types have to be provided.

>>> t = ndt("3 * SomeMatrix(2 * 2 * float32)")
>>> lst = [[[1,2], [3,4]], [[5,6], [7,8]], [[9,10], [11,12]]]
>>> x = xnd(lst, type=t)
>>> x
xnd([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], [[9.0, 10.0], [11.0, 12.0]]],
    type='3 * SomeMatrix(2 * 2 * float32)')
>>> x[0]
xnd([[1.0, 2.0], [3.0, 4.0]], type='SomeMatrix(2 * 2 * float32)')
Categorical

Categorical types contain values. The data stored in xnd buffers are indices (int64) into the type’s categories.

>>> t = ndt("categorical('a', 'b', 'c', NA)")
>>> data = ['a', 'a', 'b', 'a', 'a', 'a', 'foo', 'c']
>>> x = xnd(data, dtype=t)
>>> x.value
['a', 'a', 'b', 'a', 'a', 'a', None, 'c']
Fixed String

Fixed strings are embedded into arrays. Supported encodings are ‘ascii’, ‘utf8’, ‘utf16’ and ‘utf32’. The string size argument denotes the number of code points rather than bytes.

>>> t = ndt("10 * fixed_string(3, 'utf32')")
>>> x = xnd.empty(t)
>>> x.value
['', '', '', '', '', '', '', '', '', '']
>>> x[3] = "\U000003B1\U000003B2\U000003B3"
>>> x.value
['', '', '', 'αβγ', '', '', '', '', '', '']
Fixed Bytes

Fixed bytes are embedded into arrays.

>>> t = ndt("3 * fixed_bytes(size=3)")
>>> x = xnd.empty(t)
>>> x[2] = b'123'
>>> x.value
[b'\x00\x00\x00', b'\x00\x00\x00', b'123']
>>> x.align
1

Alignment can be requested with the requirement that size is a multiple of alignment:

>>> t = ndt("3 * fixed_bytes(size=32, align=16)")
>>> x = xnd.empty(t)
>>> x.align
16
String

Strings are pointers to NUL-terminated UTF-8 strings.

>>> x = xnd.empty("10 * string")
>>> x.value
['', '', '', '', '', '', '', '', '', '']
>>> x[0] = "abc"
>>> x.value
['abc', '', '', '', '', '', '', '', '', '']
Bytes

Internally, bytes are structs with a size field and a pointer to the data.

>>> xnd([b'123', b'45678'])
xnd([b'123', b'45678'], type='2 * bytes')

The bytes constructor takes an optional align argument that specifies the alignment of the allocated data:

>>> x = xnd([b'abc', b'123'], type="2 * bytes(align=64)")
>>> x.value
[b'abc', b'123']
>>> x.align
8

Note that x.align is the alignment of the array. The embedded pointers to the bytes data are aligned at 64.

Primitive types

As a short example, here is a tuple that contains all primitive types:

>>> s = """
...    (bool,
...     int8, int16, int32, int64,
...     uint8, uint16, uint32, uint64,
...     bfloat16, float16, float32, float64,
...     bcomplex32, complex32, complex64, complex128)
... """
>>> x = xnd.empty(s)
>>> x.value
(False, 0, 0, 0, 0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0j, 0j, 0j, 0j)

Alignment and packing

The xnd memory allocators support explicit alignment. Alignment is specified in the types.

Tuples and records

Tuples and records have the align and pack keywords that have the same purpose as gcc’s aligned and packed struct attributes.

Field alignment

The align keyword can be used to specify an alignment that is greater than the natural alignment of a field:

>>> from xnd import *
>>> s = "(uint8, uint64 |align=32|, uint64)"
>>> x = xnd.empty(s)
>>> x.align
32
>>> x.type.datasize
64
Field packing

The pack keyword can be used to specify an alignment that is smaller than the natural alignment of a field:

>>> s = "(uint8, uint64 |pack=2|, uint64)"
>>> x = xnd.empty(s)
>>> x.align
8
>>> x.type.datasize
24
Struct packing

The pack and align keywords can be applied to the entire struct:

>>> s = "(uint8, uint64, uint64, pack=1)"
>>> x = xnd.empty(s)
>>> x.align
1
>>> x.type.datasize
17

Individual field and struct directives are mutually exclusive:

>>> s = "2 * (uint8 |align=16|, uint64, pack=1)"
>>> x = xnd.empty(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot have 'pack' tuple attribute and field attributes
Array alignment

An array has the same alignment as its elements:

>>> s = "2 * (uint8, uint64, pack=1)"
>>> x = xnd.empty(s)
>>> x.align
1
>>> x.type.datasize
18

Buffer protocol

xnd supports importing PEP-3118 buffers.

From NumPy

Import a simple ndarray:

>>> import numpy as np
>>> from xnd import *
>>> x = np.array([[[0,1,2], [3,4,5]], [[6,7,8], [9,10,11]]])
>>> y = xnd.from_buffer(x)
>>> y.type
ndt("2 * 2 * 3 * int64")
>>> y.value
[[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]

Import an ndarray with a struct dtype:

>>> x = np.array([(1000, 400.25, 'abc'), (-23, -1e10, 'cba')],
...              dtype=[('x', '<i4'), ('y', '>f4'), ('z', 'S3')])
>>> y = xnd.from_buffer(x)
>>> y.type
ndt("2 * {x : int32, y : >float32, z : fixed_bytes(size=3)}")
>>> y.value
[{'x': 1000, 'y': 400.25, 'z': b'abc'}, {'x': -23, 'y': -10000000000.0, 'z': b'cba'}]

Quick Start

Prerequisites

Python2 is not supported. If not already present, install the Python3 development packages:

# Debian, Ubuntu:
sudo apt-get install gcc make
sudo apt-get install python3-dev

# Fedora, RedHat:
sudo yum install gcc make
sudo yum install python3-devel

# openSUSE:
sudo zypper install gcc make
sudo zypper install python3-devel

# BSD:
# You know what to do.

# Mac OS X:
# Install Xcode and Python 3 headers.
Install

If pip is present on the system, installation should be as easy as:

pip install xnd

Otherwise:

tar xvzf xnd-0.2.0b1.tar.gz
cd xnd-0.2.0b1
python3 setup.py install
Windows

Refer to the instructions in the vcbuild directory in the source distribution.

gumath

gumath is a package for extensible dispatch of computational kernels that target xnd containers. Kernels can be added at runtime, which allows the use of JIT compilers.

Libgumath

C library.

libgumath

libgumath is a library for dispatching computational kernels using ndtypes function signatures. Kernels are multimethods and can be JIT-generated and inserted in lookup tables at runtime.

Kernels target XND containers.

libgumath has a small number of generic math library kernels.

Data structures

libgumath is a lightweight library for managing and dispatching computational kernels that target XND containers.

Functions are multimethods in a lookup table. Typically, applications that use libgumath should create a new lookup table for each namespace. For example, Python modules generally should have a module-specific lookup table.

Kernel signatures
typedef int (* gm_xnd_kernel_t)(xnd_t stack[], ndt_context_t *ctx);

The signature of an xnd kernel. stack contains incoming and outgoing arguments. In case of an error, kernels are expected to set a context error message and return -1.

In case of success, the return value is 0.
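
For illustration only, here is a sketch of what an xnd kernel might look like for a hypothetical mul10 function with the signature "... * float64 -> ... * float64". The assumption that dispatch presents stack[0] (input) and stack[1] (output) as float64 element views is not part of the documented API:

/* Sketch: hypothetical kernel for "mul10 : ... * float64 -> ... * float64".
   Assumes stack[0] is the input element view and stack[1] the output. */
static int
mul10(xnd_t stack[], ndt_context_t *ctx)
{
    const double in = *(const double *)stack[0].ptr;

    *(double *)stack[1].ptr = 10.0 * in;

    (void)ctx;   /* this kernel cannot fail */
    return 0;
}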

typedef int (* gm_strided_kernel_t)(char **args, intptr_t *dimensions, intptr_t *steps, void *data);

The signature of a NumPy compatible kernel. These signatures are for applications that want to use existing NumPy compatible kernels on XND containers.

XND containers are automatically converted to a temporary ndarray before kernel application.
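
A corresponding strided sketch for the same hypothetical mul10 function. The argument convention (args[0] input, args[1] output, dimensions[0] the loop length, steps[] the byte strides) follows the usual NumPy inner-loop convention and is an assumption here:

/* Sketch: hypothetical NumPy-compatible strided loop for "mul10". */
static int
mul10_strided(char **args, intptr_t *dimensions, intptr_t *steps, void *data)
{
    char *in = args[0], *out = args[1];
    intptr_t n = dimensions[0];

    for (intptr_t i = 0; i < n; i++) {
        *(double *)out = 10.0 * *(const double *)in;
        in += steps[0];
        out += steps[1];
    }

    (void)data;
    return 0;
}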

Kernel set
/* Collection of specialized kernels for a single function signature. */
typedef struct {
   ndt_t *sig;
   const ndt_constraint_t *constraint;

   /* Xnd signatures */
   gm_xnd_kernel_t C;       /* dispatch ensures c-contiguous */
   gm_xnd_kernel_t Fortran; /* dispatch ensures f-contiguous */
   gm_xnd_kernel_t Xnd;     /* selected if non-contiguous or both C and Fortran are NULL */

   /* NumPy signature */
   gm_strided_kernel_t Strided;
} gm_kernel_set_t;

A kernel set contains the function signature, an optional constraint function, and up to four specialized kernels, each of which may be NULL.

The dispatch calls the kernels in the following order of preference:

If the inner dimensions of the incoming arguments are C-contiguous, the C kernel is called first. In case of Fortran inner dimensions, Fortran is called first.

If an Xnd kernel is present, it is called next, then the Strided kernel.

Kernel set initialization
typedef struct {
   const char *name;
   const char *sig;
   const ndt_constraint_t *constraint;

   gm_xnd_kernel_t C;
   gm_xnd_kernel_t Fortran;
   gm_xnd_kernel_t Xnd;
   gm_strided_kernel_t Strided;
} gm_kernel_init_t;

int gm_add_kernel(gm_tbl_t *tbl, const gm_kernel_init_t *kernel, ndt_context_t *ctx);

The gm_kernel_init_t is used for initializing a kernel set. Usually, a C translation unit contains an array of hundreds of gm_kernel_init_t structs together with a function that initializes a specific lookup table.
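
A sketch of how the hypothetical mul10 kernels from above might be registered; unused kernel kinds are left NULL, and gm_add_kernel is assumed to return a negative value on error:

/* Sketch: register the hypothetical "mul10" kernels from above. */
static const gm_kernel_init_t kernels[] = {
    { .name = "mul10",
      .sig = "... * float64 -> ... * float64",
      .Xnd = mul10,
      .Strided = mul10_strided },
};

static int
init_kernels(gm_tbl_t *tbl, ndt_context_t *ctx)
{
    for (size_t i = 0; i < sizeof(kernels) / sizeof(kernels[0]); i++) {
        if (gm_add_kernel(tbl, &kernels[i], ctx) < 0) {
            return -1;
        }
    }
    return 0;
}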

Multimethod struct
/* Multimethod with associated kernels */
typedef struct gm_func gm_func_t;
typedef const gm_kernel_set_t *(*gm_typecheck_t)(ndt_apply_spec_t *spec,
                  const gm_func_t *f, const ndt_t *in[], int nin, ndt_context_t *ctx);
struct gm_func {
   char *name;
   gm_typecheck_t typecheck; /* Experimental optimized type-checking, may be NULL. */
   int nkernels;
   gm_kernel_set_t kernels[GM_MAX_KERNELS];
};

This is the multimethod struct for a given function name. Each multimethod has nkernels associated kernel sets with unique type signatures.

If typecheck is NULL, the generic libndtypes multimethod dispatch is used to locate the kernel. This is an O(N) operation, whose search time is negligible for large array operations.

The typecheck field can be set to an optimized lookup function that has internal knowledge of kernel set locations. The only restriction on the function is that it must behave exactly like the generic libndtypes typecheck.

Functions

Create a new multimethod
gm_func_t *gm_add_func(gm_tbl_t *tbl, const char *name, ndt_context_t *ctx);

Add a new multimethod with no associated kernels to a lookup table. If name is already present in tbl or if name contains invalid characters, return NULL and set an error.

On success, return the pointer to the new multimethod. The multimethod belongs to tbl.

Add a kernel to a multimethod
int gm_add_kernel(gm_tbl_t *tbl, const gm_kernel_init_t *kernel, ndt_context_t *ctx);

Add a kernel set to a multimethod. For convenience the multimethod is created and inserted into the table if not already present.

int gm_add_kernel_typecheck(gm_tbl_t *tbl, const gm_kernel_init_t *kernel, ndt_context_t *ctx, gm_typecheck_t f);

Add a kernel set to a multimethod, using a custom typecheck function. For convenience, the multimethod is created and inserted into the table if not already present.

Select a kernel based on the input types
gm_kernel_t gm_select(ndt_apply_spec_t *spec, const gm_tbl_t *tbl, const char *name,
                      const ndt_t *in_types[], int nin, const xnd_t args[],
                      ndt_context_t *ctx);

The function looks up a multimethod by name, using table tbl. If the multimethod has an optimized custom typecheck function, it is called on the input types for kernel selection.

Otherwise, the generic ndt_typecheck is called on each kernel associated with the multimethod in order to find a match for the input arguments.

Apply a kernel to input
int gm_apply(const gm_kernel_t *kernel, xnd_t stack[], int outer_dims, ndt_context_t *ctx);

Apply a kernel to input arguments. stack is expected to contain the input arguments followed by the output arguments. outer_dims is the number of dimensions to traverse before applying the kernel to the inner dimensions.

Builtin kernels

libgumath has a number of builtin kernels that use optimized type checking and kernel lookup.

Unary kernels
int gm_init_unary_kernels(gm_tbl_t *tbl, ndt_context_t *ctx);

Add all builtin unary kernels to tbl. The kernels include fabs, exp, exp2, expm1, log, log2, log10, log1p, logb, sqrt, cbrt, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, asinh, acosh, atanh, erf, erfc, lgamma, tgamma, ceil, floor, trunc, round, nearbyint.

Binary kernels
int gm_init_binary_kernels(gm_tbl_t *tbl, ndt_context_t *ctx);

Add all binary kernels to tbl. The kernels currently only include add, subtract, multiply, divide.

Gumath

Python bindings for libgumath.

gumath

The gumath Python module provides the infrastructure for managing and dispatching libgumath kernels. Kernels target xnd containers from the xnd Python module.

gumath supports modular namespaces. Typically, a namespace is implemented as one Python module that uses gumath for calling kernels.

The xndtools project automates generating kernels and creating namespace modules.

Builtin functions

The gumath.functions module wraps the builtin libgumath kernels and serves as an example of a modular namespace.

All builtin functions
>>> from gumath import functions as fn
>>> dir(fn)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'add', 'asin', 'asinh', 'atan', 'atanh', 'bitwise_and', 'bitwise_or', 'bitwise_xor', 'cbrt', 'ceil', 'copy', 'cos', 'cosh', 'divide', 'erf', 'erfc', 'exp', 'exp2', 'expm1', 'fabs', 'floor', 'greater', 'greater_equal', 'invert', 'less', 'less_equal', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'logb', 'multiply', 'nearbyint', 'negative', 'round', 'sin', 'sinh', 'sqrt', 'subtract', 'tan', 'tanh', 'tgamma', 'trunc']
Unary functions
>>> from xnd import xnd
>>> x = xnd([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
>>> fn.log(x)
xnd([[0.0, 0.6931471805599453, 1.0986122886681098], [1.3862943611198906, 1.6094379124341003, 1.791759469228055]],
    type='2 * 3 * float64')

On an array with a float64 dtype, log works as expected.

>>> x = xnd([[1, 2, 3], [4, 5, 6]])
>>> x.type
ndt("2 * 3 * int64")
>>> fn.log(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid dtype

This function call would require an implicit inexact conversion from int64 to float64. All builtin libgumath kernels only allow exact conversions, so the example fails.

>>> x = xnd([[1, 2, 3], [4, 5, 6]], dtype="int32")
>>> fn.log(x)
xnd([[0.0, 0.6931471805599453, 1.0986122886681098], [1.3862943611198906, 1.6094379124341003, 1.791759469228055]],
    type='2 * 3 * float64')

int32 to float64 conversions are exact, so the call succeeds.

XndTools

XndTools is a Python package containing development tools for the XND project.

xndtools

xndtools is a Python package containing development tools for the XND project.

kernel_generator

Python package for generating gumath kernels from C function prototypes.

Kernel generator

kernel_generator is a Python package for generating gumath kernels from C function prototypes.

Notes

Developer notes

Notes

Developer notes on the xnd project.

Representation of XND objects

The XND project aims at providing a set of C libraries for storing data (libxnd), for describing the data (libndtypes), and for manipulating the data (libgumath). Each library is developed separately; however, libxnd requires libndtypes, and libgumath depends on both libxnd and libndtypes. This note discusses libxnd and libndtypes, in particular how xnd objects are represented in memory and how the various members of the data structures relate to each other.

Embedding data in memory

In libxnd the data is stored using the following data structures:

xnd_master_t
  flags : uint32_t
  master : xnd_t

xnd_t
  bitmap : xnd_bitmap_t
  index : int64_t
  type : ndt_t*
  ptr : char*

xnd_bitmap_t
  data : uint8_t*
  size : int64_t
  next : xnd_bitmap_t*

ndt_bytes_t
  size : int64_t
  data : uint8_t*

where * represents pointer values of the corresponding data types. The corresponding C typedef definitions are in libxnd/xnd.h.

A short description of the data structure members is given in the following table:

Member Description
xnd_master_t::flags Contains ownership information about the type, data, and embedded pointers (see the XND_OWN_* flags). Needed for memory management.
xnd_master_t::master An xnd view of data.
xnd_t::bitmap Implements optional value support.
xnd_t::index Linear index of data items. Used only when type->tag is FixedDim|VarDim (data is an array of items).
xnd_t::type Points to a type value (ndt_t is provided by libndtypes, see below).
xnd_t::ptr Points to the memory where the data is embedded. Data can be stored as bytes (ptr points to an ndt_bytes_t value) or be referenced by a pointer (ptr points to a pointer to the data value). In the case of arrays and type->ndim==0, ptr points to the data item given by index.
xnd_bitmap_t::data Points to the bitmap data. Each data item (in ptr) has a corresponding bit in data. A set bit (1) means that a value has been provided; a cleared bit (0) means the item is missing (NA).
xnd_bitmap_t::size Number of subtree bitmaps. This is nonzero when type->tag is Tuple|Record|Ref|Constr|Nominal.
xnd_bitmap_t::next Refers to bitmaps of subtrees. Used when data has tree-like structure.
ndt_bytes_t::size Number of bytes needed to store data.
ndt_bytes_t::data Points to computer memory where data is stored as bytes.

Note that the linear index varies during iteration.

Typing the data

Typing the data means attaching a meaning to a chunk of data stored in computer memory. libndtypes implements data types using the ndtypes type language (Blaze datashape), which combines shape and element type information in one unit. Note that ndtypes is similar to the Python-based implementation of Blaze datashape, but there are several differences that make the two distinct.

Ndtypes uses the following data structure:

ndt_t
  tag : enum ndt {Module, Function, AnyKind, ..., Typevar}
  access : enum ndt_access {Abstract, Concrete}
  flags : uint32_t
  ndim : int
  datasize : int64_t
  align : uint16_t

  // Abstract part
  union
    Module
      name : char*
      type : ndt_t*
    Function
      nin : int64_t
      nout : int64_t
      nargs : int64_t
      types : ndt_t**
    FixedDim | SymbolicDim | EllipsisDim | Constr
      shape : int64_t
      type : ndt_t*
    VarDim | Ref
      type : ndt_t*
    Tuple
      flag : enum ndt_variadic {Nonvariadic, Variadic}
      shape : int64_t
      types : ndt_t**
    Record
      flag : enum ndt_variadic
      shape : int64_t
      names : char**
      types : ndt_t**

  Concrete
    union
      FixedDim
        itemsize : int64_t
        step : int64_t
      VarDim
        flag : enum ndt_offsets {InternalOffsets, ExternalOffsets}
        itemsize : int64_t
        noffsets : int32_t
        offsets : int32_t*
        nslices : int
        slices : ndt_slice_t*;
      Tuple | Record
        offset : int64_t*
        align : uint16_t*
        pad : uint16_t*
      Nominal
        name : char*
        type : ndt_t*
        meth : ndt_methods_t*
      Categorical
        ntypes : int64_t
        types : ndt_value_t*
      FixedString
        size : int64_t
        encoding : enum ndt_encoding {Ascii, Utf8, Utf16, Utf32, Ucs2}
      FixedBytes
        size : int64_t
        align : uint16_t
      Bytes
        align : uint16_t
      Char
        encoding : enum ndt_encoding
      Typevar
        name : char*

While the definition of ndt_t looks long, the union parts share the same memory and their interpretation depends on the type kind (specified by the ndt_t::tag member).

Note that ndt_t holds both the shape and item type information of the xnd view object.

An ndt_t can be used in two modes, abstract or concrete, specified by the ndt_t::access member. A type is in concrete mode when it contains enough information to compute the contiguous memory size (datasize) that spans the first and the last item of the data. Otherwise the type is in abstract mode.

Abstract types can be used only as patterns. Concrete types can be used as patterns as well as for describing the structure of the data stored in an xnd view object (an xnd_t instance).

A summary of the ndt_t members follows:

Member Description
ndt_t::tag Specifies the type kind.
ndt_t::access Specifies the type mode, abstract or concrete.
ndt_t::flags Contains various information about the data type: endianness, optional, subtree, ellipses.
ndt_t::ndim Specifies the dimension index. A type with ndim==0 is interpreted as the type of a scalar value.
ndt_t::datasize Size of the data in bytes [undefined in abstract mode].
ndt_t::align Alignment of the data in bytes [undefined in abstract mode].
ndt_t::Module Abstract part of the Module type kind.
...
ndt_t::Concrete::FixedDim Concrete part of the FixedDim type kind.
...

In the following, each type kind is described in a separate subsection.

Arrays

An array is a data structure that contains data items of the same data type. When the data items use the same amount of memory, the representation of an array is particularly simple: one only needs to know the location of the first data item in memory and the byte-size of the item type in order to access any data item in the array. However, there exist data types such as strings or ragged arrays where the item byte-size depends on the item content, and a more general representation of the array structure is needed.

In libndtypes several kinds of array representations are supported.

In abstract mode one can represent arrays of different item types (named type variables), of different dimensions (ellipses), and of different shapes (symbolic dimensions) using a single ndtypes instance. The purpose of such instances is to define array patterns that are used in libgumath. The libgumath library provides computational kernels that implement algorithms for manipulating data with a specific structure. A kernel can be called only on data whose type matches the signature of that kernel.

In concrete mode the main purpose of an ndtypes instance is to provide the information needed to access the data items in an array (together with the xnd_t::index member).

Abstract array ndtypes

A type is in abstract mode when ndt_t::access==Abstract.

Member Description
ndt_t::tag==FixedDim The type represents an array dimension with a fixed shape value.
ndt_t::FixedDim::shape Specifies the shape value of the array dimension.
ndt_t::FixedDim::type Points to the data item type specification.
ndt_t::datasize undefined
ndt_t::align undefined
ndt_t::Concrete::... undefined

Undefined means that the value is set to 0 or NULL.

Note that a FixedDim type is in abstract mode when its item type is in abstract mode.

Member Description
ndt_t::tag==VarDim The type represents an array dimension whose shape may vary.
ndt_t::VarDim::type Points to the data item type specification.
ndt_t::datasize undefined
ndt_t::align undefined
ndt_t::Concrete::... undefined
Member Description
ndt_t::tag==SymbolicDim The type represents an array dimension whose shape is symbolic.
ndt_t::SymbolicDim::name Contains the symbol name.
ndt_t::SymbolicDim::type Points to the data item type specification.
ndt_t::datasize undefined
ndt_t::align undefined
ndt_t::Concrete::... undefined

The shape symbol must start with a capital letter.

Member Description
ndt_t::tag==EllipsisDim The type represents 0 or more dimensions.
ndt_t::EllipsisDim::name Contains the ellipsis name.
ndt_t::EllipsisDim::type Points to the data item type specification.
ndt_t::datasize undefined
ndt_t::align undefined
ndt_t::Concrete::... undefined

A named ellipsis must start with a capital letter or be equal to 'var'. The ellipsis var... is special: it is used only for ragged arrays (when ndt_t::tag==VarDim).

Concrete array ndtypes

A type is in concrete mode when ndt_t::access==Concrete.

In the case of fixed shape arrays we have:

Member Description
ndt_t::tag==FixedDim The type represents an array dimension with a fixed shape value.
ndt_t::FixedDim::shape Specifies the shape value of the array dimension.
ndt_t::FixedDim::type Points to the data item type specification.
ndt_t::datasize Specifies the byte-size of the array data.
ndt_t::align Alignment of a data item.
ndt_t::Concrete::FixedDim::itemsize Specifies the byte-size of an array item.
ndt_t::Concrete::FixedDim::step Specifies the step (in items) to the next item in the array; the byte offset between consecutive items is step * itemsize. E.g. in the case of a slice view the byte offset is an integer multiple of itemsize.

Note that datasize is defined as the byte-size of the memory occupied between the first and the last array item (including items that are discarded due to slicing with step!=1). So datasize does not correspond to the size of the memory needed to hold the data defined by the xnd view; it only defines an upper bound.

In the case of ragged arrays we have:

Member Description
ndt_t::tag==VarDim The type represents an array dimension whose shape may vary.
ndt_t::VarDim::type Points to the data item type specification.
ndt_t::datasize Specifies the byte-size of the array data.
ndt_t::align Alignment of a data item.
ndt_t::Concrete::VarDim::flag Specifies the ownership of the offsets data.
ndt_t::Concrete::VarDim::itemsize Specifies the byte-size of an array item.
ndt_t::Concrete::VarDim::noffsets Specifies the number of entries in the offsets member.
ndt_t::Concrete::VarDim::offsets Points to the offsets array.
ndt_t::Concrete::VarDim::nslices Specifies the number of entries in the slices member.
ndt_t::Concrete::VarDim::slices Points to the array of slice records (ndt_slice_t).

Tutorials

Tutorials on using XND tools.

Generating new kernels for gumath

Operations on XND containers live in the gumath library. It already provides binary operations like add, subtract, multiply and divide, and unary operations like abs, exponential, logarithm, power, trigonometric functions, etc. They can operate on various data types, e.g. int32 or float64. But you might want to create your own kernel, either from code that you wrote or from an existing library. This tutorial will show how to do that using the kernel generator from XND Tools.

Kernel generator

XND Tools provides development tools for XND. Among them, xndtools.kernel_generator facilitates the creation of new kernels by semi-automatically wrapping e.g. C code. Fortran will be supported in the future, as there are tons of high performance libraries out there.

Wrapping C code

Let’s say we want to make a kernel out of the following C function, which just squares a number:

// file: square.c

#include "square.h"

double square(double a)
{
    return a * a;
}

This is the implementation of the function that we want to turn into a kernel, but the kernel generator is only concerned about its prototype, which looks like this:

// file: square.h

extern double square(double);

Note the extern keyword: because this header file will be used by the kernel generator to build the kernel, the square function will be assumed to be available somewhere else. We will then be able to compile the kernel and the square.c file independently, and link them later. This is not so important in our simple example, but it could be if we were wrapping an existing library like LAPACK, which has its own build process. The kernel generator doesn’t need to know how to build the library, all it needs is a header file with the function prototypes.

Now run the following command:

$ xnd_tools config square.h

This creates an initial kernel configuration file square-kernels.cfg, which looks like this:

[MODULE square]
typemaps =
      double: float64
includes =
      square.h
include_dirs =

libraries =

library_dirs =

header_code =
kinds = Xnd
ellipses = ..., var...

[KERNEL square]
skip = # REMOVE THIS LINE WHEN READY
prototypes =
      double square(double   a);
description =
dimension =
input_arguments = a
inplace_arguments =
inout_arguments =
output_arguments =
hide_arguments =

For this simple kernel, we don’t actually have to change anything, so we’ll just remove the skip line as indicated, and save the file.

The next step is about creating the interface to gumath, and registering our square function as a kernel. The following command will create a square-kernels.c file:

$ xnd_tools kernel square-kernels.cfg

For now we are still in the C world, so we also need to expose our kernel to Python. This is done by creating an extension module. Fortunately, XND tools does that for us as well. The following command will create the square-python.c file. Note that it also creates the square-kernels.c file if it does not already exist, so the previous command is not necessary here.

$ xnd_tools module square-kernels.cfg

Assuming the variable $SITE_PACKAGES contains the path to your Python site-packages directory, where xnd, ndtypes, gumath and xndtools are installed (given by python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"), you can compile the square function, its kernel, and create a static library with the following commands:

$ gcc -fPIC                                   \
  -c square.c                                 \
  -c square-kernels.c -fPIC                   \
  -I$SITE_PACKAGES/ndtypes                    \
  -I$SITE_PACKAGES/xnd                        \
  -I$SITE_PACKAGES/gumath                     \
  -I$SITE_PACKAGES/xndtools/kernel_generator
$ ar rcs libsquare-kernels.a square-kernels.o square.o

Then building a C extension for CPython can be done using distutils. It just needs a setup.py script, which for our simple case looks like this:

# file: setup.py

from distutils.core import setup, Extension
from distutils.sysconfig import get_python_lib

site_packages = get_python_lib()
libs = ['ndtypes','gumath', 'xnd']
lib_dirs = [f'{site_packages}/{lib}' for lib in libs]

module1 = Extension('square',
                    include_dirs = lib_dirs,
                    libraries = ['square-kernels'] + libs,
                    library_dirs = ['.'] + lib_dirs,
                    sources = ['square-python.c'])

setup (name = 'square',
       version = '1.0',
       description = 'This is a gumath kernel extension that squares an XND container',
       ext_modules = [module1])

Finally, we can build and install our extension with the following command:

$ python setup.py install

If everything went fine, we can now test it in the Python console:

>>> from xnd import xnd
>>> from square import square
>>> a = xnd([1., 2., 3.])
>>> a
xnd([1.0, 2.0, 3.0], type='3 * float64')
>>> square(a)
xnd([1.0, 4.0, 9.0], type='3 * float64')
Running kernels on the GPU

We will see how we can run kernels on the GPU. The following is a typical CUDA code which adds the elements of two arrays of a given size:

// file: add_gpu.cu

#include "gpu.h"

__global__
void add(int n, float* x, float* y, float* r)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        r[i] = x[i] + y[i];
}

void add_gpu(int n, float* x, float* y, float* r)
{
    int blockSize = 256;
    int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y, r);
    cudaDeviceSynchronize();
}

The add function is called a CUDA kernel (not to be confused with the gumath kernels!). This is what will actually run on the GPU. A GPU is faster than a CPU because it can massively parallelize computations, and this is why we have these index and stride variables: the kernel will be applied on different parts of the data at the same time.

Our gumath kernel however will use the add_gpu function, which internally calls add with a special CUDA syntax and some extra parameters (basically specifying how much it will be parallelized). The cudaDeviceSynchronize() function call blocks the CPU execution until all GPU computations are done.

The GPU has its own memory, which is different from the CPU memory. When we want to do a computation on the GPU, we first have to copy the data from the CPU side to the GPU side, and when we want to retrieve the results from the GPU, we have to copy its data back to the CPU. This can be taken care of by the so called “unified memory”, which provides a single memory space accessible by the GPU and the CPU. The following file contains functions to allocate memory in the unified memory and to delete it, here again through special CUDA functions:

// file: gpu.cu

#include "gpu.h"

float* get_array(int n)
{
    float* x;
    cudaMallocManaged(&x, n * sizeof(float));
    return x;
}

void del_array(float* x)
{
    cudaFree(x);
}

Now let’s see what the gpu.h file looks like:

// file: gpu.h

extern "C" void add_gpu(int n, float* x, float* y, float *r);
extern "C" float* get_array(int n);
extern "C" void del_array(float* x);

It consists of the prototypes of the add_gpu function for which we want to make a kernel, and the get_array and del_array functions which we will use to manage the memory for our data. Note the extern "C" declaration: because nvcc (the CUDA compiler) is a C++ compiler, we need to expose them as C functions to Python.

Since gpu.cu only manages the GPU memory, and is independent of the gumath kernel generation, we will simply access its functions through a shared library. This is how we compile it:

$ nvcc -o libgpu.so --compiler-options "-fPIC" --shared gpu.cu

This gives us a libgpu.so library that we can interface with in Python using ctypes. The following code wraps the C functions to Python functions:

# file: gpu.py

import ctypes
import numpy as np
from xnd import xnd

gpu = ctypes.CDLL('./libgpu.so')
gpu.get_array.restype = ctypes.POINTER(ctypes.c_float)
gpu.del_array.argtypes = [ctypes.POINTER(ctypes.c_float), ]

def xnd_gpu(size):
    addr = gpu.get_array(size)
    a = np.ctypeslib.as_array(addr, shape=(size,))
    x = xnd.from_buffer(a)
    return x, addr

def del_gpu(addr):
    gpu.del_array(addr)

xnd_gpu returns an XND container (and its data pointer) whose data live in the unified memory, and del_gpu frees the memory referenced by a pointer.

Now we need to generate the gumath kernel for our add_gpu function. We save its prototype in the following file:

// file: add_gpu.h

extern void add_gpu(int n, float* x, float* y, float *r);

The corresponding configuration file looks like this:

# file: add_gpu-kernels.cfg

[MODULE add_gpu]
typemaps =
     float: float32
     int: int32
includes =
     add_gpu.h
include_dirs =
sources =
     add_gpu.c

libraries =

library_dirs =

header_code =
kinds = C
ellipses = none

[KERNEL add_gpu]
prototypes =
     void add_gpu(int   n, float *  x, float *  y, float *  r);
description =
dimension = x(n), y(n), r(n)
input_arguments = x, y
inplace_arguments = r
hide_arguments = n = len(x)

We can now generate the kernel:

$ xnd_tools kernel add_gpu-kernels.cfg
$ xnd_tools module add_gpu-kernels.cfg

And create a static library:

$ nvcc --compiler-options '-fPIC' -c add_gpu.cu
$ gcc -fPIC                                               \
  -c add_gpu-kernels.c                                    \
  -c $SITE_PACKAGES/xndtools/kernel_generator/xndtools.c  \
  -I$SITE_PACKAGES/xndtools/kernel_generator              \
  -I$SITE_PACKAGES/xnd                                    \
  -I$SITE_PACKAGES/ndtypes                                \
  -I$SITE_PACKAGES/gumath
$ ar rcs libadd_gpu-kernels.a add_gpu.o add_gpu-kernels.o xndtools.o

Finally, launch python setup.py install with this setup.py file:

# file: setup.py

from distutils.core import setup, Extension
from distutils.sysconfig import get_python_lib

site_packages = get_python_lib()
lib_dirs = [f'{site_packages}/{i}' for i in ['ndtypes', 'gumath', 'xnd']]

module1 = Extension('add_gpu',
                    include_dirs = lib_dirs,
                    libraries = ['add_gpu-kernels', 'ndtypes','gumath', 'xnd', 'cudart', 'stdc++'],
                    library_dirs = ['.', '/usr/local/cuda-9.2/lib64'] + lib_dirs,
                    sources = ['add_gpu-python.c'])

setup (name = 'add_gpu',
       version = '1.0',
       description = 'This is a gumath kernel extension that adds two XND containers on the GPU',
       ext_modules = [module1])

If everything went fine, you should be able to run the kernel on the GPU:

>>> from gpu import xnd_gpu, del_gpu
>>> from add_gpu import add_gpu
>>> size = 1 << 20
>>> x0, a0 = xnd_gpu(size)
>>> x1, a1 = xnd_gpu(size)
>>> x2, a2 = xnd_gpu(size)
>>> for i in range(size):
...     x0[i] = i
...     x1[i] = 1
>>> x0
xnd([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, ...], type='1048576 * float32')
>>> x1
xnd([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...], type='1048576 * float32')
>>> add_gpu(x0, x1, x2)
>>> x2
xnd([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, ...], type='1048576 * float32')
>>> del_gpu(a0)
>>> del_gpu(a1)
>>> del_gpu(a2)

Releases

Releases

v0.2.0b2 (February 5th 2018)

Second release (beta2). This release addresses several build and packaging issues:

  • Avoid copying libraries into the Python package if system libraries are used.
  • The build and install partially relied on the dev setup (ndtypes checked out in the xnd directory). This dependency has been removed.
  • The conda build now supports separate library and Python module installs.
  • Configure now has a --without-docs option for skipping the doc install.
  • The generated parsers are now checked into the source tree to avoid bison/flex dependencies and unnecessary rebuilds after cloning.
  • Non-API global symbols are hidden on Linux (as long as the compiler supports gcc pragmas).

v0.2.0b1 (January 20th 2018)

First release (beta1).