Representation of XND objects¶
The XND project aims at providing a set of C libraries for storing data (libxnd), for describing the data (libndtypes), and for manipulating the data (libgumath). Each library is developed separately, however, libxnd requires libndtypes, and libgumath depends on both libxnd and libndtypes. In this note libxnd and libndtypes are discussed, in particular, how xnd objects are represented in memory and what are the relations between various members of data structures.
Embedding data in memory¶
In libxnd the data is stored using the following data strucure:
xnd_master_t
flags : uint32_t
master : xnd_t
xnd_t
bitmap : xnd_bitmap_t
index : int64_t
type : ndt_t*
ptr : char*
xnd_bitmap_t
data : uint8_t*
size : int64_t
next : xnd_bitmap_t*
ndt_bytes_t
size : int64_t
data : uint8_t*
where *
represents pointer values of the corresponding data types.
The corresponding C typedef definitions are in libxnd/xnd.h.
A short decription of data type members is given in the following table:
Member | Description |
---|---|
xnd_master_t::flags |
Contains ownership information about the data , type , and ptr members. Needed for memory management. |
xnd_master_t::master |
An xnd view of data. |
xnd_t::bitmap |
Implements optional value support. |
xnd_t::index |
Linear index of data items. Used only when type->tag is FixedDim|VarDim (data is an array of items). |
xnd_t::type |
Points to a type value (ndt_t is provided by libndtypes, see below). |
xnd_t::ptr |
Points to a computer memory where data is embedded. Data can be stored as bytes (ptr points to ndt_bytes_t value) or referred by its pointer value (ptr points to data value pointer). In the case of arrays and type->ndim==0 , ptr points to the data item given by index . |
xnd_bitmap_t::data |
Points to a bitmap data. Each data item (in ptr ) has the corresponding bit value in data . Bit value 0 means that data item value has been provided. |
xnd_bitmap_t::size |
Number of subtree bitmaps. This is nonzero when type->tag is Tuple|Record|Ref|Constr|Nominal . |
xnd_bitmap_t::next |
Refers to bitmaps of subtrees. Used when data has tree-like structure. |
ndt_bytes_t::size |
Number of bytes needed to store data. |
ndt_bytes_t::data |
Points to computer memory where data is stored as bytes. |
Note that the linear index is varied during iterations.
Typing the data¶
Typing the data means attaching a meaning to a junk of data stored in computer memory. The libndtypes implements data types using ndtypes type language ( Blaze datashape) that combines the shape and element type information in one unit. Note that ndtypes is similar to the Python based implementation of Blaze datashape but there are several differences to make these distinct.
Ndtypes uses the following data structure:
ndt_t
tag : enum ndt {Module, Function, AnyKind, ..., Typevar}
access : enum ndt_access {Abstract, Concrete}
flags : uint32_t
ndim : int
datasize : int64_t
align : uint16_t
// Abstract part
union
Module
name : char*
type : ndt_t*
Function
nin : int64_t
nout : int64_t
nargs : int64_t
types : ndt_t**
FixedDim | SymbolicDim | EllipsisDim | Constr
shape : int64_t
type : ndt_t*
VarDim | Ref
type : ndt_t*
Tuple
flag : enum ndt_variadic {Nonvariadic, Variadic}
shape : int64_t
types : ndt_t**
Record
flag : enum ndt_variadic
shape : int64_t
names : char**
types : ndt_t**
Concrete
union
FixedDim
itemsize : int64_t
step : int64_t
VarDim
flag : enum ndt_offsets {InternalOffsets, ExternalOffsets}
itemsize : int64_t
noffsets : int32_t
offsets : int32_t*
nslices : int
slices : ndt_slice_t*;
Tuple | Record
offset : int64_t*
align : uint16_t*
pad : uint16_t*
Nominal
name : char*
type : ndt_t*
meth : ndt_methods_t*
Categorical
ntypes : int64_t
types : ndt_value_t*
FixedString
size : int64_t
encoding : enum ndt_encoding {Ascii, Utf8, Utf16, Utf32, Ucs2}
FixedBytes
size : int64_t
align : uint16_t
Bytes
align : uint16_t
Char
encoding : enum ndt_encoding
Typevar
name : char*
While the definition of ndt_t
looks long, the union parts share
the same memory and the interpretation of this depends on the
ndtypes kind (specified by ndt_t::tag
member).
Note that ndt_t
holds both the shape and item type information of
the xnd view object.
The ndtypes implementation ndt_t
can be used in two modes,
abstract or concrete, specified by ndt_t::access
member. The
ndtypes is in concrete mode when it contains enough information
needed to compute what is the contiguous memory size (datasize
)
that fits the first and last item of the data. Otherwise the
ndtypes is in abstract mode.
The abstract ndtypes can be used only as patterns. The concrete
ndtypes can be used as patterns as well as for describing the
structure of a data stored in a xnd view object (xnd_t
instance).
Here follows a summary of data type members:
Member | Description |
---|---|
xnd_t::tag |
Specifies ndtypes kind. |
xnd_t::access |
Specifies ndtypes mode, abstract or concrete. |
xnd_t::flags |
Contains various information about the data type: endianess, optional, subtree, ellipses. |
xnd_t::ndim |
Specifies dimension index. Ndtypes with ndim==0 is interpreted as the ndtypes of a scalar value. |
xnd_t::datasize |
Size of data item in bytes [undefined in abstract mode] |
xnd_t::align |
Alignemnt of data in bytes [undefined in abstract mode] |
xnd_t::Module |
Abstract part of Module type kind. |
... |
… |
xnd_t::Concrete::FixedDim |
Concrete part of FixedDim type kind. |
... |
… |
In the following each ndtypes kind is described in separate subsections.
Arrays¶
Array is a data structure that contains data items of the same data type. When data items use the same amount of memory, representation of an array is particularly simple: one only needs to know the location of the first data item in memory and the byte-size of data item type in order to have access to any data item in the array. However, there exists data types such as strings or ragged arrayswhere the data item byte-size depends on the data item content and a more general representation of array structure is needed.
In libndtypes several kinds of array representations are supported.
In abstract mode one can represent arrays of different item types (named type variable), of different dimensions (ellipses), and of different shapes (symbolic dimensions) using a single ndtypes instance. The purpose of such ndtypes instances is to define patterns of arrays that is used in libgumath. The libgumath library provides computational kernels that implement algorithms to manipulate data with specific structure. The kernels can be called only on data that ndtypes matches the signature of a particular kernel.
In concrete mode the main purpose of a ndtypes instance is to provide
information how to access the data items in an array (using also the member xnd_t::index
).
Abstract array ndtypes¶
Ndtypes is in abstract mode when ndt_t::access==Abstract
.
Member | Description |
---|---|
ndt_t::tag==FixedDim |
Ndtypes represents an array dimension with fixed shape value |
ndt_t::FixedDim::shape |
Specifies the shape value of the array dimension |
ndt_t::FixedDim::type |
Points to data item type specification. |
xnd_t::datasize |
undefined |
xnd_t::align |
undefined |
xnd_t::Concrete::... |
undefined |
Undefined means that the value is set to 0
or NULL
.
Note that FixedDim
ndtypes is in abstract mode when type
ndtypes is in abstract mode.
Member | Description |
---|---|
ndt_t::tag==VarDim |
Ndtypes represents an array dimension that shape may vary |
ndt_t::VarDim::type |
Points to data item type specification. |
xnd_t::datasize |
undefined |
xnd_t::align |
undefined |
xnd_t::Concrete::... |
undefined |
Member | Description |
---|---|
ndt_t::tag==SymbolicDim |
Ndtypes represents an array dimension that shape is a symbolic. |
ndt_t::SymbolicDim::name |
Contains symbol name. |
ndt_t::SymbolicDim::type |
Points to data item type specification. |
xnd_t::datasize |
undefined |
xnd_t::align |
undefined |
xnd_t::Concrete::... |
undefined |
The shape symbol must start with a capital letter.
Member | Description |
---|---|
ndt_t::tag==EllipsisDim |
Ndtypes represents 0 or more dimensions. |
ndt_t::EllipsisDim::name |
Contains ellipsis name. |
ndt_t::EllipsisDim::type |
Points to data item type specification. |
xnd_t::datasize |
undefined |
xnd_t::align |
undefined |
xnd_t::Concrete::... |
undefined |
The named ellipsis name must start with a capital letter or be equal
to 'var'
. Ellipsis var...
is special and it is used only for
ragged arrays (when ndt_t::tag==VarDim
).
Concrete array ndtypes¶
Ndtypes is in concrete mode when ndt_t::access==Concrete
.
In the case of fixed shape arrays we have:
Member | Description |
---|---|
ndt_t::tag==FixedDim |
Ndtypes represents an array dimension with fixed shape value |
ndt_t::FixedDim::shape |
Specifies the shape value of the array dimension |
ndt_t::FixedDim::type |
Points to data item type specification. |
xnd_t::datasize |
Specifies the byte-size of array data |
xnd_t::align |
Alignment of data item. |
xnd_t::Concrete::FixedDim::itemsize |
Specifies the byte-size of array item. |
xnd_t::Concrete::FixedDim::step |
Specifies the byte-step to the next item in array. E.g. in the case of slice view the step is a integer multiple of itemsize. |
Note that the datasize
is defined as the byte-size of memory that
is occupied between the first and the last array item (including the
items that are discarded due to slicing with step!=1
). So,
datasize
does not correspond to the size of memory needed to hold
the data defined by xnd view, it only defines the upper bound.
In the case of ragged arrays we have:
Member | Description |
---|---|
ndt_t::tag==VarDim |
Ndtypes represents an array dimension that shape may vary |
ndt_t::FixedDim::type |
Points to data item type specification. |
xnd_t::datasize |
Specifies the byte-size of array data |
xnd_t::align |
Alignment of data item. |
xnd_t::Concrete::VarDim::flag |
Specifies the ownership of offsets data. |
xnd_t::Concrete::VarDim::itemsize |
Specifies the byte-size of array item. |
xnd_t::Concrete::VarDim::noffsets |
Specifies the byte-size of offsets member. |
xnd_t::Concrete::VarDim::offsets |
|
xnd_t::Concrete::VarDim::nslices |
Specifies the byte-size of slices member. |
xnd_t::Concrete::VarDim::slices |