Ruby/DL

Takaaki Tateishi, June 17, 2002 (Heisei 14)
Hastily translated by Benjamin Peterson, ben@jbrowse.com
2 unclear areas are marked with (???), please let me know if you figure them out

Contents

1 Tutorial

1.1 Importing library functions
1.2 Type definitions
1.3 Arrays, pointer type data, and symbols
1.4 Callback functions
1.5 Arguments passed by reference

2 Ruby/DL details

2.1 How to call functions
2.2 Mutable arguments
2.3 How to use pointers
2.4 How to handle arrays
2.5 How to handle structures
2.6 How to handle callbacks
2.7 Defining the 'free' function

3 Ruby/DL Manual

3.1 DL module
3.2 DL::Handle class
3.3 DL::Symbol class
3.4 DL::PtrData class
3.5 Type specifiers
3.6 DL::MemorySpace module

4 Interoperating with other extension libraries.

1 Tutorial

On most operating systems, programs can load object files at run time; these are known as DLLs or shared libraries. In this document they are called by the general name 'system libraries'. This section introduces how to call functions defined in such libraries (hereinafter 'library functions') using import.rb, which is part of Ruby/DL. It is also possible to use library functions through the more basic functions defined in the DL module; that method is explained in section 2. import.rb defines the DL::Importable module. Using this module, library functions can be used from Ruby in a visually clear way.

1.1 Importing library functions

To call library functions from Ruby using import.rb, you must take the following 3 steps: define a module that extends DL::Importable, load the system library, and import the functions you need. The module for the system library must extend DL::Importable because DL::Importable defines the methods used to load and use the library. First, we will describe the definition of this module. For instance, if you want to use the C library 'libc.so', the module definition is as follows:
require 'dl/import'
module LIBC
    extend DL::Importable
end
Here, any name could be used instead of LIBC, but the module must extend DL::Importable. To load the system library, the dlload method of this LIBC module is used. The library functions are then made usable from Ruby via the 'extern' method. This process of making library functions available to Ruby is called 'importing'.
require 'dl/import'
module LIBC
    extend DL::Importable
    dlload "libc.so"
    extern "int strlen(const char *)"
end
In the example above, the strlen function found in libc.so has been made accessible. strlen() is one of the standard C library functions, returning the length of a string. It is possible to specify multiple system libraries as the arguments to dlload. The extern method must be given a C function prototype. However, because not every type is necessarily defined, it may be necessary to define each undefined type that you want to use, using the method given in section 1.2. The function can be called from Ruby through the LIBC module as follows:
LIBC.strlen("abc") # => 3
The return value should be 3. In Ruby, a method name cannot begin with a capital letter, but in C a function name can. Because of this, if the imported function's name begins with a capital letter, that letter is automatically replaced with its lower-case form. For example, the library function GetString() is accessed from Ruby via the getString() method.
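The renaming rule can be pictured with a small pure-Ruby helper. This is only a sketch: ruby_method_name is a hypothetical name that is not part of Ruby/DL, and it is assumed that only the first character is changed.

```ruby
# Hypothetical helper, not part of Ruby/DL: lower-case the first
# character of a C function name so it can serve as a Ruby method name.
def ruby_method_name(c_name)
  return c_name if c_name.empty?
  c_name[0].downcase + c_name[1..-1]
end

puts ruby_method_name("GetString")  # prints "getString"
puts ruby_method_name("strlen")     # prints "strlen"
```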

1.2 Type definitions

Ruby/DL understands ordinary C types, but types specific to a particular environment or library are mostly not supported. Accordingly, when describing function prototypes, it is necessary to define all the unsupported types in advance, in the module. For instance, size_t is not defined in Ruby/DL. To define it as an unsigned int, add a line like the following:
typealias("size_t", "unsigned int")
You could also leave out the parentheses and write it like so:
typealias "size_t", "unsigned int"
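Conceptually, a type alias is a lookup from an unknown type name to a known one. The following pure-Ruby sketch shows that idea only; TYPE_ALIASES and resolve_type are hypothetical names, and Ruby/DL's own bookkeeping may well differ.

```ruby
# Hypothetical alias table for illustration; not Ruby/DL's own data.
TYPE_ALIASES = { "size_t" => "unsigned int" }

# Follow aliases until a name with no alias entry is reached.
# The 'seen' list guards against an alias that points back to itself.
def resolve_type(name, aliases = TYPE_ALIASES)
  seen = []
  while aliases.key?(name) && !seen.include?(name)
    seen << name
    name = aliases[name]
  end
  name
end

puts resolve_type("size_t")  # prints "unsigned int"
puts resolve_type("int")     # prints "int"
```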

1.3 Arrays, pointer type data, and symbols

1.4 Callbacks

A few library functions require callback functions. For instance, the C library function qsort() requires a pointer to a function that will be used to compare elements.
void qsort(
    void *base,
    size_t nmemb,
    size_t size,
    int (*compar)(const void *, const void *)
)
First, put the following definition in LIBC to allow qsort() to be used from Ruby:
require 'dl/import'
module LIBC
    extend DL::Importable
    dlload "libc.so"
    typealias("size_t", "unsigned int")
    extern "void qsort(void*, size_t, size_t, void*)"
end
Because the function type "int (*)(...)" is not understood by Ruby/DL, the comparison function argument is declared as a plain pointer type (void*). Next, the callback function is defined in Ruby.
module LIBC
    def my_compare(ptr1, ptr2)
        ptr1.ptr.to_s <=> ptr2.ptr.to_s
    end
    MY_COMPARE = callback "int my_compare(char**, char**)"
end
First, the Ruby method my_compare is defined. This method takes 2 pointers to strings and compares them. Next, the "callback" method is used to make the method we have defined work with C. The argument is a function prototype describing a function with the same name as the method we defined. The return value of "callback" is a Symbol object representing the C symbol. It is passed to qsort() as the last argument, after the element count and the element size:
ary = ["a","c","b"]
ptr = ary.to_ptr
LIBC.qsort(ptr, ary.length, DL.sizeof('P'), LIBC::MY_COMPARE)
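The contract of the comparison function can be tried in pure Ruby without loading any library. In this sketch, MY_COMPARE and sort_strings are illustrative names only; Array#sort stands in for qsort(), and the lambda follows the same convention as a C comparison function (negative, zero, or positive).

```ruby
# Same convention as a C comparison function for qsort():
# negative if a sorts before b, zero if equal, positive otherwise.
MY_COMPARE = lambda { |a, b| a <=> b }

# Array#sort stands in for qsort() here; it returns a new sorted array
# and calls the comparison for each pair it needs to order.
def sort_strings(ary, compare)
  ary.sort { |x, y| compare.call(x, y) }
end

p sort_strings(["a", "c", "b"], MY_COMPARE)  # prints ["a", "b", "c"]
```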

1.5 Reference arguments

2 Ruby/DL details

In this section, using functions in the C library as examples, we aim to briefly explain the usage of Ruby/DL. In most Unix-like operating systems, I think the standard C library is in /lib/libc.so or /usr/lib/libc.so, but here we will use /usr/lib/libc.so. The reader should have a basic knowledge of C; at least, I think it will be hard to understand this part without any knowledge of pointers.

2.1 How to call functions

We will take the C library functions isdigit() and atoi() as our examples. First, in order to use the libc.so library, we use the DL.dlopen module function.
require 'dl'
libc = DL.dlopen('/usr/lib/libc.so')
DL.dlopen(path) opens the library denoted by path and returns the handle as an instance of the DL::Handle class. In the example above, the variable 'libc' has been assigned a Handle object.
Next, we extract pointers to the C library functions atoi() and isdigit(). To obtain the function pointers, either the Handle#sym method, or the shortened form Handle#[], can be used.
atoi = libc['atoi', 'IS']
isdigit = libc['isdigit', 'II']
In Ruby/DL, these function pointers are handled by the DL::Symbol class. Thus, the atoi and isdigit variables denote instances of DL::Symbol. In the first argument, the desired function's symbol is specified as a string. Then in the second argument, the type of the function is specified. The type is represented as a string composed of characters known as type specifiers. The first character represents the function's return type, and the k-th character (for k of 2 or more) represents the (k-1)-th argument's type. In the case of atoi, the return type is 'I', indicating a C 'int', while 'S' indicates a 'const char *'. When the return value is not needed, the first character can be '0' (zero). There is a detailed description of type specifiers in section 3.5.
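To make the layout of a type string concrete, the following pure-Ruby sketch spells a signature out as a C-style prototype. SPECIFIERS and describe_signature are hypothetical names, the table covers only specifiers that appear in this document, and rendering '0' as void is an assumption made for readability.

```ruby
# Only specifiers mentioned in this document; '0' is shown as void
# as an assumption (the text says it means "return value not needed").
SPECIFIERS = {
  "0" => "void",
  "C" => "char",
  "I" => "int",
  "L" => "long",
  "P" => "void *",
  "S" => "const char *"
}

# Hypothetical helper: the first character is the return type and
# each following character is one argument type.
def describe_signature(name, sig)
  types = sig.chars.map { |c| SPECIFIERS.fetch(c) }
  ret, *args = types
  "#{ret} #{name}(#{args.join(', ')})"
end

puts describe_signature("atoi", "IS")     # prints "int atoi(const char *)"
puts describe_signature("isdigit", "II")  # prints "int isdigit(int)"
puts describe_signature("fclose", "0P")   # prints "void fclose(void *)"
```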

To use the function pointer thus obtained to actually call a function, the 'call' method is used.

r1 = isdigit.call(?1) # => [2048, [49]]
r2 = isdigit.call(?a) # => [0, [97]]
r3 = atoi.call('10') # => [10, ["10"]]
Again, instead of "call", you can use brackets.
r1 = isdigit[?1] # => [2048, [49]]
r2 = isdigit[?a] # => [0, [97]]
r3 = atoi['10'] # => [10, ["10"]]
r1, r2 and r3 each hold a two-element array: the first element is the function's return value, and the second is an array of the argument values as they stand after the call. The comments on the right are examples of the values returned by Symbol#call/[].
Suppose you want to define a str2int function that takes a string of digits and converts it to an integer value using atoi. Following the steps described above, it can be done as follows:
require 'dl'

module LIBC
    LIB = DL.dlopen('/usr/lib/libc.so')
    SYM = {
        :atoi => LIB['atoi', 'IS'],
        :isdigit => LIB['isdigit', 'II']
    }
    def atoi(str)
        r,rs = SYM[:atoi].call(str)
        return r
    end
    # isdigit() returns a non-zero value when c is a digit.
    def isdigit(c)
        r,rs = SYM[:isdigit].call(c)
        return (r != 0)
    end
end

include LIBC
def str2int(str)
    # Give up as soon as a byte that is not a digit is found.
    str.each_byte{|c|
        if( !LIBC.isdigit(c) )
            return nil
        end
    }
    return LIBC.atoi(str)
end
In this example, by the wondrous power of the LIBC module, functionality offered by libc.so can now be provided by Ruby. The constant LIB holds a library Handle object; SYM is defined as a hash for holding Symbol objects. The module functions atoi and isdigit expose the library functions of the same names.
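The intended control flow of str2int can be tried without loading libc by substituting pure-Ruby stand-ins for the two library functions. This is a sketch only: digit_byte? is a hypothetical replacement for isdigit(), String#to_i replaces atoi(), and ASCII input is assumed.

```ruby
# Pure-Ruby stand-in for isdigit(); an assumption for illustration,
# not Ruby/DL code. 48..57 are the ASCII codes of '0'..'9'.
def digit_byte?(c)
  c >= 48 && c <= 57
end

# Return nil if any byte is not a digit, otherwise the integer value.
def str2int(str)
  str.each_byte do |c|
    return nil unless digit_byte?(c)
  end
  str.to_i   # stand-in for atoi()
end

p str2int("123")  # prints 123
p str2int("12a")  # prints nil
```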

2.2 Mutable arguments

With C functions, the values of the arguments may change during the execution of the function. For instance, strcat() appends the string given in the second argument to that given in the first. The first string is therefore not the same after the execution of the function. In order to handle this kind of 'mutable' function, the type specifier 's' (lower case) is used. Adding the strcat() function to our LIBC module would look like this:
module LIBC
    SYM[:strcat] = LIB['strcat', 'SsS']
    def strcat(str1,str2)
        r,rs = SYM[:strcat].call("#{str1}\0#{str2}",str2)
        return rs[0]
    end
end
What you have to notice here is that the first argument of SYM[:strcat].call is not simply str1, but "#{str1}\0#{str2}". This is because, when strcat() is used in C, the first argument must refer to a buffer that has enough space to hold the completed string after the function has finished. The NUL byte ends str1 as far as C is concerned, and the bytes after it reserve room for str2. Given the inputs "abc" and "def", this module function returns "abcdef".
include LIBC
LIBC.strcat("abc", "def") # => "abcdef"
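Why the padded buffer is large enough can be checked in pure Ruby. The sketch below imitates what strcat() does to the buffer; strcat_buffer and simulate_strcat are hypothetical helpers, not Ruby/DL functions, and the imitation refuses to write past the end of the buffer where real C code would simply overrun it.

```ruby
# Build the first argument the same way the strcat wrapper does.
def strcat_buffer(str1, str2)
  "#{str1}\0#{str2}"
end

# Hypothetical pure-Ruby imitation of C's strcat(): write src and a
# terminating NUL into buf, starting at buf's first NUL byte.
def simulate_strcat(buf, src)
  buf = buf.b                      # work on a binary copy
  start = buf.index("\0")
  raise ArgumentError, "no NUL terminator" if start.nil?
  tail = src.b + "\0"
  if start + tail.bytesize > buf.bytesize
    raise ArgumentError, "buffer too small"
  end
  buf[start, tail.bytesize] = tail
  buf[0, buf.index("\0")]          # the C string now held in buf
end

p simulate_strcat(strcat_buffer("abc", "def"), "def")  # prints "abcdef"
```

Passing only "abc\0" as the buffer makes the imitation raise, which is the situation the padded first argument avoids.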

2.3 How to use pointers

When writing a program in C, there are a lot of places where you use pointers. For instance, you use fopen() to open a file. The function takes 2 arguments. The first argument is the file name as a string. The second is the mode in which to open the file, as a string. The return value is a pointer to a FILE structure. Thus, the type of fopen() is represented as 'PSS' in Ruby/DL. The type specifier 'P' represents a pointer type. LIBC module functions to handle fopen(), fclose(), and fgetc() in Ruby can be defined as follows:
module LIBC
    SYM[:fopen] = LIB['fopen', 'PSS']
    SYM[:fclose] = LIB['fclose', '0P']
    SYM[:fgetc] = LIB['fgetc', 'IP']
    def fopen(filename, mode)
        r,rs = SYM[:fopen].call(filename, mode)
        return r
    end
    def fclose(ptr)
        SYM[:fclose].call(ptr)
        return nil
    end
    def fgetc(ptr)
        r,rs = SYM[:fgetc].call(ptr)
        return r
    end
end
LIBC.fopen gets you a pointer depending on the file name and mode specified when you open the file. The file pointer is an instance of DL::PtrData. The PtrData class is the class for handling pointers from Ruby, and defines methods for managing pointers. A function defined in LIBC to use these functions to output the contents of a file would go like this:
include LIBC
def print_file(filename)
    fp = LIBC.fopen(filename, "r")
    if( !fp ) # -- (note 1)
        return nil
    end
    # fgetc() returns a negative value (EOF) at the end of the file.
    while( (c = LIBC.fgetc(fp)) >= 0 )
        print(c.chr)
    end
    LIBC.fclose(fp)
end
C's 'NULL' is treated as 'nil' in Ruby. Thus, to see if file opening has failed, check whether the value returned from LIBC.fopen (note 1) is nil. In Ruby/DL, when 'P' or 'p' is given as a type specifier, the objects given as arguments are checked at the time the function is called to see whether they are PtrData or not. If an argument is not a PtrData, it is converted to a PtrData by using its to_ptr method. Using this technique, C arrays and strings can be handled alike, as pointers. After 'require'-ing the Ruby/DL module, the methods String#to_ptr and Array#to_ptr are defined, so, for instance, you could use 'P' and 'p' instead of the 'S' and 's' type specifiers. For example, the C library function strlen() can be handled in the LIBC module as follows:
module LIBC
    SYM[:strlen] = LIB['strlen', 'IP']
    def strlen(str)
        r,rs = SYM[:strlen].call(str)
        return r
    end
end

2.4 How to use arrays

To pass a Ruby array to a C function, the type specifier 'A' or 'a' is used. For instance, the C library function qsort() from glibc2 has this prototype:
void qsort(
    void *base,
    size_t nmemb,
    size_t size,
    int (*compare)(const void *, const void *)
)
Thus, the Ruby/DL type specifiers '0aIIP' would be used. qsort() is a general-purpose sort function, but on this occasion we will only consider sorting an array of strings. Defining the function for qsort()'s 'compare' argument in Ruby is described in section 2.6, so here we will use a function that we have put in a library ourselves. This is a function to compare 2 strings:
#include <string.h>
int mystrcmp(char **str1, char **str2)
{
    return strcmp(*str1, *str2);
}
We will call the library which contains this function 'libmy.so'. Now, to use this library we will have to make a Ruby module like this:
module LIBMY
    LIB = DL.dlopen('libmy.so')
    SYM = {:mystrcmp => LIB['mystrcmp', 'IPP']}
end
If we do this, then we can obtain a pointer to mystrcmp() in the library with LIBMY::SYM[:mystrcmp]. Lastly, we define a LIBC module function for using qsort() as follows:
module LIBC
SYM[:qsort] = LIB['qsort', '0aIIP']
def qsort(ary, comp)
    len = ary.length
    r,rs = SYM[:qsort].call(ary,
        len,
        DL.sizeof('P'),
        comp)
    return rs[0].to_a('S', len)
end
end
rs[0] is the value of the first argument after the function invocation, but because the type specifier 'a' was used, the value that can be obtained on the Ruby side is a PtrData object. Therefore, rs[0] is converted to an array with the to_a method. When that is done, you specify how to convert the pointer by supplying 2 arguments. In the first argument, you pass the type of the array elements; in the second, the number of elements in the array. In this case, the elements are strings, so 'S' is passed in the first argument. As the number of elements is the same as before the invocation, the value that was obtained earlier is passed in argument 2. In this way, LIBC.qsort can handle an array of strings and compare them with the C function given in 'comp', returning the result again as an array of strings. Using the LIBC and LIBMY modules, we can define a Ruby method that sorts a string array using mystrcmp() like this:
include LIBC
include LIBMY
def strsort(ary)
    LIBC.qsort(ary, LIBMY::SYM[:mystrcmp])
end

2.5 How to use structures

In C, structures are defined in many situations to create new types. For instance, the timeval structure used by gettimeofday() is defined as follows:
struct timeval {
    long tv_sec;
    long tv_usec;
};
In order to reserve memory space for a structure like this, either DL.malloc or Array#pack can be used. DL.malloc takes the amount of memory needed as an argument and returns a PtrData object that indicates the memory range allocated. You can use DL.sizeof to get the size to request: DL.sizeof can calculate the amount of memory required for a structure type, and it takes alignment into account. The two lines below may appear to be doing the same thing, but in fact (1) allocates the size of a structure that contains a char and a long, while (2) simply allocates the size of a char type plus the size of a long type.
ptr = DL.malloc(DL.sizeof('CL')) # (1)
ptr = DL.malloc(DL.sizeof('C') + DL.sizeof('L')) # (2)
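The difference between (1) and (2) comes from padding. The following pure-Ruby sketch computes both sizes under a stated assumption: each member is aligned to its own size and the structure is padded to its largest member, which is common but not guaranteed on every platform. MEMBER_SIZES and struct_size are hypothetical names, not Ruby/DL functions; the member sizes are measured with Array#pack.

```ruby
# Native sizes measured with Array#pack: 'c' is a char and 'l!' is a
# native long.
MEMBER_SIZES = {
  "C" => [0].pack("c").bytesize,
  "L" => [0].pack("l!").bytesize
}

# Assumption: every member is aligned to its own size, and the whole
# structure is padded to a multiple of its largest member.
def struct_size(spec)
  offset = 0
  largest = 1
  spec.each_char do |t|
    size = MEMBER_SIZES.fetch(t)
    offset += (-offset) % size      # padding before this member
    offset += size
    largest = size if size > largest
  end
  offset + (-offset) % largest      # trailing padding
end

p struct_size("CL")                    # char, padding, then long
p struct_size("C") + struct_size("L")  # plain sum, no padding
```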
When using Array#pack, having once made a string in Ruby you can convert it to a PtrData with to_ptr:
ptr = [0,0].pack('ll').to_ptr
Using pack, each element can be initialized at once.

When a structure is represented by a PtrData object, the various elements can be accessed by supplying an offset. For instance, with the following code you can find the elements corresponding to tv_sec and tv_usec:

sec = ptr[0].to_i
usec = ptr[DL.sizeof('L')].to_i
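Offset-based access can also be tried on an ordinary packed String, without PtrData. In this sketch long_at and LONG_SIZE are hypothetical names, and the 'l!' directive (a native long) is used so that the offsets match the size of a C long on the current platform.

```ruby
LONG_SIZE = [0].pack("l!").bytesize   # size of a native long in bytes

# Read the native long stored at the given byte offset of a packed String.
def long_at(buf, offset)
  buf.byteslice(offset, LONG_SIZE).unpack("l!").first
end

timeval = [10, 100].pack("l!l!")   # tv_sec = 10, tv_usec = 100
p long_at(timeval, 0)              # prints 10
p long_at(timeval, LONG_SIZE)      # prints 100
```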
We now present the C code for comparison with the equivalent Ruby code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main()
{
    void *ptr;
    long sec, usec;
    /* ptr = [0,0].pack('ll').to_ptr */
    ptr = (void*)malloc(sizeof(struct timeval));
    ((struct timeval *)ptr)->tv_sec = 10;
    ((struct timeval *)ptr)->tv_usec = 100;
    /* sec = ptr[0].to_i */
    /* sec = (ptr + 0).ptr.to_i */
    sec = *(long*)(ptr + 0);
    /* usec = ptr[DL.sizeof('L')].to_i */
    /* usec = (ptr + DL.sizeof('L')).ptr.to_i */
    usec = *(long*)(ptr + sizeof(long));
    printf("sec = %ld, usec = %ld\n", sec, usec);
    exit(0);
}
Depending on the C compiler, the code above might produce a warning, or it might not compile at all. Normally, however, you do not access structure members by starting from the base address of the structure, applying an offset, and casting. In C, the code looks something like this:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main()
{
    struct timeval *ptr;
    long sec, usec;
    ptr = (struct timeval *)malloc(sizeof(struct timeval));
    ptr->tv_sec = 10;
    ptr->tv_usec = 100;
    sec = ptr->tv_sec;
    usec = ptr->tv_usec;
    printf("sec = %ld, usec = %ld\n", sec, usec);
    exit(0);
}
This is possible because the compiler knows that the variable ptr points to a timeval structure. In Ruby/DL, the struct! method is provided to declare the individual members of a structure. The first argument of this method is a string of the type specifiers of the structure members. The remaining arguments are symbols to be used as keys in accessing the members.
ptr = DL.malloc(DL.sizeof('LL'))
ptr.struct!('LL', :tv_sec, :tv_usec)
ptr[:tv_sec] = 10
ptr[:tv_usec] = 100
sec = ptr[:tv_sec]
usec = ptr[:tv_usec]
Again, in the same way as the struct! method is provided for members of structures, there is a union! method for members of unions.
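The idea behind member access by name can be sketched in pure Ruby as a mapping from member names to byte offsets. LongStruct is a hypothetical class, not part of Ruby/DL, and it handles only members of type long to keep the example short.

```ruby
# Hypothetical class, not part of Ruby/DL: a byte buffer whose
# native-long members can be read and written by name.
class LongStruct
  LONG_SIZE = [0].pack("l!").bytesize

  def initialize(*names)
    @offsets = {}
    names.each_with_index { |n, i| @offsets[n] = i * LONG_SIZE }
    @buf = "\0".b * (names.length * LONG_SIZE)   # zero-filled buffer
  end

  # Read the member stored at the offset registered for name.
  def [](name)
    @buf.byteslice(@offsets.fetch(name), LONG_SIZE).unpack("l!").first
  end

  # Overwrite the member's bytes with the packed value.
  def []=(name, value)
    @buf[@offsets.fetch(name), LONG_SIZE] = [value].pack("l!")
  end
end

tv = LongStruct.new(:tv_sec, :tv_usec)
tv[:tv_sec] = 10
tv[:tv_usec] = 100
p tv[:tv_sec]   # prints 10
p tv[:tv_usec]  # prints 100
```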

2.6 How to use callbacks

In C, function pointers can be passed as arguments. For instance, in the qsort() method mentioned before, the comparison function needed for sorting is supplied as an argument. In the example just now, we used a function already defined in C, but this time we will define the function on the Ruby side. Firstly, the function would look like this in C:
#include <string.h>
int mystrcmp(char **str1, char **str2)
{
    return strcmp(*str1, *str2);
}
To define the callback function in Ruby/DL, use the DL.callback module function.
cb = DL.callback('IPP'){|ptr1, ptr2|
    str1 = ptr1.ptr.to_s
    str2 = ptr2.ptr.to_s
    str1 <=> str2
}
The type specifiers for the function are provided as arguments; in this case, the function takes 2 pointers and returns an integer. However, in Ruby callbacks can only be defined so far as they are countable(???). ptr1.ptr corresponds to C's '*str1'. The to_s method converts the data obtained from the pointer into a string. DL.callback's return value is a Symbol object. This can be used with the qsort method we have already defined in the LIBC module.
include LIBC
qsort(["c","a","b"], cb)

2.7 Defining the 'free' function

In C the usual system is to allocate memory with malloc() and release it with free(). For instance, if the string "abc" is converted to a PtrData type, the result is like this:
ptr = "abc".to_ptr
# => #<DL::PtrData:0x8096f48
# ptr=0x8096788
# free=0x402e28c8>
'ptr' is a pointer value representing this PtrData. 'free' is a function that will be run on this PtrData object when it is recycled by the GC; in this case free() will be called. A Symbol object for free() is pre-defined in Ruby/DL as DL::FREE.
DL::FREE
# => #<DL::Symbol:0x8096658
# func=0x402e28c8
# 'void free(void *);'>
The value denoted by 'func' is a pointer to this function.

The PtrData object's 'free' member can be changed by the user. For instance, to define the 'free' function of the PtrData object 'ptr' as the function 'sym', you can do 'ptr.sym = sym'. (??? surely this should be ptr.free = sym)

3 Ruby/DL Manual

3.1 DL Module

3.2 DL::Handle class

3.3 DL::Symbol class

3.4 DL::PtrData class

3.5 Type Specifiers

3.6 DL::MemorySpace module

4 Interoperability with other extension libraries.