Using R with Guix

March 24, 2017

Introducing the actors

For the past few years I have been working on GNU Guix, a functional package manager. One of the most obvious benefits of a functional package manager is that it allows you to install any number of variants of a software package into a separate environment, without polluting any other environment on your system. This feature makes a lot of sense in the context of scientific computing where it may be necessary to use different versions or variants of applications and libraries for different projects.

Many programming languages come with their own package management facilities, which some users rely on despite their obvious limitations. In the case of GNU R, the built-in install.packages procedure makes it easy for users to quickly install packages from CRAN, and the third-party devtools package extends this mechanism to install software from other sources, such as a git repository.

ABI incompatibilities

Unfortunately, limitations in how binaries are executed and linked on GNU+Linux systems make it hard for people to keep using the language's own package installation facilities when they also use R from Guix on a distribution of the GNU system other than GuixSD. Packages that are installed through install.packages are built on demand. Some of these packages provide bindings to other libraries, which may be available at the system level. When these bindings are built, R uses the compiler toolchain and the libraries the system provides. All software in Guix, on the other hand, is completely independent of any libraries the host system provides; this is a direct consequence of implementing functional package management. As a result, binaries from Guix are not binary compatible with binaries built using system tools and linked with system libraries. In other words: due to the lack of a shared ABI between Guix binaries and system binaries, packages built with the system toolchain and linked with non-Guix libraries cannot be loaded into a process of a Guix binary (and vice versa).

Of course, this is not always a problem, because not all R packages provide bindings to other libraries; but the problem usually strikes with more complicated packages where using Guix makes a lot of sense as it covers the whole dependency graph.

Because of this nasty problem, which cannot be solved without a redesign of compiler toolchains and file formats, I have been recommending that people just use Guix for everything and avoid mixing software installation methods. Guix comes with many R packages, and for those that it doesn't include, it has an importer for the CRAN and Bioconductor repositories, which makes it easy to create Guix package expressions for R packages. While this is certainly valid advice, it ignores the habits of long-time R users, who may be really attached to install.packages or devtools.
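As a rough illustration of the importer mentioned above, the following sketch wraps `guix import cran` in a small shell function. The function name `import_r_package` and the `GUIX` override variable are hypothetical conveniences invented for this example; only `guix import cran` and its `--archive` option come from Guix itself.

```shell
# Hypothetical helper: print a Guix package expression for an R package.
# $1 is the package name, $2 the archive ("cran" by default, or
# "bioconductor"). Setting GUIX=echo previews the command instead of
# running it; that override is an assumption of this sketch.
import_r_package() {
    name=$1
    archive=${2:-cran}
    ${GUIX:-guix} import cran --archive="$archive" "$name"
}
```

Calling `import_r_package Cairo` would query CRAN, while `import_r_package GenomicRanges bioconductor` would query Bioconductor. The generated expression usually still needs a manual review before it is used.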

Schroedinger's Cake

There is another way; you can have your cake and eat it too. The problem arises from using the incompatible libraries and toolchain provided by the operating system. So let's just not do this, mmkay? As long as we can make R from Guix use libraries and the compiler toolchain from Guix we should not have any of these ABI problems when using install.packages.

Let's create an environment containing the current version of R, the GCC toolchain, and the GNU Fortran compiler with Guix. We could use guix environment --ad-hoc here, but it's better to use a persistent profile.

$ guix package -p /path/to/.guix-profile \
    -i r gcc-toolchain gfortran

To "enter" the profile I recommend using a sub-shell like this:

$ bash
$ source /path/to/.guix-profile/etc/profile
$ …
$ exit
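For one-off commands the same idea can be scripted. This is a minimal sketch, assuming a POSIX shell; the function name `with_profile` is made up for this example. It sources the profile inside a sub-shell, so the calling shell's environment stays untouched.

```shell
# Hypothetical helper: run a command with a Guix profile's environment.
# The function body is a sub-shell (note the parentheses), so nothing
# that etc/profile exports leaks into the calling shell.
with_profile() (
    profile=$1
    shift
    # Load the environment variables of the profile, then run the command.
    . "$profile/etc/profile" && "$@"
)
```

For example, `with_profile /path/to/.guix-profile R --version` would run R with the profile's settings and return to an unchanged shell afterwards.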

When inside the sub-shell we see that we use both the GCC toolchain and R from Guix:

$ which gcc
/gnu/store/…-profile/bin/gcc
$ which R
/gnu/store/…-profile/bin/R
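The same check can be automated. The sketch below assumes that `readlink -f` from GNU coreutils is available; the function name `from_guix` is hypothetical. It resolves a command on PATH, follows symlinks, and reports whether the real file lives under /gnu/store.

```shell
# Hypothetical helper: tell whether a command comes from Guix.
# Returns 0 for a store path, 1 for anything else, 2 if not found.
from_guix() {
    # Look the command up on PATH.
    path=$(command -v "$1") || { echo "$1: not found"; return 2; }
    # Profiles are symlink forests, so resolve to the real file.
    real=$(readlink -f "$path")
    case "$real" in
        /gnu/store/*) echo "$1: guix ($real)" ;;
        *)            echo "$1: system ($real)"; return 1 ;;
    esac
}
```

Inside the sub-shell, `from_guix gcc` and `from_guix R` should both report a store path. Shell builtins and functions are not files, so the result is only meaningful for real executables.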

Note that this is a minimal profile; it contains the GCC toolchain with a linker that ensures that e.g. the GNU C library from Guix is used at link time. It does not actually contain any of the libraries you may need to build certain packages.

Take the R package "Cairo", which provides bindings to the Cairo rendering libraries, as an example. Trying to build this in the new environment will fail, because the Cairo libraries are not found. To provide the required libraries we exit the environment, install the Guix packages providing the libraries, and re-enter the environment.

$ exit
$ guix package -p /path/to/.guix-profile -i cairo libxt
$ bash
$ source /path/to/.guix-profile/etc/profile
$ R
> install.packages("Cairo")
…
 * DONE (Cairo)
> library(Cairo)
>

Yay! This should work for any R package with bindings to any libraries that are in Guix. For this particular case you could have installed the r-cairo package using Guix, of course.
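To see which libraries a freshly built package was actually linked against, the output of `ldd` can be sorted into Guix and non-Guix entries. This is a minimal sketch; the function name `classify_libs` is made up, and it only looks at lines of the form `name => path`.

```shell
# Hypothetical helper: read `ldd` output on stdin and label each library
# as resolved from the Guix store, from the system, or missing.
classify_libs() {
    awk '
        $2 != "=>"              { next }                      # skip vdso and loader lines
        $3 == "not"             { print "missing", $1; next } # "=> not found"
        $3 ~ /^\/gnu\/store\//  { print "guix", $1; next }
                                { print "system", $1 }
    '
}
```

A possible use is `ldd /path/to/Cairo.so | classify_libs`, where the path to the shared object is whatever `system.file("libs", "Cairo.so", package = "Cairo")` prints in R. Any line labelled "system" points to the kind of mixing that causes the ABI problems described earlier.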

Potential problems and potential solutions

What happens if the system provides the required header files and libraries? Will the GCC toolchain from Guix use them? Yes. But that's okay, because it won't be able to compile and link the binaries anyway. When the files are provided by both Guix and the system, the toolchain prefers the Guix stuff.

It is possible to prevent the R process and all its children from ever seeing system libraries, but this requires the use of containers, which are not available on somewhat older kernels that are commonly used in scientific computing environments. Guix provides support for containers, so if you use a modern Linux kernel on your GNU system you can avoid some confusion by using either guix environment --container or guix container. Check out the glorious manual.
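A container-based session might be started along the following lines. This is only a sketch: the function name `r_in_container` and the `GUIX` override are hypothetical, and the package list simply repeats the packages used earlier in this post. The `--network` flag is included on the assumption that install.packages needs to download sources.

```shell
# Hypothetical helper: start R in a Guix container that cannot see the
# host system's libraries. Extra arguments name additional Guix packages.
# Setting GUIX=echo previews the command instead of running it; that
# override is an assumption of this sketch.
r_in_container() {
    ${GUIX:-guix} environment --container --network \
        --ad-hoc r gcc-toolchain gfortran "$@" -- R
}
```

For example, `r_in_container cairo libxt` would add the Cairo libraries to the container. Whether install.packages can then reach CRAN, and where the built packages end up, depends on details not covered here (for example TLS certificates and the R library path inside the container), so treat this as a starting point and consult the manual.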

Another problem is that the packages you build manually do not come with the benefits that Guix provides. This means, for example, that these packages won't be bit-reproducible. If you want bit-reproducible software environments: use Guix and don't look back.

Summary

Learn more!

If you want to learn more about GNU Guix I recommend taking a look at the excellent GNU Guix project page, which offers links to talks, papers, and the manual. Feel free to contact me if you want to learn more about packaging scientific software for Guix. It is not difficult and we all can benefit from joining efforts in adopting this usable, dependable, hackable, and liberating platform for scientific computing with free software.

The Guix community is very friendly, supportive, responsive and welcoming. I encourage you to visit the project’s IRC channel #guix on Freenode, where I go by the handle “rekado”.
