For the past few years I have been working on GNU Guix, a functional package manager. One of the most obvious benefits of a functional package manager is that it allows you to install any number of variants of a software package into a separate environment, without polluting any other environment on your system. This feature makes a lot of sense in the context of scientific computing where it may be necessary to use different versions or variants of applications and libraries for different projects.
Many programming languages come with their own package management facilities, which some users rely on despite their obvious limitations. In the case of GNU R the built-in install.packages procedure makes it easy for users to quickly install packages from CRAN, and the third-party devtools package extends this mechanism to install software from other sources, such as a git repository.
Unfortunately, limitations in how binaries are executed and
linked on GNU+Linux systems make it hard for people to continue to use
the package installation facilities of the language when also using R
from Guix on a distribution of the GNU system other than GuixSD.
Packages that are installed through install.packages are
built on demand. Some of these packages provide bindings to other
libraries, which may be available at the system level. When these
bindings are built, R uses the compiler toolchain and the libraries the
system provides. All software in Guix, on the other hand, is
completely independent from any libraries the host system provides;
this is a direct consequence of implementing functional package
management. As a result, binaries from Guix do not have binary
compatibility with binaries built using system tools and linked with
system libraries. In other words: due to the lack of a shared ABI
between Guix binaries and system binaries, packages built with the
system toolchain and linked with non-Guix libraries cannot be loaded
into a process of a Guix binary (and vice versa).
Of course, this is not always a problem, because not all R packages provide bindings to other libraries; but the problem usually strikes with more complicated packages where using Guix makes a lot of sense as it covers the whole dependency graph.
Because of this nasty problem, which cannot be solved without a
redesign of compiler toolchains and file formats, I have been
recommending that people just use Guix for everything and avoid mixing
software installation methods. Guix comes with many R packages, and
for those that it doesn't include, it has an importer for the CRAN and
Bioconductor repositories, which makes it easy to create Guix package
expressions for R packages. While this is certainly valid advice, it
ignores the habits of long-time R users, who may be really attached to
install.packages or devtools.
There is another way; you can have your cake and eat it too. The
problem arises from using the incompatible libraries and toolchain
provided by the operating system. So let's just not do this,
mmkay? As long as we can make R from Guix use libraries and the
compiler toolchain from Guix we should not have any of these
ABI problems when using install.packages.
Let's create an environment containing the current version of R,
the GCC toolchain, and the GNU Fortran compiler with Guix. We could
use guix environment --ad-hoc here, but it's better to use a
persistent profile.
$ guix package -p /path/to/.guix-profile -i r gcc-toolchain gfortran
To "enter" the profile I recommend using a sub-shell like this:
$ bash
$ source /path/to/.guix-profile/etc/profile
$ …
$ exit
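For one-off commands, the same sub-shell idea can be wrapped in a small function. This is only a sketch: with_profile is a hypothetical helper name, not something Guix provides, and it assumes nothing beyond the etc/profile file used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with a Guix profile loaded,
# without changing the environment of the calling shell.
with_profile() {
    local profile=$1
    shift
    if [ ! -r "$profile/etc/profile" ]; then
        echo "with_profile: cannot read $profile/etc/profile" >&2
        return 1
    fi
    # The parentheses start a sub-shell, so everything that etc/profile
    # exports disappears again when the command finishes.
    ( . "$profile/etc/profile" && "$@" )
}

# Usage (the profile path is the placeholder used in this post):
#   with_profile /path/to/.guix-profile R --version
```

Because the profile is only sourced inside the parentheses, there is no need to remember the final exit.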
When inside the sub-shell we see that we use both the GCC toolchain and R from Guix:
$ which gcc
/gnu/store/…-profile/bin/gcc
$ which R
/gnu/store/…-profile/bin/R
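If you would rather not eyeball the output of which for every tool, the check can be scripted. The following is a minimal sketch, assuming the store lives at the default location /gnu/store; is_store_path and check_tool are hypothetical helper names made up for this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given path lies in the Guix store.
# Assumes the default store location /gnu/store.
is_store_path() {
    case "$1" in
        /gnu/store/*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Hypothetical helper: report where a command resolves to after
# following all symlinks (profiles are symlinks into the store).
check_tool() {
    local tool=$1 resolved
    resolved=$(command -v "$tool") || { echo "$tool: not found"; return 1; }
    resolved=$(readlink -f "$resolved")
    if is_store_path "$resolved"; then
        echo "$tool: from Guix ($resolved)"
    else
        echo "$tool: NOT from Guix ($resolved)"
        return 1
    fi
}

# Usage inside the sub-shell:
#   for tool in gcc gfortran R; do check_tool "$tool"; done
```

Any tool reported as not coming from Guix is a hint that the profile was not sourced, or that the package is missing from it.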
Note that this is a minimal profile; it contains the GCC toolchain with a linker that ensures that e.g. the GNU C library from Guix is used at link time. It does not actually contain any of the libraries you may need to build certain packages.
Take the R package "Cairo", which provides bindings to the Cairo rendering libraries, as an example. Trying to build it in this new environment will fail, because the Cairo libraries are not found. To provide the required libraries we exit the environment, install the Guix packages providing the libraries, and re-enter the environment.
$ exit
$ guix package -p /path/to/.guix-profile -i cairo libxt
$ bash
$ source /path/to/.guix-profile/etc/profile
$ R
> install.packages("Cairo")
…
* DONE (Cairo)
> library(Cairo)
>
Yay! This should work for any R package with bindings to any
libraries that are in Guix. For this particular case you could have
installed the r-cairo package using Guix, of course.
What happens if the system provides the required header files and libraries? Will the GCC toolchain from Guix use them? Yes. But that's okay, because it won't be able to compile and link the binaries anyway. When the files are provided by both Guix and the system the toolchain prefers the Guix stuff.
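To see which libraries a freshly built package actually resolves to, you can filter the output of ldd. The following is a sketch under a few assumptions: the store is at the default /gnu/store, non_store_libs is a hypothetical helper name, and the package's shared object is found at libs/Cairo.so inside the installed package directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print every
# dependency that does not resolve to a file in the Guix store.
# Assumes the default store location /gnu/store.
non_store_libs() {
    awk '$2 == "=>" && $3 !~ /^\/gnu\/store\// && $3 !~ /^\(/ {
        sub(/^[ \t]+/, "")             # drop the leading indentation
        sub(/ \(0x[0-9a-f]+\)$/, "")   # drop the load address
        print
    }'
}

# Usage inside the sub-shell (the libs/Cairo.so location is an assumption):
#   so=$(Rscript -e 'cat(system.file("libs", "Cairo.so", package = "Cairo"))')
#   ldd "$so" | non_store_libs
```

Empty output means every dependency that ldd resolved points into the store; anything that is printed either comes from the host system or was not found at all.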
It is possible to prevent the R process and all its
children from ever seeing system libraries, but this requires the use
of containers, which are not available on somewhat older kernels that
are commonly used in scientific computing environments. Guix provides
support for containers, so if you use a modern Linux kernel on your
GNU system you can avoid some confusion by using either guix
environment --container or guix container. Check out
the glorious manual.
Another problem is that the packages you build manually do not come with the benefits that Guix provides. This means, for example, that these packages won't be bit-reproducible. If you want bit-reproducible software environments: use Guix and don't look back.
If you want to keep using install.packages, let R from Guix use the GCC toolchain and libraries from Guix.
If you want to learn more about GNU Guix I recommend taking a look at the excellent GNU Guix project page, which offers links to talks, papers, and the manual. Feel free to contact me if you want to learn more about packaging scientific software for Guix. It is not difficult and we all can benefit from joining efforts in adopting this usable, dependable, hackable, and liberating platform for scientific computing with free software.
The Guix community is very friendly, supportive, responsive and welcoming. I encourage you to visit the project’s IRC channel #guix on Freenode, where I go by the handle “rekado”.
Comments? Then send me an email! Interesting comments may be published here.