Compiling Autotooled projects to LLVM Bitcode

To analyze a C/C++ project with LLVM Datalog, we must first generate a single whole-program LLVM bitcode file. Many such projects are built with Autotools, however, so we must manually tweak the build process to produce one.

In what follows, I describe the steps for compiling GNU coreutils so that we end up with a single LLVM bitcode file per coreutils command (instead of an executable file). The process should be roughly the same for other autotooled software.

Prerequisites

  1. Check that the gold linker is installed on your system. On Fedora, it is normally installed as /usr/bin/ld.gold. We could make the default linker (i.e., ld) point to ld.gold by using alternatives, but this is not necessary.

  2. Verify that ld.gold accepts plugin arguments by running:

    $ ld.gold -plugin
    

    You should see a warning such as -plugin: missing argument.

  3. Verify that your LLVM installation contains the gold plugin (look for lib/LLVMgold.so under the main LLVM directory that also contains bin/clang). This is probably not the case for LLVM prebuilt binaries, so you may need to compile LLVM from source (see next section).
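
The three checks above can be bundled into a small script. This is only a sketch: check_prereqs is a hypothetical helper name, and the script assumes that clang is on your PATH and that the plugin sits in lib/ next to the bin/ directory holding clang. On systems that use lib64/ or a symlinked clang, adjust the path accordingly.

```shell
#!/bin/sh
# Sketch: report which prerequisites for gold/LTO bitcode builds seem present.
# check_prereqs is a hypothetical helper name, not part of any tool.
check_prereqs() {
    # Step 1: is the gold linker on the PATH?
    if command -v ld.gold >/dev/null 2>&1; then
        echo "ld.gold: found at $(command -v ld.gold)"
    else
        echo "ld.gold: missing"
    fi

    # Step 3: is clang on the PATH, and does its install tree hold the plugin?
    if command -v clang >/dev/null 2>&1; then
        clang_bin=$(command -v clang)
        # Assumption: the main LLVM directory is the parent of bin/.
        llvm_root=$(dirname "$(dirname "$clang_bin")")
        if [ -e "$llvm_root/lib/LLVMgold.so" ]; then
            echo "LLVMgold.so: found at $llvm_root/lib/LLVMgold.so"
        else
            echo "LLVMgold.so: missing under $llvm_root/lib"
        fi
    else
        echo "clang: missing"
    fi
}

check_prereqs
```

The script only reports what it finds. It does not run ld.gold -plugin, so step 2 still has to be checked by hand.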

Compiling LLVM from source with gold plugin

See the Getting Started page on how to build LLVM. Choose "Unix Makefiles" as the generator. To build the gold plugin, you also need to specify the path to the binutils include directory that contains plugin-api.h (you may need to build binutils as well). The commands to build LLVM should look similar to:

$ cd /path/to/llvm/
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" -DLLVM_BINUTILS_INCDIR=/path/to/binutils/include ../llvm/
$ make

Building takes a long time, but once it finishes, it should have created lib/LLVMgold.so.

Now you should be able to create whole-program bitcode files via clang. Compiling with the -flto flag produces an LLVM bitcode file instead of a native object file. You then need to pass the emit-llvm plugin option to the gold linker, so that it, too, generates bitcode instead of an executable. Test this on a trivial single-file C project by running:

$ clang -flto -c test.c
$ file test.o
test.o: LLVM IR bitcode
$ clang -flto -fuse-ld=gold -Wl,-plugin-opt=emit-llvm test.o
$ file a.out
a.out: LLVM IR bitcode
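
If the file utility on your system does not recognize bitcode, the same check can be done by hand. The sketch below relies on the fact that raw LLVM bitcode starts with the four bytes 'B', 'C', 0xC0, 0xDE. The helper name is_bitcode is hypothetical, and the wrapped bitcode variant (which has a different header) is not handled.

```shell
#!/bin/sh
# is_bitcode is a hypothetical helper: it succeeds if the given file starts
# with the raw LLVM bitcode magic number ('B' 'C' 0xC0 0xDE).
is_bitcode() {
    # Dump the first four bytes as hex and strip all whitespace.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "4243c0de" ]
}

# Classify every file named on the command line.
for f in "$@"; do
    if is_bitcode "$f"; then
        echo "$f: bitcode"
    else
        echo "$f: not bitcode"
    fi
done
```

Saved under a name of your choice, the script can be run on test.o and a.out from the session above. Both should be reported as bitcode if the previous steps worked.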

Building GNU coreutils with GNU autoconf

So now, all we have to do is override some environment variables used by the compile and link commands that the coreutils configure script generates.

Almost. The problem is that configure runs many complicated tests that actually require linking to produce an executable file. LLVM bitcode is not executable, however.

You may be tempted to change the configure script so that it tries to execute the bitcode file with lli, which is an interpreter for LLVM bitcode. Do not go down that road, though: it requires understanding and then changing many parts of an intricate, autogenerated, and not very human-friendly Bourne shell script of about 70k lines!

It is much easier to first run the configure script so that clang generates normal executable files (and the tests pass), and then slightly change the linking command in the generated Makefile to add the aforementioned plugin option.
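
The Makefile change can also be scripted instead of done in an editor. The following is a minimal sketch, assuming the generated Makefile contains a single line of the form `LDFLAGS = ...`. The helper name add_emit_llvm is hypothetical.

```shell
#!/bin/sh
# add_emit_llvm is a hypothetical helper: it appends the emit-llvm plugin
# option to the LDFLAGS assignment of an already-generated Makefile.
add_emit_llvm() {
    makefile=$1
    opt='-Wl,-plugin-opt=emit-llvm'
    # Do nothing if the option is already there, so the edit is idempotent.
    if grep -q -e "^LDFLAGS = .*$opt" "$makefile"; then
        return 0
    fi
    # Write to a temporary file and move it into place; this avoids sed -i,
    # whose syntax differs between GNU and BSD sed.
    sed "s|^LDFLAGS = .*|& $opt|" "$makefile" > "$makefile.tmp" &&
        mv "$makefile.tmp" "$makefile"
}

# Patch the Makefile given as the first argument, if any.
if [ "$#" -gt 0 ]; then
    add_emit_llvm "$1"
fi
```

Run it with the path to the Makefile only after ./configure has finished. Patching LDFLAGS earlier would bring back the problem with the configure tests.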

To recap:

  1. Set up your environment and run ./configure.

    $ export CC=clang
    $ export CXX=clang++
    $ export RANLIB=llvm-ranlib
    $ export CFLAGS=" -flto -std=gnu99 "
    $ export LDFLAGS=" -flto -fuse-ld=gold "
    $ ./configure
    
  2. Open the generated Makefile, locate the initialization of the LDFLAGS variable, and change it so that it includes the plugin option to emit LLVM bitcode. It should now look like:

    LDFLAGS =  -flto -fuse-ld=gold -Wl,-plugin-opt=emit-llvm
    
  3. At some point, there used to be an also-emit-llvm plugin option, which generated both an executable and a bitcode file. I do not see such an option in LLVM 3.7.0, however. Had it been there, we could just set LDFLAGS to its final value before calling ./configure and leave the Makefile unchanged.

    UPDATE: It turns out that this option still exists but has been renamed to save-temps. So if you want to create both normal executables and bitcode files, skip the previous step and try this instead:

    $ ...
    $ export LDFLAGS=" -flto -fuse-ld=gold  -Wl,-plugin-opt=save-temps "
    $ ./configure
    
  4. And now it’s time to compile GNU coreutils at last!

    $ make
    

    The build may fail at some point near the testing stage, but by then it should already have generated all our precious whole-program LLVM bitcode files for coreutils.

  5. Verify that this is indeed the case by sampling the src/ directory. For example:

    $ file src/who
    src/who: LLVM IR bitcode
    
  6. (Optional) To rename all the generated whole-program bitcode files (adding the standard .bc suffix) and gather them in an out/ directory, run:

    $ mkdir out/
    $ for i in $(find src/ -maxdepth 1 -type f -not -name '*.o')
    > do [[ $(file -b "$i") = "LLVM IR bitcode" ]] && cp "$i" "${i/src/out}.bc"
    > done
    

And now we are done!