Compiling Autotooled projects to LLVM Bitcode
To be able to analyze a C/C++ project with LLVM Datalog, we must generate a single whole-program LLVM bitcode file. Many such projects are built with Autotools though, so we must manually tweak the build process to do that.
In what follows, I describe the necessary steps for compiling GNU coreutils so that we end up with a single LLVM bitcode file per coreutil command (instead of an executable file), but the process should be roughly the same for other autotooled software as well.
Prerequisites
- Check that the gold linker is installed on your system. On Fedora, it is normally installed under /usr/bin/ld.gold. We could change the default linker (i.e., ld) to point to ld.gold by using alternatives, but it is not necessary.
- Verify that ld.gold accepts plugin arguments by running:
  $ ld.gold -plugin
  You should see a warning such as "-plugin: missing argument".
- Verify that your LLVM installation contains the gold plugin (look for lib/LLVMgold.so under the main LLVM directory that also contains bin/clang). This is probably not the case for LLVM prebuilt binaries, so you may need to compile LLVM from source (see next section).
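The two file checks above can be scripted. The following is only a sketch: the function names has_gold_plugin and check_gold_prereqs, and the LLVM_ROOT argument, are hypothetical helpers introduced here for illustration, and the script assumes that the plugin lives under lib/ of the directory that contains bin/clang, as described in the list.

```shell
# has_gold_plugin LLVM_ROOT
# Succeeds if LLVM_ROOT/lib/LLVMgold.so exists.
# LLVM_ROOT (hypothetical argument) is the directory that contains bin/clang.
has_gold_plugin() {
    [ -f "$1/lib/LLVMgold.so" ]
}

# check_gold_prereqs LLVM_ROOT
# Reports whether ld.gold is on PATH and whether the gold plugin is present.
# Returns non-zero if either one is missing.
check_gold_prereqs() {
    status=0
    if command -v ld.gold >/dev/null 2>&1; then
        echo "ok: ld.gold found at $(command -v ld.gold)"
    else
        echo "missing: ld.gold is not on PATH"
        status=1
    fi
    if has_gold_plugin "$1"; then
        echo "ok: $1/lib/LLVMgold.so"
    else
        echo "missing: $1/lib/LLVMgold.so"
        status=1
    fi
    return $status
}
```

After sourcing these definitions, running check_gold_prereqs /path/to/llvm/build would print one line per prerequisite. It does not replace the manual "ld.gold -plugin" check, which tests plugin support in the linker itself.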
Compiling LLVM from source with gold plugin
See the Getting Started page on how to build LLVM. Choose "Unix Makefiles" as the generator. In order to build the gold plugin, you also need to specify the path to the binutils include directory that contains plugin-api.h (you may also need to build binutils). The commands to build LLVM should be something similar to:
$ cd /path/to/llvm/
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" -DLLVM_BINUTILS_INCDIR=/path/to/binutils/include ../llvm/
$ make
Building will take a long time, but when it is over it should have created lib/LLVMgold.so.
Now you should be able to create whole-program bitcode files via clang. When you compile with the -flto flag, clang creates an LLVM bitcode file instead of a native object file. Then you need to pass the emit-llvm plugin option to the gold linker, so that it, too, generates bitcode instead of an executable. Test this scenario on a trivial single-file C project by running:
$ clang -flto -c test.c
$ file test.o
test.o: LLVM IR bitcode
$ clang -flto -fuse-ld=gold -Wl,-plugin-opt=emit-llvm test.o
$ file a.out
a.out: LLVM IR bitcode
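If the file utility on your system does not recognize bitcode, a similar check can be sketched by reading the first four bytes directly: raw LLVM bitcode starts with the magic number 'B' 'C' 0xC0 0xDE. The function name is_bitcode is a hypothetical helper introduced here; this sketch only handles raw bitcode and will not recognize bitcode that is embedded in a wrapper header.

```shell
# is_bitcode FILE
# Succeeds if FILE starts with the raw LLVM bitcode magic number
# (bytes 0x42 0x43 0xC0 0xDE, i.e. 'B' 'C' 0xC0 0xDE).
# Assumption: only raw bitcode is checked, not wrapped bitcode.
is_bitcode() {
    # Dump the first four bytes as hex and strip spaces and newlines.
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4243c0de" ]
}
```

For example, after sourcing the definition, "is_bitcode test.o && echo bitcode" should print "bitcode" for the object file produced by the clang -flto -c command above.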
Building GNU coreutils with GNU autoconf
So now, all we have to do is override some environment variables used by the compiling and linking commands that will be generated by the configure script of coreutils.
Almost. The catch is that configure runs many complicated tests that actually require linking to produce an executable file. LLVM bitcode is not executable, however.
You may be tempted to change the configure script so that it tries to execute the bitcode file with lli, which is an interpreter for LLVM bitcode. Do not go down that road though, since it requires understanding and then performing various changes all over an intricate, autogenerated, and not particularly human-friendly Bourne shell script of about 70k lines!
It is much easier to first run the configure script so that clang generates normal executable files (and the tests pass), and then slightly change the linker flags in the generated Makefile to add the aforementioned plugin option.
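The Makefile tweak can also be scripted instead of done by hand. The following is a minimal sketch, not part of the original procedure: the function name add_emit_llvm is hypothetical, and it assumes that the generated Makefile assigns LDFLAGS on a single line that starts with "LDFLAGS", with no backslash continuation.

```shell
# add_emit_llvm [MAKEFILE]
# Appends the emit-llvm plugin option to the LDFLAGS assignment of MAKEFILE
# (default: ./Makefile). A backup copy is kept as MAKEFILE.bak.
# Assumption: LDFLAGS is assigned on a single line beginning with "LDFLAGS".
add_emit_llvm() {
    makefile=${1:-Makefile}
    # Do nothing if the option is already on the LDFLAGS line.
    if grep -q -- '^LDFLAGS[[:space:]]*=.*-Wl,-plugin-opt=emit-llvm' "$makefile"; then
        return 0
    fi
    # "&" stands for the whole matched line, so the option is appended to it.
    # Lines such as AM_LDFLAGS are not touched because of the "^" anchor.
    sed -i.bak 's|^LDFLAGS[[:space:]]*=.*|& -Wl,-plugin-opt=emit-llvm|' "$makefile"
}
```

Run add_emit_llvm in the coreutils source directory after ./configure has finished and before calling make. Calling it a second time leaves the Makefile unchanged.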
To recap:
- Set up your environment and run ./configure.
  $ export CC=clang
  $ export CXX=clang++
  $ export RANLIB=llvm-ranlib
  $ export CFLAGS=" -flto -std=gnu99 "
  $ export LDFLAGS=" -flto -fuse-ld=gold "
  $ ./configure
- Open the generated Makefile, locate the initialization of the LDFLAGS variable, and change it so that it includes the plugin option to emit LLVM bitcode. It should now look like:
  LDFLAGS = -flto -fuse-ld=gold -Wl,-plugin-opt=emit-llvm
- At some point, there used to be an also-emit-llvm plugin option, which generated both an executable and a bitcode file. I do not see such an option in LLVM 3.7.0, however. Had it been there, we could just set LDFLAGS to its final value before calling ./configure, and leave the Makefile unchanged.
  UPDATE: It turns out this option still exists but has been renamed to save-temps. So if you want to create both normal executables and bitcode files, skip the previous step and try instead:
  $ ...
  $ export LDFLAGS=" -flto -fuse-ld=gold -Wl,-plugin-opt=save-temps "
  $ ./configure
- And now it’s time to compile GNU coreutils at last!
  $ make
  The build may fail at some point near the testing stage, but it should have already generated all our precious whole-program LLVM bitcode coreutils, nonetheless.
- Verify that this is indeed the case by sampling the src/ directory. E.g.:
  $ file src/who
  src/who: LLVM IR bitcode
- (Optional) To rename all the generated whole-program bitcode files (adding the standard .bc suffix) and gather them in an out/ directory, run:
  $ mkdir out/
  $ for i in `find src/ -maxdepth 1 -type f -not -name '*.o'`
  > do [[ $(file -b "$i") = "LLVM IR bitcode" ]] && cp "$i" "${i/src/out}.bc"
  > done
And now we are done!