How Dost Thy Linker Link?

Now at last we come to how the linker performs its magic. Once again the discussion divides between static and dynamic linking.

Static Linker

Static linking happens at build time. Object files are collected together; a distinct list of all function names is created, and the linker is tasked with finding a definition for each one.

Different linkers have different command-line options to support OS-specific features. This document isn't intended to teach how to use any particular linker. Our task here is to understand the principles involved, so that you can apply them to your particular situation.

The static linker needs three kinds of information:

Static linker inputs

  1. Object modules to be linked, including libraries

  2. Locations of libraries

  3. Search order

Knitting together the object modules

The static linker merges your object files into one executable. Your project's object files may refer freely (usually) to each other's functions, and the linker will match them up. It will concatenate them, compute every function's offset from the start of the executable, and replace every function reference with the actual address needed for the executable it's constructing. For library functions, definitions are copied from the library and appended to the output file (executable). The placeholder addresses left by the compiler are similarly replaced by offsets.
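
The merge-and-patch step can be pictured with a toy model. What follows is only a sketch: the module records (name, size, defines, refs) and the file names main.o and login.o are hypothetical, and a real linker works on sections and relocation entries rather than Python dictionaries.

```python
# Toy model of static linking: lay the object modules end to end,
# record each function's offset, then patch every reference.
# The record layout and the sample modules are hypothetical.

def link(modules):
    symbols = {}   # function name -> offset from start of executable
    bases = {}     # module name -> where that module starts
    base = 0
    for mod in modules:
        bases[mod["name"]] = base
        for func, local_offset in mod["defines"].items():
            symbols[func] = base + local_offset
        base += mod["size"]

    # Second pass: each placeholder left by the compiler is replaced
    # with the final offset of the function it refers to.
    patched = []
    for mod in modules:
        for site, func in mod["refs"].items():
            patched.append((bases[mod["name"]] + site, symbols[func]))
    return symbols, patched, base


main_o = {"name": "main.o", "size": 0x40,
          "defines": {"main": 0x00},
          "refs": {0x10: "get_login"}}
login_o = {"name": "login.o", "size": 0x30,
           "defines": {"get_login": 0x08},
           "refs": {}}

symbols, patched, total_size = link([main_o, login_o])
```

Here get_login lands at 0x48 (0x40 bytes of main.o plus its own offset of 0x08), and the call site at 0x10 is patched to point there.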

Specifying libraries

An application programmer using FreeTDS™ will need to mention the name of the FreeTDS™ library being used. Failure to do so will provoke the dreaded undefined reference linker error:

Example A.1. Missing library name

$ gcc -o bsqldb bsqldb.o  
bsqldb.o: In function `get_login':
../../../src/apps/bsqldb.c:816: undefined reference to `dblogin'
../../../src/apps/bsqldb.c:823: undefined reference to `dbsetlname'
../../../src/apps/bsqldb.c:874: undefined reference to `dbsetlname'
../../../src/apps/bsqldb.c:884: undefined reference to `dbsetlname'
../../../src/apps/bsqldb.c:889: undefined reference to `dbsetlname'
…
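
Why the error appears can be sketched in a few lines. This is a toy model, not how any particular linker is implemented: the dictionaries standing in for bsqldb.o and the sybdb library are hypothetical, and only the symbol names are taken from the error output above.

```python
# Toy model: a reference is "undefined" when no object or library
# handed to the linker supplies a definition for it.

def undefined_references(objects, libraries):
    defined = set()
    for unit in objects + libraries:
        defined.update(unit["defines"])
    missing = []
    for obj in objects:
        for name in obj["refs"]:
            if name not in defined and name not in missing:
                missing.append(name)   # report each name once
    return missing


# Hypothetical stand-ins for the real files.
bsqldb_o = {"defines": ["get_login"],
            "refs": ["dblogin", "dbsetlname", "dbsetlname"]}
sybdb = {"defines": ["dblogin", "dbsetlname"]}

without_library = undefined_references([bsqldb_o], [])
with_library = undefined_references([bsqldb_o], [sybdb])
```

Without the library the model reports dblogin and dbsetlname as missing; once the library is named, the list is empty.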


Finding libraries

Specifying the library is necessary but may be insufficient. The linker may need to be told where to look for the library. This is often the case for the application programmer using FreeTDS™ because the FreeTDS™ libraries may be installed in a location not on the linker's default search path. Linkers are usually pretty blunt about missing libraries:

Example A.2. Library not found

$ gcc -o bsqldb bsqldb.o  -l sybdb
ld: cannot find -lsybdb
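
The lookup behind that message can be approximated as follows. This is a sketch under assumptions: it supposes the linker turns a library name into lib<name>.so or lib<name>.a and tries each directory on its search path in turn, which is how the GNU linker is documented to behave; the directory names are created in a temporary location purely for the demonstration.

```python
import os
import tempfile

def find_library(name, search_dirs, suffixes=(".so", ".a")):
    """Return the first lib<name><suffix> found on the search path, or None."""
    for directory in search_dirs:
        for suffix in suffixes:
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for a default directory and an extra one.
    default_dir = os.path.join(root, "usr_lib")
    extra_dir = os.path.join(root, "usr_local_lib")
    os.makedirs(default_dir)
    os.makedirs(extra_dir)
    open(os.path.join(extra_dir, "libsybdb.so"), "w").close()

    # Only the default directory is searched: the library is not found.
    not_found = find_library("sybdb", [default_dir])
    # The extra directory is added to the path: the library is found.
    found = find_library("sybdb", [extra_dir, default_dir])
    found_name = os.path.basename(found)
```

With gcc, an extra directory is conventionally supplied with the -L option, for example -L/usr/local/lib ahead of -lsybdb. Check your own linker's documentation for its equivalent.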


Order matters. Linkers tend to be fussy about library search order, some more than others. It's good practice to tell the linker to search project libraries first, third-party libraries (e.g. iconv or kerberos) next, and finally system libraries.
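
One reason order matters is that many traditional Unix linkers scan each library only once, at the point it appears on the command line, and take from it only what is needed at that moment. The toy model below assumes that single-pass behavior; some linkers rescan, so treat it as an illustration only. The dependency of dblogin on iconv_open is a hypothetical chosen for the example.

```python
# Toy single-pass linker. Each item is (kind, table), where table maps a
# symbol to the symbols it calls. Objects contribute everything they define;
# libraries contribute only symbols that are undefined when they are scanned.

def single_pass_link(items):
    defined, undefined = set(), set()
    for kind, table in items:
        if kind == "object":
            wanted = list(table)
        else:
            wanted = [name for name in table if name in undefined]
        while wanted:
            name = wanted.pop()
            if name in defined:
                continue
            defined.add(name)
            undefined.discard(name)
            for ref in table[name]:
                if ref in defined:
                    continue
                undefined.add(ref)
                # A library may satisfy its own members' references.
                if kind == "library" and ref in table:
                    wanted.append(ref)
    return sorted(undefined)


app = ("object", {"main": ["dblogin"]})
libsybdb = ("library", {"dblogin": ["iconv_open"]})   # hypothetical dependency
libiconv = ("library", {"iconv_open": []})

good_order = single_pass_link([app, libsybdb, libiconv])
bad_order = single_pass_link([app, libiconv, libsybdb])
```

In the good order nothing is left undefined. In the bad order iconv is scanned before anything needs it, so iconv_open is still unresolved when the link ends.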

Dynamic Linker

The dynamic linker — also known as the runtime linker — is, like the rest of dynamic linking, more complicated than its static counterpart. Whereas it's impossible even to generate an executable with missing static function references, an executable that uses dynamic libraries depends on the runtime environment to have its references satisfied.

When a dynamically linked application is launched, the OS invokes the runtime linker to resolve any undefined references. Much as the static linker does, the runtime linker consults a list of dynamic libraries along its configured search path. The names of the libraries to search for are embedded in the executable. Sometimes, not always, the search path is found in the executable too. Usually any embedded path can be overridden.

Information in the executable

Exactly what information is in the executable and how to display it depends on the format of the executable. Different OSes use different formats and most Unix derivatives actually support at least two. The most commonly encountered format for the FreeTDS™ programmer is the ELF format. In the interest of your time and mine, that's the one we'll examine here.

The GNU binutils utility readelf displays the information in the executable that is input to the runtime linker:

$ readelf -d src/apps/.libs/bsqldb
Dynamic section at offset 0x6028 contains 20 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libsybdb.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.12]
 0x000000000000000f (RPATH)              Library rpath: [/usr/pkg/lib:/usr/local/lib]
…

What is this telling us? First, the bsqldb executable uses three shared libraries, namely sybdb for DB-Library, pthread for POSIX threads, and c, the C standard library. The runtime linker is going to have to find those somewhere, and it's going to use only those libraries to resolve unresolved references in the executable.

Second, readelf displays the RPATH. The runtime linker searches for the required dynamic libraries in the directories listed in the RPATH, if extant.

The RPATH is placed in the executable by the static linker. It can be thought of as a hint from the application builder to the system administrator. If an executable is built with an appropriate RPATH, the runtime linker will have all the information it needs to find the required libraries.
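
If you want to check these entries from a script, the readelf output is easy to pick apart. The sketch below parses the sample lines shown earlier; the regular expression is an assumption about the output layout and may need adjusting for other readelf versions. It also accepts the RUNPATH tag, which newer toolchains may emit in place of RPATH.

```python
import re

# Sample lines copied from the readelf -d output shown earlier.
READELF_OUTPUT = """\
 0x0000000000000001 (NEEDED)             Shared library: [libsybdb.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.12]
 0x000000000000000f (RPATH)              Library rpath: [/usr/pkg/lib:/usr/local/lib]
"""

# Tag name in parentheses, value in square brackets.
ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+.*\[(.*)\]")

def parse_dynamic_section(text):
    """Return (needed libraries, embedded search directories)."""
    needed, search_dirs = [], []
    for line in text.splitlines():
        match = ENTRY.search(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "NEEDED":
            needed.append(value)
        else:
            search_dirs.extend(value.split(":"))
    return needed, search_dirs


needed, rpath = parse_dynamic_section(READELF_OUTPUT)
```

When linking through gcc with the GNU linker, an RPATH is commonly embedded with an option of the form -Wl,-rpath,DIR; other linkers spell it differently.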

Information outside the executable

Note

Runtime linkers differ. The advice and observations that follow apply in many situations, but not all. The best way to know how yours works is to consult your system's documentation. RTFM!

The NetBSD and GNU linkers both (as of this writing on machines used by the author) honor a configuration file and environment variables. They also have compiled-in default search locations. At a minimum, the default is /usr/lib. Sometimes a configuration file extends this to /usr/local/lib.

The primary environment variable is LD_LIBRARY_PATH. On some systems this overrides the RPATH in the executable. On others it doesn't. Where ineffective, specific libraries (not their paths) can be forcibly used with LD_PRELOAD.
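
The difference between the two behaviors can be modeled as a question of which list of directories is consulted first. This is a deliberately simplified sketch: the env_overrides_rpath flag is a hypothetical switch standing in for "how your system behaves", and /opt/freetds/lib is an invented directory. Real runtime linkers have more rules than this.

```python
def search_order(rpath, ld_library_path, defaults, env_overrides_rpath):
    """Return the directories a model runtime linker would try, in order."""
    env_dirs = [d for d in ld_library_path.split(":") if d]
    if env_overrides_rpath:
        ordered = env_dirs + rpath
    else:
        ordered = rpath + env_dirs
    ordered = ordered + defaults

    # Drop repeats while keeping the first occurrence of each directory.
    result = []
    for directory in ordered:
        if directory not in result:
            result.append(directory)
    return result


rpath = ["/usr/pkg/lib", "/usr/local/lib"]      # from the readelf example
env = "/opt/freetds/lib"                        # hypothetical LD_LIBRARY_PATH
defaults = ["/usr/lib"]

env_first = search_order(rpath, env, defaults, env_overrides_rpath=True)
rpath_first = search_order(rpath, env, defaults, env_overrides_rpath=False)
```

Whichever directory holding a matching library comes first in the list wins, which is why the same executable can pick up different libraries on different systems.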

Displaying what the runtime linker will do

The ldd(1) utility shows which dynamic libraries an executable requires and where, if at all, they'll be found:

$ ldd $(command -v bsqldb)
/usr/local/bin/bsqldb:
        -lc.12 => /usr/lib/libc.so.12
        -lpthread.0 => /usr/lib/libpthread.so.0
        -lsybdb.5 => /usr/local/lib/libsybdb.so.5

Important to understand: ldd is not figuring out this information by itself. It just reports the results of its interrogation of the runtime linker. As the configuration of the runtime linker is changed, so changes the output of ldd.
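
A deployment script can use the ldd report to catch missing libraries before the application is ever launched. The sketch below assumes that an unresolved library appears with the text "not found" after the arrow, as GNU ldd prints it; other systems may word it differently. The second sample is the output above, altered by hand to show a failure.

```python
def unresolved_libraries(ldd_output):
    """Return the names ldd reports as not found."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue                      # header line or other text
        name, _, location = line.partition("=>")
        if location.strip().startswith("not found"):
            missing.append(name.strip())
    return missing


# The report shown above: everything resolves.
LDD_OK = """\
/usr/local/bin/bsqldb:
        -lc.12 => /usr/lib/libc.so.12
        -lpthread.0 => /usr/lib/libpthread.so.0
        -lsybdb.5 => /usr/local/lib/libsybdb.so.5
"""

# Hypothetical report for a system where the FreeTDS library is missing.
LDD_BROKEN = LDD_OK.replace("/usr/local/lib/libsybdb.so.5", "not found")

all_found = unresolved_libraries(LDD_OK)
some_missing = unresolved_libraries(LDD_BROKEN)
```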

A Word about Windows®

Windows executables use PE format (derived from the older COFF format), which has no provision for an RPATH. The runtime linker searches the PATH instead, after some built-in locations that usually include the current working directory. Neither ldd nor any similar utility is included in the basic product.

It has been said that Unix is for programmers and Windows is for users, and perhaps that roughly describes the intention. But the Unix features listed above (RPATH and ldd), as well as a canonical filesystem hierarchy and dynamic library versioning, all promote a better user experience. Because of them, the problem of DLL conflicts in Windows hardly exists in Unix. Yet they are neither new nor secret nor patented nor complicated; Microsoft could have adopted them years ago (as Apple finally did). We therefore know that the 20-year-old phenomenon known as “DLL hell” is not inevitable, but a choice signifying nothing so much as Microsoft's indifference to its customers. More recently, Microsoft added support for configuring different search paths and other attributes through Application Manifests and Application Configuration Files.

Advice for the lazy

To avoid tinkering with your runtime linker, embed an RPATH in your executable commensurate with its intended runtime environment. If ldd doesn't show the libraries you want, or some are not found, use readelf to see which libraries are used and the RPATH. Relink with a better RPATH if needed.

When testing with new libraries, use LD_PRELOAD to override the default, taking care that the semantics haven't changed.