First, why would we want to do that? Most architectures have a single popular ABI accepted by the kernel and supported by the binutils, on Linux this is usually the System V R4 defined ABI. This is the case of i386. X86-64 also has a single standard ABI based on the i386 ABI but it’s not a System V standard because System V doesn’t seem to have one for x86-64 yet. The ARM case is different because there are more than one ABIs in use and you can get a mismatch when pairing user-space and kernel images or libraries for a program. The older and unstandardised one is called OABI and Schwartz can (attempt to) translate between OABI calls issued by an OABI-compiled program and whatever ABI the host uses. This will be enabled automatically if an OABI executable is detected, no command line switch needed.
Why it seems this hasn’t been done before? Because it’s non-trivial. Currently people resort to using an entire OABI rootfs sitting in a subdirectory of the host rootfs and chrooting to it, if they need to run a OABI binary in a system that uses EABI.
Why is it non-trivial and how does Schwartz do it? In a nutshell if an executable is compiled with a different ABI than the host, we need to translate everything that’s being passed between the program and the libraries it uses (this is assuming the executable is dynamically linked and issues no syscalls directly – otherwise only the syscalls would have to be translated but that cannot be done in user-space so we’re not concerned with this) and the format of this interaction is precisely what ABIs define. Two types of interaction occur that I know of: through data and control. The control is always passed to and from libraries in the same way, through jumps aka. branches, and there isn’t any space for differences between ABIs so we’ll concentrate on the data. Data is passed on various occasions. I will divide all the data interaction into three parts:
- static chunks of data shared between program and library. This means mainly global variables in terms of a C program or other. The format of a variable depends on it’s type and the ABI. The most basic types are encoded always the same way, while data types which are constructed of sub-elements, like structs, have a format governed by the ABI. The ABI usually specifies how elements are packed inside an object and there may be important differences between ABIs. Fortunately global objects are not usually shared by libraries, and those that are, are almost always simple types, so we don’t perform any translation. In addition it would be very difficult because we would have to react to every access to such variables, and in some cases completely impossible, for example for C union types, because the data has more than one interpretation in such cases, and we can’t tell which interpretation is used in which access.
- on program entry. Entry happens only once, when the control is passed to the program at start and is accompanied by some data being passed too (for example the command line arguments). This part is easy because we can have a separate entry for each ABI, and some ABIs just don’t specify any requirements for the entry point (this is the case of OABI and EABI, and the Linux implementation is exactly identical for both of them). So currently there’s only one main() call per architecture in Schwartz.
- on function calls. This is responsible for the biggest part of ABI translation in Schwartz. A function call between a program and a library is accompanied by data being passed both ways, from caller to callee in call arguments, and from callee to caller in the return value. We will see below that a library can be both a callee and a caller, for different functions. Function parameters as well as their return values can be passed differently depending on the ABI. The ABI usually specifies when and which parameter values (or parts of them) are passed in registers (of the CPU or FPU) and which are marshalled on stack, and possibly which are passed as pointers. They can also have different types, ranging from simple to compound, where the packing is important again, as it was in 1.
How does Schwartz handle function calls to different ABIs? We simply make a wrapper for every library function that we suspect may be used, and we resolve function symbols to our wrappers instead of the original functions. Again this is not a generic solution if we want to load arbitrary executables but practically is good enough. If there is an executable that uses symbols we haven’t a wrapper for, we can easily add information about the new function and recompile. The information is generated automatically based on system headers and a list of symbol names (and the list is extracted automatically from a list of executables). Such wrapper will accept parameters in the program’s ABI format, adapt them to the library ABI if needed and call the real function passing the same parameters but in the library’s ABI again. The same has to be done with the return value, just in the reverse order.
But here’s the trick: a function pointer is also a data type, so it can be passed as a parameter or a return value from a library function, and we have to handle it very carefully. Example library functions that take a function pointer as parameter are signal(), qsort() or __libc_start_main() (specified in Linux Standard Base). Example function that returns a function pointer is signal() again. So how do we handle translation of the function pointer data type? We have to generate a wrapper for every value passed that is a function pointer, and since there may be different such values passed in successive calls to the same function as parameters, we have to do it dynamically in the run-time, for every value separately. Fortunately there’s only a finite number of such values because the only valid values are those that point at functions in the program (plus optionally NULL, which we pass intact) and there is a finite number of functions, they aren’t generated dynamically. Now the wrappers will be of two types: those for parameters and those for return values. To see the difference between these two, let’s look at what the callee can do with the value it is passed in a parameter and a value a caller gets when it is returned from a call. It can do two things:
- It can make a call to the function pointed to by the function pointer. If we’re a callee and we got a function pointer in a parameter we will want to make the call in our ABI, while the function was passed from the caller so it expects parameters in the caller’s ABI, so we need translation again. But this time the callee (we) becomes a caller and the target of the call is a function passed from the other ABI, so the translation needs to be in reverse direction. If we are the library and the caller was the program, we now need a wrapper that translates from library ABI to program’s ABI. The converse case is easier: we’re now the caller, we called a function and it returned another function pointer. The function which is pointed at will expect parameters in the callee’s ABI so the translation occurs in the “same direction” as before.
- It can remember the value somewhere and the value can later be returned or passed as a parameter back to the other side. Since the function pointer is a value we got in return or in a parameter, we know that it is already wrapped appropriately by Schwartz. But we are now passing it back to the other side, precisely where it came from. If we follow the logic from 1. we will be unnecessarily wrapping it again (wrapping the wrapper) in a translator of opposite direction. Schwartz has to notice the double wrapping and “annihilate” the two translators and just pass the original pointer, in order to inhibit the possibility of DoS’ing ourselves by generating an infinite serie of wrappers. To see this better here’s an example of when this happens in a C piece:
sighandler_t *original_handler; /* Function pointer */ ... /* Let's setup a handler for SIGUSR1 */ original_handler = signal(SIGUSR1, &my_sigusr1_handler); /* External function is being returned, it is wrapped in an ABI translator, so that we can safely call it (but we don't in this example). */ ... /* Let's restore the original handler */ signal(SIGUSR1, original_handler); /* The wrapped external function is being passed as parameter, normally it would be wrapped again so that the callee can safely call it. But instead we "unwrap" it and we get the same effect. */
The bottom line in 1. is that if we decide to do ABI translation from ABI X to Y, we also have to translate from Y to X occasionally, so they are tied together, and we have to be able to do both things dynamically. In 2. the bottom line is that we need to cache pointers to untranslated functions also. If we add to this the fact that pointers can point to functions which also have function pointers as parameters or return types (see man xdr_union(3)), and that struct or array elements can be function pointers too, and that there can be a variable number of parameters of unknown types, we get a pretty complex task.
There’s another case of functions like dlsym() that return a-void-pointer-but-we-know-it’s-a-lie, for which we need a totally custom translator, but this is more easily doable.