4: OABI spec

February 10, 2008 by balrog

Bad news, I’m gonna talk about OABI again.  I just want to write down what I found about it before I forget, so that it gets indexed and a person needing to know something can find it on google.

I was told by a gcc hacker that it was based on the APCS32 ABI whose specification can be found here.  The specification is however very vague about some parts, and other parts are simply different from OABI, so I’ll point these out and refer to APCS32 in other places, and compare with EABI also.

Control arrival. One thing that is not specified at all in either APCS32 or EABI is the program entry point requirements. These may be system-specific but the Linux Standard Base has no mention of ARM entry point either.  The only reference thus is the Linux kernel code. Qemu-arm code is based on it. The requirements don’t seem to have changed between OABI and EABI and they’re also pretty much identical to the x86 entry point requirements which can be found in the SysVr4 docs, modulo some of the tags put on the stack before entry. They can be found in Linux or qemu and I’m not gonna list them here.

APCS Variants.  The APCS32 document specifies 16 incompatible variants based on four different properties that can have two possible values each.  Linux OABI is the 32-bit case (as opposed to 26-bit), with implicit stack-limit checking (as opposed to explicit checks done in software), floating-point arguments/return values passed in integer registers and on the stack (i.e. FPU registers are not used for that) and is non-reentrant (except libraries).

Argument passing.  Registers have the same meanings as in APCS32, with the first four words of the argument list passed in registers and the rest on the stack, with the possibility of a single argument being split between the two.

Floating-point values.  There’s no mention of their encoding in APCS32 but it seems to be the standard IEEE 754 encoding - with a small caveat… Doubles and long doubles have their first 32-bit word swapped with the second word, when compared to EABI or x86. The same applies to both of the individual doubles inside a double _Complex and inside long double _Complex.
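The word swap described above can be sketched in C as follows. This is only an illustration: the name swap_double_words is made up for this example, and it assumes a double is 64 bits wide and made of two 32-bit words. It works on the in-memory image rather than on a double value, so that the half-converted bit pattern is never loaded into a floating-point register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Exchange the two 32-bit words of a 64-bit double stored at `image`.
   The operation is its own inverse, so the same helper (a hypothetical
   name, not taken from any real code base) converts in both directions. */
static void swap_double_words(void *image)
{
    uint32_t words[2];
    uint32_t tmp;

    assert(sizeof(double) == sizeof words);

    memcpy(words, image, sizeof words);
    tmp = words[0];
    words[0] = words[1];
    words[1] = tmp;
    memcpy(image, words, sizeof words);
}
```

For a double _Complex the helper would simply be applied twice, once to the real part and once to the imaginary part.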

Return values.  Here we have the same three variants as in APCS32: no return value, return value in register(s) and return value in an implicit pointer passed as arg0.  There is a very tricky difference from APCS32 though: when the second variant is chosen and when the third one is.  APCS32 recognises something it calls simple types, which it defines as anything that fits in four bytes.  Anything bigger is returned through a pointer.  In OABI there seems to be a similar idea except the simple type is defined differently: all C basic types are simple, even if they exceed the word width.  In addition to this a struct seems to be considered simple as long as it has only a single member whose type is simple (possibly a struct) and not larger than one word.  Arrays are never simple and unions are simple if all their fields are simple.

The $100 question is how do you return an object of a simple type wider than one word in a register?  APCS32 allows only r0 to be used for that, but gcc doesn’t mind using also r1, r2 and r3.  So a long long int, double, long double, int _Complex or a float _Complex will all be returned in the r0-r1 pair, while double _Complex and long double _Complex get returned in r0-r3.

Alignment.  Pointers have to be word-aligned only and this applies also to the stack pointer on call.  This is nothing special but if you want to mix OABI and EABI it becomes a major caveat because EABI requires the stack to be aligned on 8 bytes in inter-linking-unit calls.  If you forget about it and call an EABI function from OABI context you will get the strangest and extremely hard to debug results, such as glibc sprintf() returning a wrong value, which can be very painful.
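Anything that calls into EABI code from an OABI context therefore has to realign the stack first. Here is a minimal sketch of just the arithmetic, with hypothetical helper names; the real adjustment has to be done in assembly around the call, and the sketch assumes a stack that grows towards lower addresses, as it does on ARM Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if `sp` is a multiple of `alignment` (a power of two). */
static int stack_is_aligned(uintptr_t sp, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (sp & (alignment - 1)) == 0;
}

/* Round `sp` down to the next multiple of `alignment`.  Rounding down
   is the safe direction on a descending stack: it only skips a few
   unused bytes and never touches the caller's data above `sp`. */
static uintptr_t align_stack_down(uintptr_t sp, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}
```

A word-aligned OABI stack pointer such as 0x1004 passes the 4-byte check but fails the 8-byte one, and align_stack_down(0x1004, 8) yields 0x1000.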

Another change that happened at the same time as the OABI to EABI switch in Linux was a switch from setjmp/longjmp based C++ exceptions (the generic, cross-platform way) to a new, faster model (EABI does specify how exceptions should be handled and how stack unwinding works, while APCS doesn’t - this aspect is known as C++ personality across the docs and code).  I am not describing it here.

If something of the above is wrong, please lemme know.

3: Getting gllin to run

January 27, 2008 by balrog

I was going to make a small trip this weekend but I missed my plane and have to wait until next week. But that means I already have a good excuse for not spending the weekend studying for this week’s exams and I have finally put the time into making gllin behave under Schwartz.

Gllin is a closed-source driver for the Global Locate (now Broadcom?) GPS known as Hammerhead and it’s been said it didn’t work when the folks compiled it for ARM EABI (i.e. what is used on most ARMs currently) so they only released the OABI binary (OABI being the ad-hoc ABI that was used on Linux until ARM came up with a standard ABI and hired people to implement it). So the downloadable gllin package comes with an OABI rootfs which will run under chroot if you have OABI support in your kernel. It seemed wrong to me to have a second rootfs on my phone to run a single program, and it has several other drawbacks.

With the Schwartz loader/linker you can run OABI-compiled programs natively on Linux systems that use different ABIs. This is achieved through translation of library calls that I mentioned previously. Schwartz is by no means complete, and more than anything it’s a proof-of-concept, but it seems to be usable and today my Neo1973 had an actual 3d fix and gave me real coordinates as well as satellite time/date and other info. I took my Neo for an excursion to the shopping mall (not so much to show off, but) to make my first GPS trace for OpenStreetMap. It ran quite stably for the whole 2h and I uploaded the trace here. So here’s how to use it.

Download the schwartz binary from here or here (minimal version). The sources are in this git tree, but building them is not exactly straight-forward. Upload the file to your Neo1973 (or qemu-neo1973). Upload also the gllin binary if you don’t have it there already. In the openmoko package the binary is named gllin.real because gllin is a wrapper script that runs the whole chroot thing. You only need the “.real” binary. You can also safely leave out OABI support from your kernel. Next, make the named pipe for your NMEA data, same way the openmoko package does. After that we’re ready to run gllin and then your favourite gps software.

 $ mknod /tmp/nmeaNP p
 $ cat /tmp/nmeaNP | gzip >> /home/root/gps.gz &
 $ ./ld4 --depnofail --weakdummies --settargetname --noinit gllin -low 5
 $ ./ld4 --depnofail --weakdummies --settargetname --noinit gllin -periodic 2

You can modify the scripts from the package to do all that. ld4 is quite verbose and will print lots of stuff to the console, which just shows how far it is from completeness. The minimal ld4 differs from the full binary in that the “strace” code is not compiled in. With the full binary, if you append --trick-strace to the cmdline options you will get a strace-like (but prettier!) log of all functions being called and their parameters. This may potentially be useful for the folks reverse engineering the Hammerhead protocol but I’m not really sure. In the ld4 output you can also see a lot of debugging and other messages that gllin doesn’t normally print out. I have not noticed any anomalies when running gllin under Schwartz but it’s totally possible that the floating-point precision is reduced or something else is broken. gllin is a pretty tough test case for the ABI translation thing for various reasons: all the floating-point arithmetic, heavy usage of memory/files/sockets, C++ libraries, C++ exceptions, real-time constraints and more.

Among other things, schwartz enables you to run gllin without root privileges (chroot normally requires those). Another interesting thing to do is to compare the strace (the real traditional strace) output of gllin running under a chroot with an OABI-compiled libc, and the strace output of the same gllin running under schwartz and using an EABI libc. You’ll see two different sequences of syscalls being made, but having pretty much the same end effect.

I probably won’t have time to hack schwartz further but improvements from others are welcome. I just wish I had the thing running earlier - ironically I already have a GTA02 on my desk, and GTA02 has a different GPS chip in it which needs no driver on the OS side. There’s very little time left till the mass-production and selling of GTA02 starts and gllin slides into oblivion. (It seems that the TomTom Go’s using the same or a similar driver though).

2: ABI translation

January 4, 2008 by balrog

First, why would we want to do that? Most architectures have a single popular ABI accepted by the kernel and supported by the binutils; on Linux this is usually the ABI defined by System V R4. This is the case for i386. X86-64 also has a single standard ABI based on the i386 ABI but it’s not a System V standard because System V doesn’t seem to have one for x86-64 yet. The ARM case is different because there is more than one ABI in use and you can get a mismatch when pairing user-space and kernel images, or libraries with a program. The older and unstandardised one is called OABI and Schwartz can (attempt to) translate between OABI calls issued by an OABI-compiled program and whatever ABI the host uses. This will be enabled automatically if an OABI executable is detected, no command line switch needed.

Why does it seem this hasn’t been done before? Because it’s non-trivial. Currently people resort to using an entire OABI rootfs sitting in a subdirectory of the host rootfs and chrooting to it, if they need to run an OABI binary in a system that uses EABI.

Why is it non-trivial and how does Schwartz do it? In a nutshell if an executable is compiled with a different ABI than the host, we need to translate everything that’s being passed between the program and the libraries it uses (this is assuming the executable is dynamically linked and issues no syscalls directly - otherwise only the syscalls would have to be translated but that cannot be done in user-space so we’re not concerned with this) and the format of this interaction is precisely what ABIs define. Two types of interaction occur that I know of: through data and control. The control is always passed to and from libraries in the same way, through jumps aka. branches, and there isn’t any space for differences between ABIs so we’ll concentrate on the data. Data is passed on various occasions. I will divide all the data interaction into three parts:

  1. static chunks of data shared between program and library. This means mainly global variables in terms of a C program or other. The format of a variable depends on its type and the ABI. The most basic types are always encoded the same way, while data types which are constructed of sub-elements, like structs, have a format governed by the ABI. The ABI usually specifies how elements are packed inside an object and there may be important differences between ABIs. Fortunately global objects are not usually shared by libraries, and those that are, are almost always simple types, so we don’t perform any translation. In addition it would be very difficult because we would have to react to every access to such variables, and in some cases completely impossible, for example for C union types, because the data has more than one interpretation in such cases, and we can’t tell which interpretation is used in which access.
  2. on program entry. Entry happens only once, when the control is passed to the program at start and is accompanied by some data being passed too (for example the command line arguments). This part is easy because we can have a separate entry for each ABI, and some ABIs just don’t specify any requirements for the entry point (this is the case of OABI and EABI, and the Linux implementation is exactly identical for both of them). So currently there’s only one main() call per architecture in Schwartz.
  3. on function calls. This is responsible for the biggest part of ABI translation in Schwartz. A function call between a program and a library is accompanied by data being passed both ways, from caller to callee in call arguments, and from callee to caller in the return value. We will see below that a library can be both a callee and a caller, for different functions. Function parameters as well as their return values can be passed differently depending on the ABI. The ABI usually specifies when and which parameter values (or parts of them) are passed in registers (of the CPU or FPU) and which are marshalled on stack, and possibly which are passed as pointers. They can also have different types, ranging from simple to compound, where the packing is important again, as it was in 1.

How does Schwartz handle function calls to different ABIs? We simply make a wrapper for every library function that we suspect may be used, and we resolve function symbols to our wrappers instead of the original functions. Again this is not a generic solution if we want to load arbitrary executables but practically is good enough. If there is an executable that uses symbols we haven’t a wrapper for, we can easily add information about the new function and recompile. The information is generated automatically based on system headers and a list of symbol names (and the list is extracted automatically from a list of executables). Such wrapper will accept parameters in the program’s ABI format, adapt them to the library ABI if needed and call the real function passing the same parameters but in the library’s ABI again. The same has to be done with the return value, just in the reverse order.
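To make the symbol-to-wrapper idea concrete, here is a minimal sketch of such a lookup. Everything in it is hypothetical: the table layout, the names resolve_to_wrapper, wrap_strlen and wrap_abs are invented for this example and are not taken from the Schwartz sources, and the two wrappers do no real translation because their arguments have the same representation in both ABIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Generic function pointer type used to store wrappers of any signature. */
typedef void (*generic_fn)(void);

struct wrapper_entry {
    const char *symbol;   /* name the program asks the linker for */
    generic_fn wrapper;   /* what the name gets resolved to instead */
};

/* Hypothetical wrappers: take arguments in the program's ABI, call the
   real function in the library's ABI, hand the result back. */
static size_t wrap_strlen(const char *s) { return strlen(s); }
static int wrap_abs(int v) { return abs(v); }

static const struct wrapper_entry wrapper_table[] = {
    { "abs",    (generic_fn)wrap_abs },
    { "strlen", (generic_fn)wrap_strlen },
};

/* Return the wrapper registered for `name`, or NULL if there is none,
   in which case the table has to be extended and the linker rebuilt. */
static generic_fn resolve_to_wrapper(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof wrapper_table / sizeof wrapper_table[0]; i++)
        if (strcmp(wrapper_table[i].symbol, name) == 0)
            return wrapper_table[i].wrapper;
    return NULL;
}
```

The caller casts the result back to the function's real type before using it, which is the only thing C allows you to do with a converted function pointer.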

But here’s the trick: a function pointer is also a data type, so it can be passed as a parameter or a return value from a library function, and we have to handle it very carefully. Example library functions that take a function pointer as parameter are signal(), qsort() or __libc_start_main() (specified in Linux Standard Base). Example function that returns a function pointer is signal() again. So how do we handle translation of the function pointer data type? We have to generate a wrapper for every value passed that is a function pointer, and since there may be different such values passed in successive calls to the same function as parameters, we have to do it dynamically in the run-time, for every value separately. Fortunately there’s only a finite number of such values because the only valid values are those that point at functions in the program (plus optionally NULL, which we pass intact) and there is a finite number of functions, they aren’t generated dynamically. Now the wrappers will be of two types: those for parameters and those for return values. To see the difference between these two, let’s look at what the callee can do with the value it is passed in a parameter and a value a caller gets when it is returned from a call. It can do two things:

  1. It can make a call to the function pointed to by the function pointer. If we’re a callee and we got a function pointer in a parameter we will want to make the call in our ABI, while the function was passed from the caller so it expects parameters in the caller’s ABI, so we need translation again. But this time the callee (we) becomes a caller and the target of the call is a function passed from the other ABI, so the translation needs to be in reverse direction. If we are the library and the caller was the program, we now need a wrapper that translates from library ABI to program’s ABI. The converse case is easier: we’re now the caller, we called a function and it returned another function pointer. The function which is pointed at will expect parameters in the callee’s ABI so the translation occurs in the “same direction” as before.
  2. It can remember the value somewhere and the value can later be returned or passed as a parameter back to the other side. Since the function pointer is a value we got in return or in a parameter, we know that it is already wrapped appropriately by Schwartz. But we are now passing it back to the other side, precisely where it came from. If we follow the logic from 1. we will be unnecessarily wrapping it again (wrapping the wrapper) in a translator of opposite direction. Schwartz has to notice the double wrapping and “annihilate” the two translators and just pass the original pointer, in order to inhibit the possibility of DoS’ing ourselves by generating an infinite series of wrappers. To see this better here’s an example of when this happens in a piece of C:
    sighandler_t original_handler;          /* Function pointer */
    ...
    /* Let's setup a handler for SIGUSR1 */
    original_handler = signal(SIGUSR1, &my_sigusr1_handler);
                                            /* External function is being returned,
                                               it is wrapped in an ABI translator,
                                               so that we can safely call it (but
                                               we don't in this example).  */
    ...
    /* Let's restore the original handler */
    signal(SIGUSR1, original_handler);      /* The wrapped external function is
                                               being passed as parameter, normally
                                               it would be wrapped again so that the
                                               callee can safely call it.  But
                                               instead we "unwrap" it and we get the
                                               same effect.  */

The bottom line in 1. is that if we decide to do ABI translation from ABI X to Y, we also have to translate from Y to X occasionally, so they are tied together, and we have to be able to do both things dynamically. In 2. the bottom line is that we need to cache pointers to untranslated functions also. If we add to this the fact that pointers can point to functions which also have function pointers as parameters or return types (see man xdr_union(3)), and that struct or array elements can be function pointers too, and that there can be a variable number of parameters of unknown types, we get a pretty complex task.
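The caching and “annihilation” described above can be sketched like this. It is a toy model under loud assumptions: a real wrapper is a piece of generated code, while here a wrapper is just a record in a table and the “function pointers” are opaque addresses; the names translate_fn_pointer, struct thunk and enum crossing are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Which side a function pointer is being handed to. */
enum crossing { TOWARDS_PROGRAM, TOWARDS_LIBRARY };

/* Stand-in for a generated translator: remembers the pointer it wraps
   and the crossing it was created for. */
struct thunk {
    void *target;
    enum crossing made_for;
};

#define MAX_THUNKS 64
static struct thunk thunks[MAX_THUNKS];
static size_t thunk_count;

/* Translate a function pointer that crosses the ABI boundary. */
static void *translate_fn_pointer(void *fn, enum crossing dir)
{
    size_t i;

    if (fn == NULL)               /* NULL is passed intact */
        return NULL;

    /* One of our own thunks travelling back where it came from:
       unwrap it instead of wrapping the wrapper. */
    for (i = 0; i < thunk_count; i++)
        if (fn == (void *)&thunks[i] && thunks[i].made_for != dir)
            return thunks[i].target;

    /* Already wrapped for this crossing: reuse the cached thunk. */
    for (i = 0; i < thunk_count; i++)
        if (thunks[i].target == fn && thunks[i].made_for == dir)
            return &thunks[i];

    assert(thunk_count < MAX_THUNKS);
    thunks[thunk_count].target = fn;
    thunks[thunk_count].made_for = dir;
    return &thunks[thunk_count++];
}
```

In terms of the signal() example, the handler returned by the first call is translated TOWARDS_PROGRAM and comes back as a thunk; when the program passes that thunk to the second call it is translated TOWARDS_LIBRARY, which finds the thunk in the table and returns the original library pointer.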

There’s another case of functions like dlsym() that return a-void-pointer-but-we-know-it’s-a-lie, for which we need a totally custom translator, but this is more easily doable.

1: Presenting Schwartz

January 4, 2008 by balrog
Use the Schwartz, Luke!

It seems everyone needs to code at least one ELF loader of their own, so here’s mine. Schwartz is yet another ELF loader and linker that can do a couple of tricks that other linkers can’t do (names not included - any similarity is purely coincidental), like ABI translation. I started it when the gllin binary was released to the public in November but never had the time to finish it. It aims to be a generic linker not tied to any architecture or host ABI, but gllin was a good reason to start coding. My next couple of posts will be related to Schwartz as well, so you better be interested!

Schwartz doesn’t use the ELF interpreter mechanism like the ld-linux linker - it compiles to a normal user-space program that needs no special privilege level. Typically the user just runs the linker (the executable name is ld4) passing as a parameter the name of the executable to load and run. Supported architectures are at the moment x86-64, ARM and i386 (the last one untested).

For that to work we have to use some tricks at every level, starting from the loader part. Because every hack has its limits (that make it what we call a hack), if you take The Schwartz code and try to extend it you may hit one of the limits and see that things stop working. There’s nothing inherently unfixable in it but you may need to come up with a new hack.

  • The loader

Its task is loading the contents of an ELF executable into memory at the right locations where the ELF will feel especially comfortable. In other words we construct the memory image of the program out of the image in the executable file. This at first seemed like an easy task because I had zero experience with ELF executables and my last experience with executables was from MS-DOS times where all executables were relocatable. So in my endless ignorance I was thinking I’d just reserve a piece of memory, dump the contents there and relocate the code. Obviously this didn’t work because it turns out operating systems stopped using relocatable binaries for normal programs about twenty years ago when I wasn’t paying attention. So to make the program feel at home you have to place the code at the exact addresses it wants.

To run fully in user-space we use a linker script that moves our own code to a non-standard location in the memory image, so that the standard location becomes free and we can load the executable there. Such a linker script can pretty much be generated automatically for every platform. Obviously the target executable could also have used a linker script and chosen an address colliding with our non-standard addresses. In this case the dungeon collapses and we don’t support such executables. The user has to go and modify the script (which is fairly trivial) to be able to run such an executable. The user can even go further and support only a single executable and just link ld4 with her target program into a single file if she wants to only take advantage of (say) the ABI translation feature for this single program.
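Detecting the collision mentioned above boils down to an interval overlap test between the address range the loader occupies and each range the target wants. A minimal sketch, with a made-up struct addr_range and made-up helper name; any concrete addresses fed to it would come from the ELF program headers, which this sketch does not parse.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

/* Non-zero if the two ranges share at least one byte.  Ranges that
   merely touch (one ends where the other begins) do not collide. */
static int ranges_overlap(struct addr_range a, struct addr_range b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start < b.end && b.start < a.end;
}
```

A loader would run this check for every loadable segment of the target against its own segments before mapping anything, and refuse to continue on the first overlap.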

By doing that we have both programs in a single memory space / single process, happily coexisting and we gain one interesting feature: If we attach a debugger to the process, we will have the symbols from both executables in place. This means we can load the debug info for either of the programs into the debugger and the debugger will see the symbols in the right places and not get confused. In GDB you can switch the debugged binary in runtime without detaching from the process.

  • Linker

The linker is used only for dynamic executables. It looks at the list of symbols in the external libraries that are used by our target program and resolves each of them by loading the necessary library and finding the symbol. Again we have both programs (ld4 and the target) in a single process so we can share the libraries instead of loading them two times. I use libdl for external symbols rather than resolving them manually but there’s no reason Schwartz couldn’t recursively load the libraries as well. Currently we support only a very small subset of the defined relocation types but this seems to be more than enough for programs built with binutils (i.e. all programs).

Because we control what we resolve every symbol to, we can override the library symbols with our own when we want. This allows us to play different kinds of tricks on the program.

One such trick is a strace-like tracing of the calls made by the program to library functions. I’ve implemented that for most of the <string.h> calls as an example; this functionality is turned on with the --trick-strace switch.
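A tracing wrapper of this kind is conceptually tiny. The following is a sketch of the idea for a single function, not the code in ld4: the name trace_strlen, the call counter and the log format are all invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of calls that went through the tracing wrapper so far. */
static unsigned long traced_calls;

/* What the program's strlen symbol would be resolved to while tracing
   is on: call the real function, log arguments and result to stderr,
   then return the result unchanged. */
static size_t trace_strlen(const char *s)
{
    size_t ret;

    assert(s != NULL);
    ret = strlen(s);
    traced_calls++;
    fprintf(stderr, "strlen(\"%s\") = %zu\n", s, ret);
    return ret;
}
```

Because the log goes to stderr, the program's own output on stdout is left untouched.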

Another feature is a fake chroot done by simply mangling the path strings passed forward and back between the program and libraries. This is of course not as secure as a real chroot if you allow arbitrary executables, because an executable may use libraries or library functions that we haven’t provided a wrapper for, or use syscalls directly. However, it has the advantage that any user can use it, while normal chroot requires root privileges. This is enabled with --trick-chroot <path>.
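The path mangling can be sketched as a pair of helpers, one for each direction. The names and signatures below are assumptions made for this example, and the sketch deliberately ignores the hard parts (relative paths containing "..", symlinks), which is exactly why such a scheme is not a security boundary.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Program -> library: prepend the fake root to absolute paths, copy
   relative ones unchanged.  Returns 0 on success, -1 if `out` is too
   small to hold the result. */
static int fake_chroot_in(const char *fake_root, const char *path,
                          char *out, size_t out_size)
{
    int n;

    if (path[0] == '/')
        n = snprintf(out, out_size, "%s%s", fake_root, path);
    else
        n = snprintf(out, out_size, "%s", path);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Library -> program: strip the fake root again, so the program keeps
   believing it lives at "/".  Paths outside the root pass through. */
static const char *fake_chroot_out(const char *fake_root, const char *path)
{
    size_t len = strlen(fake_root);

    if (strncmp(path, fake_root, len) != 0)
        return path;
    if (path[len] == '\0')
        return "/";
    if (path[len] == '/')
        return path + len;
    return path;   /* e.g. root "/srv/oabi" vs path "/srv/oabi2/x" */
}
```

A wrapper for something like open() would call the first helper on its path argument, while a wrapper for something like getcwd() would call the second on the result.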

Yet another trick could be a user-space implementation of a poor man’s debugger, with the capability to set breakpoints, inspect data, etc., but perhaps not watchpoints (at least not easily) and other fancies. I’m not implementing this.

And yet another trick based on overriding library symbols is C++ exception model translation and ABI translation. More about this in the next post. Look out!

At OH-plex

December 22, 2007 by balrog

Christmas arrived at OpenedHand a week ago, in the form of a Millennium Falcon, and on that occasion we had some (traditional British) fun together. This was my first chance to see the famous OH-plex live, the place where the ideas hatch for the near and far future of open-source mobile computing. The interior is full of things you’ve never seen and the reindeer help the elves with some of the heavier tasks we have to complete for our super-secret but top of the market corporate customers. There are surprisingly few actual PCs (I think I counted 2 or 3 desktops). Some pictures here.

The said traditional British Xmas goes like this: first thing in the morning you ride quad bikes, then do some archery and this is followed by laser-modded full-size shotgun shooting. Then, to top off the Christmas spirit you jump into a 4×4 jeep for some blindfolded driving in winter mud on a farm in Kent. Basically a day full of crazy fun and joy of celebrating a globe-wide cross-religion holiday.

To get to the place I landed at the London City Airport which is like a normal international airport except it’s squeezed in a space ten times smaller than a usual airport. That allows it to function relatively close to the city centre so you don’t have to travel far. The single runway is practically built *on* the Thames, surrounded by water from all sides and looks quite impressive.

It was very cool to see London after perhaps some ten years since my last visit, and the Thames river had a new feel after my (not so) recent reading of “Two and a Half Men in a Boat” by Nigel Williams (great read, btw., and how the book got in my hands in the middle of wild wild south of France is a weird story).

Dereference them null pointers

December 3, 2007 by balrog

Yesterday I played more with executables under Linux and was introduced to linker scripts. Soon it turned out that with a very simple linker script you can have a section starting at VMA 0 (zero) in a non-relocatable binary. This wouldn’t be at all interesting, except that address 0 is where the NULL constant points. So now, if the section was readable and/or writable, I could cast NULL to any pointer type, which, as we read below, yields a null pointer of that type, and I could dereference such a pointer and read or assign a value through it. This still wouldn’t be interesting, but C defines very specific semantics for null pointers. I reached for the C99 specs (aka ISO/IEC 9899:1999), something I do very rarely, normally only when having a quarrel with someone on IRC, for example over portability to a hypothetical C environment. This is what it says about null pointers in 6.3.2.3.3 and 6.3.2.3.4:

3. An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
4. Conversion of a null pointer to another pointer type yields a null pointer of that type. Any two null pointers shall compare equal.

The first point becomes false because we can now have an object or a function pointed to by a null pointer. Today I stumbled across a phrase in the glibc man page for dlsym(3) that explicitly deals with function pointers that equal NULL.

Further in ISO/IEC 9899:1999 in footnote 84) we read:

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

so the dereference is incorrect but is accepted by the C implementations. This is the case for a whole lot of operations that people’s C programs use. Another whole lot of operations have unpredictable outcome according to the specs but are relied on by thousands of popular programs. So perhaps the main fun from reading the different C specs, if there is any, is not in what it defines as C but rather what it doesn’t define. In any language specification there are parts defined as implementation-specific but this isn’t the best part, the best part is what is not defined as correct (but we often assume it is by generalisation), thus implicitly defined as incorrect or unpredictable, and what weird scenarios that makes possible. For example by my understanding of 6.3.2 in C99, it is entirely possible for two variables of the same type, to compare unequal even directly after assigning value of one to the other:

T a, b;
...
a = b;
a != b;

The last comparison can be true every time T is a pointer type and b doesn’t point at anything in particular. More things you didn’t know about C for example here.

Update: The ABI specification is another document that has something to say about the C implementation you use and what things are legal for you and for the compiler to do. For example regarding null pointers the “AMD64 Architecture” supplement to “System V ABI, Edition 4″ has to say this:

A null pointer (for all types) has the value zero.

(…)

Programs that dereference null pointers are erroneous, although an implementation is not obliged to detect such erroneous behavior. Such programs may or may not fail on a particular system. To enhance portability, programmers are strongly cautioned not to rely on this behavior.

This makes things clearer but it doesn’t assure that a null pointer doesn’t point to any object or function.

I can see you gdb

December 1, 2007 by balrog

So, as soon as the gllin binary was released for download, I came up with an evil plan - I will for sure blog about it after it is executed. But first (as part of the usual preparation for an evil plan) I needed to find out whether in a normal program under Linux the heap is executable, or rather what section is executable and writable. While attempting this I made a funny and completely useless observation which I’m going to share with you now. Here’s the test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void sayhello(int world_number) {
        int local;
        static int stat;

        printf("Hello World %i! Local variable at %p and static at %p\n",
                        world_number, &local, &stat);
}

int main(int argc, char *argv[], char **envp) {
        void (*say[2])(int i) = { sayhello, malloc(0x1000) };

        memcpy(say[1], say[0], 0x50);
        say[0](0);
        say[1](1);
        return 0;
}

After “make hello” I have the ELF under ./hello and load it into gdb and inspect:

 $ gdb ./hello
GNU gdb 6.6
...
(gdb) break sayhello
Breakpoint 1 at 0x40068f: file hello.c, line 9.
(gdb) run
...
Breakpoint 1, sayhello (world_number=0) at hello.c:9
(gdb) up 
#1  ... in main (argc=1, argv=..., envp=...) at hello.c:17
(gdb) disassemble say[0] (say[0] + 15)
Dump of assembler code from 0x400684 to 0x400693:
0x0000000000400684 <sayhello+0>:        push   %rbp
0x0000000000400685 <sayhello+1>:        mov    %rsp,%rbp
0x0000000000400688 <sayhello+4>:        sub    $0x20,%rsp
0x000000000040068c <sayhello+8>:        mov    %edi,0xffffffffffffffec(%rbp)
0x000000000040068f <sayhello+11>:       lea    0xfffffffffffffffc(%rbp),%rdx
End of assembler dump.
(gdb) disassemble say[1] (say[1] + 15)
Dump of assembler code from 0x602010 to 0x60201f:
0x0000000000602010:     push   %rbp
0x0000000000602011:     mov    %rsp,%rbp
0x0000000000602014:     sub    $0x20,%rsp
0x0000000000602018:     mov    %edi,0xffffffffffffffec(%rbp)
0x000000000060201b:     int3
0x000000000060201c:     lea    0xfffffffffffffffc(%rbp),%edx
End of assembler dump.

You may now ask yourself the same question that I asked myself: WTF? or I may first explain what is happening above and you may ask the question then. We loaded the program into the debugger. The program was supposed to greet the world once and then call a copy of sayhello we made with memcpy(). We set a breakpoint at the start of the function and run the program. When it enters sayhello, it hits the break and we have a chance to look at the copy of the function. We step out of the sayhello frame so that we can access the say array. We disassemble the start of the original function and the start of its copy, and we see that they differ (!). Someone is messing in MY functions?! Or memcpy() is perhaps broken?!

No, it’s just gdb. When we set a breakpoint at sayhello, gdb inserted the extra instruction (int3, which I would have maybe recognised if I used x86 asm more often) to get notified at the right moment. We copied the function together with the breakpoint and then we hit the original breakpoint. gdb hid the breakpoint from our eyes in the original (first disassembly) but it didn’t know that we had secretly made a copy (second disassembly), and we now have a pretty little breakpoint of our own.
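To make the effect concrete, here is a small sketch in C that looks for a breakpoint byte that exists in a copy but not in the pristine code. The byte arrays are my own hand-encoding of the two disassemblies above (an assumption, they were not dumped from the real process), and the name find_planted_int3 is made up for this illustration.

```c
#include <stddef.h>

/* One-byte x86 INT3 opcode, used by debuggers as a software breakpoint. */
#define X86_INT3 0xCC

/* Return the offset of the first byte where `copy` holds an INT3 that
 * `pristine` does not have, or -1 if there is no such byte.
 * (Hypothetical helper, written only for this illustration.) */
long find_planted_int3(const unsigned char *pristine,
                       const unsigned char *copy, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (copy[i] == X86_INT3 && pristine[i] != X86_INT3)
            return (long)i;
    return -1;
}

/* First 15 bytes of sayhello, hand-encoded from the first disassembly
 * (assumed encoding, not read from the running program). */
static const unsigned char sayhello_bytes[15] = {
    0x55,                   /* push %rbp            */
    0x48, 0x89, 0xe5,       /* mov  %rsp,%rbp       */
    0x48, 0x83, 0xec, 0x20, /* sub  $0x20,%rsp      */
    0x89, 0x7d, 0xec,       /* mov  %edi,-0x14(%rbp) */
    0x48, 0x8d, 0x55, 0xfc, /* lea  -0x4(%rbp),%rdx */
};

/* The memcpy()'d copy as the second disassembly shows it: the first
 * byte of the lea (the 0x48 REX.W prefix) has been replaced by INT3. */
static const unsigned char copy_bytes[15] = {
    0x55,
    0x48, 0x89, 0xe5,
    0x48, 0x83, 0xec, 0x20,
    0x89, 0x7d, 0xec,
    0xcc,                   /* int3                 */
    0x8d, 0x55, 0xfc,       /* lea  -0x4(%rbp),%edx */
};
```

With these arrays, find_planted_int3(sayhello_bytes, copy_bytes, 15) points at offset 11, which is sayhello+11, the address gdb reported for the breakpoint. It would also explain why the lea after the int3 shows %edx instead of %rdx in the copy: the overwritten byte was the prefix that made the instruction 64-bit.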

So what did we learn that’s useful? Nothing really. Maybe that checksumming the program at runtime may sometimes work.

Good news is that memcpy() is fine and the world is safe. Pheww..

Don’t need no stinking themes

November 7, 2007 by balrog

I just noticed that there’s no way in GMail to change the look & feel or even the colours of the screen elements. Every recent UI, web-based or not, offers themes or skins, even for dead simple stuff like forums, so it struck me that GMail doesn’t have them and that I never noticed. I think I always expected the option to be hidden somewhere deep in the config options. Does that mean we don’t need themes for a successful UI? Maybe they are even a factor that makes a UI less successful?

GMail is definitely regarded as a successful interface and is complimented a lot, together with Google’s picasaweb, googlemaps, etc. The stock look & feel is not bad but I guess it might not fit in with some people’s overly-themed desktops. Somehow though, it seems nobody misses the theming (I don’t). There are a couple of Firefox add-ons called “GMail skins” but the ones I’ve seen don’t do what you would expect: they let you add or remove content from the main screen, but tend to stay within the limits of Google’s stock theme. Do themes stink? Clutter the UI? Did I overlook something?

Let’s see what GMail 2 and further new toys bring.

QPE 4.3.0 plus QEMU

October 28, 2007 by balrog

Trolltech GPLed and released its Qtopia Phone Edition 4.3.0 distro a couple of weeks ago, at the same time adding Neo1973 as a supported device. I had a look at the “Phone” part of the package and while I was never a fan of Qt, I like a number of things in the Qtopia design, although I have also tried running Qtopia on my phone and the interface was not terribly nice for a first-time user. For example, on one hand I like the fact that they came up with a custom input method, avoiding getting into obscure deals over the popular T9 input method, which is patented (and which they could easily have done). On the other hand I couldn’t comfortably enter message text using this input method. The cool bit is that now that it’s all open-source there’s no way back :)

Having for a short time been involved in the development of gsmd in OpenMoko, what I like most in Qtopia, and at the same time envy the most, is that their phone services have a logical design, a quite complete set of documentation and probably work. This last thing I haven’t verified, but even if they don’t, the logical design alone would be enough to make me happy, seeing how chaotic the gsmd development process is. Gsmd has no documentation and also suffered from a lack of maintainership, which recently changed into the presence of a very strange maintainership that makes contributing code very hard and probably leads to less progress than when there was no maintainer. Fortunately there’s recently been enough work to do on GSM support in OpenMoko that doesn’t involve touching gsmd itself.

Qtopia’s phone part is nicely divided into services, each of which supports plugins for adding support for exotic modems. The division is fairly fine-grained but not too fine-grained, and there are full tutorials for writing each type of plugin. The code is not so amusing but it’s quite complete, with all standard features implemented, even those not present in any of the supported devices. I was particularly looking in Qtopia for GSM multiplexing code and it was there. Surprisingly it was written in C, making it directly reusable in other projects (all the rest of Qtopia is C++, which is not), but it was quite ugly and suboptimal, so it is only useful for comparing results. At the moment Qtopia doesn’t do multiplexing when running on the Neo1973. There is probably some reason for this and I suspect it lies in the Neo1973 hardware or kernel (the kernel is not a part of Qtopia, it comes from OpenMoko).

What I found useful is the development tools that come with QPE, two in particular. The first one is called phonesim and is used for testing the phone services. The second one is atinterface or “phone emulator”. The idea of both tools is to simulate a modem which you can talk to using a standard AT command set, but they do it in different ways. Phonesim is strictly a developer tool, segfaults a lot and is supposed to run on the desktop, or wherever you’re coding, although it can run anywhere. It simulates a dummy GSM modem: you can run Qtopia or another tool that talks to a modem (QEMU, gsmd, gnokii) and make it connect to phonesim. Sometimes it will work and sometimes it won’t, because phonesim understands just a minimal subset of standard AT (and some of the proprietary commands of the GreenPhone’s modem), but it is easily extensible. There’s an optional GUI through which you can simulate incoming calls, messages, data packets and more, but basically the GUI is the only source of events. Atinterface on the other hand runs on Qtopia and takes its events from QPE’s phone subsystem. Its purpose is to expose a modem interface to a laptop or other device so that it can send faxes or make data calls through a GreenPhone. The interface is hardware independent, i.e. the virtual modem presented by atinterface to your laptop will not depend on whether QPE is running on a Neo1973, a GreenPhone or an HTC. It’s also more standards compliant than the GSM subset emulated by phonesim, but to use it you will need a running QPE and its phone services.

Now what I wanted was tools for easily testing gsmd and/or OpenMoko running in QEMU. Connecting it all together is not exactly simple so I will explain here how to do that. So, we want to run gsmd or QEMU, and we want to use phonesim or atinterface as a virtual modem so that we don’t have to use a physical one, because a physical modem, if you have one at all, is a lot of hassle (for example the Neo modem constantly runs out of battery). While we’re at it I will also show how to use the physical modem of a Neo1973 with gsmd running on a PC, which is less hassle than testing gsmd on the phone.

We have two parts: a modem (physical or virtual) and a program (gsmd or QEMU). For the communication channel we choose a network socket because sockets are flexible and already supported in many places. For the modem we have three possibilities: 1. a phonesim virtual modem, 2. atinterface virtual modem, 3. a Neo1973 physical modem.

1. Phonesim supports sockets out of the box, so we just need to build and run it. I hacked up a phonesim version that can build outside a Qtopia tree and I included it in the qemu-neo1973 repo at svn.openmoko.org. To build it you only need to check out a recent qemu-neo1973 and configure it with

$ ./configure --disable-system --disable-user --target-list=arm-softmmu --enable-phonesim && make

The command

$ (cd phonesim; LD_LIBRARY_PATH=lib ./phonesim -gui ../openmoko/neo1973.xml) &

runs phonesim. The -gui switch is optional. The GUI will only appear after a first client connects. Phonesim now listens on localhost port 12345 and is ready to accept clients. The neo1973.xml file defines a modem behavior resembling the Neo1973 modem (TI Calypso).

2. Atinterface is part of Qtopia, and requires Qtopia. I will not explain here how to build Qtopia. After you’ve built and installed it (I assume the default paths) you will need to first run QPE and then atinterface. For QPE to run you need a modem, and for that we can use phonesim. You can use the phonesim build that comes with Qtopia. To do that, run the following commands:

$ bin/phonesim -gui src/tools/phonesim/troll.xml &
$ export QTOPIA_PHONE_DEVICE="sim:localhost"

Next, we’ll need to emulate a framebuffer on which QPE will display and then we can run QPE and atinterface:

$ bin/qvfb &
$ echo [SerialDevices] > etc/default/Trolltech/Phone.conf
$ echo ExternalAccessDevice=/dev/ttyS1:115200 >> etc/default/Trolltech/Phone.conf
$ image/bin/qpe &
$ image/bin/atinterface --test -qws

Ready, now we have atinterface listening on localhost:12350.

Phonesim on an OHand laptop

3. To make the Neo1973 modem accessible to a PC over USB we have several options. The u-boot gsm passthrough support turned out unreliable so we will boot Neo into linux, kill gsmd and run netcat:

# killall gsmd
# nc -l -p 5000 < /dev/ttySAC0 > /dev/ttySAC0

Voila, if the USB ethernet is configured (see the OpenMoko wiki), the modem is now listening at 192.168.0.202:5000.

Now we want to connect to our modem from the other side, i.e. from gsmd or QEMU on the desktop. With QEMU the task is easy because it can connect to a socket directly: just append -serial tcp:localhost:12345 (in the phonesim case, in other cases tcp:localhost:12350 or tcp:192.168.0.202:5000) and you should see the system running inside QEMU connect to a GSM network and be operational. Remember that if you haven’t configured QEMU with --enable-phonesim, it uses a builtin modem emulator based on gnokii (yes, yet another virtual phone), which needs to be disabled.

With gsmd the problem is that it wants a character device to connect to, rather than a socket. We will emulate a character device using a tiny program I uploaded to qemu-neo1973 svn yesterday. It will make a pseudo-terminal pair (pty stands for Pseudo-Terminal. Has anyone wondered, if tty is for Tele-Typewriter, why pty is not Pseudo-Typewriter?) and connect the master to the socket. If you’ve checked out qemu-neo1973 and you’re inside the source directory, do:

$ make pty
gcc-3.3.6 -Wall -O2 -g -fno-strict-aliasing -I. -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -g  pty.c   -o pty
$ ./pty localhost 12345
/dev/pts/12

The pty program connected to the modem (change the hostname/port pair accordingly) and it told us that it created a character device /dev/pts/12 to which gsmd can now connect (qemu can also connect to character devices). The device will exist as long as pty is running.
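For the curious, the general technique can be sketched in C roughly as below. This is not the actual pty.c from the qemu-neo1973 tree; it is a minimal illustration that assumes the standard POSIX pty calls (posix_openpt, grantpt, unlockpt, ptsname), and the function names open_pty_master, connect_tcp and relay are made up here.

```c
#define _XOPEN_SOURCE 600 /* exposes posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a pty master and copy the path of its slave device into `name`.
 * Returns the master fd, or -1 on failure. */
int open_pty_master(char *name, size_t len)
{
    int fd = posix_openpt(O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    if (grantpt(fd) < 0 || unlockpt(fd) < 0) {
        close(fd);
        return -1;
    }
    const char *slave = ptsname(fd);
    if (!slave || strlen(slave) >= len) {
        close(fd);
        return -1;
    }
    strcpy(name, slave);
    return fd;
}

/* Connect to host:port over TCP. Returns the socket fd, or -1. */
int connect_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break; /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Copy bytes in both directions until either side closes or fails. */
void relay(int a, int b)
{
    struct pollfd fds[2] = { { a, POLLIN, 0 }, { b, POLLIN, 0 } };
    char buf[4096];

    for (;;) {
        if (poll(fds, 2, -1) < 0)
            return;
        for (int i = 0; i < 2; i++) {
            if (!(fds[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;
            ssize_t n = read(fds[i].fd, buf, sizeof(buf));
            if (n <= 0 || write(fds[1 - i].fd, buf, n) != n)
                return;
        }
    }
}
```

A main() built on this would call connect_tcp("localhost", "12345"), then open_pty_master(), print the slave name and hand both descriptors to relay(). Note that this simplified relay gives up as soon as one side closes, so a real tool would want to be more forgiving if it is to keep the device alive between clients.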

$ /usr/local/sbin/gsmd -p /dev/pts/12 -s 115200 -v ti -m generic

That should be it. Now you can launch a program like openmoko-dialer that uses gsmd and hack away. Phonesim has a nice GUI (alas Qt…) from which you can observe the AT communication with your program.

Moscow and how to not go there

October 21, 2007 by balrog

Last Sunday I arrived home from my first trip to the east (not south-east), to the bigger brother, Russia and particularly Moscow.

To get there I decided on an unusual Warsaw-Moscow route through Kaliningrad to save on time and costs, but of course my implementation of the plan in practice was neither cheap nor quick, although I still think the plan was good (and original) and it could have worked out if I had reserved a slightly bigger time buffer. I ended up taking four trains to get to Kaliningrad, then a bus to the Khrabovo Airport and a KD-Avia flight to Moscow Domodedovo. Each of the four trains was delayed, so I missed my bus from Olsztyn to Kaliningrad (which would have been much cheaper than the train) and my Aeroflot flight from there (with an Asian-Vegetarian on-board meal I chose from the list of some twenty types of food that the web-app presented me with when buying the ticket online).

On the Braniewo-Kaliningrad train I was sitting next to a guy from Ghana who spoke neither Polish nor Russian. Everyone believed that he spoke English, which was not entirely true, and it turned out I was the only person on the train who understood English, so it was automatically assumed that I would help him get through the border. On cheap border-crossing routes like this I find that some 90% of the passengers are smugglers who, when you meet them, are making the trip for maybe the third time that day. They form something which I believe you can very well call a community: they treat every other passenger on the train as a co-worker and they see many of the same people a couple of times every day. Crossing the border is routine for them, and talking to the occasional tourist and to the border zone security officers is their diversion. So I was seen as obliged to fill in Dominico’s immigration forms and other documents, about which he would only tell me that he couldn’t do it himself because he was too confused. He was speaking something between French and English, but this language was evidently not his native one because he couldn’t express many things in it (quite apart from whether I understood or not). Dominico possessed a valid Ghanaian passport, no Russian visa and a Belgian residence card which the officers deemed a fake (as far as I understood, since they were speaking Russian and I Polish), but I’m not sure if that was based on any reasoning or was just their guess. In the end, for the sake of getting through the border myself, I had to invent the purpose of Dominico’s visit to Russia, his legal situation and the story behind it, because I was unable to get this information from him, and the train was already delayed by over three hours at this point. The enterprise succeeded and nobody in the carriage was forced out.

In Moscow I stayed mainly in the residence hall room of Kate, who I had last seen in France, and of her roommate Nastya (diminutive for Anastasia..). Most Moscow residence halls employ various techniques to prevent strangers from entering the building, so the way I entered and left the 2nd floor room every day (in Russia floors count from 1st floor which is the ground level) was through the window. Every evening I would climb up to the window, clinging to the pipes on the wall and the bars on the 1st floor windows, and then knock on the glass, and if it was the right window, the person inside would open it and very quickly let me in. This procedure also had further complications due to issues like room assignment and people in some rooms being away that day, so actually almost every day you had to use a different room to get out in the morning and to get in in the evening. Seemingly this is something completely normal for the people of the dormitory: nobody is even mildly surprised when a person knocks at their window, enters the room, says hello and immediately walks out through the door to the corridor. When there’s a party in a 3rd or 4th floor room the guests will use one of the 2nd floor rooms to leave the building after the party finishes at some 3 am. We spent two nights in a private flat of a friend of Kate’s friend on the outskirts of Moscow when there were problems with the dorm. In Moscow the university residence halls generally don’t have free internet service in every room (and there’s no WiFi in range in most places, especially not in a dormitory of a non-technical university like MGOU where Kate studies), which surprised me a lot. I remember that in Warsaw in the 90s, when having an internet cable in your private home was still very uncommon and the only way most people could connect was through a 56K dial-up modem, you would usually go to a friend in a university dorm to download the latest movies and get music from P2P. Dorms were among the first places to have a true cable connection and they still tend to have the most bandwidth today.

As a consequence a large number of the “cost effective” Moscow tourists go daily to one of the McDonald’s in the centre to use the internet. There’s one that offers free WiFi for everyone and when you go there you see at every second table (and even outside in the street when it’s not raining) someone sitting with a laptop surfing away, and some people are known to spend hours there. The staff in this particular McDonald’s are extremely tolerant.

I visited most of the main tourist sites in Moscow, and many non-tourist attractions that you don’t find in the guidebooks, thanks to the excellent guidance of my host.

I have seen the dead tovarish Lenin and I take the side of the part of population that thinks the body is genuine and not a plastic replica, but I’m not 100% sure, it might be a fake. Moscow in general was very impressive even despite the very bad weather in which I had to appreciate it.

I was slightly disappointed by the gastronomic offering of Moscow but I didn’t have time to get to know it very well (and in my judgements I only consider the part that is reasonably priced, as I have a very low tolerance for high prices (cf. greedy)). If I was to recommend one place for general eating out it would be the Solecito Italian restaurant (pizza dlya gurmanov!) in Nikolskaya street, and for Russian food the canteens at university departments (but then access to the buildings is closed to non-students, unless they know the cunning tricks to get in, which is not so difficult).

I went with Kate to some of her classes during this week, which was interesting because I speak no Russian and they were all in Russian. I was immediately noticed, also because almost all linguistics students at MGOU are female, so in many classes I was the only male and the only person who had a laptop on during the lectures.

We went to see a movie at Moscow’s Iluzyon cinema, which plays one French movie in French, no subtitles, every month. It was A nos amours, a 1975 production. Later when Nastya was explaining the plot to us, we learnt that we don’t know French sufficiently for watching movies, yet.

Coming back to Poland was again interesting. This time I decided to take the route that most back-packers take, to really cut down on costs, which partially worked out. The plan was to take a Moscow-Brest platskart train, cross the Belarusian - Polish border on a Brest-Terespol bus and take a normal Polish train from Terespol on. The Brest-Terespol buses go rarely but there are short elektrichka trains that are only slightly more expensive and offer immensely more colourful adventures involving the said smugglers community. The first thing was that just when I appeared before the customs clearance office door, someone ran up to me and asked if I had any cigarettes in my bag and if not, whether I could traffic one box for her (one big box is a legal quantity). I later found out that the black plastic bag I was handed by this person also contained a bottle. My conversation with the customs officer was along these lines:

- Do you carry any alcohol?

- Yes, one bottle, Sir.

- What alcohol is it?

- No idea, Sir.

- Cigarettes?

- One box, Sir.

- A paczka [Polish for "box"] or a sztanga? [WTF is a sztanga of cigarettes?]

- A box I suppose, Sir.

I was let in, despite the lack of registration in Moscow, and then on the train I started reading a book. When the train was already running, suddenly a lady climbed onto one of the tables in the wagon and to my great surprise ripped off a piece of the ceiling casing and stuffed a number of big, black, flat packets into the hole. This is when I very carefully produced a camera and started recording. While she was doing this three members of the railway staff passed through the wagon and, again to my surprise, they were completely uninterested. Also later, just before stopping for clearance in Terespol, I am pretty sure I saw someone run out of a rye field by the rail track and collect some objects that must have been thrown from the train, and immediately run away afterwards.

On a Brest-Terespol train

Note: the registration is a requirement of Russian immigration law which states that if you’re staying three days or longer, you should go to the nearest police office with the owner of the place where you’re staying and have the officer put a stamp with the address in your passport. This procedure takes five days to complete. Hostels, however, will do this for you if you’re staying at one, so many people will go to a hostel before leaving Russia and pay for some nights there to get the required stamp (theoretically you can be asked by police to show the registration stamp any day even if you’re just walking around a city, but the risk is very low). It turns out though that sometimes you will not be asked for the registration at the border at all when leaving Russia, and even if you are asked for it, the fine (cf. bribe) is less than the cost of five nights at a hostel, so this way is often recommended. (I’m not entitled to give legal advice so don’t take this as advice.)

So this was my first time in Terespol and while waiting for the train home I had a short walk through the town. I noticed that it is a very little town, much smaller than I expected. Later I was wondering why I expected Terespol not to be a little town like this, or why I expected anything from Terespol at all. The answer may be that my expectation came from the board-game Monopoly. In its Polish version, which I played when I was very little, I vaguely remember the main hotels or train stations or whatever it was (say hotels) were named something like Hotel Paris, Hotel Warsaw, Hotel Vilnus and Hotel Terespol. And this is, I think, the only place I had ever seen the name Terespol before, so I imagined it to be something important, but it actually has one main street and a couple of smaller streets on its sides, a supermarket and a computer shop next to a church with a tower with a clock with no hands (but otherwise a very nice clock!) on each of the tower’s four sides. I will need to check the Monopoly board.

Pictures from the whole trip here.