Yesterday I played more with executables under Linux and was introduced to linker scripts. It soon turned out that with a very simple linker script you can have a section starting at vma 0 (zero) in a non-relocatable binary. This wouldn't be interesting by itself, but address 0 is where the NULL constant points. So now, if the section is readable and/or writable, I can cast NULL to any pointer type (which, as we read below, yields a null pointer of that type), dereference that pointer, and read or assign a value through it. This still wouldn't be interesting, but C defines very specific semantics for null pointers. I reached for the C99 spec (aka ISO/IEC 9899:1999), something I do very rarely, normally only when having a quarrel with someone on IRC, for example over portability to a hypothetical C environment. This is what it says about null pointers in 6.3.2.3.3 and 6.3.2.3.4:
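A minimal sketch of what such a linker script might look like. The section name `.null_page`, the file name `null.lds` and the symbol name are my own illustration, not taken from any real build; GNU ld's `INSERT` directive lets the fragment augment the default script instead of replacing it:

```
/* null.lds: a sketch; pass to the linker with e.g.
 *   gcc -no-pie -Wl,-T,null.lds main.c
 * (-no-pie so the binary is non-relocatable and vma 0 stays 0)
 */
SECTIONS
{
  .null_page 0 : { *(.null_page) }   /* place this output section at vma 0 */
}
INSERT BEFORE .text;
```

and on the C side, an object assigned to that section:

```c
/* this array's first byte ends up at address 0 */
char page_zero[4096] __attribute__((section(".null_page")));
```

Note that on modern kernels the `vm.mmap_min_addr` sysctl usually forbids mapping page zero at runtime, so actually loading such a binary may require relaxing that setting.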
3. An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
4. Conversion of a null pointer to another pointer type yields a null pointer of that type. Any two null pointers shall compare equal.
The first point becomes false, because we can now have an object or a function pointed to by a null pointer. Today I stumbled across a phrase in the glibc man page for dlsym(3) that explicitly deals with function pointers that compare equal to NULL.
Further on in ISO/IEC 9899:1999, in footnote 84, we read:
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.
so the dereference is invalid, yet it is accepted by real C implementations. The same is true for a whole lot of operations that people's C programs use. Another whole lot of operations have unpredictable outcomes according to the spec but are relied upon by thousands of popular programs. So perhaps the main fun of reading the various C specs, if there is any, lies not in what they define as C but in what they don't define. Every language specification has parts declared implementation-specific, but that isn't the best part. The best part is what is not defined as correct (though we often assume it is, by generalisation), and is thus implicitly incorrect or unpredictable, and what weird scenarios that makes possible. For example, by my understanding of 6.3.2 in C99, it is entirely possible for two variables of the same type to compare unequal even directly after assigning the value of one to the other:
T a, b;
...
a = b;
a != b;  /* may be true! */
The last comparison can be true whenever T is a pointer type and b doesn't point at anything in particular (reading such an indeterminate value is undefined behaviour). More things you didn't know about C can be found, for example, here.
Update: The ABI specification is another document that has something to say about the C implementation you use and about what is legal for you and for the compiler to do. For example, regarding null pointers, the “AMD64 Architecture” supplement to the “System V ABI, Edition 4” says this:
A null pointer (for all types) has the value zero.
(…)
Programs that dereference null pointers are erroneous, although an implementation is not obliged to detect such erroneous behavior. Such programs may or may not fail on a particular system. To enhance portability, programmers are strongly cautioned not to rely on this behavior.
This makes things clearer, but it still doesn't assure that a null pointer doesn't point to any object or function.