Linux Mystery: linux-vdso.so.1

Every time I compile and link a Linux binary, I see a dynamically linked library called linux-vdso.so.1.

[austin@localhost]$ cat hello.c
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("Hello, World!\n");
    return 0;
}
[austin@localhost]$ gcc -o hello hello.c
[austin@localhost]$ ldd hello
	linux-vdso.so.1 (0x00007ffee39dd000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f91baaec000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f91bad03000)

The libc and ld-linux I know about, but who is this linux-vdso? There is a man page, vdso(7), that describes it:

The "vDSO" (virtual dynamic shared object) is a small shared library that the
kernel automatically maps into the address space of all user-space applications.  
Applications usually do not need to concern themselves with these details as the 
vDSO is most commonly called by the C library.  

...

Why does the vDSO exist at all?  There are some system calls the kernel provides
that user-space code ends up using frequently, to the point that such calls can 
dominate overall performance.  This is due both to the frequency of the call as 
well as the context-switch overhead that results from exiting user space and 
entering the kernel.

The rest of this documentation is geared toward the curious and/or C library writers 
rather than general developers.  If you're trying to call the vDSO in your own 
application rather than using the C library, you're most likely doing it wrong.

Huh. Neat. We are bringing kernel functionality into user space in the form of a shared object.

The man page goes on to explain that system calls are expensive because each one needs a context switch into the kernel and back. Some system calls could really just be implemented as user space functions, and doing that would save us a lot of time.

This seemed suspicious to me. Isn’t the point of a system call to clearly distinguish between user space code and kernel code? What system calls am I now bringing into my user space?

Well, according to the man page, it’s actually just four syscalls on x86-64: clock_gettime, getcpu, gettimeofday, and time.

To verify, I dumped the region of the process’s memory where the vDSO is mapped.

[austin@localhost]$ gdb -q ./hello   
Reading symbols from ./hello...
(No debugging symbols found in ./hello)
(gdb) b main
Breakpoint 1 at 0x1149
(gdb) r
Starting program: /home/austin/projects/elf-collection/hello 


Breakpoint 1, 0x0000555555555149 in main ()
(gdb) info proc map
process 132181
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x555555554000     0x555555555000     0x1000        0x0 /home/austin/projects/elf-collection/hello
      0x555555555000     0x555555556000     0x1000     0x1000 /home/austin/projects/elf-collection/hello
      0x555555556000     0x555555557000     0x1000     0x2000 /home/austin/projects/elf-collection/hello
      0x555555557000     0x555555558000     0x1000     0x2000 /home/austin/projects/elf-collection/hello
      0x555555558000     0x555555559000     0x1000     0x3000 /home/austin/projects/elf-collection/hello
      0x7ffff7db4000     0x7ffff7db6000     0x2000        0x0 
      0x7ffff7db6000     0x7ffff7ddc000    0x26000        0x0 /usr/lib/x86_64-linux-gnu/libc-2.32.so
      0x7ffff7ddc000     0x7ffff7f49000   0x16d000    0x26000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
      0x7ffff7f49000     0x7ffff7f95000    0x4c000   0x193000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
      0x7ffff7f95000     0x7ffff7f96000     0x1000   0x1df000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
      0x7ffff7f96000     0x7ffff7f99000     0x3000   0x1df000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
      0x7ffff7f99000     0x7ffff7f9c000     0x3000   0x1e2000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
      0x7ffff7f9c000     0x7ffff7fa2000     0x6000        0x0 
      0x7ffff7fc8000     0x7ffff7fcc000     0x4000        0x0 [vvar]
      0x7ffff7fcc000     0x7ffff7fce000     0x2000        0x0 [vdso]
      0x7ffff7fce000     0x7ffff7fcf000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-2.32.so
      0x7ffff7fcf000     0x7ffff7ff3000    0x24000     0x1000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
      0x7ffff7ff3000     0x7ffff7ffc000     0x9000    0x25000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
      0x7ffff7ffc000     0x7ffff7ffd000     0x1000    0x2d000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
      0x7ffff7ffd000     0x7ffff7fff000     0x2000    0x2e000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
(gdb) dump binary memory vdso.so 0x7ffff7fcc000     0x7ffff7fce000
(gdb) q

It’s really just an ELF .so file!

[austin@localhost]$ file vdso.so 
vdso.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=112feb4b14e301806a8eafdcdd804c88bfa191d8, stripped

And here are the functions. Along with the four I expected, this kernel also exports clock_getres:

[austin@localhost]$ objdump -T vdso.so

vdso.so:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000bc0  w   DF .text	0000000000000005  LINUX_2.6   clock_gettime
0000000000000b80 g    DF .text	0000000000000005  LINUX_2.6   __vdso_gettimeofday
0000000000000bd0  w   DF .text	0000000000000060  LINUX_2.6   clock_getres
0000000000000bd0 g    DF .text	0000000000000060  LINUX_2.6   __vdso_clock_getres
0000000000000b80  w   DF .text	0000000000000005  LINUX_2.6   gettimeofday
0000000000000b90 g    DF .text	0000000000000029  LINUX_2.6   __vdso_time
0000000000000b90  w   DF .text	0000000000000029  LINUX_2.6   time
0000000000000bc0 g    DF .text	0000000000000005  LINUX_2.6   __vdso_clock_gettime
0000000000000000 g    DO *ABS*	0000000000000000  LINUX_2.6   LINUX_2.6
0000000000000c30 g    DF .text	0000000000000025  LINUX_2.6   __vdso_getcpu
0000000000000c30  w   DF .text	0000000000000025  LINUX_2.6   getcpu

It seems strange that the output of ldd doesn’t show an .so file on disk to load, but it makes sense: this one comes straight from the kernel, which maps it into the process itself, and files on disk are more of a user land thing.

Let’s make sure we really aren’t making a syscall when we call one of those vDSO functions.

[austin@localhost]$ cat hello-vdso.c
#include <stdio.h>
#include <sys/time.h>

int main(int argc, char *argv[]) {
    struct timeval t;
    gettimeofday(&t, NULL);
    printf("Seconds: %ld\n", (long)t.tv_sec);
    return 0;
}

We can use strace to see what syscalls are being made. We should not see the gettimeofday syscall in this case.

[austin@localhost]$ strace ./hello-vdso 2>&1 | grep "gettimeofday\|write"
write(1, "Seconds: 1619316657\n", 20Seconds: 1619316657

We must be using the vDSO version. My next question was whether I could force the syscall to happen. I could not find a GNU linker option to turn the vDSO off. You can turn it off system wide using various kernel options, but not “per application” at link time. I also thought linking statically would do it (since the vDSO shows up in ldd), but even that didn’t work. The kernel and glibc really want to make sure I’m using the optimized version!

I was able to make it happen with a junk hack: I linked statically and then used a hex editor to mangle the string __vdso_gettimeofday in the binary, so that the lookup for that symbol fails and glibc behaves as if the vDSO version was never loaded.

[austin@localhost]$ strace ./hello-vdso 2>&1 | grep "gettimeofday\|write"
gettimeofday({tv_sec=1619316899, tv_usec=828106}, NULL) = 0
write(1, "Seconds: 1619316899\n", 20Seconds: 1619316899

This was kind of a dumb experiment, but it was a good way to learn how the kernel and user land interact in ways that most people don’t think about too hard.