Question

I'm having trouble finding documentation for a problem I've hit while trying to generate consistent HMACs in the kernel and in user space. According to Robert Love in Linux Kernel Development, the memory descriptor fields mm->start_code and mm->end_code are supposed to delimit the .text segment. Finding the .text segment in a static executable is well defined in the ELF documentation and is easy to get at. So, given the following two code snippets, one would expect to get a matching HMAC:

Kernel:

__mm = get_task_mm(__task);

__retcode = ntru_crypto_hmac_init(__crypto_context);
if(__retcode != NTRU_CRYPTO_HMAC_OK)
    return 1;

__retcode = ntru_crypto_hmac_update(__crypto_context, (const uint8_t*)__mm->start_code, 
                                    __mm->end_code - __mm->start_code);
if(__retcode != NTRU_CRYPTO_HMAC_OK)
    return 1;

__retcode = ntru_crypto_hmac_final(__crypto_context, __hmac);
if(__retcode != NTRU_CRYPTO_HMAC_OK)
    return 1;

return 0;

Userland:

for (j = 0; j < file_hdr32.e_shnum; j++)
{
   if (!strcmp(".text", strIndex + section_hdr32[j]->sh_name))
   {
       retcode = ntru_crypto_hmac_init(__crypto_context());
       if(retcode != NTRU_CRYPTO_HMAC_OK)
       {
           syslog(LOG_ERR, "ntru_crypto_hmac_init error: retcode = %d, TID(0x%lx)",
                  retcode, pthread_self());
           return 0;
       }

       retcode = ntru_crypto_hmac_update(__crypto_context(), 
                 filebuf + section_hdr32[j]->sh_offset, section_hdr32[j]->sh_size);
       if(retcode != NTRU_CRYPTO_HMAC_OK)
       {
           syslog(LOG_ERR, "Internal crypto error (%d)", retcode);
           return 0;
       }

       retcode = ntru_crypto_hmac_final(__crypto_context(), _hmac);
       if(retcode != NTRU_CRYPTO_HMAC_OK)
       {
           syslog(LOG_ERR, "Failed to finalize HMAC, TID(0x%lx)", pthread_self());
           return 0;
       }

       return 1;
   }
}

In both cases the .text segment is exactly where it's documented to be but they never match. I've generated userland HMACs for all 17,000 executable files on the system so even if the code segment in the kernel memory descriptor were pointing to a dependency, rather than the primary executable, I still should get a match. But no dice. There's something fundamentally different between the two .text segments and I was wondering if anyone out there knew what it was so I can save some time--any clues?

Solution

You wrote: "There's something fundamentally different between the two .text segments."

Your problem is that you are ignoring the difference between segments and sections.

ELF is an executable and linking format. Segments serve the former (execution), sections the latter (and linking here means static linking, i.e. build time). Once the binary is linked, sections can be discarded from it completely, and only segments are needed at runtime. It is segments that get mmaped, not sections.

Now let's look at the difference between the two.

readelf -l /bin/date

Elf file type is EXEC (Executable file)
Entry point 0x402000
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000d5ac 0x000000000000d5ac  R E    200000
  LOAD           0x000000000000de10 0x000000000060de10 0x000000000060de10
                 0x0000000000000440 0x0000000000000610  RW     200000
  DYNAMIC        0x000000000000de38 0x000000000060de38 0x000000000060de38
                 0x00000000000001a0 0x00000000000001a0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x000000000000c700 0x000000000040c700 0x000000000040c700
                 0x00000000000002a4 0x00000000000002a4  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8
  GNU_RELRO      0x000000000000de10 0x000000000060de10 0x000000000060de10
                 0x00000000000001f0 0x00000000000001f0  R      1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .ctors .dtors .jcr .dynamic .got

Above you can see that multiple sections (.interp, .note.ABI-tag, ... .text, ...) all got mapped into a single PT_LOAD segment. All these sections have the same protections, and all are "covered" by a single [mm->start_code, mm->end_code) region.

Compare this to the .text section:

readelf -WS /bin/date | grep '\.text'
  [13] .text             PROGBITS        0000000000401900 001900 0077f8 00  AX  0   0 16

You'll note that the section is smaller and begins at a different offset.
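
To make the comparison concrete, here is a minimal C sketch that plugs in the numbers from the two readelf outputs above. The names file_range, range_contains, date_text_segment and date_text_section are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [off, off + size) within the file. */
struct file_range {
    uint64_t off;
    uint64_t size;
};

/* True if inner lies entirely within outer. */
static bool range_contains(struct file_range outer, struct file_range inner)
{
    return inner.off >= outer.off &&
           inner.off + inner.size <= outer.off + outer.size;
}

/* Executable PT_LOAD segment of /bin/date (Offset, FileSiz from readelf -l). */
static const struct file_range date_text_segment = { 0x0, 0xd5ac };

/* .text section of /bin/date (offset, size from readelf -WS). */
static const struct file_range date_text_section = { 0x1900, 0x77f8 };
```

With these values the section sits entirely inside the segment, and the segment carries 0xd5ac - 0x77f8 = 0x5db4 additional bytes (.interp, .rodata, .eh_frame and so on) that a hash over .text alone never sees.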

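No wonder you get a different HMAC. Try computing the userland HMAC over the executable PT_LOAD segment (the file bytes from its Offset through Offset + FileSiz) rather than over the .text section, and you should get a match.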
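
The suggestion above could be sketched in userland roughly as follows. This is only an illustration under stated assumptions: exec_load_range is a hypothetical helper (not part of any library), the whole file is already in memory (as filebuf is in the question), the ELF byte order matches the host, and the binary has a single executable PT_LOAD segment, so the first one found is the one the kernel's code range corresponds to.

```c
#include <elf.h>     /* Elf32_/Elf64_ types, PT_LOAD, PF_X (Linux) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: locate the file bytes backing the first executable
 * PT_LOAD segment of the ELF image in buf. Returns 0 and stores the file
 * offset and size on success, -1 if the image is malformed or has no
 * executable PT_LOAD segment. Assumes host byte order.
 */
static int exec_load_range(const unsigned char *buf, size_t len,
                           uint64_t *off, uint64_t *size)
{
    uint64_t phoff, p_off, p_filesz;
    uint32_t p_type, p_flags;
    size_t phentsize, phnum, i;
    int is64;

    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
        return -1;
    is64 = (buf[EI_CLASS] == ELFCLASS64);

    /* Copy the file header out of the buffer to avoid unaligned access. */
    if (is64) {
        Elf64_Ehdr eh;
        if (len < sizeof eh)
            return -1;
        memcpy(&eh, buf, sizeof eh);
        phoff = eh.e_phoff;
        phentsize = eh.e_phentsize;
        phnum = eh.e_phnum;
    } else {
        Elf32_Ehdr eh;
        if (len < sizeof eh)
            return -1;
        memcpy(&eh, buf, sizeof eh);
        phoff = eh.e_phoff;
        phentsize = eh.e_phentsize;
        phnum = eh.e_phnum;
    }

    /* Reject program header tables that do not fit inside the buffer. */
    if (phentsize < (is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr)))
        return -1;
    if (phoff > len || phnum > (len - phoff) / phentsize)
        return -1;

    for (i = 0; i < phnum; i++) {
        const unsigned char *p = buf + phoff + i * phentsize;

        if (is64) {
            Elf64_Phdr ph;
            memcpy(&ph, p, sizeof ph);
            p_type = ph.p_type;
            p_flags = ph.p_flags;
            p_off = ph.p_offset;
            p_filesz = ph.p_filesz;
        } else {
            Elf32_Phdr ph;
            memcpy(&ph, p, sizeof ph);
            p_type = ph.p_type;
            p_flags = ph.p_flags;
            p_off = ph.p_offset;
            p_filesz = ph.p_filesz;
        }

        /* Only loadable segments with execute permission are of interest. */
        if (p_type != PT_LOAD || !(p_flags & PF_X))
            continue;
        if (p_off > len || p_filesz > len - p_off)
            return -1;
        *off = p_off;
        *size = p_filesz;
        return 0;
    }
    return -1;
}
```

In the question's userland loop, the call to ntru_crypto_hmac_update would then take filebuf + off and size from this helper in place of the section's sh_offset and sh_size. For the /bin/date example above, that would be offset 0 and size 0xd5ac.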

Licensed under: CC-BY-SA with attribution