Perl hacking I: PEEK & POKE & XSUB

~ isra
2023-10-24 (updated 2023-10-26)

[ intro ]


By default Perl does not provide builtin functions for accessing and modifying raw memory content. However, some tricks can be used to alter the internal representation of variables and copy arbitrary data into a mapped memory address. This makes it possible to inject assembly code into memory and then trick Perl into executing it as if it were an external subroutine. This article describes such an implementation for Linux x86_64, based on [1].

UPDATE: a simpler mechanism for POKE was suggested after this article was published (added in part 5).



[ part 1: Perl internal data types ]


Perl has three main internal data types: Scalar Value (SV), Array Value (AV), and Hash Value (HV). There is also a special typedef for integer values (IV), which is guaranteed to be large enough to hold a pointer. An SV can hold various types of values, including integers (IV) and strings (PV). Here PV stands for "Pointer Value": a pointer to a string, although according to Perl's documentation[2] it can also point to other things.
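
As a quick illustration (not part of the script at [1]), the core B module can report which internal type backs a given variable. The variable names below are made up for the example:

```perl
use strict;
use warnings;
use B qw(svref_2object);

my $int   = 42;          # scalar holding an integer
my $str   = "42";        # scalar holding a string
my @array = (1, 2, 3);
my %hash  = (a => 1);

# svref_2object() wraps the referenced value in a B:: object whose
# class name reflects the internal type of that value
my %class = (
    int   => ref(svref_2object(\$int)),
    str   => ref(svref_2object(\$str)),
    array => ref(svref_2object(\@array)),
    hash  => ref(svref_2object(\%hash)),
);

printf("%-5s => %s\n", $_, $class{$_}) for sort keys %class;
```

The same B module is used later in the article to turn a raw address back into a Perl reference.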

The internal structure of an SV is defined in sv.h as follows:


#define _SV_HEAD(ptrtype) \
    ptrtype sv_any;   /* pointer to body */ \
    U32   sv_refcnt;  /* how many references to us */ \
    U32   sv_flags  /* what we are */

[...]

#define _SV_HEAD_UNION \
    union {       \
        char*   svu_pv;   /* pointer to malloced string */  \
        IV      svu_iv;     \
        UV      svu_uv;     \
        _NV_BODYLESS_UNION    \
        SV*     svu_rv;   /* pointer to another SV */   \
        SV**    svu_array;    \
        HE**  svu_hash;   \
        GP* svu_gp;     \
        PerlIO *svu_fp;     \
    } sv_u        \

[...]

struct STRUCT_SV {    /* struct sv { */
    _SV_HEAD(void*);
    _SV_HEAD_UNION;
};
            

As shown above, the head of an SV has a pointer field 'sv_any', which points to its body, and two U32 fields, 'sv_refcnt' and 'sv_flags', which hold the number of references pointing to it and its flags, respectively. These are followed by the union 'sv_u', which for a string holds the pointer to the string buffer (svu_pv). The internal representation of an SV can also be examined by using Devel::Peek's Dump function as follows:


 $ perl -MDevel::Peek -e 'my $a=42;Dump $a'

 SV = IV(0x557bda4d4810) at 0x557bda4d4820
   REFCNT = 1
   FLAGS = (IOK,pIOK)
   IV = 42
            

The output above shows an SV with reference count 1, two flags (IOK,pIOK) and the integer value 42. Similarly:


 $ perl -MDevel::Peek -e 'my $s="42";Dump $s'

 SV = PV(0x55b493eeeea0) at 0x55b493f1b490
   REFCNT = 1
   FLAGS = (POK,IsCOW,pPOK)
   PV = 0x55b493f24390 "42"\0
   CUR = 2
   LEN = 10
   COW_REFCNT = 1
            

The output above shows an SV (or SvPV) with reference count 1, three flags (POK, IsCOW, pPOK) and a PV pointing to the address 0x55b493f24390 where the string "42" is stored. A more elaborate example:


 $ perl -MDevel::Peek -e 'my $s="42";my $x=\$s+0;Dump $s;Dump $x'

 SV = PV(0x5650de3d5ea0) at 0x5650de402828
   REFCNT = 1
   FLAGS = (POK,IsCOW,pPOK)
   PV = 0x5650de40b5c0 "42"\0
   CUR = 2
   LEN = 10
   COW_REFCNT = 1
 SV = IV(0x5650de402830) at 0x5650de402840
   REFCNT = 1
   FLAGS = (IOK,pIOK)
   IV = 94905326118952
            

In this case $x holds an IV with the memory address of $s. This can be verified by looking at the hexadecimal value of 94905326118952:


 $ perl -e 'printf("0x%x\n", 94905326118952)'

 0x5650de402828
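
The same address can be cross-checked without Devel::Peek. The following is a minimal sketch, assuming only the core Scalar::Util module; the variable names are made up for the example:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $s = "42";

# Using a reference in numeric context yields the address of the SV it points to
my $addr = \$s + 0;

# refaddr() returns the same address, and the default string form of a
# reference embeds it in hexadecimal
my $from_refaddr = refaddr(\$s);
my $as_string    = "" . \$s;

printf("numified ref: 0x%x\n", $addr);
printf("refaddr:      0x%x\n", $from_refaddr);
printf("stringified:  %s\n",   $as_string);
```

All three lines should show the same hexadecimal value, which is the address printed after "at" in the Dump output.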
            

[ part 2: PEEK ]


To peek at an SV, the script at [1] uses the unpack[3] function with the "P" template, which is defined as "A pointer to a structure (fixed-length string)". The first step is to create a dummy string of size $len (the size of the payload) and obtain the memory address of the associated SvPV.


 my $dummy = 'X' x $len;
 my $dummy_addr = \$dummy + 0;
          

Next, unpack is used to read '8 + 4 + 4 + $Config{ivsize}' bytes starting at the memory address of the dummy SvPV, which yields a copy of its head structure:


 my $size = 8 + 4 + 4 + $Config{ivsize};
 my $ghost_sv_contents = unpack("P".$size, pack("Q", $dummy_addr));
          

The size 8 + 4 + 4 + $Config{ivsize} refers to the following:

  • 8 bytes for the sv_any pointer
  • 4 bytes for the reference count
  • 4 bytes for the flags
  • $Config{ivsize} bytes for the sv_u union, which has the size of an IV (usually 8) and holds the PV pointer

For instance, consider the following code:


 use Config;
 use Devel::Peek;

 my $dummy = 'X' x 10;
 my $dummy_addr = \$dummy + 0;
 my $size = 8 + 4 + 4 + $Config{ivsize};
 my $ghost_sv_contents = unpack("P".$size, pack("Q", $dummy_addr));

 Dump $dummy;

 my $sv_any = substr($ghost_sv_contents, 0, 8);
 my $refcnt = substr($ghost_sv_contents, 8, 4);
 my $flags  = substr($ghost_sv_contents, 12, 4);
 my $pv_addr = substr($ghost_sv_contents, 16, $Config{ivsize});
 printf("sv_any: 0x%x\n", unpack("Q", $sv_any));
 printf("refcnt: %d\n", unpack("L", $refcnt));
 printf("flags: %b\n", unpack("L", $flags));
 printf("PV addr: 0x%x\n", unpack("Q", $pv_addr));
          

Then:


 $ perl peek.pl
 SV = PV(0x55c9e6044ea0) at 0x55c9e60718a8
   REFCNT = 1
   FLAGS = (POK,pPOK)
   PV = 0x55c9e607c6b0 "XXXXXXXXXX"\0
   CUR = 10
   LEN = 12
 sv_any: 0x55c9e6044ea0
 refcnt: 1
 flags: 100010000000011
 PV addr: 0x55c9e607c6b0
          

In the example above $sv_any holds the memory address of the body of the SvPV, while $pv_addr, read from the last 8 bytes of the head, holds the address pointed to by the PV, where the string "XXXXXXXXXX" is stored. This can be illustrated as follows:


       (0x55c9e60718a8)
 __________________________
|sv_any     0x55c9e6044ea0 | -----> (0x55c9e6044ea0)
|__________________________|     _____________________
|sv_refcnt  1              |    |  body: CUR, LEN ... |
|__________________________|    |_____________________|
|sv_flags   100010000000011|
|__________________________|
|svu_pv     0x55c9e607c6b0 | -----> (0x55c9e607c6b0)
|__________________________|      ________________
                                 |  "XXXXXXXXXX"  |
                                 |________________|
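
The PEEK step can be wrapped in a small helper. The following is a minimal sketch, assuming a 64-bit perl with the head layout described above; peek() is a name made up for this example and is not defined in the script at [1]:

```perl
use strict;
use warnings;
use Config;
use B qw(svref_2object);

# Assumption: 8-byte pointers and 8-byte IVs (e.g. Linux x86_64)
die "this sketch assumes a 64-bit perl\n"
    unless $Config{ptrsize} == 8 && $Config{ivsize} == 8;

# peek(): return $len raw bytes starting at memory address $addr
sub peek {
    my ($addr, $len) = @_;
    return unpack("P$len", pack("Q", $addr));
}

my $dummy      = 'X' x 10;
my $dummy_addr = \$dummy + 0;
my $head       = peek($dummy_addr, 8 + 4 + 4 + $Config{ivsize});

# One template for the whole head: sv_any, sv_refcnt, sv_flags, PV pointer
my ($sv_any, $refcnt, $flags, $pv_addr) = unpack("Q L L Q", $head);

# Following the PV pointer gives back the string itself
my $string = peek($pv_addr, length($dummy));

printf("POK flag set:  %s\n", ($flags & B::SVf_POK()) ? "yes" : "no");
printf("flags match B: %s\n",
    $flags == svref_2object(\$dummy)->FLAGS ? "yes" : "no");
printf("string at 0x%x: %s\n", $pv_addr, $string);
```

Reading the string back through the peeked PV pointer is a simple way to confirm that the offsets used for the head fields are right on a given build.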
          

[ part 3: POKE ]


To poke a memory address $addr of choice, the copy of the dummy SvPV structure is first modified by overwriting its last 8 bytes (which hold the address pointed to by the PV). Then the B module is used to create a new B::PV object based on the modified structure, and finally the resulting reference is dereferenced to modify the contents of the string pointed to by the PV:


 substr( $ghost_sv_contents, 8 + 4 + 4, $Config{ivsize} ) = $addr;

 my $ghost_string_ref = bless( \ unpack(
    "Q",
    do { no warnings 'pack'; pack( 'P', $ghost_sv_contents.'' ) },
  ), 'B::PV' )->object_2svref;

 eval 'substr($$ghost_string_ref, 0, $len) = $bytes';
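
The bless/object_2svref step may look opaque, so here is a minimal sketch of what it does. It uses the address of an ordinary variable instead of a forged structure; $target and $raw are names made up for the example:

```perl
use strict;
use warnings;
use B qw(svref_2object);
use Scalar::Util qw(refaddr);

my $target = "before";
my $addr   = \$target + 0;

# A B:: object is a blessed reference to a scalar that holds the SV's address
my $b_object = svref_2object(\$target);
printf("address stored in B object: 0x%x\n", $$b_object);

# Building such an object by hand from a raw address and calling
# object_2svref() returns an ordinary Perl reference to the SV at that address
my $raw = $addr;
my $ref = bless(\$raw, 'B::PV')->object_2svref;

# Writing through the reference modifies the original variable
$$ref = "after";
print "target is now: $target\n";
```

In poke() the raw address is not that of a real variable but of the modified copy of the head, which is why the resulting reference writes to $addr.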
          

To verify the above the mmap[4] syscall can be used to map a new area in memory and copy a given payload into it (in this case a string). Consider the following code:


 use B;
 use Config;
 use 5.008001;
 use Devel::Peek;

 sub mmap {
     my ($addr, $size, $protect, $flags) = @_;
     my $ret = syscall(9, $addr, $size, $protect, $flags, -1, 0);
     return $ret;
 }

 sub poke {
     my($location, $bytes, $len) = @_;
     my $addr = pack("Q", $location);

     my $dummy = 'X' x $len;
     my $dummy_addr = \$dummy + 0;
     my $size = 8 + 4 + 4 + $Config{ivsize};
     my $ghost_sv_contents = unpack("P".$size, pack("Q", $dummy_addr));
     substr( $ghost_sv_contents, 8 + 4 + 4, $Config{ivsize} ) = $addr;    

     my $ghost_string_ref = bless( \ unpack(
         "Q",
         do { no warnings 'pack'; pack( 'P', $ghost_sv_contents.'' ) },
     ), 'B::PV' )->object_2svref;
     eval 'substr($$ghost_string_ref, 0, $len) = $bytes';
     Dump $$ghost_string_ref;
     return $len;
 }

 my $payload = "japh";
 my $ptr = mmap(0, length($payload), 3, 33);
 if($ptr == -1) {
     print "Failed to map memory\n";
     exit;
 }

 printf("Using memory address 0x%x\n", $ptr);
 poke($ptr, $payload, length($payload));
          

In the code above the value 0 is used in mmap() as the $addr parameter to let the system choose a start address for the new mapped area. The values 3 and 33 are used for $protect and $flags parameters based on the following:

  • PROT_READ | PROT_WRITE = 3 - to allow writing into the mapped area
  • MAP_SHARED | MAP_ANONYMOUS = 33 - to avoid the use of files

mmap() will return the starting address of the mapped area on success and -1 on failure. Then when running the script:


 $ perl peekpoke.pl

Using memory address 0x15239f7b2000
SV = PV(0x55ceffa3dff0) at 0x55ceffb90a10
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x15239f7b2000 "japh"\0
  CUR = 4
  LEN = 10
          

In the example above a new memory area of size length($payload) is mapped (with write permissions) and mmap returns the pointer 0x15239f7b2000 as the start of the new area. Then poke() is used to copy $payload into that memory area, which is now pointed to by the PV of $$ghost_string_ref.
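
The magic numbers passed to mmap can be spelled out with named constants. The following is a minimal sketch, assuming the Linux x86_64 values for the flags and syscall numbers; the constant names mirror the C headers but are defined by hand here, and the syscalls are skipped on other platforms:

```perl
use strict;
use warnings;
use Config;

# Values taken from the Linux headers; they are assumptions on other systems
use constant {
    PROT_READ     => 0x1,
    PROT_WRITE    => 0x2,
    MAP_SHARED    => 0x01,
    MAP_ANONYMOUS => 0x20,
    SYS_mmap      => 9,     # x86_64 syscall numbers
    SYS_munmap    => 11,
};

my $protect = PROT_READ | PROT_WRITE;         # 3
my $flags   = MAP_SHARED | MAP_ANONYMOUS;     # 33
print "protect=$protect flags=$flags\n";

my $peeked;
if ($^O eq 'linux' && $Config{archname} =~ /^x86_64/) {
    my $len = 4;
    my $ptr = syscall(SYS_mmap, 0, $len, $protect, $flags, -1, 0);
    die "mmap failed: $!\n" if $ptr == -1;

    # A fresh anonymous mapping is zero-filled, which PEEK can confirm
    $peeked = unpack("P$len", pack("Q", $ptr));
    printf("0x%x holds: %s\n", $ptr, unpack("H*", $peeked));

    # Release the mapping once it is no longer needed
    syscall(SYS_munmap, $ptr, $len) == 0 or die "munmap failed: $!\n";
}
```

On failure syscall() returns -1 and sets $!, so the error message can report why the mapping could not be created.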


[ part 4: XSUB ]


POKE can be used to copy more interesting things into a mapped memory area, such as strings containing assembly code. Such code can then be executed using Perl's DynaLoader[5] module, whose purpose is to "Dynamically load C libraries into Perl code". To do this, the protection of the mapped memory area first needs to be updated to allow execution. This can be done with the mprotect[6] syscall as follows:


 sub mprotect {
     my ($addr, $size, $protect) = @_;
     my $ret = syscall(10, $addr, $size, $protect);
     return $ret;
 }
 if(mprotect($ptr, length($payload), 5) == -1) {
     print "Failed to update memory protection\n";
     exit;
 }
          

In the code above $addr specifies the start address of the mapped area to be updated and $protect defines the new value for the memory protection, which in this case is:

  • PROT_READ | PROT_EXEC = 5 - to allow execution of the mapped area
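
One way to confirm that the protection change took effect is to look the mapping up in /proc/self/maps. The following is a minimal sketch for Linux x86_64; perms_at() is a hypothetical helper written for this example, and the block does nothing on other platforms:

```perl
use strict;
use warnings;
no warnings 'portable';    # hex() is used on 64-bit addresses
use Config;

# perms_at(): find the mapping containing $addr in /proc/self/maps and
# return its permission string (for example "rw-s"), or undef if not found
sub perms_at {
    my ($addr) = @_;
    open(my $maps, '<', '/proc/self/maps') or return;
    while (my $line = <$maps>) {
        my ($start, $end, $perms) =
            $line =~ /^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)/ or next;
        return $perms if hex($start) <= $addr && $addr < hex($end);
    }
    return;
}

my ($before, $after);
if ($^O eq 'linux' && $Config{archname} =~ /^x86_64/) {
    # mmap (9) with PROT_READ|PROT_WRITE and MAP_SHARED|MAP_ANONYMOUS
    my $ptr = syscall(9, 0, 4096, 3, 33, -1, 0);
    die "mmap failed: $!\n" if $ptr == -1;
    $before = perms_at($ptr);

    # mprotect (10) with PROT_READ|PROT_EXEC
    if (syscall(10, $ptr, 4096, 5) == 0) {
        $after = perms_at($ptr);
    } else {
        warn "mprotect failed: $!\n";
    }

    printf("before mprotect: %s\n", $before // 'unknown');
    printf("after mprotect:  %s\n", $after  // 'unknown');
}
```

The write permission is dropped and the execute permission is added, so the area is never writable and executable at the same time.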

mprotect() will return 0 on success and -1 on failure. The next step is using dl_install_xsub() which creates a new Perl external subroutine based on the parameters $perl_name and $symref and returns a reference to the "installed function":


 my $func = dl_install_xsub($perl_name, $symref [, $filename])
          

$symref is expected to be a pointer to the function which implements the routine to be installed. However, a pointer to the payload copied into memory can be used instead to obtain a function reference for execution:


 my $func = DynaLoader::dl_install_xsub(
        "_japh", # not really used
        $ptr, 
        __FILE__ # no file
 );
 # dereference and execute
 &{$func};
          

To try it out a simple payload for calling execve with "/usr/bin/id" will be considered:


 BITS 64
 global main
 section .text
 main:
    call run
    db "/usr/bin/id", 0x0

 run:
    ;;;;;;;;;;;;;;;;;;;;;;;;;
    ; call id
    ;;;;;;;;;;;;;;;;;;;;;;;;;
    pop rsi
    pop rsi

    xor rax, rax
    lea rdi, [rsi]
    
    ; argv
    ; ["/usr/bin/id"]
    push 0
    push rdi          ; "/usr/bin/id"
    mov rsi, rsp 

    ; execve & exit
    xor rax, rax
    mov rax, 59
    mov rdx, 0
    syscall
    pop rsi
    xor rdx, rdx
    mov rax, 60
    syscall
          

Then the final code is as follows (using the hexadecimal representation of the previous payload):


 use B;
 use Config;
 use 5.008001;
 use DynaLoader;
 use Devel::Peek;

 sub mmap {
     ...
 }
 sub mprotect {
     ...
 }
 sub poke {
     ...
 }

 my $payload = "";
 $payload .= "\xe8\x0d\x00\x00\x00\x2f\x75\x73\x72\x2f\x62\x69\x6e\x2f\x69";
 $payload .= "\x64\x00\x5e\x5e\x48\x31\xc0\x48\x8d\x3e\x6a\x00\x57\x48\x89";
 $payload .= "\xe6\x48\x31\xc0\xb8\x3b\x00\x00\x00\xba\x00\x00\x00\x00\x0f";
 $payload .= "\x05\x5e\x48\x31\xd2\xb8\x3c\x00\x00\x00\x0f\x05";

 my $ptr = mmap(0, length($payload), 3, 33);
 if($ptr == -1) {
     print "Failed to map memory\n";
     exit;
 }

 poke($ptr, $payload, length($payload));
 if(mprotect($ptr, length($payload), 5) == -1) {
     print "Failed to update memory protection\n";
     exit;
 }

 my $func = DynaLoader::dl_install_xsub(
        "_japh", # not really used
        $ptr, 
        __FILE__ # no file
 );

 # dereference and execute
 &{$func};
          

And finally:


 $ perl exec_asm64.pl
 uid=1000(isra) gid=1000(isra)  ....
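
Before running a hexadecimal payload it can be worth checking what it contains. The following is a minimal sketch that decodes a few fields of the payload above with unpack; the scan for the 0xb8 opcode is a naive heuristic that happens to work for this payload, not a disassembler:

```perl
use strict;
use warnings;

my $payload = "";
$payload .= "\xe8\x0d\x00\x00\x00\x2f\x75\x73\x72\x2f\x62\x69\x6e\x2f\x69";
$payload .= "\x64\x00\x5e\x5e\x48\x31\xc0\x48\x8d\x3e\x6a\x00\x57\x48\x89";
$payload .= "\xe6\x48\x31\xc0\xb8\x3b\x00\x00\x00\xba\x00\x00\x00\x00\x0f";
$payload .= "\x05\x5e\x48\x31\xd2\xb8\x3c\x00\x00\x00\x0f\x05";

# Byte 0 is the call opcode (0xe8), followed by a signed 32-bit
# little-endian displacement
my ($opcode, $rel32) = unpack("C l<", $payload);

# The NUL-terminated path starts right after the 5-byte call instruction
my $path = unpack("x5 Z*", $payload);

# 0xb8 followed by four bytes loads a 32-bit immediate into eax,
# which is where the syscall numbers are set
my @syscalls = map { unpack("V", $_) } $payload =~ /\xb8(.{4})/sg;

printf("payload length:    %d bytes\n", length($payload));
printf("first opcode:      0x%02x\n", $opcode);
printf("call displacement: %d\n", $rel32);
printf("embedded path:     %s\n", $path);
printf("syscall numbers:   %s\n", join(", ", @syscalls));
```

The two syscall numbers correspond to execve (59) and exit (60) on Linux x86_64.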
          

The execution of assembly code opens up the door for various interesting things with Perl. Stay tuned!


[ part 5: Extra mile ]


A simpler POKE mechanism was suggested by "Kalamata Hari" after this article was published. It consists of using the 'read' syscall to write data into a memory buffer obtained from mmap. This makes it possible to replicate the script exec_asm64.pl from the previous section with simpler and shorter code:


 use DynaLoader;

 $p  = "\xe8\x0d\x00\x00\x00\x2f\x75\x73\x72\x2f\x62\x69\x6e\x2f\x69";
 $p .= "\x64\x00\x5e\x5e\x48\x31\xc0\x48\x8d\x3e\x6a\x00\x57\x48\x89";
 $p .= "\xe6\x48\x31\xc0\xb8\x3b\x00\x00\x00\xba\x00\x00\x00\x00\x0f";
 $p .= "\x05\x5e\x48\x31\xd2\xb8\x3c\x00\x00\x00\x0f\x05";

 $f = "p";
 open $fh, '>', $f;
 syswrite($fh, $p);

 $sz = (stat $f)[7];
 $ptr = syscall(9, 0, $sz, 3, 33, -1, 0);     # mmap
 $fd = syscall(2, $f, 0);                     # open
 syscall(0, $fd, $ptr, $sz);                  # read
 syscall(10, $ptr, $sz, 5);                   # mprotect
 $x = DynaLoader::dl_install_xsub("", $ptr);
 &{$x};
          

In the code above the payload is defined as before and written into a temporary file. Then mmap is called to map a new memory area with the size of the payload file, and the file is opened in read-only mode. The read syscall is then called with the file descriptor returned by the open syscall and the pointer returned by the mmap syscall, which copies the contents of the file into the mapped area. Finally, the memory protection of the mapped area is updated and dl_install_xsub() is invoked to obtain a function reference for the mapped code.
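
The read-based POKE can be tried without executing anything by copying a plain string and peeking it back. The following is a minimal sketch for Linux x86_64; it swaps the temporary file for a pipe, which is a variation introduced here and not part of the original suggestion, and poke_via_pipe() is a hypothetical helper name:

```perl
use strict;
use warnings;
use Config;

# poke_via_pipe(): copy $bytes to address $ptr by letting the read
# syscall fill the buffer from a pipe; returns the number of bytes read
sub poke_via_pipe {
    my ($ptr, $bytes) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!\n";
    defined syswrite($writer, $bytes) or die "write failed: $!\n";
    close($writer);
    my $got = syscall(0, fileno($reader), $ptr, length($bytes));    # read
    close($reader);
    return $got;
}

my ($copied, $peeked);
if ($^O eq 'linux' && $Config{archname} =~ /^x86_64/) {
    my $data = "japh";
    my $ptr  = syscall(9, 0, length($data), 3, 33, -1, 0);          # mmap
    die "mmap failed: $!\n" if $ptr == -1;

    $copied = poke_via_pipe($ptr, $data);
    $peeked = unpack("P" . length($data), pack("Q", $ptr));         # PEEK
    print "copied $copied bytes, memory now holds: $peeked\n";
}
```

The payload is written to the pipe in one go before it is read, so this variation only suits payloads that fit in the pipe buffer (64 KiB by default on Linux).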


[ references ]


[1] https://gist.github.com/monoxgas/c0b0f086fc7aa057a8256b42c66761c8
[2] https://perldoc.perl.org/perlguts#Working-with-SVs
[3] https://perldoc.perl.org/functions/pack
[4] https://man7.org/linux/man-pages/man2/mmap.2.html
[5] https://perldoc.perl.org/DynaLoader
[6] https://man7.org/linux/man-pages/man2/mprotect.2.html


[ sample code ]


exec_asm64.pl

IyEvdXNyL2Jpbi9wZXJsCiMKIyBleGVjX2FzbTY0LnBsOiBFeGVjdXRlIGFzc2VtYmx5IGNvZGUg
b24gTGludXggeDg2XzY0CiMgd3JpdHRlbiBieSBpc3JhIC0gaXNyYSBfcmVwbGFjZV9ieV9AXyBm
YXN0bWFpbC5uZXQgLSBodHRwczovL2hja25nLm9yZwojIGJhc2VkIG9uIGh0dHBzOi8vZ2lzdC5n
aXRodWIuY29tL21vbm94Z2FzL2MwYjBmMDg2ZmM3YWEwNTdhODI1NmI0MmM2Njc2MWM4CiMgdmVy
c2lvbiAwLjEgLSBvY3RvYmVyIDIwMjMKIwoKdXNlIEI7CnVzZSBDb25maWc7CnVzZSA1LjAwODAw
MTsKdXNlIER5bmFMb2FkZXI7CgojIG1lbW9yeSBtYXAKc3ViIG1tYXAgewogICAgIyBzeXNjYWxs
IG51bWJlciBmb3IgbW1hcCBpcyA5IG9uIExpbnV4IHg4Nl82NAogICAgIyAkYWRkciBjYW4gYmUg
YSBmaXhlZCB2YWx1ZSwgb3IgMCB0byBsZXQgbW1hcCBjaG9vc2Ugb25lCiAgICAjIGl0IHJldHVy
bnMgYSBwb2ludGVyIHRvIHRoZSBtYXBwZWQgYXJlYSBvbiBzdWNjZXNzLCAtMSBvbiBmYWlsdXJl
CiAgICBteSAoJGFkZHIsICRzaXplLCAkcHJvdGVjdCwgJGZsYWdzKSA9IEBfOwogICAgbXkgJHJl
dCA9IHN5c2NhbGwoOSwgJGFkZHIsICRzaXplLCAkcHJvdGVjdCwgJGZsYWdzLCAtMSwgMCk7CiAg
ICByZXR1cm4gJHJldDsKfQoKIyBtZW1vcnkgcHJvdGVjdApzdWIgbXByb3RlY3QgewogICAgIyBz
eXNjYWxsIG51bWJlciBmb3IgbXByb3RlY3QgaXMgMTAgb24gTGludXggeDg2XzY0CiAgICAjIGl0
IHJldHVybnMgMCBvbiBzdWNjZXNzLCAtMSBvbiBmYWlsdXJlCiAgICBteSAoJGFkZHIsICRzaXpl
LCAkcHJvdGVjdCkgPSBAXzsKICAgIG15ICRyZXQgPSBzeXNjYWxsKDEwLCAkYWRkciwgJHNpemUs
ICRwcm90ZWN0KTsKICAgIHJldHVybiAkcmV0Owp9CgojIGNvcHkgJGJ5dGVzIG9mIGxlbmd0aCAk
bGVuIGludG8gYWRkcmVzcyAkbG9jYXRpb24Kc3ViIHBva2UgewogICAgbXkoJGxvY2F0aW9uLCAk
Ynl0ZXMsICRsZW4pID0gQF87CiAgICBteSAkZHVtbXkgPSAnWCcgeCAkbGVuOwogICAgbXkgJGR1
bW15X2FkZHIgPSBcJGR1bW15ICsgMDsKCiAgICBteSAkc2l6ZSA9IDE2ICsgJENvbmZpZ3tpdnNp
emV9OwogICAgbXkgJGdob3N0X3N2X2NvbnRlbnRzID0gdW5wYWNrKCJQIi4kc2l6ZSwgcGFjaygi
USIsICRkdW1teV9hZGRyKSk7CiAgICBzdWJzdHIoICRnaG9zdF9zdl9jb250ZW50cywgMTYsICRD
b25maWd7aXZzaXplfSApID0gcGFjaygiUSIsICRsb2NhdGlvbik7CgogICAgbXkgJGdob3N0X3N0
cmluZ19yZWYgPSBibGVzcyggXCB1bnBhY2soCiAgICAgICAgIlEiLAogICAgICAgIGRvIHsgbm8g
d2FybmluZ3MgJ3BhY2snOyBwYWNrKCAnUCcsICRnaG9zdF9zdl9jb250ZW50cy4nJyApIH0sCiAg
ICApLCAnQjo6UFYnICktPm9iamVjdF8yc3ZyZWY7CgogICAgZXZhbCAnc3Vic3RyKCQkZ2hvc3Rf
c3RyaW5nX3JlZiwgMCwgJGxlbikgPSAkYnl0ZXMnOwp9CgpteSAkcGF5bG9hZCA9ICIiOwokcGF5
bG9hZCAuPSAiXHhlOFx4MGRceDAwXHgwMFx4MDBceDJmXHg3NVx4NzNceDcyXHgyZlx4NjJceDY5
XHg2ZVx4MmZceDY5IjsKJHBheWxvYWQgLj0gIlx4NjRceDAwXHg1ZVx4NWVceDQ4XHgzMVx4YzBc
eDQ4XHg4ZFx4M2VceDZhXHgwMFx4NTdceDQ4XHg4OSI7CiRwYXlsb2FkIC49ICJceGU2XHg0OFx4
MzFceGMwXHhiOFx4M2JceDAwXHgwMFx4MDBceGJhXHgwMFx4MDBceDAwXHgwMFx4MGYiOwokcGF5
bG9hZCAuPSAiXHgwNVx4NWVceDQ4XHgzMVx4ZDJceGI4XHgzY1x4MDBceDAwXHgwMFx4MGZceDA1
IjsKCnByaW50ICJcbiI7CnByaW50ICIqIiB4IDM5OwpwcmludCAiXG4qIGV4ZWNfYXNtNjQucGwg
LSBieSBpc3JhIC0gaGNrbmcub3JnICpcbiI7CnByaW50ICIqIiB4IDM5OwpwcmludCAiXG5cbiI7
CgpteSAkc2l6ZSA9IGxlbmd0aCgkcGF5bG9hZCk7CnByaW50ICJbK10gUGF5bG9hZCBzaXplOiAk
c2l6ZVxuIjsKcHJpbnQgIlsrXSBUcnlpbmcgdG8gbWFwIG5ldyBtZW1vcnkgYXJlYS4uLiI7Cm15
ICRwdHIgPSBtbWFwKDAsICRzaXplLCAzLCAzMyk7CmlmKCRwdHIgPT0gLTEpIHsKICAgIGRpZSAi
ZmFpbGVkIHRvIG1hcCBtZW1vcnlcbiI7Cn0KcHJpbnQgIk9LXG4iOwpwcmludGYoIlsrXSBTdGFy
dCBvZiBtYXBwZWQgYXJlYTogMHgleFxuIiwgJHB0cik7CgpwcmludGYoIlsrXSBUcnlpbmcgdG8g
UE9LRSBwYXlsb2FkIGF0IDB4JXguLi4iLCAkcHRyKTsKcG9rZSgkcHRyLCAkcGF5bG9hZCwgJHNp
emUpOwpwcmludCAiT0tcbiI7CgpwcmludCAiWytdIFRyeWluZyB0byB1cGRhdGUgbWVtb3J5IHBy
b3RlY3Rpb24uLi4iOwppZihtcHJvdGVjdCgkcHRyLCAkc2l6ZSwgNSkgPT0gLTEpIHsKICAgIGRp
ZSAiZmFpbGVkIHRvIHVwZGF0ZSBtZW1vcnkgcHJvdGVjdGlvblxuIjsKfQpwcmludCAiT0tcbiI7
CgpwcmludCAiWytdIFRyeWluZyB0byBpbnN0YWxsIHhzdWIuLi4iOwpteSAkZnVuYyA9IER5bmFM
b2FkZXI6OmRsX2luc3RhbGxfeHN1YigKICAgICJfamFwaCIsICMgbm90IHJlYWxseSB1c2VkCiAg
ICAkcHRyLCAKICAgIF9fRklMRV9fICMgbm8gZmlsZQopOwpwcmludCAiT0tcbiI7CgpwcmludCAi
WytdIEdvaW5nIHRvIGV4ZWN1dGU6XG5cbiI7CgojIGRlcmVmZXJlbmNlIGFuZCBleGVjdXRlCiZ7
JGZ1bmN9Owo=