Perl hacking I: PEEK & POKE & XSUB
~ isra - 2023-10-24

====[ intro ]===================================================================

By default Perl does not provide builtin functions for accessing and modifying
raw memory content. However, some tricks can be used to alter the internal
representation of variables and copy arbitrary data to a mapped memory address.
This makes it possible to inject assembly code into memory and then trick Perl
into executing it as if it were an external subroutine.

This article describes such an implementation for Linux x86_64 based on [1].

UPDATE: a simpler mechanism for POKE was suggested after this article was
published (added in part 5).

====[ part 1: Perl internal data types ]======================================

Perl has three main data types: Scalar Value (SV), Array Value (AV), and Hash
Value (HV), along with a special typedef for integer values (IV) which is
guaranteed to be long enough to hold a pointer.

SVs can also hold various types of values including integer values (IV) and
strings (PV). In this case PV stands for "Pointer Value" which is a pointer to
a string, but it can also point to other things according to Perl's
documentation[2].

The internal structure of an SV is defined in sv.h as follows:

-------------------------------- sv.h ------------------------------------------
#define _SV_HEAD(ptrtype) \
    ptrtype sv_any;     /* pointer to body */   \
    U32     sv_refcnt;  /* how many references to us */ \
    U32     sv_flags    /* what we are */

[...]

#define _SV_HEAD_UNION \
    union {                             \
        char*   svu_pv;     /* pointer to malloced string */ \
        IV      svu_iv;                 \
        UV      svu_uv;                 \
        _NV_BODYLESS_UNION              \
        SV*     svu_rv;     /* pointer to another SV */ \
        SV**    svu_array;              \
        HE**    svu_hash;               \
        GP*     svu_gp;                 \
        PerlIO *svu_fp;                 \
    }   sv_u                            \

[...]

struct STRUCT_SV {      /* struct sv { */
    _SV_HEAD(void*);
    _SV_HEAD_UNION;
};
--------------------------------------------------------------------------------

As shown above, the head of an SV has a ptrtype field 'sv_any' which points to
its body and two U32 fields 'sv_refcnt' and 'sv_flags' to keep track of how
many references point to it and its corresponding flags, respectively. The
union 'sv_u' that follows holds the value itself or, for a string, the pointer
to its buffer ('svu_pv').

An internal representation of an SV can also be examined by using Devel::Peek's
Dump function as follows:

--------------------------------------------------------------------------------
$ perl -MDevel::Peek -e 'my $a=42;Dump $a'
SV = IV(0x557bda4d4810) at 0x557bda4d4820
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 42
--------------------------------------------------------------------------------

The output above shows an SV with reference count 1, two flags (IOK,pIOK) and
the integer value 42.

Similarly:

--------------------------------------------------------------------------------
$ perl -MDevel::Peek -e 'my $s="42";Dump $s'
SV = PV(0x55b493eeeea0) at 0x55b493f1b490
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55b493f24390 "42"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
--------------------------------------------------------------------------------

The output above shows an SV (or SvPV) with reference count 1, three flags
(POK, IsCOW, pPOK) and a PV pointing to the address 0x55b493f24390 where the
string "42" is stored.

A more elaborate example:

--------------------------------------------------------------------------------
$ perl -MDevel::Peek -e 'my $s="42";my $x=\$s+0;Dump $s;Dump $x'
SV = PV(0x5650de3d5ea0) at 0x5650de402828
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x5650de40b5c0 "42"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
SV = IV(0x5650de402830) at 0x5650de402840
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 94905326118952
--------------------------------------------------------------------------------

In this case $x holds an IV with the memory address of $s.
This can be verified by looking at the hexadecimal value of 94905326118952:

--------------------------------------------------------------------------------
$ perl -e 'printf("0x%x\n", 94905326118952)'
0x5650de402828
--------------------------------------------------------------------------------

====[ part 2: PEEK ]===========================================================

To peek at an SV the script at [1] uses the unpack[3] function with the "P"
template which defines "A pointer to a structure (fixed-length string)".

The first step then is to create a dummy string of size $len (the size of the
payload) and obtain the memory address of the associated SvPV.

--------------------------------------------------------------------------------
my $dummy = 'X' x $len;
my $dummy_addr = \$dummy + 0;
--------------------------------------------------------------------------------

Next, a pointer to a structure of size '8 + 4 + 4 + $Config{ivsize}' is
obtained from the memory address of the dummy SvPV:

--------------------------------------------------------------------------------
my $size = 8 + 4 + 4 + $Config{ivsize};
my $ghost_sv_contents = unpack("P".$size, pack("Q", $dummy_addr));
--------------------------------------------------------------------------------

The size 8 + 4 + 4 + $Config{ivsize} refers to the following:

  - 8 bytes for the sv_any pointer
  - 4 bytes for the reference count
  - 4 bytes for the flags
  - $Config{ivsize} bytes for the size of an IV (usually 8).
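
The layout listed above can also be derived from the running perl instead of
being hard-coded. The following is only a sketch under stated assumptions: the
names @fields and %offset_of are made up for illustration, the U32 fields are
taken to be 4 bytes wide, and the pack template for the address is chosen from
$Config{ptrsize}. It prints the offset of each field of the SV head and then
reads that many bytes from a dummy SV with the "P" template.

```perl
use strict;
use warnings;
use Config;

# Field widths of the SV head (illustrative table, not part of the original
# script). The U32 fields are 4 bytes; the other widths come from Config.
my @fields = (
    [ sv_any    => $Config{ptrsize} ],   # pointer to the body
    [ sv_refcnt => 4 ],                  # U32 reference count
    [ sv_flags  => 4 ],                  # U32 flags
    [ sv_u      => $Config{ivsize} ],    # union holding the PV pointer
);

# Walk the table, recording where each field starts.
my $offset = 0;
my %offset_of;
for my $f (@fields) {
    my ($name, $width) = @$f;
    $offset_of{$name} = $offset;
    printf("%-10s offset %2d, width %d\n", $name, $offset, $width);
    $offset += $width;
}
my $size = $offset;
print "total head size: $size bytes\n";

# Read $size bytes from the head of a dummy SV, as described in the text.
# Assumption: 'Q' for 8-byte pointers, 'L' for 4-byte pointers.
my $dummy = 'X' x 10;
my $ptr_template = $Config{ptrsize} == 8 ? 'Q' : 'L';
my $head = unpack("P$size", pack($ptr_template, \$dummy + 0));

# Decode one field using the computed offset.
my $refcnt = unpack("L", substr($head, $offset_of{sv_refcnt}, 4));
printf("bytes read: %d, refcnt: %d\n", length($head), $refcnt);
```

On a perl where pointers and IVs are both 8 bytes wide the computed total
matches the 8 + 4 + 4 + $Config{ivsize} used in the article.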
For instance, consider the following code:

---------------------------- peek.pl -------------------------------------------
use Config;
use Devel::Peek;

my $dummy = 'X' x 10;
my $dummy_addr = \$dummy + 0;

my $size = 8 + 4 + 4 + $Config{ivsize};
my $ghost_sv_contents = unpack("P".$size, pack("Q", $dummy_addr));

Dump $dummy;

my $sv_any  = substr($ghost_sv_contents, 0, 8);
my $refcnt  = substr($ghost_sv_contents, 8, 4);
my $flags   = substr($ghost_sv_contents, 12, 4);
my $pv_addr = substr($ghost_sv_contents, 16, $Config{ivsize});

printf("sv_any: 0x%x\n", unpack("Q", $sv_any));
printf("refcnt: %d\n", unpack("L", $refcnt));
printf("flags: %b\n", unpack("L", $flags));
printf("PV addr: 0x%x\n", unpack("Q", $pv_addr));
--------------------------------------------------------------------------------

Then:

--------------------------------------------------------------------------------
$ perl peek.pl
SV = PV(0x55c9e6044ea0) at 0x55c9e60718a8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x55c9e607c6b0 "XXXXXXXXXX"\0
  CUR = 10
  LEN = 12
sv_any: 0x55c9e6044ea0
refcnt: 1
flags: 100010000000011
PV addr: 0x55c9e607c6b0
--------------------------------------------------------------------------------

In the example above $sv_any holds the memory address of the body of the SvPV
(the value Dump shows as PV(0x55c9e6044ea0)) and $pv_addr holds the memory
address pointed to by the PV, where the string "XXXXXXXXXX" is stored.

This can be illustrated as follows:

      (0x55c9e60718a8)
 __________________________
|sv_any     0x55c9e6044ea0 | -----> (0x55c9e6044ea0)
|__________________________|      _____________________
|sv_refcnt               1 |     |PV   0x55c9e607c6b0  |----> (0x55c9e607c6b0)
|__________________________|     |_____________________|      ________________
|sv_flags   100010000000011|              ...                | "XXXXXXXXXX"   |
|__________________________|                                 |________________|
            ....

====[ part 3: POKE ]===========================================================

To poke a memory address $addr of choice, the copy of the dummy SvPV structure
held in $ghost_sv_contents is first modified by overwriting its last 8 bytes
(which hold the address pointed to by the PV).
Then the module B is used to create a new B::PV object based on the modified
SvPV structure, and finally the new SvPV is dereferenced to modify the contents
of the string pointed to by the PV:

--------------------------------------------------------------------------------
substr( $ghost_sv_contents, 8 + 4 + 4, $Config{ivsize} ) = $addr;

my $ghost_string_ref = bless( \ unpack(
    "Q",
    do { no warnings 'pack'; pack( 'P', $ghost_sv_contents.'' ) },
), 'B::PV' )->object_2svref;

eval 'substr($$ghost_string_ref, 0, $len) = $bytes';
--------------------------------------------------------------------------------

To verify the above, the mmap[4] syscall can be used to map a new area in
memory and copy a given payload into it (in this case a string).

Consider the following code:

-------------------------------- poke.pl ---------------------------------------
use B;
use Config;
use 5.008001;
use Devel::Peek;

# mmap is syscall number 9 on Linux x86_64
sub mmap {
    my ($addr, $size, $protect, $flags) = @_;
    my $ret = syscall(9, $addr, $size, $protect, $flags, -1, 0);
    return $ret;
}

# copy $bytes of length $len into address $location
sub poke {
    my($location, $bytes, $len) = @_;
    my $addr = pack("Q", $location);
    my $dummy = 'X' x $len;
    my $dummy_addr = \$dummy + 0;

    # copy the head of the dummy SvPV and point its PV at $location
    my $size = 8 + 4 + 4 + $Config{ivsize};
    my $ghost_sv_contents = unpack("P".$size, pack("Q", $dummy_addr));
    substr( $ghost_sv_contents, 8 + 4 + 4, $Config{ivsize} ) = $addr;

    # turn the modified copy into a scalar reference through B::PV
    my $ghost_string_ref = bless( \ unpack(
        "Q",
        do { no warnings 'pack'; pack( 'P', $ghost_sv_contents.'' ) },
    ), 'B::PV' )->object_2svref;

    # writing through the reference writes to $location
    eval 'substr($$ghost_string_ref, 0, $len) = $bytes';
    Dump $$ghost_string_ref;
    return $len;
}

my $payload = "japh";
my $ptr = mmap(0, length($payload), 3, 33);
if($ptr == -1) {
    print "Failed to map memory\n";
    exit;
}
printf("Using memory address 0x%x\n", $ptr);
poke($ptr, $payload, length($payload));
--------------------------------------------------------------------------------

In the code above the value 0 is used in mmap() as the $addr parameter to let
the system choose a start address for the new mapped area.
The values 3 and 33 are used for the $protect and $flags parameters based on
the following:

  * PROT_READ | PROT_WRITE = 3
    - to allow writing into the mapped area

  * MAP_SHARED | MAP_ANONYMOUS = 33
    - to avoid the use of files

mmap() will return the starting address of the mapped area on success and -1
on failure.

Then when running the script:

--------------------------------------------------------------------------------
$ perl poke.pl
Using memory address 0x15239f7b2000
SV = PV(0x55ceffa3dff0) at 0x55ceffb90a10
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x15239f7b2000 "japh"\0
  CUR = 4
  LEN = 10
--------------------------------------------------------------------------------

In the example above a new memory area of size length($payload) is mapped (with
write permissions) and the pointer 0x15239f7b2000 is returned by mmap as the
start of the new area. Then poke() is used to copy $payload into that memory
area, which is pointed to by the PV of $$ghost_string_ref.

====[ part 4: XSUB ]===========================================================

POKE can be used to copy more interesting things into a mapped memory area,
such as strings containing assembly code. Such code can then be executed using
Perl's DynaLoader[5] module which "Dynamically load C libraries into Perl
code".

To do this the protection of the mapped memory area needs to be updated first
to allow execution.
This can be done with the mprotect[6] syscall as follows:

--------------------------------------------------------------------------------
sub mprotect {
    my ($addr, $size, $protect) = @_;
    my $ret = syscall(10, $addr, $size, $protect);
    return $ret;
}

if(mprotect($ptr, length($payload), 5) == -1) {
    print "Failed to update memory protection\n";
    exit;
}
--------------------------------------------------------------------------------

In the code above $addr specifies the start address of the mapped area to be
updated and $protect defines the new value for the memory protection, which in
this case is:

  * PROT_READ | PROT_EXEC = 5
    - to allow execution of the mapped area

mprotect() will return 0 on success and -1 on failure.

The next step is using dl_install_xsub() which creates a new Perl external
subroutine based on the parameters $perl_name and $symref and returns a
reference to the "installed function":

    my $func = dl_install_xsub($perl_name, $symref [, $filename])

$symref is expected to be a pointer to the function which implements the
routine to be installed. However, a pointer to the payload copied into memory
can be used instead to obtain a function reference for execution:

--------------------------------------------------------------------------------
my $func = DynaLoader::dl_install_xsub(
    "_japh", # not really used
    $ptr,
    __FILE__ # no file
);

# dereference and execute
&{$func};
--------------------------------------------------------------------------------

To try it out, a simple payload for calling execve with "/usr/bin/id" will be
considered:

--------------------------------------------------------------------------------
BITS 64
global main
section .text
main:
    call run
    db "/usr/bin/id", 0x0
run:
    ;;;;;;;;;;;;;;;;;;;;;;;;;
    ; call id
    ;;;;;;;;;;;;;;;;;;;;;;;;;
    pop rsi
    pop rsi
    xor rax, rax
    lea rdi, [rsi]

    ; argv
    ; ["/usr/bin/id"]
    push 0
    push rdi        ; "/usr/bin/id"
    mov rsi, rsp

    ; execve & exit
    xor rax, rax
    mov rax, 59
    mov rdx, 0
    syscall
    pop rsi
    xor rdx, rdx
    mov rax, 60
    syscall
--------------------------------------------------------------------------------

Then the final code is as follows (using the hexadecimal representation of the
previous payload):

------------------------- exec_asm64.pl ----------------------------------------
use B;
use Config;
use 5.008001;
use DynaLoader;
use Devel::Peek;

sub mmap { ... }
sub mprotect { ... }
sub poke { ... }

my $payload = "";
$payload .= "\xe8\x0d\x00\x00\x00\x2f\x75\x73\x72\x2f\x62\x69\x6e\x2f\x69";
$payload .= "\x64\x00\x5e\x5e\x48\x31\xc0\x48\x8d\x3e\x6a\x00\x57\x48\x89";
$payload .= "\xe6\x48\x31\xc0\xb8\x3b\x00\x00\x00\xba\x00\x00\x00\x00\x0f";
$payload .= "\x05\x5e\x48\x31\xd2\xb8\x3c\x00\x00\x00\x0f\x05";

my $ptr = mmap(0, length($payload), 3, 33);
if($ptr == -1) {
    print "Failed to map memory\n";
    exit;
}

poke($ptr, $payload, length($payload));

if(mprotect($ptr, length($payload), 5) == -1) {
    print "Failed to update memory protection\n";
    exit;
}

my $func = DynaLoader::dl_install_xsub(
    "_japh", # not really used
    $ptr,
    __FILE__ # no file
);

# dereference and execute
&{$func};
--------------------------------------------------------------------------------

And finally:

--------------------------------------------------------------------------------
$ perl exec_asm64.pl
uid=1000(isra) gid=1000(isra) ....
--------------------------------------------------------------------------------

The execution of assembly code opens up the door for various interesting things
with Perl. Stay tuned!

====[ part 5: Extra mile ]=====================================================

A simpler POKE mechanism was suggested by "Kalamata Hari" after this article
was published. It consists of using the 'read' syscall to write data into a
memory buffer obtained from mmap.
This makes it possible to replicate the script exec_asm64.pl from the previous
section with simpler and shorter code:

---------------------- exec_asm64-open.pl --------------------------------------
use DynaLoader;

$p = "\xe8\x0d\x00\x00\x00\x2f\x75\x73\x72\x2f\x62\x69\x6e\x2f\x69";
$p .= "\x64\x00\x5e\x5e\x48\x31\xc0\x48\x8d\x3e\x6a\x00\x57\x48\x89";
$p .= "\xe6\x48\x31\xc0\xb8\x3b\x00\x00\x00\xba\x00\x00\x00\x00\x0f";
$p .= "\x05\x5e\x48\x31\xd2\xb8\x3c\x00\x00\x00\x0f\x05";

$f = "p";
open $fh, '>', $f;
syswrite($fh, $p);
$sz = (stat $f)[7];
$ptr = syscall(9, 0, $sz, 3, 33, -1, 0); # mmap
$fd = syscall(2, $f, 0); # open
syscall(0, $fd, $ptr, $sz); # read
syscall(10, $ptr, $sz, 5); # mprotect
$x = DynaLoader::dl_install_xsub("", $ptr);
&{$x};
--------------------------------------------------------------------------------

In the code above the payload is defined as before and written into a
temporary file. Then mmap is called to map a new memory area the size of the
payload file, and the file is opened in read-only mode. The read syscall is
then used with the file descriptor obtained from the open syscall and the
pointer obtained from the mmap syscall, so that the kernel copies the file
contents into the mapped area. Finally the memory protection of the mapped
area is updated and dl_install_xsub() is invoked to obtain a function reference
for the mapped code.
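
The read-based POKE can be checked on its own, without executing anything, by
copying a harmless string and reading it back with the "P" template from
part 2. The following is only a sketch under stated assumptions: Linux x86_64
syscall numbers (mmap = 9, read = 0), a scratch file created with File::Temp
instead of the file "p", and the string "japh" standing in for a payload. The
variable names are illustrative and not taken from the suggested script.

```perl
use strict;
use warnings;
use Config;
use File::Temp qw(tempfile);

# Assumption: Linux x86_64, where mmap is syscall 9 and read is syscall 0.
die "this sketch assumes Linux x86_64\n"
    unless $^O eq 'linux' && $Config{archname} =~ /x86_64/;

my $data = "japh";                  # harmless stand-in for a payload

# Write the bytes to a scratch file and rewind it. Only unbuffered I/O is
# used, so the raw file descriptor below sees the same file position.
my ($fh, $file) = tempfile(UNLINK => 1);
defined syswrite($fh, $data) or die "write failed: $!\n";
defined sysseek($fh, 0, 0)   or die "seek failed: $!\n";

# PROT_READ | PROT_WRITE = 3, MAP_SHARED | MAP_ANONYMOUS = 33
my $ptr = syscall(9, 0, length($data), 3, 33, -1, 0);
die "mmap failed: $!\n" if $ptr == -1;

# read(fd, buf, count): the kernel copies the file contents to $ptr.
my $n = syscall(0, fileno($fh), $ptr, length($data));
die "read failed: $!\n" if $n == -1;

# PEEK at the mapped area to confirm the copy.
my $copy = unpack("P" . length($data), pack("Q", $ptr));
printf("copied %d bytes to 0x%x: %s\n", $n, $ptr, $copy);
```

Because the destination is passed to read as a plain number, no SV has to be
forged here: the kernel performs the write into the mapped area.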
====[ references ]============================================================= [1] https://gist.github.com/monoxgas/c0b0f086fc7aa057a8256b42c66761c8 [2] https://perldoc.perl.org/perlguts#Working-with-SVs [3] https://perldoc.perl.org/functions/pack [4] https://man7.org/linux/man-pages/man2/mmap.2.html [5] https://perldoc.perl.org/DynaLoader [6] https://man7.org/linux/man-pages/man2/mprotect.2.html ====[ sample source code ]====================================================== IyEvdXNyL2Jpbi9wZXJsCiMKIyBleGVjX2FzbTY0LnBsOiBFeGVjdXRlIGFzc2VtYmx5IGNvZGUg b24gTGludXggeDg2XzY0CiMgd3JpdHRlbiBieSBpc3JhIC0gaXNyYSBfcmVwbGFjZV9ieV9AXyBm YXN0bWFpbC5uZXQgLSBodHRwczovL2hja25nLm9yZwojIGJhc2VkIG9uIGh0dHBzOi8vZ2lzdC5n aXRodWIuY29tL21vbm94Z2FzL2MwYjBmMDg2ZmM3YWEwNTdhODI1NmI0MmM2Njc2MWM4CiMgdmVy c2lvbiAwLjEgLSBvY3RvYmVyIDIwMjMKIwoKdXNlIEI7CnVzZSBDb25maWc7CnVzZSA1LjAwODAw MTsKdXNlIER5bmFMb2FkZXI7CgojIG1lbW9yeSBtYXAKc3ViIG1tYXAgewogICAgIyBzeXNjYWxs IG51bWJlciBmb3IgbW1hcCBpcyA5IG9uIExpbnV4IHg4Nl82NAogICAgIyAkYWRkciBjYW4gYmUg YSBmaXhlZCB2YWx1ZSwgb3IgMCB0byBsZXQgbW1hcCBjaG9vc2Ugb25lCiAgICAjIGl0IHJldHVy bnMgYSBwb2ludGVyIHRvIHRoZSBtYXBwZWQgYXJlYSBvbiBzdWNjZXNzLCAtMSBvbiBmYWlsdXJl CiAgICBteSAoJGFkZHIsICRzaXplLCAkcHJvdGVjdCwgJGZsYWdzKSA9IEBfOwogICAgbXkgJHJl dCA9IHN5c2NhbGwoOSwgJGFkZHIsICRzaXplLCAkcHJvdGVjdCwgJGZsYWdzLCAtMSwgMCk7CiAg ICByZXR1cm4gJHJldDsKfQoKIyBtZW1vcnkgcHJvdGVjdApzdWIgbXByb3RlY3QgewogICAgIyBz eXNjYWxsIG51bWJlciBmb3IgbXByb3RlY3QgaXMgMTAgb24gTGludXggeDg2XzY0CiAgICAjIGl0 IHJldHVybnMgMCBvbiBzdWNjZXNzLCAtMSBvbiBmYWlsdXJlCiAgICBteSAoJGFkZHIsICRzaXpl LCAkcHJvdGVjdCkgPSBAXzsKICAgIG15ICRyZXQgPSBzeXNjYWxsKDEwLCAkYWRkciwgJHNpemUs ICRwcm90ZWN0KTsKICAgIHJldHVybiAkcmV0Owp9CgojIGNvcHkgJGJ5dGVzIG9mIGxlbmd0aCAk bGVuIGludG8gYWRkcmVzcyAkbG9jYXRpb24Kc3ViIHBva2UgewogICAgbXkoJGxvY2F0aW9uLCAk Ynl0ZXMsICRsZW4pID0gQF87CiAgICBteSAkZHVtbXkgPSAnWCcgeCAkbGVuOwogICAgbXkgJGR1 bW15X2FkZHIgPSBcJGR1bW15ICsgMDsKCiAgICBteSAkc2l6ZSA9IDE2ICsgJENvbmZpZ3tpdnNp 
emV9OwogICAgbXkgJGdob3N0X3N2X2NvbnRlbnRzID0gdW5wYWNrKCJQIi4kc2l6ZSwgcGFjaygi USIsICRkdW1teV9hZGRyKSk7CiAgICBzdWJzdHIoICRnaG9zdF9zdl9jb250ZW50cywgMTYsICRD b25maWd7aXZzaXplfSApID0gcGFjaygiUSIsICRsb2NhdGlvbik7CgogICAgbXkgJGdob3N0X3N0 cmluZ19yZWYgPSBibGVzcyggXCB1bnBhY2soCiAgICAgICAgIlEiLAogICAgICAgIGRvIHsgbm8g d2FybmluZ3MgJ3BhY2snOyBwYWNrKCAnUCcsICRnaG9zdF9zdl9jb250ZW50cy4nJyApIH0sCiAg ICApLCAnQjo6UFYnICktPm9iamVjdF8yc3ZyZWY7CgogICAgZXZhbCAnc3Vic3RyKCQkZ2hvc3Rf c3RyaW5nX3JlZiwgMCwgJGxlbikgPSAkYnl0ZXMnOwp9CgpteSAkcGF5bG9hZCA9ICIiOwokcGF5 bG9hZCAuPSAiXHhlOFx4MGRceDAwXHgwMFx4MDBceDJmXHg3NVx4NzNceDcyXHgyZlx4NjJceDY5 XHg2ZVx4MmZceDY5IjsKJHBheWxvYWQgLj0gIlx4NjRceDAwXHg1ZVx4NWVceDQ4XHgzMVx4YzBc eDQ4XHg4ZFx4M2VceDZhXHgwMFx4NTdceDQ4XHg4OSI7CiRwYXlsb2FkIC49ICJceGU2XHg0OFx4 MzFceGMwXHhiOFx4M2JceDAwXHgwMFx4MDBceGJhXHgwMFx4MDBceDAwXHgwMFx4MGYiOwokcGF5 bG9hZCAuPSAiXHgwNVx4NWVceDQ4XHgzMVx4ZDJceGI4XHgzY1x4MDBceDAwXHgwMFx4MGZceDA1 IjsKCnByaW50ICJcbiI7CnByaW50ICIqIiB4IDM5OwpwcmludCAiXG4qIGV4ZWNfYXNtNjQucGwg LSBieSBpc3JhIC0gaGNrbmcub3JnICpcbiI7CnByaW50ICIqIiB4IDM5OwpwcmludCAiXG5cbiI7 CgpteSAkc2l6ZSA9IGxlbmd0aCgkcGF5bG9hZCk7CnByaW50ICJbK10gUGF5bG9hZCBzaXplOiAk c2l6ZVxuIjsKcHJpbnQgIlsrXSBUcnlpbmcgdG8gbWFwIG5ldyBtZW1vcnkgYXJlYS4uLiI7Cm15 ICRwdHIgPSBtbWFwKDAsICRzaXplLCAzLCAzMyk7CmlmKCRwdHIgPT0gLTEpIHsKICAgIGRpZSAi ZmFpbGVkIHRvIG1hcCBtZW1vcnlcbiI7Cn0KcHJpbnQgIk9LXG4iOwpwcmludGYoIlsrXSBTdGFy dCBvZiBtYXBwZWQgYXJlYTogMHgleFxuIiwgJHB0cik7CgpwcmludGYoIlsrXSBUcnlpbmcgdG8g UE9LRSBwYXlsb2FkIGF0IDB4JXguLi4iLCAkcHRyKTsKcG9rZSgkcHRyLCAkcGF5bG9hZCwgJHNp emUpOwpwcmludCAiT0tcbiI7CgpwcmludCAiWytdIFRyeWluZyB0byB1cGRhdGUgbWVtb3J5IHBy b3RlY3Rpb24uLi4iOwppZihtcHJvdGVjdCgkcHRyLCAkc2l6ZSwgNSkgPT0gLTEpIHsKICAgIGRp ZSAiZmFpbGVkIHRvIHVwZGF0ZSBtZW1vcnkgcHJvdGVjdGlvblxuIjsKfQpwcmludCAiT0tcbiI7 CgpwcmludCAiWytdIFRyeWluZyB0byBpbnN0YWxsIHhzdWIuLi4iOwpteSAkZnVuYyA9IER5bmFM b2FkZXI6OmRsX2luc3RhbGxfeHN1YigKICAgICJfamFwaCIsICMgbm90IHJlYWxseSB1c2VkCiAg 
ICAkcHRyLCAKICAgIF9fRklMRV9fICMgbm8gZmlsZQopOwpwcmludCAiT0tcbiI7CgpwcmludCAi WytdIEdvaW5nIHRvIGV4ZWN1dGU6XG5cbiI7CgojIGRlcmVmZXJlbmNlIGFuZCBleGVjdXRlCiZ7 JGZ1bmN9Owo=