Question

I'm working with the PHP shm functions (part of the semaphores extension, not to be confused with the shmop ones!) in a project. Basically, the shared memory serves as a kind of heap: I have only one array in it, in which I store keys (with meaningless values) as a hashed index, so I can just check "Ah, it's there already". Now my problem is that this array can get quite big at times, but usually doesn't. I don't want to reserve a huge amount of memory I usually don't need, but rather resize dynamically.

I have registered an error handler that converts errors into ErrorExceptions, so I can catch the error raised by shm_put_var when the memory is too small to store the array. Unfortunately, PHP clears the segment when the data doesn't fit, so all other data is lost too. That is therefore not an option.

Because of this, I need a way to predict the size I'll need to store the data. One of the comments on shm_attach at php.net states that PHP adds a header of (PHP_INT_SIZE * 4) + 8 bytes, and that one variable needs strlen(serialize($foo)) + 4 * PHP_INT_SIZE + 4 bytes (I have simplified the expression given in the comment; it is equal to mine but was blown up unnecessarily).
While the header size seems to be correct (any segment smaller than 24 bytes results in an error at creation, so 24 bytes seems to be the size of the header PHP puts in there), the size of each variable entry doesn't seem to hold true anymore in recent versions of PHP:
- I could store "1" in a shared memory segment with a size of 24 + strlen(serialize("1")) + 3 * PHP_INT_SIZE + 4 bytes (note the 3 in there instead of 4),
- I could NOT store "999" in one sized 24 + strlen(serialize("999")) + 4 * PHP_INT_SIZE + 4 bytes.

Does anyone know a way to predict how much memory is needed to store any data in shared memory using the shm functions, or have a reference on how shm stores the variables? (I read the whole contents using the shmop functions and printed them, but since it's binary data it's not reverse-engineerable in reasonable time.)

(I will provide code samples as needed, I'm just not sure what parts will get relevant - ping me if you want to see any working samples, I have tried much so I have samples ready for most cases)


[Update] My C is pretty bad, so I don't get far looking at the source (sysvshm.c and php_sysvshm.h), but I already found one issue with the solution that was suggested at php.net: while I could simplify the complex formula there to what I have included here (which was basically taken from the C source code), this is NOT possible with the original one, as there are typecasts and no floating point math. The formula divides by sizeof(long) and multiplies by it again, which is useless in PHP but does round to multiples of sizeof(long) in C. So I need to correct that in PHP first. Still, this is not everything, as tests showed that I could store some values in even less memory than returned by the formula (see above).


Solution 2

Ok, answering this myself, as I figured it out by now. I still have no sources but my own research, so feel free to comment with any helpful links or answer on your own.

Most important thing first: a working formula to calculate the size necessary to store data in shared memory using shm_* functions is:

$header = 24; // actually 4*4 + 8
$dataLength = (ceil(strlen(serialize($data)) / 4) * 4) + 16; // actually that 16 is 4*4

The header with the size of $header is only stored once at the beginning of the memory segment and is written when the segment is allocated (using shm_attach the first time with that System V resource key), even if no data is written. Therefore, you cannot ever create a memory segment smaller than 24 bytes.

If you only want to use this and don't care about the details, just one warning: this is correct as long as PHP is compiled on a system that uses 32 bits for longs in C. If PHP is compiled with 64-bit longs, the header size is most likely 4 * 8 + 8 = 40 and each variable needs (ceil(strlen(serialize($data)) / 8) * 8) + 32. Details in the explanation below.
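The same arithmetic, parameterized on the size of a C long, can be sketched in C (the language of the PHP source quoted further down). The function names are invented for this illustration, and the figures for 8-byte longs are the unverified guess from the paragraph above, not something confirmed on a 64-bit build.

```c
/* Hypothetical helpers; long_size stands for sizeof(long) of the PHP build
   (4 or 8). They only restate the formula from this answer. */

/* Header written once per segment: magic[8] plus four longs. */
long shm_header_bytes(long long_size)
{
    return 8 + 4 * long_size;
}

/* Bytes used by one variable whose serialized form is len bytes long:
   the data rounded up to a multiple of long_size, plus four longs. */
long shm_var_bytes(long len, long long_size)
{
    long aligned = ((len + long_size - 1) / long_size) * long_size;
    return aligned + 4 * long_size;
}

/* Smallest segment that can hold the header and exactly one variable. */
long shm_min_segment_bytes(long len, long long_size)
{
    return shm_header_bytes(long_size) + shm_var_bytes(len, long_size);
}
```

For example, serialize("1") is the 8-byte string s:1:"1"; so with 4-byte longs shm_min_segment_bytes(8, 4) comes out at 24 + 24 = 48 bytes.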


So, how did I get there?

I looked into the PHP source code. I don't know much C, so what I'm telling here is only how I got it; it may be nothing more than a lot of hot air...

The relevant files are already linked in the question - look there. The important parts are:

From php_sysvshm.h:

typedef struct {
    long key;
    long length;
    long next;
    char mem;
} sysvshm_chunk;

typedef struct {
    char magic[8];
    long start;
    long end;
    long free;
    long total;
} sysvshm_chunk_head;

And from sysvshm.c:

/* these are lines 166 - 173 in the source code of PHP 5.2.17 (the one I found first),
   line numbers may differ in recent versions */

/* check if shm is already initialized */
chunk_ptr = (sysvshm_chunk_head *) shm_ptr;
if (strcmp((char*) &(chunk_ptr->magic), "PHP_SM") != 0) {
    strcpy((char*) &(chunk_ptr->magic), "PHP_SM");
    chunk_ptr->start = sizeof(sysvshm_chunk_head);
    chunk_ptr->end = chunk_ptr->start;
    chunk_ptr->total = shm_size;
    chunk_ptr->free = shm_size-chunk_ptr->end;
}

/* these are lines 371 - 397, comments as above */

 /* {{{ php_put_shm_data
 * inserts an ascii-string into shared memory */
static int php_put_shm_data(sysvshm_chunk_head *ptr, long key, char *data, long len)
{
    sysvshm_chunk *shm_var;
    long total_size;
    long shm_varpos;

    total_size = ((long) (len + sizeof(sysvshm_chunk) - 1) / sizeof(long)) * sizeof(long) + sizeof(long); /* long alligment */

    if ((shm_varpos = php_check_shm_data(ptr, key)) > 0) {
        php_remove_shm_data(ptr, shm_varpos);
    }

    if (ptr->free < total_size) {
        return -1; /* not enough memeory */
    }

    shm_var = (sysvshm_chunk *) ((char *) ptr + ptr->end);
    shm_var->key = key;
    shm_var->length = len;
    shm_var->next = total_size;
    memcpy(&(shm_var->mem), data, len);
    ptr->end += total_size;
    ptr->free -= total_size;
    return 0;
}
/* }}} */

So, lots of code. I'll try to break it down.

The parts from php_sysvshm.h tell us what size those structures have; we'll need that. I'm assuming each char has 8 bits (which is most likely valid on any system) and each long has 32 bits (which may differ on some systems that actually use 64 bits; you have to change the numbers then).

  • sysvshm_chunk has 3*sizeof(long) + sizeof(char), that makes 3*4 + 1 = 13 bytes.
  • sysvshm_chunk_head has 8*sizeof(char) + 4*sizeof(long), that makes 8*1 + 4*4 = 24 bytes.
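One way to double-check these numbers is to ask the compiler directly. The sketch below copies the two structs from php_sysvshm.h as quoted above and reports their sizes; the helper names are invented for this example. One assumption to keep in mind: on common ABIs a compiler pads a struct to a multiple of its strictest member alignment, so sizeof(sysvshm_chunk) is typically 16 rather than 13 with 4-byte longs (and 32 with 8-byte longs). That is a property of the platform, not something stated in the PHP source.

```c
#include <stddef.h>

/* Struct definitions copied from php_sysvshm.h as quoted above. */
typedef struct {
    long key;
    long length;
    long next;
    char mem;
} sysvshm_chunk;

typedef struct {
    char magic[8];
    long start;
    long end;
    long free;
    long total;
} sysvshm_chunk_head;

/* Sum of the member sizes, ignoring any padding the compiler adds. */
size_t chunk_member_bytes(void)
{
    return 3 * sizeof(long) + sizeof(char);
}

/* Size the compiler actually uses, including trailing padding. */
size_t chunk_struct_bytes(void)
{
    return sizeof(sysvshm_chunk);
}

/* Offset at which the serialized data starts inside a chunk. */
size_t chunk_data_offset(void)
{
    return offsetof(sysvshm_chunk, mem);
}

/* Size of the per-segment header. */
size_t head_struct_bytes(void)
{
    return sizeof(sysvshm_chunk_head);
}
```

Printing the four results with printf("%zu\n", ...) from a small main shows the sizes for the compiler and platform at hand; a difference between chunk_member_bytes() and chunk_struct_bytes() is the padding.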

Now the first part from sysvshm.c is part of the code that gets executed when we're calling shm_attach in PHP. It initializes the memory segment by writing a header structure (the one defined as sysvshm_chunk_head that we already talked about) if it's not there already. This needs the 24 bytes we calculated, the same 24 bytes I gave in the formula right at the beginning.

The second part is the function that actually inserts a variable into the shared memory. It gets called by another function, but I skipped that one, as it's not that useful. Basically, it gets the shared memory header structure, which includes the addresses of the start and end of the data inside the memory segment. It then gets a long with the variable key you used to store the variable, a char* (similar to a string, but the C version) with the already serialized data, and the length of that data (for whatever reason; it could calculate that on its own, but anyway).

For each variable, a header (the structure defined as sysvshm_chunk that we looked at) plus the actual data is now written into the memory. It is aligned to long, however, for easier memory management (that means its size is always rounded up to the next multiple of sizeof(long), which again is 4 bytes on most systems).

Now here it becomes a little strange. According to the C code we're looking at, (ceil((strlen(serialize($data)) + 13 - 1) / 4) * 4) should work (the 13 in there is sizeof(sysvshm_chunk)). But it doesn't: it always yields 4 bytes less than we actually need. I assume that the length of the serialized data (len) is already aligned, but I didn't look into the source for that, and I couldn't find those 4 bytes anywhere else. The char is the last member in the C structure definition, and char is aligned on full bytes and nothing more, so that shouldn't cause those 4 additional bytes either - but if I'm wrong about how C aligns those, that could be the reason, too. Anyway, I aligned the data and the header individually in my formula, and it worked (the aligned header always has 16 bytes, that's the 16 in my formula; the data length gets aligned by that divide-round-multiply thingy). But, technically, the formula could also be

 $dataLength = (ceil((strlen(serialize($data)) + 13 - 1) / 4) * 4) + 4;

It yields the same results, however, in case I just missed those 4 bytes somewhere else. I have no system running a PHP version that was compiled with 64-bit longs, so I cannot verify which one is correct.
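A possible explanation for the four "missing" bytes, offered as a hypothesis rather than a confirmed fact: if the compiler pads sysvshm_chunk to four longs (16 bytes with 4-byte longs), then the expression from php_put_shm_data already produces the same numbers as the formula in this answer. The sketch below evaluates the source expression with the compiler's own sizeof and compares it with the answer's formula; the function names are invented for illustration.

```c
/* Struct copied from php_sysvshm.h as quoted above. */
typedef struct {
    long key;
    long length;
    long next;
    char mem;
} sysvshm_chunk;

/* Same expression as total_size in php_put_shm_data, written with signed
   arithmetic throughout; integer division rounds down. */
long source_total_size(long len)
{
    long chunk = (long) sizeof(sysvshm_chunk);
    long word = (long) sizeof(long);
    return ((len + chunk - 1) / word) * word + word;
}

/* Formula from this answer: data rounded up to whole longs, plus four longs. */
long answer_total_size(long len)
{
    long word = (long) sizeof(long);
    return ((len + word - 1) / word) * word + 4 * word;
}

/* Returns 1 if both formulas agree for every length from 0 to max_len. */
int formulas_agree(long max_len)
{
    long len;
    for (len = 0; len <= max_len; len++) {
        if (source_total_size(len) != answer_total_size(len)) {
            return 0;
        }
    }
    return 1;
}
```

If formulas_agree returns 1 on the machine that built PHP, the two ways of writing the formula are interchangeable there, including on builds with 8-byte longs.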

tl;dr: problem solved, comments welcome, if you got any additional questions, now is the time.

OTHER TIPS

As a workaround for the problem of a variable being deleted when you try to update it and there is not enough free space in the segment for the new value, you can first check whether there is enough free space and only proceed with the update if there is.
The following function uses the shmop_* API to obtain the used, free and total space in a segment created with shm_attach.

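function getMemSegmentStats($segmentKey){
    // Open the existing segment read-only ('a' = access).
    $segId = shmop_open($segmentKey, 'a', 0, 0);
    // Number of 32-bit words per integer: 1 on 32-bit builds, 2 on 64-bit builds.
    $wc = PHP_INT_SIZE / 4;
    // Skip magic[8] and the 'start' field, then read the next three longs (end, free, total).
    $stats = unpack("I{$wc}used/I{$wc}free/I{$wc}total", shmop_read($segId, 8 + PHP_INT_SIZE, 3 * PHP_INT_SIZE));
    shmop_close($segId);
    return combineUnpackLHwords($stats);
}

function combineUnpackLHwords($array){
    // Iterate over a snapshot of the keys so that unsetting entries inside the loop is safe.
    foreach (array_keys($array) as $key) {
        if (isset($array[$key]) && preg_match('/^(\D+)(\d+)$/', $key, $matches)) {
            // "used1" holds the low-order word, "used2" the high-order word.
            $key2 = $matches[1] . ($matches[2] + 1);
            $array[$matches[1]] = $array[$key] | ($array[$key2] << (4 * PHP_INT_SIZE));
            unset($array[$key], $array[$key2]);
        }
    }
    return $array;
}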

The function combineUnpackLHwords is needed on 64-bit machines because the unpack format used here reads 32-bit integers, so each 64-bit value has to be constructed from its low-order and high-order 32-bit words (on 32-bit machines the function has no effect).

Example:

$segmentKey = ftok('/path/to/a/file','A') ;
$segmentStats = getMemSegmentStats($segmentKey) ;
print_r($segmentStats) ;

Output:

Array
(
    [used] => 3296
    [free] => 96704
    [total] => 100000
)
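The check described at the top of this tip (compare the free space against the size the new value will need) can also be sketched in C against the header layout from php_sysvshm.h. This is only an illustration: the function names are invented, the input is assumed to be a copy of the first bytes of the segment, and the size calculation reuses the total_size expression from php_put_shm_data quoted in the answer above.

```c
#include <string.h>

/* Struct definitions copied from php_sysvshm.h as quoted in the answer. */
typedef struct {
    long key;
    long length;
    long next;
    char mem;
} sysvshm_chunk;

typedef struct {
    char magic[8];
    long start;
    long end;
    long free;
    long total;
} sysvshm_chunk_head;

/* Copy the header out of raw segment bytes. Returns 0 on success, or -1 if
   the bytes do not start with the "PHP_SM" magic string. */
int read_segment_head(const void *segment, sysvshm_chunk_head *out)
{
    memcpy(out, segment, sizeof(*out));
    if (strncmp(out->magic, "PHP_SM", sizeof(out->magic)) != 0) {
        return -1;
    }
    return 0;
}

/* Returns 1 if a value with len serialized bytes fits into the free space.
   Space that replacing an existing key might release is not counted, so the
   answer errs on the cautious side. */
int variable_fits(const sysvshm_chunk_head *head, long len)
{
    long word = (long) sizeof(long);
    long total_size = ((len + (long) sizeof(sysvshm_chunk) - 1) / word) * word + word;
    return head->free >= total_size;
}
```

In the output above, head.end corresponds to [used], head.free to [free] and head.total to [total].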
Licensed under: CC-BY-SA with attribution