Trying to understand the workings of com.adobe.net.URIEncodingBitmap

https://stackoverflow.com/questions/20177778

04-08-2022

Question

I'm examining the URIEncodingBitmap class of the com.adobe.net package, and I'm having a hard time understanding exactly how it works internally. Here's the code:

package com.adobe.net
{
    import flash.utils.ByteArray;

    /**
     * This class implements an efficient lookup table for URI
     * character escaping.  This class is only needed if you
     * create a derived class of URI to handle custom URI
     * syntax.  This class is used internally by URI.
     * 
     * @langversion ActionScript 3.0
     * @playerversion Flash 9.0* 
     */
    public class URIEncodingBitmap extends ByteArray
    {
        /**
         * Constructor.  Creates an encoding bitmap using the given
         * string of characters as the set of characters that need
         * to be URI escaped.
         * 
         * @langversion ActionScript 3.0
         * @playerversion Flash 9.0
         */
        public function URIEncodingBitmap(charsToEscape:String) : void
        {
            var i:int;
            var data:ByteArray = new ByteArray();

            // Initialize our 128 bits (16 bytes) to zero
            for (i = 0; i < 16; i++)
                this.writeByte(0);

            data.writeUTFBytes(charsToEscape);
            data.position = 0;

            while (data.bytesAvailable)
            {
                var c:int = data.readByte();

                if (c > 0x7f)
                    continue;  // only escape low bytes

                var enc:int;
                this.position = (c >> 3);
                enc = this.readByte();
                enc |= 1 << (c & 0x7);
                this.position = (c >> 3);
                this.writeByte(enc);
            }
        }

        /**
         * Based on the data table contained in this object, check
         * if the given character should be escaped.
         * 
         * @param char  the character to be escaped.  Only the first
         * character in the string is used.  Any other characters
         * are ignored.
         * 
         * @return  the integer value of the raw UTF8 character.  For
         * example, if '%' is given, the return value is 37 (0x25).
         * If the character given does not need to be escaped, the
         * return value is zero.
         * 
         * @langversion ActionScript 3.0
         * @playerversion Flash 9.0 
         */
        public function ShouldEscape(char:String) : int
        {
            var data:ByteArray = new ByteArray();
            var c:int, mask:int;

            // write the character into a ByteArray so
            // we can pull it out as a raw byte value.
            data.writeUTFBytes(char);
            data.position = 0;
            c = data.readByte();

            if (c & 0x80)
            {
                // don't escape high byte characters.  It can make international
                // URI's unreadable.  We just want to escape characters that would
                // make URI syntax ambiguous.
                return 0;
            }
            else if ((c < 0x1f) || (c == 0x7f))
            {
                // control characters must be escaped.
                return c;
            }

            this.position = (c >> 3);
            mask = this.readByte();

            if (mask & (1 << (c & 0x7)))
            {
                // we need to escape this, return the numeric value
                // of the character
                return c;
            }
            else
            {
                return 0;
            }
        }
    }
}

Although I understand how ByteArray works and what the various bitwise operators (>>, <<, &, |=, etc.) do, I'm almost completely at a loss as to what this class does exactly (or rather, why it does things the way it does).

Could somebody give a rundown of the purpose of all the bit-shifting and masking in this class? Particularly:

1. What is the constructor initializing exactly, and why?

   a. What is this.position = (c >> 3); doing, or rather, why?

   b. What is enc |= 1 << (c & 0x7); doing?

2. What is the mask doing exactly in ShouldEscape()?

Solution

ad 1. The constructor builds a lookup table of 16 bytes (128 bits), one bit for each of the 128 ASCII characters. A bit's position corresponds to the character's ordinal value, and the bit's value says whether that character should be escaped (1) or not (0).
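To make the "one bit per character" idea concrete, here is a minimal sketch in Python rather than ActionScript. It is not the library's code: the function name flags_for is made up for illustration, and it treats the table as a single 128-bit integer before laying it out as 16 bytes.

```python
def flags_for(chars_to_escape: str) -> int:
    """Return a 128-bit integer with bit n set for each ASCII character of code n."""
    flags = 0
    for code in chars_to_escape.encode("utf-8"):
        if code <= 0x7F:  # only the 128 ASCII codes get a bit
            flags |= 1 << code
    return flags


flags = flags_for("% ")                 # '%' is code 37, ' ' is code 32
table = flags.to_bytes(16, "little")    # the same 128 bits laid out as 16 bytes

print(len(table))      # number of bytes in the table
print(bin(table[4]))   # byte 4 covers codes 32..39
```

Byte 4 ends up with bits 0 and 5 set, which are the slots for codes 32 and 37. Every other byte stays zero.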

ad a. This line selects the byte of the table that holds the bit for the given character. c >> 3 is the integer division of c by 8, so it yields the index (0 to 15) of that byte.
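The arithmetic can be checked with a short Python sketch. The helper name locate is hypothetical; it only reproduces the two expressions from the question, c >> 3 and c & 0x7.

```python
def locate(ch: str) -> tuple:
    """Return (byte index, bit index) for an ASCII character."""
    code = ord(ch)
    byte_index = code >> 3    # same as code // 8: which of the 16 bytes
    bit_index = code & 0x7    # same as code % 8: which bit inside that byte
    return byte_index, bit_index


print(locate("%"))   # code 37
print(locate("?"))   # code 63
print(locate("A"))   # code 65
```

For '%' (code 37) this gives byte 4, bit 5, since 37 = 4 * 8 + 5.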

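ad b. This sets the bit that corresponds to the character within that byte. c & 0x7 is c modulo 8, i.e. the bit's index (0 to 7) inside the byte, and |= turns that bit on without disturbing the other seven.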
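The read, OR, write-back sequence from the constructor can be sketched in Python as follows. The name set_bit is an assumption for illustration, and a bytearray stands in for the ByteArray.

```python
def set_bit(table: bytearray, code: int) -> None:
    """Mark the ASCII character with the given code as 'needs escaping'."""
    index = code >> 3
    enc = table[index]            # read the current byte
    enc |= 1 << (code & 0x7)      # turn on this character's bit only
    table[index] = enc            # write the byte back


table = bytearray(16)      # 128 bits, all zero
set_bit(table, ord("#"))   # code 35: byte 4, bit 3
set_bit(table, ord("&"))   # code 38: byte 4, bit 6
set_bit(table, ord("#"))   # setting the same bit again changes nothing

print(bin(table[4]))
```

Both characters share byte 4, and because the update uses OR, the second call keeps the bit set by the first.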

ad 2. mask holds the table byte for the given character. ANDing it with 1 << (c & 0x7) isolates that character's bit: a nonzero result means the bit is set and the character should be escaped.
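Putting the pieces together, here is a rough Python port of both methods, offered as a sketch rather than a faithful translation. The names build_bitmap and should_escape are made up, a bytearray replaces the ByteArray, and the lookup assumes a non-empty string.

```python
def build_bitmap(chars_to_escape: str) -> bytearray:
    """Build the 16-byte table: bit n is set if ASCII code n must be escaped."""
    table = bytearray(16)
    for c in chars_to_escape.encode("utf-8"):   # Python yields unsigned byte values
        if c > 0x7F:
            continue                            # only low (ASCII) bytes get a bit
        table[c >> 3] |= 1 << (c & 0x7)
    return table


def should_escape(table: bytearray, char: str) -> int:
    """Return the character's code if it must be escaped, else 0."""
    c = char.encode("utf-8")[0]      # first raw UTF-8 byte (assumes char is non-empty)
    if c & 0x80:
        return 0                     # high bytes are never escaped
    if c < 0x1F or c == 0x7F:
        return c                     # control characters, same comparison as the source
    mask = table[c >> 3]             # the byte that holds this character's bit
    return c if mask & (1 << (c & 0x7)) else 0


table = build_bitmap("%?#")
print(should_escape(table, "%"))    # in the table
print(should_escape(table, "a"))    # not in the table
print(should_escape(table, "\n"))   # control character
```

Because the sketch keeps the source's c < 0x1f comparison, code 0x1F itself is not caught by the control-character branch and falls through to the table lookup.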

Licensed under: CC-BY-SA with attribution