Question

I use the code below to normalize a char array. After it runs, the normalized output still has some of the old text left over at the end. This is due to i reaching the end of the array before j, so the tail of the buffer is never overwritten. That makes sense, but how do I remove the extra characters? I am coming from Java, so I apologize if I'm making mistakes that seem simple. Here is the code:

/* The normalize procedure normalizes a character array of size len 
   according to the following rules:
     1) turn all upper case letters into lower case ones
     2) turn any white-space character into a space character and, 
        shrink any n>1 consecutive whitespace characters to exactly 1 whitespace

     When the procedure returns, the character array buf contains the newly 
     normalized string and the return value is the new length of the normalized string.

     hint: you may want to use C library function isupper, isspace, tolower
     do "man isupper"
*/
int
normalize(unsigned char *buf,   /* The character array contains the string to be normalized*/
                    int len     /* the size of the original character array */)
{
    /* use a for loop to cycle through each character and the built-in C functions to analyze it */
    int i = 0;
    int j = 0;
    int k = len;

    if(isspace(buf[0])){
        i++;
        k--;
    }
    if(isspace(buf[len-1])){
        i++;
        k--;
    }
    for(i;i < len;i++){
        if(islower(buf[i])) {
            buf[j]=buf[i];
            j++;
        }
        if(isupper(buf[i])) {
            buf[j]=tolower(buf[i]);
            j++;
        }
        if(isspace(buf[i]) && !isspace(buf[j-1])) {
            buf[j]=' ';
            j++;
        }
        if(isspace(buf[i]) && isspace(buf[i+1])){
            i++;
            k--;
        }
    }

   return k;

}

Here is some sample output:

halb mwqcnfuokuqhuhy ja mdqu nzskzkdkywqsfbs zwb lyvli HALB MwQcnfuOKuQhuhy Ja mDQU nZSkZkDkYWqsfBS ZWb lyVLi

As you can see, the end part is repeated: both the new normalized data and the remaining old un-normalized data are present in the result. How can I fix this?

Solution

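Add a null terminator at the write index j before returning. In your loop j, not k, counts the characters actually written, so it is the real length of the normalized string:

buf[j] = '\0';
return j;

Note that this needs buf to have room for one extra byte in the case where nothing was removed (j == len).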
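
If the leftover text comes from the caller writing all len bytes of the buffer, another option is to write only as many bytes as the function returns. Here is a minimal sketch of that idea; squeeze_lower and write_normalized are hypothetical names used for illustration, and the sketch assumes the caller has a FILE * to write to:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: lowercases and squeezes whitespace runs in place.
   Returns the new length. It does NOT write a terminator, so buf only
   needs len bytes. */
static int squeeze_lower(unsigned char *buf, int len)
{
    int j = 0;

    assert(buf != NULL);
    for (int i = 0; i < len; i++) {
        if (isspace(buf[i])) {
            /* emit a single space for each run of whitespace */
            if (j == 0 || buf[j - 1] != ' ')
                buf[j++] = ' ';
        } else {
            /* tolower leaves non-uppercase characters unchanged */
            buf[j++] = (unsigned char)tolower(buf[i]);
        }
    }
    return j;
}

/* Hypothetical helper: writes exactly n bytes, so the stale tail of
   buf beyond the normalized data is never sent to the output. */
static size_t write_normalized(FILE *out, const unsigned char *buf, int n)
{
    return fwrite(buf, 1, (size_t)n, out);
}
```

A caller would then do something like n = squeeze_lower(buf, len); write_normalized(stdout, buf, n); instead of writing len bytes.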

OTHER TIPS

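One way to fix it is to rewrite the loop so that j is the only write index, then terminate the string at j:

#include <ctype.h>

/* buf must have room for len + 1 bytes, because a terminator is written at buf[j] */
int normalize(unsigned char *buf, int len) {
    int i, j;

    for(j=i=0 ;i < len; ++i){
        if(isupper(buf[i])) {
            buf[j++]=tolower(buf[i]);
            continue ;
        }
        if(isspace(buf[i])){
            /* emit one space per run of whitespace */
            if(j == 0 || buf[j-1] != ' ')
                buf[j++]=' ';
            continue ;
        }
        buf[j++] = buf[i];
    }
    buf[j] = '\0';   /* cuts off the old, un-normalized tail */

    return j;
}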
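
The function above writes a terminator at buf[j], which lands one past the input when nothing is removed, so the caller has to provide len + 1 bytes. The following sketch shows one way a caller might guarantee that. normalize_copy is a hypothetical wrapper, not part of the original code, and the normalize inside the block is intended to behave the same as the one above:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Intended to match the tip above: lowercases, squeezes whitespace,
   and terminates the result. Requires room for len + 1 bytes. */
static int normalize(unsigned char *buf, int len)
{
    int i, j;

    for (j = i = 0; i < len; ++i) {
        if (isspace(buf[i])) {
            if (j == 0 || buf[j - 1] != ' ')
                buf[j++] = ' ';
            continue;
        }
        buf[j++] = (unsigned char)tolower(buf[i]);
    }
    buf[j] = '\0';
    return j;
}

/* Hypothetical wrapper: copies src into a buffer with one spare byte,
   normalizes the copy, and stores the new length in *out_len.
   Returns NULL if allocation fails. The caller frees the result. */
static unsigned char *normalize_copy(const char *src, int *out_len)
{
    size_t len;
    unsigned char *buf;

    assert(src != NULL && out_len != NULL);
    len = strlen(src);
    buf = malloc(len + 1);          /* +1 leaves room for the terminator */
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, len);
    *out_len = normalize(buf, (int)len);
    return buf;
}
```

Because the result is terminated, it can be printed with the usual string functions without showing the old tail.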

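Or keep your own loop and add a null terminator at the end. Use '\0' rather than NULL, which is a pointer constant, and index buf with j, since k is an int and cannot be subscripted:

    buf[j] = '\0';
    return j;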
Licensed under: CC-BY-SA with attribution