Question

I'm working on the K&R book. I've read farther ahead than I've done exercises, mostly for lack of time. I'm catching up, and have done almost all the exercises from chapter 1, which is the tutorial.

My issue was exercise 1-18. The exercise is to:

Write a program to remove trailing blanks and tabs from line of input, and to delete entirely blank lines

My code (below) does that, and works. My problem with it is the trim method I implemented. It feels ... wrong ... somehow. Like if I saw similar code in C# in a code review, I'd probably go nuts. (C# being one of my specialties.)

Can anyone offer some advice on cleaning this up -- with the catch that said advice has to only use knowledge from Chapter 1 of K & R. (I know there are a zillion ways to clean this up using the full C library; we're just talking Chapter 1 and basic stdio.h here.) Also, when giving the advice, can you explain why it will help? (I am, after all, trying to learn! And who better to learn from than the experts here?)

#include <stdio.h>

#define MAXLINE 1000

int getline(char line[], int max);
void trim(char line[], char ret[]);

int main()
{
    char line[MAXLINE];
    char out[MAXLINE];
    int length;

    while ((length = getline(line, MAXLINE)) > 0)
    {
        trim(line, out);
        printf("%s", out);
    }

    return 0;
}

int getline(char line[], int max)
{
    int c, i;

    for (i = 0; i < max - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
        line[i] = c;

    if (c == '\n')
    {
        line[i] = c;
        ++i;
    }

    line[i] = '\0'; 
    return i;
}

void trim(char line[], char ret[])
{
    int i = 0;

    while ((ret[i] = line[i]) != '\0')
        ++i;

    if (i == 1)
    {
        // Special case to remove entirely blank line
        ret[0] = '\0';
        return;
    }

    for (  ; i >= 0; --i)
    {
        if (ret[i] == ' ' || ret[i] == '\t')
            ret[i] = '\0';
        else if (ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n')
            break;
    }

    for (i = 0; i < MAXLINE; ++i)
    {
        if (ret[i] == '\n')
        {
            break;
        }
        else if (ret[i] == '\0')
        {
            ret[i] = '\n';
            ret[i + 1] = '\0';
            break;
        }
    }
}

EDIT: I appreciate all the helpful tips I'm seeing here. I would like to remind folks that I'm still a n00b with C, and specifically haven't gotten up to pointers yet. (Remember the bit about Ch.1 of K&R -- Ch.1 doesn't do pointers.) I "kinda" get some of those solutions, but they're still a touch advanced for where I'm at ...

And most of what I'm looking for is the trim method itself -- specifically the fact that I'm looping through 3 times (which feels so dirty). I feel like if I were just a touch more clever (even without the advanced knowledge of C), this could have been cleaner.

Was it helpful?

Solution

There is no reason to have two buffers, you can trim the input line in place

int trim(char line[])
{
    int len = 0;
    for (len = 0; line[len] != 0; ++len)
        ;

    while (len > 0 &&
           line[len-1] == ' ' && line[len-1] == '\t' && line[len-1] == '\n')
        line[--len] = 0;

    return len;
}

By returning the line length, you can eliminate blank lines by testing for non-zero length lines

if (trim(line) != 0)
    printf("%s\n", line);

EDIT: You can make the while loop even simpler, assuming ASCII encoding.

while (len > 0 && line[len-1] <= ' ')
    line[--len] = 0;

OTHER TIPS

If you are sticking with chapter 1, that looks pretty good to me. Here's what I would recommend from a code-review standpoint:

When checking equality in C, always put the constant first

if (1 == myvar)

That way you will never accidentally do something like this:

if (myvar = 1)

You can't get away with that in C#, but it compiles fine in C and can be a real devil to debug.

trim() is too big.

What I think you need is a strlen-ish function (go ahead and write it int stringlength(const char *s)).

Then you need a function called int scanback(const char *s, const char *matches, int start) which starts at start, goes down to z as long as the character being scanned at s id contained in matches, return the last index where a match is found.

Then you need a function called int scanfront(const char *s, const char *matches) which starts at 0 and scans forward as long as the character being scanned at s is contained in matches, returning the last index where a match is found.

Then you need a function called int charinstring(char c, const char *s) which returns non-zero if c is contained in s, 0 otherwise.

You should be able to write trim in terms of these.

Personally for while constructs:

I prefer the following:

while( (ret[i] = line[i]) )
        i++;

to:

while ((ret[i] = line[i]) != '\0')
        ++i;

They both check against != 0 but the first looks a little cleaner. If the char is anything other thah 0, then the loop body will execute else it will break out of the loop.

Also for 'for' statements, whilst being syntatically valid, I find that the following:

for (  ; i >= 0; --i)

just looks 'odd' to me and indeed is a potential nightmare solution for potential bugs. If I was reviewing this code, it would be like a glowing red warning like. Typically you want to use for loops for iterating a known number of times, otherwise cosider a while loop. (as always there are exceptions to the rule but Ive found that this generally holds true). The above for statement could become:

while (i)
{
        if (ret[i] == ' ' || ret[i] == '\t')
        {
            ret[i--] = '\0';
        }
        else if (ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n')
        {
            break;
        }
}

First of all:

int main(void)

You know the parameters to main(). They're nothing. (Or argc&argv, but I don't think that's Chapter 1 material.)

Stylewise, you might want to try K&R-style brackets. They're much easier on the vertical space:

void trim(char line[], char ret[])
{
    int i = 0;

    while ((ret[i] = line[i]) != '\0')
        ++i;

    if (i == 1) { // Special case to remove entirely blank line
        ret[0] = '\0';
        return;
    }

    for (; i>=0; --i) { //continue backwards from the end of the line
        if ((ret[i] == ' ') || (ret[i] == '\t')) //remove trailing whitespace
            ret[i] = '\0';

        else if ((ret[i] != '\0') && (ret[i] != '\r') && (ret[i] != '\n')) //...until we hit a word character
            break;
    }

    for (i=0; i<MAXLINE-1; ++i) { //-1 because we might need to add a character to the line
        if (ret[i] == '\n') //break on newline
            break;

        if (ret[i] == '\0') { //line doesn't have a \n -- add it
            ret[i] = '\n';
            ret[i+1] = '\0';
            break;
        }
    }
}

(Also added comments and fixed one bug.)

A big issue is the usage of the MAXLINE constant -- main() exclusively uses it for the line and out variables; trim(), which is only working on them doesn't need to use the constant. You should pass the size(s) as a parameter just like you did in getline().

Personally I'd put code like this:

ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n'

into a separate function (or even a define macro)

  1. trim should indeed use 1 buffer only (as @Ferruccio says).
  2. trim needs to be broken up, as @plinth says
  3. trim needs not return any value (if you want to check for empty string, test line[0] == 0)
  4. for extra C flavor, use pointers rather than indexes

-go to end of line (terminating 0; -while not at the start of line and current character is space, replace it with 0. -back off one char

char *findEndOfString(char *string) {
  while (*string) ++string;
  return string; // string is now pointing to the terminating 0
}

void trim(char *line) {
  char *end = findEndOfString(line);
   // note that we start at the first real character, not at terminating 0
  for (end = end-1; end >= line; end--) {
      if (isWhitespace(*end)) *end = 0;
      else return;
  }
}

Another example of doing the same thing. Did some minor violation by using C99-specific stuff. that will not be found in K&R. also used the assert() function which is part of the starndard library, but is probably not covered in chapter one of K&R.

#include <stdbool.h> /* needed when using bool, false and true. C99 specific. */
#include <assert.h> /* needed for calling assert() */

typedef enum {
  TAB = '\t',
  BLANK = ' '
} WhiteSpace_e;

typedef enum {
  ENDOFLINE = '\n',
  ENDOFSTRING = '\0'
} EndofLine_e;

bool isWhiteSpace(
  char character
) {
  if ( (BLANK == character) || (TAB == character ) ) {
    return true;
  } else {
    return false;
  }
}

bool isEndOfLine( 
  char character
) {
 if ( (ENDOFLINE == character) || (ENDOFSTRING == character ) ) {
    return true;
  } else {
    return false;
  }
}   

/* remove blanks and tabs (i.e. whitespace) from line-string */
void removeWhiteSpace(
  char string[]
) {
  int i;
  int indexOutput;

  /* copy all non-whitespace character in sequential order from the first to the last.
    whitespace characters are not copied */
  i = 0;
  indexOutput = 0;
  while ( false == isEndOfLine( string[i] ) ) {
    if ( false == isWhiteSpace( string[i] ) ) {
      assert ( indexOutput <= i );
      string[ indexOutput ] = string[ i ];
      indexOutput++;
    }
    i++; /* proceed to next character in the input string */
  }

  assert( isEndOfLine( string[ i ] ) );
  string[ indexOutput ] = ENDOFSTRING;

}

Here's my stab at the exercise without knowing what is in Chapter 1 or K & R. I assume pointers?

#include "stdio.h"

size_t StrLen(const char* s)
{
    // this will crash if you pass NULL
    size_t l = 0;
    const char* p = s;
    while(*p)
    {
        l++;
        ++p;
    }
    return l;
}

const char* Trim(char* s)
{
    size_t l = StrLen(s);
    if(l < 1)
        return 0;

    char* end = s + l -1;
    while(s < end && (*end == ' ' || *end == '\t'))
    {
        *end = 0;
        --end;
    }

    return s;
}

int Getline(char* out, size_t max)
{
    size_t l = 0;
    char c;
    while(c = getchar())
    {
        ++l;

        if(c == EOF) return 0;
        if(c == '\n') break;

        if(l < max-1)
        {
            out[l-1] = c;
            out[l] = 0;
        }
    }

    return l;
}

#define MAXLINE 1024

int main (int argc, char * const argv[]) 
{
    char line[MAXLINE];
    while (Getline(line, MAXLINE) > 0)
    {
        const char* trimmed = Trim(line);
        if(trimmed)
            printf("|%s|\n", trimmed);

        line[0] = 0;
    }

    return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top