Problem

I need to write a function in C that determines whether two strings are made up of the same words. As the code below shows, I load each string into a separate array. My intention is for each array to hold the words in lowercase, separated by a single space, with all non-alphabetic characters removed.

I thought I could simply sort each string and call strcmp on the results, but that doesn't work: strings such as "dog dog dog cat" and "dog cat" consist of the same words, so the function should return 1, yet sorting and strcmp would report a difference. Then I thought I could collapse duplicate words into one, sort, and call strcmp. But there is still a problem: "dog" and "god" are two different words, yet the function would treat them as the same after sorting.

Examples:

"dog dog dog cat" "dog cat" - same words
"HI HeLLO!!'" "hi,,,hello hi" - same words

I would be very thankful for any help. I have spent quite some time on this and still can't figure out how to do it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int sameWords( const char * a, const char * b)
{
char * array1=NULL;
char * array2=NULL;
int length1=0, length2=0, i=0, j=0;
while(a[i])
  {
  if(i>=length1)
    {
      length1+=250;
      array1=(char*)malloc(length1*sizeof(char));
    }
  if(isspace(a[i]) && !isspace(a[i-1]))
    {
      array1[i]=a[i];
    }
  if(isalpha(a[i]))
    {
      array1[i]=tolower(a[i]);
    }
  i++;
  }
while(b[j])
  {
  if(j>=length2)
    {
      length2+=250;
      array2=(char*)malloc(length2*sizeof(char));
    }
  if(isspace(b[j]) && !isspace(b[j-1]))
    {
      array2[j]=b[j];
    }
  if(isalpha(b[j]))
    {
      array2[j]=tolower(b[j]);
    }
  j++;
  }
}

int main()
{
sameWords("This' is   string !!! ", "THIS stRing is !!  string ");
return 0;
}

Solution

You have already learned two ways to go about your problem. The complicated one is to split each of the strings into words, sort them and then weed out duplicates, which is easy in a sorted array. The easier one is to split the first string into words, search for each word in the second. Then do the same the other way round: split the second and check for words in the first.

Both approaches require that you split the strings. That's also where you seem to have problems in your code. (You've got the basic idea to look at word boundaries, but you don't seem to know how to store the words.)

The basic question is: How are you going to represent the words, i.e. the substrings of a C string? There are various ways. You could use pointers into the string together with a string length or you could copy them into another buffer.
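To make the first option concrete, here is a minimal sketch of the "pointer plus length" representation. The names word_span and next_word are made up for this illustration and do not come from the question; a word is assumed to be a run of letters as reported by isalpha.

```c
#include <ctype.h>
#include <stddef.h>

/* A word represented as a view into the original string (no copy). */
struct word_span {
    const char *start;  /* first letter of the word */
    size_t len;         /* number of letters */
};

/*
 * Find the next run of letters at or after *pos.
 * Returns 1 and fills *out if a word was found, 0 at the end of the string.
 * In both cases *pos is advanced past what was scanned.
 */
int next_word(const char **pos, struct word_span *out)
{
    const char *p = *pos;

    while (*p && !isalpha((unsigned char)*p)) p++;   /* skip separators */
    if (*p == '\0') {
        *pos = p;
        return 0;
    }

    out->start = p;
    while (isalpha((unsigned char)*p)) p++;          /* consume letters */
    out->len = (size_t)(p - out->start);
    *pos = p;
    return 1;
}
```

A caller would loop with while (next_word(&p, &w)) and could print each word with printf("%.*s\n", (int)w.len, w.start). Because the spans are not zero-terminated, functions such as strcmp or strstr cannot be used on them directly; that is the trade-off against copying into a buffer.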

Here is a solution that splits the string a into words and then checks whether each word can be found in b:

/*
 *      Return 1 if all words in a can be found in b, 
 *      return 0 otherwise.
 */
int split_and_check(const char *a, const char *b)
{
    char word[80];     /* temporary buffer for current word */
    int prev = 0;      /* previously read char to detect word boundaries */
    int len = 0;       /* current length of word; must start at 0 */
    int i;

    i = 0;
    while (1) {
        if (isalpha((unsigned char)a[i])) {
            if (!isalpha((unsigned char)prev)) {
                len = 0;                /* a new word begins here */
            }
            /* leave room for the terminating '\0' */
            if (len < (int)sizeof word - 1) word[len++] = a[i];
        } else {
            if (len > 0) {
                word[len] = '\0';       /* manually null-terminate word */

                if (strstr(b, word) == NULL) {
                    /* fail on string mismatch */
                    return 0;
                }
                len = 0;                /* reset word-length counter */
            }
        }
        if (a[i] == '\0') break;        /* check end here to catch last word */
        prev = a[i++];
    }

    return 1;
}

The current word is stored in the local char buffer word and has the length len. Note how the zero end marker '\0' is added to word manually before searching b for word: The library function strstr looks for a string in another one. Both strings must be zero-terminated.

This is only one half of the solution. You must check the strings the other way round:

int same_words(const char *a, const char *b)
{    
    if (split_and_check(a, b) == 0) return 0;
    if (split_and_check(b, a) == 0) return 0;

    return 1;
}

This is not yet the exact solution to your problem, because the string matching is case-sensitive. I've skipped this part because it was easier that way: strstr is case-sensitive and I don't know of any variants that ignore case. Note also that strstr finds substrings, so a word such as "do" would count as found in "dog".
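One way to close that gap without strstr is to compare whole words letter by letter through tolower. The following is only a sketch under the same assumption as above (a word is a run of isalpha characters); the names equal_ignore_case, contains_word, all_words_in and same_words_nocase are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Compare two letter runs of equal length n, ignoring case. */
static int equal_ignore_case(const char *x, const char *y, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)x[i]) != tolower((unsigned char)y[i]))
            return 0;
    }
    return 1;
}

/* Return 1 if the word w[0..n) occurs as a whole word in s. */
static int contains_word(const char *s, const char *w, size_t n)
{
    while (*s) {
        const char *start;

        while (*s && !isalpha((unsigned char)*s)) s++;   /* skip separators */
        start = s;
        while (isalpha((unsigned char)*s)) s++;          /* consume one word */
        if (n > 0 && (size_t)(s - start) == n && equal_ignore_case(start, w, n))
            return 1;
    }
    return 0;
}

/* Return 1 if every word of a occurs in b. */
static int all_words_in(const char *a, const char *b)
{
    while (*a) {
        const char *start;

        while (*a && !isalpha((unsigned char)*a)) a++;
        start = a;
        while (isalpha((unsigned char)*a)) a++;
        if (a > start && !contains_word(b, start, (size_t)(a - start)))
            return 0;
    }
    return 1;
}

/* Return 1 if a and b consist of the same set of words, ignoring case. */
int same_words_nocase(const char *a, const char *b)
{
    return all_words_in(a, b) && all_words_in(b, a);
}
```

Since only words of exactly the same length are compared, "do" no longer matches "dog". The price is that b is rescanned for every word of a, which should be fine for short inputs like the ones in the question.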

Other tips

You are returning nothing from your function sameWords whose return type is int.

I'm not trying to compete for the accepted answer, but I would also take a look at regular expressions for this kind of thing.

Does C or C++ have a standard regex library?

It would take minutes to solve: split the string with a regex, lowercase the words, and then iterate over them to look for common words.

What I would do to solve this problem is create a data structure like a tree into which you can insert words. The insert function would do nothing if the word is already there, otherwise, it would convert it to lowercase and insert it in the tree. Then you could simply convert both strings to these types of trees and compare the trees.
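A rough sketch of that idea, using an unbalanced binary search tree as the word set. All names here (node, tree_insert, same_words_tree and so on) are hypothetical, the lowercasing is done while splitting rather than inside the insert function, and words longer than 79 letters are assumed not to occur (they would be truncated). Two trees holding the same words can have different shapes, so the comparison checks the sizes and then looks up each word of one tree in the other.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct node {
    char *word;                 /* lowercased, heap-allocated copy */
    struct node *left, *right;
};

/* Insert word into the tree; does nothing if it is already present. */
static struct node *tree_insert(struct node *t, const char *word)
{
    int cmp;

    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL) abort();           /* sketch: no graceful recovery */
        t->word = malloc(strlen(word) + 1);
        if (t->word == NULL) abort();
        strcpy(t->word, word);
        t->left = t->right = NULL;
        return t;
    }
    cmp = strcmp(word, t->word);
    if (cmp < 0) t->left = tree_insert(t->left, word);
    else if (cmp > 0) t->right = tree_insert(t->right, word);
    return t;
}

static int tree_contains(const struct node *t, const char *word)
{
    while (t != NULL) {
        int cmp = strcmp(word, t->word);
        if (cmp == 0) return 1;
        t = (cmp < 0) ? t->left : t->right;
    }
    return 0;
}

static size_t tree_size(const struct node *t)
{
    return t ? 1 + tree_size(t->left) + tree_size(t->right) : 0;
}

/* Return 1 if every word stored in a is also stored in b. */
static int tree_subset(const struct node *a, const struct node *b)
{
    if (a == NULL) return 1;
    return tree_contains(b, a->word)
        && tree_subset(a->left, b) && tree_subset(a->right, b);
}

static void tree_free(struct node *t)
{
    if (t == NULL) return;
    tree_free(t->left);
    tree_free(t->right);
    free(t->word);
    free(t);
}

/* Split s into lowercased words and collect them in a tree. */
static struct node *tree_from_string(const char *s)
{
    struct node *t = NULL;
    char word[80];              /* assumption: longer words are truncated */
    size_t len = 0;

    for (;; s++) {
        if (isalpha((unsigned char)*s)) {
            if (len < sizeof word - 1)
                word[len++] = (char)tolower((unsigned char)*s);
        } else {
            if (len > 0) {                /* a word just ended */
                word[len] = '\0';
                t = tree_insert(t, word);
                len = 0;
            }
            if (*s == '\0') break;
        }
    }
    return t;
}

int same_words_tree(const char *a, const char *b)
{
    struct node *ta = tree_from_string(a);
    struct node *tb = tree_from_string(b);
    int same = tree_size(ta) == tree_size(tb) && tree_subset(ta, tb);

    tree_free(ta);
    tree_free(tb);
    return same;
}
```

With equal sizes, "every word of a is in b" already implies the reverse direction, so one subset check is enough.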

Another way to do this is in bash. While this is probably not allowed for your assignment, if you understand how and why it works, you should be able to code something up that mimics it:

# s1 and s2 are simply strings with spaces separating words
s1="dog dog dog cat"
s2="cat dog"

# Convert to arrays
a1=( $(printf "%s\n" ${s1}  | sort | uniq ) )
a2=( $(printf "%s\n" ${s2}  | sort | uniq ) )

# Compare the result
if [ "${a1[*]}" == "${a2[*]}" ] ; then
  echo "Same"
fi
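A possible C counterpart of the script, as a sketch only: split into lowercased words, sort them with qsort, drop neighbouring duplicates like uniq, and compare the two lists. The fixed limits MAX_WORDS and MAX_LEN and all function names are assumptions made to keep the example short; words beyond the limits are silently ignored or truncated.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 256   /* assumption: fixed capacity keeps the sketch short */
#define MAX_LEN   80    /* assumption: longer words are truncated */

struct word_list {
    char words[MAX_WORDS][MAX_LEN];
    size_t count;
};

/* qsort callback: each element is one zero-terminated row of the array. */
static int cmp_words(const void *x, const void *y)
{
    return strcmp((const char *)x, (const char *)y);
}

/* Split s into lowercased words, then sort and drop duplicates. */
static void build_sorted_unique(const char *s, struct word_list *out)
{
    size_t len = 0, i, kept = 0;

    out->count = 0;
    for (;; s++) {
        if (isalpha((unsigned char)*s)) {
            if (out->count < MAX_WORDS && len < MAX_LEN - 1)
                out->words[out->count][len++] = (char)tolower((unsigned char)*s);
        } else {
            if (len > 0) {                       /* a word just ended */
                out->words[out->count][len] = '\0';
                out->count++;
                len = 0;
            }
            if (*s == '\0') break;
        }
    }

    qsort(out->words, out->count, MAX_LEN, cmp_words);   /* like sort */

    /* like uniq: keep the first word of each run of equal neighbours */
    for (i = 0; i < out->count; i++) {
        if (kept == 0 || strcmp(out->words[kept - 1], out->words[i]) != 0) {
            if (kept != i) strcpy(out->words[kept], out->words[i]);
            kept++;
        }
    }
    out->count = kept;
}

int same_words_sorted(const char *a, const char *b)
{
    struct word_list la, lb;
    size_t i;

    build_sorted_unique(a, &la);
    build_sorted_unique(b, &lb);
    if (la.count != lb.count) return 0;
    for (i = 0; i < la.count; i++) {
        if (strcmp(la.words[i], lb.words[i]) != 0) return 0;
    }
    return 1;
}
```

Note that whole words are sorted, not the letters inside them, so "dog" and "god" stay distinct.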
License: CC-BY-SA with attribution
Not affiliated with StackOverflow