Question

I have two string variables that differ in exactly one character in each observation. I need the position of that differing character. I tried the indexnot() function, but it gives wrong results because both strings are made up of the same set of characters. Here is an illustrative example; the variable Position is the one I am trying to create:

+--------------+--------------+-----------+
|   String 1   |   String 2   | Position  |
+--------------+--------------+-----------+
| 000002002000 | 000000002000 |         6 |
| 000002102000 | 000002002000 |         7 |
| 000002112000 | 000002102000 |         8 |
| 000002112020 | 000002112000 |        11 |
| 000002112120 | 000002112020 |        10 |
+--------------+--------------+-----------+

Solution

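* start with Position missing in every observation
gen Position = .

* check characters 1 to 12 in turn; only the first mismatch is recorded
quietly forval j = 1/12 {
    replace Position = `j' if substr(String1, `j', 1) != substr(String2, `j', 1) & missing(Position)
}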

Commentary is perhaps redundant here, but will harm no-one.

In the absence of a built-in function to do this, you need to write some code using existing commands and functions. Initialise Position to missing (zero would do fine as an alternative, with the condition changed to match). Then loop over the character positions, here 1 to 12 because the example shows 12-character strings. We record the position of the first difference in characters. Note how the condition missing(Position) (Position == . if you like) restricts changes to the first difference met.
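If the strings are not all 12 characters long, the upper limit of the loop could be taken from the data instead of being typed in. What follows is only a sketch under that assumption: the three observations are made up for illustration, and len and maxlen are hypothetical helper names that do not appear in the question.

```stata
clear
input str12 String1 str12 String2
"abc"          "abd"
"00120"        "00020"
"000002112120" "000002112020"
end

* longest string in either variable gives the loop's upper limit
gen len = max(strlen(String1), strlen(String2))
summarize len, meanonly
local maxlen = r(max)
drop len

gen Position = .
quietly forval j = 1/`maxlen' {
    replace Position = `j' if substr(String1, `j', 1) != substr(String2, `j', 1) & missing(Position)
}

list
```

substr() returns an empty string once the position runs past the end of a string, so shorter strings cause no error. If one string is simply a truncated copy of the other, the first position beyond the shorter one is reported.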

Stata loops automatically over all the observations here, so the only loop needed is over string positions.
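For anyone who wants to run this end to end, here is a self-contained sketch that enters the five example observations from the question and checks the result against the Position column given there. The variable name Expected is an assumption introduced only for that check.

```stata
clear
input str12 String1 str12 String2 byte Expected
"000002002000" "000000002000"  6
"000002102000" "000002002000"  7
"000002112000" "000002102000"  8
"000002112020" "000002112000" 11
"000002112120" "000002112020" 10
end

gen Position = .

* one pass per character position; each replace acts on all observations at once
quietly forval j = 1/12 {
    replace Position = `j' if substr(String1, `j', 1) != substr(String2, `j', 1) & missing(Position)
}

* stops with an error if any observation disagrees with the question's table
assert Position == Expected

list String1 String2 Position
```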

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow