Question

I've got a bunch of first names in a field that carry a middle initial with a '.' at the end.

I need a regex to convert this example:

Kenneth R.

into

Kenneth

I was trying to build my own and found this useful site, by the way:

http://www.gskinner.com/RegExr/

but I'm new to Perl and regular expressions and could only get "...$", which is useless when there is no middle initial at the end of the first name.


I just found another name format that needs consideration: 'R. Kelly' needs to be 'Kelly'.


Solution

To remove the last "word" if it ends with a dot, along with the space in front of it:

$name =~ s/\s*\w+\.$//;

(This assumes there is no whitespace after the dot.)

To remove a word ending with a dot anywhere in the string:

$name =~ s/\w+\.//;

As written, this removes only the first match; add the /g modifier if you want to remove all of them.

By the way, make yourself a list of test cases to check your solution, then try it on real-world data; you will probably get some surprises.
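
A test case list of that kind could look roughly like the sketch below. The helper name strip_dotted_words and the extra whitespace clean-up are assumptions added for illustration; only the first two cases come from the question itself.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every "word" that ends in a dot,
# then tidy up the whitespace that is left behind.
sub strip_dotted_words {
    my ($name) = @_;
    $name =~ s/\w+\.//g;        # drop all dotted words (note the /g)
    $name =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    $name =~ s/\s{2,}/ /g;      # collapse doubled spaces in the middle
    return $name;
}

# Each entry is [ input, expected output ].
my @cases = (
    [ 'Kenneth R.',    'Kenneth'    ],
    [ 'R. Kelly',      'Kelly'      ],
    [ 'Kenneth',       'Kenneth'    ],
    [ 'John Q. Smith', 'John Smith' ],
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = strip_dotted_words($in);
    printf "%-4s '%s' -> '%s'\n", ($got eq $want ? 'ok' : 'FAIL'), $in, $got;
}
```

One of the surprises to expect with \w+\. is that it also eats abbreviations: 'St. John' comes back as 'John'. Adding such names to the list makes the trade-off visible before you run it on the whole field.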

OTHER TIPS

To take care of the R. Kelly case:

s/\w\. *//g

Here's a quick test:

$ echo 'R. Kelly
Kenneth R.
R. Kemp R.
John Q. Smith' | perl -pe 's/\w\. *//g'
Kelly
Kenneth 
Kemp 
John Smith

I'd suggest that:

  1. The global option (g) is required.
  2. The case insensitive option (i) isn't.
  3. You might consider matching upper-case initials only ([[:upper:]]).
  4. Multiple character "initials" should be viewed with suspicion. (So \w+ is probably a mistake unless your data has relevant cases.)
  5. Read perldoc perlre for more information.
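
Putting suggestions 1 to 4 together might look like the following sketch. The helper name strip_initials, the \b anchor, and the trailing-space trim are assumptions for illustration, not part of the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: drop single upper-case initials such as "R." or "Q.".
sub strip_initials {
    my ($name) = @_;
    # \b restricts the match to a lone letter, so "Jr." is left alone.
    $name =~ s/\b[[:upper:]]\.\s*//g;
    $name =~ s/\s+$//;    # trim the space left when the initial came last
    return $name;
}

print strip_initials($_), "\n"
    for 'R. Kelly', 'Kenneth R.', 'John Q. Smith', 'Martin Jr.';
# prints:
# Kelly
# Kenneth
# John Smith
# Martin Jr.
```

Whether 'Martin Jr.' should survive untouched depends on your data, which is another reason to keep a test list next to the regex.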
Licensed under: CC-BY-SA with attribution