Question

I'm looking for a good JavaScript RegEx to convert names to proper cases. For example:

John SMITH = John Smith

Mary O'SMITH = Mary O'Smith

E.t MCHYPHEN-SMITH = E.T McHyphen-Smith  

John Middlename SMITH = John Middlename SMITH

Well you get the idea.

Anyone come up with a comprehensive solution?

Was it helpful?

Solution

Something like this?

function fix_name(name) {
    var replacer = function (whole,prefix,word) {
        ret = [];
        if (prefix) {
            ret.push(prefix.charAt(0).toUpperCase());
            ret.push(prefix.substr(1).toLowerCase());
        }
        ret.push(word.charAt(0).toUpperCase());
        ret.push(word.substr(1).toLowerCase());
        return ret.join('');
    }
    var pattern = /\b(ma?c)?([a-z]+)/ig;
    return name.replace(pattern, replacer);
}

OTHER TIPS

Wimps!.... Here's my second attempt. Handles "John SMITH", "Mary O'SMITH" "John Middlename SMITH", "E.t MCHYPHEN-SMITH" and "JoHn-JOE MacDoNAld"

Regex fixnames = new Regex("(Ma?C)?(\w)(\w*)(\W*)");
string newName = fixnames.Replace(badName, NameFixer);


static public string NameFixer(Match match) 
{
    string mc = "";
    if (match.Groups[1].Captures.Count > 0)
    {
        if (match.Groups[1].Captures[0].Length == 3)
            mc = "Mac";
        else
            mc = "Mc";
    }

    return 
       mc
      +match.Groups[2].Captures[0].Value.ToUpper()
      +match.Groups[3].Captures[0].Value.ToLower()
      +match.Groups[4].Captures[0].Value;
}

NOTE: By the time I realized you wanted a Javascript solution instead of a .NET one, I was having too much funny to stop....

Mark Summerfield has done a comprehensive job of this with Lingua::EN::NameCase:

KEITH               Keith
LEIGH-WILLIAMS      Leigh-Williams
MCCARTHY            McCarthy
O'CALLAGHAN         O'Callaghan
ST. JOHN            St. John
VON STREIT          von Streit
VAN DYKE            van Dyke
AP LLWYD DAFYDD     ap Llwyd Dafydd
henry viii          Henry VIII
louis xiv           Louis XIV

The above is written in Perl, but it makes heavy use of regular expressions, so you should be able to glean some good techniques.

Here the relevant source:

sub nc {

    croak "Usage: nc [[\\]\$SCALAR]"
        if scalar @_ > 1 or ( ref $_[0] and ref $_[0] ne 'SCALAR' ) ;

    local( $_ ) = @_ if @_ ;
    $_ = ${$_} if ref( $_ ) ;           # Replace reference with value.

    $_ = lc ;                           # Lowercase the lot.
    s{ \b (\w)   }{\u$1}gox ;           # Uppercase first letter of every word.
    s{ (\'\w) \b }{\L$1}gox ;           # Lowercase 's.

    # Name case Mcs and Macs - taken straight from NameParse.pm incl. comments.
    # Exclude names with 1-2 letters after prefix like Mack, Macky, Mace
    # Exclude names ending in a,c,i,o, or j are typically Polish or Italian

    if ( /\bMac[A-Za-z]{2,}[^aciozj]\b/o or /\bMc/o ) {
        s/\b(Ma?c)([A-Za-z]+)/$1\u$2/go ;

        # Now correct for "Mac" exceptions
        s/\bMacEvicius/Macevicius/go ;  # Lithuanian
        s/\bMacHado/Machado/go ;        # Portuguese
        s/\bMacHar/Machar/go ;
        s/\bMacHin/Machin/go ;
        s/\bMacHlin/Machlin/go ;
        s/\bMacIas/Macias/go ;  
        s/\bMacIulis/Maciulis/go ;  
        s/\bMacKie/Mackie/go ;
        s/\bMacKle/Mackle/go ;
        s/\bMacKlin/Macklin/go ;
        s/\bMacQuarie/Macquarie/go ;
    s/\bMacOmber/Macomber/go ;
    s/\bMacIn/Macin/go ;
    s/\bMacKintosh/Mackintosh/go ;
    s/\bMacKen/Macken/go ;
    s/\bMacHen/Machen/go ;
    s/\bMacisaac/MacIsaac/go ;
    s/\bMacHiel/Machiel/go ;
    s/\bMacIol/Maciol/go ;
    s/\bMacKell/Mackell/go ;
    s/\bMacKlem/Macklem/go ;
    s/\bMacKrell/Mackrell/go ;
    s/\bMacLin/Maclin/go ;
    s/\bMacKey/Mackey/go ;
    s/\bMacKley/Mackley/go ;
    s/\bMacHell/Machell/go ;
    s/\bMacHon/Machon/go ;
    }
    s/Macmurdo/MacMurdo/go ;

    # Fixes for "son (daughter) of" etc. in various languages.
    s{ \b Al(?=\s+\w)  }{al}gox ;   # al Arabic or forename Al.
    s{ \b Ap        \b }{ap}gox ;       # ap Welsh.
    s{ \b Ben(?=\s+\w) }{ben}gox ;  # ben Hebrew or forename Ben.
    s{ \b Dell([ae])\b }{dell$1}gox ;   # della and delle Italian.
    s{ \b D([aeiu]) \b }{d$1}gox ;      # da, de, di Italian; du French.
    s{ \b De([lr])  \b }{de$1}gox ;     # del Italian; der Dutch/Flemish.
    s{ \b El        \b }{el}gox unless $SPANISH ;   # el Greek or El Spanish.
    s{ \b La        \b }{la}gox unless $SPANISH ;   # la French or La Spanish.
    s{ \b L([eo])   \b }{l$1}gox ;      # lo Italian; le French.
    s{ \b Van(?=\s+\w) }{van}gox ;  # van German or forename Van.
    s{ \b Von       \b }{von}gox ;  # von Dutch/Flemish

    # Fixes for roman numeral names, e.g. Henry VIII, up to 89, LXXXIX
    s{ \b ( (?: [Xx]{1,3} | [Xx][Ll]   | [Ll][Xx]{0,3} )?
            (?: [Ii]{1,3} | [Ii][VvXx] | [Vv][Ii]{0,3} )? ) \b }{\U$1}gox ;

    $_ ;
}

Unfortunately there are too many different name formats to do this correctly. John-Joe MacDonald is always going to be a nuisance!

Agreed it will never be perfect, but looking to get the most common cases. Which is pretty much to camel case any "word" and handle hyphens and apostrophe's I guess as spaces.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top