Question

I need to extract names (including uncommon names) from blocks of text using Perl. I've looked into this module for extracting names, but it only has the top 1000 popular names and surnames in the US dating back to 1990; I need something a bit more comprehensive.

I've considered using the Social Security Index to make a database for comparison, but this seems very tedious and processing intensive. Is there a way to pull names from Perl using another method?

Example of text to parse:

LADNIER
Louis Anthony Ladnier, [Louie] age 48, of Mobile, Alabama died at home Friday, November 16, 2012.
Louie was born January 9, 1964 in Mobile, Alabama. He was the son of John E. Ladnier, Sr. and Gloria Bosarge Ladnier. He was a graduate of McGill-Toolen High School and attended University of South Alabama. He was employed up until his medical retirement as Communi-cations Supervisor with the Bayou La Batre Police Department.
He is preceded in death by his father, John. Survived by his mother, Gloria, nephews, Dominic Ladnier and Christian Rubio, whom he loved and help raise as his own sons, sisters, Marj Ladnier and Morgan Gordy [Julian], and brother Eddie Ladnier [Cindy], and nephews, Jamie, Joey, Eddie, Will, Ben and nieces, Anna and Elisabeth.
Memorial service will be held at St. Dominic's Catholic Church in Mobile on Wednesday at 1pm.
Serenity Funeral Home is in charge of arrangements.
In lieu of flowers, memorials may be sent to St. Dominic School, 4160 Burma Road Mobile, AL 36693, education fund for Christian Rubio and McGill-Toolen High School, 1501 Old Shell Road Mobile, AL 36604, education Fund for Dominic Ladnier.
The family is grateful for all the prayers and support during this time. Louie was a rock and a joy to us all.

Was it helpful?

Solution 2

There is no sure fire way to do this due to the nature of the English language. You either need lists to (fuzzy)compare with, or will have to settle for significant accuracy penalties.

OTHER TIPS

Use Stanford's NER (GPL). Demo:

http://nlp.stanford.edu:8080/ner/process

The Apache Foundation has a few projects that cover the topic of entity extraction with specific pre-trained models for English names (nameFinder). I would recommend openLNP or Stanbol. In the meantime if you have just a few queries I have an NLP I've implemented in C# in my apps section at http://www.augmentedintel.com/apps/csharpnlp/extract-names-from-text.aspx.

Best,

Don

You're trying to implement a named-entity recognition. The bad news is that it's really hard. You could try Lingua::EN::NamedEntity, however:

$ perl -MLingua::EN::NamedEntity -nE 'say $_ for map { $_->{class} eq "person" ? $_->{entity} : () } extract_entities($_)' names.txt 
Louie
Louis Anthony Ladnier
Louie
John E
Bayou La Batre Police Department
Gloria
Julian
Cindy
Eddie Ladnier
Eddie
John
Catholic Church
Christian Rubio
Dominic Ladnier
Burma Road Mobile
Louie

You can also use Calais, a Reuters webservice for natural language processing, which offers a lot better results:

calais

I think you want to Google something like:

perl part of speech tagging
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top