I see two separate problems here: extracting the structured data, and presenting it graphically. I would start with the first one.
I don't think the following is an exact solution, and it won't win any algorithm awards; on 350,000 rows it may take a few nights to run. But if you want to try this path, it may give you a few hints. (As others have mentioned, though, this may be a very bumpy path, or even a dead end.)
Add a few columns to the table (one per parameter you want to extract), use the DBI module to iterate over the rows, and write a separate function to guess each parameter.
See e.g. PerlMonks for tips on doing database updates efficiently.
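```perl
# meta code: connection details and the AGE column are placeholders
use strict;
use warnings;
use DBI;

my ($dsn, $user, $password) = @ARGV;    # fill in for your database
my $dbh = DBI->connect($dsn, $user, $password,
    { RaiseError => 1, AutoCommit => 0, FetchHashKeyName => 'NAME_uc' });
my $sth = $dbh->prepare('SELECT ID, THETEXT FROM ATABLE');
my $upd = $dbh->prepare('UPDATE ATABLE SET AGE = ? WHERE ID = ?');
$sth->execute();
while (my $row = $sth->fetchrow_hashref) {
    my $age = guess_age($row->{THETEXT});
    if ($age > 0) {
        $upd->execute($age, $row->{ID});    # update the new AGE column
    }
}
$dbh->commit;    # commit once at the end rather than once per row
# end meta
```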
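The loop above fills in only the age. Since the plan is one guessing function per new column, one way to generalise it is a dispatch table that maps column names to functions. A minimal sketch, assuming hypothetical column names AGE and TOWN and two deliberately crude stub guessers (`guess_age_stub`, `guess_town_stub`) standing in for the real ones:

```perl
use strict;
use warnings;

# Hypothetical stub guessers: each takes the free text and returns a value or undef.
# Real versions would be guess_age() plus similar subs for the other columns.
sub guess_age_stub  { my ($text) = @_; return $text =~ /\baged?\s*(\d+)/i ? $1 : undef }
sub guess_town_stub { my ($text) = @_; return $text =~ /[a-z]\W([A-Z]\w+)/ ? $1 : undef }

# Dispatch table: column name => guessing function (column names are assumptions).
my %guesser = (
    AGE  => \&guess_age_stub,
    TOWN => \&guess_town_stub,
);

# Run every guesser over one row's text and keep only the columns that got a value.
sub guess_all {
    my ($text) = @_;
    my %found;
    for my $column (sort keys %guesser) {
        my $value = $guesser{$column}->($text);
        $found{$column} = $value if defined $value;
    }
    return \%found;
}
```

In the DBI loop, `guess_all($row->{THETEXT})` would replace the single `guess_age` call, and each key of the returned hash would become one `COLUMN = ?` in the UPDATE statement. Adding a new parameter then means writing one sub and adding one line to `%guesser`.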
```perl
sub guess_age {
    my $text = shift;
    my $age  = -1;    # flag: not found
    # number words; the full list is elided here
    my $num_word = qr/one|two|three|...|ninety/;

    # look for a sequence of number words separated by '-' or whitespace
    # (whitespace alone must not count as a match)
    if ($text =~ /\b((?:$num_word)(?:[-\s]+(?:$num_word))*)\b/) {
        $age = some_number_from_text_function($1);
    # see if we have some prefix words in front of a number
    } elsif ($text =~ /\b(?:age|aged)\s*(\d+)/) {
        $age = $1;
    # see if we have some postfix words after a number
    } elsif ($text =~ /(\d+)\s*(?:old|of age|years)/) {
        $age = $1;
    # see if we have a number right after a comma early in the sentence;
    # $-[0] is the start offset of the match, so require it before position 50
    } elsif ($text =~ /,\s*(\d+)/ && $-[0] < 50) {
        $age = $1;
    } elsif (...) {
        # more heuristics here
    }
    return $age;
}
```
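`some_number_from_text_function` is left undefined above. A minimal sketch of what it might look like, assuming only numbers below 100 matter for ages; `words_to_number` and `guess_age_from_words` are hypothetical names, and here the word list is spelled out so the number-word regex actually runs:

```perl
use strict;
use warnings;

# Hypothetical stand-in for some_number_from_text_function():
# converts English number words below 100 ("forty-two", "seventeen") to digits.
my %units = (
    one => 1, two => 2, three => 3, four => 4, five => 5, six => 6,
    seven => 7, eight => 8, nine => 9, ten => 10, eleven => 11, twelve => 12,
    thirteen => 13, fourteen => 14, fifteen => 15, sixteen => 16,
    seventeen => 17, eighteen => 18, nineteen => 19,
);
my %tens = (
    twenty => 20, thirty => 30, forty => 40, fifty => 50,
    sixty => 60, seventy => 70, eighty => 80, ninety => 90,
);

sub words_to_number {
    my ($phrase) = @_;
    my ($total, $seen) = (0, 0);
    # Split on hyphens and whitespace, then add up the recognised words.
    for my $word (split /[-\s]+/, lc $phrase) {
        next if $word eq '';
        if    (exists $tens{$word})  { $total += $tens{$word};  $seen++ }
        elsif (exists $units{$word}) { $total += $units{$word}; $seen++ }
        else                         { return -1 }    # unknown word: give up
    }
    return $seen ? $total : -1;
}

# Longest words first, so "seventeen" is tried before "seven".
my $num_word   = join '|', sort { length($b) <=> length($a) } (keys %units, keys %tens);
# One or more number words, joined only by '-' or whitespace.
my $num_phrase = qr/\b((?:$num_word)(?:[-\s]+(?:$num_word))*)\b/i;

sub guess_age_from_words {
    my ($text) = @_;
    return $text =~ $num_phrase ? words_to_number($1) : -1;
}
```

It simply adds up the recognised words, so it also accepts odd inputs like "two forty" (giving 42); for guessing ages that looseness is probably acceptable. The `\b` anchors keep "someone" from being read as "one".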
But again, this may be a dead end...
For the town, I guess any unexpected capitalization may be something to look for, e.g. `/[a-z]\W([A-Z]\w+)/`, i.e. a lowercase letter, followed by one non-word character (such as a space), followed by a capital letter and any further word characters. For the profession I am really out of ideas. Maybe match each word against a big hash of known professions?
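Both ideas fit in a few lines. This is a sketch only: `guess_towns` and `guess_profession` are hypothetical names, and the six-word profession list is a placeholder for the big hash. Expect false positives from the capitalization rule, since personal names look just like towns, and the exact-word lookup misses plurals such as "teachers".

```perl
use strict;
use warnings;

# Town: a lowercase letter, one non-word character, then a capitalised word.
# Returns every candidate in order of appearance.
sub guess_towns {
    my ($text) = @_;
    my @candidates;
    while ($text =~ /[a-z]\W([A-Z]\w+)/g) {
        push @candidates, $1;
    }
    return @candidates;
}

# Profession: look each word up in a hash of known professions.
# This tiny list is only a placeholder; a real one would be much larger.
my %is_profession = map { $_ => 1 } qw(baker carpenter doctor farmer nurse teacher);

sub guess_profession {
    my ($text) = @_;
    for my $word (map { lc($_) } $text =~ /(\w+)/g) {
        return $word if $is_profession{$word};
    }
    return;    # nothing found
}
```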