Notepad++ moving tagged text strings to excel

Question 1

I gave this another try and found a much easier way to get the matches into Excel. I don't have Notepad++, but I occasionally use PSPad when my IDE is not around. It offers pretty much the same features as Notepad++; some things it does better, others worse. Its regex search is pretty good, and the search dialogue has a button labelled Copy.

[Screenshot: Find dialogue]

I copied your file and used the regex from my other answer without the capture groups, that is \b\w+/[A-Z]+. The groups are not needed because Copy takes each complete match. Remember that \b is a word boundary, a zero-width position rather than a real character, so nothing extra gets copied.

[Screenshot: Copied search results]

And voilà, here we go: a list of names with their classification that should be easy enough to paste into Excel and split into columns there, for example with Text to Columns using / as the delimiter.
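If you would rather not split the columns by hand, the same match-and-copy step can be scripted. This is only a minimal sketch, assuming a single sample line from your text stands in for the whole file; the variable names are made up for illustration. It collects every complete match and swaps the slash for a tab, which Excel treats as a column break when pasting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line from the question; in practice this would be the whole file.
my $text = 'This is the Showing forth of the Inquiry of Herodotus/PERSON of Halicarnassos/LOCATION,';

# Same pattern as in the search dialogue, no capture groups.
# In list context with /g, the match returns every complete match.
my @matches = $text =~ m{\b\w+/[A-Z]+}g;

# Turn "name/TAG" into "name<TAB>TAG" so Excel puts them in two columns.
my @rows;
for my $match (@matches) {
  my ($name, $tag) = split m{/}, $match, 2;
  push @rows, "$name\t$tag";
}

print "$_\n" for @rows;
```

Redirect the output to a file or copy it from the console, then paste it into a sheet.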

Question 2

Even if you manage to search/replace all those names with Notepad++, I don't see how you would get them into Excel other than one by one. Since SO is mainly about programming, I'll provide a code solution. This is Perl; if you don't know how it works or how to run it, do not despair. It's probably not your language of choice on Windows anyway, and you can build the same thing in practically any programming language.

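#!/usr/bin/perl
use strictures;   # CPAN module: enables strict and (mostly fatal) warnings
use Data::Dump;   # CPAN module: exports dd for pretty-printing

my $counts;       # becomes a hash reference: type => { word => count }

# Read the text after __DATA__ one line at a time.
while (my $row = <DATA>) {
  # /g lets the pattern match repeatedly within the same line.
  while ($row =~ m{\b(\w+)/([A-Z]+)}g) {
    $counts->{$2}->{$1}++;   # $1 is the word, $2 is its type
  }
}

dd $counts;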
__DATA__
This is the Showing forth of the Inquiry of Herodotus/PERSON of Halicarnassos/LOCATION,

Output for first paragraph:

{
  LOCATION => { Halicarnassos => 1 },
  ORGANIZATION => { Barbarians => 1, Hellenes => 1 },
  PERSON => { Herodotus => 1 },
}

Let's start with the __DATA__ section at the bottom. I've pasted your complete text file there, but omitted it here for practical reasons. The first while loop simply reads that text line by line through the DATA filehandle. The second while loop applies a regular expression match to each line with the /g modifier, which lets the regex match multiple times per line. The pattern means:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \2

The two capture groups (..) end up in the variables $1 and $2. For every word that is found, we increment a counter in our data structure $counts. This is like a COUNT(*) with GROUP BY in SQL. The first key ($2) is the type (PERSON, LOCATION...) and the second key ($1) is the actual word. The ++ operator increments the count by one, creating the entry the first time a word is seen.

When we are done, we print it using the Data::Dump module's function dd, which gives us a nice output of counts grouped by type.
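If the goal is Excel rather than a Perl data dump, the counts could instead be printed as tab-separated rows. Here is a rough sketch of that idea using only core Perl (no strictures or Data::Dump); it assumes one sample line from your text as input, and the column names type, word and count are my own choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line from the question; the real script would loop over the file.
my $line = 'This is the Showing forth of the Inquiry of Herodotus/PERSON of Halicarnassos/LOCATION,';

# Count each word per type, same regex and same structure as above.
my %counts;
while ($line =~ m{\b(\w+)/([A-Z]+)}g) {
  $counts{$2}{$1}++;
}

# Flatten the nested hash into sorted "type<TAB>word<TAB>count" rows.
my @rows;
for my $type (sort keys %counts) {
  for my $word (sort keys %{ $counts{$type} }) {
    push @rows, join "\t", $type, $word, $counts{$type}{$word};
  }
}

print join("\t", qw(type word count)), "\n";
print "$_\n" for @rows;
```

Sorting the keys is optional; it just makes the output stable between runs, since Perl hashes have no fixed order.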

Thanks for bearing with me on that little technical excursion. If it was too technical, try the excellent JavaScript regex tool regex101.com, where I set it up for you. You should be able to copy/paste from there to Excel. I recommend a browser plugin that lets you copy table columns.

Question 3

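Why not extract only the actual names with the pattern [a-zA-Z]+?(?=\/PERSON) (the lookahead requires /PERSON to follow without including it in the match)? If you want /PERSON in the match too, drop the (?= ) wrapper and keep \/PERSON.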
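To see the difference between the two variants, here is a small Perl sketch, assuming a single sample line from the question as input. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line from the question.
my $text = 'This is the Showing forth of the Inquiry of Herodotus/PERSON of Halicarnassos/LOCATION,';

# Lookahead version: /PERSON must follow, but is not part of the match.
my @names = $text =~ /[a-zA-Z]+?(?=\/PERSON)/g;

# Without the lookahead, the tag becomes part of the match.
my @tagged = $text =~ /[a-zA-Z]+?\/PERSON/g;

print "names:  @names\n";
print "tagged: @tagged\n";
```

On this line the first list holds just Herodotus and the second holds Herodotus/PERSON; Halicarnassos is skipped in both because it is tagged LOCATION.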

You could even go so far as to extract everything into groups using ([a-zA-Z]+?)\/([A-Z]+). Then you could output the captured groups however you want. In any decent text editor such as Sublime Text you could find [\s\S]*?([a-zA-Z]+?)\/([A-Z]+)[\s\S]*? and replace with { $2: $1 }, for example, to build a list of JS-style objects. Note that the values would still need quotes and separating commas to be valid JavaScript, and any text after the last match is left in place.
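The same grouping idea also works outside an editor. The sketch below is one possible way to do it in Perl, assuming a single sample line from the question; unlike the plain { $2: $1 } replacement, it adds quotes and commas itself so the result is a valid JavaScript array literal. That quoting is my addition, not part of the replacement string above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line from the question.
my $text = 'This is the Showing forth of the Inquiry of Herodotus/PERSON of Halicarnassos/LOCATION,';

# Collect one object per match: $2 is the tag, $1 is the name.
my @objects;
while ($text =~ /([a-zA-Z]+?)\/([A-Z]+)/g) {
  push @objects, qq({ "$2": "$1" });
}

# Join the objects into an array literal, one object per line.
my $js = "[\n  " . join(",\n  ", @objects) . "\n]";

print "$js\n";
```

Because only the matches are collected, the surrounding text never has to be consumed by [\s\S]*? as in the editor version.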