Here's a simple solution that satisfies your requirements. It tokenizes each line on whitespace and rebuilds the name from the leading tokens, on the assumption that the name is the only field that can contain multiple tokens. Note that the spacing inside the name is not preserved (runs of spaces collapse to a single space), and lines that use tabs instead of spaces will not split correctly:
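library(stringr)

# Read the file and skip the first line (assumed to be a header).
lines <- readLines("team.names.with.spaces.txt")
for (line in lines[-1]) {
  # Trim the line, then split on runs of one or more spaces.
  toks <- strsplit(str_trim(line), " +")[[1]]
  ntoks <- length(toks)
  # The last three tokens are team, num1 and num2; the rest is the name.
  name <- paste(toks[1:(ntoks - 3)], collapse = " ")
  team <- toks[ntoks - 2]
  num1 <- as.integer(toks[ntoks - 1])
  num2 <- as.integer(toks[ntoks])
  print(line)
  print(name)
  print(team)
  print(num1)
  print(num2)
}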
I recommend keeping str_trim() unless your files are always cleanly formatted, with no leading or trailing whitespace, in which case you can drop the stringr dependency (base R's trimws() does the same job). The output looks like this:
[1] "Jim Smith NYY 100 200"
[1] "Jim Smith"
[1] "NYY"
[1] 100
[1] 200
[1] "Jerry Johnson Jr. PHI 100 200"
[1] "Jerry Johnson Jr."
[1] "PHI"
[1] 100
[1] 200
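If you would rather avoid stringr and collect the fields instead of printing them, the same idea can be sketched in base R. This is only a sketch under assumptions: parse_line and sample_lines are hypothetical names introduced here, the sample lines are made up to match the layout above, and splitting on "[[:space:]]+" is a swapped-in choice so that tabs are handled as well as spaces.

```r
# Hypothetical helper: parse one "name team num1 num2" line without stringr.
parse_line <- function(line) {
  # trimws() strips leading/trailing whitespace (base R >= 3.2.0);
  # "[[:space:]]+" splits on runs of spaces or tabs.
  toks <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  n <- length(toks)
  if (n < 4) stop("expected at least 4 tokens: ", line)
  data.frame(
    name = paste(toks[1:(n - 3)], collapse = " "),  # everything before the last 3 tokens
    team = toks[n - 2],
    num1 = as.integer(toks[n - 1]),
    num2 = as.integer(toks[n]),
    stringsAsFactors = FALSE
  )
}

# Sample lines assumed for illustration; the second one uses a tab.
sample_lines <- c("Jim Smith   NYY 100 200", "Jerry Johnson Jr.\tPHI 100 200")
players <- do.call(rbind, lapply(sample_lines, parse_line))
print(players)
```

Returning one data frame row per line keeps the result easy to filter or merge later, and the explicit token-count check makes malformed lines fail loudly instead of producing a garbled name.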
As an alternative, you might use str_locate() to deal more robustly with multiple spaces or punctuation in the name (for example a hyphenated name, or one containing a comma):
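library(stringr)
x <- "Jerry Johnson Jr. PHI 100 200"
# Locate the start of the trailing "TEAM num1 num2" part
# (the pattern assumes a team code of three capital letters).
ndx <- str_locate(x, " +[A-Z]{3} +[0-9]+ +[0-9]+")[1, "start"]
# Everything before that position is the name, spacing and punctuation intact.
name <- substr(x, 1, ndx - 1)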
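The same pattern can be extended with capture groups to pull out every field in one step. The following is a minimal sketch using stringr's str_match(), not a definitive implementation: the sample string is invented for illustration, and it keeps the assumption from the pattern above that the team code is exactly three capital letters.

```r
library(stringr)

# Sample input assumed for illustration: hyphen, comma and extra spaces in the name.
x <- "Jerry Johnson-Smith, Jr.   PHI 100 200"

# Capture groups: 1 = name (lazy), 2 = team, 3 = first number, 4 = second number.
# Column 1 of the result is the full match, so the groups start at column 2.
m <- str_match(x, "^\\s*(.+?)\\s+([A-Z]{3})\\s+([0-9]+)\\s+([0-9]+)\\s*$")

name <- m[1, 2]
team <- m[1, 3]
num1 <- as.integer(m[1, 4])
num2 <- as.integer(m[1, 5])
```

If a line does not fit the pattern, str_match() returns NA in every column, so checking is.na(m[1, 1]) is a simple way to flag lines that need manual attention.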