Question

If I have a string with a given journal reference formatted as in

ref="Carlson, A., Bernier, U.R., Hogsette, J.A., and Sutton, B.D. 2001. Distinctive hydrocarbons of the black dump fly, Hydrotaea aenescens (Diptera: Muscidae). Arch. Insect Biochem. Physiol. 48:167-178."

then I would like to come up with good gsub expressions in R to extract the first author, journal and volume plus pages. For the year and author I already came up with

year=strsplit(sub('^\\D*', '',ref),". ",fixed=TRUE)[[1]][[1]]
year
"2001"
author=gsub("[^a-zA-Z0-9 ]","",strsplit(ref,"\\., ")[[1]][[1]])
author
"Carlson A"

but I am having trouble finding a good expression for the journal and for the volume and pages. Does anybody have any thoughts? (Ideally, the volume and pages would be detected as the trailing characters of the string that consist only of numbers, full stops or colons. The journal would be the part that lies between the year and the volume plus pages, after first removing the title, i.e. the part between the year (plus full stop) and the next full stop.)

cheers, Tom


Solution

There is no need to use gsub here; strsplit is enough. This should be a good start:

ll <- unlist(strsplit(ref,','))
ll[1]
[1] "Carlson"

strsplit(tail(ll,1),'[.]')
[[1]]
[1] " Hydrotaea aenescens (Diptera: Muscidae)" " Arch"                                   
[3] " Insect Biochem"                          " Physiol"                                
[5] " 48:167-178"   
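
To get from these pieces to the journal and the volume plus pages that the question asks for, one possible sketch follows the rule described in the question: take the trailing run of digits and punctuation as volume and pages, then strip the authors, year and title from the front. It assumes that the year is the first four-digit number in the string and that the title contains no full stop followed by a space; the variable names volpages, rest and journal are only illustrative. The hyphen is added to the character class so that the page range is matched.

```r
ref <- "Carlson, A., Bernier, U.R., Hogsette, J.A., and Sutton, B.D. 2001. Distinctive hydrocarbons of the black dump fly, Hydrotaea aenescens (Diptera: Muscidae). Arch. Insect Biochem. Physiol. 48:167-178."

# Volume and pages: the trailing run of digits, colons, hyphens and full stops,
# with the final full stop of the reference removed afterwards.
volpages <- regmatches(ref, regexpr("[0-9:.-]+$", ref))
volpages <- sub("\\.$", "", volpages)

# Drop the author list and the year: everything up to the first
# four-digit number followed by ". " (assumed to be the year).
rest <- sub("^.*?[0-9]{4}\\. ", "", ref, perl = TRUE)

# Drop the title: everything up to the next ". "
# (assumes the title itself contains no ". ").
rest <- sub("^.*?\\. ", "", rest, perl = TRUE)

# What remains is the journal followed by volume and pages;
# remove the trailing volume/pages run and surrounding whitespace.
journal <- trimws(sub("[0-9:.-]+$", "", rest))

volpages
# [1] "48:167-178"
journal
# [1] "Arch. Insect Biochem. Physiol."
```

References whose titles contain an abbreviation such as "sp. nov." would break the title step, so this is best treated as a starting point rather than a general parser.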
Licensed under: CC-BY-SA with attribution