If I have a string containing a journal reference formatted as in
ref="Carlson, A., Bernier, U.R., Hogsette, J.A., and Sutton, B.D. 2001. Distinctive hydrocarbons of the black dump fly, Hydrotaea aenescens (Diptera: Muscidae). Arch. Insect Biochem. Physiol. 48:167-178."
then I would like to come up with good gsub expressions in R to extract the first author, the journal, and the volume plus pages. For the year and first author I already came up with
year=strsplit(sub('^\\D*', '',ref),". ",fixed=TRUE)[[1]][[1]]  # fixed=TRUE so "." is a literal full stop, not "any character"
year
"2001"
author=gsub("[^a-zA-Z0-9 ]","",strsplit(ref,"\\., ")[[1]][[1]])
author
"Carlson A"
but I am having trouble finding a good expression for the journal and for the volume and pages. Does anybody have any thoughts? Ideally, the volume and pages would be detected as the trailing run of characters at the end of the string that consists only of digits, full stops, colons, or hyphens. The journal would then be the part that lies between the year and the volume plus pages, after first removing the title, i.e. the part between the year (plus its full stop) and the next full stop.
cheers, Tom
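
One possible sketch of the rule described above, using only base R. It is not a general citation parser. It assumes that no digits occur before the year, that the title contains no internal full stop, and that the page range uses a plain hyphen. The names volpages, rest and journal are made up for illustration.

```r
ref <- "Carlson, A., Bernier, U.R., Hogsette, J.A., and Sutton, B.D. 2001. Distinctive hydrocarbons of the black dump fly, Hydrotaea aenescens (Diptera: Muscidae). Arch. Insect Biochem. Physiol. 48:167-178."

# Volume and pages: the trailing run of digits, colons, full stops and hyphens
# (assumed to start with a digit), with the final full stop dropped afterwards.
volpages <- regmatches(ref, regexpr("[0-9][0-9:.-]*$", ref))
volpages <- sub("\\.$", "", volpages)

# Drop everything up to and including the four-digit year and its full stop
# (assumes the author list contains no digits).
rest <- sub("^[^0-9]*[0-9]{4}\\.[[:space:]]*", "", ref)

# Drop the title: everything up to and including the next full stop
# (assumes the title itself contains no full stop).
rest <- sub("^[^.]*\\.[[:space:]]*", "", rest)

# Drop the trailing volume and pages; what is left is the journal.
journal <- sub("[[:space:]]*[0-9][0-9:.-]*$", "", rest)

volpages
journal
```

For the reference above this should give "48:167-178" for volpages and "Arch. Insect Biochem. Physiol." for journal. A title with an abbreviation or any other internal full stop would break the title-removal step, so that assumption is worth checking against the real data.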