R - Combining multiple columns together within a data frame, while keeping connected data

https://stackoverflow.com/questions/19814246

04-07-2022

Question

So I've looked quite a lot for an answer to this question, but I can't find an answer that satisfies my needs or my understanding of R.

First, here's some code to give you an idea of what my data set looks like:

df <- data.frame("Year" = 1991:2000, "Subdiv" = 24:28, H1 = c(31.2,34,70.2,19.8,433.7,126.34,178.39,30.4,56.9,818.3),
             H2 = c(53.9,121.5,16.9,11.9,114.6,129.9,221.1,433.4,319.2,52.6))             
> df
   Year Subdiv     H1    H2
1  1991     24  31.20  53.9
2  1992     25  34.00 121.5
3  1993     26  70.20  16.9
4  1994     27  19.80  11.9
5  1995     28 433.70 114.6
6  1996     24 126.34 129.9
7  1997     25 178.39 221.1
8  1998     26  30.40 433.4
9  1999     27  56.90 319.2
10 2000     28 818.30  52.6

So what I've got here is a data set containing the abundance of herring of different ages in different areas ("Subdiv") over time. H1 stands for herring at age 1. My real data set contains more ages as well as more areas (and additional species of fish).

What I would like to do is combine the abundance of different ages into one column while keeping the connected data (Year, Subdiv) as well as creating a new column for Age. Like so:

       Year Subdiv   Abun   Age
    1  1991     24  31.20    1
    2  1992     25  34.00    1
    3  1993     26  70.20    1
    4  1994     27  19.80    1
    5  1995     28 433.70    1 
    6  1991     24   53.9    2
    7  1992     25  121.5    2
    8  1993     26   16.9    2
    9  1994     27   11.9    2
   10  1995     28  114.6    2

Note: Yes, I removed some rows, but only to avoid crowding the screen.

I hope this is enough information to make clear what I need and for someone to help.

Since I have more species of fish, if someone would like to include a description for adding a Species column as well, that would be helpful. Here's code for the same data, just duplicated for sprat (Sn):

df <- data.frame("Year" = 1991:2000, "Subdiv" = 24:28, H1 = c(31.2,34,70.2,19.8,433.7,126.34,178.39,30.4,56.9,818.3),
                 H2 = c(53.9,121.5,16.9,11.9,114.6,129.9,221.1,433.4,319.2,52.6),
                 S1 = c(31.2,34,70.2,19.8,433.7,126.34,178.39,30.4,56.9,818.3),
                 S2 = c(53.9,121.5,16.9,11.9,114.6,129.9,221.1,433.4,319.2,52.6))

Cheers!

I think the tags on this question are relevant, but if you don't find them fitting, go ahead and change them.

Solution

This is a typical reshape then supplement task so you can:

1) 'Melt' your data with reshape2

library("reshape2")
# Year and Subdiv stay as identifiers; every other column is stacked into variable/value
df.m <- melt(df, id.vars = c("Year", "Subdiv"))

2) Then add additional columns based on the variable column that holds your previous df's column names

library("stringr")
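# variable holds the old column names (H1, H2, S1, S2)
df.m$Fish <- str_extract(df.m$variable, "[A-Z]")
# "[0-9]+" matches one or more digits, so ages of 10 and above are kept whole
df.m$Age <- as.integer(str_extract(df.m$variable, "[0-9]+"))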

I recommend you look up the reshape functions, as these are very commonly required and learning them will save you lots of time in future: http://www.statmethods.net/management/reshape.html

Other tips
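The two steps above can be sketched end to end on the four-column sample data from the question. This is only an illustration: the output column name Abun is taken from the question's desired layout, and dropping the helper column variable at the end is my own assumption about what is wanted.

```r
library(reshape2)
library(stringr)

# Sample data from the question: herring (H) and sprat (S) at ages 1 and 2
df <- data.frame(Year = 1991:2000, Subdiv = 24:28,
                 H1 = c(31.2, 34, 70.2, 19.8, 433.7, 126.34, 178.39, 30.4, 56.9, 818.3),
                 H2 = c(53.9, 121.5, 16.9, 11.9, 114.6, 129.9, 221.1, 433.4, 319.2, 52.6),
                 S1 = c(31.2, 34, 70.2, 19.8, 433.7, 126.34, 178.39, 30.4, 56.9, 818.3),
                 S2 = c(53.9, 121.5, 16.9, 11.9, 114.6, 129.9, 221.1, 433.4, 319.2, 52.6))

# Step 1: wide to long; value.name puts the measurements in a column called Abun
df.m <- melt(df, id.vars = c("Year", "Subdiv"), value.name = "Abun")

# Step 2: split the old column names into a species letter and an age
df.m$Fish <- str_extract(as.character(df.m$variable), "[A-Z]")
df.m$Age  <- as.integer(str_extract(as.character(df.m$variable), "[0-9]+"))

# Assumption: the original column name is no longer needed once it has been split
df.m$variable <- NULL

head(df.m)
```

The result has one row per Year, Subdiv, species and age combination (40 rows for this sample), which is the layout asked for in the question plus a Fish column.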

I think the basic data.frame function will do exactly what you want. Try something like:

# Name the columns explicitly; otherwise they come out as df.Year and df.Subdiv
data.frame(Year=df$Year,Subdiv=df$Subdiv,Abun=c(df$H1,df$H2),
  Age=rep(c(1,2),each=nrow(df)))

So I'm concatenating the values you want in the abundance column, and creating a new column that is just the ages replicated for each row. You can create a similar species column easily.
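As a rough sketch of that species column, assuming the four-column data frame from the question (H1, H2, S1, S2): the name long and the labels "Herring" and "Sprat" are placeholders of mine, not something from the question.

```r
# Sample data from the question: herring (H) and sprat (S) at ages 1 and 2
df <- data.frame(Year = 1991:2000, Subdiv = 24:28,
                 H1 = c(31.2, 34, 70.2, 19.8, 433.7, 126.34, 178.39, 30.4, 56.9, 818.3),
                 H2 = c(53.9, 121.5, 16.9, 11.9, 114.6, 129.9, 221.1, 433.4, 319.2, 52.6),
                 S1 = c(31.2, 34, 70.2, 19.8, 433.7, 126.34, 178.39, 30.4, 56.9, 818.3),
                 S2 = c(53.9, 121.5, 16.9, 11.9, 114.6, 129.9, 221.1, 433.4, 319.2, 52.6))

n <- nrow(df)

long <- data.frame(
  Year    = rep(df$Year, times = 4),               # one copy per stacked column
  Subdiv  = rep(df$Subdiv, times = 4),
  Abun    = c(df$H1, df$H2, df$S1, df$S2),         # stacking order fixes the labels below
  Age     = rep(c(1, 2, 1, 2), each = n),          # age of each stacked column
  Species = rep(c("Herring", "Sprat"), each = 2 * n)
)

head(long)
```

The order of the columns inside c() has to match the order of the Age and Species labels, so this gets tedious with many ages or species; at that point the melt approach scales better.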

Hope that helps!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow