Modelli per rimodellare in R

https://stackoverflow.com/questions/1876085

r
reshape

18-09-2019
|

Domanda

Ho un dataframe che voglio rimodellare; il mio codice Reshape:

matchedlong <- reshape(matched, direction = 'long',
                       varying = c(29:33, 36:3943),
                       v.names = c("Math34", "TFCIn"),
                       times = 2006:2009, idvar = "schoolnum")

in colonne matched 36 a 39 sono logici (TRUE FALSE), ma in matchedlong si sono trasformati in numeri in qualche modo .... No modello chiaro ai numeri.

che cosa sta causando questo?

Dati di esempio:

example.data <- structure(list(Grade_Range_2008 = structure(c(14L, 14L, 40L,
40L, 36L, 13L), .Label = c("3-5, UE", "4-5, UE", "4-8, UE, US",
"5-10, UE, US", "5-8, 10, UE, US", "5-8, UE, US", "5-9, UE, US",
"6-11, US", "6-12, UE, US", "6-7, UE, US", "6-8, 10, UE, US",
"6-8, UE", "6-8, UE, US", "6-9, UE, US", "6, UE", "7-10, US",
"7-8, US", "8-Jun", "8-May", "K-3", "K-3, UE", "K-4, UE", "K-5",
"K-5, UE", "K-6, UE", "K-8", "K-8, UE", "K-8, UE, US", "K, 2-5, UE",
"N/A", "PK-3, UE", "PK-4, UE", "PK-5, 10, UE", "PK-5, 7-9, UE, US",
"PK-5, 8, UE", "PK-5, UE", "PK-6, 10, UE", "PK-6, UE", "PK-8, UE",
"PK-8, UE, US"), class = "factor"), X__of_Yrs_in_school = c(0L,
0L, 0L, 0L, 0L, 0L), Total_Enrollment_2008 = c(348L, 444L, 636L,
495L, 319L, 410L), Free_Lunch_pct_2008 = c(75L, 89L, 94L, 89L,
89L, 91L), Reduced_Lunch_pct_2008 = c(6L, 6L, 3L, 4L, 5L, 4L),
    Stability_pct_2008 = c(89L, 93L, 100L, 98L, 92L, 81L),
Limited_Eng__Prof__pct_2008 = c(8L,
    20L, 8L, 10L, 19L, 19L), Am__Ind_pct_2008 = c(1L, 2L, 0L,
    2L, 0L, 2L), Black_pct_2008 = c(41L, 39L, 28L, 33L, 32L,
    38L), Hispanic_pct_2008 = c(55L, 59L, 70L, 61L, 65L, 57L),
    Asian_pct_2008 = c(2L, 1L, 0L, 2L, 1L, 1L), White_pct_2008 = c(2L,
    0L, 1L, 2L, 1L, 2L), Multi_pct_2008 = c(0L, 0L, 0L, 0L, 0L,
    0L), w_o_Valid_Cert__N_2008 = c(4L, 0L, 1L, 0L, 1L, 1L),
    w_o_Valid_Cert__pct_2008 = c(11L, 0L, 2L, 0L, 3L, 5L),
Teaching_Out_of_Certification_N_ = c(7L,
    7L, 2L, 13L, 3L, 4L), Teaching_Out_of_Certification_pc = c(20L,
    15L, 4L, 25L, 9L, 18L), X_3_yrs__Exp_N_2008 = c(12L, 13L,
    5L, 12L, 5L, 5L), X_3_yrs__Exp_pct_2008 = c(34L, 28L, 11L,
    24L, 15L, 23L), Masters_Plus_N_2008 = c(6L, 11L, 15L, 10L,
    16L, 8L), Masters_Plus___2008 = c(17L, 23L, 32L, 20L, 47L,
    36L), Core_Classes_N_2008 = c(78L, 142L, 49L, 91L, 22L, 49L
    ), Core_Not_Taught_by_HQ_Teachers_p = c(23L, 6L, 2L, 24L,
    9L, 20L), Number_of_Classes_N_2008 = c(93L, 193L, 56L, 119L,
    33L, 68L), Clases_Not_taught_by_App__Cert__ = c(18L, 18L,
    2L, 37L, 3L, 13L), Clases_Not_taught_by_App__Cert_0 = c(19L,
    9L, 4L, 31L, 9L, 19L), Turnover_Rate_of_Teachers_with__ = c(31L,
    56L, 20L, 32L, 0L, 50L), Turnover_Rate_all_Teachers_pct_2 = c(42L,
    29L, 17L, 30L, 14L, 49L), Math_Level_3_4_pct_2006 = c(5.1,
    16.4, 58.2, 34.4, 48.9, 12.4), Math_Level_3_4_pct_2007 = c(15.2,
    22.1, 65.7, 29.9, 70.5, 22.6), Math_Level_3_4_pct_2008 = c(29.9,
    43.2, 69.8, 41.2, 78.9, 38.5), Math_Level_3_4_pct_2009 = c(50.7,
    49.7, 80.7, 47.1, 83.9, 51.6), Att__pct_2005 = c(0.83, 0.86,
    0.89, 0.9, 0.89, 0.87), Susp__pct_2005 = c(6L, 15L, 1L, 4L,
    0L, 3L), schoolnum = c(4013, 4045, 4096, 4101, 4102, 4117
    ), In_2006 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
    In_2007 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), In_2008 = c(FALSE,
    FALSE, FALSE, FALSE, FALSE, FALSE), In_2009 = c(FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE), weights = c(1, 1, 1, 1, 1, 1
    )), .Names = c("Grade_Range_2008", "X__of_Yrs_in_school",
"Total_Enrollment_2008", "Free_Lunch_pct_2008", "Reduced_Lunch_pct_2008",
"Stability_pct_2008", "Limited_Eng__Prof__pct_2008", "Am__Ind_pct_2008",
"Black_pct_2008", "Hispanic_pct_2008", "Asian_pct_2008", "White_pct_2008",
"Multi_pct_2008", "w_o_Valid_Cert__N_2008", "w_o_Valid_Cert__pct_2008",
"Teaching_Out_of_Certification_N_", "Teaching_Out_of_Certification_pc",
"X_3_yrs__Exp_N_2008", "X_3_yrs__Exp_pct_2008", "Masters_Plus_N_2008",
"Masters_Plus___2008", "Core_Classes_N_2008",
"Core_Not_Taught_by_HQ_Teachers_p",
"Number_of_Classes_N_2008", "Clases_Not_taught_by_App__Cert__",
"Clases_Not_taught_by_App__Cert_0", "Turnover_Rate_of_Teachers_with__",
"Turnover_Rate_all_Teachers_pct_2", "Math_Level_3_4_pct_2006",
"Math_Level_3_4_pct_2007", "Math_Level_3_4_pct_2008",
"Math_Level_3_4_pct_2009",
"Att__pct_2005", "Susp__pct_2005", "schoolnum", "In_2006", "In_2007",
"In_2008", "In_2009", "weights"), row.names = c(1L, 4L, 7L, 8L,
11L, 12L), class = "data.frame")

Soluzione

Una colonna deve essere tutti dello stesso tipo di dati ; Non si possono mescolare logico e numerico.

Non sei sicuro di come si dovrebbe anche fare analisi "lungo" su più tipi di dati diversi, perché di solito quelle sono le stesse variabili con diversi raggruppamenti. Se è necessario, provare a convertire i vostri valori logici per primo numerica (con as.numeric).

Mentre non si sta usando il pacchetto reshape, Hadley ha fatto questo punto nella sua discussione della funzione melt(), che sta eseguendo lo stesso compito (vedi questo documento, per esempio ):

Nell'implementazione corrente [di fusione], c'è solo un presupposto che fondono fa: tutti i valori misurati devono essere dello stesso tipo, ad esempio numerico, fattore, data. Dobbiamo questa ipotesi poiché i dati fuso viene memorizzato in una trama di dati R, e colonna valore può essere solo tipo . Il più delle volte questo non è un problema perché ci sono pochi casi in cui ha senso di combinare diversi tipi di variabili nell'output cast.

Modifica:

Credo che si può tentare di fare due cose contemporaneamente. E 'questo quello che vuoi?

a <- reshape(example.data[,-c(36:39)], direction = 'long', varying = c(29:32), v.names = c("Math34"), times = 2006:2009, idvar = "schoolnum")
b <- reshape(example.data[,-c(29:32)], direction = 'long', varying = c(36:39)-4, v.names = c("TFCIn"), times = 2006:2009, idvar = "schoolnum")
c <- merge(a,b)

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow