Logistic Regression : not actual out put with predict function

https://stackoverflow.com/questions/12184371

29-06-2021
|

Question

I am new to R, when I am going to estimate a logistic model using glm() it's not predicting the response, but gives a not actual output on calling predict function like 1 for every input at my predict function.

Code:

    ex2data1R <- read.csv("/media/ex2data1R.txt")
    x <-ex2data1R$x
    y <-ex2data1R$y
    z <-ex2data1R$z

    logisticmodel <- glm(z~x+y,family=binomial(link = "logit"),data=ex2data1R)
    newdata = data.frame(x=c(10),y=(10))
    predict(logisticmodel, newdata, type="response") 

Output:
> predict(logisticmodel, newdata, type="response") 
           1 
1.181875e-11 

Data(ex2data1R.txt) :
"x","y","z"
34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
45.08327747668339,56.3163717815305,0
61.10666453684766,96.51142588489624,1
75.02474556738889,46.55401354116538,1
76.09878670226257,87.42056971926803,1
84.43281996120035,43.53339331072109,1
95.86155507093572,38.22527805795094,0
75.01365838958247,30.60326323428011,0
82.30705337399482,76.48196330235604,1
69.36458875970939,97.71869196188608,1
39.53833914367223,76.03681085115882,0
53.9710521485623,89.20735013750205,1
69.07014406283025,52.74046973016765,1
67.94685547711617,46.67857410673128,0
70.66150955499435,92.92713789364831,1
76.97878372747498,47.57596364975532,1
67.37202754570876,42.83843832029179,0
89.67677575072079,65.79936592745237,1
50.534788289883,48.85581152764205,0
34.21206097786789,44.20952859866288,0
77.9240914545704,68.9723599933059,1
62.27101367004632,69.95445795447587,1
80.1901807509566,44.82162893218353,1
93.114388797442,38.80067033713209,0
61.83020602312595,50.25610789244621,0
38.78580379679423,64.99568095539578,0
61.379289447425,72.80788731317097,1
85.40451939411645,57.05198397627122,1
52.10797973193984,63.12762376881715,0
52.04540476831827,69.43286012045222,1
40.23689373545111,71.16774802184875,0
54.63510555424817,52.21388588061123,0
33.91550010906887,98.86943574220611,0
64.17698887494485,80.90806058670817,1
74.78925295941542,41.57341522824434,0
34.1836400264419,75.2377203360134,0
83.90239366249155,56.30804621605327,1
51.54772026906181,46.85629026349976,0
94.44336776917852,65.56892160559052,1
82.36875375713919,40.61825515970618,0
51.04775177128865,45.82270145776001,0
62.22267576120188,52.06099194836679,0
77.19303492601364,70.45820000180959,1
97.77159928000232,86.7278223300282,1
62.07306379667647,96.76882412413983,1
91.56497449807442,88.69629254546599,1
79.94481794066932,74.16311935043758,1
99.2725269292572,60.99903099844988,1
90.54671411399852,43.39060180650027,1
34.52451385320009,60.39634245837173,0
50.2864961189907,49.80453881323059,0
49.58667721632031,59.80895099453265,0
97.64563396007767,68.86157272420604,1
32.57720016809309,95.59854761387875,0
74.24869136721598,69.82457122657193,1
71.79646205863379,78.45356224515052,1
75.3956114656803,85.75993667331619,1
35.28611281526193,47.02051394723416,0
56.25381749711624,39.26147251058019,0
30.05882244669796,49.59297386723685,0
44.66826172480893,66.45008614558913,0
66.56089447242954,41.09209807936973,0
40.45755098375164,97.53518548909936,1
49.07256321908844,51.88321182073966,0
80.27957401466998,92.11606081344084,1
66.74671856944039,60.99139402740988,1
32.72283304060323,43.30717306430063,0
64.0393204150601,78.03168802018232,1
72.34649422579923,96.22759296761404,1
60.45788573918959,73.09499809758037,1
58.84095621726802,75.85844831279042,1
99.82785779692128,72.36925193383885,1
47.26426910848174,88.47586499559782,1
50.45815980285988,75.80985952982456,1
60.45555629271532,42.50840943572217,0
82.22666157785568,42.71987853716458,0
88.9138964166533,69.80378889835472,1
94.83450672430196,45.69430680250754,1
67.31925746917527,66.58935317747915,1
57.23870631569862,59.51428198012956,1
80.36675600171273,90.96014789746954,1
68.46852178591112,85.59430710452014,1
42.0754545384731,78.84478600148043,0
75.47770200533905,90.42453899753964,1
78.63542434898018,96.64742716885644,1
52.34800398794107,60.76950525602592,0
94.09433112516793,77.15910509073893,1
90.44855097096364,87.50879176484702,1
55.48216114069585,35.57070347228866,0
74.49269241843041,84.84513684930135,1
89.84580670720979,45.35828361091658,1
83.48916274498238,48.38028579728175,1
42.2617008099817,87.10385094025457,1
99.31500880510394,68.77540947206617,1
55.34001756003703,64.9319380069486,1
74.77589300092767,89.52981289513276,1

Let me know am I doing something wrong?

Solution

I'm not seeing any problem. Here are predictions for x,y = 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100:

newdata = data.frame(x=seq(30, 100, 5) ,y=seq(30, 100, 5))
predict(logisticmodel, newdata, type="response")
1            2            3            4            5            6 
2.423648e-06 1.861140e-05 1.429031e-04 1.096336e-03 8.357794e-03 6.078786e-02 
           7            8            9           10           11           12 
3.320041e-01 7.923883e-01 9.670066e-01 9.955766e-01 9.994218e-01 9.999247e-01 
          13           14           15 
9.999902e-01 9.999987e-01 9.999998e-01

You were predicting x=10, y=10 which is way outside the range of your x, y values (30 - 100), but the prediction was zero which fits these results. When x and y are low (30 - 55), the prediction for z is zero. when x and y are high (75 - 100), the prediction is one (or nearly one). It may be easier to interpret the results if you round them to a few decimals:

round(predict(logisticmodel, newdata, type="response") , 5)
      1       2       3       4       5       6       7       8       9      10 
0.00000 0.00002 0.00014 0.00110 0.00836 0.06079 0.33200 0.79239 0.96701 0.99558 
     11      12      13      14      15 
0.99942 0.99992 0.99999 1.00000 1.00000

Here is a simple way to predict a category and compare the results with your data:

predict <- ifelse(predict(logisticmodel, type="response")>.5, 1, 0)
xtabs(~predict+ex2data1R$z)

ex2data1R$z
predict  0  1
      0 34  5
      1  6 55

We used predict() on your original data and then created a rule that picks 1 if the probability is greater than .5 and 0 if it is not. Then we use xtabs() to compare the predictions to the data. When z is 0, we correctly predict zero 34 times and incorrectly predict one 6 times. When z is 1 we correctly predict one 55 times and incorrectly predict zero 5 times. We are correct 89% of the time (34+55)/100*100. You could explore the accuracy of prediction if you use .45 or .55 as the cutoff instead of .5.

OTHER TIPS

In my opinion all is correct, as you can read from R manual:

newdata - optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used.

If you have data frame with 1 record it will produce prediction only for that one.

For more details see R manual/glm/predict

or just in R console, after loading library glm put:

?glm

You can also use the following command to make the confusion matrix:

predict <- ifelse(predict(logisticmodel, type="response")>.5, 1, 0)
table(predict,ex2data1R$z)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow