Multiple Linear Regression with Dichotomous Predictor Variables in R: to dummy-code or let R handle it?

StackOverflow https://stackoverflow.com/questions/22567410

I am running a multiple linear regression for a course using R. One of the predictors I want to include is the sex of the individual, coded "m" and "f". I ran the model in R two different ways:

Model 1: "Sex" as the original categorical variable R

lm(formula = P_iP_Choice ~ Sex + Carapace + Competitor_Presence_BI + 
    PSI_Day1_Choice + AGG_AVERAGE, data = pano2014)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.55241 -0.12879 -0.04414  0.13769  0.67394 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -0.43031    0.23872  -1.803 0.074353 .  
Sexm                   -0.28566    0.04685  -6.098 1.86e-08 ***
Carapace                0.15558    0.04534   3.431 0.000863 ***
Competitor_Presence_BI -0.03339    0.04532  -0.737 0.462870    
PSI_Day1_Choice         0.15825    0.13029   1.215 0.227273    
AGG_AVERAGE             0.15406    0.07790   1.978 0.050604 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.2375 on 104 degrees of freedom
Multiple R-squared: 0.3146, Adjusted R-squared: 0.2817 
F-statistic: 9.549 on 5 and 104 DF,  p-value: 1.611e-07 

Model 2: Sex recoded by hand as a numeric variable "Female", coded 0 = male, 1 = female.

lm(formula = P_iP_Choice ~ Female + Carapace + Competitor_Presence_BI + 
    PSI_Day1_Choice + AGG_AVERAGE, data = pano2014)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.55241 -0.12879 -0.04414  0.13769  0.67394 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -0.71597    0.24498  -2.923 0.004260 ** 
Female                  0.28566    0.04685   6.098 1.86e-08 ***
Carapace                0.15558    0.04534   3.431 0.000863 ***
Competitor_Presence_BI -0.03339    0.04532  -0.737 0.462870    
PSI_Day1_Choice         0.15825    0.13029   1.215 0.227273    
AGG_AVERAGE             0.15406    0.07790   1.978 0.050604 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.2375 on 104 degrees of freedom
Multiple R-squared: 0.3146, Adjusted R-squared: 0.2817 
F-statistic: 9.549 on 5 and 104 DF,  p-value: 1.611e-07

My understanding is that the difference in the intercept arises because in Model 1 R dummy-coded my categorical variable into a dichotomous one, so the variation in my response variable associated with one sex gets absorbed into the intercept, depending on which level R picked as the baseline for "Sex". In Model 2, however, the change does not affect the coefficient estimates for the other terms in my model.

What I would like to know is: what is the "correct", or widely accepted, way to use dichotomous categorical variables in linear models? Dummy-coding them yourself, or letting R dummy-code them?


Solution

Either way is correct (assuming you do the manual coding properly), but there is a caveat. R supports several coding schemes ("contrasts") for categorical variables: dummy coding, deviation coding, Helmert coding, and so on. What changes between these schemes is the meaning of the intercept and the interpretation of the parameters. With dummy coding, for instance, you compare every category against a single base category, and the intercept is the mean of that base category (all other predictors being zero). With deviation coding, your intercept is the grand (!) mean, and your parameters are deviations from that grand mean. For example, if you are conducting a country-level analysis, it is not always useful to compare every country against, say, France. Instead, you might want to compare each country to some overall mean, say, for the European Union.

This also goes for dichotomous variables. Do you want to compare men to women, or would you rather compare men and women each to the grand mean? Both are feasible, depending on your research context.
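A minimal sketch with made-up data (the variable names here are invented for illustration) showing both interpretations on a dichotomous factor:

## Made-up data: one dichotomous factor under two contrast schemes.
set.seed(1)
d <- data.frame(y   = rnorm(100),
                sex = factor(sample(c("f", "m"), 100, replace = TRUE)))

## Default treatment ("dummy") coding: the intercept is the mean of the
## base level ("f"); "sexm" is the m-minus-f difference.
coef(lm(y ~ sex, data = d))

## Deviation ("sum") coding: the intercept is the grand mean of the group
## means; the single coefficient is each group's deviation from it.
contrasts(d$sex) <- contr.sum(2)
coef(lm(y ~ sex, data = d))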

Now, manual coding done properly introduces no error. But you cannot quickly switch from one coding system to another; you would have to recode everything by hand. With more complex coding systems, doing it manually also leaves more room for mistakes. This may not matter much for dichotomous variables, but if you have more categories, creating dummies manually will clutter up your dataset and may cause confusion when you return to your analysis a few months later. Those are a few arguments for letting R do the coding.
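If you want to see what R builds behind the scenes, inspect the design matrix; a small sketch with an invented four-level factor:

## Invented example: model.matrix() shows the dummy columns R creates on
## the fly, without adding extra columns to your data frame.
country <- factor(c("DE", "FR", "FR", "IT", "NL", "DE"))
model.matrix(~ country)

## Switching the base category is one line, not a recoding job:
country <- relevel(country, ref = "FR")
model.matrix(~ country)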

You can find additional information on coding systems in R here. It is a useful read and gives you more flexibility within the context of regression. Good luck!

Other tips

Just to expand a bit on @BenBolker's comment.

In your first model, R takes Sex=f as the baseline and reports an intercept of -0.43031. If Sex=m, the whole model is shifted by -0.28566 (the coefficient of Sexm). So Sexm is not the "impact of males"; it is the difference between the fits for Sex=f and Sex=m. None of the other parameters are affected because you have a linear model with no interactions. So when Sex=m you get an identical model, but with intercept -0.43031 + (-0.28566) = -0.71597.

In your second model, Female is a numeric predictor. The intercept applies when Female=0 (i.e., Sex=m) and, at -0.71597, is equivalent to the first model. Again, none of the other parameters differ because this is a linear model with no interactions.
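A quick way to convince yourself of the equivalence, assuming the pano2014 data frame and column names from the question (the manual Female column below is my guess at how it was built):

## Sketch: build the 0/1 dummy by hand and confirm both models agree.
pano2014$Female <- as.numeric(pano2014$Sex == "f")

m1 <- lm(P_iP_Choice ~ Sex + Carapace + Competitor_Presence_BI +
           PSI_Day1_Choice + AGG_AVERAGE, data = pano2014)
m2 <- lm(P_iP_Choice ~ Female + Carapace + Competitor_Presence_BI +
           PSI_Day1_Choice + AGG_AVERAGE, data = pano2014)

all.equal(fitted(m1), fitted(m2))  # TRUE: identical fits and residuals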

IMO the "correct" way depends on your audience. The idiomatic way to deal with categorical variables is the first - make it a factor. However I have found that with non-technical, or "less-technical" audiences the second way is much easier to explain and understand. Note of course that this applies to dichotomous variables only - if your categorical variable can take on more than two values you must use factors.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow