The entropy/information gain depends less on the distribution of the classes than on the information contained in the features used to characterise the instances in your data set. If, for example, you had a feature that was always 1 for the TRUE class and always 2 for the FALSE class, it would have the highest possible information gain, because it allows you to separate the two classes perfectly.
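To make this concrete, here is a minimal sketch that computes information gain as H(Y) - H(Y | X) for a discrete feature. The function names and the toy data are illustrative only and are not from any particular library. It compares a feature that perfectly tracks the class with one that is independent of it.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Weighted entropy of the class within each feature value.
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data with two balanced classes.
labels = ["TRUE", "TRUE", "FALSE", "FALSE"]
perfect = [1, 1, 2, 2]  # always 1 for TRUE, always 2 for FALSE
useless = [1, 2, 1, 2]  # same class mix for every feature value

print(information_gain(perfect, labels))  # 1.0
print(information_gain(useless, labels))  # 0.0
```

The perfect feature removes all uncertainty about the class (1 bit here), while the uninformative feature leaves it unchanged, so its gain is zero.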
If the information gain you're getting is very small relative to the entropy of your class labels (which is the maximum it can reach), the features don't carry much information that is useful for separating your classes. In this case, you need to find more informative features.
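One way to judge whether a gain is "small" is to divide it by H(Y), the entropy of the class labels, which is the largest value the gain can take. The sketch below does that; the helper name `gain_and_ratio` and the imbalanced toy data (1 TRUE, 9 FALSE) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_and_ratio(feature, labels):
    """Return (information gain, gain / H(Y)); the ratio lies in [0, 1]."""
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    h_y = entropy(labels)
    h_y_given_x = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    gain = h_y - h_y_given_x
    # If every label is the same class, there is nothing left to explain.
    return gain, (gain / h_y if h_y > 0 else 0.0)


# Hypothetical, heavily imbalanced data: 1 TRUE and 9 FALSE instances.
labels = ["TRUE"] + ["FALSE"] * 9
features = {
    "perfect": [1] + [2] * 9,    # separates the classes exactly
    "weak": [1, 1] + [2] * 8,    # value 1 mixes the TRUE row with one FALSE row
}

for name, values in features.items():
    gain, ratio = gain_and_ratio(values, labels)
    print(f"{name}: IG = {gain:.3f} bits, IG/H(Y) = {ratio:.2f}")
```

With imbalanced classes even a perfectly separating feature has a raw gain well below 1 bit, so comparing against H(Y) helps distinguish "the features are uninformative" from "there was little class uncertainty to begin with".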