There was a dataset on Kaggle with questions from StackOverflow where the objective was to predict the tags for each question. That could require guessing the language of code samples (or just looking for keywords): https://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction
Another possibility is searching through GitHub, since the code in public repositories is openly accessible.
StackOverflow itself also shares its own data: all user-contributed posts (anonymized).
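For the "just looking for keywords" idea, here is a minimal sketch of what that might look like on code samples pulled from any of these sources. The keyword lists and the `guess_language` function are my own illustrative assumptions, not something taken from the Kaggle competition or any library, and a real classifier would need far more than this.

```python
import re
from collections import Counter

# Hypothetical keyword lists, picked by hand for illustration only.
KEYWORDS = {
    "python": {"def", "import", "elif", "self", "None", "lambda"},
    "java": {"public", "static", "void", "class", "extends", "implements"},
    "javascript": {"function", "var", "let", "const", "typeof", "undefined"},
    "c": {"include", "printf", "sizeof", "struct", "typedef", "malloc"},
}

# Matches identifier-like tokens (letters, digits, underscores).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*")


def guess_language(snippet):
    """Return the language whose keywords occur most often, or None if none match."""
    tokens = Counter(TOKEN_RE.findall(snippet))
    scores = {
        lang: sum(tokens[kw] for kw in kws)
        for lang, kws in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("public static void main(String[] args) {}"))
```

Plain keyword counting breaks down quickly on short or mixed snippets, so treat it as a baseline to compare against a model trained on labelled data.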