Where can I find a large sample of code in different programming languages for Naive Bayesian analysis?

https://stackoverflow.com/questions/16872103

naivebayes
code-analysis

30-05-2022

Problem

I am trying to analyse code found online and want to classify it with a Bayesian classifier. However, I need a fair amount of pre-classified code as sample data.

Ideally it would cover the top twenty or so languages.

Does anyone know of such a corpus?
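For context, here is a minimal sketch of the kind of classifier the question has in mind: a multinomial naive Bayes model with add-one (Laplace) smoothing over a crude code tokenizer. The tokenizer regex, the class name `NaiveBayesLanguageClassifier` and the toy training snippets are illustrative assumptions, not a recommended design; a real corpus would need far more samples per language.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer: identifiers/keywords, integers, and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(source):
    return TOKEN_RE.findall(source)


class NaiveBayesLanguageClassifier:
    """Multinomial naive Bayes over code tokens, with Laplace (add-one) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # number of training files per language
        self.token_counts = defaultdict(Counter)  # token frequencies per language
        self.total_tokens = Counter()             # total token count per language
        self.vocab = set()                        # every token seen during training

    def train(self, samples):
        """samples: iterable of (source_code, language_label) pairs."""
        for source, language in samples:
            tokens = tokenize(source)
            self.doc_counts[language] += 1
            self.token_counts[language].update(tokens)
            self.total_tokens[language] += len(tokens)
            self.vocab.update(tokens)

    def predict(self, source):
        """Return the language with the highest log-posterior for `source`."""
        tokens = tokenize(source)
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_language, best_score = None, -math.inf
        for language, count in self.doc_counts.items():
            # log P(language) + sum over tokens of log P(token | language)
            score = math.log(count / n_docs)
            denominator = self.total_tokens[language] + vocab_size
            for token in tokens:
                # add-one smoothing keeps unseen tokens from zeroing the probability
                score += math.log((self.token_counts[language][token] + 1) / denominator)
            if score > best_score:
                best_language, best_score = language, score
        return best_language


# Toy, hand-written training data (illustrative only; a real corpus needs far more).
TOY_SAMPLES = [
    ("def add(a, b):\n    return a + b", "Python"),
    ("import os\nfor x in range(3):\n    print(x)", "Python"),
    ('#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }', "C"),
    ("int add(int a, int b) { return a + b; }", "C"),
]

if __name__ == "__main__":
    clf = NaiveBayesLanguageClassifier()
    clf.train(TOY_SAMPLES)
    print(clf.predict("def square(x):\n    return x * x"))  # Python
```

Keeping punctuation as tokens is deliberate: characters such as `;`, `{`, `:` and `$` are strong language signals in source code, whereas a tokenizer built for prose would usually throw them away.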

No accepted solution

Other tips

There was a dataset on Kaggle of Stack Overflow questions where the objective was to predict the tags of each question. Doing that could require guessing the language of code samples (or just looking for keywords): https://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction
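If you go the Kaggle route, the question bodies are HTML, so code samples have to be pulled out of `<pre>` blocks and paired with a language tag. Below is a minimal standard-library sketch. It assumes the competition's training CSV has `Body` and `Tags` columns with space-separated tags (check the actual file before relying on that); the `LANGUAGE_TAGS` whitelist, `PreBlockExtractor` and `labelled_code_samples` are hypothetical names chosen for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical choice of tags to treat as language labels; extend to the ~20 you need.
LANGUAGE_TAGS = {"python", "java", "c", "c++", "c#", "javascript", "php", "ruby", "go"}


class PreBlockExtractor(HTMLParser):
    """Collects the text inside each <pre> element of a post body."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&lt;" reaches handle_data as "<"
        self.blocks = []
        self._depth = 0     # > 0 while inside a <pre>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def labelled_code_samples(csv_file, body_col="Body", tags_col="Tags"):
    """Yield (code, language) pairs from rows carrying exactly one language tag."""
    for row in csv.DictReader(csv_file):
        languages = LANGUAGE_TAGS.intersection(row[tags_col].split())
        if len(languages) != 1:
            continue  # no language tag, or ambiguous (e.g. tagged both c and c++)
        (language,) = languages
        extractor = PreBlockExtractor()
        extractor.feed(row[body_col])
        extractor.close()
        for block in extractor.blocks:
            yield block, language


if __name__ == "__main__":
    demo = io.StringIO(
        "Id,Title,Body,Tags\n"
        '1,Q,"<p>Hi</p><pre><code>print(&quot;x&quot;)\n</code></pre>",python list\n'
    )
    print(list(labelled_code_samples(demo)))
```

Dropping posts with several language tags trades volume for label quality: a question tagged both `c` and `c++` could contain either.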

Another possibility is searching through GitHub, since the code in public repositories is freely accessible.
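For GitHub, one cheap way to get "pre-classified" code is to clone repositories and let the file extension act as the label. A minimal sketch, assuming the repositories are already cloned under a local directory; the extension map, the size limit and the `collect_samples` name are illustrative choices. Extension labels are noisy (for example, `.h` could be C or C++, so it is left out here).

```python
from pathlib import Path

# Hypothetical extension -> label map; ".h" is omitted because C and C++ share it.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python", ".java": "Java", ".c": "C", ".cpp": "C++", ".cc": "C++",
    ".cs": "C#", ".js": "JavaScript", ".ts": "TypeScript", ".go": "Go",
    ".rb": "Ruby", ".php": "PHP", ".rs": "Rust",
}


def collect_samples(root, max_bytes=100_000):
    """Yield (source, language) for each file under `root` with a known extension."""
    root = Path(root)
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue  # skip git internals and directories
        language = EXTENSION_TO_LANGUAGE.get(path.suffix.lower())
        if language is None or path.stat().st_size > max_bytes:
            continue  # unknown extension, or a large (often generated/minified) file
        yield path.read_text(encoding="utf-8", errors="replace"), language
```

It is worth deduplicating the results before training, because forks and vendored dependencies can put many identical copies of the same file into the corpus and skew the per-language counts.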

Stack Overflow itself also publishes its data: an anonymized dump of all user-contributed posts.
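The Stack Overflow dump ships as XML; questions live in `Posts.xml` as `<row>` elements whose attributes include `PostTypeId` (`1` for questions), `Body` (HTML) and `Tags`. Below is a minimal streaming sketch with the standard library, assuming that layout. The tag encoding has varied between dumps (older ones use `<python><list>`, newer ones are reported to use `|python|list|`), so the parser accepts both; `iter_tagged_questions` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

TAG_RE = re.compile(r"<([^<>]+)>")


def iter_tagged_questions(posts_xml):
    """Stream (body_html, tags) for each question row in a Posts.xml path or file object."""
    for _event, elem in ET.iterparse(posts_xml, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            raw_tags = elem.get("Tags", "")
            # Accept both "<python><list>" and "|python|list|" encodings.
            tags = TAG_RE.findall(raw_tags) or [t for t in raw_tags.split("|") if t]
            yield elem.get("Body", ""), tags
        elem.clear()  # drop each row's attributes once processed to keep memory use low
```

Since `Body` is HTML here as well, code samples still need to be extracted from the `<pre>` blocks before training.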

License: CC-BY-SA with attribution

Not affiliated with Stack Overflow