Question

I have a text file called "test.txt" which contains data in libsvm format. Data in this file is represented as follows:

165475 0:246870 1124384:2 342593:7 1141651:1 297582:1 1186846:1 17725:1 656602:1 
463304:1 766612:1 573309:1 290046:1 748198:1 216665:1 950594:2 909004:1 29008:1      
105623:1 5018:5 806027:1 1125729:1 757846:1 1023921:2 612980:1 120767:1 51340:1 
108172:5 674420:2

where the first term is the label and the remaining terms are features with their weights (each pair separated by a colon). The file is very large, with every label having many features and weights.

I am using scikit-learn in an IPython notebook and want to load this data into the notebook to start processing it.

Can someone tell me how to do that? Thanks in advance.


Solution

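Use load_svmlight_file from sklearn.datasets. It returns the features as a SciPy sparse matrix and the labels as a NumPy array, which suits a large file where each row has only a few non-zero features.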
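As a minimal sketch of that call: the snippet below writes a tiny, made-up sample file (the contents and the temporary path are assumptions for illustration, not the asker's data) and loads it. It assumes each record sits on a single line, as the format expects, and passes zero_based=True because the sample, like the data in the question, uses feature index 0.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file

# Hypothetical sample in libsvm format: "<label> <index>:<value> ...",
# one record per line. Replace this with the path to your own test.txt.
sample = "1 0:2.0 3:1.0 7:5.0\n0 1:1.0 3:4.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w") as fh:
        fh.write(sample)

    # X is a SciPy sparse matrix of features, y is a NumPy array of labels.
    # zero_based=True tells the loader that feature indices start at 0.
    X, y = load_svmlight_file(path, zero_based=True)

print(X.shape)  # (number of records, largest feature index + 1)
print(y)        # labels, one per record
```

For the real file, the call would be load_svmlight_file("test.txt"); if the number of features is known in advance, it can be passed through the n_features parameter so that separately loaded files end up with the same number of columns.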

Licensed under: CC-BY-SA with attribution