Question

I built a fastText classification model for sentiment analysis of Facebook comments (using PySpark 2.4.1 on Windows). When I call my prediction function on a single sentence, it returns a list containing one tuple of the following form:

[('__label__positif', '__label__négatif', '__label__neutre', 0.8947999477386475, 0.08174632489681244, 0.023483742028474808)]

To apply it to the "text" column of my DataFrame, I wrapped it in a UDF like this:

from pyspark.sql.types import *
from pyspark.sql.functions import udf, col
import fasttext

schema = StructType([
    StructField("pos", StringType(), False),
    StructField("neg", StringType(), False),
    StructField("ntr", StringType(), False),
    StructField("pr_pos", DoubleType(), False),
    StructField("pr_neg", DoubleType(), False),
    StructField("pr_ntr", DoubleType(), False)
])

udf_label = udf(lambda words : predictClass(words), schema)
df = df.withColumn("classe", udf_label(col('text')))

df.select('classe').show()

This fails with the following error:

PicklingError: Could not serialize object: TypeError: can't pickle fasttext_pybind.fasttext objects
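The error suggests that the UDF's closure drags the loaded fastText model along with it, and Spark cannot pickle that object to ship it to the workers. One common workaround, sketched below, is to pass only the model path and load the model lazily inside each Python worker. This is a minimal sketch under assumptions, not a confirmed fix: `get_model`, `to_row`, `predict_row`, `make_udf` and the `loader` parameter are hypothetical names introduced here, the model file is assumed to be readable at the same path on every worker, and `predictClass` from the question is assumed to wrap `model.predict`. Note also that fastText returns labels sorted by probability, so the first field is the most likely class rather than always the positive one.

```python
_MODEL = None  # cached once per Python worker process, never pickled


def get_model(model_path, loader=None):
    """Load the model on first use and reuse it on later calls."""
    global _MODEL
    if _MODEL is None:
        if loader is None:
            import fasttext  # imported lazily, on the worker
            loader = fasttext.load_model
        _MODEL = loader(model_path)
    return _MODEL


def to_row(labels, probs):
    """Flatten (labels, probabilities) into one tuple: 3 strings, 3 floats."""
    return tuple(labels) + tuple(float(p) for p in probs)


def predict_row(text, model_path, loader=None):
    """Predict the three classes for one comment; None stays None."""
    if text is None:
        return None
    model = get_model(model_path, loader)
    # fastText predicts one line at a time, so newlines are replaced first.
    labels, probs = model.predict(text.replace("\n", " "), k=3)
    return to_row(labels, probs)


def make_udf(model_path, schema):
    """Build a UDF whose closure holds only the path string (assumption:
    `schema` is the six-field StructType from the question)."""
    from pyspark.sql.functions import udf
    return udf(lambda text: predict_row(text, model_path), schema)
```

Usage would then look like `df.withColumn("classe", make_udf("model.bin", schema)(col("text")))`, where `"model.bin"` is a placeholder path. If these functions live in the driver script itself, avoid calling `get_model` on the driver before the UDF runs, since the cached model could then be captured again during serialization. On a real cluster, the file could be distributed with `SparkContext.addFile` and resolved with `SparkFiles.get`.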

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange