Method to assess text credibility

https://datascience.stackexchange.com/questions/66201

20-10-2020

Question

I am looking for an automated method (ideally a Python package) that produces a score for the credibility of a given text (e.g. from a webpage).

I am not searching for:

Text complexity assessments (i.e. how long the sentences are and how many difficult words are used), such as Flesch reading ease, SMOG index, Flesch-Kincaid grade, Coleman-Liau index, automated readability index, Dale-Chall readability score, difficult words index, Linsear Write formula, or Gunning fog.
Text coherence (i.e. how well each sentence fits with the previous one), as in, for example, "Text Coherence Analysis Based on Deep Neural Network".

Why is complexity or coherence not the same as credibility? Because many texts advertising, for example, homeopathy use long, scientific-sounding sentences loaded with complex words while being nonsense in terms of truth. I am therefore wondering whether there is any method to automatically assess the credibility or reliability of a given piece of text or webpage.
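To make the distinction concrete, here is a minimal sketch of the Flesch reading ease formula using only the standard library. The syllable counter is a crude vowel-group heuristic and both sample texts are made up for illustration, so the exact numbers should not be trusted. What it shows is that the score depends only on sentence length and syllable counts, never on whether the text is true.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Made-up sample texts: one plain, one jargon-heavy and meaningless.
plain = "Water is wet. The sky is blue."
jargon = (
    "Homeopathic potentization amplifies the energetic signature "
    "of molecular dilutions through quantum coherence."
)

print(flesch_reading_ease(plain))   # higher score = easier to read
print(flesch_reading_ease(jargon))  # lower score = harder to read
```

The jargon sentence is rated much harder to read than the plain one, but nothing in the formula looks at its content, which is why such scores cannot stand in for credibility.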

Solution 2

The answer is summarized extensively in a recent 14-page research paper, "Veracity assessment of online data".

Main points:

"Three main veracity assessment research directions found, i.e., utilizing implicit features, employing explicit fact checking, and the appeal to authority method."
"The veracity assessment domain is still relatively immature."

EDIT: The above paper misses Credeye / Deepeye (https://gate.d5.mpi-inf.mpg.de/credeye/) which seems to be the only(?) method in this area that can be easily tested/used by other people.
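As a rough idea of what the first research direction quoted above ("implicit features") could look like in code, here is a toy sketch that extracts a few surface cues from a text. The feature set and the `HEDGE_WORDS` list are assumptions chosen for illustration, not taken from the paper or from Credeye, and the output is a set of raw features, not a credibility score.

```python
import re

# Hypothetical cue list, chosen for illustration only.
HEDGE_WORDS = {"may", "might", "could", "possibly", "suggests", "appears"}


def implicit_features(text: str) -> dict:
    """Extract a few surface-level cues; these are inputs for a model, not a verdict."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))  # avoid division by zero on empty input
    return {
        "exclamation_marks": text.count("!"),
        # Share of words written entirely in capitals (single letters ignored).
        "all_caps_ratio": sum(1 for w in words if len(w) > 1 and w.isupper()) / n,
        # Share of words that are hedging terms from the list above.
        "hedge_ratio": sum(1 for w in words if w.lower() in HEDGE_WORDS) / n,
        "has_url": bool(re.search(r"https?://", text)),
    }


print(implicit_features("This MIRACLE cure works!!! It might help."))
```

Features like these would still need labelled data and a trained classifier before they say anything about veracity, which is consistent with the paper's remark that the field is still immature.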

OTHER TIPS

I don't think there's anything close to doing this:

It would be very hard even to define the task objectively, as different people wouldn't agree on what is credible and what is not.
It would require a complex system to represent reliable background knowledge... and again, people wouldn't agree on what should be considered "reliable".
Generally, the state of the art in NLP is still far from solving tasks related to Natural Language Understanding in a satisfying way. Judging the credibility of a text requires not only a real understanding of the text but also the ability to reason at a higher level. It's not clear whether this level of AI can ever be reached.

If you find a package that claims to achieve this task, try applying it to its own documentation, because it's not credible ;)

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange