A 3D Similarity Matrix for Mobile Text Misinformation Identification
Abstract
Mobile health text messages could be true, fake, or controversial.
This research does not judge their correctness by its own standards.
Instead, it tries to classify each message into one of five categories (true, fake, misinformative, disinformative, and neutral) based on previously known messages, and lets users judge the correctness of the messages for themselves according to our recommendations.
A 3D similarity matrix is used to classify incoming mobile messages into the five classes by comparing each incoming message with the known messages saved in the database.
Consider the following two messages:
S1 (saved): “The CDC (Disease Control Prevention Center) announces the COVID-19 vaccine is prevention effective.”
S2 (testing): “Valid Omicron booster is developed by Pfizer Coronavirus Vaccine according to CDC (Centers for Disease Control and Prevention).”
The 2D similarity matrix is as follows:
| S2 \ S1 | cdc | disease | control | prevent | center | announce | covid-19 | vaccine | prevent | effect |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| valid | | | | | | | | | | x |
| omicron | | | | | | | x | | | |
| booster | | | | | | | | x | | |
| develop | | | | | | | | | | |
| pfizer | | | | | | | | | | |
| coronavirus | | | | | | | x | | | |
| vaccine | | | | | | | | x | | |
| cdc | x | | | | | | | | | |
| center | | | | | x | | | | | |
| disease | | x | | | | | | | | |
| control | | | x | | | | | | | |
| prevent | | | | x | | | | | x | |
The third dimension captures synonyms.
For example, Omicron, COVID-19, coronavirus, and delta are placed together along this dimension.
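As a minimal sketch of how such a sparse 3D matrix might be represented, the Python snippet below stores each match as a (row, column, layer) triple, where layer 0 holds exact keyword matches and layer 1 holds synonym matches. The layer encoding, the names `SYNONYM_GROUPS`, `related`, and `build_sparse_3d`, and the synonym groups other than the COVID-19 group are assumptions made for illustration; they are inferred from the marked cells in the example matrix and are not specified by this research.

```python
# Stemmed keywords from the example (S1 = saved message, S2 = incoming message).
S1 = ["cdc", "disease", "control", "prevent", "center",
      "announce", "covid-19", "vaccine", "prevent", "effect"]
S2 = ["valid", "omicron", "booster", "develop", "pfizer", "coronavirus",
      "vaccine", "cdc", "center", "disease", "control", "prevent"]

# Hypothetical synonym groups. Only the first group is named in the text;
# the other two are guessed from the marked cells of the example matrix.
SYNONYM_GROUPS = [
    {"omicron", "covid-19", "coronavirus", "delta"},
    {"valid", "effect"},
    {"booster", "vaccine"},
]


def related(a, b):
    """True if both words belong to the same synonym group."""
    return any(a in group and b in group for group in SYNONYM_GROUPS)


def build_sparse_3d(incoming, saved):
    """Return a set of (row, col, layer) cells.

    Rows index the incoming message, columns index the saved message.
    Layer 0 marks an exact keyword match, layer 1 a synonym match.
    """
    cells = set()
    for i, w_in in enumerate(incoming):
        for j, w_saved in enumerate(saved):
            if w_in == w_saved:
                cells.add((i, j, 0))
            elif related(w_in, w_saved):
                cells.add((i, j, 1))
    return cells


cells = build_sparse_3d(S2, S1)
exact = {c for c in cells if c[2] == 0}
synonym = {c for c in cells if c[2] == 1}
```

Under these assumed groups, the non-empty cells correspond one-to-one with the cells marked "x" in the example matrix. Storing only the non-empty cells keeps the structure small, since most word pairs do not match.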
Once the sparse 3D matrix is built, a similarity value is computed by analyzing the matches, such as
- the number of keyword matches,
- the number and lengths of phrase matches, and
- the number and lengths of common-subsequence matches.
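The three kinds of match listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of this research: only exact keyword matches are counted, a "phrase" is taken to be a run of at least two consecutive shared keywords, the common subsequence is the classic longest common subsequence, and the three counts are combined by an unweighted sum because the text does not say how they are weighted. All function names are hypothetical.

```python
S1 = ["cdc", "disease", "control", "prevent", "center",
      "announce", "covid-19", "vaccine", "prevent", "effect"]
S2 = ["valid", "omicron", "booster", "develop", "pfizer", "coronavirus",
      "vaccine", "cdc", "center", "disease", "control", "prevent"]


def keyword_matches(incoming, saved):
    """Number of distinct keywords that occur in both messages."""
    return len(set(incoming) & set(saved))


def phrase_matches(incoming, saved, min_len=2):
    """Lengths of maximal runs of consecutive keywords shared by both messages."""
    runs = []
    for i in range(len(incoming)):
        for j in range(len(saved)):
            if incoming[i] != saved[j]:
                continue
            # Skip cells that merely continue a run started earlier on the diagonal.
            if i > 0 and j > 0 and incoming[i - 1] == saved[j - 1]:
                continue
            k = 0
            while (i + k < len(incoming) and j + k < len(saved)
                   and incoming[i + k] == saved[j + k]):
                k += 1
            if k >= min_len:
                runs.append(k)
    return runs


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity_value(incoming, saved):
    """Assumed combination: unweighted sum of the three match measures."""
    return (keyword_matches(incoming, saved)
            + sum(phrase_matches(incoming, saved))
            + lcs_length(incoming, saved))


value = similarity_value(S2, S1)
```

For the example pair, the only shared phrase is "disease control prevent", and the longest common subsequence is "cdc disease control prevent". A practical system would likely weight the three measures differently and also credit synonym matches.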
The incoming message is assigned the category of the saved message with which it has the highest similarity value.
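This selection step amounts to a nearest-neighbor lookup over the saved messages, which the following Python sketch illustrates. The database entries and their category labels are invented for the example, and `similarity` is a deliberately simple stand-in (the count of shared keywords) rather than the value derived from the 3D matrix.

```python
def similarity(incoming, saved):
    """Stand-in similarity: number of distinct shared keywords (an assumption)."""
    return len(set(incoming) & set(saved))


def classify(incoming, database):
    """Return the category of the saved message most similar to `incoming`.

    `database` is a list of (keywords, category) pairs. Returns None if empty.
    """
    if not database:
        return None
    _, category = max(database, key=lambda entry: similarity(incoming, entry[0]))
    return category


# Hypothetical database: the labels below are illustrative, not real annotations.
database = [
    (["cdc", "disease", "control", "prevent", "center",
      "announce", "covid-19", "vaccine", "prevent", "effect"], "true"),
    (["drink", "water", "cure", "flu"], "fake"),
]

incoming = ["valid", "omicron", "booster", "develop", "pfizer", "coronavirus",
            "vaccine", "cdc", "center", "disease", "control", "prevent"]

label = classify(incoming, database)
```

When two saved messages tie for the highest value, `max` returns the first one encountered; a real system would need an explicit tie-breaking rule and possibly a minimum threshold below which no category is assigned.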
Keywords
mobile computing, security, privacy, text mining, data mining, misinformation, misinformation identification, similarity measurement, 3D similarity matrix, sentence similarity, mobile data management
Conference