
“Fake news” and rumors have been a recurring theme throughout human history. Whether in politics, religion, games, entertainment or any other field, no one has been spared from false reports and misinformation. Fake news gives rise to false hopes, misunderstandings, superstitions, riots and many other harms to society. With advances in technology and the rise of social media, fake news now spreads faster and farther than ever, leading to disharmony and unrest in many parts of the world.
Recently, with the outbreak of COVID-19, the world is in crisis and people are searching for ways to stay healthy. Many inaccurate reports giving tips on how to kill the virus are circulating on social media platforms, and most people accept them as true and follow them without doing any research. This has created false hopes about health and, in extreme cases, led to sickness or loss of life.
The World Health Organization (WHO) cites misinformation as a major concern and calls this spread of fake news an Infodemic.
The UK has recently started a joint campaign with WHO to fund a new initiative to challenge misinformation and mistruths, which are dangerous and should be stopped at the source.
Examples of fake health news, myths and misinformation around the coronavirus include:
- Inhaling steam kills coronavirus
- Taking hot water showers (60 degrees centigrade) cures Covid-19
- Drinking lemon water and garlic prevents coronavirus
- Drinking alcohol kills coronavirus
- Mosquito bites spread Covid-19
- Drinking bleach cures Covid-19
- Gargling saltwater prevents Covid-19
- Bathing with cow dung prevents Covid-19
Teradata recently organized a Global Hackathon to find solutions to the COVID-19 crisis, from both medical and non-medical perspectives.
Teams from around the world submitted about 70 ideas, and 25 of them were selected for the Hackathon. Our team was among the 25 selected.
Our idea was to build a tool that detects misinformation in health news related to COVID-19.
The following comparison shows why such a tool matters:
- Fact Checkers
  - Manual, slow and involves a lot of effort
  - Not accessible to everyone
  - Usually rely on domain experts, such as doctors, to debunk myths
  - Fact checking is not the primary job for doctors
- Automated fake health news identification
  - Completely automated
  - Quick response
  - Ability to learn by “examples”
  - Ability to process a lot of information frequently
  - Cost effective
- Uses techniques in Artificial Intelligence
- Uses Supervised Learning – Learning with examples
- Completely automated with no human intervention
- Provides web based easy access – no apps required
- Provides API for seamless integration into 3rd party apps and chatbots
- Can be exposed as “software as a service” to identify fake health myths
- Can save human lives
- Users can check the veracity of the health news using this tool
- Publishers can use this tool to validate before publishing
- Naïve users can verify information before forwarding the information
- Social media networks can use this tool to validate the accuracy of the claim in the post/tweet itself
- Third party app developers can utilize our web-based API for integration into their apps
- Health chat bots can also leverage our API for first level verification.
- Collected a dataset of 1,200+ health news items, both fake (myths) and genuine
  - Around 600 were fake and 600 were genuine
- Used an NLP pipeline to learn the patterns of fake health tips and myths
- Created “Fake-o-meter” - an automated fake news classifier which tells the probability of the given news being fake; achieves an accuracy of 90%
- Used Vantage for training the models
- Created a web-based API which can be easily integrated with other apps
- Integrated with a web-based user interface in Flask for easy usage
Please note that the word clouds below are built from the 1,200+ news items we collected. They may look different with more data.
[Word cloud: Genuine News]
[Word cloud: Fake News]
We also created word clouds with Text Analysis in Vantage Appcenter showing the positive (green) and negative (red) sentiments in the COVID-19 dataset.
[Word cloud: positive and negative sentiment]
Implementation details:
- Collected news items from social media platforms and labeled each as ‘Fake’ or ‘Genuine’ by cross-checking against reliable sources
- Used these “examples” to train a Machine Learning model that predicts whether a new claim is Fake or Genuine based on past observations
- The NLP workflow involves stop-word removal, punctuation removal, lemmatizing/stemming, etc.
- Used Vantage text analytic functions to produce a clean corpus and split the dataset into Train and Validation sets
- Used feature extraction techniques such as TF-IDF (term frequency-inverse document frequency)
- Used TextClassification functions in Vantage to create a model from these features
- Applied the model to test datasets and measured the classification accuracy
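The cleaning and feature-extraction steps above can be illustrated with a minimal sketch in plain Python. This is not the Vantage implementation: the stop-word list, the `tokenize` and `tf_idf` helper names, and the two sample sentences are assumptions made for illustration, and lemmatizing/stemming is left out to keep the example short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "with"}


def tokenize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample sentences, not taken from the real dataset.
docs = [
    "Drinking bleach cures the virus!",
    "Washing hands with soap reduces the spread of the virus.",
]
vectors = tf_idf(docs)
```

Terms that appear in every document (here, "virus") receive a weight of zero, while terms specific to one document (such as "bleach") receive a higher weight, which is what makes TF-IDF useful as input to a classifier.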
Results:
Below are some of the models used, along with the accuracy score:
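[Figure: models used and their accuracy scores]
Vantage functions used in the model training and prediction are as follows: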
- NaiveBayesTextClassifierTrainer
- NaiveBayesTextClassifierPredict
- TextClassifierTrainer
- TextClassifier
- TextClassifierEvaluator
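To give a feel for what a Naive Bayes text classifier does, here is a small, self-contained Python sketch of multinomial Naive Bayes with add-one smoothing. It does not call or reproduce the Vantage functions listed above: the class name, its methods and the four training sentences are hypothetical and serve only as an illustration of the general technique.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_proba(self, text):
        """Return {label: probability} for a single piece of text."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            log_scores[c] = score
        # Convert log scores to probabilities; subtracting the maximum
        # before exponentiating keeps the computation numerically stable.
        max_score = max(log_scores.values())
        exp_scores = {c: math.exp(s - max_score) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


# Hypothetical training examples, not taken from the real dataset.
train_texts = [
    "drinking bleach cures covid",
    "garlic water prevents coronavirus",
    "wash hands with soap regularly",
    "vaccines reduce severe illness",
]
train_labels = ["fake", "fake", "genuine", "genuine"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
probs = model.predict_proba("bleach prevents coronavirus")
print(f"Probability that the claim is fake: {probs['fake']:.0%}")
```

The returned probability for the "fake" class is the kind of percentage a "Fake-o-meter" style tool could display, although a model trained on four sentences is of course far too small to be meaningful.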
Although the model is ready for predicting the authenticity of news, it has a few limitations:
- Not a replacement for a doctor or health care professional
- Has false positives and false negatives
- Any advice should be taken after consulting health care professionals
- Limited by the quality of training data. What goes in comes out!
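False positives and false negatives can be counted explicitly when a model is evaluated. The sketch below is a generic illustration, assuming "fake" is treated as the positive class; the `evaluate` helper and the six labels are made up for the example and are not results from our model.

```python
def evaluate(actual, predicted, positive="fake"):
    """Count confusion-matrix cells and derive accuracy, precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)  # genuine flagged as fake
    fn = sum(a == positive and p != positive for a, p in pairs)  # fake that slipped through
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for illustration only.
actual = ["fake", "fake", "fake", "genuine", "genuine", "genuine"]
predicted = ["fake", "fake", "genuine", "genuine", "genuine", "fake"]
report = evaluate(actual, predicted)
```

For a health-related tool, a false negative (a harmful myth classified as genuine) is usually the more costly error, so recall on the "fake" class is worth tracking alongside overall accuracy.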
Several areas remain for improving the tool. Future work could include:
- Provide reference to the health knowledge base for more confidence
- Identify and crawl relevant information from the knowledge base
- Techniques for information extraction
- Automatically update the knowledge base
- Can use deep learning techniques when more data is available
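Below is one example of how to submit news. You can submit multiple messages on separate lines.
The output shows, as a percentage, how likely the news is to be fake. The higher the percentage, the higher the probability that the news is fake.
Question: Beyond all these AI tools, models and algorithms, what is the most powerful tool in reducing the spread of misinformation?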
The Answer is:
WE – the Humans.
If we stand united and stop promoting and propagating misinformation, we will be in much better shape to fight the pandemics of the future. Stop the Infodemic to help fight the Pandemic!