“Neil Armstrong is convinced that the moon landing is fake”
The above headline was reported by a few media houses around the world, and by the time the mistake was realised, it had already reached millions. Thanks to the internet, anything published online is effectively set in stone the moment it appears. The meteoric rise of the internet and social media has been very beneficial to mankind, but it has also created many of the scourges of our time. One of the biggest of these menaces is rightly believed to be fake news, which can be defined in just two words: “Deliberate Disinformation”.
Fake news has seen a phenomenal rise in the last couple of years, so much so that “fake news” was named word of the year for 2017. A recent study revealed that fake news spreads faster than the truth and elicits responses of fear and disgust. It also has a major effect on politics and has become a prominent influence on a country's democratic elections. Beyond media and politics, it affects the economy as well, since event-driven trading algorithms consume fake news and reflect it in the markets they trade in.
These factors make fake news particularly challenging in a diverse and sensitive nation like India. Lately, many local newspapers have circulated fake news demeaning political parties or leaders. “Fox news declares Modi second-most corrupt leader in the world” read one, while “Indian National Congress is the 4th most corrupt political party: BBC” read another. Distorted images are often attached to such bogus articles. Fake news spreads like wildfire and can provoke violence which, when blown out of proportion, can escalate into riots, as observed in 2013. This harm must be stopped before it grows more malignant, and technology, especially artificial intelligence, can help.
Fake news came into prominence after the 2016 U.S. elections, and since then researchers have taken a deep interest in the topic and actively searched for solutions to counter this threat.
First, it is important to define fake news and establish how it can be differentiated from real news. Rubin's paper discusses three types of fake news. Horne and Adil showed how readily fake and honest articles can be distinguished. George McIntire subsequently built a dataset comprising fake and real news.
Practical detection methods have been discussed thoroughly by Zhang, Cui, Fu and Gouza. Turning to artificial intelligence, Bengio and Lavelli gave NLP a shot in the arm with their work on word embeddings. A special RNN architecture called LSTM was introduced by Hochreiter and Schmidhuber, and the more recent GRU was proposed by Cho, Bengio and others.
Chopra and Jain then applied NLP to fake news detection, and Bajaj contributed by applying several machine learning models to the same task.
The dataset for this paper was created by scraping publicly available fake and real news. The fake news portion was taken from a publicly available Kaggle dataset consisting solely of fake news, so every article in it was labelled as fake. The real news portion was obtained from All Sides, a website dedicated to hosting news and opinion articles that allows users to download their full text. Thousands of articles were scraped, including articles from reliable media sources such as WSJ, NYT, Bloomberg and WaPo, and labelled as real news. The two sources were then combined in equal proportion to produce a 6,000-observation dataset containing equal numbers of real and fake articles. The dataset had three feature variables (Source/Author, Headline and Article Text Body) and a Label. The label is binary, with 0 representing real news and 1 representing fake news.
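The assembly step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' script: the function name `build_balanced_dataset`, the dictionary keys `source`, `headline` and `body`, and the fixed random seed are hypothetical; only the label convention (0 = real, 1 = fake) and the 3,000-per-class balance come from the text.

```python
import random

def build_balanced_dataset(fake_articles, real_articles, n_per_class, seed=0):
    """Combine two single-class corpora into one shuffled, balanced dataset.

    Articles are dicts with hypothetical keys "source", "headline", "body".
    Labels follow the paper's convention: 0 = real news, 1 = fake news.
    """
    rng = random.Random(seed)
    fake = rng.sample(fake_articles, n_per_class)  # every Kaggle article is fake
    real = rng.sample(real_articles, n_per_class)  # every scraped article is real
    rows = [dict(a, label=1) for a in fake] + [dict(a, label=0) for a in real]
    rng.shuffle(rows)  # so all fake rows do not precede all real rows
    return rows

# Hypothetical stand-in corpora of unequal size.
fake = [{"source": f"s{i}", "headline": f"fake {i}", "body": "placeholder"} for i in range(4000)]
real = [{"source": f"r{i}", "headline": f"real {i}", "body": "placeholder"} for i in range(5000)]
data = build_balanced_dataset(fake, real, n_per_class=3000)
print(len(data), sum(r["label"] for r in data))  # 6000 3000
```

Sampling the same number of articles from each corpus is what makes the classes balanced; shuffling afterwards keeps a later train/test split from receiving only one class.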
Existing tools and methods detect fake news either by verifying its source or through knowledge-based approaches that use external sources for fact checking. Although these methods are very accurate, they scale poorly and are inefficient, which makes them impractical. This paper therefore approaches fake news detection as a text classification problem. Text classification has a rich history and has recently gained popularity in solving practical problems with the help of Natural Language Processing (NLP) algorithms. Restricting the approach to text classification allows an in-depth analysis of different NLP algorithms and their suitability for this problem, and ultimately a comparison of the effectiveness of the different models.
The implemented method is summarised in the flow chart.
The dataset was built by scraping websites that publish news articles.
Data Pre-processing: This step mainly consists of tokenisation, which simply divides each sentence into a list of words. Since neural networks are used and biological neurons inspired this methodology, the step mirrors the way we understand a sentence: by breaking it into words and interpreting them.
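A minimal tokenisation sketch follows. The paper does not say which tokeniser was used, so the regular expression here (lower-casing, then keeping runs of letters, digits and apostrophes) is an assumption; libraries such as NLTK or spaCy offer more careful tokenisers.

```python
import re

def tokenise(text):
    """Split text into a list of lower-cased word tokens.

    Assumed rule: a token is a run of ASCII letters, digits or apostrophes;
    punctuation and whitespace act as separators and are discarded.
    """
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenise("Neil Armstrong is convinced that the moon landing is fake!"))
# ['neil', 'armstrong', 'is', 'convinced', 'that', 'the', 'moon', 'landing', 'is', 'fake']
```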
Word-Embedding: NLP has become instrumental in understanding patterns among vast amount of language data. Word embeddings are a set of feature engineering techniques widely used in predictive NLP modeling, particularly in deep learning applications. Word embeddings transform sparse vector representations of words into a dense, continuous vector space, enabling you to identify similarities between words and phrases — on a large scale — based on their context. This vector representation provides convenient properties for comparing words or phrases.
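Word embeddings convert words into n-dimensional vectors, which makes sentiment and feature analysis with deep learning easier. Because n is fixed and far smaller than the vocabulary size that a sparse one-hot representation would require, embeddings also reduce the number of features.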
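The difference between sparse and dense representations can be shown with a toy example. The four-word vocabulary and the 3-dimensional vectors below are hand-picked and hypothetical (learned embeddings typically have a few hundred dimensions). The point is that one-hot vectors make every pair of distinct words equally dissimilar, while dense vectors can place related words close together under cosine similarity.

```python
import math

vocab = ["fake", "false", "news", "report"]

def one_hot(word):
    """Sparse representation: one dimension per vocabulary word, a single 1."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical dense embeddings, hand-picked for illustration only;
# Word2Vec, GloVe and fastText learn such vectors from a corpus.
embedding = {
    "fake":   [0.9, 0.1, 0.0],
    "false":  [0.8, 0.2, 0.1],
    "news":   [0.1, 0.9, 0.3],
    "report": [0.2, 0.8, 0.4],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine(one_hot("fake"), one_hot("false")))                 # 0.0
print(round(cosine(embedding["fake"], embedding["false"]), 3))   # 0.984
print(round(cosine(embedding["fake"], embedding["news"]), 3))    # 0.208
```

With a realistic vocabulary the one-hot vectors would need one dimension per word (tens of thousands), whereas the dense vectors keep their fixed size n.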
Classifier Training: The dataset is split into training and testing sets; the model is trained on the training set and then validated on the testing set.
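The split can be sketched as follows. The 80/20 ratio and the seed are assumptions, since the paper does not report its split; a fixed seed simply makes the split reproducible.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test).

    The 80/20 default is an assumption, not a value reported in the paper.
    """
    rows = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)

train, test = train_test_split(range(6000))
print(len(train), len(test))  # 4800 1200
```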
Currently, the three most popular word embeddings are Word2Vec, GloVe and fastText. Word2Vec was created by Google, and its downloadable pre-trained model was trained on Google News. GloVe, provided by StanfordNLP, is a count-based model, unlike Word2Vec, which is a predictive model. fastText, released more recently by Facebook, represents each word as a bag of character n-grams, which yields better embeddings for rare and out-of-vocabulary words than the earlier models.
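fastText's handling of rare and out-of-vocabulary words can be illustrated with a sketch. `char_ngrams` follows the subword scheme described for fastText (boundary markers `<` and `>`, n-grams of length 3 to 6, plus the whole word). The n-gram vector table is random rather than learned, and hashing with `zlib.crc32` into 1,000 buckets is an assumption for illustration; fastText uses its own hash and a much larger bucket table.

```python
import random
import zlib

def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style subwords: all character n-grams (n_min..n_max) of the
    word wrapped in boundary markers '<' and '>', plus the wrapped word."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(wrapped) - n + 1)]
    if wrapped not in grams:
        grams.append(wrapped)
    return grams

DIM, BUCKETS = 4, 1000  # toy sizes chosen for illustration
_rng = random.Random(0)
# Hypothetical n-gram vectors; in fastText these are learned during training.
ngram_table = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]

def word_vector(word):
    """Average the hashed n-gram vectors, so even an unseen word gets a vector.
    (Simplification: real fastText also keeps a separate vector per known word.)"""
    vecs = [ngram_table[zlib.crc32(g.encode("utf-8")) % BUCKETS] for g in char_ngrams(word)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
print(len(word_vector("fakenewsiest")))  # 4, although the word was never seen
```

Because "where" and a misspelling such as "wheere" share n-grams like `<wh` and `whe`, their composed vectors overlap even if one of them never appeared in training.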
The machine learning models used in this study are:
Logistic Regression (LR)
LR is a simple linear machine learning model and will therefore serve as the benchmark for the more sophisticated models. It is trained with the sigmoid cross-entropy loss:
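$$J = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})$$
where $y \in \{0, 1\}$ is the true label (1 for fake news) and $\hat{y} = \sigma(\mathbf{w}^\top \mathbf{x} + b)$ is the predicted probability that the article is fake.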
For this model, the word vectors of each article are averaged into a single embedding, which is passed to the model as input.
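The averaging step and the loss above can be sketched with plain batch gradient descent. This is an illustrative implementation, not the authors' code: the toy embeddings, documents, learning rate and epoch count are hypothetical. It relies on the fact that the gradient of $J$ with respect to the logit $z = \mathbf{w}^\top\mathbf{x} + b$ is $\hat{y} - y$.

```python
import math

def average_embedding(tokens, embedding, dim):
    """Mean of the vectors of in-vocabulary tokens (zero vector if none)."""
    vecs = [embedding[t] for t in tokens if t in embedding]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """y_hat = sigmoid(w . x + b), the predicted probability of fake news."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def mean_loss(X, y, w, b):
    """Average of J = -y log(y_hat) - (1 - y) log(1 - y_hat)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(xi, w, b)
        total += -yi * math.log(p) - (1 - yi) * math.log(1 - p)
    return total / len(X)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on mean_loss, using dJ/dz = y_hat - y."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Hypothetical 2-d embeddings and tokenised articles (1 = fake, 0 = real).
embedding = {"shocking": [1.0, 0.0], "secret": [0.9, 0.1],
             "report": [0.0, 1.0], "official": [0.1, 0.9]}
docs = [["shocking", "secret"], ["secret", "shocking", "report"],
        ["official", "report"], ["report", "official", "secret"]]
labels = [1, 1, 0, 0]

X = [average_embedding(d, embedding, dim=2) for d in docs]
w, b = train_logistic_regression(X, labels)
print([int(predict_proba(x, w, b) >= 0.5) for x in X])  # [1, 1, 0, 0]
```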
Extreme Gradient Boosting (XGBoost)
The extreme gradient boosting (XGBoost) algorithm is selected for the classification task. XGBoost is a supervised tree boosting algorithm that combines many weak learners (decision trees) into a strong learner. First, the news texts are pre-processed to reduce data complexity for the word embedding methods. Next, word embeddings are computed with any of the three models described above. Finally, an XGBoost classifier is trained on the vectors of the labelled news dataset for binary classification, i.e. classifying news as fake or real.
XGBoost has recently dominated applied machine learning and the Kaggle community. As with LR, each article's word vectors are averaged into a single input vector, which normalises every example to a fixed length while ensuring that every word is taken into account.
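The core boosting idea can be sketched with depth-1 trees (stumps). This is a simplified illustration, not the `xgboost` library: it reuses XGBoost's second-order ingredients for log loss (gradients $g = \hat{y} - y$, Hessians $h = \hat{y}(1-\hat{y})$, leaf weights $-G/(H+\lambda)$ and the split gain up to its constant factor 1/2), but omits deeper trees, the $\gamma$ split penalty, subsampling and the library's many optimisations. The two-dimensional toy inputs stand in for averaged word vectors and are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, g, h, lam):
    """Pick the split maximising G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam);
    each leaf gets the Newton weight -G_leaf / (H_leaf + lam)."""
    G, H = sum(g), sum(h)
    best, best_gain = None, 0.0
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            GL += g[order[k]]
            HL += h[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:  # cannot split between identical feature values
                continue
            GR, HR = G - GL, H - HL
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
            if best is None or gain > best_gain:
                best = (j, (lo + hi) / 2, -GL / (HL + lam), -GR / (HR + lam))
                best_gain = gain
    if best is None:  # no valid split: one leaf for every sample
        w = -G / (H + lam)
        best = (0, math.inf, w, w)
    return best

def stump_predict(stump, x):
    j, threshold, w_left, w_right = stump
    return w_left if x[j] <= threshold else w_right

def train_boosted_stumps(X, y, rounds=20, eta=0.3, lam=1.0):
    margins = [0.0] * len(X)  # raw scores; 0.0 corresponds to probability 0.5
    stumps = []
    for _ in range(rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss
        h = [pi * (1 - pi) for pi in p]        # second derivative of log loss
        stump = fit_stump(X, g, h, lam)
        stumps.append(stump)
        margins = [m + eta * stump_predict(stump, x) for m, x in zip(margins, X)]
    return stumps

def predict_proba(stumps, x, eta=0.3):
    """eta must match the value used in training."""
    return sigmoid(sum(eta * stump_predict(s, x) for s in stumps))

# Hypothetical 2-d averaged word vectors for four articles (1 = fake, 0 = real).
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.1], [0.9, 0.3]]
y = [0, 0, 1, 1]
stumps = train_boosted_stumps(X, y)
print([int(predict_proba(stumps, x) >= 0.5) for x in X])  # [0, 0, 1, 1]
```

Each round fits a weak learner to the current gradients and Hessians of the loss, so later stumps concentrate on the examples the ensemble still gets wrong; that is the sense in which many weak learners combine into a strong one.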
Long Short-Term Memory (LSTM)
Long Short-Term Memory networks, usually just called LSTMs, are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997) and are explicitly designed to avoid the long-term dependency problem. Each LSTM unit maintains a memory (its cell state), which is used to determine the output, or activation, of the cell. As with the earlier models, step 1 is to map words to word embeddings. Step 2 is the RNN, which receives the sequence of vectors as input and takes their order into account to generate a prediction. The representations from the embedding layer are passed to LSTM cells, which add recurrent connections to the network so that information about the sequence of words in the data can be used. Finally, the LSTM output is passed to an output layer consisting of a single unit with a sigmoid activation function, which predicts whether the text is fake or real.
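The LSTM pipeline described above can be sketched as a forward pass in plain Python. The weights are random and untrained, so the output probability is meaningless until the model is trained (for example with a deep learning framework such as Keras or PyTorch); the sizes, the `init_params` helper and the toy word vectors are hypothetical. The sketch follows the standard LSTM cell: input, forget and output gates, a candidate memory, the cell-state update, and a single sigmoid output unit.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(W, U, b, x, h, activation):
    """activation(W x + U h + b), one output per row of W."""
    return [activation(sum(w * xk for w, xk in zip(W_row, x))
                       + sum(u * hk for u, hk in zip(U_row, h)) + b_k)
            for W_row, U_row, b_k in zip(W, U, b)]

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical random (untrained) weights for the input (i), forget (f),
    output (o) and candidate (g) transforms, plus a 1-unit output layer."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {name: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
              for name in ("i", "f", "o", "g")}
    params["out"] = ([rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)], 0.0)
    return params

def lstm_classify(word_vectors, params):
    """Run an LSTM over the word vectors in order; return P(fake) from a sigmoid unit."""
    hidden_dim = len(params["i"][2])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim  # hidden state and cell memory
    for x in word_vectors:
        i = gate(*params["i"], x, h, sigmoid)      # how much new content to write
        f = gate(*params["f"], x, h, sigmoid)      # how much old memory to keep
        o = gate(*params["o"], x, h, sigmoid)      # how much memory to expose
        g = gate(*params["g"], x, h, math.tanh)    # candidate memory content
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    w_out, b_out = params["out"]
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)

# Three hypothetical 3-d word vectors standing in for an embedded article.
sentence = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.3], [0.2, 0.8, 0.4]]
params = init_params(input_dim=3, hidden_dim=4)
p = lstm_classify(sentence, params)
print(0.0 < p < 1.0)  # True; the value itself is meaningless until trained
```

Feeding the same vectors in reverse order generally gives a different output, which is the property that distinguishes the LSTM from the order-free averaged inputs used for LR and XGBoost.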
Gated Recurrent Units (GRU)
The gating mechanism controls which information is passed on to the next step, so that a good prediction can be made. A GRU module works like a read/write register: its reset gate decides how much of the previous hidden state is read and combined with the current input to construct a candidate update, and its update gate then decides, dimension by dimension, how much of the previous hidden state to keep and how much to replace with the candidate update.
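For reference, the GRU update in the formulation of Cho et al., with bias terms added, is

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r), \qquad z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$\tilde{h}_t = \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big), \qquad h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

where $\odot$ is element-wise multiplication: the reset gate $r_t$ controls how much of $h_{t-1}$ is read into the candidate, and the update gate $z_t$ controls how much of the old state is kept. The sketch below implements one step of these equations in plain Python with random, untrained, hypothetical weights; setting the update-gate bias very high demonstrates the "keep" behaviour, since the state then passes through unchanged.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """W x + U h + b, one entry per row of W."""
    return [sum(w * xk for w, xk in zip(W_row, x))
            + sum(u * hk for u, hk in zip(U_row, h)) + b_k
            for W_row, U_row, b_k in zip(W, U, b)]

def init_gru(input_dim, hidden_dim, seed=0):
    """Hypothetical random (untrained) weights for the reset (r), update (z)
    and candidate (h) transforms."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {name: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for name in ("r", "z", "h")}

def gru_step(x, h_prev, p):
    """One GRU step: h_t = z * h_prev + (1 - z) * h_candidate (element-wise)."""
    r = [sigmoid(v) for v in affine(*p["r"], x, h_prev)]       # reset gate: what to read
    z = [sigmoid(v) for v in affine(*p["z"], x, h_prev)]       # update gate: what to keep
    read = [rk * hk for rk, hk in zip(r, h_prev)]               # partially read old state
    h_cand = [math.tanh(v) for v in affine(*p["h"], x, read)]  # candidate update
    return [zk * hk + (1 - zk) * ck for zk, hk, ck in zip(z, h_prev, h_cand)]

params = init_gru(input_dim=3, hidden_dim=4)
h = [0.0] * 4
for x in [[0.9, 0.1, 0.0], [0.1, 0.9, 0.3], [0.2, 0.8, 0.4]]:  # hypothetical word vectors
    h = gru_step(x, h, params)

# With a huge update-gate bias, z rounds to 1.0 and the old state is copied unchanged.
W_z, U_z, _ = params["z"]
frozen = dict(params, z=(W_z, U_z, [50.0] * 4))
h0 = [0.1, -0.2, 0.3, 0.0]
print(gru_step([0.5, 0.5, 0.5], h0, frozen) == h0)  # True
```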
Usefulness of the study:
Jack Dorsey, co-founder and CEO of Twitter, recently said that this is a multi-variable problem without a perfect solution. With social media and mainstream media under tremendous scrutiny over fake news, it is high time to stop this danger before it blows out of proportion. More importantly, fake news threatens security, the social fabric and democracy, the basic building blocks of a nation. The methods discussed above have proven accurate and can be applied in practice to check whether an article is real. These methods therefore have the potential to save a nation from burning down.