{"id":46713,"date":"2022-07-22T00:00:00","date_gmt":"2022-07-22T07:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/detecting-fake-news-using-python-and-griddb\/"},"modified":"2025-11-13T12:56:07","modified_gmt":"2025-11-13T20:56:07","slug":"detecting-fake-news-using-python-and-griddb","status":"publish","type":"post","link":"https:\/\/www.griddb.net\/en\/blog\/detecting-fake-news-using-python-and-griddb\/","title":{"rendered":"Detecting Fake News using Python and GridDB"},"content":{"rendered":"<p>Whenever we come across such articles, we instinctively feel that something doesn&#8217;t feel right. There are so many posts out there that it is nearly impossible to sort out the right from the wrong.<\/p>\n<p>Fake news can be claimed in two ways: first, an argument against the facts. Secondly, the language used. The former can only be accomplished with automated query systems and substantial searches into the internet. The latter is possible through a natural language processing pipeline followed by a machine learning pipeline.<\/p>\n<p>The purpose of this article is to model the news data labeled as fake or real. Using GridDB to extract the data, followed by performing the preprocess steps and finally building the machine learning model.<\/p>\n<p>The outline of the tutorial is as follows:<\/p>\n<ol>\n<li>Dataset overview<\/li>\n<li>Importing required libraries<\/li>\n<li>Loading the dataset<\/li>\n<li>Data Cleaning and Preprocessing<\/li>\n<li>Building a Machine Learning Model<\/li>\n<li>Evaluating Model <\/li>\n<li>Conclusion<\/li>\n<\/ol>\n<h2>Prerequisites and Environment setup<\/h2>\n<p>This tutorial is carried out in Anaconda Navigator (Python version \u2013 3.8.3) on Windows Operating System. 
The following packages need to be installed before you continue with the tutorial \u2013<\/p>\n<ol>\n<li>\n<p>Pandas<\/p>\n<\/li>\n<li>\n<p>NumPy<\/p>\n<\/li>\n<li>\n<p>Scikit-learn<\/p>\n<\/li>\n<li>\n<p>Matplotlib<\/p>\n<\/li>\n<li>\n<p>Seaborn<\/p>\n<\/li>\n<li>\n<p>Tensorflow<\/p>\n<\/li>\n<li>\n<p>Keras<\/p>\n<\/li>\n<li>\n<p>nltk<\/p>\n<\/li>\n<li>\n<p>re<\/p>\n<\/li>\n<li>\n<p>patoolib<\/p>\n<\/li>\n<li>\n<p>urllib<\/p>\n<\/li>\n<li>\n<p>griddb_python<\/p>\n<\/li>\n<\/ol>\n<p>You can install these packages in Conda\u2019s virtual environment using <code>conda install package-name<\/code>. In case you are using Python directly via terminal\/command prompt, <code>pip install package-name<\/code> will do the job. (Note that <code>re<\/code> and <code>urllib<\/code> ship with Python\u2019s standard library and do not need to be installed separately.)<\/p>\n<h3>GridDB Installation<\/h3>\n<p>While loading the dataset, this tutorial will cover two methods \u2013 using GridDB as well as using Pandas. To access GridDB using Python, the following packages also need to be installed beforehand:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/griddb\/c_client\">GridDB C-client<\/a><\/li>\n<li>SWIG (Simplified Wrapper and Interface Generator)<\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB Python Client<\/a><\/li>\n<\/ol>\n<h2>1&#46; Dataset Overview<\/h2>\n<p>The dataset consists of about 40,000 articles with a roughly equal number of fake and real news items. Most of the news was collected from U.S. newspapers and covers American politics, world news, and other topics.<\/p>\n<p>https:\/\/www.kaggle.com\/datasets\/clmentbisaillon\/fake-and-real-news-dataset<\/p>\n<h2>2&#46; Importing Required Libraries<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">#import griddb_python as griddb\n\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\nimport urllib.request\nimport patoolib\n\nimport nltk\nimport string\nfrom nltk.corpus import stopwords\nimport re\n\nimport tensorflow as tf\nfrom keras.preprocessing.text 
import Tokenizer\nfrom tensorflow.keras.utils import to_categorical\nfrom keras.preprocessing.sequence import pad_sequences\n\nfrom sklearn.metrics import classification_report,confusion_matrix,accuracy_score\nfrom sklearn.model_selection import train_test_split\n\nimport warnings\nwarnings.filterwarnings('ignore')\n%matplotlib inline<\/code><\/pre>\n<\/div>\n<h2>3&#46; Loading the Dataset<\/h2>\n<p>Let\u2019s proceed and load the dataset into our notebook.<\/p>\n<h3>3&#46;a Using GridDB<\/h3>\n<p>Toshiba GridDB is a highly scalable NoSQL database best suited for IoT and Big Data. GridDB\u2019s design is founded on offering a versatile data store that is optimized for IoT, provides high scalability, is tuned for high performance, and ensures high reliability.<\/p>\n<p>For large amounts of data, a CSV file can be cumbersome. GridDB serves as a perfect alternative, as it is an open-source, highly scalable, in-memory NoSQL database that makes it easier for you to store large amounts of data. 
If you are new to GridDB, a tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a> can be useful.<\/p>\n<p>Assuming that you have already set up your database, we will now write the queries in Python to load our dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">factory = griddb.StoreFactory.get_instance()\n\n# Initialize the GridDB container (enter your database credentials)\ntry:\n    gridstore = factory.get_store(host=host_name, port=your_port, \n            cluster_name=cluster_name, username=admin, \n            password=admin)\n\n    info = griddb.ContainerInfo(\"false_news\",\n                    [[\"title\", griddb.Type.STRING],[\"text\", griddb.Type.STRING],[\"subject\", griddb.Type.STRING],\n                     [\"date\", griddb.Type.TIMESTAMP]],\n                    griddb.ContainerType.COLLECTION, True)\n    cont = gridstore.put_container(info) \n    data = pd.read_csv(\"Fake.csv\")\n    #Add data\n    for i in range(len(data)):\n        ret = cont.put(data.iloc[i, :])\n    print(\"Data added successfully\")\nexcept griddb.GSException as e:\n    print(\"Error:\", e)\n\ntry:\n    gridstore = factory.get_store(host=host_name, port=your_port, \n            cluster_name=cluster_name, username=admin, \n            password=admin)\n\n    info = griddb.ContainerInfo(\"true_news\",\n                    [[\"title\", griddb.Type.STRING],[\"text\", griddb.Type.STRING],[\"subject\", griddb.Type.STRING],\n                     [\"date\", griddb.Type.TIMESTAMP]],\n                    griddb.ContainerType.COLLECTION, True)\n    cont = gridstore.put_container(info) \n    data = pd.read_csv(\"True.csv\")\n    #Add data\n    for i in range(len(data)):\n        ret = cont.put(data.iloc[i, :])\n    print(\"Data added successfully\")\nexcept griddb.GSException as e:\n    print(\"Error:\", e)<\/code><\/pre>\n<\/div>\n<p>The <code>read_sql_query<\/code> function offered by the pandas library converts the fetched data into a pandas data frame to make it easy for the user to 
work.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">sql_statement1 = ('SELECT * FROM false_news')\nfalse = pd.read_sql_query(sql_statement1, cont)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">sql_statement2 = ('SELECT * FROM true_news')\ntrue = pd.read_sql_query(sql_statement2, cont)<\/code><\/pre>\n<\/div>\n<p>Note that the <code>cont<\/code> variable has the container information where our data is stored. Replace <code>false_news<\/code> and <code>true_news<\/code> with the names of your containers. More info can be found in this tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a>.<\/p>\n<p>When it comes to IoT and Big Data use cases, GridDB clearly stands out among other databases in the Relational and NoSQL space. Overall, GridDB offers multiple reliability features for mission-critical applications that require high availability and data retention.<\/p>\n<h3>3&#46;b Using pandas read_csv<\/h3>\n<p>We can also use Pandas&#8217; <code>read_csv<\/code> function to load our data. 
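Whichever loader is used, what we end up with is an ordinary pandas DataFrame. Below is a minimal sketch with hypothetical sample rows (not taken from the Kaggle files) showing the column layout both loaders produce:

```python
import pandas as pd

# Hypothetical rows mirroring the columns of the Kaggle CSVs:
# title, text, subject, date
rows = [
    ("Sample headline", "Body of the article...", "politicsNews", "July 22, 2022"),
    ("Another headline", "More article text...", "worldnews", "July 21, 2022"),
]
true = pd.DataFrame(rows, columns=["title", "text", "subject", "date"])
print(true.shape)   # (2, 4)
```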
Both of the above methods lead to the same output, as either way the data is loaded in the form of a pandas dataframe.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">true = pd.read_csv(\"True.csv\")\nfalse = pd.read_csv(\"Fake.csv\")<\/code><\/pre>\n<\/div>\n<h2>4&#46; Data Cleaning and Preprocessing<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">true['label'] = 1\nfalse['label'] = 0<\/code><\/pre>\n<\/div>\n<p>Next, we combine the two datasets into one and merge the title and text columns into a single text column.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">news = pd.concat([true,false]) \nnews['text'] = news['text'] + \" \" + news['title']\ndf=news.drop([\"date\",\"title\",\"subject\"],axis=1)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">sns.countplot(x=\"label\", data=news);\nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/06\/output_25_0.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/06\/output_25_0.png\" alt=\"\" width=\"402\" height=\"262\" class=\"aligncenter size-full wp-image-28326\" srcset=\"\/wp-content\/uploads\/2022\/06\/output_25_0.png 402w, \/wp-content\/uploads\/2022\/06\/output_25_0-300x196.png 300w, \/wp-content\/uploads\/2022\/06\/output_25_0-400x262.png 400w\" sizes=\"(max-width: 402px) 100vw, 402px\" \/><\/a><\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">df.head()<\/code><\/pre>\n<\/div>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          text\n        
<\/th>\n<th>\n          label\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          WASHINGTON (Reuters) &#8211; The head of a conservat&#8230;\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          WASHINGTON (Reuters) &#8211; Transgender people will&#8230;\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          WASHINGTON (Reuters) &#8211; The special counsel inv&#8230;\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          WASHINGTON (Reuters) &#8211; Trump campaign adviser &#8230;\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          SEATTLE\/WASHINGTON (Reuters) &#8211; President Donal&#8230;\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>We have to convert the raw messages (sequences of characters) into vectors (sequences of numbers). Before that, we need to do the following: remove punctuation, remove numbers, 
remove tags, remove URLs, remove stopwords, convert the text to lower case, and lemmatise.<\/p>\n<p>The following four functions help us to remove punctuation (&lt;, ., &#8221;, :, etc.), numbers, tags and URLs. Note that we assign the results back to <code>df['text']<\/code> so that the cleaning actually takes effect.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">def rem_punctuation(text):\n  return text.translate(str.maketrans('','',string.punctuation))\n\ndef rem_numbers(text):\n  return re.sub('[0-9]+','',text)\n\n\ndef rem_urls(text):\n  return re.sub('https?:\\\S+','',text)\n\n\ndef rem_tags(text):\n  return re.sub('&lt;.*?>',\" \",text)\n\ndf['text'] = df['text'].apply(rem_urls)\ndf['text'] = df['text'].apply(rem_punctuation)\ndf['text'] = df['text'].apply(rem_tags)\ndf['text'] = df['text'].apply(rem_numbers)\ndf['text']<\/code><\/pre>\n<\/div>\n<pre><code>0        WASHINGTON (Reuters) - The head of a conservat...\n1        WASHINGTON (Reuters) - Transgender people will...\n2        WASHINGTON (Reuters) - The special counsel inv...\n3        WASHINGTON (Reuters) - Trump campaign adviser ...\n4        SEATTLE\/WASHINGTON (Reuters) - President Donal...\n                               ...                        \n23476    st Century Wire says As WIRE reported earlier ...\n23477    st Century Wire says It s a familiar theme. 
Wh...\n23478    Patrick Henningsen  st Century WireRemember wh...\n23479    st Century Wire says Al Jazeera America will g...\n23480    st Century Wire says As WIRE predicted in its ...\nName: text, Length: 44898, dtype: object\n<\/code><\/pre>\n<p><code>rem_stopwords()<\/code> is the function for removing stopwords and for converting the words to lower case.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">stop = set(stopwords.words('english'))\n\ndef rem_stopwords(df_news):\n    # lower-case every word and keep only the words that are not stopwords\n    words = [word.lower() for word in df_news.split() if word.lower() not in stop]\n    return \" \".join(words)\n\ndf['text'] = df['text'].apply(rem_stopwords)\ndf['text']<\/code><\/pre>\n<\/div>\n<pre><code>0        [washington, (reuters), -, the, head, of, a, c...\n1        [washington, (reuters), -, transgender, people...\n2        [washington, (reuters), -, the, special, couns...\n3        [washington, (reuters), -, trump, campaign, ad...\n4        [seattle\/washington, (reuters), -, president, ...\n                               ...                        
\n23476    [21st, century, wire, says, as, 21wire, report...\n23477    [21st, century, wire, says, it, s, a, familiar...\n23478    [patrick, henningsen, 21st, century, wireremem...\n23479    [21st, century, wire, says, al, jazeera, ameri...\n23480    [21st, century, wire, says, as, 21wire, predic...\nName: text, Length: 44898, dtype: object\n<\/code><\/pre>\n<p>Lemmatization performs vocabulary and morphological analysis of a word and is normally aimed at removing inflectional endings only, converting each word to its base or root form; e.g. &#8220;plays&#8221; is converted to &#8220;play&#8221; by removing the &#8220;s&#8221;.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">from nltk.stem import WordNetLemmatizer\n#nltk.download('wordnet')\nlemmatizer = WordNetLemmatizer()\n\ndef lemmatize_words(text):\n  lemmas = []\n  for word in text.split():\n    lemmas.append(lemmatizer.lemmatize(word))\n  return \" \".join(lemmas)\n\ndf['text'] = df['text'].apply(lemmatize_words)\ndf['text']<\/code><\/pre>\n<\/div>\n<pre><code>0        WASHINGTON (Reuters) - The head of a conservat...\n1        WASHINGTON (Reuters) - Transgender people will...\n2        WASHINGTON (Reuters) - The special counsel inv...\n3        WASHINGTON (Reuters) - Trump campaign adviser ...\n4        SEATTLE\/WASHINGTON (Reuters) - President Donal...\n                               ...                        \n23476    21st Century Wire say As 21WIRE reported earli...\n23477    21st Century Wire say It s a familiar theme. W...\n23478    Patrick Henningsen 21st Century WireRemember w...\n23479    21st Century Wire say Al Jazeera America will ...\n23480    21st Century Wire say As 21WIRE predicted in i...\nName: text, Length: 44898, dtype: object\n<\/code><\/pre>\n<h3>Tokenizing &amp; Padding<\/h3>\n<p>Tokenizing is the process of breaking down a text into words. 
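As a toy illustration of these two steps, here is a minimal pure-Python sketch (not the Keras implementation) of whitespace tokenization and pre-padding with a made-up example sentence:

```python
def tokenize(text):
    # the most common tokenization: split on whitespace
    return text.lower().split()

def pad_sequence(seq, max_length, pad_value=0):
    # left-pad short sequences and truncate long ones from the front,
    # mirroring Keras pad_sequences' default padding='pre', truncating='pre'
    if len(seq) >= max_length:
        return seq[-max_length:]          # keep the last max_length items
    return [pad_value] * (max_length - len(seq)) + list(seq)

tokens = tokenize("GridDB makes storing news articles easy")
print(tokens)                                # ['griddb', 'makes', 'storing', 'news', 'articles', 'easy']
print(pad_sequence([5, 8, 13], 5))           # [0, 0, 5, 8, 13]
print(pad_sequence([1, 2, 3, 4, 5, 6], 5))   # [2, 3, 4, 5, 6]
```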
Tokenization can happen on any character, but the most common approach is to split on the space character.<\/p>\n<p>Padding: naturally, some sentences are longer and some are shorter. We need all inputs to have the same size, and for this we use padding.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">x = df['text'].values\ny= df['label'].values<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">tokenizer = Tokenizer()\ntokenizer.fit_on_texts(x)\nword_to_index = tokenizer.word_index\nx = tokenizer.texts_to_sequences(x)<\/code><\/pre>\n<\/div>\n<p>Let&#8217;s cap every article at 250 tokens, adding padding to articles with fewer than 250 words and truncating longer ones.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">vocab_size =  len(word_to_index)\noov_tok = \"&lt;oov>\"\nmax_length = 250\nembedding_dim = 100<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">x = pad_sequences(x, maxlen=max_length)<\/code><\/pre>\n<\/div>\n<h2>5&#46; Machine Learning Model Building<\/h2>\n<p>Word vectorization is a methodology in NLP for mapping words or phrases from the vocabulary to corresponding vectors of real numbers. There are many methods for vectorization, including Bag of Words and TF-IDF, as well as pretrained methods such as Word2Vec and GloVe. We are using GloVe, an algorithm for obtaining vector representations of words developed at Stanford.<\/p>\n<p>The GloVe method is built on an important idea: you can derive semantic relationships between words from the co-occurrence matrix. 
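To make the co-occurrence idea concrete, here is a toy sketch (the corpus and window size are made up for illustration) that counts how often each pair of words appears within a one-word window:

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=1):
    # counts[(i, j)] = number of times word j appears within `window`
    # positions of word i; symmetric, so counts[(i, j)] == counts[(j, i)]
    counts = defaultdict(int)
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        for ctx in tokens[lo:pos]:
            counts[(word, ctx)] += 1
            counts[(ctx, word)] += 1
    return counts

corpus = "fake news spreads fast real news spreads slowly".split()
X = cooccurrence_counts(corpus, window=1)
print(X[("news", "spreads")])   # 2 -- "news" and "spreads" are adjacent twice
```

GloVe then fits word vectors so that their dot products approximate the logarithms of these co-occurrence counts.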
Given a corpus having V words, the co-occurrence matrix X will be a V x V matrix, where the entry in the i-th row and j-th column, X_ij, denotes how many times word i has co-occurred with word j.<\/p>\n<p>The code below downloads the pre-trained embeddings from the Stanford website.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">urllib.request.urlretrieve('https:\/\/nlp.stanford.edu\/data\/glove.6B.zip','glove.6B.zip')<\/code><\/pre>\n<\/div>\n<pre><code>('glove.6B.zip', &lt;http.client.HTTPMessage at 0x21bf8cb57c0&gt;)\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">patoolib.extract_archive('glove.6B.zip')<\/code><\/pre>\n<\/div>\n<pre><code>patool: Extracting glove.6B.zip ...\npatool: ... glove.6B.zip extracted to `glove.6B' (multiple files in root).\n\n\n'glove.6B'\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">embeddings_index = {}\nwith open('glove.6B\/glove.6B.100d.txt', encoding='utf-8') as f:\n    for line in f:\n        values = line.split()\n        word = values[0]\n        coefs = np.asarray(values[1:], dtype='float32')\n        embeddings_index[word] = coefs\n\nembeddings_matrix = np.zeros((vocab_size+1, embedding_dim))\nfor word, i in word_to_index.items():\n    embedding_vector = embeddings_index.get(word)\n    if embedding_vector is not None:\n        embeddings_matrix[i] = embedding_vector<\/code><\/pre>\n<\/div>\n<p>After creating the embeddings matrix, we will split our dataset into train and test sets.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">from sklearn.model_selection import train_test_split\nX_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.20,random_state=1)<\/code><\/pre>\n<\/div>\n<p>Building and training the LSTM model.<\/p>\n<p>Things to note:<\/p>\n<p>1) We have initialized the weights as the GloVe embeddings matrix.<\/p>\n<p>2) We are using 2 dropout layers with p=0.2.<\/p>\n<p>3) The optimizer used is Adam 
with accuracy as the metric to optimize, since the dataset is balanced.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">model = tf.keras.Sequential([\n    tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=max_length, weights=[embeddings_matrix], trainable=False),\n    tf.keras.layers.LSTM(64,return_sequences=True),\n    tf.keras.layers.Dropout(0.2),\n    tf.keras.layers.LSTM(32),\n    tf.keras.layers.Dropout(0.2),\n    tf.keras.layers.Dense(24, activation='relu'),\n    tf.keras.layers.Dense(1, activation='sigmoid')\n])\nmodel.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])\nmodel.summary()<\/code><\/pre>\n<\/div>\n<pre><code>Model: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\nembedding (Embedding)        (None, 250, 100)          14770900  \n_________________________________________________________________\nlstm (LSTM)                  (None, 250, 64)           42240     \n_________________________________________________________________\ndropout (Dropout)            (None, 250, 64)           0         \n_________________________________________________________________\nlstm_1 (LSTM)                (None, 32)                12416     \n_________________________________________________________________\ndropout_1 (Dropout)          (None, 32)                0         \n_________________________________________________________________\ndense (Dense)                (None, 24)                792       \n_________________________________________________________________\ndense_1 (Dense)              (None, 1)                 25        \n=================================================================\nTotal params: 14,826,373\nTrainable params: 55,473\nNon-trainable params: 
14,770,900\n_________________________________________________________________\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">epochs = 6\nhistory = model.fit(X_train,y_train,epochs=epochs,validation_data=(X_test,y_test),batch_size=128)<\/code><\/pre>\n<\/div>\n<pre><code>Epoch 1\/6\n281\/281 [==============================] - 163s 570ms\/step - loss: 0.1676 - accuracy: 0.9386 - val_loss: 0.0807 - val_accuracy: 0.9713\nEpoch 2\/6\n281\/281 [==============================] - 168s 599ms\/step - loss: 0.0682 - accuracy: 0.9768 - val_loss: 0.0508 - val_accuracy: 0.9817\nEpoch 3\/6\n281\/281 [==============================] - 176s 625ms\/step - loss: 0.0377 - accuracy: 0.9882 - val_loss: 0.0452 - val_accuracy: 0.9837\nEpoch 4\/6\n281\/281 [==============================] - 179s 638ms\/step - loss: 0.0249 - accuracy: 0.9922 - val_loss: 0.0234 - val_accuracy: 0.9923\nEpoch 5\/6\n281\/281 [==============================] - 193s 689ms\/step - loss: 0.0157 - accuracy: 0.9950 - val_loss: 0.0189 - val_accuracy: 0.9948\nEpoch 6\/6\n281\/281 [==============================] - 170s 605ms\/step - loss: 0.0110 - accuracy: 0.9963 - val_loss: 0.0172 - val_accuracy: 0.9948\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">epochs = [i for i in range(6)]\nfig , ax = plt.subplots(1,2)\ntrain_acc = history.history['accuracy']\ntrain_loss = history.history['loss']\nval_acc = history.history['val_accuracy']\nval_loss = history.history['val_loss']\nfig.set_size_inches(20,10)\n\nax[0].plot(epochs , train_acc , 'go-' , label = 'Training Accuracy')\nax[0].plot(epochs , val_acc , 'ro-' , label = 'Testing Accuracy')\nax[0].set_title('Training & Testing Accuracy')\nax[0].legend()\nax[0].set_xlabel(\"Epochs\")\nax[0].set_ylabel(\"Accuracy\")\n\nax[1].plot(epochs , train_loss , 'go-' , label = 'Training Loss')\nax[1].plot(epochs , val_loss , 'ro-' , label = 'Testing Loss')\nax[1].set_title('Training & Testing 
Loss')\nax[1].legend()\nax[1].set_xlabel(\"Epochs\")\nax[1].set_ylabel(\"Loss\")\nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/06\/output_52_0.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/06\/output_52_0.png\" alt=\"\" width=\"1173\" height=\"604\" class=\"aligncenter size-full wp-image-28327\" srcset=\"\/wp-content\/uploads\/2022\/06\/output_52_0.png 1173w, \/wp-content\/uploads\/2022\/06\/output_52_0-300x154.png 300w, \/wp-content\/uploads\/2022\/06\/output_52_0-1024x527.png 1024w, \/wp-content\/uploads\/2022\/06\/output_52_0-768x395.png 768w, \/wp-content\/uploads\/2022\/06\/output_52_0-600x309.png 600w\" sizes=\"(max-width: 1173px) 100vw, 1173px\" \/><\/a><\/p>\n<h2>6&#46; Evaluating Model<\/h2>\n<p>Our model is performing very well, with 99.48% accuracy on the test dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">result = model.evaluate(X_test, y_test)\n# extract the loss and accuracy\nloss = result[0]\naccuracy = result[1]\n\n\nprint(f\"[+] Accuracy: {accuracy*100:.2f}%\")<\/code><\/pre>\n<\/div>\n<pre><code>281\/281 [==============================] - 19s 69ms\/step - loss: 0.0172 - accuracy: 0.9948\n[+] Accuracy: 99.48%\n<\/code><\/pre>\n<p>We will also create a confusion matrix to analyze the precision and recall of our model on the test dataset. 
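Precision and recall fall straight out of the confusion-matrix counts. A quick sketch with made-up counts (not our model's actual numbers) shows the arithmetic:

```python
def precision_recall(tp, fp, fn):
    # precision: of everything predicted positive, how much was right
    # recall: of everything actually positive, how much we found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# hypothetical counts for the "Real" class
p, r = precision_recall(tp=4450, fp=50, fn=25)
print(round(p, 3), round(r, 3))   # 0.989 0.994
```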
With this, we can gain greater insight into the false positives and false negatives of our model&#8217;s predictions.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># model.predict_classes() was removed in newer TensorFlow versions;\n# threshold the sigmoid output at 0.5 instead\npred = (model.predict(X_test) > 0.5).astype(\"int32\")\ncm = confusion_matrix(y_test,pred)\ncm = pd.DataFrame(cm , index = ['Fake','Real'] , columns = ['Fake','Real'])<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">plt.figure(figsize = (10,10))\nsns.heatmap(cm,cmap= \"Accent\", linecolor = 'black' , linewidth = 1 , annot = True, fmt='' , xticklabels = ['Fake','Real'] , yticklabels = ['Fake','Real'])\nplt.xlabel(\"Predicted\")\nplt.ylabel(\"Actual\")<\/code><\/pre>\n<\/div>\n<pre><code>Text(69.0, 0.5, 'Actual')\n<\/code><\/pre>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/06\/output_58_1.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/06\/output_58_1.png\" alt=\"\" width=\"579\" height=\"589\" class=\"aligncenter size-full wp-image-28328\" srcset=\"\/wp-content\/uploads\/2022\/06\/output_58_1.png 579w, \/wp-content\/uploads\/2022\/06\/output_58_1-295x300.png 295w\" sizes=\"(max-width: 579px) 100vw, 579px\" \/><\/a><\/p>\n<h2>7&#46; Conclusion<\/h2>\n<p>In this tutorial we built a highly accurate fake-news identification model using NLP techniques and GridDB. We examined two ways to import our data: using (1) GridDB and (2) Pandas. For large datasets, GridDB provides an excellent alternative for importing data into your notebook, as it is open-source and highly scalable. <a href=\"https:\/\/griddb.net\/en\/downloads\/\">Download GridDB<\/a> today!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whenever we come across such articles, we instinctively feel that something doesn&#8217;t feel right. There are so many posts out there that it is nearly impossible to sort out the right from the wrong. Fake news can be claimed in two ways: first, an argument against the facts. Secondly, the language used. 
The former can [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":28596,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46713","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Detecting Fake News using Python and GridDB | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"Whenever we come across such articles, we instinctively feel that something doesn&#039;t feel right. There are so many posts out there that it is nearly\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.griddb.net\/en\/blog\/detecting-fake-news-using-python-and-griddb\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Detecting Fake News using Python and GridDB | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"Whenever we come across such articles, we instinctively feel that something doesn&#039;t feel right. 
There are so many posts out there that it is nearly\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.griddb.net\/en\/blog\/detecting-fake-news-using-python-and-griddb\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-07-22T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:56:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.griddb.net\/wp-content\/uploads\/2022\/06\/truth-word-newspaper_1280x848.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"848\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 