{"id":46686,"date":"2022-01-27T00:00:00","date_gmt":"2022-01-27T08:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/"},"modified":"2025-11-13T12:55:49","modified_gmt":"2025-11-13T20:55:49","slug":"heart-failure-prediction-using-machine-learning-python-and-griddb","status":"publish","type":"post","link":"https:\/\/www.griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/","title":{"rendered":"Heart Failure Prediction using Machine Learning, Python, and GridDB"},"content":{"rendered":"<p>In this tutorial, we will explore the Heart Failure Prediction dataset, which is publicly available on Kaggle. We will show how to extract the data using GridDB. Next, we will perform some exploratory data analysis. Finally, we will build a machine learning model to make predictions on unseen data. The outline of this tutorial is as follows:<\/p>\n<ol>\n<li>Setting up your environment<\/li>\n<li>Introduction to the dataset<\/li>\n<li>Importing the necessary libraries<\/li>\n<li>Loading the dataset<\/li>\n<li>Exploratory Data Analysis<\/li>\n<li>Handling categorical variables<\/li>\n<li>Machine Learning Model<\/li>\n<li>Model Evaluation<\/li>\n<li>Conclusion<\/li>\n<li>References<\/li>\n<\/ol>\n<p>The accompanying Jupyter notebook is available here: https:\/\/github.com\/griddbnet\/Blogs\/blob\/main\/Heart%20Failure%20Prediction.ipynb<\/p>\n<h2>1&#46; Setting up your environment<\/h2>\n<p>This tutorial was carried out in Jupyter Notebook (Anaconda version 4.8.3) with Python 3.8 on Windows 10. 
The following packages need to be installed before running the code:<\/p>\n<ol>\n<li><a href=\"https:\/\/pandas.pydata.org\/docs\/getting_started\/install.html\">Pandas<\/a><\/li>\n<li><a href=\"https:\/\/numpy.org\/install\/\">NumPy<\/a><\/li>\n<li><a href=\"https:\/\/seaborn.pydata.org\/installing.html\">Seaborn<\/a><\/li>\n<li><a href=\"https:\/\/pypi.org\/project\/plotly\/\">Plotly<\/a><\/li>\n<li><a href=\"https:\/\/pypi.org\/project\/scikit-learn\/\">scikit-learn<\/a><\/li>\n<li><a href=\"https:\/\/pypi.org\/project\/matplotlib\/\">Matplotlib<\/a><\/li>\n<\/ol>\n<p>Each hyperlink leads to the installation instructions for that package. Alternatively, from the command line, run <code>pip install package-name<\/code>, or, if you use Anaconda, <code>conda install package-name<\/code>.<\/p>\n<p>This tutorial covers two ways of loading the dataset: using GridDB and using pandas. To access GridDB from Python, the following packages must also be installed beforehand:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/griddb\/c_client\">GridDB C-client<\/a><\/li>\n<li>SWIG (Simplified Wrapper and Interface Generator)<\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB Python Client<\/a><\/li>\n<\/ol>\n<h2>2&#46; Introduction to the dataset<\/h2>\n<p>Cardiovascular disease is one of the leading causes of death worldwide, so a machine learning model that helps predict heart failure could make a significant contribution. The dataset used in this tutorial has been developed by Davide Chicco and Giuseppe Jurman (BMC Medical Informatics and Decision Making). It is open source and can be downloaded from <a href=\"https:\/\/www.kaggle.com\/fedesoriano\/heart-failure-prediction\">Kaggle<\/a>.<\/p>\n<p>The data contains 918 instances (rows) and 12 attributes (columns). Of these 12 attributes, 5 are categorical and 7 are numerical. 
Let&#8217;s now go ahead and import the necessary libraries.<\/p>\n<h2>3&#46; Importing the necessary libraries<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport plotly.graph_objects as go\nimport plotly.express as px\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.metrics import classification_report\n# plot_confusion_matrix was removed in scikit-learn 1.2; on newer versions,\n# use sklearn.metrics.ConfusionMatrixDisplay.from_estimator instead\nfrom sklearn.metrics import plot_confusion_matrix<\/code><\/pre>\n<\/div>\n<p>If the installation was successful, the cell above should run without any errors or warnings. If you do encounter an error:<\/p>\n<ol>\n<li>Check whether the installation succeeded. If it did not, run <code>pip install package-name<\/code> again.<\/li>\n<li>Check that the installed package versions are compatible with your Anaconda and Python versions.<\/li>\n<\/ol>\n<h2>4&#46; Loading the dataset<\/h2>\n<h3>4&#46;1 Using GridDB<\/h3>\n<p>GridDB is a scalable, in-memory NoSQL database that makes it easier to store large amounts of data. Using GridDB&#8217;s Python client, we can load our data directly into the Python environment as a pandas DataFrame. 
If you are new to GridDB, this tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a> may be useful.<\/p>\n<p>Assuming that you have already set up your database and connected to it, we can now write an SQL query in Python to load our dataset:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import griddb_python as griddb\n\n# `cont` is assumed to be an open connection to the GridDB database;\n# pd.read_sql_query runs the query through that connection\nsql_statement = 'SELECT * FROM heart_failure_prediction'\nheart_dataset = pd.read_sql_query(sql_statement, cont)<\/code><\/pre>\n<\/div>\n<p>Here, <code>cont<\/code> must refer to an open connection to the GridDB database that stores the <code>heart_failure_prediction<\/code> data, because <code>pd.read_sql_query<\/code> executes the query through a database connection and returns the result as a pandas DataFrame.<\/p>\n<h3>4&#46;2 Using Pandas<\/h3>\n<p>Alternatively, we can use the pandas <code>read_csv()<\/code> function. Both methods produce the same result, since each loads the data into a pandas DataFrame.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">heart_dataset = pd.read_csv('heart.csv')<\/code><\/pre>\n<\/div>\n<h2>5&#46; Exploratory Data Analysis<\/h2>\n<p>Let us first determine the shape of our dataset, i.e., the number of rows and columns:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">heart_dataset.shape<\/code><\/pre>\n<\/div>\n<pre><code>(918, 12)\n<\/code><\/pre>\n<p>We will now display the first five rows using the pandas <code>head()<\/code> function to get a sense of what our data looks like.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">heart_dataset.head()<\/code><\/pre>\n<\/div>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Sex\n        
<\/th>\n<th>\n          ChestPainType\n        <\/th>\n<th>\n          RestingBP\n        <\/th>\n<th>\n          Cholesterol\n        <\/th>\n<th>\n          FastingBS\n        <\/th>\n<th>\n          RestingECG\n        <\/th>\n<th>\n          MaxHR\n        <\/th>\n<th>\n          ExerciseAngina\n        <\/th>\n<th>\n          Oldpeak\n        <\/th>\n<th>\n          ST_Slope\n        <\/th>\n<th>\n          HeartDisease\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          40\n        <\/td>\n<td>\n          M\n        <\/td>\n<td>\n          ATA\n        <\/td>\n<td>\n          140\n        <\/td>\n<td>\n          289\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          172\n        <\/td>\n<td>\n          N\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          Up\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          49\n        <\/td>\n<td>\n          F\n        <\/td>\n<td>\n          NAP\n        <\/td>\n<td>\n          160\n        <\/td>\n<td>\n          180\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          156\n        <\/td>\n<td>\n          N\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          Flat\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          37\n        <\/td>\n<td>\n          M\n        <\/td>\n<td>\n          ATA\n        <\/td>\n<td>\n          130\n        <\/td>\n<td>\n          283\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          ST\n        <\/td>\n<td>\n          98\n        <\/td>\n<td>\n          N\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          Up\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          48\n        
<\/td>\n<td>\n          F\n        <\/td>\n<td>\n          ASY\n        <\/td>\n<td>\n          138\n        <\/td>\n<td>\n          214\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          108\n        <\/td>\n<td>\n          Y\n        <\/td>\n<td>\n          1.5\n        <\/td>\n<td>\n          Flat\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          54\n        <\/td>\n<td>\n          M\n        <\/td>\n<td>\n          NAP\n        <\/td>\n<td>\n          150\n        <\/td>\n<td>\n          195\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          122\n        <\/td>\n<td>\n          N\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          Up\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/01\/chest_pain_type.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/01\/chest_pain_type.png\" alt=\"\" width=\"983\" height=\"525\" class=\"aligncenter size-full wp-image-28027\" srcset=\"\/wp-content\/uploads\/2022\/01\/chest_pain_type.png 983w, \/wp-content\/uploads\/2022\/01\/chest_pain_type-300x160.png 300w, \/wp-content\/uploads\/2022\/01\/chest_pain_type-768x410.png 768w, \/wp-content\/uploads\/2022\/01\/chest_pain_type-600x320.png 600w\" sizes=\"(max-width: 983px) 100vw, 983px\" \/><\/a><\/p>\n<p>Great! There is a mix of categorical and numerical values in this dataset. Note that we can not pass categorical variables directly to our machine learning model. We will have to encode them before model training. 
Let us go ahead and check the data types of our attributes.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">heart_dataset.dtypes<\/code><\/pre>\n<\/div>\n<pre><code>Age                 int64\nSex                object\nChestPainType      object\nRestingBP           int64\nCholesterol         int64\nFastingBS           int64\nRestingECG         object\nMaxHR               int64\nExerciseAngina     object\nOldpeak           float64\nST_Slope           object\nHeartDisease        int64\ndtype: object\n<\/code><\/pre>\n<p>Five of the attributes have the <code>object<\/code> data type, which here indicates categorical (string) values. The rest are integers or floats and can be passed to the model directly during training.<\/p>\n<p>Next, we check for null values. Any missing entries would have to be removed or imputed before training, because many scikit-learn estimators, including <code>LogisticRegression<\/code>, raise an error when they encounter them.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">heart_dataset.isna().sum()<\/code><\/pre>\n<\/div>\n<pre><code>Age               0\nSex               0\nChestPainType     0\nRestingBP         0\nCholesterol       0\nFastingBS         0\nRestingECG        0\nMaxHR             0\nExerciseAngina    0\nOldpeak           0\nST_Slope          0\nHeartDisease      0\ndtype: int64\n<\/code><\/pre>\n<p>Fortunately, there are no null values. 
We will now explore the categorical variables before moving on to the machine learning part.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">categorical_cols = heart_dataset.select_dtypes(include=['object'])\ncategorical_cols.columns<\/code><\/pre>\n<\/div>\n<pre><code>Index(['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope'], dtype='object')\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Print the number of distinct labels in each categorical column\nfor col in categorical_cols.columns:\n    print(col, '-', categorical_cols[col].nunique(), 'Labels')<\/code><\/pre>\n<\/div>\n<pre><code>Sex - 2 Labels\nChestPainType - 4 Labels\nRestingECG - 3 Labels\nExerciseAngina - 2 Labels\nST_Slope - 3 Labels\n<\/code><\/pre>\n<p>Since the data comes as a single CSV file, we split it into <code>train<\/code> and <code>test<\/code> sets and keep the test set aside for measuring accuracy at a later stage. We use a <code>70-30<\/code> <code>train:test<\/code> ratio. 
The <code>random_state<\/code> argument seeds the random shuffle that <code>train_test_split<\/code> performs before splitting, so the same rows end up in each set every time the notebook is run.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Hold out 30% of the rows for testing; the fixed seed makes the split reproducible\ntrain, test = train_test_split(heart_dataset, test_size=0.3, random_state=1234)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Count the training rows for each chest pain type (sorted by frequency)\nvalues = train.ChestPainType.value_counts()\nlabels = values.index.tolist()<\/code><\/pre>\n<\/div>\n<p>The distribution of the training data by chest pain type:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])\nfig.update_layout(title_text=\"Distribution of data by Chest Pain Type (in %)\")\nfig.show()<\/code><\/pre>\n<\/div>\n<p>The distribution of the full dataset by sex, further split by whether or not the person has heart disease:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">fig = px.histogram(heart_dataset,\n                   x=\"HeartDisease\",\n                   color=\"Sex\",\n                   hover_data=heart_dataset.columns,\n                   title=\"Distribution of Heart Diseases by Gender\",\n                   barmode=\"group\")\nfig.show()<\/code><\/pre>\n<\/div>\n<p><em>Figure: Distribution of Heart Diseases by Gender (grouped histogram of <code>HeartDisease<\/code> counts by <code>Sex<\/code>).<\/em><\/p>\n<p>Try experimenting with other categorical variables using the <code>histogram<\/code> or <code>pie<\/code> function.<\/p>\n<h2>6&#46; Handling 
categorical variables<\/h2>\n<p>We saw that two of the five categorical attributes, <code>Sex<\/code> and <code>ExerciseAngina<\/code>, are binary, i.e., they take only two values. We can therefore encode them manually as 0 and 1. For the remaining categorical attributes, we will use an encoding function.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Work on explicit copies of the splits so that pandas does not emit a SettingWithCopyWarning\ntrain, test = train.copy(), test.copy()\n# Binary encoding: Sex M = 0, F = 1; ExerciseAngina N = 0, Y = 1\ntrain['Sex'] = np.where(train['Sex'] == \"M\", 0, 1)\ntrain['ExerciseAngina'] = np.where(train['ExerciseAngina'] == \"N\", 0, 1)\ntest['Sex'] = np.where(test['Sex'] == \"M\", 0, 1)\ntest['ExerciseAngina'] = np.where(test['ExerciseAngina'] == \"N\", 0, 1)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train.head()<\/code><\/pre>\n<\/div>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Sex\n        <\/th>\n<th>\n          ChestPainType\n        <\/th>\n<th>\n          RestingBP\n        <\/th>\n<th>\n          Cholesterol\n        <\/th>\n<th>\n          FastingBS\n        <\/th>\n<th>\n          RestingECG\n        <\/th>\n<th>\n          MaxHR\n        <\/th>\n<th>\n          ExerciseAngina\n        <\/th>\n<th>\n          Oldpeak\n        <\/th>\n<th>\n          ST_Slope\n        <\/th>\n<th>\n          HeartDisease\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          578\n        <\/th>\n<td>\n          57\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          ASY\n        <\/td>\n<td>\n          156\n        <\/td>\n<td>\n          173\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          LVH\n        <\/td>\n<td>\n          119\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          3.0\n        <\/td>\n<td>\n          Down\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          480\n        <\/th>\n<td>\n          58\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          ATA\n        <\/td>\n<td>\n          126\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          110\n        <\/td>\n<td>\n          1\n        
<\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          Flat\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          512\n        <\/th>\n<td>\n          35\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          NAP\n        <\/td>\n<td>\n          123\n        <\/td>\n<td>\n          161\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          ST\n        <\/td>\n<td>\n          153\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          -0.1\n        <\/td>\n<td>\n          Up\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          634\n        <\/th>\n<td>\n          40\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          TA\n        <\/td>\n<td>\n          140\n        <\/td>\n<td>\n          199\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          178\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1.4\n        <\/td>\n<td>\n          Up\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          412\n        <\/th>\n<td>\n          56\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          ASY\n        <\/td>\n<td>\n          125\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          Normal\n        <\/td>\n<td>\n          103\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          Flat\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>For attributes with three or more labels, we will use the pandas <code>get_dummies<\/code> function, which creates a new attribute for each label. 
In our dataset, <code>ChestPainType<\/code> has 4 labels (ASY, ATA, NAP, TA), so 4 new attributes will be created, each with a value of either 0 or 1.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># dtype=int keeps the new columns as 0\/1 (pandas 2.0 and later default to True\/False)\ntrain = pd.get_dummies(train, dtype=int)\ntest = pd.get_dummies(test, dtype=int)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train.head()<\/code><\/pre>\n<\/div>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Sex\n        <\/th>\n<th>\n          RestingBP\n        <\/th>\n<th>\n          Cholesterol\n        <\/th>\n<th>\n          FastingBS\n        <\/th>\n<th>\n          MaxHR\n        <\/th>\n<th>\n          ExerciseAngina\n        <\/th>\n<th>\n          Oldpeak\n        <\/th>\n<th>\n          HeartDisease\n        <\/th>\n<th>\n          ChestPainType_ASY\n        <\/th>\n<th>\n          ChestPainType_ATA\n        <\/th>\n<th>\n          ChestPainType_NAP\n        <\/th>\n<th>\n          ChestPainType_TA\n        <\/th>\n<th>\n          RestingECG_LVH\n        <\/th>\n<th>\n          RestingECG_Normal\n        <\/th>\n<th>\n          RestingECG_ST\n        <\/th>\n<th>\n          ST_Slope_Down\n        <\/th>\n<th>\n          ST_Slope_Flat\n        <\/th>\n<th>\n          ST_Slope_Up\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          578\n        <\/th>\n<td>\n          57\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          156\n        <\/td>\n<td>\n          173\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          119\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          3.0\n        <\/td>\n<td>\n          
1\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          480\n        <\/th>\n<td>\n          58\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          126\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          110\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          512\n        <\/th>\n<td>\n          35\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          123\n        <\/td>\n<td>\n          161\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          153\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          -0.1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          634\n        <\/th>\n<td>\n          40\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          140\n        <\/td>\n<td>\n          199\n  
      <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          178\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1.4\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          412\n        <\/th>\n<td>\n          56\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          125\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          103\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">test.head()<\/code><\/pre>\n<\/div>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Sex\n        <\/th>\n<th>\n          RestingBP\n        <\/th>\n<th>\n          Cholesterol\n        <\/th>\n<th>\n          FastingBS\n        <\/th>\n<th>\n          
MaxHR\n        <\/th>\n<th>\n          ExerciseAngina\n        <\/th>\n<th>\n          Oldpeak\n        <\/th>\n<th>\n          HeartDisease\n        <\/th>\n<th>\n          ChestPainType_ASY\n        <\/th>\n<th>\n          ChestPainType_ATA\n        <\/th>\n<th>\n          ChestPainType_NAP\n        <\/th>\n<th>\n          ChestPainType_TA\n        <\/th>\n<th>\n          RestingECG_LVH\n        <\/th>\n<th>\n          RestingECG_Normal\n        <\/th>\n<th>\n          RestingECG_ST\n        <\/th>\n<th>\n          ST_Slope_Down\n        <\/th>\n<th>\n          ST_Slope_Flat\n        <\/th>\n<th>\n          ST_Slope_Up\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          581\n        <\/th>\n<td>\n          48\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          140\n        <\/td>\n<td>\n          208\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          159\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1.5\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          623\n        <\/th>\n<td>\n          60\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          140\n        <\/td>\n<td>\n          293\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          170\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1.2\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n     
   <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          60\n        <\/th>\n<td>\n          49\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          100\n        <\/td>\n<td>\n          253\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          174\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          613\n        <\/th>\n<td>\n          58\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          140\n        <\/td>\n<td>\n          385\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          135\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0.3\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          40\n        <\/th>\n<td>\n          54\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          150\n        <\/td>\n<td>\n          230\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          130\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n       
 <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train.shape<\/code><\/pre>\n<\/div>\n<pre><code>(642, 19)\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">test.shape<\/code><\/pre>\n<\/div>\n<pre><code>(276, 19)\n<\/code><\/pre>\n<p>The total number of attributes has increased from 12 to 19 because of the encoding.<\/p>\n<p>We will again divide our training and test sets into <code>X<\/code> and <code>Y<\/code>. <code>X<\/code> represents the independent variables (attributes) used to predict the dependent variable, <code>Y<\/code>. In our case, the dependent (target) variable is <code>HeartDisease<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Features: all columns except the target (keyword form; pandas 2.0 no longer accepts axis positionally)\nx_train = train.drop(columns=['HeartDisease'])\nx_test = test.drop(columns=['HeartDisease'])\n\n# Target: the HeartDisease label\ny_train = train['HeartDisease']\ny_test = test['HeartDisease']<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">print(x_train.shape)\nprint(x_test.shape)<\/code><\/pre>\n<\/div>\n<pre><code>(642, 18)\n(276, 18)\n<\/code><\/pre>\n<h2>7&#46; Machine Learning Model<\/h2>\n<p>Let us now build a Logistic Regression model with the following parameters:<\/p>\n<ol>\n<li><code>max_iter=10000<\/code>: the maximum number of iterations the solver may take to converge. The default is 100 iterations.<\/li>\n<li><code>penalty='l2'<\/code>: the norm used for the penalty (regularization) term. Options include <code>None<\/code>, <code>'l1'<\/code>, <code>'l2'<\/code>, and <code>'elasticnet'<\/code>; <code>'l1'<\/code> and <code>'elasticnet'<\/code> require a compatible solver such as <code>'saga'<\/code>. 
The default is <code>'l2'<\/code>, so we do not need to pass it explicitly.<\/li>\n<\/ol>\n<p>Several other parameters are available, such as <code>class_weight<\/code> and <code>random_state<\/code>. The official documentation, with usage and default values, can be found <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\">here<\/a>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Raise the iteration cap so the solver has room to converge\nlr = LogisticRegression(max_iter=10000)\nmodel1 = lr.fit(x_train, y_train)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">print(\"Train accuracy:\", model1.score(x_train, y_train))<\/code><\/pre>\n<\/div>\n<pre><code>Train accuracy: 0.8566978193146417\n<\/code><\/pre>\n<p>The training accuracy is approximately <code>85.7%<\/code>, which is a decent start. Let us go ahead and make predictions for the test dataset.<\/p>\n<h2>8&#46; Model Evaluation<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">print(\"Test accuracy:\", model1.score(x_test, y_test))<\/code><\/pre>\n<\/div>\n<pre><code>Test accuracy: 0.894927536231884\n<\/code><\/pre>\n<p>The test accuracy is approximately <code>89.5%<\/code>, slightly higher than the training accuracy. We can now store the predictions by calling the <code>predict<\/code> method on the test dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">lrpred = lr.predict(x_test)<\/code><\/pre>\n<\/div>\n<h3>8&#46;1 Classification Report<\/h3>\n<p>The <code>classification_report<\/code> function in the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.classification_report.html\">scikit-learn<\/a> library summarizes several per-class metrics used for model evaluation. 
The function outputs the following:<\/p>\n<ol>\n<li><code>Precision:<\/code> Defined as True Positive\/(True Positive+False Positive)<\/li>\n<li><code>Recall:<\/code> Defined as True Positive\/(True Positive+False Negative)<\/li>\n<li><code>F1 Score:<\/code> The harmonic mean of precision and recall, 2 &times; Precision &times; Recall \/ (Precision + Recall). The 1 in F1 signifies that precision and recall get equal weight.<\/li>\n<li><code>Support:<\/code> Number of occurrences of each class in the ground truth.<\/li>\n<\/ol>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># Arguments are (y_true, y_pred): ground truth first, predictions second\nprint(classification_report(y_test, lrpred))<\/code><\/pre>\n<\/div>\n<pre><code>              precision    recall  f1-score   support\n\n           0       0.90      0.85      0.88       121\n           1       0.89      0.93      0.91       155\n\n    accuracy                           0.89       276\n   macro avg       0.90      0.89      0.89       276\nweighted avg       0.90      0.89      0.89       276\n<\/code><\/pre>\n<h3>8&#46;2 Confusion Matrix<\/h3>\n<p>The confusion matrix is another common tool for evaluating a classifier. By definition, each entry <code>(i,j)<\/code> in the confusion matrix is the number of observations that actually belong to class <code>i<\/code> but are predicted as class <code>j<\/code> by the model. 
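<\/p>\n<p>The <code>(i,j)<\/code> convention, and the precision and recall definitions listed above, can be checked on a handful of hypothetical labels (made up for illustration, not output of the model in this tutorial). This minimal sketch uses scikit-learn's <code>confusion_matrix<\/code>, <code>precision_score<\/code>, and <code>recall_score<\/code>; like <code>classification_report<\/code>, each of them takes the true labels first and the predictions second.<\/p>

```python
import math

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, made up for illustration (1 = heart disease, 0 = normal).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Entry (i, j) counts samples whose actual class is i and predicted class is j.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [2 4]]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)  # 4 / 5 = 0.8
recall = tp / (tp + fn)     # 4 / 6, about 0.667
print(precision, recall)

# The hand-computed values agree with scikit-learn (true labels go first).
assert math.isclose(precision, precision_score(y_true, y_pred))
assert math.isclose(recall, recall_score(y_true, y_pred))
```

<p>Passing the two arguments in the opposite order transposes this matrix, which swaps precision and recall for each class and makes <code>support<\/code> count predictions instead of actual cases.<\/p>\n<p>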
More details on customizing the confusion matrix can be found <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html#confusion-matrix\">here<\/a>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator replaces it\nfrom sklearn.metrics import ConfusionMatrixDisplay\n\ndisplr = ConfusionMatrixDisplay.from_estimator(lr, x_test, y_test, cmap=plt.cm.OrRd, values_format='d')<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/01\/confusion_matrix.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/01\/confusion_matrix.png\" alt=\"Confusion matrix of the Logistic Regression model on the test set\" width=\"312\" height=\"262\" class=\"aligncenter size-full wp-image-28026\" srcset=\"\/wp-content\/uploads\/2022\/01\/confusion_matrix.png 312w, \/wp-content\/uploads\/2022\/01\/confusion_matrix-300x252.png 300w\" sizes=\"(max-width: 312px) 100vw, 312px\" \/><\/a><\/p>\n<h2>9&#46; Conclusion<\/h2>\n<p>In this tutorial, we covered how we can use GridDB and Python to build a classifier for the Heart Failure Prediction dataset. We showed two ways to load the data: using GridDB and using Pandas. GridDB is well suited to large amounts of data because it is highly scalable, and it is also open source. <a href=\"https:\/\/griddb.net\/en\/downloads\/\">Install GridDB<\/a> today!<\/p>\n<h2>10&#46; References<\/h2>\n<ol>\n<li>https:\/\/www.kaggle.com\/fedesoriano\/heart-failure-prediction<\/li>\n<li>https:\/\/www.kaggle.com\/sisharaneranjana\/machine-learning-to-the-fore-to-save-lives<\/li>\n<li>https:\/\/www.kaggle.com\/durgancegaur\/a-guide-to-any-classification-problem<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. We will use GridDB to see how can we extract the data. Later, we will perform some Exploratory Data Analysis. Finally, we will build a Machine Learning Model for making future predictions. 
The outline of this tutorial is [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":28039,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46686","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Heart Failure Prediction using Machine Learning, Python, and GridDB | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. We will use GridDB to see how can we extract\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Heart Failure Prediction using Machine Learning, Python, and GridDB | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. 
We will use GridDB to see how can we extract\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-01-27T08:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:55:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.griddb.net\/wp-content\/uploads\/2022\/01\/heart.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1160\" \/>\n\t<meta property=\"og:image:height\" content=\"653\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\"},\"author\":{\"name\":\"griddb-admin\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\"},\"headline\":\"Heart Failure Prediction using Machine Learning, Python, and GridDB\",\"datePublished\":\"2022-01-27T08:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\"},\"wordCount\":1327,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2022\/01\/heart.png\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\",\"url\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\",\"name\":\"Heart Failure Prediction using Machine Learning, Python, and GridDB | GridDB: Open Source Time Series Database for 
IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2022\/01\/heart.png\",\"datePublished\":\"2022-01-27T08:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:49+00:00\",\"description\":\"In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. We will use GridDB to see how can we extract\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage\",\"url\":\"\/wp-content\/uploads\/2022\/01\/heart.png\",\"contentUrl\":\"\/wp-content\/uploads\/2022\/01\/heart.png\",\"width\":1160,\"height\":653},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb.net\/en\/#website\",\"url\":\"https:\/\/griddb.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcommunity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\",\"name\":\"griddb-admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"caption\":\"griddb-admin\"},\"url\":\"https:\/\/www.griddb.net\/en\/author\/griddb-admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Heart Failure Prediction using Machine Learning, Python, and GridDB | GridDB: Open Source Time Series Database for IoT","description":"In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. We will use GridDB to see how can we extract","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/","og_locale":"en_US","og_type":"article","og_title":"Heart Failure Prediction using Machine Learning, Python, and GridDB | GridDB: Open Source Time Series Database for IoT","og_description":"In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. We will use GridDB to see how can we extract","og_url":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/","og_site_name":"GridDB: Open Source Time Series Database for IoT","article_publisher":"https:\/\/www.facebook.com\/griddbcommunity\/","article_published_time":"2022-01-27T08:00:00+00:00","article_modified_time":"2025-11-13T20:55:49+00:00","og_image":[{"width":1160,"height":653,"url":"https:\/\/www.griddb.net\/wp-content\/uploads\/2022\/01\/heart.png","type":"image\/png"}],"author":"griddb-admin","twitter_card":"summary_large_image","twitter_creator":"@GridDBCommunity","twitter_site":"@GridDBCommunity","twitter_misc":{"Written by":"griddb-admin","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#article","isPartOf":{"@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/"},"author":{"name":"griddb-admin","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233"},"headline":"Heart Failure Prediction using Machine Learning, Python, and GridDB","datePublished":"2022-01-27T08:00:00+00:00","dateModified":"2025-11-13T20:55:49+00:00","mainEntityOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/"},"wordCount":1327,"commentCount":0,"publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2022\/01\/heart.png","articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/","url":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/","name":"Heart Failure Prediction using Machine Learning, Python, and GridDB | GridDB: Open Source Time Series Database for 
IoT","isPartOf":{"@id":"https:\/\/griddb.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2022\/01\/heart.png","datePublished":"2022-01-27T08:00:00+00:00","dateModified":"2025-11-13T20:55:49+00:00","description":"In this tutorial, we will explore the Heart Failure Prediction dataset which is publicly available on Kaggle. We will use GridDB to see how can we extract","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/blog\/heart-failure-prediction-using-machine-learning-python-and-griddb\/#primaryimage","url":"\/wp-content\/uploads\/2022\/01\/heart.png","contentUrl":"\/wp-content\/uploads\/2022\/01\/heart.png","width":1160,"height":653},{"@type":"WebSite","@id":"https:\/\/griddb.net\/en\/#website","url":"https:\/\/griddb.net\/en\/","name":"GridDB: Open Source Time Series Database for IoT","description":"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL","publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233","name":"griddb-admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","caption":"griddb-admin"},"url":"https:\/\/www.griddb.net\/en\/author\/griddb-admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts\/46686","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/users\/41"}],"re
plies":[{"embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/comments?post=46686"}],"version-history":[{"count":1,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts\/46686\/revisions"}],"predecessor-version":[{"id":51360,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts\/46686\/revisions\/51360"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/media\/28039"}],"wp:attachment":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/media?parent=46686"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/categories?post=46686"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/tags?post=46686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}