{"id":46690,"date":"2022-02-24T00:00:00","date_gmt":"2022-02-24T08:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/"},"modified":"2026-03-30T14:50:04","modified_gmt":"2026-03-30T21:50:04","slug":"imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb","status":"publish","type":"post","link":"https:\/\/www.griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/","title":{"rendered":"Imbalanced Classification with the Fraudulent Credit Card Transaction Dataset using Python and GridDB"},"content":{"rendered":"<p>In this tutorial, we will explore the Credit Card Fraud Detection dataset available publicly on Kaggle. It is very crucial for credit card companies to detect fraud to avoid making losses and at the same time, not charging the customers for what they did not actually themselves purchase. We will use GridDB to extract the data, followed by building machine learning models to accurately detect frauds.<\/p>\n<p>The outline of the tutorial is as follows:<\/p>\n<ol>\n<li>Dataset overview<\/li>\n<li>Importing required libraries<\/li>\n<li>Loading the dataset<\/li>\n<li>Exploratory Data Analysis<\/li>\n<li>Machine Learning Model Building<\/li>\n<li>Model Evaluation<\/li>\n<li>Conclusion<\/li>\n<li>References<\/li>\n<\/ol>\n<h2>GridDB installation<\/h2>\n<p>While loading the dataset, this tutorial will cover two methods &#8212; Using GridDB as well as Using Pandas. 
To access GridDB using Python, the following packages also need to be installed beforehand:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/griddb\/c_client\">GridDB C-client<\/a><\/li>\n<li>SWIG (Simplified Wrapper and Interface Generator)<\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB Python Client<\/a><\/li>\n<\/ol>\n<h2>1&#46; Dataset Overview<\/h2>\n<p>The dataset contains all the credit card transactions made by European cardholders over two days in September 2013. There are 284,807 transactions in total, out of which 492 are frauds. The dataset is thus highly imbalanced: the fraud class comprises only 0.172% of all transactions.<\/p>\n<p>A PCA (Principal Component Analysis) transformation has been applied to the dataset. A PCA transformation converts possibly correlated variables into a set of linearly uncorrelated variables called principal components. PCA gives V1, V2, &#8230;, V28 as the features; the only features that have not been transformed are the Time and the Amount of the transaction. Since the raw (original) features contain confidential information, they are not provided in the final dataset.<\/p>\n<p>Description of the features:<\/p>\n<ul>\n<li>Time: The seconds elapsed between each transaction and the first transaction in the dataset.<\/li>\n<li>Amount: The transaction amount.<\/li>\n<li>V1, V2, &#8230;, V28: Transformed features, so they do not necessarily correspond to raw features of a credit card transaction.<\/li>\n<\/ul>\n<p>The dataset is available publicly and can be downloaded from <a href=\"https:\/\/www.kaggle.com\/mlg-ulb\/creditcardfraud\">Kaggle<\/a>. 
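<\/p>\n<p>To make the PCA idea concrete, here is a minimal sketch using scikit-learn on synthetic data. The two correlated input columns below are purely illustrative (they are not the dataset&#8217;s actual raw features); the point is that the resulting components are linearly uncorrelated:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import numpy as np\nfrom sklearn.decomposition import PCA\n\nrng = np.random.RandomState(0)\nx = rng.normal(size=(1000, 1))\n# Two strongly correlated raw columns\nraw = np.hstack([x, 2 * x + rng.normal(scale=0.1, size=(1000, 1))])\n\n# The principal components play the role of V1, V2\ncomponents = PCA(n_components=2).fit_transform(raw)\n\n# The off-diagonal correlation between the components is numerically zero\nprint(np.corrcoef(components, rowvar=False)[0, 1])<\/code><\/pre>\n<\/div>\n<p>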
Please go ahead and download the dataset to follow along with this tutorial.<\/p>\n<h2>2&#46; Importing Required Libraries<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.model_selection import train_test_split\n\nfrom sklearn.metrics import precision_recall_curve\nfrom sklearn.metrics import auc\nfrom sklearn.metrics import make_scorer\nfrom sklearn.metrics import classification_report\n\nfrom sklearn.model_selection import cross_val_score<\/code><\/pre>\n<\/div>\n<p>In case you run into a package installation error, you can install the missing package by typing <code>pip install package-name<\/code> in the command line. Alternatively, if you&#8217;re using a conda virtual environment, you can type <code>conda install package-name<\/code>.<\/p>\n<h2>3&#46; Loading the Dataset<\/h2>\n<p>Let&#8217;s proceed and load the dataset into our notebook.<\/p>\n<h4>3&#46;a Using GridDB<\/h4>\n<p>For large amounts of data, a CSV file can be cumbersome. GridDB serves as a perfect alternative: it is an open-source, highly scalable, in-memory NoSQL database that makes it easy to store large amounts of data. If you are new to GridDB, the tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a> can be useful.<\/p>\n<p>Assuming that you have already set up your database, we will now write the SQL query in Python to load our dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import griddb_python as griddb\n\nsql_statement = ('SELECT * FROM credit_card_dataset')\ncredit_card_dataset = pd.read_sql_query(sql_statement, cont)<\/code><\/pre>\n<\/div>\n<p>Note that the <code>cont<\/code> variable holds the container information where our data is stored. 
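<\/p>\n<p>If <code>read_sql_query<\/code> does not work in your setup, rows can also be fetched through the GridDB Python client itself. The sketch below is an illustration rather than a drop-in recipe: it requires a running GridDB cluster, and the host, port, cluster name and credentials are placeholders for your own settings:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import griddb_python as griddb\nimport pandas as pd\n\n# Placeholder connection settings -- replace with your own cluster details\nfactory = griddb.StoreFactory.get_instance()\ngridstore = factory.get_store(host='127.0.0.1', port=10001,\n                              cluster_name='myCluster',\n                              username='admin', password='admin')\n\ncontainer = gridstore.get_container('credit_card_dataset')\nquery = container.query('select *')\nrows = query.fetch(False)\n\n# Collect the rows into a pandas dataframe\nrecords = []\nwhile rows.has_next():\n    records.append(rows.next())\n\ncredit_card_dataset = pd.DataFrame(records)<\/code><\/pre>\n<\/div>\n<p>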
Replace <code>credit_card_dataset<\/code> with the name of your container. More information can be found in the tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a>.<\/p>\n<h4>3&#46;b Using Pandas<\/h4>\n<p>We can also use Pandas&#8217; <code>read_csv<\/code> function to load our data. Both methods lead to the same result, as either way the data ends up in a pandas dataframe.<\/p>\n<div class=\"clipboard\">\n<pre><code>credit_card_dataset = pd.read_csv('creditcard.csv')<\/code><\/pre>\n<\/div>\n<h2>4&#46; Exploratory Data Analysis<\/h2>\n<p>Once the dataset is loaded, let us explore it. We&#8217;ll print the first 10 rows using the <code>head()<\/code> function, passing in 10 as the argument to indicate the number of rows we want to print from the top.<\/p>\n<div class=\"clipboard\">\n<pre><code>credit_card_dataset.head(10)<\/code><\/pre>\n<\/div>\n<div style=\"overflow-x: auto;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Time\n        <\/th>\n<th>\n          V1\n        <\/th>\n<th>\n          V2\n        <\/th>\n<th>\n          V3\n        <\/th>\n<th>\n          V4\n        <\/th>\n<th>\n          V5\n        <\/th>\n<th>\n          V6\n        <\/th>\n<th>\n          V7\n        <\/th>\n<th>\n          V8\n        <\/th>\n<th>\n          V9\n        <\/th>\n<th>\n          &#8230;\n        <\/th>\n<th>\n          V21\n        <\/th>\n<th>\n          V22\n        <\/th>\n<th>\n          V23\n        <\/th>\n<th>\n          V24\n        <\/th>\n<th>\n          V25\n        <\/th>\n<th>\n          V26\n     
   <\/th>\n<th>\n          V27\n        <\/th>\n<th>\n          V28\n        <\/th>\n<th>\n          Amount\n        <\/th>\n<th>\n          Class\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          0.0\n        <\/td>\n<td>\n          -1.359807\n        <\/td>\n<td>\n          -0.072781\n        <\/td>\n<td>\n          2.536347\n        <\/td>\n<td>\n          1.378155\n        <\/td>\n<td>\n          -0.338321\n        <\/td>\n<td>\n          0.462388\n        <\/td>\n<td>\n          0.239599\n        <\/td>\n<td>\n          0.098698\n        <\/td>\n<td>\n          0.363787\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.018307\n        <\/td>\n<td>\n          0.277838\n        <\/td>\n<td>\n          -0.110474\n        <\/td>\n<td>\n          0.066928\n        <\/td>\n<td>\n          0.128539\n        <\/td>\n<td>\n          -0.189115\n        <\/td>\n<td>\n          0.133558\n        <\/td>\n<td>\n          -0.021053\n        <\/td>\n<td>\n          149.62\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          0.0\n        <\/td>\n<td>\n          1.191857\n        <\/td>\n<td>\n          0.266151\n        <\/td>\n<td>\n          0.166480\n        <\/td>\n<td>\n          0.448154\n        <\/td>\n<td>\n          0.060018\n        <\/td>\n<td>\n          -0.082361\n        <\/td>\n<td>\n          -0.078803\n        <\/td>\n<td>\n          0.085102\n        <\/td>\n<td>\n          -0.255425\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.225775\n        <\/td>\n<td>\n          -0.638672\n        <\/td>\n<td>\n          0.101288\n        <\/td>\n<td>\n          -0.339846\n        <\/td>\n<td>\n          0.167170\n        <\/td>\n<td>\n          0.125895\n        <\/td>\n<td>\n          -0.008983\n        <\/td>\n<td>\n          0.014724\n        <\/td>\n<td>\n          2.69\n        <\/td>\n<td>\n 
         0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          1.0\n        <\/td>\n<td>\n          -1.358354\n        <\/td>\n<td>\n          -1.340163\n        <\/td>\n<td>\n          1.773209\n        <\/td>\n<td>\n          0.379780\n        <\/td>\n<td>\n          -0.503198\n        <\/td>\n<td>\n          1.800499\n        <\/td>\n<td>\n          0.791461\n        <\/td>\n<td>\n          0.247676\n        <\/td>\n<td>\n          -1.514654\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          0.247998\n        <\/td>\n<td>\n          0.771679\n        <\/td>\n<td>\n          0.909412\n        <\/td>\n<td>\n          -0.689281\n        <\/td>\n<td>\n          -0.327642\n        <\/td>\n<td>\n          -0.139097\n        <\/td>\n<td>\n          -0.055353\n        <\/td>\n<td>\n          -0.059752\n        <\/td>\n<td>\n          378.66\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          1.0\n        <\/td>\n<td>\n          -0.966272\n        <\/td>\n<td>\n          -0.185226\n        <\/td>\n<td>\n          1.792993\n        <\/td>\n<td>\n          -0.863291\n        <\/td>\n<td>\n          -0.010309\n        <\/td>\n<td>\n          1.247203\n        <\/td>\n<td>\n          0.237609\n        <\/td>\n<td>\n          0.377436\n        <\/td>\n<td>\n          -1.387024\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.108300\n        <\/td>\n<td>\n          0.005274\n        <\/td>\n<td>\n          -0.190321\n        <\/td>\n<td>\n          -1.175575\n        <\/td>\n<td>\n          0.647376\n        <\/td>\n<td>\n          -0.221929\n        <\/td>\n<td>\n          0.062723\n        <\/td>\n<td>\n          0.061458\n        <\/td>\n<td>\n          123.50\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          2.0\n        <\/td>\n<td>\n          -1.158233\n        
<\/td>\n<td>\n          0.877737\n        <\/td>\n<td>\n          1.548718\n        <\/td>\n<td>\n          0.403034\n        <\/td>\n<td>\n          -0.407193\n        <\/td>\n<td>\n          0.095921\n        <\/td>\n<td>\n          0.592941\n        <\/td>\n<td>\n          -0.270533\n        <\/td>\n<td>\n          0.817739\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.009431\n        <\/td>\n<td>\n          0.798278\n        <\/td>\n<td>\n          -0.137458\n        <\/td>\n<td>\n          0.141267\n        <\/td>\n<td>\n          -0.206010\n        <\/td>\n<td>\n          0.502292\n        <\/td>\n<td>\n          0.219422\n        <\/td>\n<td>\n          0.215153\n        <\/td>\n<td>\n          69.99\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          5\n        <\/th>\n<td>\n          2.0\n        <\/td>\n<td>\n          -0.425966\n        <\/td>\n<td>\n          0.960523\n        <\/td>\n<td>\n          1.141109\n        <\/td>\n<td>\n          -0.168252\n        <\/td>\n<td>\n          0.420987\n        <\/td>\n<td>\n          -0.029728\n        <\/td>\n<td>\n          0.476201\n        <\/td>\n<td>\n          0.260314\n        <\/td>\n<td>\n          -0.568671\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.208254\n        <\/td>\n<td>\n          -0.559825\n        <\/td>\n<td>\n          -0.026398\n        <\/td>\n<td>\n          -0.371427\n        <\/td>\n<td>\n          -0.232794\n        <\/td>\n<td>\n          0.105915\n        <\/td>\n<td>\n          0.253844\n        <\/td>\n<td>\n          0.081080\n        <\/td>\n<td>\n          3.67\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          6\n        <\/th>\n<td>\n          4.0\n        <\/td>\n<td>\n          1.229658\n        <\/td>\n<td>\n          0.141004\n        <\/td>\n<td>\n          0.045371\n        <\/td>\n<td>\n          1.202613\n        <\/td>\n<td>\n          
0.191881\n        <\/td>\n<td>\n          0.272708\n        <\/td>\n<td>\n          -0.005159\n        <\/td>\n<td>\n          0.081213\n        <\/td>\n<td>\n          0.464960\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.167716\n        <\/td>\n<td>\n          -0.270710\n        <\/td>\n<td>\n          -0.154104\n        <\/td>\n<td>\n          -0.780055\n        <\/td>\n<td>\n          0.750137\n        <\/td>\n<td>\n          -0.257237\n        <\/td>\n<td>\n          0.034507\n        <\/td>\n<td>\n          0.005168\n        <\/td>\n<td>\n          4.99\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          7\n        <\/th>\n<td>\n          7.0\n        <\/td>\n<td>\n          -0.644269\n        <\/td>\n<td>\n          1.417964\n        <\/td>\n<td>\n          1.074380\n        <\/td>\n<td>\n          -0.492199\n        <\/td>\n<td>\n          0.948934\n        <\/td>\n<td>\n          0.428118\n        <\/td>\n<td>\n          1.120631\n        <\/td>\n<td>\n          -3.807864\n        <\/td>\n<td>\n          0.615375\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          1.943465\n        <\/td>\n<td>\n          -1.015455\n        <\/td>\n<td>\n          0.057504\n        <\/td>\n<td>\n          -0.649709\n        <\/td>\n<td>\n          -0.415267\n        <\/td>\n<td>\n          -0.051634\n        <\/td>\n<td>\n          -1.206921\n        <\/td>\n<td>\n          -1.085339\n        <\/td>\n<td>\n          40.80\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          8\n        <\/th>\n<td>\n          7.0\n        <\/td>\n<td>\n          -0.894286\n        <\/td>\n<td>\n          0.286157\n        <\/td>\n<td>\n          -0.113192\n        <\/td>\n<td>\n          -0.271526\n        <\/td>\n<td>\n          2.669599\n        <\/td>\n<td>\n          3.721818\n        <\/td>\n<td>\n          0.370145\n        <\/td>\n<td>\n          0.851084\n        
<\/td>\n<td>\n          -0.392048\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.073425\n        <\/td>\n<td>\n          -0.268092\n        <\/td>\n<td>\n          -0.204233\n        <\/td>\n<td>\n          1.011592\n        <\/td>\n<td>\n          0.373205\n        <\/td>\n<td>\n          -0.384157\n        <\/td>\n<td>\n          0.011747\n        <\/td>\n<td>\n          0.142404\n        <\/td>\n<td>\n          93.20\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          9\n        <\/th>\n<td>\n          9.0\n        <\/td>\n<td>\n          -0.338262\n        <\/td>\n<td>\n          1.119593\n        <\/td>\n<td>\n          1.044367\n        <\/td>\n<td>\n          -0.222187\n        <\/td>\n<td>\n          0.499361\n        <\/td>\n<td>\n          -0.246761\n        <\/td>\n<td>\n          0.651583\n        <\/td>\n<td>\n          0.069539\n        <\/td>\n<td>\n          -0.736727\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.246914\n        <\/td>\n<td>\n          -0.633753\n        <\/td>\n<td>\n          -0.120794\n        <\/td>\n<td>\n          -0.385050\n        <\/td>\n<td>\n          -0.069733\n        <\/td>\n<td>\n          0.094199\n        <\/td>\n<td>\n          0.246219\n        <\/td>\n<td>\n          0.083076\n        <\/td>\n<td>\n          3.68\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>    10 rows &#215; 31 columns<\/p>\n<\/div>\n<p>Here, <code>Time<\/code> is a numeric variable that does not look particularly useful for prediction; we will drop it before modeling. The <code>Amount<\/code> column contains the transaction amount. Next, let&#8217;s look at the distribution of the target variable <code>Class<\/code>. 
We have 284,315 non-fraud and 492 fraud transactions in the dataset, making it highly imbalanced: only 0.172% of transactions are classified as fraud.<\/p>\n<div class=\"clipboard\">\n<pre><code>credit_card_dataset.Class.value_counts()<\/code><\/pre>\n<\/div>\n<pre><code>    0    284315\n    1       492\n    Name: Class, dtype: int64\n<\/code><\/pre>\n<p>To explore the variable distributions, we can plot them as histograms. We&#8217;ll remove the target variable, and the axis labels, so as to de-clutter the plot space. Note that the figure size is passed directly to <code>hist<\/code>; calling <code>plt.figure<\/code> afterwards would only create a new, empty figure.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">credit_card_dataset_without_target = credit_card_dataset.drop(columns = ['Class'], axis=1)\n\nvariable_hist = credit_card_dataset_without_target.hist(bins=100, figsize=(20, 15))\n\nfor axis in variable_hist.flatten():\n    axis.set_xticklabels([])\n    axis.set_yticklabels([])\n\nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/02\/43aa0a7ecd4e82b1476db58bd40e1707a11f7084.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/02\/43aa0a7ecd4e82b1476db58bd40e1707a11f7084.png\" alt=\"\" width=\"352\" height=\"251\" class=\"aligncenter size-full wp-image-28070\" srcset=\"\/wp-content\/uploads\/2022\/02\/43aa0a7ecd4e82b1476db58bd40e1707a11f7084.png 352w, \/wp-content\/uploads\/2022\/02\/43aa0a7ecd4e82b1476db58bd40e1707a11f7084-300x214.png 300w\" sizes=\"(max-width: 352px) 100vw, 352px\" \/><\/a><\/p>\n<p>We observe above that the distribution of most of the variables is roughly Gaussian, and many of them are centered around zero. This indicates that the variables might have been standardized when they went through the PCA transformation.<\/p>\n<h2>5&#46; Machine Learning Model Building<\/h2>\n<p>Now, let&#8217;s proceed to building and evaluating machine learning models on our credit card dataset. 
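<\/p>\n<p>One caveat before splitting: with only 0.172% positive examples, a purely random split can distribute the few frauds unevenly between train and test. Passing <code>stratify<\/code> to <code>train_test_split<\/code> preserves the class ratio on both sides. A minimal sketch, with synthetic labels standing in for the real <code>Class<\/code> column:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import numpy as np\nfrom sklearn.model_selection import train_test_split\n\n# Same imbalance as the dataset: 492 frauds out of 284,807 rows\ny = np.array([1] * 492 + [0] * 284315)\nX = np.arange(len(y)).reshape(-1, 1)  # dummy feature matrix\n\nX_tr, X_te, y_tr, y_te = train_test_split(\n    X, y, test_size=0.2, random_state=0, stratify=y)\n\n# Both splits keep a fraud ratio of about 0.0017\nprint(y_tr.mean(), y_te.mean())<\/code><\/pre>\n<\/div>\n<p>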
We&#8217;ll first create <code>features<\/code> and <code>labels<\/code> for our model and split them into train and test samples. The test size has been kept at 20% of the total dataset size.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">features = credit_card_dataset.drop(columns = ['Time', 'Class'], axis = 1)\nlabels = credit_card_dataset[['Class']]\n\nX_train, X_test, y_train, y_test = train_test_split(features, labels, test_size = 0.2, random_state = 0)<\/code><\/pre>\n<\/div>\n<p>For the purpose of this tutorial, we&#8217;ll build a <strong>k-Nearest Neighbors<\/strong> classification model. To predict the class of a point, the k-nearest neighbors technique looks at the k most similar (nearest) training points and takes a majority vote among their classes. We keep the <code>n_neighbors<\/code> parameter set to 3. Note that we pass the labels as a 1-d array via <code>ravel()<\/code>, which avoids a <code>DataConversionWarning<\/code> from scikit-learn.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">model = KNeighborsClassifier(n_neighbors = 3)\nmodel.fit(X_train, y_train.values.ravel())<\/code><\/pre>\n<\/div>\n<pre><code>KNeighborsClassifier(n_neighbors=3)\n<\/code><\/pre>\n<p>After the model has been fit on our training data, we can proceed to predicting on our test set in order to evaluate the model&#8217;s performance. Let&#8217;s store our predictions in <code>predicted<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code>predicted = model.predict(X_test)<\/code><\/pre>\n<\/div>\n<h2>6&#46; Model Evaluation<\/h2>\n<p>We are going to evaluate the model&#8217;s performance using the <code>classification_report<\/code> metric, a widely used way to evaluate classification models. 
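<\/p>\n<p>Before reading the report, it may help to ground precision and recall with a tiny worked example. The labels below are made up (they are not the model&#8217;s actual predictions) and are checked against scikit-learn:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">from sklearn.metrics import precision_score, recall_score\n\ny_true = [0, 0, 0, 0, 1, 1, 1, 1]\ny_pred = [0, 1, 0, 1, 1, 1, 1, 0]  # 2 false positives, 1 false negative\n\ntp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3\nfp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 2\nfn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1\n\nprecision = tp \/ (tp + fp)  # 3 of 5 predicted positives are correct: 0.6\nrecall = tp \/ (tp + fn)     # 3 of 4 actual positives are found: 0.75\n\nassert precision == precision_score(y_true, y_pred)\nassert recall == recall_score(y_true, y_pred)<\/code><\/pre>\n<\/div>\n<p>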
The report includes the following:<\/p>\n<ol>\n<li><strong>Precision<\/strong>: Ratio of correctly predicted positive observations to the total predicted positive observations<\/li>\n<li><strong>Recall<\/strong>: Ratio of correctly predicted positive observations to all observations in the actual class<\/li>\n<li><strong>F1 score<\/strong>: Harmonic mean of Precision and Recall<\/li>\n<li><strong>Support<\/strong>: Number of predicted observations in each class<\/li>\n<\/ol>\n<div class=\"clipboard\">\n<pre><code>print(classification_report(predicted, y_test))<\/code><\/pre>\n<\/div>\n<pre><code>                  precision    recall  f1-score   support\n\n               0       1.00      1.00      1.00     56882\n               1       0.72      0.91      0.81        80\n\n        accuracy                           1.00     56962\n       macro avg       0.86      0.96      0.90     56962\n    weighted avg       1.00      1.00      1.00     56962\n<\/code><\/pre>\n<h2>7&#46; Conclusion<\/h2>\n<p>In this tutorial, we saw how we can use GridDB and Python to build a classifier for the Credit Card Fraud Detection dataset. We examined two ways to import our data: (1) GridDB and (2) Pandas. For large datasets, GridDB provides an excellent alternative for importing data into your notebook, as it is open-source and highly scalable. 
<a href=\"https:\/\/griddb.net\/en\/downloads\/\">Download GridDB<\/a> today!<\/p>\n<h2>8&#46; References<\/h2>\n<ol>\n<li><a href=\"https:\/\/www.kaggle.com\/mlg-ulb\/creditcardfraud\">https:\/\/www.kaggle.com\/mlg-ulb\/creditcardfraud<\/a><\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html\">https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html<\/a><\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.train_test_split.html\">https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.train_test_split.html<\/a><\/li>\n<li><a href=\"https:\/\/www.scribd.com\/document\/557372014\/Accuracy-Precision-Recall-F1-Score-Interpretation-of-Performance-Measures\">https:\/\/blog.exsilio.com\/all\/accuracy-precision-recall-f1-score-interpretation-of-performance-measures\/<\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will explore the Credit Card Fraud Detection dataset available publicly on Kaggle. It is very crucial for credit card companies to detect fraud to avoid making losses and at the same time, not charging the customers for what they did not actually themselves purchase. 
We will use GridDB to extract the [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":28075,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Imbalanced Classification with the Fraudulent Credit Card Transaction Dataset using Python and GridDB | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"In this tutorial, we will explore the Credit Card Fraud Detection dataset available publicly on Kaggle. It is very crucial for credit card companies to\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Imbalanced Classification with the Fraudulent Credit Card Transaction Dataset using Python and GridDB | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"In this tutorial, we will explore the Credit Card Fraud Detection dataset available publicly on Kaggle. 
It is very crucial for credit card companies to\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-02-24T08:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-30T21:50:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.griddb.net\/wp-content\/uploads\/2022\/02\/bank_2560x1707.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1707\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/\"},\"author\":{\"name\":\"griddb-admin\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\"},\"headline\":\"Imbalanced Classification with the Fraudulent Credit Card Transaction Dataset using Python and GridDB\",\"datePublished\":\"2022-02-24T08:00:00+00:00\",\"dateModified\":\"2026-03-30T21:50:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/\"},\"wordCount\":1162,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2022\/02\/bank_2560x1707.jpeg\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-and-griddb\/\",\"url\":\"https:\/\/griddb.net\/en\/blog\/imbalanced-classification-with-the-fraudulent-credit-card-transaction-dataset-using-python-a