{"id":46606,"date":"2020-06-25T00:00:00","date_gmt":"2020-06-25T07:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/machine-learning-php-griddb\/"},"modified":"2025-11-13T12:54:57","modified_gmt":"2025-11-13T20:54:57","slug":"machine-learning-php-griddb","status":"publish","type":"post","link":"https:\/\/www.griddb.net\/en\/blog\/machine-learning-php-griddb\/","title":{"rendered":"Machine Learning with PHP &#038; GridDB"},"content":{"rendered":"<p><meta property=\"og:image\" content=\"https:\/\/griddb-pro.azureedge.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\" \/><\/p>\n<p><span class=\"md-plain\">The output of any machine learning model is only ever as good as its input data. It is therefore crucial that the algorithm creating the model can be fed, at will, with specific data, which may need to fulfil a large variety of criteria. That&#8217;s what databases are for. But not just any database will do. Given the quantities of data involved, fast data handling is a must. This is where GridDB comes in, with its design for speed and its ability to represent complex structures through computationally advantageous container schemes and hierarchies.<\/span><\/p>\n<p><span class=\"md-plain\">Let&#8217;s take a look at a very basic example using PHP and its php-ai\/php-ml library in connection with GridDB. Of course, you could also use Python or one of the many other languages for which GridDB features a connector.<\/span><\/p>\n<h2>Environment Setup, PHP &amp; GridDB<\/h2>\n<p><span class=\"md-plain\">First, the setup. 
We need PHP v7.2 or newer, the php-ai\/php-ml library, and a GridDB server.<\/span><span class=\"md-softbreak\"><br \/>\n<\/span><span class=\"md-plain\">Your server will need PHP just as much as your local workstation, so that both the database server and the client can work in the same language.<\/span><\/p>\n<p><span class=\"md-plain\">On CentOS, PHP can simply be installed with:<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-variable\">yum<\/span> <span class=\"cm-variable\">install<\/span> <span class=\"cm-variable\">php7<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable\">yum<\/span> <span class=\"cm-variable\">install<\/span> <span class=\"cm-variable\">php7<\/span><span class=\"cm-operator\">-<\/span><span class=\"cm-variable\">devel<\/span><\/span><\/pre>\n<p><span class=\"md-plain\">With Composer we can then add the ML library:<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-variable\">composer<\/span> <span class=\"cm-keyword\">require<\/span> <span class=\"cm-variable\">php<\/span><span class=\"cm-operator\">-<\/span><span class=\"cm-variable\">ai<\/span><span class=\"cm-operator\">\/<\/span><span class=\"cm-variable\">php<\/span><span class=\"cm-operator\">-<\/span><span class=\"cm-variable\">ml<\/span><\/span><\/pre>\n<p><span class=\"md-plain\">You might also need the php-mbstring extension and more general libraries, installable with:<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"\" spellcheck=\"false\"><span role=\"presentation\">yum install php-mbstring<\/span>\n<span role=\"presentation\">yum groupinstall 'Development Tools'<\/span><\/pre>\n<p><span class=\"md-plain\">GridDB can be installed in a variety of ways; detailed instructions can be found 
<\/span><span class=\"md-meta-i-c md-link\"><a spellcheck=\"false\" href=\"http:\/\/docs.griddb.net\/gettingstarted\/introduction\/\"><span class=\"md-plain\">here<\/span><\/a><\/span><span class=\"md-plain\">.<\/span><span class=\"md-softbreak\"><br \/>\n<\/span><span class=\"md-plain\">Possibly the easiest way to install it is to use Docker:<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"\" spellcheck=\"false\"><span role=\"presentation\">docker pull griddbnet\/griddb<\/span><\/pre>\n<p><span class=\"md-plain\">Check that your database is ready to go with:<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"\" spellcheck=\"false\"><span role=\"presentation\">su - gsadm<\/span>\n<span role=\"presentation\">gs_stat -u \"user\"\/\"password\"<\/span><\/pre>\n<p class=\"md-end-block md-p\" style=\"box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; white-space: pre-wrap; position: relative; color: #333333; font-family: 'Open Sans', 'Clear Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;\"><span class=\"md-plain\">(Replace &#8220;user&#8221; and &#8220;password&#8221; with your admin credentials.)<\/span><\/p>\n<h3>Container Creation<\/h3>\n<p><span class=\"md-plain\">With the database up and running, it is time to create the database container and its associated layout scheme.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-keyword\">include<\/span>(<span class=\"cm-string\">'griddb_php_client.php'<\/span>);<\/span>\n<span 
role=\"presentation\"><span class=\"cm-variable-2\">$factory<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable\">StoreFactory<\/span>::<span class=\"cm-variable\">get_default<\/span>();<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$container_name<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-string\">\"income_ml\"<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$gridstore<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$factory<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_store<\/span>(<\/span>\n<span role=\"presentation\">    <span class=\"cm-keyword\">array<\/span>(<\/span>\n<span role=\"presentation\">        <span class=\"cm-comment\">\/\/ Connection specifications.<\/span><\/span>\n<span role=\"presentation\">        <span class=\"cm-string\">\"notification_address\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable-2\">$argv<\/span>[<span class=\"cm-number\">1<\/span>],<\/span>\n<span role=\"presentation\">        <span class=\"cm-string\">\"notification_port\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable-2\">$argv<\/span>[<span class=\"cm-number\">2<\/span>],<\/span>\n<span role=\"presentation\">        <span class=\"cm-string\">\"cluster_name\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable-2\">$argv<\/span>[<span class=\"cm-number\">3<\/span>],<\/span>\n<span role=\"presentation\">        <span class=\"cm-string\">\"user\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable-2\">$argv<\/span>[<span class=\"cm-number\">4<\/span>],<\/span>\n<span role=\"presentation\">        <span class=\"cm-string\">\"password\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable-2\">$argv<\/span>[<span class=\"cm-number\">5<\/span>]<\/span>\n<span 
role=\"presentation\">    )<\/span>\n<span role=\"presentation\">);<\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Connect to the target container (it is created below if it does not exist).<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$gridstore<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_container<\/span>(<span class=\"cm-variable-2\">$container_name<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Place a collection into the container.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$collection<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$gridstore<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">put_container<\/span>(<span class=\"cm-variable-2\">$container_name<\/span>,<\/span>\n<span role=\"presentation\">    <span class=\"cm-keyword\">array<\/span>(<\/span>\n<span role=\"presentation\">            <span class=\"cm-comment\">\/\/ Definition of the container layout scheme.<\/span><\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"id\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_INTEGER<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"age\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_INTEGER<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"workclass\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"fnlwgt\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span 
class=\"cm-variable\">GS_TYPE_INTEGER<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"education\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"family\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"occupation\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"relationship\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"race\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"gender\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"nation\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"income_status\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            ),<\/span>\n<span role=\"presentation\">     
<span class=\"cm-variable\">GS_CONTAINER_COLLECTION<\/span>);<\/span><\/pre>\n<h3>Data Insertion<\/h3>\n<p><span class=\"md-plain\">Now that the container has been created, it is time to fill it. Here we&#8217;ll use one of the simplest and most easily modified methods: we call GridDB&#8217;s API via PHP functions and pass values from corresponding arrays.<\/span><span class=\"md-softbreak\"><br \/>\n<\/span><span class=\"md-plain\">To keep this project easy to replicate, we&#8217;ll choose a comparatively small data set. <\/span><span class=\"md-meta-i-c md-link\"><a spellcheck=\"false\" href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Adult\"><span class=\"md-plain\">You can find it here.<\/span><\/a><\/span><span class=\"md-plain\"> That said, given the hardware and time to train a much larger model, GridDB&#8217;s scalability would let you employ significantly more data just as easily. <\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-keyword\">for<\/span>(<span class=\"cm-variable-2\">$i<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-number\">0<\/span>; <span class=\"cm-variable-2\">$i<\/span> <span class=\"cm-operator\">&lt;<\/span> <span class=\"cm-variable-2\">$row_count<\/span>; <span class=\"cm-variable-2\">$i<\/span><span class=\"cm-operator\">++<\/span>){<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Creation of a new row.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>] <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$collection<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">create_row<\/span>();<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Setting the fields of the 
row.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_integer<\/span>(<span class=\"cm-number\">0<\/span>, <span class=\"cm-variable-2\">$i<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_integer<\/span>(<span class=\"cm-number\">1<\/span>, <span class=\"cm-variable-2\">$age_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">2<\/span>, <span class=\"cm-variable-2\">$emp_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_integer<\/span>(<span class=\"cm-number\">3<\/span>, <span class=\"cm-variable-2\">$wgt_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">4<\/span>, <span class=\"cm-variable-2\">$edc_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span 
class=\"cm-number\">5<\/span>, <span class=\"cm-variable-2\">$fam_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">6<\/span>, <span class=\"cm-variable-2\">$job_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">7<\/span>, <span class=\"cm-variable-2\">$rel_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">8<\/span>, <span class=\"cm-variable-2\">$rce_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">9<\/span>, <span class=\"cm-variable-2\">$gnd_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">10<\/span>, <span class=\"cm-variable-2\">$cid_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row_list<\/span>[<span 
class=\"cm-variable-2\">$i<\/span>]<span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_field_by_string<\/span>(<span class=\"cm-number\">11<\/span>, <span class=\"cm-variable-2\">$inc_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Place the new row in the container.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$collection<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">put_row<\/span>(<span class=\"cm-variable-2\">$row_list<\/span>[<span class=\"cm-variable-2\">$i<\/span>]);<\/span>\n<span role=\"presentation\">}\n<\/span><\/pre>\n<h3>Data Modification<\/h3>\n<p><span class=\"md-plain\">If you realise later on that some of this data is not needed, it can of course simply be deleted again. Similarly, if something is missing, it can just be added later on.<\/span><span class=\"md-softbreak\"><br \/>\n<\/span><span class=\"md-plain\">To delete or add a column, the put_container function is called again with the updated scheme, and &#8220;true&#8221; is passed as an additional argument to mark the call as an update.<\/span><span class=\"md-softbreak\"><br \/>\n<\/span><span class=\"md-plain\">Shown below is the deletion of an unwanted column (compared with the scheme above, the fnlwgt column is removed).<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-variable-2\">$collection<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$gridstore<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">put_container<\/span>(<span class=\"cm-variable-2\">$container_name<\/span>,<\/span>\n<span role=\"presentation\">    <span class=\"cm-keyword\">array<\/span>(<\/span>\n<span role=\"presentation\">            <span 
class=\"cm-comment\">\/\/ Definition of the new container layout scheme.<\/span><\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"id\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_INTEGER<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"age\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_INTEGER<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"workclass\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"education\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"family\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"occupation\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"relationship\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"race\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span 
class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"gender\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"nation\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            <span class=\"cm-keyword\">array<\/span>(<span class=\"cm-string\">\"income_status\"<\/span> <span class=\"cm-operator\">=&gt;<\/span> <span class=\"cm-variable\">GS_TYPE_STRING<\/span>),<\/span>\n<span role=\"presentation\">            ),<\/span>\n<span role=\"presentation\">      <span class=\"cm-variable\">GS_CONTAINER_COLLECTION<\/span>, <span class=\"cm-atom\">true<\/span>); <\/span><\/pre>\n<p><span class=\"md-plain\">One thing to note is that you cannot change the type of a column. Instead, create a new column with a different name. Switching types would force the database to perform implicit type conversions, an error-prone operation that is therefore not supported.<\/span><\/p>\n<p><span class=\"md-plain\">More rows can be inserted in exactly the same way as shown previously; see the section &#8220;Data Insertion&#8221;. <\/span><\/p>\n<p><span class=\"md-plain\">To delete a row, it must first be fetched. As before, we first select the container and then the row by means of an appropriate TQL query. Note that auto-commit has to be set to false before the query and fetch commands are executed, and that the rows are fetched for update ($update is assumed to be set to true here). Finally, the selected row is deleted. 
<\/span><span class=\"md-plain\">We will use this to delete all rows with incomplete data entries, which the data set marks with a &#8220;?&#8221;.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-variable-2\">$container_name<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-string\">\"income_ml\"<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$query_string<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-string\">\"SELECT * WHERE workclass='?' OR education='?' OR family='?'<\/span><\/span>\n<span role=\"presentation\">        <span class=\"cm-string\">OR occupation='?' OR relationship='?' OR race='?' OR gender='?' OR nation='?'\"<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$collection<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$gridstore<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_container<\/span>(<span class=\"cm-variable-2\">$container_name<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$collection<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">set_auto_commit<\/span>(<span class=\"cm-atom\">false<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$query<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$collection<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">query<\/span>(<span class=\"cm-variable-2\">$query_string<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$rows<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$query<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">fetch<\/span>(<span class=\"cm-variable-2\">$update<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-keyword\">while<\/span> (<span 
class=\"cm-variable-2\">$rows<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">has_next<\/span>()) {<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Select a row.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$row<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$collection<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">create_row<\/span>();<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$rows<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_next<\/span>(<span class=\"cm-variable-2\">$row<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Delete the selected row.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$rows<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">delete_current<\/span>();<\/span>\n<span role=\"presentation\">}<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$collection<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">commit<\/span>();<\/span><\/pre>\n<p><span class=\"md-plain\">Rows can also be deleted by specifying the row key instead of performing a search; for details consult the <a href=\"http:\/\/docs.griddb.net\/GridDB_Java_API_Reference.html\">GridDB documentation<\/a>.<\/span><\/p>\n<h2>No Machine Learning without Data Bias<\/h2>\n<p><span class=\"md-plain\">With GridDB we can easily get an overview of the data that is in the database. Using one of the many chart libraries that exist for PHP, we can also visualize it to help our understanding. The source code of all the charts in this blog is available <a href=\"https:\/\/griddb.net\/en\/download\/26658\">here<\/a>. 
Furthermore you can download the chart library needed to create these, and read its documentation on the website of <a href=\"https:\/\/www.fusioncharts.com\/php-charts?framework=php\">FusionCharts<\/a>.<\/span><\/p>\n<p>Using said charts to take<span class=\"md-plain\"> a deeper look shows the data is biased in various ways. Comparatively few subjects are women. Especially, there are significantly fewer female subjects with a high income.<\/span><\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter wp-image-26584 size-full\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\" alt=\"Chart showing female\/male bias in the data for the machine learning algorithm.\" width=\"853\" height=\"447\" srcset=\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png 853w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_fm-300x157.png 300w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_fm-768x402.png 768w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_fm-600x314.png 600w\" sizes=\"(max-width: 853px) 100vw, 853px\" \/><\/p>\n<p><span class=\"md-plain\">Other problematic data points are work class, race and nationality. 
All of which feature a dominant subgroup.<\/span><\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-26598 size-full\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_wcl_rce_nat.png\" alt=\"Charts showing biases that might drastically affect the machine learning algorithm.\" width=\"936\" height=\"611\" srcset=\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_wcl_rce_nat.png 936w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_wcl_rce_nat-300x196.png 300w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_wcl_rce_nat-768x501.png 768w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_wcl_rce_nat-600x392.png 600w\" sizes=\"(max-width: 936px) 100vw, 936px\" \/><\/p>\n<p><span class=\"md-plain\">A better distribution can be found in regards to education, family, occupation and relationships.<\/span><\/p>\n<p class=\"md-end-block md-p\" style=\"box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; white-space: pre-wrap; position: relative; color: #333333; font-family: 'Open Sans', 'Clear Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;\"><img decoding=\"async\" class=\"aligncenter wp-image-26599 size-full\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_edu_fam.png\" alt=\"Charts showing education and family biases.\" width=\"931\" height=\"540\" srcset=\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_edu_fam.png 931w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_edu_fam-300x174.png 300w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_edu_fam-768x445.png 768w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_edu_fam-600x348.png 600w\" sizes=\"(max-width: 931px) 100vw, 931px\" 
\/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26600 size-full\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_occ_rel.png\" alt=\"Charts showing occupation and relationship biases.\" width=\"935\" height=\"458\" srcset=\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_occ_rel.png 935w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_occ_rel-300x147.png 300w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_occ_rel-768x376.png 768w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_occ_rel-600x294.png 600w\" sizes=\"(max-width: 935px) 100vw, 935px\" \/><\/p>\n<h3>Adjusting the Biases<\/h3>\n<p><span class=\"md-plain\">With the help of GridDB and TQL we can shift the distribution of the data that we will feed into the machine learning algorithm. For example, we might want to balance the number of female and male data points. For this we define a $train_query_limit variable, which specifies how many rows matching the query are fetched. Using the same limit in both queries lets us retrieve an equal number of data points for each gender.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"> <span class=\"cm-variable-2\">$query_list<\/span>[<span class=\"cm-number\">0<\/span>] <span class=\"cm-operator\">=<\/span> <span class=\"cm-string\">\"SELECT * WHERE (gender='Male') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$train_query_limit<\/span>;<\/span>\n<span role=\"presentation\"> <span class=\"cm-variable-2\">$query_list<\/span>[<span class=\"cm-number\">1<\/span>] <span class=\"cm-operator\">=<\/span> <span class=\"cm-string\">\"SELECT * WHERE (gender='Female') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$train_query_limit<\/span>;<\/span><\/pre>\n<p><span class=\"md-plain\">Fetching data is analogous to writing data to the database. 
The columns are in the same order <\/span><span class=\"md-plain\">as they were set in the schema.<\/span><span class=\"md-softbreak\"><br \/>\n<\/span><span class=\"md-plain\">Before doing so it might be necessary to increase the PHP memory limit, depending on your php.ini <\/span><span class=\"md-plain\">settings. You can either raise the limit to a larger value or simply <\/span><span class=\"md-plain\">remove it entirely. Considering that the machine learning algorithm will require even more memory later on, the code below specifies no limit.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-comment\">\/\/ Remove the memory limit temporarily.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-builtin\">ini_set<\/span>(<span class=\"cm-string\">'memory_limit'<\/span>, <span class=\"cm-string\">'-1'<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Loop through all rows returned by the query.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-keyword\">while<\/span> (<span class=\"cm-variable-2\">$rows<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">has_next<\/span>()) {<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Fetch the next row.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$rows<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_next<\/span>(<span class=\"cm-variable-2\">$row<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Write fields into temporary variables.<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$age<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span 
class=\"cm-variable\">get_field_as_integer<\/span>(<span class=\"cm-number\">1<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$employement<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">2<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$education<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">3<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$family<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">4<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$occupation<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">5<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$relationship<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">6<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$race<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">7<\/span>);<\/span>\n<span role=\"presentation\">    <span 
class=\"cm-variable-2\">$gender<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">8<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$country<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">9<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable-2\">$income_status<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$row<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">get_field_as_string<\/span>(<span class=\"cm-number\">10<\/span>);<\/span>\n<span role=\"presentation\">    <span class=\"cm-comment\">\/\/ Write to array.<\/span><\/span>\n<span role=\"presentation\">    ...<\/span>\n<span role=\"presentation\">}<\/span><\/pre>\n<p><span class=\"md-plain\">Collecting the data points into two arrays, one for the input data called <\/span><span class=\"md-pair-s\" spellcheck=\"false\"><code>$samples<\/code><\/span><span class=\"md-plain\"> and one for the expected results called <\/span><span class=\"md-pair-s\" spellcheck=\"false\"><code>$targets<\/code><\/span><span class=\"md-plain\">, allows us to then feed them to the machine learning algorithm.<\/span><\/p>\n<h2>Training and Prediction<\/h2>\n<p>A<span class=\"md-plain\"> support vector machine is the machine learning algorithm deployed here; the library refers to it as SVC (support vector classification). In this blog we will use it with an RBF kernel. 
Naturally, other kernels and algorithms are available in the php-ai\/php-ml library.<\/span><\/p>\n<p><span class=\"md-plain\">For the algorithm to understand the data, the data must be represented as a feature array. For that purpose the raw data undergoes vectorization and tf-idf transformation. Afterwards a small portion of the training dataset is set aside as a test dataset to provide a first impression of the model&#8217;s performance. If that performance is unsatisfactory, the algorithm&#8217;s parameters need to be adjusted accordingly.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-comment\">\/\/ Tokenizing the raw data.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$vectorizer<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-keyword\">new<\/span> <span class=\"cm-variable\">TokenCountVectorizer<\/span>(<span class=\"cm-keyword\">new<\/span> <span class=\"cm-variable\">WordTokenizer<\/span>());<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$vectorizer<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">fit<\/span>(<span class=\"cm-variable-2\">$samples<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$vectorizer<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">transform<\/span>(<span class=\"cm-variable-2\">$samples<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$tf_idf_transformer<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-keyword\">new<\/span> <span class=\"cm-variable\">TfIdfTransformer<\/span>();<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$tf_idf_transformer<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">fit<\/span>(<span class=\"cm-variable-2\">$samples<\/span>);<\/span>\n<span role=\"presentation\"><span 
class=\"cm-variable-2\">$tf_idf_transformer<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">transform<\/span>(<span class=\"cm-variable-2\">$samples<\/span>);<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Setting up the train and test datasets.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$dataset<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-keyword\">new<\/span> <span class=\"cm-variable\">ArrayDataset<\/span>(<span class=\"cm-variable-2\">$samples<\/span>, <span class=\"cm-variable-2\">$targets<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$random_split<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-keyword\">new<\/span> <span class=\"cm-variable\">StratifiedRandomSplit<\/span>(<span class=\"cm-variable-2\">$dataset<\/span>, <span class=\"cm-number\">0.1<\/span>);<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Selection of the algorithm and its kernel.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$classifier<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-keyword\">new<\/span> <span class=\"cm-variable\">SVC<\/span>(<\/span>\n<span role=\"presentation\">    <span class=\"cm-variable\">Kernel<\/span>::<span class=\"cm-variable\">RBF<\/span>,    <span class=\"cm-comment\">\/\/ kernel<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-number\">1.0<\/span>,            <span class=\"cm-comment\">\/\/ cost<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-number\">3<\/span>,              <span 
class=\"cm-comment\">\/\/ degree<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-atom\">null<\/span>,           <span class=\"cm-comment\">\/\/ gamma<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-number\">0.0<\/span>,            <span class=\"cm-comment\">\/\/ coef 0<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-number\">0.001<\/span>,          <span class=\"cm-comment\">\/\/ tolerance<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-number\">100<\/span>,            <span class=\"cm-comment\">\/\/ cache size<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-atom\">true<\/span>,           <span class=\"cm-comment\">\/\/ use shrinking<\/span><\/span>\n<span role=\"presentation\">    <span class=\"cm-atom\">false<\/span>           <span class=\"cm-comment\">\/\/ generate probability estimates<\/span><\/span>\n<span role=\"presentation\">);<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Train and prediction.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$classifier<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">train<\/span>(<span class=\"cm-variable-2\">$random_split<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">getTrainSamples<\/span>(), <span class=\"cm-variable-2\">$random_split<\/span><span class=\"cm-operator\">-&gt;<\/span><span 
class=\"cm-variable\">getTrainLabels<\/span>());<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$predicted_labels<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$classifier<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">predict<\/span>(<span class=\"cm-variable-2\">$random_split<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">getTestSamples<\/span>());<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Accuracy score:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$accuracy<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable\">Accuracy<\/span>::<span class=\"cm-variable\">score<\/span>(<span class=\"cm-variable-2\">$predicted_labels<\/span>, <span class=\"cm-variable-2\">$random_split<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">getTestLabels<\/span>());<\/span><\/pre>\n<h3>Validation of the Machine Learning Algorithm<\/h3>\n<p><span class=\"md-plain\">Once a good accuracy score is reported on the test dataset, it is time to check the model against a validation dataset. We can create such a dataset easily by fetching additional data points from GridDB. All data points in a validation set should be unknown to the model, so it is important to ensure that the chosen query returns no data points that have already been used. <\/span><\/p>\n<p><span class=\"md-plain\">This can be achieved, for example, with GridDB&#8217;s indexes, row keys or the OFFSET keyword.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-comment\">\/\/ Train\/Test dataset queries:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (gender='Male') LIMIT \"<\/span> . 
<span class=\"cm-variable-2\">$train_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (gender='Female') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$train_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Validation dataset queries (TQL expects LIMIT before OFFSET):<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (gender='Male') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span> . <span class=\"cm-string\">\" OFFSET \"<\/span> . <span class=\"cm-variable-2\">$train_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (gender='Female') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span> . <span class=\"cm-string\">\" OFFSET \"<\/span> . <span class=\"cm-variable-2\">$train_query_limit<\/span>;<\/span><\/pre>\n<p><span class=\"md-plain\">The model can only interpret the new dataset once it has been mapped onto the feature array that was previously created from the training data.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-comment\">\/\/ Apply the transformations to the validation data.<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$vectorizer<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">transform<\/span>(<span class=\"cm-variable-2\">$validation_samples<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$tf_idf_transformer<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">transform<\/span>(<span class=\"cm-variable-2\">$validation_samples<\/span>);<\/span>\n<span 
role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Calculate accuracy score for the validation data:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$validation_predicted_labels<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable-2\">$classifier<\/span><span class=\"cm-operator\">-&gt;<\/span><span class=\"cm-variable\">predict<\/span>(<span class=\"cm-variable-2\">$validation_samples<\/span>);<\/span>\n<span role=\"presentation\"><span class=\"cm-variable-2\">$accuracy<\/span> <span class=\"cm-operator\">=<\/span> <span class=\"cm-variable\">Accuracy<\/span>::<span class=\"cm-variable\">score<\/span>(<span class=\"cm-variable-2\">$validation_targets<\/span>, <span class=\"cm-variable-2\">$validation_predicted_labels<\/span>);<\/span><\/pre>\n<h3>Exploring the Model<\/h3>\n<p><span class=\"md-plain\">To learn more about the created model we will want to test it against various validation datasets. For example, we could choose to validate it on specific subjects grouped by education. This potentially tells us not only about the predictive capabilities of our model, but also about the importance of these characteristics in relation to income.<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (education='HS-grad') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (education='Some-college') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (education='Bachelor') LIMIT \"<\/span> . 
<span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><\/span><\/pre>\n<p><span class=\"md-plain\">The chart below depicts the achieved accuracies. While the test accuracies remain more or less constant, we can see a large variation in the validation accuracies. This shows that the model does indeed make different classification choices based on education. A look at the accuracies reported for a general dataset, one that contains all education levels, furthermore illustrates the practical difference between the test and validation datasets. Predictions are not as accurate for the latter set. The vectorizer takes the test dataset into account, as part of the larger training set, during the construction of the feature array. However, it has no knowledge of the validation set and thus cannot necessarily map it completely onto the feature array. <\/span><span class=\"md-plain\">This accuracy gap can be used to adjust bias and variance to achieve a better fit.<\/span><\/p>\n<p><span class=\"md-plain\">The model performs particularly well for high school graduates and rather poorly for Bachelor degree holders. It might overvalue education as an indicator in the latter case. 
Therefore it struggles if there&#8217;s a roughly equal likelihood for high or low income in regards to a specific education.<\/span><\/p>\n<p class=\"md-end-block md-p\" style=\"box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; white-space: pre-wrap; position: relative; color: #333333; font-family: 'Open Sans', 'Clear Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26601 size-full\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_education.png\" alt=\"Chart showing machine learning model accuracy of data subsets by education.\" width=\"906\" height=\"678\" srcset=\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_education.png 906w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_education-300x225.png 300w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_education-768x575.png 768w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_education-600x449.png 600w\" sizes=\"(max-width: 906px) 100vw, 906px\" \/><\/p>\n<h3>Other Data Subsets<\/h3>\n<p><span class=\"md-plain\">Now let&#8217;s take a look at what else we can learn. We do so with the help of a more complicated database query. Here we essentially group together several data subsets. These groupings can be just as helpful as the specific selection we performed before. 
<\/span><\/p>\n<pre class=\"md-fences md-end-block ty-contain-cm modeLoaded\" lang=\"php\" spellcheck=\"false\"><span role=\"presentation\"><span class=\"cm-comment\">\/\/ Married households:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (familiy='Married-civ-spouse' OR familiy='Married-AF-spouse') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Single households:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (familiy='Divorced' OR familiy='Never-married' OR familiy='Separated' OR familiy='Widowed') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Physical laborers:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (occupation='Craft-repair' OR occupation='Handlers-cleaners' OR occupation='Farming-fishing' OR occupation='Priv-house-serv') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><\/span>\n<span role=\"presentation\"><span class=\"cm-comment\">\/\/ Outliers of the dataset:<\/span><\/span>\n<span role=\"presentation\"><span class=\"cm-string\">\"SELECT * WHERE (NOT race='White' AND NOT nation='United-States' AND NOT nation='Outlying-US(Guam-USVI-etc)') LIMIT \"<\/span> . <span class=\"cm-variable-2\">$validation_query_limit<\/span>;<\/span>\n<span role=\"presentation\"><\/span><\/pre>\n<p><span class=\"md-plain md-expand\">Just like before we can see a wide variety of reported accuracies. 
The model can predict the income of single households very accurately, most likely because it knows the majority of variables concerning such a household. The opposite holds true for a married household. The model is not aware of important factors such as the income of the partner, and thus performs poorly without further adjustments to compensate. The accuracy for the group titled physical laborers resembles that of a general dataset. Finally, the outlier group is remarkably unremarkable. One could tentatively conclude that, within this dataset, race and origin play only a minor role with regard to the income of our subjects, which would also explain the small difference in the measured accuracy.<\/span><\/p>\n<p class=\"md-end-block md-p\" style=\"box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; white-space: pre-wrap; position: relative; color: #333333; font-family: 'Open Sans', 'Clear Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26602 size-full\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_various.png\" alt=\"Charts showing occupation and relationship biases.\" width=\"870\" height=\"673\" srcset=\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_various.png 870w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_various-300x232.png 300w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_various-768x594.png 768w, \/wp-content\/uploads\/2020\/06\/php_ml_blog_accuracy_various-600x464.png 600w\" sizes=\"(max-width: 870px) 100vw, 870px\" \/><\/p>\n<h2 class=\"md-end-block md-heading\"><span class=\"md-plain\">In 
Machine Learning Data makes a Difference<\/span><\/h2>\n<p><span class=\"md-plain md-expand\">In this very rudimentary introduction we&#8217;ve seen how GridDB can facilitate the application of machine learning algorithms in php. And we&#8217;ve seen the effect of input data on the accuracy of our model. Fast, reliable, and specific data acquisition is key to modern machine learning and as such often an issue. Given the right tools however, it is achievable with relative ease. If you are interested in exploring the model or the charts further the source code is available <a href=\"https:\/\/griddb.net\/en\/download\/26658\">here<\/a>. <\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The output of any given model in machine learning is only ever as good as its input data. As such it becomes crucial that the algorithm creating it can be fed, at will, with specific data. Data which possibly needs to fulfil a large variety of criteria. And that&#8217;s what databases are for. But not [&hellip;]<\/p>\n","protected":false},"author":780,"featured_media":26584,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46606","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Machine Learning with PHP &amp; GridDB | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"The output of any given model in machine learning is only ever as good as its input data. 
As such it becomes crucial that the algorithm creating it can be\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Machine Learning with PHP &amp; GridDB | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"The output of any given model in machine learning is only ever as good as its input data. As such it becomes crucial that the algorithm creating it can be\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-06-25T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:54:57+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\" \/>\n\t<meta property=\"og:image:width\" content=\"853\" \/>\n\t<meta property=\"og:image:height\" content=\"447\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Patrick Ludwig\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Patrick Ludwig\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\"},\"author\":{\"name\":\"Patrick Ludwig\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/482ea4ed5d62f2dcd0dc0c7691a6df9f\"},\"headline\":\"Machine Learning with PHP &#038; GridDB\",\"datePublished\":\"2020-06-25T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:54:57+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\"},\"wordCount\":1709,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\",\"url\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\",\"name\":\"Machine Learning with PHP & GridDB | GridDB: Open Source Time Series Database for IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\",\"datePublished\":\"2020-06-25T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:54:57+00:00\",\"description\":\"The 
output of any given model in machine learning is only ever as good as its input data. As such it becomes crucial that the algorithm creating it can be\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage\",\"url\":\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\",\"contentUrl\":\"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png\",\"width\":853,\"height\":447,\"caption\":\"Censor data distribution chart concerning female\/male distribution.\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb.net\/en\/#website\",\"url\":\"https:\/\/griddb.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcom
munity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/482ea4ed5d62f2dcd0dc0c7691a6df9f\",\"name\":\"Patrick Ludwig\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a011857002d99311e3fd68ebb5678f5c50c3b6062305eefee5903bf7f55a70e7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a011857002d99311e3fd68ebb5678f5c50c3b6062305eefee5903bf7f55a70e7?s=96&d=mm&r=g\",\"caption\":\"Patrick Ludwig\"},\"url\":\"https:\/\/www.griddb.net\/en\/author\/patrick\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Machine Learning with PHP & GridDB | GridDB: Open Source Time Series Database for IoT","description":"The output of any given model in machine learning is only ever as good as its input data. As such it becomes crucial that the algorithm creating it can be","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/","og_locale":"en_US","og_type":"article","og_title":"Machine Learning with PHP & GridDB | GridDB: Open Source Time Series Database for IoT","og_description":"The output of any given model in machine learning is only ever as good as its input data. 
As such it becomes crucial that the algorithm creating it can be","og_url":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/","og_site_name":"GridDB: Open Source Time Series Database for IoT","article_publisher":"https:\/\/www.facebook.com\/griddbcommunity\/","article_published_time":"2020-06-25T07:00:00+00:00","article_modified_time":"2025-11-13T20:54:57+00:00","og_image":[{"width":853,"height":447,"url":"https:\/\/www.griddb.net\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png","type":"image\/png"}],"author":"Patrick Ludwig","twitter_card":"summary_large_image","twitter_creator":"@GridDBCommunity","twitter_site":"@GridDBCommunity","twitter_misc":{"Written by":"Patrick Ludwig","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#article","isPartOf":{"@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/"},"author":{"name":"Patrick Ludwig","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/482ea4ed5d62f2dcd0dc0c7691a6df9f"},"headline":"Machine Learning with PHP &#038; GridDB","datePublished":"2020-06-25T07:00:00+00:00","dateModified":"2025-11-13T20:54:57+00:00","mainEntityOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/"},"wordCount":1709,"commentCount":0,"publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png","articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/","url":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/","name":"Machine Learning with PHP & GridDB | GridDB: Open Source 
Time Series Database for IoT","isPartOf":{"@id":"https:\/\/griddb.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png","datePublished":"2020-06-25T07:00:00+00:00","dateModified":"2025-11-13T20:54:57+00:00","description":"The output of any given model in machine learning is only ever as good as its input data. As such it becomes crucial that the algorithm creating it can be","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/blog\/machine-learning-php-griddb\/#primaryimage","url":"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png","contentUrl":"\/wp-content\/uploads\/2020\/06\/php_ml_blog_fm.png","width":853,"height":447,"caption":"Censor data distribution chart concerning female\/male distribution."},{"@type":"WebSite","@id":"https:\/\/griddb.net\/en\/#website","url":"https:\/\/griddb.net\/en\/","name":"GridDB: Open Source Time Series Database for IoT","description":"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL","publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/482ea4ed5d62f2dcd0dc0c7691a6df9f","name":"Patrick Ludwig","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a011857002d99311e3fd68ebb5678f5c50c3b6062305eefee5903bf7f55a70e7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a011857002d99311e3fd68ebb5678f5c50c3b6062305eefee5903bf7f55a70e7?s=96&d=mm&r=g","caption":"Patrick 
Ludwig"},"url":"https:\/\/www.griddb.net\/en\/author\/patrick\/"}]}},"_links":{"self":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts\/46606","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/users\/780"}],"replies":[{"embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/comments?post=46606"}],"version-history":[{"count":1,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts\/46606\/revisions"}],"predecessor-version":[{"id":51290,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/posts\/46606\/revisions\/51290"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/media\/26584"}],"wp:attachment":[{"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/media?parent=46606"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/categories?post=46606"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.griddb.net\/en\/wp-json\/wp\/v2\/tags?post=46606"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}