MySQL: best practices

This post contains several tips for working with MySQL: how to import data into an empty MySQL database faster, and how to fetch query results in ways that are less error-prone.

Setting connection charset
First, an important tip for people who use international characters in their database (like Lithuanian or Chinese letters). If you use the UTF-8 charset in your application and find that data is not written to or read from the database correctly (most often you will see garbled sequences such as Ä… where letters like ą, č or ę should be), there is one thing you should do to avoid this.
After you connect to your database using mysql_connect(), run this query:

SET NAMES 'utf8'

This will make sure that when MySQL talks to your application, it uses the correct charset. You can read more about this in the MySQL documentation.
In fact, using the UTF-8 charset is a general best practice, not just for international developers, but for everyone.
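With the old mysql_* API this looks like the following minimal sketch (the host and credentials are placeholders; mysql_set_charset() is available from PHP 5.2.3 and is the preferred call, because it also informs the client library about the charset):

```php
// Connect first, then set the connection charset to UTF-8.
$link = mysql_connect('localhost', 'db_user', 'db_password');

if (function_exists('mysql_set_charset'))
    mysql_set_charset('utf8', $link);   // preferred since PHP 5.2.3
else
    mysql_query("SET NAMES 'utf8'", $link);
```

Do this right after connecting, before running any other queries, so every read and write on this connection uses UTF-8.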

Importing data
Say you have to import a MySQL database from an .sql dump file into an empty database. The first approach that comes to mind (and one I've used before, and I'm sure many others use too) is phpMyAdmin (or PMA for short). But you soon notice that as the database gets larger, the import gets much, much slower. The reason is that PMA imports an .sql file by simply executing the queries provided in the dump, which is really the most logical way of doing it. But even though individual queries run fast, when you combine hundreds of thousands of them, the total import time becomes huge. So this way of importing works for small databases (a couple of MB in size). But what do you do if you have to import a large one (a couple of hundred MB)? There are two main ways of improving speed here:

  1. Make sure you are importing the database, using the MyISAM storage engine, and NOT InnoDB.
  2. Use the command-line mysql utility, instead of PMA.

I’ll explain these two methods in detail:

  1. Make sure you are importing the database using the MyISAM storage engine, NOT InnoDB. Importing with the InnoDB engine is very slow. It most likely has to do with InnoDB's transaction capabilities: InnoDB stores data in ways that allow more advanced engine logic and safer retrieval, whereas MyISAM just stores flat table information and has no transaction support as a result. So running a time-critical information system is significantly safer on InnoDB (if you use transactions, of course), but there is a speed penalty for adding or modifying data. So what you should do is (assuming your database does use InnoDB):
    1. Run a find/replace operation on the dump file: search for ENGINE=InnoDB and replace it with ENGINE=MyISAM.
    2. Then, once you have imported the database, change the table engines back to InnoDB (if necessary). You can do that with PMA by selecting a table, going to Operations, and changing the Storage Engine to InnoDB under Table options. Or just run this SQL:

      ALTER TABLE <table-name> ENGINE = InnoDB

      for all the tables. Altering the table storage engine takes some time too, but much less than importing the database with the InnoDB engine in the first place.

  2. Use the command-line mysql utility instead of PMA. If you have shell access to the server, or if you are importing the database on your local development system, just use the command-line mysql utility. It is much, much faster than importing with PMA. First, create an empty database with PMA. Then run a command like this:

    /path/to/mysql -u database_root_user [-p] database_name < "path_to_database_sql_dump.sql"

    You will need the -p switch if your root user has a password set (which it really should). You will then be asked for the password after you hit return. Example (on Windows, with a WAMP installation):

      C:\wamp\bin\mysql\mysql5.0.51b\bin\mysql.exe -u root my_project < "C:\wamp\www\my_project\sql\dump.sql"

    The command line interpreter will seem to stop responding for a while, but what is actually happening is that MySQL is importing your database; after some time the import will finish. A tip for Windows users: after you have typed the first part of the command (up to the < sign), just drag-and-drop the .sql dump file onto the command line window, and the full path to the dump file will be added. Convenient :)
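By the way, the ENGINE find/replace from method 1 is easy to script as well. A minimal PHP sketch (the file names are placeholders, and it reads the whole dump into memory, so it suits dumps that fit in RAM):

```php
// Turn every InnoDB table definition in a dump into a MyISAM one.
function convert_engines($sql)
{
    return str_replace('ENGINE=InnoDB', 'ENGINE=MyISAM', $sql);
}

// Usage: write a converted copy of the dump next to the original.
if (file_exists('dump.sql'))
    file_put_contents('dump.myisam.sql', convert_engines(file_get_contents('dump.sql')));
```

Import the converted copy, then switch the engines back to InnoDB afterwards if needed.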

Some rough statistics: I ran some tests to see the actual time difference of importing a database dump using each of these methods. I imported an .sql dump of about 20 MB into an empty database several times, each time using a different method. It was a pretty general database: 27 tables, some with 0 or 20 rows, some with a couple of thousand, and one with about 24 thousand rows. The total size of the database shown in PMA was 25.3 MiB. The results:

Storage engine (all tables)    Method used                   Import time (approx., +-2 sec.)
InnoDB                         phpMyAdmin                    03:25
InnoDB                         command-line mysql utility    02:21
MyISAM                         phpMyAdmin                    00:31
MyISAM                         command-line mysql utility    00:14

So the difference between importing the dump with the PMA+InnoDB combination (205 sec.) and the mysql+MyISAM combination (14 sec.) is about 1364% (mysql+MyISAM is roughly 14 times faster than PMA+InnoDB). For small databases PMA is OK, but for large ones, mysql+MyISAM is the obvious choice. Also, if you need to convert the database from MyISAM back to InnoDB afterwards, you can use this simple PHP script:

// Replace db_server, db_user, db_password and db_name with your own values.
mysql_connect('db_server', 'db_user', 'db_password');

// List every table in the target database and switch each one to InnoDB.
$res = mysql_query("SELECT DISTINCT table_name FROM information_schema.columns WHERE table_schema = 'db_name'");
if ($res) {
	while ($table = mysql_fetch_assoc($res))
		mysql_query("ALTER TABLE `db_name`.`{$table['table_name']}` ENGINE = InnoDB");
}

Just replace db_server, db_user, db_password and db_name with the correct values. Also, you will need a user that has SELECT privileges on the information_schema database and ALTER privileges on the database you are converting. Converting my test database from MyISAM to InnoDB with this script took 13 seconds, so the total import time using the InnoDB/MyISAM find/replace would be 14 + 13 = 27 seconds. A worthy gain over 205, I think.

Saner data retrieval
Another tip for working with MySQL from PHP is to retrieve query results in a more sane way than the default provided by PHP's mysql_* functions:

  • When fetching an array of rows, if no rows are fetched, return an empty array instead of NULL. You will most likely iterate over this array with a foreach statement, and passing NULL to foreach makes PHP emit a warning; an empty array is the saner value. It is also more coherent with what the function does: you return as many rows as the query produced. If the result contains 10 rows, you return an array of 10 rows; if it contains 1 row, an array of 1 row; and if it contains 0 rows, an array of 0 rows. There really shouldn't be a special case of returning NULL. Just use something like this:
    function get_array($q)
    {
      $arr = array();
    
      $res = mysql_query($q);
      if ($res)
      {
        while ($row = mysql_fetch_assoc($res))
          $arr[] = $row;
      }
    
      return $arr;
    }

    And avoid NULL in general. NULL is evil; it is much better to use the boolean FALSE for signalling an error.

  • And if you expect to return a single row (say, a query that fetches some row by its ID) as an array, return that array on a successful query, and return FALSE on error:
    function get_one($q)
    {
      $one = FALSE;
    
      $res = mysql_query($q);
      if ($res)
      {
        $row = mysql_fetch_assoc($res);
        if (!empty($row))
          $one = $row;
      }
    
      return $one;
    }

These two little things will help you save some time if you use them.
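For example, with the two helper functions above, the calling code needs no NULL checks at all (the table and column names here are made up for illustration):

```php
// get_array() always returns an array, so foreach is always safe,
// even when the query matches zero rows:
foreach (get_array("SELECT id, name FROM users") as $user)
    echo $user['name'], "\n";

// get_one() returns FALSE on error or when no row matches,
// so a strict comparison is all the error handling needed:
$user = get_one("SELECT * FROM users WHERE id = 42");
if ($user === FALSE)
    echo "No such user\n";
```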

A neat thing:

Twitter! It is a social service for writing short status messages (up to 140 characters) and sharing them with others. You can post random thoughts or findings, or simply the mood you are in. You can follow other people, and others can follow you. Google has its official Twitter account: http://twitter.com/google, and even the president of the United States has one: http://twitter.com/BarackObama. What are you doing right now? :) It also appears you can get fired really fast with Twitter: http://www.msnbc.msn.com/id/29796962/

