2008-04-26

Using the MySQL Event Scheduler to Emulate Threads

The MySQL Event Schedule is one of the new features in MySQL 5.1. It is explained well in this article from the MySQL site, and in the manual itself. I'd like to demonstrate an interesting way to use it. What gave me this idea is actually a feature in SQL Server 2005, called Service Broker. It's a sort of queue manager / messaging platform, which can also be exploited in the same way.

Lets say you are writing a stored procedure, that loads data into the database. Benchmarks show that loading data in parallel from several threads increases overall throughput, as MySQL is unable to utilize several CPUs to process a single SELECT query, moreover a bulk load operation. To work in parallel, you'd probably need to write a script that uses threads, open several connections to the database, and perform those queries in parallel.

I propose an alternative.

The MySQL Event Scheduler is capable of performing several tasks at once, since it's highly probable that several events might need to run at the same time. Why not use it?
What we will create, is a "self-destructible" event, that performs a task once and disappears. The syntax that allows this is self explanatory:

CREATE EVENT IF NOT EXISTS event_name ON SCHEDULE AT NOW() ON COMPLETION NOT PRESERVE
DO statement_or_batch;

I don't think this is the place to descend into the implementation detail of using it for data loads, so I won't. Note, however, how flexible it is.

  1. You could use the event scheduler to perform several SELECT statements in parallel into some summary or union table, and collect the returned result afterwards.
  2. To see if an event is still running, you can poll the mysql.event system table for the event name, or names (when waiting for several events to complete).
  3. Combine it with the FEDERATED engine. That is one feature which is seriously in need of more development efforts. It is not even included in 5.1.24 due to severe problems. When it goes back to a stable status, you could use events to execute remote queries in your cluster in parallel - with a single command, from a single server.
  4. Use it for an unordered set of commands that your application shouldn't wait for at all, or it doesn't care when it gets done.
  5. Use it to delay execution, just replace NOW() with NOW() + INTERVAL 5 MINUTE, as demonstrated earlier.

The possibilities are endless. Well, not endless, but I'm sure that the event schedule can be tricked into doing a lot more cool things than these... If you can think of anything of that sort, I'd love to hear it.

Some points:

  1. As the mysql.events table is using the MyISAM engine, I'm not sure how transactional adding an event is. Most likely, it isn't.
  2. If you want to test the scheduler, make sure it's turned on (SET GLOBAL EVENT_SCHEDULER = ON).
  3. Each event run adds two lines to the error log (hostname.err) file reporting start and end of the event. This might be a problem similar to leaving the general log on, when using events extensively.

4 comments:

  1. Hi,
    you can use GET_LOCK() and the other lock functions instead of polling mysql.event .
    What's the problem of the table being non-tx with executing & finishing?

    Andrey

    ReplyDelete
  2. GET_LOCK() is a great tool, but I don't see how you can use it in this case. Each connection can only get one lock. You can test for existence of many locks, but you can't assure the order in which the events started. What was your idea?

    Regarding the non transactional nature: if you want to do something like:
    START TRANSACTION;
    CREATE event...
    CREATE event...
    COMMIT;
    And you need to be certain all of the events were added, you can't. Moreover, since it's MyISAM, a crash in the middle of the event creation can even corrupt the table.

    Honestly, I really don't understand why MySQL uses MyISAM for the meta-data database.

    ReplyDelete
  3. Hi again,
    MYISAM is the only built-in engine. In the future MMARlA will be the standard engine.
    Maria will offer ACID.
    You are right about the locks , but probably you can poll using different lock names. For instance, every event puts its lock name includinga tx table. Or just when finishing inserting a row in a tx table.

    Andrey

    ReplyDelete
  4. The future is 6.0+, and it's all too far away... :)
    I'd use the built-in InnoDB for the mysql schema if I were to decide, but MySQL AB probably avoids depending on it internally due to the fact it's owned by Oracle.

    ReplyDelete