Showing posts with label Databases. Show all posts
Showing posts with label Databases. Show all posts

2008-06-17

How to Truncate All or Some of the Tables in a MySQL Database

Truncating all tables in a database is a common problem which arises a lot during testing or debugging.
One of the most common answers to this question is to drop & recreate the database, most likely utilizing the shell. For example, something like this:

mysqldump --add-drop-table --no-data [dbname] | mysql [dbname]

This dumps the entire schema structure to disk, without dumping any data, and with commands for dropping existing tables. When loading it back into mysql, it essentially truncates all the tables in the database. Basically, this is a decent solution for many uses.

We had a requirement for a solution that needed these additional features:

  1. Does not require shell (We work with both Linux and Windows)
  2. Resides inside the MySQL Server (To minimize outside dependencies - for example - the mysql command line client)
  3. Can truncate only specified tables using a some sort of filter

In other words, the main requirement here is encapsulation and simplicity of use for the developers.
Basically, this means I've got no other choice except writing a stored procedure for this purpose, ugly as it may be. Specifically, this will require using the pseudo-dynamic SQL in MySQL (more about this later) and a server-side cursor.

The code of the Stored Procedure:

DELIMITER $$
CREATE PROCEDURE TruncateTables()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE truncate_command VARCHAR(512);
DECLARE truncate_cur
CURSOR FOR /*This is the query which selects the tables we want to truncate*/
SELECT CONCAT('TRUNCATE TABLE ',table_name)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME LIKE 'prefix_%';
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;

OPEN truncate_cur;

truncate_loop: LOOP
FETCH truncate_cur INTO truncate_command;
SET @truncate_command = truncate_command;

IF done THEN
CLOSE truncate_cur;
LEAVE truncate_loop;
END IF;

/*Main part - preparing and executing the statement*/
PREPARE truncate_command_stmt FROM @truncate_command;
EXECUTE truncate_command_stmt;

END LOOP;
END$$

DELIMITER ;

This is a generalized version, you can (and should) modify it to suit your needs.

Regarding dynamic SQL. Initially I've tried to make this query a lot simpler, and use something along the lines of the previous blog post. This would look something like:

DELIMITER $$
CREATE PROCEDURE TruncateTables()
BEGIN /*Build a long user variable containing the varchars*/
SELECT GROUP_CONCAT('TRUNCATE TABLE ',table_name SEPARATOR ';')
INTO @truncate_command
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME LIKE 'prefix_%';

SET @truncate_command = CONCAT(@truncate_command,';');

PREPARE truncate_command_stmt FROM @truncate_command;
EXECUTE truncate_command_stmt;

END$$

DELIMITER ;

The main (and unsolvable) problem with this code is that the PREPARE cannot prepare more than one statement. In other words, while in other databases you could have sent an entire batch (real dynamic SQL), in MySQL the only way to do this is to run statement by statement. On the other hand, there is no GROUP_CONCAT() in other databases (even though it can apparently be created with ease in PostgreSQL).

The second and less worrying problem is the fact that by default, GROUP_CONCAT() result tends to be limited to a certain length. This is easily solved by setting group_concat_max_len to something higher than default.

End Transmission.

2008-05-31

Taking Cache Warm-up Time Into Consideration

Last week there there was a short scheduled downtime for the system, and I've used the chance to upgrade the MySQL servers to a newer version. We've tested it for more than a month before deploying to production, so I wasn't worried about any potential problems.

The upgrade itself went smoothly and was complete in about less than 10 minutes. Speaking of which, this is one of the things I like in MySQL upgrades as opposed to SQL Server.

  1. Copy new version
  2. Shut down the server
  3. Rename directories
  4. Start server
  5. Run mysql_upgrade
  6. Restart server

All of the above usually takes about 2 minutes, and can be reverted quite easily, if you have backup for the data files. This is compared to a SQL Server upgrade, or even a Service Pack install which includes running an installer for what can be 20 minutes or more. Uninstalling is equality lengthy, though both are accompanied by nice and useless progress bars.

After the upgrade was complete, we started the application and it wouldn't work - it was reporting timeouts. Why? Was there a problem with the upgrade? Some bug we haven't found out about? The circumstantial evidence pointed directly to the upgrade as the problem. After some digging by Pasha, the problematic query was found - it was a group by query updating some counter table. It ran in under 5 seconds before, but now it just wouldn't finish in the 30sec timeframe before timing out.

This immediately pointer to the real cause of the problem: the server restart. The buffer cache was flushed, and the index scanned by this query was not in memory. Fixing was easy, I ran the problematic query manually (it took almost 2 minutes to run). After this, the application worked again. Problem solved? Not really.

Queries of this sort should almost never exist in a system. This specific query was long since been noticed due to proactive slow-log monitoring. This problem was (ironically) fixed on the day of this incident, but the newer version was not yet deployed.

Conclusion: Think of how your application works with a cold database cache. If you need to warm it up, do so. A better path is to design the database more efficiently and avoid certain types of queries. In this case, the counter table was now updated on batches of inserts and deletes - instead of periodically.

2008-04-26

Using the MySQL Event Scheduler to Emulate Threads

The MySQL Event Schedule is one of the new features in MySQL 5.1. It is explained well in this article from the MySQL site, and in the manual itself. I'd like to demonstrate an interesting way to use it. What gave me this idea is actually a feature in SQL Server 2005, called Service Broker. It's a sort of queue manager / messaging platform, which can also be exploited in the same way.

Lets say you are writing a stored procedure, that loads data into the database. Benchmarks show that loading data in parallel from several threads increases overall throughput, as MySQL is unable to utilize several CPUs to process a single SELECT query, moreover a bulk load operation. To work in parallel, you'd probably need to write a script that uses threads, open several connections to the database, and perform those queries in parallel.

I propose an alternative.

The MySQL Event Scheduler is capable of performing several tasks at once, since it's highly probable that several events might need to run at the same time. Why not use it?
What we will create, is a "self-destructible" event, that performs a task once and disappears. The syntax that allows this is self explanatory:

CREATE EVENT IF NOT EXISTS event_name ON SCHEDULE AT NOW() ON COMPLETION NOT PRESERVE
DO statement_or_batch;

I don't think this is the place to descend into the implementation detail of using it for data loads, so I won't. Note, however, how flexible it is.

  1. You could use the event scheduler to perform several SELECT statements in parallel into some summary or union table, and collect the returned result afterwards.
  2. To see if an event is still running, you can poll the mysql.event system table for the event name, or names (when waiting for several events to complete).
  3. Combine it with the FEDERATED engine. That is one feature which is seriously in need of more development efforts. It is not even included in 5.1.24 due to severe problems. When it goes back to a stable status, you could use events to execute remote queries in your cluster in parallel - with a single command, from a single server.
  4. Use it for an unordered set of commands that your application shouldn't wait for at all, or it doesn't care when it gets done.
  5. Use it to delay execution, just replace NOW() with NOW() + INTERVAL 5 MINUTE, as demonstrated earlier.

The possibilities are endless. Well, not endless, but I'm sure that the event schedule can be tricked into doing a lot more cool things than these... If you can think of anything of that sort, I'd love to hear it.

Some points:

  1. As the mysql.events table is using the MyISAM engine, I'm not sure how transactional adding an event is. Most likely, it isn't.
  2. If you want to test the scheduler, make sure it's turned on (SET GLOBAL EVENT_SCHEDULER = ON).
  3. Each event run adds two lines to the error log (hostname.err) file reporting start and end of the event. This might be a problem similar to leaving the general log on, when using events extensively.

2008-03-20

Two MySQL Monitoring Tools and a Hidden Knowledge Base

This post is incomplete without mentioning the previous one about development tools, so make sure you read it before/after this.
In this post, I'd like to mention two monitoring tools, and one knowledge base.

Two of the tools are made by Quest Software/ToadSoft that make Toad for MySQL. Both of them are Freeware.
A side note: TOAD was born as "Tool for Oracle Application Developers". Because "Tool for Oracle Application Developers for MySQL" sounds weird, TOAD is known officially as "Tool For Application Developers".

Spotlight for MySQL
Spotlight is a tool perhaps more known among SQL Server and Oracle users. It shows you the entire flow of data throughout your database in a nicely laid out executive-summary kind of way. It has lots of colors and it's impressive to look at. When it works. Seriously, I've had a lot of trouble with it, to make it work the way it should too many times. It's also slow as hell, and tends to hang or crash, like other tools by Quest. But, when it does work - it's very nice. Quest released this tool for MySQL as well, and since it's free it doesn't hurt to try it out.

Knowledge Xpert for MySQL
This is basically a well laid-out knowledge base, of what Quest Software thinks you should know about MySQL. Full of useful examples, and clear explanations for things that the MySQL manual doesn't mention. I recommend installing it and taking a look, you might find something interesting. It was helpful to me when I started learning MySQL.
Warning: practically this is just one big help file, but Quest decided to bundle an annoying interface, a slow installer and one of those load-on-startup-take-memory-do-nothing-system-tray-icon which you need to get rid off after installation.

The next one is made by Webyog, (the same guys that make SQLyog which I mentioned before).

MONyog
MONyog is a small program with a web front-end which can monitor several MySQL servers and supports some customizable metrics. I won't go into detail on the good and bad things in MONyog here (I plan to write a review of it later on), but I do suggest you try the trial version. MONyog is not free, and doesn't have a free or community "lite" version.
At the time of writing the latest stable version is 1.52, but 2.0 beta 2 is already available and brings additional features like historical reports and better graphs.

2008-03-09

SQL Server Data Services Coming Soon

Microsoft decided to jump into Amazon's territory and provide a data storage web service accessible from anywhere on the internet.
It's not live yet, and you have to register to see the beta when it comes - but it looks interesting.

Take a look at the FAQ. Looks interesting. Note the following points (taken from the FAQ):

  • Benefit: Scale, with virtually no restriction on storage
  • Target Customers: Running applications that have limited or batched data access.
  • Typical scenarios include business solutions like HR services, healthcare records management, data archiving and internet facing applications like social networking and picture sharing.
  • The SQL Server Data Storage Service can store multiple types of data, from birth to archival. Users can upload and query structured data, semi-structured data, stored in flexible entities, and unstructured data. Customers will be able to associate entities with large unstructured data objects (blobs) which could be accessed as URL addressable resources.
  • Data is stored in large storage clusters in various Microsoft data centers located across North America. We are planning to offer the service from international locations such as Europe and Asia. Users can group their data into authorities, which are affiliated with specific data centers and therefore provide control over the geographic location of the data.
  • Easy access with REST and SOAP protocols
There is not much information available about it yet, so I'll leave my speculations for later.

2008-02-28

MySQL Date and Time Arithmetic

One thing that sometimes bugs me in MySQL is the huge amount of errata. The database is packed with features and abilities that are side-effects from "historical decisions". Decisions that were made when it wasn't planned to be an enterprise product, or changes made to ease porting applications to MySQL from other databases. Legacy, so to speak.

The timestamp data type is a good example - the timestamp manual section is a long list of exceptions and special cases. I usually re-read the manual page before using it. Another good example is the amount of date and time functions that exist in MySQL. There are way too many of them, some overlapping in function, and some just exist as aliases to other functions for compatibility. Just go over the list.

On the other hand, this artistic freedom has its merits.
For example, take the date arithmetic. Chances are that if you came from SQL Server background like me, you were used to all the variations on the DATEDIFF and DATEADD commands. While it works as well (the DATEADD function is there and works almost the same way), MySQL supports a much readable form of date and time arithmetic.

For example:

SELECT NOW() - INTERVAL 5 MINUTE

Will return, naturally, current date+time minus 5 minutes. Or:

SELECT NOW() + INTERVAL 10 DAY

You guessed it, this will return the current date+time plus 10 days.

Clean and simple, isn't it? The syntax goes like this:

date + INTERVAL expression unit

The unit can be MINUTE, HOUR, DAY, MONTH, YEAR etc. The expression can be any legal T-SQL expression, and the date is just the date variable, which can be a result from another expression, a variable or a table column. Check out the manual for more examples.

This is an elegant solution to a common problem, which is also easy to remember and use. Why so few people use it is a mystery to me - I've almost never seen it in code samples.

2008-01-29

Opaqueness and the Illusion of Greatness

When you don't know how things work, you image them as how you would want them to be, or according to an impression based on a random fact or advertising.

There is this old joke that goes like this:

When I was a child, I slept well since I knew someone was guarding me.
When I was in the army, I didn't sleep since I was the one guarding.
When I left the army, I didn't sleep since I knew who was guarding me.

If you served in the Army, I presume that you already know the joke and/or can relate to how sad it really is.

"What has this got to do with anything?" you wonder? When using open-source vs. closed source software, this is sometimes the case.

I got it when I was reading Jay Pipe's book "Pro MySQL". He mentions there that although the manual describes different parts of MySQL as different and encapsulated components, working beautifully together - in reality, it's all just a big pile of code, tightly coupled together and not always as modular as you'd like to imagine. He also tells of parts of the engine that were apparently written by different teams with different standards, where the code just looks and feels very different from the rest of the system. I didn't dig into the source code myself to this depth, but I believe him.
If you have worked before on the source code of an open source project, this is probably not a surprise for you - but for me it was a sort of a moment of re-enlightenment. The ones writing the database are people just like you and me, and they even write bad code sometimes. Only in MySQL, it's bad code anyone curious enough can see.

I wasn't used to this type of thinking for years, having grown up in the Microsoft closed-source ecosystem. I had no one from SQL Server come to me and tell me that they did some stupid mistakes here and there, and that the code is a mess. Yeah, there were Service Packs change logs with lists of fixed bugs, but I doubt someone in Microsoft would release a statement in the change log that tells how the sort sometimes didn't work (famous bug in InnoDB a while back).
The solid UI, the great documentation and the overall behavior of the database does its best to hide the implementation details from you. Since it was solid on the outside, I assumed it was solid on the inside. It was sort of anti-FUD. I wanted to believe it's as elegant inside as it seems outside.

I said "re-enlightenment" before, since this kind of realization actually did happen to me before, but I didn't let it sink in completely.
I've been working a lot with SQL Server in the past, and if you are familiar with it as well, you know that you can access the source code of all the internal system stored procedures. There are a lot of them, and usually you don't need to dig into their code, unless you're using some cool undocumented trick or trying to find out if something is a bug. I can tell you one thing - much of the code in those stored procedures is extremely ugly. No proper casing/indentation, hard-coded magic numbers everywhere... Maybe later I'll dig up one of these gems to show to the world. I mean yeah, it works, but still... Yuck.

Thing is, this anti-FUD was so strong, I didn't want to believe that it's that ugly on the inside - so I left my "beliefs" then as they were.

Conclusions:

  1. Don't stay in the army.
  2. Bad programmers can write both open and closed source programs.

2008-01-18

D'bah

When people that are not somehow related to computers ask me what I do - I tell them I'm a programmer. This profession has become enough of a commodity in the last years, so that there everyone know what a programmer looks like. The knowledgeable ones ask in which language I do my programming. Then it gets tricky.

-"Well you see, I'm not really a programmer, I'm actually a DBA!"
-"What's a D'bah?"

Now it's down to two options:

  1. Explain what databases are, and what part they take in the products they know and love. Explain why it is different from programming, and why you have a specific person to do it. Dive into the implicit and explicit duties and responsibilities of a DBA, and describe how a person in this spot interacts with the development team and management.
    Usually this a great choice if you want to be left alone or be met with bored looks.
  2. Say "It's like playing with Lego for money." or "It's like herding data."

How can you explain that being a DBA is different from being a programmer? Maybe it shouldn't be? What really defines a DBA's role?
The answer is the typical and classic reply you'll usually get from a DBA: "Well, that depends".

I'll try and go into more detail about this in another post.

2007-12-14

Amazon Announced SimpleDB

Amazon just added a new product to their cloud computing offering besides EC2 and S3 - the SimpleDB. It's quite an interesting idea.  It's a key-value store, which can also understand the internal structure of the value column. They have a  new set of names for old ideas. translated from RDBMS speak - tables are "domains", rows are "items", values are "attributes". You can get an item by it's identifier (classic key-value store example) and you can also query the "domain" for items with specific "attributes".

I really wonder how fast it's going to be. They are kind of vague on the performance, except the fact they promise it to be "quick", "fast", "high performance" and "real time", but only if you promise to be good and put everything on their cool SLA-less computing cloud.

Interesting notes:

  • The data is limited to 10GB for the duration of the beta.
  • They mention this interesting fact: "Amazon S3 and Amazon SimpleDB use different types of physical storage.  Amazon S3 uses dense storage drives that are optimized for storing larger objects inexpensively.  Amazon SimpleDB stores smaller bits of data and uses less dense drives that are optimized for data access speed." In other words, they use fast 10/15K RPM SCSI drives for the DB and cheap 7200 SATA drives for S3.

Can we do this using an RDBMS?

Yes, you can. You could mimic this system right now by using an RDBMS, up to a degree.

"Amazon SimpleDB automatically indexes all of your data, enabling you to easily query for an item based on attributes and their values.  In the above example, you could submit a query for items where (color = blue AND description = dress shirt), and Amazon SimpleDB would quickly return item 456 as the result." 
Create a table with ID,XML columns. Scan the XML column by using an XPath query, for matching items. In MySQL this can be achieved even now - albeit slowly, since each such action is a heavy table scan with an XPath expression processed for each row. In SQL Server, there is a new XML index type since 2005, which might do just that. It's slower than regular indices on columns, but it works.