2011-08-16

How to Quickly Create a Histogram in MySQL

This is a post about a super quick-and-dirty way to create a histogram in MySQL for numeric values.
There are multiple other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again since it's just so easy to modify for each use case, and so short and concise.
This is how you do it:
SELECT ROUND(numeric_value, -2)    AS bucket,
       COUNT(*)                    AS COUNT,
       RPAD('', LN(COUNT(*)), '*') AS bar
FROM   my_table
GROUP  BY bucket;
Just change numeric_value to whatever your column is, change the rounding increment, and that's it. I've made the bars to be in logarithmic scale, so that they don't grow too much when you have large values.
This is an example of such query on some random data that looks pretty sweet. Good enough for a quick evaluation of the data.
+--------+----------+-----------------+
| bucket | count    | bar             |
+--------+----------+-----------------+
|   -500 |        1 |                 |
|   -400 |        2 | *               |
|   -300 |        2 | *               |
|   -200 |        9 | **              |
|   -100 |       52 | ****            |
|      0 |  5310766 | *************** |
|    100 |    20779 | **********      |
|    200 |     1865 | ********        |
|    300 |      527 | ******          |
|    400 |      170 | *****           |
|    500 |       79 | ****            |
|    600 |       63 | ****            |
|    700 |       35 | ****            |
|    800 |       14 | ***             |
|    900 |       15 | ***             |
|   1000 |        6 | **              |
|   1100 |        7 | **              |
|   1200 |        8 | **              |
|   1300 |        5 | **              |
|   1400 |        2 | *               |
|   1500 |        4 | *               |
+--------+----------+-----------------+
Some notes:
Ranges that have no match will not appear in the count - you will not have a zero in the count column. Also, I'm using the ROUND function here. You can just as easily replace it with TRUNCATE if you feel it makes more sense to you.

2011-08-10

How to Kill All Connections to MongoDB

This post is similar to the previous one I have posted before for MySQL. Only this time, it's for MongoDB.
What's so nice about MongoDB is that the command line is actually a fully fledged javascript console. So you can do this:
(the embedded script is not visible from the RSS feed though, so click through to the post itself)
The function accepts two variables, both optional.
The first one is the namespace for which you'd like the connections to be working with. It's optional, but you'd usually want to specify it. If you don't, you'll kill all the connections, including the ones that perform replication duties.
The second one is a "wet" run parameter, that if set to true actually does the killing. The default is false, thus not doing anything but print the connections to be killed to your console.

The whole thing is a work in progress, plus my javascript skills are far from perfect, so comments here or on github are welcome.
A potential issue I've already noticed is that when some operations are in a yeild state, the namespace's first letter is prefixed with a question mark "?". In that case, the simple namespace filter will probably miss it. But, I didn't want to make the function any more complex than it already is.

2011-08-08

How to Easily See Who's Connected to Your MySQL Server

I'm posting this here since it has been useful for me, and the blog is a nice place to keep public notes.
If you have servers which have multiple application servers connected to them, you often need to see things like who's connected, how many connections they have, and which users. Using SHOW PROCESSLIST doesn't work that well, since it gives you a row for each server.

What we want is an output similar to this:
+-----------------+-----------------+----------+
| host_short      | users           | count(*) |
+-----------------+-----------------+----------+
| slave1          | repl            |        1 |
| slave2          | repl            |        1 |
| localhost       | event_scheduler |        1 |
| 111.111.222.111 | root, foo       |        2 |
| 111.111.222.222 | appuser, bar    |        3 |
| 111.111.222.333 | appuser, moshe  |        9 |
+-----------------+-----------------+----------+
And it is achieved using a simple query such as this one:

SELECT SUBSTRING_INDEX(host, ':', 1) AS host_short,
       GROUP_CONCAT(DISTINCT USER)   AS users,
       COUNT(*)
FROM   information_schema.processlist
GROUP  BY host_short
ORDER  BY COUNT(*),
          host_short;

A final note: I'm not sure what version of MySQL this query needs to function, but it works great on MySQL 5.5, and should work just as well on MySQL 5.1.

2011-08-06

How to Move a MongoDB Collection Between Databases

If you need to move a collection between two MongoDB databases, there is no need to dump and restore your data using mongodump / mongorestore.

In a way, it's pretty similar to what we can do with  MySQL's "RENAME TABLE" command:
db.runCommand({renameCollection:"sourcedb.mycol",to:"targetdb.mycol"})
In the background, MongoDB will dump and restore it automatically. There is no metadata "magic" for such a rename since databases reside in different files on disk. It just saves a bit of work, and it's worth knowing about.

How to Fix Erroneously Named MongoDB Collections

In MongoDB, collections are not supposed to include certain characters like, for example, a hyphen "-" or a slash "/". The manual has this to say on the matter:
Collection names should begin with letters or an underscore and may include numbers; $ is reserved. Collections can be organized in namespaces; these are named groups of collections defined using a dot notation.
The keyword here is "should".
It is possible to create collations with names that do not adhere to these rules. This is not enforced at all. When you try to access such a collection from the shell, things don't quite work as expected:
>db.test-col.find()
Sat Aug  6 14:54:06 ReferenceError: col is not defined (shell):1
This is because the javascript console tries to subtract "col" from "test". I've tried different escaping methods for the rogue hyphen, but couldn't find one that works.
Eventually, I've reverted to using the more "internal" renameCollection command, to rename the collection into something manageable:
>db.runCommand({renameCollection:"test.test-col",to:"test.test_col"})
Which solves the problem completely.

Now you might be wondering: how was this created in the first place? Because if you can't read from this collection, you can't "save" into it from the shell as well.
We have a nice list of commands here which I recommend to anyone who's working with MongoDB to be familiar with. One of these commands is the aptly named "create", which will obediently create a collection with such a name:
>db.runCommand({"create":"test-col"});
{ "ok" : 1 }
I can only assume drivers use this same command to create the collections, which was how it was created in the first place.