Amazon just added a new product to its cloud computing offering, alongside EC2 and S3: SimpleDB. It's quite an interesting idea: a key-value store that also understands the internal structure of the value column. They have a new set of names for old ideas. Translated from RDBMS-speak, tables are "domains", rows are "items", and values are "attributes". You can get an item by its identifier (the classic key-value store use case), and you can also query the "domain" for items with specific "attributes".
I really wonder how fast it's going to be. They are kind of vague on performance, beyond promising that it will be "quick", "fast", "high performance" and "real time", but only if you promise to be good and put everything on their cool SLA-less computing cloud.
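To make the vocabulary concrete, here is a toy in-memory model of a "domain" in Python. It is only a sketch of the idea and says nothing about how Amazon actually builds SimpleDB. The class name `Domain`, its methods and the sample data are all made up for illustration, and it assumes one value per attribute and equality-only conditions.

```python
from collections import defaultdict


class Domain:
    """Toy stand-in for a "domain" (roughly, a table). Hypothetical, not Amazon's API."""

    def __init__(self):
        self.items = {}                # item id -> {attribute name: value}
        self.index = defaultdict(set)  # (attribute name, value) -> set of item ids

    def put(self, item_id, **attributes):
        # Drop stale index entries if the item is being overwritten.
        for pair in self.items.get(item_id, {}).items():
            self.index[pair].discard(item_id)
        self.items[item_id] = dict(attributes)
        # Index every attribute, so any of them can be queried later.
        for pair in attributes.items():
            self.index[pair].add(item_id)

    def get(self, item_id):
        # The classic key-value lookup by identifier.
        return self.items.get(item_id)

    def query(self, **conditions):
        # An AND of equality conditions is an intersection of index entries.
        matches = [self.index.get(pair, set()) for pair in conditions.items()]
        if not matches:
            return sorted(self.items)
        return sorted(set.intersection(*matches))


shirts = Domain()
shirts.put("456", color="blue", description="dress shirt")
shirts.put("457", color="white", description="dress shirt")

print(shirts.get("456"))    # {'color': 'blue', 'description': 'dress shirt'}
print(shirts.query(color="blue", description="dress shirt"))    # ['456']
```

Indexing every (attribute, value) pair on write is what makes the attribute query cheap here: it never has to look at items that don't match.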
- The data is limited to 10GB for the duration of the beta.
- They mention this interesting fact: "Amazon S3 and Amazon SimpleDB use different types of physical storage. Amazon S3 uses dense storage drives that are optimized for storing larger objects inexpensively. Amazon SimpleDB stores smaller bits of data and uses less dense drives that are optimized for data access speed." In other words, they presumably use fast 10K/15K RPM SCSI drives for SimpleDB and cheap 7,200 RPM SATA drives for S3.
Can we do this using an RDBMS?
Yes, to a degree. You could mimic this system right now with an RDBMS.
"Amazon SimpleDB automatically indexes all of your data, enabling you to easily query for an item based on attributes and their values. In the above example, you could submit a query for items where (color = blue AND description = dress shirt), and Amazon SimpleDB would quickly return item 456 as the result."
Create a table with ID and XML columns, then find matching items by running an XPath query against the XML column. In MySQL this can be done even now, albeit slowly, since each such lookup is a heavy table scan with an XPath expression evaluated for every row. SQL Server has had an XML index type since the 2005 release, which might do just that. It's slower than a regular index on a column, but it works.
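The ID-plus-XML table can be tried out in a few lines of Python. This is a rough sketch under some loud assumptions: SQLite (via the standard `sqlite3` module) stands in for MySQL or SQL Server, and since SQLite has no XPath support the path expression is evaluated in Python for each row with `xml.etree.ElementTree`. That happens to mirror the slow full-table-scan variant described above. The table name `items`, the XML layout and the helper `find_items` are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with the two-column layout: an id and an XML blob.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, xml TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("456", "<item><color>blue</color><description>dress shirt</description></item>"),
        ("457", "<item><color>white</color><description>dress shirt</description></item>"),
    ],
)


def find_items(conn, **conditions):
    """Return ids of rows whose XML matches every attribute=value condition.

    Every row is fetched and parsed, so the cost grows with the table size.
    """
    matches = []
    for item_id, xml in conn.execute("SELECT id, xml FROM items ORDER BY id"):
        root = ET.fromstring(xml)
        # "./name" is the limited XPath subset that ElementTree understands.
        if all(root.findtext(f"./{name}") == value
               for name, value in conditions.items()):
            matches.append(item_id)
    return matches


print(find_items(conn, color="blue", description="dress shirt"))    # ['456']
```

On a real MySQL server the per-row test would presumably be pushed into the WHERE clause with its XML functions (ExtractValue, if I'm not mistaken), but the work done per row stays the same without an index.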