Showing posts with label benchmarking.

Thursday, July 2, 2009

Benchmarking MongoDB VS. Mysql

EDIT November 2011: Please take these benchmarks with a grain of salt. They are completely non-scientific, and when choosing a data store raw speed is probably not all you care about. Try to understand the trade-offs of each database, and learn about how your data will be handled if you lose a server. What happens if there's data corruption? How does replication work in your specific database… and so on. After all… don't choose one database over another just because of this blog post.

One of the projects I work on at the company has a message system with more than 200 million entries in a MySQL database. We've started to think about how we can scale this further, so in my research for alternatives I looked at NoSQL databases. In this post I would like to share some benchmarks I ran against MongoDB, a document-oriented database, compared to MySQL.

To perform the tests I installed MongoDB as explained here. I also installed PHP, MySQL and Apache2 from MacPorts. I did no special configuration on any of them.

The hardware used for the tests is my good ol' Macbook White:

Model Name: MacBook
Model Identifier: MacBook2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 667 MHz
Boot ROM Version: MB21.00A5.B07
SMC Version (system): 1.13f3

Because I don't have enough space to store the MongoDB database on my local hard drive, I launched the server with this command:
./bin/mongod --dbpath '/Volumes/alvaro/data/db/'

which tells MongoDB to use my USB hard drive. YES, my USB hard drive :-P

The MySQL server stored its data on the local hard drive.

What was the test?

I loaded 2 million records from the real data of our message system into both databases. Every record has 28 columns, holding information about the sender and the recipient of the message, plus the subject, date, etc. For MySQL I used mysqldump. For MongoDB I used the following:

// stream every row from MySQL and insert it as a document into MongoDB
$query = "SELECT * FROM message";
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result))
{
    $collection->insert($row);
}

Of course, for the real data loading I added some pagination; I didn't retrieve the 2M records at once. And there was some code to initialize the MySQL connection and Mongo to get that $collection object.
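
For completeness, here is a minimal sketch of what that paginated load could look like. The connection credentials and the batch size are assumptions for illustration, not the exact code I ran:

// rough sketch: copy the MySQL table into MongoDB in batches instead of one huge result set
$link = mysql_connect('localhost', 'user', 'password');   // credentials assumed
mysql_select_db('msg', $link);

$m          = new Mongo();
$collection = $m->selectDB('msg')->selectCollection('msg_old');

$batchSize = 10000;   // assumed batch size
$offset    = 0;

do {
    $result = mysql_query(sprintf('SELECT * FROM message LIMIT %d, %d', $offset, $batchSize));

    $rows = 0;
    while ($row = mysql_fetch_assoc($result)) {
        $collection->insert($row);   // every MySQL row becomes one MongoDB document
        $rows++;
    }

    $offset += $batchSize;
} while ($rows === $batchSize);   // a short batch means we reached the end of the table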
The MySQL database had indexes on the sid and tid fields (sender id and target id), so I added them to the MongoDB collection as well.
$m = new Mongo();
$collection = $m->selectDB("msg")->selectCollection("msg_old");
// index the fields used to filter the mailbox queries
// (note: the usual compound-index form in the PHP driver is array("sid" => 1, "tid" => 1))
$collection->ensureIndex( array("sid", "tid") );

Then I wrote some simple code that selects up to 20 records filtered by sid. In the real application this corresponds to viewing the first 20 messages of my outbox.
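
As a rough illustration of the shape of that script (not the exact code: the html rendering and the "subject" column are assumptions on my part, and the MySQL version runs the equivalent SELECT * FROM message WHERE sid = ... LIMIT 20):

// rough sketch of the MongoDB-backed script requested by siege
$sid = (int) $_GET['id'];   // user id taken from the url

$m      = new Mongo();
$cursor = $m->selectDB("msg")->selectCollection("msg_old")
            ->find(array("sid" => $sid))   // filter by sender id
            ->limit(20);                   // first 20 messages of the outbox

// render the result as a plain html table
echo "<table>";
foreach ($cursor as $msg) {
    echo "<tr><td>" . htmlspecialchars($msg["subject"]) . "</td></tr>";   // "subject" column assumed
}
echo "</table>";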

EDIT - 2009/07/03

Due to some confusion I have to make something clear. What I'm benchmarking is not the data loading into both databases, nor traversing the data, etc., but the code that you can find here.

This is similar to what a user's message outbox (or inbox) could be on a production website. The user accesses his inbox, we retrieve up to 20 of his messages, and they are then displayed in an HTML table. What siege accessed was a script serving the HTML generated from the query results.

So the idea is: if MongoDB or MySQL were the backend of this message system, which one would be faster for this specific use case? This benchmark is not about whether MongoDB is better than MySQL for every use case out there. We use MySQL a lot in production and we will keep using it as far as I can tell. And yes, I know that MySQL and MongoDB are two totally different technologies that probably only share the word database in their descriptions.

END EDIT - 2009/07/03

I wrote the code for MongoDB and MySQL. Then my idea was to launch siege, pick some random user ids from a text file, and do the stress tests.

Here's an extract from the url text file:
http://mongo.al/index.php?id=96
http://mongo.al/index.php?id=105
http://mongo.al/index.php?id=108
http://mongo.al/index.php?id=113
http://mongo.al/index.php?id=116
http://mongo.al/index.php?id=117
http://mongo.al/index.php?id=127
http://mongo.al/index.php?id=129
http://mongo.al/index.php?id=130
http://mongo.al/index.php?id=134

This means that siege will pick a random url and hit the server, requesting the outbox of that user id.

Then I increased the ulimit to be able to run this test:
siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

With that command I launch siege, telling it to load the urls to visit from the text file. It will simulate 300 concurrent users, each doing 10 repetitions, with a random delay between 0 and 1 second. The last option tells siege to work in internet mode, so it will pick urls randomly from the text file.

When I launched the test with MongoDB as the backend it worked without problems. With MySQL as the backend it crashed quite often. Below are the results I obtained for both of them.

MongoDB test results:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

Transactions: 2994 hits
Availability: 99.80 %
Elapsed time: 11.95 secs
Data transferred: 3.19 MB
Response time: 0.26 secs
Transaction rate: 250.54 trans/sec
Throughput: 0.27 MB/sec
Concurrency: 65.03
Successful transactions: 2994
Failed transactions: 6
Longest transaction: 1.47
Shortest transaction: 0.00

MySQL test results:
siege -f ./stress_urls_mysql.txt -c 300 -r 10 -d1 -i

Transactions: 2832 hits
Availability: 94.40 %
Elapsed time: 23.53 secs
Data transferred: 2.59 MB
Response time: 0.74 secs
Transaction rate: 120.36 trans/sec
Throughput: 0.11 MB/sec
Concurrency: 89.43
Successful transactions: 2832
Failed transactions: 168
Longest transaction: 16.36
Shortest transaction: 0.00

As we can see, MongoDB performed more than 2X better than MySQL for this specific case. And remember, MongoDB was reading the data from my USB hard drive ;-).

Tuesday, March 3, 2009

Symfony Speed and Hello World Benchmarks

After reading some posts showing that "my blah blah framework" is way faster than symfony for a Hello World application, I decided to explain why: because symfony is extensible and can adapt to your needs. That's easy to say, you may think; in fact, every framework out there claims that. So what makes symfony so special?


The following list names some of the features provided by symfony.

  • Factories
  • The Filter Chain 
  • The Configuration Cascade
  • The Plugins System
  • Controller adaptability
  • View adaptability


Factories


Since version 1.0 symfony has provided a configuration file called factories.yml. This file controls the configuration of the application's core classes. There you can override symfony's default classes with your own. This means you can set up a custom front controller, web request, cache classes, session storage, etc.


But this comes at an extra price: when a symfony application bootstraps, it reads the configuration file from the filesystem. If the yml file was parsed before, it loads a PHP file (which can also be cached with APC); if not, it parses the yml file, stores the parsed result in the cache folder and loads the configuration from there.
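
Conceptually, that caching step boils down to something like the following sketch. This is not symfony's actual code, just the general idea, using sfYaml for the parsing step:

// conceptual sketch of a yml config cache (not symfony's real implementation)
function loadConfig($ymlFile, $cacheDir)
{
  $cacheFile = $cacheDir.'/'.basename($ymlFile).'.php';

  // recompile only when the yml source is newer than the cached PHP file
  if (!file_exists($cacheFile) || filemtime($ymlFile) > filemtime($cacheFile))
  {
    $config = sfYaml::load($ymlFile);   // parse the yml file
    file_put_contents($cacheFile, '<?php return '.var_export($config, true).';');
  }

  return include $cacheFile;   // plain PHP, so APC can cache the opcodes
}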


Why would I need that flexibility, you may ask? In a project I'm involved with we needed Memcached. This means we overrode the whole symfony default cache mechanism with our own custom classes. How? By setting up our classes in an easy-to-read yml file. So for sure, when you benchmark your Hello World application, symfony will be slower.


Filter Chain


One of the patterns from the Core J2EE Patterns book that impressed me the most is the Intercepting Filter. This pattern teaches how to modify request processing without the need to change controller or model code. The idea is that in a configuration file you plug in a class that will take care of pre- or post-processing the request. These classes are called filters. As an example, you can add a filter that checks whether the user has the proper credentials to execute the action she wants. Another filter can cache the response, etc.


Symfony has the filters.yml file, which can be specified per application or per module. This means we can set filters to be executed for the whole application and then disable them for specific modules, let's say for Ajax actions. Does your framework provide this flexibility without resorting to some kind of monkey-patching technique? No? Well, symfony does. Say hello to the Filter Chain. So for sure, when you benchmark your Hello World application, symfony will be slower, because it adds flexibility to the process. You want to get rid of this behavior? Sure: set up a custom controller in the factories.yml file, and in your new class override the loadFilters method.
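
To give an idea of what such a filter looks like, here is a minimal sketch of a symfony 1.x filter; the class name and the timing logic are made up for illustration, and the filter still has to be declared in filters.yml:

// hypothetical example of an intercepting filter (symfony 1.x style)
class myTimerFilter extends sfFilter
{
  public function execute($filterChain)
  {
    // pre-processing: runs before the action is executed
    $start = microtime(true);

    // hand control to the next filter in the chain (and eventually the action)
    $filterChain->execute();

    // post-processing: runs after the action and the view
    error_log(sprintf('Request served in %.4f seconds', microtime(true) - $start));
  }
}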


Configuration Cascade


As explained in the configuration chapter of the symfony book, symfony lets you modify its behavior through several yml files. For example, we have the view.yml that tells symfony which CSS and JavaScript files it should load for the current request. We can have an application-level view.yml configuration and then override the settings per module. When symfony processes the request it checks all these files; that's why it is slow in the Hello World benchmark.


Plugins System


Symfony has a very powerful and easy-to-use plugin system. Its more than 400 plugins, built by a community of 200+ developers, speak for themselves about its success.


Plugins can contain modules of their own and also a config.php file, similar to the project config.php or a module's config.php. When symfony processes a request it checks for the settings in those files, which means they will be read from disk. So in a plugin we can provide a custom logger that is fired up when bootstrapping the application. The plugin user doesn't need to care how the logger is activated; she just knows that it will work.


The same applies to plugin modules. How do you think symfony knows that a certain module/action should be called from a plugin? If you enable the plugin module in the settings.yml file, symfony will check inside the plugin to see if the requested action should be executed there.


Controller adaptability


For each action the user calls, symfony will execute a page controller. Inside our modules symfony lets us use either a generic myModuleActions class that extends sfActions, or a class specific to the action requested by the user, which as an example could be called indexAction and would extend the sfAction class. When a request is processed, symfony first checks for the existence of the latter. If it doesn't exist, it tries to load the generic actions class for that module. Of course you don't need this kind of flexibility for Hello World apps.
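
As a quick sketch of both styles (the "mailbox" module name is made up and the method signatures follow the symfony 1.2 conventions):

// generic actions class: apps/frontend/modules/mailbox/actions/actions.class.php
class mailboxActions extends sfActions
{
  public function executeIndex(sfWebRequest $request)
  {
    // handles the "index" action of the "mailbox" module
  }
}

// one class per action: apps/frontend/modules/mailbox/actions/indexAction.class.php
class indexAction extends sfAction
{
  public function execute($request)
  {
    // symfony looks for this class first and falls back to mailboxActions otherwise
  }
}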


View Adaptability


For rendering the response symfony uses the sfPHPView class by default. If a certain module in your application requires a different view, there are at least three ways to accomplish this, as explained here.


Conclusion


Symfony is a professional web application framework built to cope with real-world needs. In a large project with more than a simple salutation feature, sooner or later you will need the flexibility provided by the framework. This will save you time and prevent headaches, because when you have built a whole system with a framework and the business needs start to push in a direction where you have to extend it, you will thank yourself for having chosen symfony in the first place.

In case your client requires a Hello World! application, you can use the following hyper-fast framework code: die("Hello World!"); ;-)

Tuesday, September 2, 2008

We started something!!

It seems that our benchmarking example has pushed people to do their own useful benchmarks.
You can check this framework benchmarks page and try to guess what they are actually benchmarking. If you can find the point of that benchmark, please drop a comment here, because I want to sleep easy tonight.

So I want to leave here just a few remarks about this kind of stuff:
  1. Stop benchmarking your just-created framework against symfony, Zend Framework, Cake, or whatever.
  2. When are we going to realize that the point of a framework is not to run as fast as assembly code, but to improve developer productivity and save money in developer time?
  3. I'm not pissed off, I just can't get the point of those benchmarks.  
If you have more examples of this kind of useless benchmark, please add them in the comments.

Monday, September 1, 2008

Benchmarking die("Hello world!"); VS. exit("Hello world!"); VS. echo "Hello World!"; on PHP

After doing some useful and problem-solving benchmarking with Siege, we can scream to every corner of the world that exit() is faster than die() and echo.

Show me the facts! I hear you screaming. Let there be facts!:

echo "Hello world!";
Transactions:         250 hits
Availability:      100.00 %
Elapsed time:        7.10 secs
Data transferred:        0.00 MB
Response time:        0.01 secs
Transaction rate:       35.21 trans/sec
Throughput:        0.00 MB/sec
Concurrency:        0.20
Successful transactions:         250
Failed transactions:           0
Longest transaction:        0.05
Shortest transaction:        0.00


die "Hello world!";
Transactions:         250 hits
Availability:      100.00 %
Elapsed time:       10.04 secs
Data transferred:        0.00 MB
Response time:        0.01 secs
Transaction rate:       24.90 trans/sec
Throughput:        0.00 MB/sec
Concurrency:        0.17
Successful transactions:         250
Failed transactions:           0
Longest transaction:        0.04
Shortest transaction:        0.00


exit "Hello world!";
Transactions:         250 hits
Availability:      100.00 %
Elapsed time:        6.05 secs
Data transferred:        0.00 MB
Response time:        0.01 secs
Transaction rate:       41.32 trans/sec
Throughput:        0.00 MB/sec
Concurrency:        0.27
Successful transactions:         250
Failed transactions:           0
Longest transaction:        0.04
Shortest transaction:        0.00


The previous tests were performed simulating 25 concurrent users with 10 repetitions each. Now just imagine for one minute (or two if you need more) if 20,000 concurrent users hit your echo "Hello World" website! That would be a mess! Just imagining this scenario I can't stop hearing the sirens in my mind, so please, grep through your code and preg_replace() all those echo "Hello world!" calls you may have there!