Wednesday, September 18, 2013

Query logging and profiling in MongoDB

In my previous blog post I mentioned that the 1.5.0 release of the MongoDB driver for PHP has improved notification support using the PHP Notification API, and showed off a fairly useless (but fun) progress bar.
Now its time for a little bit more practical examples using this system: Query logging and profiling.

Those of you who clicked through to the github Pull Request may have noticed the long long description of the API and noticed all the different "MONGO_STREAM_NOTIFY_LOG_xxx" constants it included..

When instantiating MongoClient you can pass in "Driver Options" using the 3rd argument. Currently there is only one available Driver Option: "context".

The context key takes a Stream Context, and is assigned to all of the underlaying connections the driver creates - which means you can set any of the Socket Context and SSL Context options and it will Just Work (maaan.. that saved me a ton of work.. Not needing to implement any of the SSL things myself!).
Since its just a normal Stream Context, the lesser known Context Parameters will also Just Work, but what does that mean?

Well, PHP has a trivial notifications system when operating on streams. The out-of-the-box support is (incremental) progress updates when reading and writing to streams, which I covered in my previous blog post. The MongoDB driver however extends this support with a lot of other things.

Keep in mind I haven't documented any of this just yet as its all still experimental.

OK. Lets check this out.


(The entire script is available as a Gist on Github).
<?php
class MyStreamNotifications {
    public $reqs = array();


    /* Our callback function
     * The prototype is define by the PHP Stream Notification system:
     * http://php.net/stream_notification_callback
     */
    function update($notification_code, $severity, $message, $message_code, $bytes_transferred, $bytes_max) {
        switch($notification_code) {
            /* For now, lets focus on the TYPE_LOG, and ignore the others */
            case MONGO_STREAM_NOTIFY_TYPE_LOG:
                /* This notification only uses two arguments, so we can ignore the rest */
                return $this->logMessage($message_code, $message);

            default:
        }
    }

    /* Our logging function
     * The $code describes the operation being executed,
     * and $message is a json serialized array, packed with information
     * Note: If ext/json is not available, $message with be PHP serialize()d instead
     */
    function logMessage($code, $message) {
        $data = json_decode($message, true);

        switch($code) {
            /* Write operations are a one way street in MongoDB, so these don't end with a
             * MONGO_STREAM_NOTIFY_LOG_RESPONSE_HEADER.
             * When using WriteConcern (defaults to 1), and additional query (command) is
             * sent to check for errors, called getLastError*/
            case MONGO_STREAM_NOTIFY_LOG_INSERT:
            case MONGO_STREAM_NOTIFY_LOG_UPDATE:
            case MONGO_STREAM_NOTIFY_LOG_DELETE:
            case MONGO_STREAM_NOTIFY_LOG_GETMORE:
            case MONGO_STREAM_NOTIFY_LOG_KILLCURSOR:
            case MONGO_STREAM_NOTIFY_LOG_BATCHINSERT:
                /* We only care about the query and response headers, for now */
                break;

            /* We´ve got a query */
            case MONGO_STREAM_NOTIFY_LOG_QUERY:
                /* Retrieve the Wire protocol Request ID, so we can match it with the
                 * Response headers once the query as executed */
                $reqid = $data["server"]["request_id"];

                /* Log all the metadata we have about the query, and add the start time */
                $this->reqs[$reqid] = $data;
                $this->reqs[$reqid]["start"] = microtime(true);
            break;

            /* The query has executed on the server, and now we are reading the
             * response headers */
            case MONGO_STREAM_NOTIFY_LOG_RESPONSE_HEADER:
                // Get the Request ID this respose is for
                $id = $data["server"]["request_id"];

                /* The driver runs some commands internally, such as isMaster() to
                 * sanitycheck the connection to the server, these commands currently
                 * don't trigger the notification system - that may/will change in the
                 * future though */
                if (!isset($this->reqs[$id])) {
                    return;
                }

                // Fetch the request info
                $req = $this->reqs[$id];
                // End the timer
                $end = microtime(true);

                // Add the ending time, and how long it took
                $req["ended"] = $end;
                $req["total"] = $end - $req["start"];

                $this->reqs[$id] = $req;
            break;

            default:
                // Unknown log message code, ignore
        }
    }

    function getRequests() {
        return $this->reqs;
    }
}
Thats a good chunk of code, just to profile queries.. But bare with me for a moment.

There is a lot of things happening right there, but hopefully the inline comments make sense of it all.
The update() method is the actual callback function (which we will be registering a little later).
It has a lot of arguments, and the $bytes_transferred and $bytes_max aren't even relevant in most scenarios, but ohhwell. Nothing we can do about that as this is the interface PHP provides us with.

The $notification_code is the "main argument" in the callback function, it tells us what sort of notification it is, and there are a handful of different notifications that can happen.
Each notification then has its own set of "$message_code", in this case, one of the MONGO_STREAM_NOTIFY_LOG_xxx constants.
Since we are only interested in logging and profiling the queries we ignore all the different MONGO_STREAM_NOTIFY_LOG_xxx (they are only there for completeness - in case you want to extend the example and experiment with them later), and only catch the MONGO_STREAM_NOTIFY_LOG_QUERY and MONGO_STREAM_NOTIFY_LOG_RESPONSE_HEADER.

Moving on..
/* Some Helper Functions */

/* Split a Server Hash to extract the server name from it */
function getServerName($hash) {
    list($server, $replicaset, $dbname, $pid) = explode(";", $hash);

    return $server;
}

/* Resolve the server type to a readable string */
function getServerType($type) {
    switch($type) {
        // FIXME: Do we not export these as constants?
        case 0x01: return "STANDALONE";
        case 0x02: return "PRIMARY";
        case 0x04: return "SECONDARY";
        case 0x10: return "MONGOS";
    }
}

/* Resolve Cursor (Query) Options to something meaningful */
function getCursorOptions($opts) {
    // FIXME: Do we not export these as constants?
    if (($opts & 0x04) == 0x04) {
        return "slaveOkay";
    }
    return "Primary Only";
}
Here we define some helper functions, and actually few FIXMEs for myself so you don't have to remember these magic numbers :)
Other then that, nothing really important.
Next up, boilerplate setting to connect to a ReplicaSet.
/* Boilerplate stuff to setup a ReplicaSet connection */
// My replicaset seed list
$seeds = "toddler.local:30200,toddler.local:30201";

// MongoDB Connection options
$moptions = array(
    // Don't wait to long on servers that are down
    "connectTimeoutMS" => 300,
    // Read from secondaries, if available
    "readPreference" => MongoClient::RP_SECONDARY_PREFERRED,
    // My local ReplicaSet name
    "replicaSet" => "REPLICASET",
    // Don't wait more then a second to confirm Replication
    "wTimeoutMS" => 1000,
);
Again, that wasn't fun... Lets take a look at some fun things next!
/* Registering a stream context! This is the fun part! */

// Create an instance of our callback object
$mn = new MyStreamNotifications;

// No Stream Context Options, for now
$coptions = array();
// Configure the Notification Stream Context Parameter to call MyStreamNotifications->update()
$cparams  = array("notification" => array($mn, "update"));

// Create the Stream Context
$ctx = stream_context_create($coptions, $cparams);

$doptions = array("context" => $ctx);


// Instanciate a MongoClient with all our options
$mc = new MongoClient($seeds, $moptions, $doptions);
If you have never played around with PHP Stream Contexts, this may look a bit alien..
The stream_context_create() function takes two arguments, context options and parameters.
There is a very subtle difference between those two, and we do support both (yes yes, not documented.. :) still highly experimental), but for now, we are using the notification parameter.



Now that we have setup everything we need and have connected to MongoDB, configured just the way we like it, its time to run few commands and queries to check this thing out
// Query the example collection in the test database
// you should probably change this to something meaningful
$collection = $mc->selectCollection("test", "example");


// Random search criteria, you should change this to something meaningful
$criteria = array("search" => "criteria");
// Add some sorting & limit for the fun of it
$cursor = $collection->find($criteria)->skip(13)->limit(5)->sort(array("_id" => 1));

// Count on a secondary (we constructed the MongoClient with SECONDARY_PREFERRED)
$count = $cursor->count();
echo "We have $count results in total\n";

// Fetch the data from a primary
$cursor->setReadPreference(MongoClient::RP_PRIMARY);

/* The query isn't execute until we start iterating over it.. */
foreach($cursor as $doc) {
}
When you are trying this out for yourself, make sure you have establish a connection to your MongoDB instance (modifying the $seeds and $moptions["replicaSet"] variables up there), and modify the database+collection name, along with the query criteria.

Now that we have executed some things, our $mn->reqs array has now some interesting info its time to print out
// Fetch our requests from our logger
$reqs = $mn->getRequests();
foreach($reqs as $request) {
    printf("Hit %s (which is a %s, I asked for %s) with:\n%s\n",
        getServerName($request["server"]["hash"]),
        getServerType($request["server"]["type"]),
        getCursorOptions($request["info"]["options"]),
        json_encode($request["query"], JSON_PRETTY_PRINT)
    );
    printf("Skipping the first %d matches, limiting the results to %d documents..\n",
        $request["info"]["skip"],
        $request["info"]["limit"]
    );
    printf("Server execution time+network latency: %.8f sec\n\n", $request["total"]);
}
Putting it all together, and running the script should then print out something like:
We have 205 results in total
Hit localhost:30201 (which is a SECONDARY, I asked for slaveOkay) with:
{
    "count": "example",
    "query": {
        "search": "criteria"
    }
}
Skipping the first 0 matches, limiting the results to -1 documents..
Server execution time+network latency: 0.00042319 sec

Hit localhost:30200 (which is a PRIMARY, I asked for Primary Only) with:
{
    "$query": {
        "search": "criteria"
    },
    "$orderby": {
        "_id": 1
    }
}
Skipping the first 13 matches, limiting the results to 5 documents..
Server execution time+network latency: 0.00045013 sec



You may want to var_dump() the $request variable to see what other fun things it contains to play with, hopefully I'll have it all properly documented by the time we release 1.5.0 version of the driver.
As you can see, the driver needs to massage the query a little bit to add things like orderby clauses, where as "skip" and "limit" are part of the wire protocol and therefore not part of the actual query syntax.
If you have the MongoDB profiler enabled, you could then lookup how long the query execution took on the server, subtract the time we got and you have the time it took us to read and write over the network.

Another related feature is the $comment operator that "tags" a query, so its easier to spot in the profiler. You could use this to for example group your queries ("person model", "retrieving comments", "product query", ...).



Although this is a good chunk of code, I hope it will be useful, especially for applications that require thorough auditing of its queries. This should also be easily integrable into most "Profiler/Developer toolbar" that the various frameworks provide.


Since this is all still highly experimental, there is still time to change some things, so I wonder.. what do you think?

-Hannes


No comments:

Post a Comment