Attempt at removing locks #14
Comments
This is really interesting, and I've thought of implementing something like this before. It's a great idea, but I need to review it a bit more. I have some questions about contention and what happens in those cases:
if ($session_data_array[$key.'_SMT'] < $_SERVER['REQUEST_TIME_FLOAT']) {
    $session_data_array[$key] = $val;
    $session_data_array[$key.'_SMT'] = $_SERVER['REQUEST_TIME_FLOAT'];
    $something_changed = true;
}
In this case, I'm wondering, for example, what happens in the non-existent else case.
Overall, this is fairly complex logic, and if there has ever been a reason to unit test something, this is it. My struggle with that was mocking how PHP actually calls these methods, and in which order, to create sessions. A misunderstanding in this area would create the potential for false positives in the tests. I think it would be a great idea to put some effort into creating a "mock" that mimics PHP's way of dealing with sessions. Todos from this:
I would still be interested in maintaining 5.2 compatibility, including the Are you interested in generating a pull request for this? |
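One possible starting point for the "mock" idea is a small driver that calls a handler's methods in the order PHP uses for a simple request: open and read when session_start() runs, then write and close when the session is written and closed. This is only a sketch. RecordingHandler, simulate_request() and add_foo() are hypothetical names, the handler is a duck-typed stand-in that borrows method names from SessionHandlerInterface, and gc and destroy are ignored. It sticks to named functions so it should not depend on newer PHP features.

```php
<?php
// Hypothetical in-memory stand-in using the same method names as
// SessionHandlerInterface; it records each call so a test can check the order.
class RecordingHandler
{
    public $calls = array();
    private $store = array();

    public function open($path, $name)
    {
        $this->calls[] = 'open';
        return true;
    }

    public function read($id)
    {
        $this->calls[] = 'read';
        return isset($this->store[$id]) ? $this->store[$id] : '';
    }

    public function write($id, $data)
    {
        $this->calls[] = 'write';
        $this->store[$id] = $data;
        return true;
    }

    public function close()
    {
        $this->calls[] = 'close';
        return true;
    }
}

// Hypothetical driver: open + read stand in for session_start(),
// write + close stand in for session_write_close().
function simulate_request($handler, $id, $mutate)
{
    $handler->open('', 'PHPSESSID');
    $raw = $handler->read($id);
    $session = ($raw === '') ? array() : unserialize($raw);
    $session = call_user_func($mutate, $session);
    $handler->write($id, serialize($session));
    $handler->close();
    return $session;
}

// Example mutation a request might perform on the session array.
function add_foo($session)
{
    $session['foo'] = 1;
    return $session;
}

$handler = new RecordingHandler();
simulate_request($handler, 'abc', 'add_foo');
```

A test can then compare $handler->calls against the expected order, and two handlers driven in an interleaved order could model two concurrent requests.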
@mikeytag this is a very interesting implementation. How big a replicaSet are you running this on, and how much traffic are you pushing through it? @nicktacular I am curious about the cost of all this logic on every session write, especially with heavy session use (large multi-dimensional arrays). Once/if you get a unit test going I'd love to stress test as well. I have to say I am intrigued, but I am also a little hesitant. I will be watching! |
@nicktacular In the non-existent else case that you referenced above, nothing would happen: the script that is looking to change a session key finds out that it is older than (i.e. started before) another script that has manipulated the session since it started.
Here's an example that we have experienced in our application. A page loads and fires off 2 ajax scripts. The first one to make it to the webserver is a script that, let's say, sets $_SESSION['foo'] = 1, so the code manipulates $_SESSION and then goes on its merry way and does some other stuff, let's say a long-running map/reduce stats gathering routine that takes 30 seconds. Shortly after the first script is fired, the second script begins execution. Now, in normal PHP land the second script waits to execute until the first script has released its lock on the session (PHP sets a read lock when session_start() is called). My code changes this paradigm: on session_start() it populates $_SESSION for any script that asks, from whatever is in the mongo collection at the time. This speeds up script execution, as the vast majority of my app doesn't actually write to the session but reads a lot from it. Immediately, I get a speed bump from simultaneous ajax/iframe calls.
But what if the second script manipulates $_SESSION and needs to write it? If it adds a new key like $_SESSION['bar'] = 1, then that new key gets added with a $_SESSION['bar_SMT'] = microtime, and since it wasn't edited by the first script, the new session of foo=1 and bar=1 is saved to mongo just fine. The problem that comes up is if the second script edits the same key as the first script. What if the second script writes $_SESSION['foo'] = 2 while the first script is still executing? Obviously the first script doesn't get an event letting it know that just happened. However, on write it pulls a copy of the most recent data in mongo, compares that array to its own $_SESSION array, and has to make a decision.
I opted for the decision that if the microtime of the script start is newer than the microtime of the script that last edited that key, then the key gets overwritten. In our example, script 1 (which is older) will look to write $_SESSION['foo'] = 1 to the mongo collection, but the code won't allow it because $_SESSION['foo'] = 2 is already in the db with a newer microtime timestamp.
All this stuff gets complex very quickly, and I think the timestamping will have its own issues, but this concept has greatly increased our speed and we haven't had any issues with $_SESSION having incorrect data using this approach ... yet.
You are absolutely right about testing. I'll work on a set of phpunit tests and add it to my fork. I think a pull request is a good idea, but I should probably change my code so that you can invoke my style of logic with a config parameter, or mimic PHP's lock handling, which should be the default. |
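The foo example can be traced with a few lines of PHP. This is a minimal sketch of the per-key decision only, not the code from the fork; may_overwrite() is a hypothetical helper and the timestamps are made-up values chosen so that script 1 starts before script 2.

```php
<?php
// Hypothetical helper: may a request that started at $requestStart
// overwrite a key whose last writer started at $storedStamp?
function may_overwrite($storedStamp, $requestStart)
{
    return $storedStamp < $requestStart;
}

// State in the collection after script 2 (started at t = 100.250) wrote foo = 2.
$stored = array('foo' => 2, 'foo_SMT' => 100.250);

// Script 1 started earlier (t = 100.000) and only now tries to write foo = 1.
$scriptOneStart = 100.000;
if (may_overwrite($stored['foo_SMT'], $scriptOneStart)) {
    $stored['foo'] = 1;
    $stored['foo_SMT'] = $scriptOneStart;
}
// The check fails for script 1, so the value written by script 2 is kept.
```

Under these assumptions the older request silently loses, which matches the "nothing would happen" answer to the else-case question.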
@rocksfrow We're running our mongo cluster on AWS (along with all our other servers) and currently have 3 mongo config instances with 7GB of RAM and a sharded replica set of 3 mongo data servers. The data servers have 58GB of RAM and 32 cores. Our shard size is only 1 at the moment, so to add capacity I'll be adding data servers in sets of 3. Everything is on a 10G network with Amazon; 2 of my data servers use the ephemeral SSD drives and 1 of them (which has a lower priority) uses an SSD EBS volume for resiliency. In terms of collection size, our sessions collection is currently at a count of 6,441 with a size of 12,526,192 (12 MB). We're not breaking any records over here ;) |
One concern I have with using PHP to timestamp is the lack of microseconds. Without microseconds like dbs provide, there is a high chance of two No?
|
That environment sounds beast :)
|
@rocksfrow agreed. That's why I used microtime() (well, technically $_SERVER['REQUEST_TIME_FLOAT'], which includes microseconds) |
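For reference, a minimal sketch of reading the request start time with microsecond precision, falling back to microtime(true) where $_SERVER['REQUEST_TIME_FLOAT'] is not set (it was added in PHP 5.4). request_start_microtime() is a hypothetical helper name, not something from the class.

```php
<?php
// Hypothetical helper: request start time as a float with microseconds.
function request_start_microtime()
{
    if (isset($_SERVER['REQUEST_TIME_FLOAT'])) {
        return (float) $_SERVER['REQUEST_TIME_FLOAT'];
    }
    // Fallback for older PHP: this is the time of the call, not the true
    // request start, so it should be captured as early as possible.
    return microtime(true);
}

$start = request_start_microtime();
```

Two requests can still land on the same float value in principle, so the fractional part narrows the collision window rather than removing it.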
Oh my bad I didn't see that.
|
I think this is the basis for a fine-grained locking feature. I think we need to consider how best to implement this. I've thought a lot about this and I think it's very useful. There are 2 types of users of PHP I want to discuss:
For the first type of developer, they won't know, nor will they care, what is happening here as long as it just works, so let's focus our efforts on the second. The first problem arises from the way PHP treats session data: it's a read-all-at-once, block, do stuff, then write-all-at-once and continue. As we've all experienced at this point, this is a major pain point for the parallel processing that results from multi-server environments and browser multi-threading. So how do we work around this? A couple of options.
Option 1: use frequent opens, writes, and closes (and buffers!)
ob_start(); // only if you do intermittent writing
session_start();
$data = $_SESSION['thing'];
$_SESSION['thing'] = process($data);
session_write_close();
// do more stuff that doesn't depend on session
// { cooooooode }
session_start();
$other = $_SESSION['count'];
$other++;
$_SESSION['count'] = $other;
session_write_close();
// do more stuff that doesn't depend on session
// { cooooooode }
Ok, so this is painful because who writes code this way? And what if you're using a framework that abstracts session handling for you? Well, then, good luck, since some frameworks don't consider ability to re-open after a
Option 2: use regular sessions, implement hard-core logic to deal with contentions
This is essentially @mikeytag's solution. Basically, the idea is to introduce metadata that will help with merging the write attempts due to contention. This is feasible, but complicated, and will require a lot of tests to make sure it is correct.
Option 3: Introduce advanced fine-grained session data handling
Ok, so this option means that you can continue using
The second part of this solution is to introduce fine-grained session data handling for those that do care about contention. This means that data that is read/written with
// request #1 and #2 are happening at the same time here:
// request #1
$handler = MongoSession::create();
$id = $handler->read('user_id');
$name = $handler->read('user_name');
$view->render(['id' => $id, 'name' => $name], 'template.phtml');
// request #2
$handler = MongoSession::create();
$id = $handler->read('user_id');
$name = $handler->read('user_name');
$view->render(['id' => $id, 'name' => $name], 'template.phtml');
Neither of these requests blocks since acquiring a read lock in
// request #1 and #2 are happening at the same time here:
//request #1
$handler = MongoSession::create();
$count = $handler->read('counter');
$handler->set('counter', $count + 1);//locking is implicit here
//request #2
$handler = MongoSession::create();
$handler->set('last_request_uri', $_SERVER['REQUEST_URI']); // locking is implicit here
There would, of course, be cases where locking and blocking would be unavoidable, but there could be specific cases where a non-blocking lock attempt would allow the code trying to access that lock to do whatever it wants. This would work in the exact same way as
// request #1 and #2 are happening at the same time here:
//request #1
$handler = MongoSession::create();
$count = $handler->read('counter');
//explicit locking to show non-blocking option
if (false === $handler->lock('counter', NO_BLOCK)) {
//lock failed, perhaps we wait or perhaps we move on, depending on what we want to do
} else {
$handler->set('counter', $count + 1);
$handler->unlock('counter');
}
//request #2
$handler = MongoSession::create();
$count = $handler->read('counter');
//explicit locking to show non-blocking option
if (false === $handler->lock('counter', NO_BLOCK)) {
//lock failed, perhaps we wait or perhaps we move on, depending on what we want to do
} else {
$handler->set('counter', $count + 1);
$handler->unlock('counter');
}
This third option, of course, is disadvantageous in that it requires re-coding how we use sessions in the cases where contention may occur or in the cases where we want to take advantage of these locks. The interesting note here is that you can still use
Thoughts @mikeytag @rocksfrow @arski ? |
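To make the non-blocking idea in Option 3 concrete, here is a minimal in-memory sketch of per-key locks where a failed attempt returns false immediately instead of waiting. KeyLockTable, tryLock() and the owner strings are hypothetical; this is not the proposed MongoSession API, and a real handler would need the lock state in shared storage rather than in a PHP array.

```php
<?php
// Hypothetical in-memory lock table keyed by session key name.
class KeyLockTable
{
    private $owners = array();

    // Non-blocking attempt: returns false at once if another owner holds the key.
    public function tryLock($key, $owner)
    {
        if (isset($this->owners[$key]) && $this->owners[$key] !== $owner) {
            return false;
        }
        $this->owners[$key] = $owner;
        return true;
    }

    // Only the current owner may release a key.
    public function unlock($key, $owner)
    {
        if (isset($this->owners[$key]) && $this->owners[$key] === $owner) {
            unset($this->owners[$key]);
            return true;
        }
        return false;
    }
}

$locks = new KeyLockTable();
$first  = $locks->tryLock('counter', 'request-1');          // request #1 takes the lock
$second = $locks->tryLock('counter', 'request-2');          // request #2 is refused without waiting
$other  = $locks->tryLock('last_request_uri', 'request-2'); // a different key is not contended
$locks->unlock('counter', 'request-1');
$retry  = $locks->tryLock('counter', 'request-2');          // succeeds once the key is released
```

The point of the sketch is that contention is scoped to a single key, so requests touching different keys never wait on each other.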
@nicktacular Thanks for the great writeup. Personally I'm fond of Option 2 for a few reasons:
However, if each of these approaches is fully tested then I think the right thing for the project would be to give users the option of using any approach. We can make Option 3 the default so you get a better performance boost than traditional PHP session handling, but I still feel that Option 2 should be an option if it makes sense for your particular app. That being said, I am more than happy to rewrite my version as a new class and have all the meta locking stuff fully tested. I am thinking a good way to move forward for the project would be to have one abstract class of MongoSession and then create different objects to handle the different approaches, called MongoSession_Default, MongoSession_Flock, and MongoSession_Meta_Lock, which all extend the abstract class. Then the implementation is as simple as calling the init() of the respective class. I'm excited to lend a hand to move this project forward. FWIW, we have been running Option 2 in production for hundreds of thousands of visitors and overall have seen a performance boost and better responsiveness without any issues. |
@nicktacular - I'd have to agree option #3, as awesome as it sounds, is I think keeping this class's default behavior as a drop-in session I know for me, rewriting our entire application to handle session data via Thanks again for all of your work put into this. I will dig through it more On Tue, Feb 10, 2015 at 4:54 PM, Michael Taggart notifications@github.com
|
@nicktacular
First off, thank you for creating the MongoSession class. It really helped me out on a project. I did something a bit crazy to it, so I didn't want to issue a Pull Request but I wanted to let you know how I modified your class so that locks aren't needed at all.
As you know, PHP does a read lock by default for the entire life of the script that called session_start(). The downside to this is if you ever fire off multiple script invocations for the same user at the same time. In the old days this only happened with framesets but we run into slowdowns due to multiple async ajax calls all the time. I decided to remove this paradigm completely and I want to let you know how I did it.
I first removed any reference to the lock method. This was really only in the read method anyway. I changed the session.serialize_handler to php_serialize. I'll explain why in just a bit. Next I changed the write method to the following:
Basically on write() I hit the mongo db for the session data and unserialize it into an array called $session_data_array. I then take the $data string and unserialize it into $data_array. I then proceed to diff the in memory $data_array vs. the last entry in mongo ($session_data_array) the following way.
If there is an entry in $data_array that is not set in $session_data_array, then go ahead and create the key in $session_data_array, as well as a microtime timestamp from when the script was first invoked. Since 5.4 this is available in $_SERVER['REQUEST_TIME_FLOAT'] (I know you are trying to support 5.2 and I could've created a property called startMicrotime and set it to microtime(true) in __construct, but I was a bit lazy).
Ok, if there is an entry in $data_array that is set in $session_data_array, first check to see if it is different from the value that is already in $session_data_array. If it is different, then look at the timestamp of the last script that updated that key. If the timestamp of the running script (i.e. when it started) is newer than the timestamp on the session key, then update that key's value and its corresponding $key.'_SMT' to the new timestamp. (_SMT was a suffix I thought wouldn't collide much with apps; it stands for Session Micro Time.)
Last but not least, it loops through the stored $session_data_array looking for keys that have been removed from $data_array (i.e. you did an unset($_SESSION['foo'])). It does a similar timestamp check, and if this script is newer then it wins.
After all is said and done it only updates mongo if, in fact, something in the session data has changed. If it hasn't it doesn't waste the network round trip by saving the same data over itself.
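The steps above can be sketched as a single merge function. This is a hypothetical reconstruction from the description, not the write method from the fork: merge_session() and its parameter names are assumptions, values are compared with !==, and a stored key without a timestamp is treated as safe to overwrite.

```php
<?php
// Hypothetical reconstruction of the described merge.
// $stored:       latest session array read back from the collection
// $local:        this request's session array
// $requestStart: when this request began, e.g. $_SERVER['REQUEST_TIME_FLOAT']
// $changed:      set to true only if the stored copy needs to be saved again
function merge_session(array $stored, array $local, $requestStart, &$changed)
{
    $suffix = '_SMT';
    $changed = false;

    foreach ($local as $key => $val) {
        if (substr((string) $key, -4) === $suffix) {
            continue; // timestamps are maintained here, never copied from the request
        }
        $stampKey = $key . $suffix;
        $isNewer = !isset($stored[$stampKey]) || $stored[$stampKey] < $requestStart;

        if (!array_key_exists($key, $stored)) {
            // New key: add it together with this request's start time.
            $stored[$key] = $val;
            $stored[$stampKey] = $requestStart;
            $changed = true;
        } elseif ($stored[$key] !== $val && $isNewer) {
            // Changed key: the request that started later wins.
            $stored[$key] = $val;
            $stored[$stampKey] = $requestStart;
            $changed = true;
        }
    }

    // Removed keys: present in the stored copy but unset in this request.
    foreach (array_keys($stored) as $key) {
        if (substr((string) $key, -4) === $suffix || array_key_exists($key, $local)) {
            continue;
        }
        $stampKey = $key . $suffix;
        if (!isset($stored[$stampKey]) || $stored[$stampKey] < $requestStart) {
            unset($stored[$key], $stored[$stampKey]);
            $changed = true;
        }
    }

    return $stored;
}

// An older request (start 100.0) tries to set foo = 1 and add bar = 1
// after a newer request (start 100.25) already stored foo = 2.
$merged = merge_session(
    array('foo' => 2, 'foo_SMT' => 100.25),
    array('foo' => 1, 'bar' => 1),
    100.0,
    $changed
);
```

In this run foo keeps the newer value, bar is added with its own timestamp, and $changed tells the caller whether a save is needed at all.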
Cons to this approach:
You may be asking yourself why didn't he just use session_decode() and session_encode()? The problem is that session_decode only returns a boolean and actually manipulates $_SESSION. I can't have the write method potentially jacking up that super global so I wanted to be able to decode and encode it to my own variables without touching $_SESSION and having inadvertent side effects.
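A small sketch of why php_serialize helps here: with session.serialize_handler set to php_serialize (available in newer PHP 5.5 releases and later), the string handed to write() is plain serialize() output, so it can be decoded into a local variable with unserialize(). The sample array below is a made-up stand-in for the $data argument.

```php
<?php
// Assumes the handler was switched before the session starts, e.g.:
// ini_set('session.serialize_handler', 'php_serialize');

// Stand-in for the $data string that write() would receive under php_serialize.
$data = serialize(array('foo' => 1, 'cart' => array('sku-1', 'sku-2')));

// Decode into a local array; unlike session_decode(), this never touches $_SESSION.
$data_array = unserialize($data);

// After merging, the array can be encoded again the same way.
$encoded = serialize($data_array);
```

With the default php handler the stored string uses a different key|value layout, which is the reason the handler needs to be switched for this approach.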
I also set out on this journey because, after using MongoSession in production for a day, I noticed stale locks in the locks table. I also had users report to me that they couldn't log in, and I was able to figure out that they were stuck in the immutable lock problem that you referenced in your README, I believe.
My fork is located at https://github.com/mikeytag/php-mongo-session if you want to browse around and see the other changes I made in the file (I had to add an ini_set at the top to change the session handler). My fork is now in production and will receive about 10,000 visitors each day. I'll report back with any issues that may arise. Like I said before, I am going out on a bit of a limb here. I am confident that, given the way my app uses sessions, everything will be fine, but I realize that pulling my changes into your repo may not be the best course of action as it's a pretty drastic change.
TL;DR: $_SESSION key level locking, saving of writes to Mongo, bloat of $_SESSION array and more.