Overhaul Hopper, break up mono-lock hegemony #19
Conversation
Hopper is pokey. We need to expose more telemetry about Hopper's internal workings so Ops folks can make good decisions about how queues are behaving in practice. A good chunk of that work will be teasing the information _away_ into a queryable form. Work is inspired by http://blog.troutwine.us/tag/hopper/ Signed-off-by: Brian L. Troutwine <blt@postmates.com>
I have done a fair bit of trimming of FsSync. We no longer have _so_ much repetitive fiddling going on in that shared structure and the sender / receiver are simpler for it. I _hope_ to get the FsSync totally destroyed. The only things that remain are a count of _disk_ writes and the VecDeque. I don't have a plan per se but maybe one will come to me. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
The key insight here is that _if_ we write information into the deque buffer we can instruct the receiver to switch into disk mode without coordinating through any other memory points. This does mean every payload incurs an additional enum variant cost but, meh, I'll take it. What's _especially_ nice is this opens up the possibility of keeping order without flushing the whole of the memory buf -- since we can bump the disk read indicator as needed -- and will, I think, allow us to use a deque not protected by a mutex. I think. There are a few tests that still need to be uncommented; that'll come shortly. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
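To make that concrete, here's a rough sketch of the idea -- `Placement` and its variants are illustrative names only, not hopper's actual types:

```rust
// Hypothetical sketch of the variant-in-the-deque idea described above.
// Every slot pays for one extra discriminant, but a `Disk` marker lets a
// Sender tell the Receiver to switch into disk-read mode purely through the
// deque, with no shared FsSync-style structure to coordinate on.
enum Placement<T> {
    Memory(T),   // the payload itself, stored in the in-memory slot
    Disk(usize), // "go read this many serialized payloads from the queue file"
}

fn describe<T: std::fmt::Debug>(slot: &Placement<T>) {
    match slot {
        Placement::Memory(event) => println!("in-memory event: {:?}", event),
        Placement::Disk(n) => println!("read {} payloads from disk", n),
    }
}

fn main() {
    describe(&Placement::Memory(42_u64));
    describe(&Placement::<u64>::Disk(3));
}
```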
The test runs are _noticeably_ faster now. :D Signed-off-by: Brian L. Troutwine <blt@postmates.com>
We were not previously compressing on-disk representations. This is now corrected. The compression level is not configurable and is pegged to 'fast'. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
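For illustration, a minimal sketch of what that write path looks like, assuming a flate2-style DEFLATE encoder (the commit above doesn't pin down the codec); the point is only that the level is hard-wired to the fast preset rather than exposed in the configuration:

```rust
use std::io::Write;

use flate2::write::DeflateEncoder;
use flate2::Compression;

// Sketch only: compress a serialized payload before it hits the queue file.
// `Compression::fast()` mirrors the "pegged to fast" choice; the level is
// deliberately not part of the public configuration.
fn compress_payload(raw: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = DeflateEncoder::new(Vec::new(), Compression::fast());
    encoder.write_all(raw)?;
    encoder.finish()
}
```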
This is causing -- I think -- Travis to spin out. It does _not_ fail on my local machine, probably because Travis disks are very slow. As a part of this work I'm going to add proper AFL testing back into the project. The AFL runs won't happen in Travis but they'll be there. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
The current queue has issues in that it does not actually work across threads: the pointers don't update. Which makes sense. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
Okey dokey, so this is some kind of progress. What's happening here is it's possible for the enq to lose the thread if it rumbles oddly with deq. The thing is, this won't square with my need to attempt an enq, get bounced and then deq_n. I need to invest in contiguous storage, which I believe will be better for me anyhow: I can put the enq/deq index into the enq_lock / deq_lock mutexes and skip out on the condvar entirely. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
This implementation is still not perfect: there are pauses and the occasional crash, but it's actually workable and we've hit the finicky QC stage of things now. Nifty. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
We still need to do an AFL run and longer-duration QC runs as well, but this is progress. The primary change is we no longer point to contiguous options but to const pointers, being sure to synchronize on size as AoMP suggests for ARM and friends' sake. Special thanks to https://users.rust-lang.org/t/why-do-my-distinct-values-resolve-to-the-same-pointer/15517/3 for help figuring out my stack allocation issue. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
I've currently got a double mut issue with the way deque works but I'm closer to where I need to be. Namely, the sender / receiver _should_ have fine grained control over their locks and the sender loop is closer to functional. Now I just have to figure out how to take a lock and then meddle with it. I think the answer is pushing InnerQueue back up into Queue. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
This commit manages to work back some of the mutable borrow errors that plagued the last commit. The main difficulty now is located entirely in Sender, which, uh, is progress. Somehow I need to fiddle with write_to_disk so that we can do rotations without having to write to self, or something. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
This commit was hard won. Note now that we no longer leak memory when we pop/push and correctly set the offsets when doing so. I'm keeping a 'check' of which all-up tests are passing, going through one-by-one. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
The tests were a bit rough, I'm now realizing. In the next commit I'll do more to probe at the implementation, answering these questions: * In a single sender / single receiver setup do we get things in sender write-order? * In a multi-sender / single receiver setup do we get all the same elements back, in the end? Signed-off-by: Brian L. Troutwine <blt@postmates.com>
The approach taken to the quickcheck tests from deque has been migrated upward into the main library. The deque module is now tested only indirectly. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
This benchmark replaces the older bencher-based benchmark which, itself, didn't do too much to probe the functional speed of hopper. The timing I get out of this now is... fishy, considering that one of the hopper tests should be heading to disk and therefore be much slower. Tom Santero has some ideas there. Also, we're significantly more pokey than stdlib mpsc right now. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
I believe in the AFL runs I'm seeing crashes to do with over-consumption of the filesystem. Right now hopper assumes that filesystem ops will never fail, which is not correct. This commit starts to pay some of that down. The API will have to change to accommodate the rest. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
Well hey! Who knew that thread::spawn could crash? The library authors, for one. I sure didn't. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
Whew! Well guess what, you can get a lot of fun crashes when manipulating the filesystem if there are no more FDs available. Fun fact: hopper startup is now much less expensive because of the way Sender clone is reworked. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
This is the first commit that is ready for review. There's still work to be done but it's polish. Well, except for that need for Drop. That's an actual problem. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
Current problems: if the on-disk queue size is very small relative to the volume of incoming writes we burn through queue files quickly and run into FD exhaustion. I'm considering making the minimum on-disk queue size relatively normal, something like 1Mb. That won't absolve us of FD exhaustion problems but will make life considerably nicer for almost all real systems.
This commit includes a lot of material to guard against failures owing to file-descriptor exhaustion on-system. Mostly this just means careful looping and passing of errors up the call-stack, which previously we assumed "would not happen" and let slide. I think there might be a collision happening somewhere where two receivers start up at once and one deletes the other's queue files. I think adding a strict mode -- where a Receiver will refuse to start if there are queue files present -- would be a solid idea to avoid this situation. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
Some test data:

Benchmark runner system

The following benchmarks were performed on a machine with a 12-core Intel Xeon E5-1650 and 32GB RAM.
- hopper_tst/HopperInput { in_memory_total: 16384, max_disk_total: 32768, total_senders: 4, ith: 8192 }
- hopper_tst/HopperInput { in_memory_total: 4096, max_disk_total: 32768, total_senders: 4, ith: 8192 }
- mspc_tst
Quickcheck

I also ran over 5 million iterations of QuickCheck tests overnight, all passing.
Note: still running up against FD exhaustion issues on my laptop (2-core Intel i7-7500U, 16GB RAM), which might be a more accurate representation of resource allocation within cloud environments / k8s pods provisioned on AWS/GCE. Worth continued investigation IMO.
@tsantero Agreed. I'll keep plugging away.
Prior to this commit the Receiver was responsible for nuking unused queue files before starting up. Consider what happens if the receiver is very, very late to the party: it'll toast the files it needs to read, then fall over when it goes looking for them. Oops! Signed-off-by: Brian L. Troutwine <blt@postmates.com>
My last commit broke the comment test we had in place. I've shuffled senders to start first as they create the hierarchy of directories needed. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
These changes are non-functional and satisfy clippy. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
@tsantero FWIW I think we're in the clear. I passed 1M quickcheck tests locally and I have an AFL run going on two cores.
This commit is non-functional but it does add a lot of commentary to the codebase. The rustdocs are also updated and brought up to date. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
src/deque.rs (outdated)

```rust
};
let elem: Box<T> = Box::from_raw(*self.data.offset((*guard).offset) as *mut T);
*self.data.offset((*guard).offset) = ptr::null_mut();
self.size.fetch_sub(1, Ordering::Release);
```
This needs to be moved before any of the other pointer modifications, else we might end up allowing Receiver to pull something that is about to be popped off.
Some questions and comments for you, @blt
src/deque.rs (outdated)

```rust
}

pub unsafe fn pop_front(&self) -> T {
    let mut guard = self.front_lock.lock().expect("front lock poisoned");
```
Any reason why we're not using the `self.lock_front()` interface here?
Can't think of one. Historical accident, I'd say.
```rust
    .to_str()
    .unwrap()
    .parse::<usize>()
    .unwrap();
```
Should you skip over unparseable entries instead of crashing?
Historically we've assumed that if someone fiddles with hopper's queue directory it should just fail to start or otherwise fall over. Maybe that's a bad assumption?
```rust
        .unwrap();
    max = cmp::max(num, max);
}
Ok(max)
```
Does this need to be a Result if it never returns an error?
It will return an error if any of the `?` trigger. That'll happen when the system has exhausted all its file descriptors, for instance.
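A sketch of the shape under discussion -- not the actual hopper function -- showing the `?` propagation and, per the suggestion above, skipping unparseable entries rather than crashing:

```rust
use std::{cmp, fs, io, path::Path};

// Hypothetical sketch: each `?` bubbles an io::Error up to the caller (EMFILE
// when the process is out of file descriptors, for instance), so the Result is
// doing real work even though the happy path always ends in Ok(max). Entries
// whose names don't parse as usize are simply ignored here.
fn max_queue_file_index(dir: &Path) -> Result<usize, io::Error> {
    let mut max = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if let Some(num) = entry
            .file_name()
            .to_str()
            .and_then(|s| s.parse::<usize>().ok())
        {
            max = cmp::max(num, max);
        }
    }
    Ok(max)
}
```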
```rust
    min = cmp::min(num, min);
}
assert!(worked);
Ok(min)
```
Same here about crashing and Result. Why assert instead of returning an error?
A fine question. No good reason.
Oh, I get what's going on here. This is a guard to ensure we don't start the receiver before the sender, in which case there'd be no file for it to take a handle on.
In the future we could make file opening a lazy thing, rather than something we do in `Receiver::new`.
```rust
if data_dir.is_dir() {
    for directory_entry in fs::read_dir(data_dir)? {
        let de = directory_entry?;
        fs::remove_file(de.path())?
```
`mkdir` in the data directory is an attack vector here.
Not sure I follow entirely? Or, to be clear, I guess hopper is not resilient at all to someone fiddling with its data directory. I'd like to understand what scenario you've cooked up though.
```diff
-                fslock.disk_writes_to_read -= 1;
-                return Some(event);
+                self.disk_writes_to_read -= 1;
+                return Ok(event);
             }
             Err(e) => panic!("Failed decoding. Skipping {:?}", e),
```
Error says skipping, but isn't this a crash?
Heh, the crashing-est kind of skip. Will correct the language.
src/sender.rs (outdated)

```rust
    guard: &mut MutexGuard<BackGuardInner<SenderSync>>,
) -> Result<(), (T, super::Error)> {
    let mut buf: Vec<u8> = Vec::with_capacity(64);
    buf.clear();
```
This clear shouldn't be necessary, as the length of this vector is initially zero
Ah, yep. Totally redundant.
It turns out there's still a bug in the code. It's possible to cause the Receiver to lock up. This happens when the queue is full and we pull two items, even though the atomic counter had previously been incremented to show the presence of only one of those items. The Receiver and Sender then double-pop -- in effect -- the item. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
Okay, as demonstrated in ae75bc3 we have a problem. If a Sender calls send against a full queue we end up holding a value with nowhere to put it. Which, okay, how do we deal with overflow? Can't throw it out. Here's what we do: the Sender flips into disk-write mode on overflow. The value gets written to disk and the Sender attempts to write a disk placement to the queue. If it can't, it stays in disk mode. If it can, it goes back to memory mode. We need to be careful to flush a Sender in disk-mode to the queue on Drop etc etc but I think this'll work out okay.
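A rough sketch of that state machine, with made-up names rather than the real Sender internals:

```rust
// Illustrative only: `SenderMode` and its methods are hypothetical, not
// hopper's actual Sender API. The sender tracks which mode it is in and how
// many disk writes are outstanding until a disk placement lands on the queue.
enum SenderMode {
    Memory,
    Disk { outstanding_writes: usize },
}

impl SenderMode {
    // The in-memory enqueue bounced: spill the value to disk and remember
    // that one more serialized write is waiting to be advertised.
    fn on_overflow(&mut self) {
        match self {
            SenderMode::Memory => *self = SenderMode::Disk { outstanding_writes: 1 },
            SenderMode::Disk { outstanding_writes } => *outstanding_writes += 1,
        }
    }

    // The queue accepted a disk placement covering the outstanding writes:
    // flip back to memory mode.
    fn on_placement_accepted(&mut self) {
        *self = SenderMode::Memory;
    }
}
```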
Well, there's progress here of a kind. pop_back is now gone, we only have two operations on the deque, and once something's in there it's in there until it's permanently gone. Good. We also block the sender on count size rather than in-memory representation. Also good. Issue is, tests still fail and the need for a flush command to drive the last disk writes out is a tedious API. The flush is only needed on shutdown, though, so that's a positive for actual Hopper use. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
We were forgetting to reset the total number of disk writes outstanding when the Sender was flushed. This instructed the Receiver to wait for writes that would never come. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
One issue we had with the previous now-we-are-done commit was that none of our tests looped. For any given input it's possible that a concurrency bug will force failure in 1/10000 runs, say. Which, boo-urns, right? This commit does not include any 10k loops but does have a 2.5k loop. The cautious developer is encouraged to bulk this up during development. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
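A sketch of the looping idea, using std's mpsc as a stand-in for an actual hopper channel; the count mirrors the 2.5k loop mentioned above:

```rust
// Illustrative only: re-run a cheap round-trip assertion many times so that a
// rare interleaving has a chance to surface. Bump the count locally while
// hunting concurrency bugs.
#[test]
fn round_trip_loops() {
    for _ in 0..2_500 {
        let (tx, rx) = std::sync::mpsc::channel::<u64>();
        tx.send(42).unwrap();
        assert_eq!(rx.recv().unwrap(), 42);
    }
}
```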
@dparton ready for re-review.
Okey dokey! I'm pretty sure at this point we're back on track. I've passed 1M QC tests plus the new loopers and the deque is about as bare-bones as I can conceive making it at this point, which is a good sign I do believe. Mostly this commit is fiddling with documentation. I did adjust the benchmark to have the same ith values so we could compare apples to apples. Signed-off-by: Brian L. Troutwine <blt@postmates.com>
There's a lot going on in this PR and it's all in the service of supporting postmates/cernan#411 and postmates/cernan#412.
Prior to this PR a hopper caller had no good way of strictly limiting how many items were kept in memory: this value was hard-coded. We've changed the interface to allow this. Further, the old FsSync method -- where all senders and receivers rendezvous on a single locked structure -- is now broken up. We have an in-memory deque protected on either end by a mutex. Senders are isolated from one another but the receiver is able to jam independently.
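Roughly, the shape is something like this -- a heavily simplified sketch, not the real src/deque.rs, which works over raw slots with unsafe code and carries more state in its guard structs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Conceptual sketch only. Senders serialize on `back_lock`, the Receiver
// holds `front_lock`, and the atomic `size` is the only point the two ends
// share, so the ends never block one another.
struct BackGuardInner {
    offset: usize, // slot the next sender write lands in
}

struct FrontGuardInner {
    offset: usize, // slot the next receiver read comes from
}

struct TwoLockDeque {
    capacity: usize,
    size: AtomicUsize,
    back_lock: Mutex<BackGuardInner>,
    front_lock: Mutex<FrontGuardInner>,
}

impl TwoLockDeque {
    // A sender checks fullness against the shared count, never against
    // anything the receiver holds a lock on.
    fn is_full(&self) -> bool {
        self.size.load(Ordering::Acquire) == self.capacity
    }
}
```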
Overall, the Sender / Receiver code is more straightforward but the deque has introduced a fair bit of unsafe code into hopper. We've seen only a 3x performance drop compared to stdlib MPSC on Linux machines, so, uh, that's not so bad.