MB-20351: Fix lock-order inversion in ~CheckpointManager 57/66357/3
authorDave Rigby <daver@couchbase.com>
Mon, 1 Aug 2016 13:24:39 +0000 (14:24 +0100)
committerDave Rigby <daver@couchbase.com>
Tue, 2 Aug 2016 09:00:25 +0000 (09:00 +0000)
commit787dcc66d6c34842d6037a7912e6c3b94924d253
tree38dcfcb2c628fc3717dffaef321e0e662c8eca1d
parent4035fc41e299656af0a18ff417e1a9c64d1a2362
MB-20351: Fix lock-order inversion in ~CheckpointManager

As identified by TSan. Seen whilst testing sherlock->watson merge,
analysed the code and it seems this is a latent issue and hard to
re-produce.

The issue is that when the executor thread does a reset on the current
task, the VBCBAdapator is the last one holding the ref-counted
vbucket, so destruction occurs and ~VBucket calls the destructor of
the checkpoint manager, which is the reverse locks ordering of a
previous code path.

Number of threads in play here, but main ones of interest:

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=170834)
  Cycle in lock order graph: M_checkpoint (0x7d640002e9a8) => M_exepool (0x7d4c00008288) => M_taskqueue (0x7d4400008e80) => M_exethread (0x7d380000df60) => M_checkpoint

  Mutex M_exepool acquired here while holding mutex M_checkpoint in thread T35:
    #0 pthread_mutex_lock <null> (engine_testapp+0x000000486760)
    #1 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:748 (ep.so+0x0000000f47b0)
    #2 ExecutorPool::wake(unsigned long) ep-engine/src/executorpool.cc:355 (ep.so+0x0000000f48f1)
    #3 Flusher::wake() ep-engine/src/flusher.cc:155 (ep.so+0x000000101ee6)
    #4 NotifyFlusherCB::callback(unsigned short&) ep-engine/src/flusher.h:88 (ep.so+0x00000010d194)
    #5 Checkpoint::queueDirty(SingleThreadedRCPtr<Item> const&, CheckpointManager*) ep-engine/src/checkpoint.h:675 (ep.so+0x0000000271b0)
    #6 CheckpointManager::closeOpenCheckpoint_UNLOCKED() ep-engine/src/checkpoint.cc:454 (ep.so+0x000000028dcb)
    #7 CheckpointManager::addNewCheckpoint_UNLOCKED(unsigned long, unsigned long, unsigned long) ep-engine/src/checkpoint.cc:371 (ep.so+0x00000002881f)
    #8 CheckpointManager::checkOpenCheckpoint_UNLOCKED(bool, bool) ep-engine/src/checkpoint.cc:361 (ep.so+0x00000002bd71)
    #9 CheckpointVisitor::visitBucket(RCPtr<VBucket>&) ep-engine/src/checkpoint_remover.cc:43 (ep.so+0x00000003c3bd)
    #10 VBCBAdaptor::run() ep-engine/src/ep.cc:3924 (ep.so+0x0000000a6174)
    #11 ExecutorThread::run() ep-engine/src/executorthread.cc:115 (ep.so+0x0000000fe1b6)
    #12 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fdd15)
    #13 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057fb)

...

  Mutex M_checkpoint acquired here while holding mutex M_exethread in thread T36:
    #0 pthread_mutex_lock <null> (engine_testapp+0x000000486760)
    #1 CheckpointManager::~CheckpointManager() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:748 (ep.so+0x000000027fdd)
    #2 VBucket::~VBucket() ep-engine/src/vbucket.cc:152 (ep.so+0x00000014a018)
    #3 PagingVisitor::~PagingVisitor() ep-engine/src/atomic.h:190 (ep.so+0x00000010a5e6)
    #4 PagingVisitor::~PagingVisitor() ep-engine/src/item_pager.cc:43 (ep.so+0x00000010a645)
    #5 std::_Sp_counted_ptr<PagingVisitor*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:373 (ep.so+0x00000010a2b0)
    #6 VBCBAdaptor::~VBCBAdaptor() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:149 (ep.so+0x0000000aea7e)
    #7 ExecutorThread::run() ep-engine/src/atomic.h:325 (ep.so+0x0000000fdee4)
    #8 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fdd15)
    #9 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057fb)

Change-Id: I0a966b3d112963243e17647184123fd8b3200656
Reviewed-on: http://review.couchbase.org/66357
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
src/checkpoint.cc