ep-engine.git
4 years ago MB-20943: Set state to dead when deleting vbucket 73/67873/8 4.5.1 v4.5.1
Daniel Owen [Wed, 21 Sep 2016 09:50:33 +0000 (10:50 +0100)]
MB-20943: Set state to dead when deleting vbucket

When executing the VBucketMemoryDeletionTask the vbucket state is
unchanged.  notifyAllPendingConnsFailed is called in the run
function of VBucketMemoryDeletionTask.  This in turn calls fireAllOps,
which ensures all pending ops are cleared if the vbucket is in an
active state.

However, if the vbucket is in a pending state it does nothing and
therefore the pending operations remain.  This results in connections
not being closed down.

The solution provided is to set the vbucket state to dead in
deleteVBucket, prior to calling scheduleVBDeletion.

A corresponding test has been added, which without the fix will hang.
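
The effect of the fix can be sketched with a simplified model. Everything below is hypothetical: the type names, the status codes, and the behaviour for the dead state (failing the ops with a not-my-vbucket style status) are assumptions made for illustration and are not taken from the ep-engine sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a vbucket with blocked operations.
enum class VBState { Active, Replica, Pending, Dead };
enum class Status { Success, NotMyVBucket };

struct VBucketSketch {
    VBState state = VBState::Pending;
    std::vector<int> pendingOps;   // cookies of blocked connections
    std::vector<Status> notified;  // status delivered for each cleared op

    // Clears pending ops only when the state allows an answer to be given.
    void fireAllOps() {
        if (state == VBState::Pending) {
            return;  // still waiting for a state change: ops stay queued
        }
        // Assumption: active ops succeed, everything else is rejected.
        const Status s = (state == VBState::Active) ? Status::Success
                                                    : Status::NotMyVBucket;
        notified.insert(notified.end(), pendingOps.size(), s);
        pendingOps.clear();
    }
};

// Mirrors the fix: mark the vbucket dead before the deletion work runs.
inline void deleteVBucketSketch(VBucketSketch& vb) {
    vb.state = VBState::Dead;
    vb.fireAllOps();  // stands in for the later VBucketMemoryDeletionTask
}
```

In this model a pending vbucket keeps its queued ops after `fireAllOps()`, whereas `deleteVBucketSketch()` always drains them, which is the property the new test relies on.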

Change-Id: I09cd4597b26576dd4b99d26f3a60c031e1b5f41d
Reviewed-on: http://review.couchbase.org/67873
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years ago MB-20716: Ensure DCP consumers in EWOULDBLOCK are unpaused on bucket delete 75/67375/6
Dave Rigby [Tue, 6 Sep 2016 08:04:16 +0000 (09:04 +0100)]
MB-20716: Ensure DCP consumers in EWOULDBLOCK are unpaused on bucket delete

This is a follow-up to 8734958 - in addition to ensuring that DCP
producers are unpaused, also unpause DCP _consumers_.

Change-Id: I538e7bca865c4fa41240263da1c92312b3866bfa
Reviewed-on: http://review.couchbase.org/67375
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-20751: Fix lock cycle (deadlock) during bucket delete & client disconnect 52/67252/4
Dave Rigby [Thu, 1 Sep 2016 15:30:41 +0000 (16:30 +0100)]
MB-20751: Fix lock cycle (deadlock) during bucket delete & client disconnect

MB-20716 recently fixed an issue where idle DCP connections (in
EWOULDBLOCK state) would not be notified (woken up) in the frontend
when a bucket was deleted. The fix for this was to trigger a notify
(via producer->notifyPaused()) as part of ep-engine bucket delete.

Unfortunately this introduced a lock-order-inversion (deadlock)
between two mutexes, which caused memcached to hang during shutdown,
as one (or more) worker threads would never terminate.

The issue is between:

1. Frontend_worker thread mutex (threadLock)
2. ConnMap::connsLock

And at least two threads (although normally 3 in the wild):

T1: Frontend worker thread
T2: DestroyBucket thread
(optional T3: A NONIO thread running ConnManager)

During bucket delete, the following sequence occurs which creates a cycle
between threadLock and connsLock:

T1<Worker>:
    event_handler() ... conn_pending_close()
      -> LOCK(threadLock)
    DcpConnMap::disconnect()
      -> ACQUIRE(connsLock)

T2<DeleteBucket>:
    EventuallyPersistentEngine::handleDeleteBucket() ...
    DcpConnMap::shutdownAllConnections()
      -> LOCK(connsLock)
    notifyIOComplete() ... DcpProducer::notifyPaused()
      -> ACQUIRE(threadLock)

Part of the issue here is that DcpProducer::notifyPaused() *must* be
called with schedule==false, as there is no longer a ConnNotifier task
running on another thread (which never acquires the connsLock and
hence avoids any deadlock), as the ConnNotifier has been shutdown in
DcpConnMap::shutdownAllConnections previously. Therefore we need to
use the variant of notifyPaused which calls notify_IO_complete in the
same thread.

The solution chosen is to essentially drop the connsLock in
shutdownAllConnections before calling notify. We achieve this by taking
a _copy_ of the connections map (under connsLock), and then iterating
over this copy and calling notify etc. This is safe as the elements of
the map are all ref-counted pointers.
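
A minimal sketch of the copy-then-iterate pattern described above. The class and member names are hypothetical stand-ins (`std::shared_ptr` is used in place of ep-engine's own ref-counted pointers), so this illustrates the idea rather than the actual change.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical connection object; notifyPaused() may take other locks.
struct ConnSketch {
    bool notified = false;
    void notifyPaused() { notified = true; }
};

class ConnMapSketch {
public:
    void add(const std::string& name) {
        std::lock_guard<std::mutex> lh(connsLock);
        conns[name] = std::make_shared<ConnSketch>();
    }

    std::shared_ptr<ConnSketch> find(const std::string& name) {
        std::lock_guard<std::mutex> lh(connsLock);
        auto it = conns.find(name);
        if (it == conns.end()) {
            return std::shared_ptr<ConnSketch>();
        }
        return it->second;
    }

    void shutdownAllConnections() {
        // Take a copy of the map while holding connsLock...
        std::map<std::string, std::shared_ptr<ConnSketch>> copy;
        {
            std::lock_guard<std::mutex> lh(connsLock);
            copy = conns;
        }
        // ...then notify without the lock held. The shared_ptr copies keep
        // every connection alive even if it is removed from conns meanwhile.
        for (auto& entry : copy) {
            entry.second->notifyPaused();
        }
    }

private:
    std::mutex connsLock;
    std::map<std::string, std::shared_ptr<ConnSketch>> conns;
};
```

Because `notifyPaused()` runs outside `connsLock`, a thread that holds some other mutex and then wants `connsLock` can no longer form a cycle with this path.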

Change-Id: I73f9b7576e42030a9f5219ae51e604e36fabcac7
Reviewed-on: http://review.couchbase.org/67252
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years ago MB-20697: Return false if commit2couchstore fails 93/67093/3
Dave Rigby [Fri, 26 Aug 2016 18:29:43 +0000 (19:29 +0100)]
MB-20697: Return false if commit2couchstore fails

This ensures that callers are notified of the failure, and
specifically that we correctly increment the ep_item_commit_failed stat.

Change-Id: I56f2591479c45c03fba184236aa3790a67290b38
Reviewed-on: http://review.couchbase.org/67093
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years ago MB-20716: Ensure DCP connections in EWOULDBLOCK are unpaused on bucket delete 69/67169/4
Dave Rigby [Tue, 30 Aug 2016 14:10:32 +0000 (15:10 +0100)]
MB-20716: Ensure DCP connections in EWOULDBLOCK are unpaused on bucket delete

When a bucket delete occurs, memcached notifies the deleted engine via
the ON_DELETE_BUCKET callback, which in turn calls
DcpConnMap::shutdownAllConnections(). This correctly shuts down all
the DCP streams associated with DCP connections, however if any of
these DCP connections are in the EWOULDBLOCK state - i.e. the frontend
is waiting for a notify_IO_complete call to "wake" them up, then the
frontend will be blocked waiting for a notify_IO_complete which will
never arrive.

This behaviour is essentially a latent bug, however prior to the fix
for MB-20549, memcached would (incorrectly) call signalIfIdle on
connections in the EWOULDBLOCK state, forcing them to wake up. With
that fix in place this no longer occurs.

The solution here is to explicitly unpause all producer connections
when all streams are closed.

Change-Id: Ia105e78304f5481bb56a0c0ff1cfc973959e1016
Reviewed-on: http://review.couchbase.org/67169
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-20645: Don't request stats from null DCP backfill manager 25/67025/3
Dave Rigby [Wed, 24 Aug 2016 10:54:13 +0000 (11:54 +0100)]
MB-20645: Don't request stats from null DCP backfill manager

If a DCP Producer has DcpProducer::addStats called on it after it's
been disconnected (but before it's removed from the connMap) then we
end up dereferencing a null backfillMgr pointer.

Fix by adding a guard that the manager is valid before including its
stats.
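
The guard can be illustrated with a small sketch. The types and stat names below are hypothetical and exist only to show the shape of the check; the real DcpProducer is more involved.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

typedef std::map<std::string, int> StatsSketch;

// Hypothetical backfill manager exposing a single stat.
struct BackfillManagerSketch {
    int activeBackfills = 0;
    void addStats(StatsSketch& out) const {
        out["backfill_num_active"] = activeBackfills;
    }
};

struct ProducerSketch {
    std::unique_ptr<BackfillManagerSketch> backfillMgr{
        new BackfillManagerSketch()};
    int itemsSent = 0;

    // After a disconnect the manager is gone, but the producer may still
    // be reachable through the connection map for a short while.
    void disconnect() { backfillMgr.reset(); }

    void addStats(StatsSketch& out) const {
        out["items_sent"] = itemsSent;
        if (backfillMgr) {  // guard: only include stats from a live manager
            backfillMgr->addStats(out);
        }
    }
};
```

Calling `addStats()` after `disconnect()` then simply omits the backfill stats instead of dereferencing a null pointer.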

Change-Id: Idc97b447090f5390054a9c40f207dae5494e63b9
Reviewed-on: http://review.couchbase.org/67025
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years ago MB-20425: Change options parameter to correct values 37/66537/9
Daniel Owen [Fri, 5 Aug 2016 11:50:09 +0000 (12:50 +0100)]
MB-20425: Change options parameter to correct values

Updates epstore get to use the options passed in.
Requires the call to ep_engine get from ep_engine
arithmetic to be updated to use the following
options:
QUEUE_BG_FETCH | HONOR_STATES |
TRACK_REFERENCE | HIDE_LOCKED_CAS

Requires the call to ep_engine get from epstore
store to be updated to use the following options:
QUEUE_BG_FETCH | HONOR_STATES | TRACK_REFERENCE |
DELETE_TEMP | HIDE_LOCKED_CAS

Also adds an associated test, where the bloom filter
is disabled which in the presence of the bug will
cause the test to hang.
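
The two option sets listed above are bit flags combined with `|`. As a rough sketch: the flag names are the ones quoted in this message, while the numeric values, the enum name, and the helper are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Flag names as quoted in the commit message; bit values are assumed.
enum GetOptionsSketch : std::uint16_t {
    TRACK_REFERENCE = 0x0001,
    QUEUE_BG_FETCH  = 0x0002,
    HONOR_STATES    = 0x0004,
    DELETE_TEMP     = 0x0008,
    HIDE_LOCKED_CAS = 0x0010
};

// Options for the get issued from arithmetic.
constexpr std::uint16_t kArithmeticGetOptions =
    QUEUE_BG_FETCH | HONOR_STATES | TRACK_REFERENCE | HIDE_LOCKED_CAS;

// Options for the get issued from store: the same set plus DELETE_TEMP.
constexpr std::uint16_t kStoreGetOptions =
    QUEUE_BG_FETCH | HONOR_STATES | TRACK_REFERENCE | DELETE_TEMP |
    HIDE_LOCKED_CAS;

// True if the given flag is present in an option set.
constexpr bool hasOption(std::uint16_t options, GetOptionsSketch flag) {
    return (options & flag) != 0;
}
```

Spelling the sets out at each call site, rather than relying on a default, makes it visible that the only difference between the two callers is `DELETE_TEMP`.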

Change-Id: I8fd275c3e14b0050e172b32f15fb7ed555e4b0c2
Reviewed-on: http://review.couchbase.org/66537
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years ago MB-20425: Remove default options parameter from get functions 31/66531/4
Daniel Owen [Fri, 5 Aug 2016 10:20:30 +0000 (11:20 +0100)]
MB-20425: Remove default options parameter from get functions

The ep_engine get function defaults the option parameter.
The ep_store get function also defaults the option parameter.

These multiple levels of defaulting have made it difficult to
track the value of the options parameter for different calls.
Therefore the use of defaults is removed for these cases.

This will make the change that addresses the regression of
MB-20425 much easier to understand.  This patch makes no
functional change.

Change-Id: I69aaa31a9a437f13299eb019956aa0488f13b95a
Reviewed-on: http://review.couchbase.org/66531
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 25/66325/11
Jim Walker [Wed, 3 Aug 2016 13:54:49 +0000 (14:54 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '1301ca609be559248af78d6fa52ce766dd8e4915':
  MB-20307: Re-enable dcp ep_dcp_dead_conn_count
  MB-20312: Initialise snapshot task priority
  MB-20330: ReaderLockHolder with no lvalue

Change-Id: I5878d95f8d792971fbb4ab5342baf4b017b6614a

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 17/66317/12
Jim Walker [Wed, 3 Aug 2016 13:53:54 +0000 (14:53 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '62f88138da834e216b953d3cf8064accb521c205':
  MB-19837: Increase number of NONIO threads
  MB-18453: Make task scheduling fairer
  [BP] MB-18452: Single threaded test harness improvements

Change-Id: If16ed42aed060f94d3180e832aaae0a7f5c5f052

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 16/66316/12
Jim Walker [Tue, 2 Aug 2016 09:08:25 +0000 (10:08 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '36d772883b9bf2179694f2ca9d0575ed52135a66':
  MB-20182: Update checkpoint snapshot correctly during TAP backfill
  MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction
  MB-20054: Fix windows build error by adding size() func in class AtomicQueue
  MB-20054: Fix windows build error by including a missing header file
  MB-20054: Regression test - bucket is deleted with DCPBackfill running
  MB-20054: Account for memory alloc/dealloc in unregisterBucket
  MB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main
  MB-20054: Backport ep-engine_unit_tests from watson to 3.0.x

Change-Id: I4e82985e4ed7c506faa44b19b456b98d1067ed6a

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 98/66298/5
Jim Walker [Tue, 2 Aug 2016 09:07:21 +0000 (10:07 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '6403bc0e8bbd7e94bb03672f505d99ff68d56c36':
  MB-18453: Give all tasks their own stats and priority
  [BP] MB-18580: Wait for VB state to be persisted before starting tests

Change-Id: Ia573467344f4c9ee2e2092322e54f7788201310e

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 96/66296/6
Jim Walker [Tue, 2 Aug 2016 09:05:49 +0000 (10:05 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit 'c509624bda146dfdc26ebaf8044657ecc1160912':
  MB-19948: enable disabled meta-data tests.
  MB-19948: Handle 18 bytes of metadata
  MB-19948: CouchKVStore metadata tests
  MB-19897: Fix the data race on lastSendTime between stats and dcp worker threads

Change-Id: Ia8b67427b969eefc50310762b92e9aa6a4662003

4 years ago MB-20351: Fix lock-order inversion in ~CheckpointManager 57/66357/3
Dave Rigby [Mon, 1 Aug 2016 13:24:39 +0000 (14:24 +0100)]
MB-20351: Fix lock-order inversion in ~CheckpointManager

As identified by TSan. Seen whilst testing the sherlock->watson merge;
on analysing the code this appears to be a latent issue that is hard to
reproduce.

The issue is that when the executor thread does a reset on the current
task, the VBCBAdaptor is the last one holding the ref-counted
vbucket, so destruction occurs and ~VBucket calls the destructor of
the checkpoint manager, which is the reverse lock ordering of a
previous code path.

A number of threads are in play here, but the main ones of interest are:

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=170834)
  Cycle in lock order graph: M_checkpoint (0x7d640002e9a8) => M_exepool (0x7d4c00008288) => M_taskqueue (0x7d4400008e80) => M_exethread (0x7d380000df60) => M_checkpoint

  Mutex M_exepool acquired here while holding mutex M_checkpoint in thread T35:
    #0 pthread_mutex_lock <null> (engine_testapp+0x000000486760)
    #1 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:748 (ep.so+0x0000000f47b0)
    #2 ExecutorPool::wake(unsigned long) ep-engine/src/executorpool.cc:355 (ep.so+0x0000000f48f1)
    #3 Flusher::wake() ep-engine/src/flusher.cc:155 (ep.so+0x000000101ee6)
    #4 NotifyFlusherCB::callback(unsigned short&) ep-engine/src/flusher.h:88 (ep.so+0x00000010d194)
    #5 Checkpoint::queueDirty(SingleThreadedRCPtr<Item> const&, CheckpointManager*) ep-engine/src/checkpoint.h:675 (ep.so+0x0000000271b0)
    #6 CheckpointManager::closeOpenCheckpoint_UNLOCKED() ep-engine/src/checkpoint.cc:454 (ep.so+0x000000028dcb)
    #7 CheckpointManager::addNewCheckpoint_UNLOCKED(unsigned long, unsigned long, unsigned long) ep-engine/src/checkpoint.cc:371 (ep.so+0x00000002881f)
    #8 CheckpointManager::checkOpenCheckpoint_UNLOCKED(bool, bool) ep-engine/src/checkpoint.cc:361 (ep.so+0x00000002bd71)
    #9 CheckpointVisitor::visitBucket(RCPtr<VBucket>&) ep-engine/src/checkpoint_remover.cc:43 (ep.so+0x00000003c3bd)
    #10 VBCBAdaptor::run() ep-engine/src/ep.cc:3924 (ep.so+0x0000000a6174)
    #11 ExecutorThread::run() ep-engine/src/executorthread.cc:115 (ep.so+0x0000000fe1b6)
    #12 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fdd15)
    #13 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057fb)

...

  Mutex M_checkpoint acquired here while holding mutex M_exethread in thread T36:
    #0 pthread_mutex_lock <null> (engine_testapp+0x000000486760)
    #1 CheckpointManager::~CheckpointManager() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:748 (ep.so+0x000000027fdd)
    #2 VBucket::~VBucket() ep-engine/src/vbucket.cc:152 (ep.so+0x00000014a018)
    #3 PagingVisitor::~PagingVisitor() ep-engine/src/atomic.h:190 (ep.so+0x00000010a5e6)
    #4 PagingVisitor::~PagingVisitor() ep-engine/src/item_pager.cc:43 (ep.so+0x00000010a645)
    #5 std::_Sp_counted_ptr<PagingVisitor*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:373 (ep.so+0x00000010a2b0)
    #6 VBCBAdaptor::~VBCBAdaptor() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:149 (ep.so+0x0000000aea7e)
    #7 ExecutorThread::run() ep-engine/src/atomic.h:325 (ep.so+0x0000000fdee4)
    #8 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fdd15)
    #9 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057fb)

Change-Id: I0a966b3d112963243e17647184123fd8b3200656
Reviewed-on: http://review.couchbase.org/66357
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 94/66294/1
Jim Walker [Thu, 28 Jul 2016 18:29:17 +0000 (19:29 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit 'dd3b6ae5e919bf51adaf5183fc8f1a076eac5357':
  MB-19982: Don't hold connsLock for duration of dcp stats
  MB-19982: Fix potential deadlock between DcpConsumer::bufMutex & connsLock
  MB-14859: Handle quick successive BG Fetch of a key interleaved with exp pager

Change-Id: Ie620baa1dc2151124f072084868020d3067c5fb2

4 years ago MB-20307: Re-enable dcp ep_dcp_dead_conn_count 84/66284/4 sherlock v4.1.2 v4.1.2-MP1 v4.1.2-MP2
Jim Walker [Thu, 28 Jul 2016 10:38:45 +0000 (11:38 +0100)]
MB-20307: Re-enable dcp ep_dcp_dead_conn_count

The call to collect this stat was dropped in
a recent merge commit. This commit adds it back.

Change-Id: I06d1d18cb4479edb2a74d899d4c3a8089a0c4656
Reviewed-on: http://review.couchbase.org/66284
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years ago MB-20312: Initialise snapshot task priority 83/66283/3
Jim Walker [Thu, 28 Jul 2016 10:36:53 +0000 (11:36 +0100)]
MB-20312: Initialise snapshot task priority

The internal priority of VBSnapshotTask is not
initialised, so it is likely that tasks requested to run at
low may actually become high (or vice versa).

Note this is not the GlobalTask priority, just an internal
one to this particular task.

Change-Id: Iabf91a8fe6fee0a8cf8bce99e72e4b22dd57040b
Reviewed-on: http://review.couchbase.org/66283
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years ago MB-20330: ReaderLockHolder with no lvalue 82/66282/2
Jim Walker [Thu, 28 Jul 2016 10:32:46 +0000 (11:32 +0100)]
MB-20330: ReaderLockHolder with no lvalue

3.x merge brought in the wrong version of some
code meaning that a read lock is never acquired.
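
The class of bug described here, an RAII lock holder created without a name, can be sketched as follows. The lock and holder types are hypothetical stand-ins that count readers instead of blocking, so the effect can be observed directly.

```cpp
#include <cassert>

// Hypothetical reader lock that simply counts current holders.
struct RWLockSketch {
    int readers = 0;
};

// RAII holder: acquires in the constructor, releases in the destructor.
class ReaderLockHolderSketch {
public:
    explicit ReaderLockHolderSketch(RWLockSketch& l) : lock(l) {
        ++lock.readers;
    }
    ~ReaderLockHolderSketch() { --lock.readers; }
    ReaderLockHolderSketch(const ReaderLockHolderSketch&) = delete;
    ReaderLockHolderSketch& operator=(const ReaderLockHolderSketch&) = delete;

private:
    RWLockSketch& lock;
};

// Bug pattern: the holder is an unnamed temporary, so it is destroyed at
// the end of the full expression and the lock is released immediately.
inline int readersSeenWithTemporary(RWLockSketch& l) {
    static_cast<void>(ReaderLockHolderSketch(l));
    return l.readers;  // the "protected" region runs without the lock
}

// Correct pattern: a named holder lives until the end of the scope.
inline int readersSeenWithNamedHolder(RWLockSketch& l) {
    ReaderLockHolderSketch rlh(l);
    return l.readers;
}
```

With the temporary the reader count is already back to zero when the guarded code runs; with the named holder it is one for the whole scope.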

Change-Id: I139ac041d54fdf8d459f4309a9c2be22e40afb8e
Reviewed-on: http://review.couchbase.org/66282
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 64/66064/4
Jim Walker [Wed, 27 Jul 2016 09:20:31 +0000 (10:20 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '9b194271f12e9b620c803a11b77a62e5402fb346': (22 commits)
  MB-19886: Fix data race on ActiveStream::curChkSeqno by making it atomic
  MB-19886: In markDiskSnapshot() get current vb snapshot info outside streamMutex
  MB-19843: Modify the end_seqno in DCP stream request after checking for rollback
  MB-19732: Fix the data race on lastSendTime between stats and dcp worker threads
  MB-19732: Record time for all DCP consumer messages
  MB-19732: Only update sendTime if successfully send noop
  MB-19691: Address data race on vb_state::high_seqno
  MB-19678: Merge backfill and in-memory snapshots correctly on replica vb
  MB-19636: Initialise failovers correctly from 2.5.x vbstate
  MB-19673: Log the actual last seqno sent before closing the stream.
  MB-19503: Fix ConnMap so notifications don't go missing [2]
  MB-19503: Fix ConnMap so notifications don't go missing.
  MB-19404: [BP] Address data race in DCP-Producer seen while making a stats request
  MB-19405: [BP] Address possible data races in PassiveStream context
  MB-19359: [3] Address lock inversion with vb's state lock and snapshot lock
  MB-19383: [BP] Address possible data race with startuptime
  MB-19380: Address data race observed with vb's pendingBGFetches
  MB-19360: Init mock server in stream module tests
  MB-19382: [BP] Create a variable to get correct locking scope
  MB-19359: [2] Address lock inversion with vb's state lock and snapshot lock
  ...

Change-Id: I70a7a29f33c7ec276e3bc99bc80a9e6fd739281a

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 60/66060/5
Jim Walker [Wed, 27 Jul 2016 09:19:50 +0000 (10:19 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit 'ac78070d8dae90427c4bd3030a7be4ab09f920ae':
  [BP] MB-16366: Obtain vbstate readlock in numerous operations
  MB-19280: Fix data race in CouchKVStore stats access
  MB-19279: Fix race in use of gmtime()
  MB-19113: Suppress test_mb16357 when on thread sanitizer
  MB-19278: Fix lock-order inversion on ActiveStream::streamMutex
  MB-19277: Set executorThread's waketime to atomic
  MB-19276: Fix data race on ExecutorThread::taskStart
  MB-19275: Address data race on a DCP stream's state
  MB-19273: Fix data race on PassiveStream::buffer.{bytes,items}
  MB-19260: Make cookie atomic to serialize set/get in ConnHandler
  MB-19259: Fix data race on DcpConsumer::backoffs
  MB-19258: Address data race with replicationThrottle parameters
  MB-19281: [BP] Add template class RelaxedAtomic<>
  MB-19257: Fix data race on ExecutorThread::now
  MB-19256: Address possible data race on VBCBAdaptor::currentvb

Change-Id: Ie8194d570b1d367a90d277ed086dec90eb99d6e9

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 59/66059/5
Jim Walker [Wed, 27 Jul 2016 09:19:09 +0000 (10:19 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '644c09df35ac4e05006347140240819704848d0f':
  MB-19253: Fix race in void ExecutorPool::doWorkerStat
  MB-19252: Fix data race on Stream::readyQueueMemory
  MB-19251: Fix race in updating Vbucket.file{SpaceUsed,Size}
  MB-19249: Address possible data races in ConnHandler context
  MB-19248: Fix race in TaskQueue.{ready,future,pending}Queue access
  MB-19247: Fix possible data race in workload.h: workloadPattern
  MB-19246: Fix potentially incorrect persist_time in OBSERVE response
  MB-19229: Address possible data race in vbucket.cc: numHpChks
  MB-19228: Address possible data races in ActiveStream context
  MB-19227: Fix race in ConnNotifier.task access
  MB-19226: Address potential data races in the warmup code
  MB-19225: Fix data race on Flusher::taskId
  MB-19225: Fix race in Flusher._state
  MB-19224: Address possible data race with global task's waketime
  MB-19223: Switch to hrtime from timeval in Global Thread Pool

Change-Id: I6cab0405f2ce779e2cf9849fa5364a9549382905

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 54/66054/6
Jim Walker [Wed, 27 Jul 2016 08:03:35 +0000 (09:03 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '8cbe913fa9a2f78388adb2d2ce6dbfeee1e23e6e':
  MB-19222: Fix race condition in TaskQueue shutdown
  MB-19220: Ensure HashTable::size is atomic
  MB-19204: ep_testsuite: Don't release the item while we're using it
  MB-19204: Address data race in ep_test_apis/testsuite
  MB-19204: ep_testsuite: Use std::string for last_key/body
  MB-19204: Remove alarm() call from atomic_ptr_test, reduce iteration count
  MB-19204: hash_table_test: Fix TSan issues
  MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call
  MB-19153: Break circular dependency while deleting bucket
  MB-19113: Address false positive lock inversion seen with test_mb16357

Change-Id: I708f67379ab38ea1af8c1602b790e590c3038806

4 years ago MB-20224: [BP] Replace ThreadLocal '#define' with a using 10/66210/2
Will Gardner [Tue, 29 Mar 2016 12:26:35 +0000 (13:26 +0100)]
MB-20224: [BP] Replace ThreadLocal '#define' with a using

Using a define causes issues inside of GoogleTest which has its
own ThreadLocal class. By replacing it with 'using' we avoid
the collision.
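
A small sketch of why an alias avoids the clash. The namespace `testing_sketch` stands in for a test framework with its own `ThreadLocal`, and `ThreadLocalPosixSketch` is a hypothetical implementation class (it is not actually thread-local storage); neither name comes from the sources.

```cpp
#include <cassert>

// Stand-in for a framework that ships its own ThreadLocal class.
namespace testing_sketch {
template <typename T>
class ThreadLocal {
public:
    T value{};
};
}  // namespace testing_sketch

// Hypothetical platform-specific implementation class.
template <typename T>
class ThreadLocalPosixSketch {
public:
    void set(T v) { value = v; }
    T get() const { return value; }

private:
    T value{};
};

// A macro such as
//     #define ThreadLocal ThreadLocalPosixSketch
// is applied by the preprocessor to every later occurrence of the token,
// including testing_sketch::ThreadLocal, regardless of namespace.
//
// An alias template is an ordinary scoped name, so both can coexist:
template <typename T>
using ThreadLocal = ThreadLocalPosixSketch<T>;
```

Usage is unchanged for existing code (`ThreadLocal<int> counter;`), while `testing_sketch::ThreadLocal<int>` still refers to the framework's own class.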

*Required to build MB-20224 on watson.*

Change-Id: I05d3e25efc0eb361f7dbe82074d806ba116781c5
Reviewed-on: http://review.couchbase.org/66210
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years ago MB-19837: Increase number of NONIO threads 34/65934/6
Jim Walker [Tue, 19 Jul 2016 14:18:48 +0000 (15:18 +0100)]
MB-19837: Increase number of NONIO threads

We've found that the NONIO threads can become
heavily utilised. On smaller systems only 1 or 2
threads were created, easily overwhelmed during
rebalance leading to rebalance failures.

This commit changes the code from creating
NONIO as 10% of nCPU to be 30% of nCPU and
ensuring at least 2 are always present.

The % is still hardwired because the thread count is global
and would be initialised by the first bucket's config.

Given that we can already override with a config flag the changes
are hardwired to give better throughput for nearly all users.

Comparison of old vs new.

CPU count = 1 NONIO threads old{1} new{2}
CPU count = 2 NONIO threads old{1} new{2}
CPU count = 4 NONIO threads old{1} new{2}
CPU count = 8 NONIO threads old{1} new{2}
CPU count = 16 NONIO threads old{2} new{3}
CPU count = 32 NONIO threads old{3} new{7}
CPU count = 36 NONIO threads old{3} new{8}
CPU count = 64 NONIO threads old{5} new{8}
CPU count = 128 NONIO threads old{8} new{8}
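
The table can be reproduced with a short calculation, although only under assumptions that are inferred from the numbers rather than stated in this message: the percentage is applied to a thread budget of three quarters of nCPU (integer division), the old value rounds up with a floor of 1, the new value rounds down with a floor of 2, and both are capped at 8. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumption inferred from the table: the percentages are taken from a
// global thread budget of 3/4 of the CPU count, not from nCPU directly.
inline std::size_t threadBudget(std::size_t nCpu) {
    return nCpu * 3 / 4;
}

// Old behaviour (assumed): ceil(10% of budget), at least 1, at most 8.
inline std::size_t nonIoThreadsOld(std::size_t nCpu) {
    const std::size_t count = (threadBudget(nCpu) + 9) / 10;
    return std::min<std::size_t>(8, std::max<std::size_t>(1, count));
}

// New behaviour (assumed): floor(30% of budget), at least 2, at most 8.
inline std::size_t nonIoThreadsNew(std::size_t nCpu) {
    const std::size_t count = threadBudget(nCpu) * 3 / 10;
    return std::min<std::size_t>(8, std::max<std::size_t>(2, count));
}
```

For example, with 16 CPUs the assumed budget is 12, giving ceil(1.2) = 2 threads before and floor(3.6) = 3 threads after, which matches the corresponding row.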

Change-Id: Ifa56730ad934ca9ae83993b3c539f4a725872696
Reviewed-on: http://review.couchbase.org/65934
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-18453: Make task scheduling fairer 29/65929/3
Jim Walker [Thu, 30 Jun 2016 10:23:20 +0000 (11:23 +0100)]
MB-18453: Make task scheduling fairer

The MB identified that we can starve tasks by scheduling
a higher priority task via ExecutorPool::wake().

This occurs because ExecutorPool::wake() pushes tasks
into the readyQueue enabling frequent wakes to trigger
the starvation bug.

The fix is to remove readyQueue.push from wake, so that we only
push to the futureQueue. The fetch side of scheduling only looks at
the futureQueue once the readyQueue is empty, thus the identified
starvation won't happen.
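
The starvation and the fix can be modelled with a toy queue. This is a hypothetical sketch, not the ExecutorPool code: it assumes a lower priority value is more urgent, that a woken task is made due immediately in the futureQueue, and that the pre-fix wake pushed straight into the readyQueue.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct TaskSketch {
    std::string name;
    int priority;   // assumption: lower value means more urgent
    long waketime;  // time at which the task becomes due
};

class TaskQueueSketch {
public:
    void schedule(const TaskSketch& t) { futureQueue.push_back(t); }

    // Post-fix wake: the task becomes due now, but waits in futureQueue.
    void wake(TaskSketch t, long now) {
        t.waketime = now;
        futureQueue.push_back(t);
    }

    // Pre-fix wake: jumps straight into readyQueue.
    void wakeIntoReady(TaskSketch t, long now) {
        t.waketime = now;
        readyQueue.push_back(t);
    }

    bool fetchNextTask(long now, TaskSketch& out) {
        if (readyQueue.empty()) {
            // Only an empty readyQueue is refilled with due future tasks.
            auto due = std::stable_partition(
                futureQueue.begin(), futureQueue.end(),
                [now](const TaskSketch& t) { return t.waketime > now; });
            readyQueue.insert(readyQueue.end(), due, futureQueue.end());
            futureQueue.erase(due, futureQueue.end());
        }
        if (readyQueue.empty()) {
            return false;
        }
        auto best = std::min_element(
            readyQueue.begin(), readyQueue.end(),
            [](const TaskSketch& a, const TaskSketch& b) {
                return a.priority < b.priority;
            });
        out = *best;
        readyQueue.erase(best);
        return true;
    }

private:
    std::vector<TaskSketch> readyQueue;
    std::vector<TaskSketch> futureQueue;
};

// Runs a few fetches; the high-priority task is re-woken every time it runs.
inline std::vector<std::string> runSketch(bool fixedWake, int fetches) {
    TaskQueueSketch q;
    q.schedule({"high", 0, 0});
    q.schedule({"low", 5, 0});
    std::vector<std::string> order;
    TaskSketch t;
    for (int i = 0; i < fetches && q.fetchNextTask(0, t); ++i) {
        order.push_back(t.name);
        if (t.name == "high") {
            if (fixedWake) {
                q.wake(t, 0);
            } else {
                q.wakeIntoReady(t, 0);
            }
        }
    }
    return order;
}
```

In this model `runSketch(false, 3)` runs only the high-priority task, while `runSketch(true, 3)` interleaves the low-priority task because the readyQueue must drain before woken tasks are admitted.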

A unit-test demonstrates the fix using the single-threaded harness and
expects that two tasks of differing priorities get executed, rather
than the wake() starving the low-priority task.

This test drives:
 - ExecutorPool::schedule
 - ExecutorPool::reschedule
 - ExecutorPool::wake

These are all the methods which can add tasks into the scheduler
queue.

The fetch side is also covered:
 - ExecutorPool::fetchNextTask

This commit is an update to a previous commit that was reverted due
to performance issues. The original commit was reverted to minimise
disruption.

- original commit is e22c9ebeda1aac
- revert is 27cb1120e3e37

Change-Id: I70a4dcf7cd1c3a6f04548e9bbc3f95e24cdf50ad
Reviewed-on: http://review.couchbase.org/65929
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago [BP] MB-18452: Single threaded test harness improvements 28/65928/3
Jim Walker [Thu, 2 Jun 2016 15:05:50 +0000 (16:05 +0100)]
[BP] MB-18452: Single threaded test harness improvements

Refactor parts of the very new evp_store_single_threaded_test so that
it's simpler to drive tasks making new tests easier to write.

The main change is to provide helper methods for running any task from
a queue (with some checks) and a way to push a clean shutdown.

Change-Id: I7add574f0768c642f3c6c7c64293e882337a1cdc
Reviewed-on: http://review.couchbase.org/65928
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-20182: Update checkpoint snapshot correctly during TAP backfill 66/65866/4
Manu Dhundi [Tue, 19 Jul 2016 18:21:11 +0000 (11:21 -0700)]
MB-20182: Update checkpoint snapshot correctly during TAP backfill

When we do a TAP backfill we must update checkpoint snapshot start
and end correctly. Otherwise, if we immediately proceed to DCP
after TAP backfill, the checkpoint mgr will have an incorrect combination
of {snap_start, snap_end, vb_high_seqno}.

Change-Id: I2b738fd3b24486dadbd2962e81e0c3820c5a8786
Reviewed-on: http://review.couchbase.org/65866
Tested-by: buildbot <build@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years ago Revert "MB-18453: Make task scheduling fairer" 80/65780/2
Jim Walker [Thu, 14 Jul 2016 17:07:29 +0000 (10:07 -0700)]
Revert "MB-18453: Make task scheduling fairer"

When running in a >1 node cluster memcached CPU is running
very high. The original fix has introduced a problem which
needs further investigation (fetchTask is very very cpu hot).

This reverts commit e22c9ebeda1aac2fc8f4325cc39a93c3bcefffab.

Change-Id: If53a3a60692fbaaef4e54462f99284a8044cd899
Reviewed-on: http://review.couchbase.org/65780
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago Merge remote-tracking branch 'couchbase/3.0.x' 57/65657/2
Manu Dhundi [Mon, 11 Jul 2016 22:10:48 +0000 (15:10 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x'

*
|\
| * 6e10f8a 2016-07-11 | MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction
| * 1fe3aac 2016-07-07 | MB-20054: Fix windows build error by adding size() func in class AtomicQueue
| * 536e32f 2016-07-07 | MB-20054: Fix windows build error by including a missing header file

Change-Id: Ib01ecb053ffa4b6fe2a6bac6cfbe6eccc3630549

4 years ago MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction 26/65626/5 v3.1.6
Manu Dhundi [Mon, 11 Jul 2016 21:38:15 +0000 (14:38 -0700)]
MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction

When a compaction request is made, we initially set the purge_seqno in the req
to 0, hoping to update it when we purge items. However, if there are no purged
items in a compaction call, then we end up resetting the purge_seqno
(correct one) set by the previous compaction call.

This commit addresses the problem by setting the purge seqno in the request
to current purge seqno in the ep-engine.
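
A simplified sketch of the difference between seeding the request with 0 and seeding it with the current value. The request type, function name, and the boolean switch are hypothetical and only illustrate the logic described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical compaction request carrying the purge seqno to record.
struct CompactionRequestSketch {
    std::uint64_t purgeSeqno;
};

// Returns the purge seqno that would be recorded after one compaction.
// seedFromCurrent == false models the old behaviour (start from 0);
// seedFromCurrent == true models the fix (start from the current value).
inline std::uint64_t compactSketch(std::uint64_t currentPurgeSeqno,
                                   const std::vector<std::uint64_t>& purged,
                                   bool seedFromCurrent) {
    CompactionRequestSketch req{seedFromCurrent ? currentPurgeSeqno : 0};
    for (std::uint64_t seqno : purged) {
        // Each purged item can only move the purge seqno forwards.
        req.purgeSeqno = std::max(req.purgeSeqno, seqno);
    }
    return req.purgeSeqno;
}
```

With no purged items the old seeding yields 0 and loses the previous value, whereas the new seeding carries the previous value through unchanged.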

Change-Id: I9581abe7a4cb9d7cd84c1bf5563b98c91dc67525
Reviewed-on: http://review.couchbase.org/65626
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago Merge remote-tracking branch 'couchbase/3.0.x' into sherlock 15/65615/3
Dave Rigby [Fri, 8 Jul 2016 13:44:29 +0000 (14:44 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-20054: Regression test - bucket is deleted with DCPBackfill running
  MB-20054: Account for memory alloc/dealloc in unregisterBucket
  MB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main

Change-Id: I5f05dd3355cc0d581350db65463c6c1dc155f3c6

4 years ago Merge remote-tracking branch 'couchbase/3.0.x' into sherlock 13/65613/3
Dave Rigby [Fri, 8 Jul 2016 09:49:51 +0000 (10:49 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-20054: Backport ep-engine_unit_tests from watson to 3.0.x

Change-Id: I811e0b796be8611a4a574ab6b6a488ef50219bbf

4 years ago docs/Testing.md: Document the different test types 85/65585/4
Dave Rigby [Thu, 7 Jul 2016 15:03:05 +0000 (16:03 +0100)]
docs/Testing.md: Document the different test types

Change-Id: I3fcdb9e7cb347fa63f94e5c7a760bb54749aa375
Reviewed-on: http://review.couchbase.org/65585
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-20054: Fix windows build error by adding size() func in class AtomicQueue 96/65596/2
Manu Dhundi [Thu, 7 Jul 2016 21:28:52 +0000 (14:28 -0700)]
MB-20054: Fix windows build error by adding size() func in class AtomicQueue

Change-Id: I808e31c9a9ba97b67e75c07534350aa91cb040a2
Reviewed-on: http://review.couchbase.org/65596
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years ago MB-20054: Fix windows build error by including a missing header file 89/65589/2
Manu Dhundi [Thu, 7 Jul 2016 17:50:45 +0000 (10:50 -0700)]
MB-20054: Fix windows build error by including a missing header file

Change-Id: Ifdc8d6a09e8ff1bd68218066cffa44bc9de0a5a3
Reviewed-on: http://review.couchbase.org/65589
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
4 years agoMB-20054: Regression test - bucket is deleted with DCPBackfill running 20/65520/14
Dave Rigby [Thu, 7 Jul 2016 08:23:25 +0000 (09:23 +0100)]
MB-20054: Regression test - bucket is deleted with DCPBackfill running

Regression test for MB-20054 - the following abort is encountered when
a DCPBackfill task is still running when a bucket is deleted:

    Assertion failed: (engine), function verifyEngine, file
    ep-engine/src/objectregistry.cc, line 58.

This issue occurs because the DCPBackfill object (and associated
objects ActiveStream and importantly ActiveStream's readyQ of Items)
is not deleted earlier in the shutdown sequence (via EvpDestroy), as
we use ref-counted pointers for it and there is still an outstanding
reference by the AuxIO Thread which is running the task. Hence the
DCPBackfill object is only deleted when we finally unregister the
deleted bucket from the shared ExecutorPool - see the following
backtrace:

    #1  0x00007f513b75a085 in abort () from /lib64/libc.so.6
    #2  0x00007f51337034e2 in ObjectRegistry::onDeleteItem (pItem=<value optimized out>) at ep-engine/src/objectregistry.cc:157
    #3  0x00007f5133652094 in ~Item (this=<value optimized out>) at ep-engine/src/item.h:352
    #4  SingleThreadedRCPtr<Item>::~SingleThreadedRCPtr (this=<value optimized out>) at ep-engine/src/atomic.h:430
    #5  0x00007f51336c7f47 in ~MutationResponse (this=0x3cd87880) at ep-engine/src/dcp-response.h:275
    #6  MutationResponse::~MutationResponse (this=0x3cd87880) at ep-engine/src/dcp-response.h:275
    #7  0x00007f51336d86aa in clear_UNLOCKED (this=0x7a3f5fa0) at ep-engine/src/dcp-stream.cc:201
    #8  ~ActiveStream (this=0x7a3f5fa0) at ep-engine/src/dcp-stream.h:178
    #9  ActiveStream::~ActiveStream (this=0x7a3f5fa0) at ep-engine/src/dcp-stream.h:179
    #10 0x00007f51336cc808 in RCPtr<Stream>::~RCPtr (this=0xb1823780) at ep-engine/src/atomic.h:348
    #11 0x00007f51336d77c7 in ~DCPBackfill (this=0xb1823740) at ep-engine/src/dcp-stream.cc:114
    #12 DCPBackfill::~DCPBackfill (this=0xb1823740) at ep-engine/src/dcp-stream.cc:114
    #13 0x00007f513368d95f in ~SingleThreadedRCPtr (this=0x5b55a20, e=0x59c4000, taskType=NO_TASK_TYPE) at ep-engine/src/atomic.h:430
    #14 ExecutorPool::_stopTaskGroup (this=0x5b55a20, e=0x59c4000, taskType=NO_TASK_TYPE) at ep-engine/src/executorpool.cc:532
    #15 0x00007f513368dad3 in ExecutorPool::_unregisterBucket (this=0x5b55a20, engine=0x59c4000) at ep-engine/src/executorpool.cc:551
    #16 0x00007f513368e143 in ExecutorPool::unregisterBucket (this=0x5b55a20, engine=0x59c4000) at ep-engine/src/executorpool.cc:602
    #17 0x00007f5133655f82 in EventuallyPersistentStore::~EventuallyPersistentStore (this=0x59e6000)
        at ep-engine/src/ep.cc:365
    #18 0x00007f5133672a25 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0x59c4000)
        at ep-engine/src/ep_engine.cc:5791
    #19 0x00007f5133672c95 in EvpDestroy (handle=0x59c4000, force=<value optimized out>) at ep-engine/src/ep_engine.cc:143

To actually reproduce the issue is somewhat involved - we need to
orchestrate the world such that we delete the engine while a
DCPBackfill task is still running. We spin up a separate thread which
will run the DCPBackfill task concurrently with destroy - specifically
DCPBackfill must start running (and add items to the readyQ) before
destroy(), and it must then keep running until after _stopTaskGroup
is invoked.  To achieve this we use a couple of condition variables to
synchronise between the two threads - the timeline needs to look like:

    auxIO thread:  [------- DCPBackfill ----------]
     main thread:      [--destroy()--]       [ExecutorPool::_stopTaskGroup]

    --------------------------------------------------------> time
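
The ordering in the timeline above can be sketched with two one-shot
gates, each built from a mutex and a condition variable. This is only
an illustration of the synchronisation idea, not the regression test
itself: `Gate`, `runOrchestration` and the event strings are
hypothetical names, and the "backfill" and "destroy" steps are
stand-ins that only record when they ran.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical one-shot gate: a flag guarded by a mutex and signalled
// through a condition variable.
class Gate {
public:
    void open() {
        {
            std::lock_guard<std::mutex> lh(mutex);
            opened = true;
        }
        cv.notify_all();
    }

    void wait() {
        std::unique_lock<std::mutex> lh(mutex);
        cv.wait(lh, [this] { return opened; });
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    bool opened = false;
};

// Runs a stand-in "backfill" on a second thread and stand-ins for
// destroy() and _stopTaskGroup on the calling thread. Returns the
// order in which the steps were recorded.
std::vector<std::string> runOrchestration() {
    std::vector<std::string> events;
    std::mutex eventsMutex;
    auto record = [&](const std::string& e) {
        std::lock_guard<std::mutex> lh(eventsMutex);
        events.push_back(e);
    };

    Gate backfillStarted;   // opened once the backfill has begun
    Gate stopTaskGroupHit;  // opened once _stopTaskGroup has been entered

    std::thread auxIO([&] {
        record("backfill: start");
        backfillStarted.open();
        stopTaskGroupHit.wait();  // keep "running" across destroy()
        record("backfill: end");
    });

    backfillStarted.wait();       // destroy must not begin too early
    record("destroy");
    record("stopTaskGroup: begin");
    stopTaskGroupHit.open();
    auxIO.join();                 // the task finishes inside stopTaskGroup

    assert(events.size() == 4);
    return events;
}
```

Calling `runOrchestration()` records the four steps in the order
required by the timeline; the second gate is what keeps the backfill
alive until after the stand-in for `_stopTaskGroup` has been entered.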

Change-Id: Ic64c419cb8e4e0af2378efba9711b121aacee15b
Reviewed-on: http://review.couchbase.org/65520
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20054: Account for memory alloc/dealloc in unregisterBucket 25/65525/8
Dave Rigby [Tue, 5 Jul 2016 20:34:56 +0000 (21:34 +0100)]
MB-20054: Account for memory alloc/dealloc in unregisterBucket

While unregistering a bucket, any memory allocations/deallocations
made should be accounted to the bucket in question.  Hence no
`onSwitchThread(NULL)` call.

Change-Id: I5c260e3aa7e2c8d1fd4ff0a1ca20f2185a7362a8
Reviewed-on: http://review.couchbase.org/65525
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main 16/65516/8
Dave Rigby [Tue, 5 Jul 2016 11:31:59 +0000 (12:31 +0100)]
MB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main

Not originally part of MB-20054, but needed for test development for
this MB.

Change-Id: Ia38db00d4f8cd84b2c90b5bddbd0bc01f51b61de
Reviewed-on: http://review.couchbase.org/65516
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20054: Backport ep-engine_unit_tests from watson to 3.0.x 79/64979/11
Dave Rigby [Thu, 16 Jun 2016 11:19:34 +0000 (12:19 +0100)]
MB-20054: Backport ep-engine_unit_tests from watson to 3.0.x

In Watson we have created a set of 'unit' (i.e. class-level) tests for
ep-engine. To assist in backporting bug fixes, and specifically their
unit tests (to demonstrate they are correct), this patch backports the
test infrastructure itself.

Note these tests require GTest, so the CMake changes necessary for it
have also been included.

Tests are a backport from couchbase/watson as of commit feda304.
Modified to handle changes in APIs etc, and to remove tests
which fail on 3.0.x as we never chose to fix them in the 3.0.x
branch.

Change-Id: Iaaf59b0d8d6ba0a2211b630ba00fd837ca01614a
Reviewed-on: http://review.couchbase.org/64979
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 82/65582/1
Dave Rigby [Thu, 7 Jul 2016 13:59:04 +0000 (14:59 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19982: Don't hold connsLock for duration of dcp stats
  MB-19982: Fix potential deadlock between DcpConsumer::bufMutex & connsLock
  MB-14859: Handle quick successive BG Fetch of a key interleaved with exp pager

Change-Id: Ie192ce93370c3218948434794b335732a6a7ff18

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 81/65581/1
Dave Rigby [Thu, 7 Jul 2016 13:48:01 +0000 (14:48 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19886: Fix data race on ActiveStream::curChkSeqno by making it atomic
  MB-19886: In markDiskSnapshot() get current vb snapshot info outside streamMutex
  MB-19843: Modify the end_seqno in DCP stream request after checking for rollback

Change-Id: I32c52689b4c78f3416af180818df772d217db882

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 78/65578/1
Dave Rigby [Thu, 7 Jul 2016 13:13:19 +0000 (14:13 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19732: Fix the data race on lastSendTime between stats and dcp worker threads
  MB-19732: Record time for all DCP consumer messages
  MB-19732: Only update sendTime if successfully send noop
  MB-19691: Address data race on vb_state::high_seqno

Change-Id: I2d994dd799c8fe5ee5779d3916e374aa3fa9615b

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 75/65575/2
Dave Rigby [Thu, 7 Jul 2016 13:20:43 +0000 (14:20 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19678: Merge backfill and in-memory snapshots correctly on replica vb

Change-Id: I1b0adbafe45e4f4414da62846019b55c7dd05833

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 74/65574/2
Dave Rigby [Thu, 7 Jul 2016 12:59:10 +0000 (13:59 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19636: Initialise failovers correctly from 2.5.x vbstate
  MB-19673: Log the actual last seqno sent before closing the stream.

Change-Id: If0aae515a9fb3232a390b8228cf92274fcc81456

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 71/65571/1
Dave Rigby [Thu, 7 Jul 2016 10:57:50 +0000 (11:57 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19503: Fix ConnMap so notifications don't go missing [2]
  MB-19503: Fix ConnMap so notifications don't go missing.
  MB-19404: [BP] Address data race in DCP-Producer seen while making a stats request
  MB-19405: [BP] Address possible data races in PassiveStream context

Change-Id: I241ebd07f9e6177d557dd0ea37da97d6b4cc1489

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 89/65389/2
Dave Rigby [Thu, 7 Jul 2016 09:34:12 +0000 (10:34 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19359: [3] Address lock inversion with vb's state lock and snapshot lock
  MB-19383: [BP] Address possible data race with startuptime
  MB-19380: Address data race observed with vb's pendingBGFetches
  MB-19360: Init mock server in stream module tests
  MB-19382: [BP] Create a variable to get correct locking scope

Change-Id: Ice3e97c12ee1423923ffeda47bc30890332a1770

4 years agoMB-18453: Make task scheduling fairer 85/65385/7
Jim Walker [Thu, 30 Jun 2016 10:23:20 +0000 (11:23 +0100)]
MB-18453: Make task scheduling fairer

The MB identified that we can starve tasks by scheduling
a higher priority task via ExecutorPool::wake().

This occurs because ExecutorPool::wake() pushes tasks
into the readyQueue enabling frequent wakes to trigger
the starvation bug.

The fix is to remove readyQueue.push from wake, so that wake only
pushes to the futureQueue. The fetch side of scheduling only looks at
the futureQueue once the readyQueue is empty, thus the identified
starvation won't happen.
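
A toy model of the fetch-side rule is sketched below. It is not the
ExecutorPool/TaskQueue code: `ToyTaskQueue`, `Task` and the integer
clock are hypothetical, and it assumes that wake() re-queues the task
on the futureQueue with its wake time set to "now" and that a lower
priority value means a more urgent task.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical task record for the toy model.
struct Task {
    std::string name;
    int priority;   // assumption: lower value = more urgent
    long waketime;  // toy clock tick at which the task becomes due
};

struct ByPriority {
    bool operator()(const Task& a, const Task& b) const {
        return a.priority > b.priority;  // most urgent task on top
    }
};

struct ByWaketime {
    bool operator()(const Task& a, const Task& b) const {
        return a.waketime > b.waketime;  // earliest waketime on top
    }
};

class ToyTaskQueue {
public:
    void schedule(const Task& t) {
        assert(!t.name.empty());
        futureQueue.push(t);
    }

    // Make the task due immediately, but never push to readyQueue here.
    void wake(Task t, long now) {
        t.waketime = now;
        futureQueue.push(t);
    }

    // Only consult futureQueue once readyQueue has been drained.
    bool fetchNextTask(long now, Task& out) {
        if (readyQueue.empty()) {
            while (!futureQueue.empty() &&
                   futureQueue.top().waketime <= now) {
                readyQueue.push(futureQueue.top());
                futureQueue.pop();
            }
        }
        if (readyQueue.empty()) {
            return false;
        }
        out = readyQueue.top();
        readyQueue.pop();
        return true;
    }

private:
    std::priority_queue<Task, std::vector<Task>, ByPriority> readyQueue;
    std::priority_queue<Task, std::vector<Task>, ByWaketime> futureQueue;
};
```

In this model a high-priority task that is woken again right after it
ran cannot overtake a low-priority task already sitting in the
readyQueue, because the woken task is only promoted on a later fetch.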

A unit-test demonstrates the fix using the single-threaded harness and
expects that two tasks of differing priorities get executed, rather
than the wake() starving the low-priority task.

This test drives:
 - ExecutorPool::schedule
 - ExecutorPool::reschedule
 - ExecutorPool::wake

These are all the methods which can add tasks into the scheduler
queue.

The fetch side is also covered:
 - ExecutorPool::fetchNextTask

Change-Id: Ie797a637ce4e7066e3155751ff467bc65d083646
Reviewed-on: http://review.couchbase.org/65385
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMerge "Merge remote-tracking branch 'couchbase/sherlock' into watson" into watson
Dave Rigby [Tue, 5 Jul 2016 17:08:16 +0000 (17:08 +0000)]
Merge "Merge remote-tracking branch 'couchbase/sherlock' into watson" into watson

4 years agoMB-20046: ep_store_test: Use the correct dbname instead of 'test' 00/65300/4
Dave Rigby [Tue, 14 Jun 2016 13:49:09 +0000 (14:49 +0100)]
MB-20046: ep_store_test: Use the correct dbname instead of 'test'

While EventuallyPersistentStoreTest declares a test_dbname variable,
and attempts to delete any files in this directory at the start of the
run, the variable isn't added to the actual config string passed into
EPEngine, resulting in us using the default dbname ('test'), and hence
failing to delete any previous data files.

Fix by adding the dbname to the test config.

Change-Id: I768850277ee3888c0d02bb823203569ff968ee3a
Reviewed-on: http://review.couchbase.org/65300
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18453: Give all tasks their own stats and priority 53/65253/10
Jim Walker [Sun, 26 Jun 2016 19:52:10 +0000 (20:52 +0100)]
MB-18453: Give all tasks their own stats and priority

MB-18453 identified that tasks have copied & pasted
constructors which leads to some tasks all having the
same Priority object.

The fallout of this is that many tasks now all contribute
to the same histogram for runtime and scheduling waittime.
When debugging issues which lead to MB-18453 it is near
impossible at times to know which real task was delayed
as the stats can be attributed to many tasks.

This commit makes all tasks have their own ID
and thus their own histograms and also makes it easier
for new tasks to be created without forgetting to create
a new Priority instance.

tasks.defs.h is a new file that captures every sub-class
of GlobalTask and shows the priority of all tasks.

TASK macros are now used to generate various switch
statements and enums used in task accounting.

The new system is not strict, MyTask could still be
assigned the priority/id of OldTask, however this
flexibility can be useful in some circumstances.
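
The general technique (often called an X-macro) can be sketched as
follows. The task names, priorities and helper functions here are
hypothetical and are not the contents of tasks.defs.h; the sketch only
shows how one list of TASK entries can generate both an enum and the
switch statements that go with it.

```cpp
#include <cassert>
#include <string>

// Hypothetical task list: one X(name, priority) entry per task.
#define HYPOTHETICAL_TASKS(X) \
    X(FlusherTask, 5)         \
    X(ItemPagerTask, 7)       \
    X(BackfillTask, 8)

// Expansion 1: one enum value per task, plus a trailing count.
enum class TaskId {
#define TASK(name, prio) name,
    HYPOTHETICAL_TASKS(TASK)
#undef TASK
    TASK_COUNT
};

// Expansion 2: a switch mapping each id to its name.
std::string taskName(TaskId id) {
    switch (id) {
#define TASK(name, prio) case TaskId::name: return #name;
    HYPOTHETICAL_TASKS(TASK)
#undef TASK
    case TaskId::TASK_COUNT:
        break;
    }
    return "unknown";
}

// Expansion 3: a switch mapping each id to its priority.
int taskPriority(TaskId id) {
    switch (id) {
#define TASK(name, prio) case TaskId::name: return prio;
    HYPOTHETICAL_TASKS(TASK)
#undef TASK
    case TaskId::TASK_COUNT:
        break;
    }
    assert(false && "not a real task id");
    return -1;
}
```

Adding a task then means adding one line to the list; the enum and
both switch statements pick it up without further edits.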

Note this patch has changed ep_testsuite test_item_pager
to increase the max_size value in the test config. This
is because this patch increases the baseline heap usage
of a bucket as we've increased the number of Histogram
objects allocated by EventuallyPersistentStore.

Prior to this patch 28 were allocated, with this patch
51 are allocated (1 per task). Each Histogram<hrtime_t>
is approx 1568 bytes (on OSX clang build).

The new max_size is 2.5MiB

Change-Id: I209c67945b964023615af37a12f83ca97142ce53
Reviewed-on: http://review.couchbase.org/65253
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago[BP] MB-18580: Wait for VB state to be persisted before starting tests 98/65298/5
Dave Rigby [Wed, 4 May 2016 09:49:59 +0000 (10:49 +0100)]
[BP] MB-18580: Wait for VB state to be persisted before starting tests

Intermittent test failures (across multiple tests) have been seen
where we fail to read the number of items in vbucket disk file:

    terminate called after throwing an instance of 'std::invalid_argument'
    what(): CouchKVStore::getDbFileInfo: Failed to open database file for vBucket = 1 rev = 1 with error:no such file

The issue is that we do not correctly wait for the vBucket files to be
created before starting a test. We /attempt/ to wait in test_setup,
waiting for ep_vb_snapshot_total to be non-zero, however this stat is
not updated when vBuckets are written to disk, instead only when the
vb state snapshot occurs.

To fix this, create a new histogram stat - ep_vb_persist_state_total -
which records how long the actual persist takes (and counts them at
the same time). Change test_setup to check for this stat becoming 1
before continuing.

Results in two new stats:

* disk_persist_vbstate - timing histogram of how long vbState
                          operations took.

* ep_persist_vbstate_total - count of how many VBStatePersists have
                             occurred.

Change-Id: Ic24e6cdb51a98ea6fa65005158242bfcf44225d0
Reviewed-on: http://review.couchbase.org/65298
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18452: Extra refactoring and single-threaded test 56/64956/8
Jim Walker [Tue, 14 Jun 2016 15:16:10 +0000 (16:16 +0100)]
MB-18452: Extra refactoring and single-threaded test

Some extra refactoring applied to watson branch and
a single threaded test utilising the watson+
single-threaded unit test harness.

Change-Id: I3028c079e448552987268206ed2664c10933085a
Reviewed-on: http://review.couchbase.org/64956
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years agoMB-19948: enable disabled meta-data tests. 24/65324/3
Jim Walker [Wed, 29 Jun 2016 07:57:47 +0000 (08:57 +0100)]
MB-19948: enable disabled meta-data tests.

With MB-19948 these tests no longer fail
valgrind, so can be enabled.

Change-Id: Ida628a3dd48de243703ebc282b84dc23d5a69ac6
Reviewed-on: http://review.couchbase.org/65324
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/sherlock' into watson 92/65392/2
Dave Rigby [Thu, 30 Jun 2016 14:30:04 +0000 (15:30 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into watson

* couchbase/sherlock:
  MB-19843: Modify the end_seqno in DCP stream request after checking for rollback

Change-Id: I6ebfed2f2046c2e6079125f7b015fc9e3ac032cd

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 28/65328/2
Dave Rigby [Wed, 29 Jun 2016 09:17:13 +0000 (10:17 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19359: [2] Address lock inversion with vb's state lock and snapshot lock
  MB-19359: [1] Address lock inversion with vb's state lock and snapshot lock

Change-Id: Ia068af9b26e8a4b980bf22341fa43ac5452aca60

4 years agoMB-19892: Ensure backfills are terminated when closing DcpProducer's streams 21/65021/6
Dave Rigby [Fri, 17 Jun 2016 16:35:56 +0000 (17:35 +0100)]
MB-19892: Ensure backfills are terminated when closing DcpProducer's streams

There is a memory and FD leak if a DCP Producer is closed when
backfills are still present - for example if the connection is
disconnected while backfill is still running.

The issue is that there is a circular reference between DcpProducer
and its ActiveStreams (in the `streams` map). This means that while
all /external/ references to DcpProducer are correctly reduced to
zero, the refcount is held at 1 by any ActiveStream objects, and
vice-versa.

The effect is that the DcpProducer object is never deleted, and in
turn we do not close open couchstore files the DCPBackfill tasks have
open.

Arguably the issue is that the circular reference exists; however the
simplest way to fix this issue is to:

1. Ensure that when all streams are closed
   (DcpProducer::closeAllStreams) we destroy the DcpProducer's
   backfill manager.

2. Rely on ~BackfillManager to cancel and delete all backfills, and
   also cancel the backfillManager task if still running. Note that
   previously this destructor was never called as we never destroyed
   the owning DcpProducer.

One slight subtlety is the fact that the BackfillManagerTask (which
runs on a separate background AUXIO thread) also needs to have a
pointer to the backfill manager object, and we need to ensure that
this task doesn't keep the backfill manager alive after the
DcpProducer has reset() its pointer. We solve this by making
BackfillManagerTask have a /weak/ pointer to the backfillmanager. For
the duration of the run() method the weak_ptr is promoted to a
shared_ptr, giving the task temporary shared ownership, but as soon as
that method completes the (shared) ownership is dropped and the
backfillManager can be deleted.
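
The weak-to-shared promotion described above can be sketched as
follows. `Manager` and `ManagerTask` are hypothetical stand-ins for
the backfill manager and its task, not the real classes; the sketch
assumes std::shared_ptr / std::weak_ptr ownership.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the backfill manager.
struct Manager {
    int runs = 0;
    void doWork() { ++runs; }
};

// Hypothetical stand-in for the task: it holds only a weak reference,
// so it never keeps the manager alive between runs.
class ManagerTask {
public:
    explicit ManagerTask(const std::shared_ptr<Manager>& m)
        : weakManager(m) {
        assert(m && "task must be created for a live manager");
    }

    // Returns true if the manager was still alive and work was done.
    bool run() {
        // Promote to shared ownership for the duration of this call.
        std::shared_ptr<Manager> manager = weakManager.lock();
        if (!manager) {
            return false;  // the owner has already reset() its pointer
        }
        manager->doWork();
        return true;
    }  // 'manager' is destroyed here, dropping the temporary ownership

private:
    std::weak_ptr<Manager> weakManager;
};
```

Once the owner resets its shared_ptr, the next `run()` sees an expired
weak_ptr and returns without touching the manager.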

The patch also adds a Unit test which opens a DCP producer, opens a
stream and then deletes the bucket (with the stream still
connected). This leaks memory (and a couchstore FD) without the fixes
present here.

Change-Id: I23750f1d1c53a56f6773970bd35fc64224165516
Reviewed-on: http://review.couchbase.org/65021
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 89/65189/2
Dave Rigby [Tue, 28 Jun 2016 15:13:14 +0000 (16:13 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19343: Use cb_gmtime_r instead of gmtime_r
  [BP] MB-16366: Obtain vbstate readlock in numerous operations
  MB-19280: Fix data race in CouchKVStore stats access
  MB-19279: Fix race in use of gmtime()
  MB-19113: Suppress test_mb16357 when on thread sanitizer

Change-Id: Id289ae95e6fc5e03a64957cc41f53af9ee262a2a

4 years agoMerge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock
Dave Rigby [Tue, 28 Jun 2016 15:10:12 +0000 (15:10 +0000)]
Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock

4 years agoMB-19982: Don't hold connsLock for duration of dcp stats 11/65211/12
Jim Walker [Fri, 24 Jun 2016 12:28:34 +0000 (12:28 +0000)]
MB-19982: Don't hold connsLock for duration of dcp stats

The MB identified a lock inversion between dcp->set_vbucket_state
and get_stats("dcp")

The get_stats path uses doDcpStats which holds connsLock whilst
all connections are visited and their stats gathered. When getting
a PassiveStream's stats the buffer.mutex is needed.

The set_vbucket_state obtains the same locks in the reverse order.
Whilst buffer.mutex is held it will try to get connsLock
(via EventuallyPersistentStore::setVBucketState calling into dcpConnMap).

The fix is to work on a copy of the "all" list so that we can do the
work without the lock.

ref-counted pointers should stop any issues where the connection
being visited is freed/dropped from another thread.
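
A minimal sketch of the copy-then-visit approach, assuming
std::shared_ptr stands in for the ref-counted pointers. `StatsConn`,
`StatsConnMap` and their members are hypothetical and are not the
DcpConnMap code.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical connection whose stats need a per-connection mutex.
struct StatsConn {
    std::mutex bufferMutex;  // stand-in for the stream's buffer.mutex
    int bufferedItems = 0;

    int readBufferedItems() {
        std::lock_guard<std::mutex> lh(bufferMutex);
        return bufferedItems;
    }
};

class StatsConnMap {
public:
    void add(std::shared_ptr<StatsConn> conn) {
        assert(conn);
        std::lock_guard<std::mutex> lh(connsLock);
        all.push_back(std::move(conn));
    }

    int totalBufferedItems() {
        // Copy the list under connsLock, then release the lock.
        std::vector<std::shared_ptr<StatsConn>> snapshot;
        {
            std::lock_guard<std::mutex> lh(connsLock);
            snapshot = all;
        }
        // Visit the copy without connsLock held, so bufferMutex is
        // never acquired while connsLock is held. The shared_ptr
        // copies keep each connection alive during the visit.
        int total = 0;
        for (const auto& conn : snapshot) {
            total += conn->readBufferedItems();
        }
        return total;
    }

private:
    std::mutex connsLock;
    std::vector<std::shared_ptr<StatsConn>> all;
};
```

The cost is one copy of the pointer list per stats call; the stats may
also describe a connection that was removed a moment earlier.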

Change-Id: Iff5f7be1d78278a4b00bb07b859697cca3115299
Reviewed-on: http://review.couchbase.org/65211
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19982: Fix potential deadlock between DcpConsumer::bufMutex & connsLock 97/65297/3
Dave Rigby [Tue, 28 Jun 2016 11:27:25 +0000 (11:27 +0000)]
MB-19982: Fix potential deadlock between DcpConsumer::bufMutex & connsLock

As identified by ThreadSanitizer (see below). The issue is that the
DcpConsumer::bufMutex and connsLock mutexes are acquired in different
orders:

A) As part of processing incoming DCP messages
   (PassiveStream::processBufferedMessages) we acquire (1) bufMutex,
   then (2) acquire connsLock as part of DcpConnMap::vbucketStateChanged.

B) When disconnecting a connection from the DcpConnMap, we acquire (1)
   connsLock to locate the connection to be closed, and then (2)
   acquire bufMutex as part of PassiveStream::setDead.

Address this by changing (B) - update the data structures maintaining
the map of cookie -> connection (`all` and `map_`), *release
connsLock* and then call PassiveStream::setDead. Finally we re-acquire
connsLock to add the (now closed) stream to the deadConnections list.
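
The revised ordering for (B) can be sketched as below. `Conn`,
`ToyConnMap` and the integer cookie are hypothetical simplifications,
not the DcpConnMap implementation; the point is only that the
per-connection mutex is never taken while connsLock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical connection with its own buffer mutex.
struct Conn {
    std::mutex bufMutex;
    bool dead = false;

    void setDead() {
        std::lock_guard<std::mutex> lh(bufMutex);
        dead = true;
    }
};

class ToyConnMap {
public:
    void connect(int cookie) {
        std::lock_guard<std::mutex> lh(connsLock);
        map_[cookie] = std::make_shared<Conn>();
    }

    void disconnect(int cookie) {
        std::shared_ptr<Conn> conn;
        {
            // Step 1: update the cookie -> connection map under connsLock.
            std::lock_guard<std::mutex> lh(connsLock);
            auto it = map_.find(cookie);
            if (it == map_.end()) {
                return;
            }
            conn = it->second;
            map_.erase(it);
        }  // connsLock released here
        assert(conn);

        // Step 2: takes bufMutex, with connsLock no longer held.
        conn->setDead();

        // Step 3: re-acquire connsLock to record the dead connection.
        std::lock_guard<std::mutex> lh(connsLock);
        deadConnections.push_back(conn);
    }

    std::size_t liveCount() {
        std::lock_guard<std::mutex> lh(connsLock);
        return map_.size();
    }

    std::size_t deadCount() {
        std::lock_guard<std::mutex> lh(connsLock);
        return deadConnections.size();
    }

private:
    std::mutex connsLock;
    std::map<int, std::shared_ptr<Conn>> map_;
    std::vector<std::shared_ptr<Conn>> deadConnections;
};
```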

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=393)
  Cycle in lock order graph: M22701 (0x7d5000018140) => M969 (0x7d840001cc50) => M22701

  Mutex M969 acquired here while holding mutex M22701 in thread T10:
    #0 pthread_mutex_lock <null> (engine_testapp+0x00000047e980)
    #1 cb_mutex_enter <null> (libplatform.so.0.1.0+0x000000003960)
    #2 Mutex::acquire() ep-engine/src/mutex.cc:31 (ep.so+0x0000001e60ce)
    #3 LockHolder::lock() ep-engine/src/locks.h:71 (ep.so+0x000000080b33)
    #4 LockHolder::LockHolder(Mutex&, bool) ep-engine/src/locks.h:48 (ep.so+0x0000000807a2)
    #5 DcpConnMap::vbucketStateChanged(unsigned short, vbucket_state_t) ep-engine/src/connmap.cc:1044 (ep.so+0x00000023b3ba)
    #6 EventuallyPersistentStore::setVBucketState(unsigned short, vbucket_state_t, bool, bool) ep-engine/src/ep.cc:1057 (ep.so+0x0000000dc240)
    #7 PassiveStream::processSetVBucketState(SetVBucketState*) ep-engine/src/dcp-stream.cc:1483 (ep.so+0x0000002a3125)
    #8 PassiveStream::processBufferedMessages(unsigned int&) ep-engine/src/dcp-stream.cc:1313 (ep.so+0x0000002a0b58)
    #9 DcpConsumer::processBufferedItems() ep-engine/src/dcp-consumer.cc:608 (ep.so+0x0000002650c4)
    #10 Processer::run() ep-engine/src/dcp-consumer.cc:48 (ep.so+0x000000264cbf)
    #11 ExecutorThread::run() ep-engine/src/executorthread.cc:109 (ep.so+0x0000001e75a1)
    #12 launch_executor_thread(void*) ep-engine/src/executorthread.cc:34 (ep.so+0x0000001e6bca)
    #13 platform_thread_wrap platform/src/cb_pthreads.c (libplatform.so.0.1.0+0x00000000371c)

  Mutex M22701 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (engine_testapp+0x00000047e980)
    #1 cb_mutex_enter <null> (libplatform.so.0.1.0+0x000000003960)
    #2 Mutex::acquire() ep-engine/src/mutex.cc:31 (ep.so+0x0000001e60ce)
    #3 LockHolder::lock() ep-engine/src/locks.h:71 (ep.so+0x000000080b33)
    #4 LockHolder::LockHolder(Mutex&, bool) ep-engine/src/locks.h:48 (ep.so+0x0000000807a2)
    #5 PassiveStream::processBufferedMessages(unsigned int&) ep-engine/src/dcp-stream.cc:1286 (ep.so+0x0000002a085d)
    #6 DcpConsumer::processBufferedItems() ep-engine/src/dcp-consumer.cc:608 (ep.so+0x0000002650c4)
    #7 Processer::run() ep-engine/src/dcp-consumer.cc:48 (ep.so+0x000000264cbf)
    #8 ExecutorThread::run() ep-engine/src/executorthread.cc:109 (ep.so+0x0000001e75a1)
    #9 launch_executor_thread(void*) ep-engine/src/executorthread.cc:34 (ep.so+0x0000001e6bca)
    #10 platform_thread_wrap platform/src/cb_pthreads.c (libplatform.so.0.1.0+0x00000000371c)

  Mutex M22701 acquired here while holding mutex M969 in main thread:
    #0 pthread_mutex_lock <null> (engine_testapp+0x00000047e980)
    #1 cb_mutex_enter <null> (libplatform.so.0.1.0+0x000000003960)
    #2 Mutex::acquire() ep-engine/src/mutex.cc:31 (ep.so+0x0000001e60ce)
    #3 LockHolder::lock() ep-engine/src/locks.h:71 (ep.so+0x000000080b33)
    #4 LockHolder::LockHolder(Mutex&, bool) ep-engine/src/locks.h:48 (ep.so+0x0000000807a2)
    #5 PassiveStream::clearBuffer() ep-engine/src/dcp-stream.cc:1573 (ep.so+0x00000029e6a1)
    #6 PassiveStream::setDead(end_stream_status_t) ep-engine/src/dcp-stream.cc:1171 (ep.so+0x00000029dffc)
    #7 DcpConsumer::closeAllStreams() ep-engine/src/dcp-consumer.cc:722 (ep.so+0x00000026fb66)
    #8 DcpConnMap::disconnect_UNLOCKED(void const*) ep-engine/src/connmap.cc:1096 (ep.so+0x00000023bbe3)
    #9 DcpConnMap::disconnect(void const*) ep-engine/src/connmap.cc:1069 (ep.so+0x00000023b664)
    #10 EventuallyPersistentEngine::handleDisconnect(void const*) ep-engine/src/ep_engine.cc:5711 (ep.so+0x0000001617ab)
    #11 EvpHandleDisconnect(void const*, ENGINE_EVENT_TYPE, void const*, void const*) ep-engine/src/ep_engine.cc:1702 (ep.so+0x00000013d415)
    #12 mock_perform_callbacks memcached/programs/engine_testapp/mock_server.c (engine_testapp+0x0000004d04db)
    #13 disconnect_mock_connection <null> (engine_testapp+0x0000004d10c6)
    #14 destroy_mock_cookie <null> (engine_testapp+0x0000004d0fa7)
    #15 test_mb19982(engine_interface*, engine_interface_v1*) ep-engine/tests/ep_testsuite.cc:12020 (ep_testsuite.so+0x0000000b1d7d)
    #16 execute_test memcached/programs/engine_testapp/engine_testapp.c (engine_testapp+0x0000004c50bf)
    #17 main crtstuff.c (engine_testapp+0x0000004c2ea8)

  Mutex M969 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (engine_testapp+0x00000047e980)
    #1 cb_mutex_enter <null> (libplatform.so.0.1.0+0x000000003960)
    #2 Mutex::acquire() ep-engine/src/mutex.cc:31 (ep.so+0x0000001e60ce)
    #3 LockHolder::lock() ep-engine/src/locks.h:71 (ep.so+0x000000080b33)
    #4 LockHolder::LockHolder(Mutex&, bool) ep-engine/src/locks.h:48 (ep.so+0x0000000807a2)
    #5 DcpConnMap::disconnect(void const*) ep-engine/src/connmap.cc:1068 (ep.so+0x00000023b64e)
    #6 EventuallyPersistentEngine::handleDisconnect(void const*) ep-engine/src/ep_engine.cc:5711 (ep.so+0x0000001617ab)
    #7 EvpHandleDisconnect(void const*, ENGINE_EVENT_TYPE, void const*, void const*) ep-engine/src/ep_engine.cc:1702 (ep.so+0x00000013d415)
    #8 mock_perform_callbacks memcached/programs/engine_testapp/mock_server.c (engine_testapp+0x0000004d04db)
    #9 disconnect_mock_connection <null> (engine_testapp+0x0000004d10c6)
    #10 destroy_mock_cookie <null> (engine_testapp+0x0000004d0fa7)
    #11 test_mb19982(engine_interface*, engine_interface_v1*) ep-engine/tests/ep_testsuite.cc:12020 (ep_testsuite.so+0x0000000b1d7d)
    #12 execute_test memcached/programs/engine_testapp/engine_testapp.c (engine_testapp+0x0000004c50bf)
    #13 main crtstuff.c (engine_testapp+0x0000004c2ea8)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) ??:0 __interceptor_pthread_mutex_lock

Change-Id: Ia441d5f5898516e3526a610426fa81f5df0e35e6
Reviewed-on: http://review.couchbase.org/65297
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-19719: Cleanup reading of vBucket stats when couchstore file doesn't exist. 07/65007/2
Dave Rigby [Thu, 26 May 2016 15:25:53 +0000 (16:25 +0100)]
MB-19719: Cleanup reading of vBucket stats when couchstore file doesn't exist.

This partially reverts commit 9093bad3061648184101cae992403cb468102d75
- the test improvements have been kept. It also reverts commit
06bf584672d7b1c4f6af2cb7811bad18e86b5729.

This removes the incomplete / unnecessary checks on vbucket files
existing, and simply relies on the getNumPersistedDeletes() method
throwing an exception if it fails.

Change-Id: I159e766a5e5b1963b40ef828d0762766b35845b8
Reviewed-on: http://review.couchbase.org/65007
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years agoMB-19948: Handle 18 bytes of metadata 16/65016/3
Jim Walker [Thu, 16 Jun 2016 15:57:49 +0000 (16:57 +0100)]
MB-19948: Handle 18 bytes of metadata

Correctly read metadata of various sizes. All of the following sizes
can be stored in couchdb by ep-engine:

* 16 bytes
* 18 bytes
* 19 bytes

Change-Id: Iede967ba0ce45e95e38c1f6cdb47a5164ab3c5d3
Reviewed-on: http://review.couchbase.org/65016
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19948: CouchKVStore metadata tests 14/65014/3
Jim Walker [Thu, 16 Jun 2016 12:00:02 +0000 (13:00 +0100)]
MB-19948: CouchKVStore metadata tests

This commit contains some new tests to exercise the code
which assembles our metadata into couchstore.

There are upstream fixes and refactoring which will utilise
these tests for some positive vibes about maintaining correctness
as the code is changed.

Change-Id: I4facbc343133db1ba9a7bf76b8ba9834c3f69cae
Reviewed-on: http://review.couchbase.org/65014
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 87/65187/3
Dave Rigby [Thu, 23 Jun 2016 10:44:53 +0000 (11:44 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* commit 'a430629':
  MB-19278: Fix lock-order inversion on ActiveStream::streamMutex
  MB-19277: Set executorThread's waketime to atomic
  MB-19276: Fix data race on ExecutorThread::taskStart
  MB-19275: Address data race on a DCP stream's state
  MB-19273: Fix data race on PassiveStream::buffer.{bytes,items}
  MB-19260: Make cookie atomic to serialize set/get in ConnHandler
  MB-19259: Fix data race on DcpConsumer::backoffs
  MB-19258: Address data race with replicationThrottle parameters
  MB-19281: [BP] Add template class RelaxedAtomic<>
  MB-19257: Fix data race on ExecutorThread::now
  MB-19256: Address possible data race on VBCBAdaptor::currentvb

Further merge of mostly TSan fixes from 3.0.x into sherlock.

Change-Id: Ic88c446c4e09d669f7a4da7f8cb2f97c13d70ab7

4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 08/65008/1
Dave Rigby [Fri, 17 Jun 2016 11:41:12 +0000 (12:41 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19253: Fix race in void ExecutorPool::doWorkerStat
  MB-19252: Fix data race on Stream::readyQueueMemory
  MB-19251: Fix race in updating Vbucket.file{SpaceUsed,Size}
  MB-19249: Address possible data races in ConnHandler context
  MB-19248: Fix race in TaskQueue.{ready,future,pending}Queue access
  MB-19247: Fix possible data race in workload.h: workloadPattern
  MB-19246: Fix potentially incorrect persist_time in OBSERVE response
  MB-19229: Address possible data race in vbucket.cc: numHpChks
  MB-19228: Address possible data races in ActiveStream context
  MB-19227: Fix race in ConnNotifier.task access

Change-Id: I184b86cd800e406b5be96ec5f7c456e73f54b05c

4 years ago Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock
Dave Rigby [Fri, 17 Jun 2016 10:20:42 +0000 (10:20 +0000)]
Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock

4 years ago Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock
Dave Rigby [Fri, 17 Jun 2016 08:32:22 +0000 (08:32 +0000)]
Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock

4 years ago Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock
Dave Rigby [Fri, 17 Jun 2016 08:29:58 +0000 (08:29 +0000)]
Merge "Merge remote-tracking branch 'couchbase/3.0.x' into sherlock" into sherlock

4 years ago MB-14859: Handle quick successive BG Fetch of a key interleaved with exp pager 29/64929/2
Manu Dhundi [Tue, 23 Jun 2015 20:38:28 +0000 (13:38 -0700)]
MB-14859: Handle quick successive BG Fetch of a key interleaved with exp pager

If two bgfetches are scheduled for a nonexistent key, one bgfetch completes
and marks the key as nonexistent in the hash table, and the expiry
pager then removes it from the hash table before the second bgfetch completes,
we need to handle that case in the bgfetch completion code and
notify memcached with the appropriate return value.

(cherry picked from commit f9402cb0ee6a3592413e43855b0a48b7c0202a5b)

Change-Id: I8eaf54319014ea4039c74d2cbfab21ef275939fe
Reviewed-on: http://review.couchbase.org/64929
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years ago MB-19897: Fix the data race on lastSendTime between stats and dcp worker threads 59/64959/2
Manu Dhundi [Wed, 15 Jun 2016 17:08:50 +0000 (10:08 -0700)]
MB-19897: Fix the data race on lastSendTime between stats and dcp worker threads

Fix the thread sanitizer warning
http://cv.jenkins.couchbase.com/job/ep-engine-threadsanitizer-3.0.x/258/console

WARNING: ThreadSanitizer: data race (pid=102290)
  Read of size 4 at 0x7d580000f71c by thread T14 (mutexes: write M969):
    #0 void ConnHandler::addStat<unsigned int>(char const*, unsigned int const&, void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) const /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/tapconnection.h:294 (ep.so+0x00000020e9f3)
    #1 DcpProducer::addStats(void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-producer.cc:557 (ep.so+0x00000027fbe5)
    #2 ConnStatBuilder::operator()(SingleThreadedRCPtr<ConnHandler>&) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/ep_engine.cc:3701 (ep.so+0x000000183c44)
    #3 ConnStatBuilder std::for_each<std::_List_iterator<SingleThreadedRCPtr<ConnHandler> >, ConnStatBuilder>(std::_List_iterator<SingleThreadedRCPtr<ConnHandler> >, std::_List_iterator<SingleThreadedRCPtr<ConnHandler> >, ConnStatBuilder) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:3755 (ep.so+0x0000001838a5)
    #4 void ConnMap::each_UNLOCKED<ConnStatBuilder>(ConnStatBuilder) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/connmap.h:148 (ep.so+0x000000183808)
    #5 void ConnMap::each<ConnStatBuilder>(ConnStatBuilder) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/connmap.h:140 (ep.so+0x00000017732e)
    #6 EventuallyPersistentEngine::doDcpStats(void const*, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/ep_engine.cc:3954 (ep.so+0x00000014c5ed)

  Previous write of size 4 at 0x7d580000f71c by main thread:
    #0 DcpProducer::maybeSendNoop(dcp_message_producers*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-producer.cc:740 (ep.so+0x00000027d8ce)
    #1 DcpProducer::step(dcp_message_producers*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-producer.cc:323 (ep.so+0x00000027c920)
    #2 EvpDcpStep(engine_interface*, void const*, dcp_message_producers*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/ep_engine.cc:1404 (ep.so+0x000000138baa)
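
The trace above shows a stats thread reading the timestamp while the DCP worker writes it. A minimal sketch of one common way to remove this kind of race, assuming the field is a 32-bit timestamp that can be held in a `std::atomic` with relaxed ordering. The class and method names are hypothetical and do not reproduce the actual ep-engine change:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a DCP producer; only the timestamp is modelled.
class ProducerSketch {
public:
    // Called by the DCP worker thread after it sends a message.
    void recordSend(uint32_t now) {
        lastSendTime.store(now, std::memory_order_relaxed);
    }

    // Called by the stats thread; safe to run concurrently with recordSend().
    uint32_t statLastSendTime() const {
        return lastSendTime.load(std::memory_order_relaxed);
    }

private:
    // Atomic so that concurrent reads and writes are not a data race.
    std::atomic<uint32_t> lastSendTime{0};
};
```

Relaxed ordering is enough here only because the value is reported as a statistic and is not used to synchronise other data.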

Change-Id: I2a2b0b0f01b10ecb31701bfc2330881bbafc6b74
Reviewed-on: http://review.couchbase.org/64959
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-16337: Fix for intermittent test_access_scanner failure 71/64971/3
Manu Dhundi [Thu, 16 Jun 2016 17:20:08 +0000 (10:20 -0700)]
MB-16337: Fix for intermittent test_access_scanner failure

In the test_access_scanner test we explicitly invoke the access scanner
task from the test code. However, while the task is being invoked, if
the task was already running at its default expected run time, the
explicit request to run the task may get ignored due to a race in
updating the snooze time for the access scanner task.

The fix makes sure that the default run time of the access scanner is
not near the current run time.

Note: Access scanner task is first initiated at 'alog_task_time' and
then at intervals of 'alog_sleep_time'.

Change-Id: I8b2fc537e9532049066bc31fda69dee0e2b15917
Reviewed-on: http://review.couchbase.org/64971
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago Merge remote-tracking branch 'couchbase/3.0.x' into sherlock 81/64981/1
Dave Rigby [Thu, 16 Jun 2016 16:59:54 +0000 (17:59 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19226: Address potential data races in the warmup code
  MB-19225: Fix data race on Flusher::taskId
  MB-19225: Fix race in Flusher._state
  MB-19224: Address possible data race with global task's waketime

Change-Id: Idc461799bc50bd1274f3ffafab4b3257a024327b

4 years ago Merge remote-tracking branch 'couchbase/3.0.x' into sherlock 80/64980/2
Dave Rigby [Thu, 16 Jun 2016 16:01:14 +0000 (17:01 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19223: Switch to hrtime from timeval in Global Thread Pool
  MB-19222: Fix race condition in TaskQueue shutdown
  MB-19220: Ensure HashTable::size is atomic

Change-Id: I6e36e57c8394bb0eb147c697a81d0cbeeae423f7

4 years ago Merge "Merge remote-traking branch 'couchbase/sherlock' into 'couchbase/watson'"...
Dave Rigby [Thu, 16 Jun 2016 15:01:43 +0000 (15:01 +0000)]
Merge "Merge remote-traking branch 'couchbase/sherlock' into 'couchbase/watson'" into watson

4 years ago Merge "Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'...
Dave Rigby [Thu, 16 Jun 2016 15:01:31 +0000 (15:01 +0000)]
Merge "Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'" into watson

4 years ago Merge "Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'...
Dave Rigby [Thu, 16 Jun 2016 14:57:43 +0000 (14:57 +0000)]
Merge "Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'" into watson

4 years ago Merge remote-tracking branch 'couchbase/3.0.x' into sherlock 77/64977/1
Dave Rigby [Thu, 16 Jun 2016 08:42:22 +0000 (09:42 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-19204: ep_testsuite: Don't release the item while we're using it
  MB-19204: Address data race in ep_test_apis/testsuite
  MB-19204: ep_testsuite: Use std::string for last_key/body
  MB-19204: Remove alarm() call from atomic_ptr_test, reduce iteration count
  MB-19204: hash_table_test: Fix TSan issues

Start of merge of 3.1.5+ changes into sherlock, broken into multiple
merges due to the size.

Change-Id: I65530d3c81d6b5e8b0171d0e3e1da3e14e0bb308

4 years ago Merge remote-traking branch 'couchbase/sherlock' into 'couchbase/watson' 33/64933/4
Jim Walker [Wed, 15 Jun 2016 13:43:23 +0000 (14:43 +0100)]
Merge remote-traking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit 'b4007da6ceca5b2bb0902609d6e9c36f1f32e2a3':
  MB-19897: Only update sendTime if successfully send noop
  MB-19897: Record time for all DCP consumer messages

Change-Id: Idfa70391e59c6eede96ad0fce8ca312b2fbdd566

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 03/64603/12
Jim Walker [Wed, 15 Jun 2016 08:27:39 +0000 (09:27 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit '7c65b732c0099c0ad84e7d70506625e694051495':
  MB-18452: Force DcpConsumer processor task to yield
  MB-19678: Merge backfill and in-memory snapshots correctly on replica vb

Change-Id: Ifce5a18fc807285471b08e9737cedb5db2b7923f

4 years ago Merge tag 'v3.1.5' into sherlock 55/64955/1
Dave Rigby [Wed, 15 Jun 2016 11:14:34 +0000 (12:14 +0100)]
Merge tag 'v3.1.5' into sherlock

3.1.5 release (ep-engine)

* tag 'v3.1.5':
  MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call
  MB-19153: Break circular dependency while deleting bucket
  MB-19113: Address false positive lock inversion seen with test_mb16357

Change-Id: I2e7cd72f09c8b2b3780568ed7f7ca81fde064cb9

4 years ago Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson' 31/64931/2
Jim Walker [Tue, 14 Jun 2016 10:33:06 +0000 (11:33 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'

* commit 'bac952b2a5cb0e0ed3f4d1f4c8c7cea561a0c8db':
  MB-19690: Fix compile warning introduced by fix for 16656
  MB-19635: Initialise failovers correctly from 2.5.x vbstate

Change-Id: I6dda4d998ebc7aab5a68ac34705eb14090837afa

4 years ago MB-19843: Modify the end_seqno in DCP stream request after checking for rollback 06/64906/4
Manu Dhundi [Tue, 14 Jun 2016 18:05:04 +0000 (11:05 -0700)]
MB-19843: Modify the end_seqno in DCP stream request after checking for rollback

During a DCP stream request, we will update the end seqno when flags
DCP_ADD_STREAM_FLAG_LATEST/DCP_ADD_STREAM_FLAG_DISKONLY are used.
Currently in some cases when a rollback is required, the end_seqno could become
less than start_seqno before we check if a rollback is needed, resulting in
rejection of stream request.

Hence we should modify the end_seqno (if required as per the flags) only after
checking if a rollback is needed.
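
The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the ep-engine implementation: the flag constant, the result enum, and the `needsRollback` parameter (standing in for the real rollback check) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bit and result codes, for illustration only.
constexpr uint32_t kFlagLatest = 0x1;
enum class StreamResult { Success, Rollback, Rejected };

// needsRollback stands in for the real rollback check; highSeqno is the
// vbucket's current high seqno. endSeqno is updated in place on success.
StreamResult streamRequestSketch(uint32_t flags, uint64_t startSeqno,
                                 uint64_t& endSeqno, uint64_t highSeqno,
                                 bool needsRollback) {
    // Validate the range exactly as the client sent it.
    if (startSeqno > endSeqno) {
        return StreamResult::Rejected;
    }
    // Decide on rollback before end_seqno is touched, so that a lowered
    // end_seqno cannot turn a rollback into a rejection.
    if (needsRollback) {
        return StreamResult::Rollback;
    }
    // Only now apply the flag-driven adjustment.
    if (flags & kFlagLatest) {
        endSeqno = highSeqno;
    }
    return StreamResult::Success;
}
```

With a start seqno of 100, a vbucket high seqno of 50 and a rollback pending, the sketch reports a rollback and leaves the requested end seqno untouched.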

Change-Id: I23b112c16b9167023a990a5709ae6aae4838472e
Reviewed-on: http://review.couchbase.org/64906
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years ago MB-19815: Expand 19695 regression test to cover DCP 21/64721/3
Dave Rigby [Thu, 2 Jun 2016 09:55:42 +0000 (10:55 +0100)]
MB-19815: Expand 19695 regression test to cover DCP

Expand the existing test MB19695_doTapVbTakeoverStats to also test
doDcpVbTakeoverStats.

Change-Id: I7e5ac417c4ba4a32cd6abbe669ab1e0e0aa32d21
Reviewed-on: http://review.couchbase.org/64721
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19886: Fix data race on ActiveStream::curChkSeqno by making it atomic 62/64862/4
Manu Dhundi [Fri, 10 Jun 2016 01:07:25 +0000 (18:07 -0700)]
MB-19886: Fix data race on ActiveStream::curChkSeqno by making it atomic

Fix the data race
http://cv.jenkins.couchbase.com/job/ep-engine-threadsanitizer-3.0.x/266/consoleFull

WARNING: ThreadSanitizer: data race (pid=109115)
   Write of size 8 at 0x7d480000b088 by thread T16:
     #0 ActiveStream::processItems(std::deque<SingleThreadedRCPtr<Item>, std::allocator<SingleThreadedRCPtr<Item> > >&) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-stream.cc:760 (ep.so+0x000000297e65)
     #1 ActiveStream::nextCheckpointItemTask() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-stream.cc:724 (ep.so+0x0000002976ab)
     #2 ActiveStreamCheckpointProcessorTask::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-stream.cc:679 (ep.so+0x0000002973ad)
     #3 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/executorthread.cc:109 (ep.so+0x0000001e3fe1)
     #4 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/executorthread.cc:34 (ep.so+0x0000001e360a)
     #5 platform_thread_wrap /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/platform/src/cb_pthreads.c (libplatform.so.0.1.0+0x00000000377c)

   Previous read of size 8 at 0x7d480000b088 by main thread:
     [failed to restore the stack]

SUMMARY: ThreadSanitizer: data race /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-3.0.x/ep-engine/src/dcp-stream.cc:760 ActiveStream::processItems(std::deque<SingleThreadedRCPtr<Item>, std::allocator<SingleThreadedRCPtr<Item> > >&)
Change-Id: I7fa5dd9110342ca836b6b0b0f203dd8b063cf20d
Reviewed-on: http://review.couchbase.org/64862
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19897: Only update sendTime if successfully send noop 78/64878/4
Daniel Owen [Thu, 21 Apr 2016 09:51:48 +0000 (10:51 +0100)]
MB-19897: Only update sendTime if successfully send noop

In the maybeSendNoop function when a DCP producer attempts
to send a noop to a consumer it can receive back
ENGINE_SUCCESS or ENGINE_E2BIG.

We should only set pendingRecv to true and update the
last sendTime if ENGINE_SUCCESS is returned.
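
A minimal sketch of the rule stated above, assuming the send attempt can be modelled as a callable that returns a status. The enum, struct, and function names are hypothetical stand-ins for ENGINE_SUCCESS / ENGINE_E2BIG and the producer's noop state:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two statuses named in the commit message.
enum class EngineStatus { Success, TooBig };

// Hypothetical noop bookkeeping held by the producer.
struct NoopStateSketch {
    bool pendingRecv = false;
    uint32_t sendTime = 0;
};

// Attempt to send a noop; update state only when the send succeeded.
template <typename SendFn>
EngineStatus maybeSendNoopSketch(NoopStateSketch& state, uint32_t now,
                                 SendFn send) {
    const EngineStatus ret = send();
    if (ret == EngineStatus::Success) {
        state.pendingRecv = true;
        state.sendTime = now;
    }
    // On TooBig nothing was sent, so the state is left unchanged and the
    // caller can retry later.
    return ret;
}
```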

Change-Id: Ice8a66dcae35505d7bab7d261f080d5ffb95c8e3
Reviewed-on: http://review.couchbase.org/64878
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-19886: In markDiskSnapshot() get current vb snapshot info outside streamMutex 40/64840/4
Manu Dhundi [Thu, 9 Jun 2016 20:22:45 +0000 (13:22 -0700)]
MB-19886: In markDiskSnapshot() get current vb snapshot info outside streamMutex

We need this to overcome the lock inversion detected in
http://cv.jenkins.couchbase.com/job/ep-engine-threadsanitizer-3.0.x/263/console.

Explaining the lock inversion:
(1) Backfill thread sending disk snapshot:
    streamMutex (class Stream) ==> snapshotMutex (class VBucket)

(2) Front End thread receiving DCP mutation from active vb:
    snapshotMutex (class VBucket) ==> stateLock (class VBucket) ==>
                                              streamsMutex (class DcpProducer)

(3) Another front end thread disconnecting the view engine connection:
    streamsMutex (class DcpProducer) ==> streamMutex (class Stream)

Solution:
Break streamMutex (class Stream) ==> snapshotMutex (class VBucket).
This is done by making snapshot variables atomic and it should be good
as the backfill thread needs only snapshot_end.
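
The solution can be sketched roughly as below. This is only an illustration of breaking the streamMutex ==> snapshotMutex edge by reading an atomic before taking the stream lock. The types are hypothetical, and how the snapshot end is combined with the backfill end (taking the larger of the two) is an assumption made for the example:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical vbucket: snapshot_end is atomic, so reading it needs no
// snapshotMutex.
struct VBucketSketch {
    std::atomic<uint64_t> snapshotEnd{0};
};

class StreamSketch {
public:
    explicit StreamSketch(VBucketSketch& vbucket) : vb_(vbucket) {}

    uint64_t markDiskSnapshot(uint64_t endSeqno) {
        // Read the vbucket state first, before taking streamMutex, so the
        // stream lock is never held while touching vbucket snapshot state.
        const uint64_t vbSnapEnd = vb_.snapshotEnd.load();

        std::lock_guard<std::mutex> guard(streamMutex_);
        lastSnapEnd_ = std::max(endSeqno, vbSnapEnd);  // assumed policy
        return lastSnapEnd_;
    }

private:
    VBucketSketch& vb_;
    std::mutex streamMutex_;
    uint64_t lastSnapEnd_ = 0;
};
```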

Change-Id: Id1cff42dfe39151d9a19c826d7e47e23b3fc4d21
Reviewed-on: http://review.couchbase.org/64840
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years ago MB-19897: Record time for all DCP consumer messages 79/64879/3
Daniel Owen [Mon, 25 Apr 2016 13:06:41 +0000 (14:06 +0100)]
MB-19897: Record time for all DCP consumer messages

The DCP documentation states that the consumer should see
some sort of message or a No-Op message in a period
equal to twice the noop interval otherwise it should close
its connection.  See documentation/commands/no-op.md in
https://github.com/couchbaselabs/dcp-documentation

This patch changes from checking only for receipt of a
no-op message to checking for receipt of any of the following messages:
- add stream
- close stream
- deletion
- expiration
- flush
- mutation
- set VBucket state
- snapshot Marker
- stream end
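
A minimal sketch of the liveness rule, assuming time is tracked in seconds and that "twice the noop interval" is treated as a strict upper bound. The class and method names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical consumer-side liveness tracker.
class ConsumerSketch {
public:
    ConsumerSketch(uint32_t noopIntervalSecs, uint32_t now)
        : noopInterval_(noopIntervalSecs), lastMessageTime_(now) {}

    // Called for every message type listed above, not only for no-ops.
    void onMessage(uint32_t now) { lastMessageTime_ = now; }

    // True once nothing has arrived for more than twice the noop interval.
    bool shouldDisconnect(uint32_t now) const {
        return (now - lastMessageTime_) > 2 * noopInterval_;
    }

private:
    uint32_t noopInterval_;
    uint32_t lastMessageTime_;
};
```

With a 20-second interval, a consumer that last saw a mutation at t=100 stays connected at t=140 and is considered dead at t=141.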

Change-Id: Ib2268dba339cbf3701f3c7782ee8256bddc79ba3
Reviewed-on: http://review.couchbase.org/64879
Tested-by: buildbot <build@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-18452: Force DcpConsumer processor task to yield 18/64718/4
Jim Walker [Fri, 3 Jun 2016 10:49:55 +0000 (11:49 +0100)]
MB-18452: Force DcpConsumer processor task to yield

Introduce two config tunable values that limit the DCP processor from
running 'forever'.

* dcp_consumer_process_buffered_messages_yield_limit
* dcp_consumer_process_buffered_messages_batch_size

The yield parameter forces the NONIO task to yield when the
limit is reached.
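
One plausible reading of the two tunables, sketched below: the batch size caps how many buffered messages are handled per iteration, and the yield limit caps how many iterations run before the task gives up its thread. The function name and the use of a plain `std::deque<int>` as the buffer are assumptions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Process buffered messages in batches. Returns true if the task should
// yield and be rescheduled because messages remain, false if the buffer
// was drained.
bool runProcessorSketch(std::deque<int>& buffer, std::size_t batchSize,
                        std::size_t yieldLimit, std::size_t& processed) {
    for (std::size_t iter = 0; iter < yieldLimit; ++iter) {
        // One batch: at most batchSize messages.
        for (std::size_t i = 0; i < batchSize && !buffer.empty(); ++i) {
            buffer.pop_front();
            ++processed;
        }
        if (buffer.empty()) {
            return false;  // nothing left, no need to reschedule
        }
    }
    // Yield limit reached with work outstanding: let other tasks run.
    return !buffer.empty();
}
```

With 25 buffered messages, a batch size of 5 and a yield limit of 2, a single run handles 10 messages and then asks to be rescheduled.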

Change-Id: Ifce5a18fc807285471b08e9737cedb5db2b7923f
Reviewed-on: http://review.couchbase.org/64718
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-19843: Modify the end_seqno in DCP stream request after checking for rollback 96/64796/6
Manu Dhundi [Wed, 8 Jun 2016 19:41:12 +0000 (12:41 -0700)]
MB-19843: Modify the end_seqno in DCP stream request after checking for rollback

During a DCP stream request, we will update the end seqno when flags
DCP_ADD_STREAM_FLAG_LATEST/DCP_ADD_STREAM_FLAG_DISKONLY are used.
Currently in some cases when a rollback is required, the end_seqno could become
less than start_seqno before we check if a rollback is needed, resulting in
rejection of stream request.

Hence we should modify the end_seqno (if required as per the flags) only after
checking if a rollback is needed.

Change-Id: I23b112c16b9167023a990a5709ae6aae4838472e
Reviewed-on: http://review.couchbase.org/64796
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19732: Fix the data race on lastSendTime between stats and dcp worker threads 03/64803/3
Manu Dhundi [Wed, 8 Jun 2016 17:38:21 +0000 (10:38 -0700)]
MB-19732: Fix the data race on lastSendTime between stats and dcp worker threads

Fix the thread sanitizer warning
http://cv.jenkins.couchbase.com/job/ep-engine-threadsanitizer-3.0.x/258/console

Change-Id: I2a2b0b0f01b10ecb31701bfc2330881bbafc6b74
Reviewed-on: http://review.couchbase.org/64803
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19849: Add thread type to `scheduler` and `runtime` stats 58/64758/4
Dave Rigby [Mon, 6 Jun 2016 15:36:10 +0000 (16:36 +0100)]
MB-19849: Add thread type to `scheduler` and `runtime` stats

Example output:

$ cbstats localhost:12000 -b default scheduler
 flusher_tasks[WRITE] (44 total)
    16us - 32us   : (  2.27%)  1 #
    32us - 64us   : (  4.55%)  1 #
    64us - 128us  : ( 11.36%)  3 ###
    256us - 512us : ( 27.27%)  7 #######
    2ms - 4ms     : ( 45.45%)  8 ########
    4ms - 8ms     : (100.00%) 24 #########################
    Avg           : (    2ms)
 workload_monitor_tasks[NONIO] (8 total)
    128us - 256us : ( 12.50%) 1 #####
    256us - 512us : ( 50.00%) 3 #################
    2ms - 4ms     : ( 87.50%) 3 #################
    4ms - 8ms     : (100.00%) 1 #####
    Avg           : (    1ms)
 warmup_tasks[READ] (15 total)
    16us - 32us  : ( 20.00%) 3 #########
    32us - 64us  : ( 80.00%) 9 ############################
    64us - 128us : (100.00%) 3 #########
    Avg          : (   35us)

Change-Id: I36c76ca7f1cf9fb5076e9b4c69a27bbb44a24f97
Reviewed-on: http://review.couchbase.org/64758
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Patrick Varley <patrick@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-19732: Record time for all DCP consumer messages 27/64727/7
Daniel Owen [Mon, 25 Apr 2016 13:06:41 +0000 (14:06 +0100)]
MB-19732: Record time for all DCP consumer messages

The DCP documentation states that the consumer should see
some sort of message or a No-Op message in a period
equal to twice the noop interval otherwise it should close
its connection.  See documentation/commands/no-op.md in
https://github.com/couchbaselabs/dcp-documentation

This patch changes from checking only for receipt of a
no-op message to checking for receipt of any of the following messages:
- add stream
- close stream
- deletion
- expiration
- flush
- mutation
- set VBucket state
- snapshot Marker
- stream end

Change-Id: Ib2268dba339cbf3701f3c7782ee8256bddc79ba3
Reviewed-on: http://review.couchbase.org/64727
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19732: Only update sendTime if successfully send noop 24/64724/8
Daniel Owen [Thu, 21 Apr 2016 09:51:48 +0000 (10:51 +0100)]
MB-19732: Only update sendTime if successfully send noop

In the maybeSendNoop function when a DCP producer attempts
to send a noop to a consumer it can receive back
ENGINE_SUCCESS or ENGINE_E2BIG.

We should only set pendingRecv to true and update the
last sendTime if ENGINE_SUCCESS is returned.

Change-Id: Ice8a66dcae35505d7bab7d261f080d5ffb95c8e3
Reviewed-on: http://review.couchbase.org/64724
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19813: [Coverity Scan Warning] Assignment used instead of comparison operator 60/64760/3
Manu Dhundi [Mon, 6 Jun 2016 18:40:02 +0000 (11:40 -0700)]
MB-19813: [Coverity Scan Warning] Assignment used instead of comparison operator

CID 60299:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
use of "=" where "==" may have been intended
cb_assert(dcp_last_op = PROTOCOL_BINARY_CMD_DCP_CONTROL);
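
Why the flagged line matters can be shown with a small sketch. The opcode value and function names below are hypothetical; the point is only that `=` inside a check both overwrites the variable and tests the assigned (non-zero) value, so the check can never fail:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical non-zero opcode value, chosen for illustration.
constexpr uint8_t kCmdControl = 0x5e;

// Buggy form: assigns, then tests the assigned value. Always true for a
// non-zero opcode, and it clobbers lastOp as a side effect.
bool buggyCheck(uint8_t& lastOp) {
    return (lastOp = kCmdControl) != 0;
}

// Intended form: compares without modifying lastOp.
bool fixedCheck(const uint8_t& lastOp) {
    return lastOp == kCmdControl;
}
```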

Change-Id: I4f9e3a32ca04aafd26551cd86aafbb59c15a2b97
Reviewed-on: http://review.couchbase.org/64760
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19836: Correctly set PendingOpsNotification task priority 14/64714/2
Daniel Owen [Fri, 3 Jun 2016 10:12:17 +0000 (11:12 +0100)]
MB-19836: Correctly set PendingOpsNotification task priority

The PendingOpsNotification task currently has the priority
VBMemoryDeletionPriority (set to 6).  It should have
the priority PendingOpsPriority (set to 0).

Change-Id: I488d8eae7347eb65fe0f8474ae60e939234b972b
Reviewed-on: http://review.couchbase.org/64714
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-18452: Single threaded test harness improvements 90/64690/4
Jim Walker [Thu, 2 Jun 2016 15:05:50 +0000 (16:05 +0100)]
MB-18452: Single threaded test harness improvements

Refactor parts of the very new evp_store_single_threaded_test so that
it's simpler to drive tasks, making new tests easier to write.

The main change is to provide helper methods for running any task from
a queue (with some checks) and a way to push a clean shutdown.

Change-Id: I19782dcacb36048151bc073a377f28603ff83033
Reviewed-on: http://review.couchbase.org/64690
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago MB-9897: Handle slow stream by dropping cursor and switching to backfill state 75/64375/12
Manu Dhundi [Fri, 3 Jun 2016 22:16:35 +0000 (15:16 -0700)]
MB-9897: Handle slow stream by dropping cursor and switching to backfill state

There can be slow DCP streams which can hold cursors on the checkpoints
causing the memory usage to shoot up. This can also result in deadlocks.

Initially cursor dropping was implemented by closing and re-opening the
slow streams. The re-opening of slow streams caused problems because
ns-server also tried to re-open the closed streams.

This approach tries to solve the problem by switching to
backfilling state from in-memory state when we see the memory usage
in checkpoints going high due to slow streams.

This switch from in-memory to backfill state does not interfere with
a snapshot that is being sent to the client. The change in state
happens only after all the items in the current snapshot are sent.

Hence clients work with the existing DCP protocol without any change.
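
The deferred state switch can be sketched as a small state machine. This is a simplified model with hypothetical names; it assumes the stream knows how many items remain in the snapshot it is currently sending:

```cpp
#include <cassert>
#include <cstddef>

enum class StreamState { InMemory, Backfilling };

// Hypothetical model of a slow stream whose cursor has been dropped.
class SlowStreamSketch {
public:
    void beginSnapshot(std::size_t items) { remaining_ = items; }

    // Memory pressure detected: remember the request, switch when safe.
    void requestCursorDrop() {
        dropPending_ = true;
        maybeSwitch();
    }

    // One item of the current snapshot has been sent to the client.
    void sendItem() {
        if (remaining_ > 0) {
            --remaining_;
        }
        maybeSwitch();
    }

    StreamState state() const { return state_; }

private:
    // Switch to backfilling only once the in-flight snapshot is complete,
    // so the client never sees a partially sent snapshot.
    void maybeSwitch() {
        if (dropPending_ && remaining_ == 0) {
            state_ = StreamState::Backfilling;
            dropPending_ = false;
        }
    }

    StreamState state_ = StreamState::InMemory;
    std::size_t remaining_ = 0;
    bool dropPending_ = false;
};
```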

Change-Id: If4c128df60fc0249cadf08158a04911120de4c99
Reviewed-on: http://review.couchbase.org/64375
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years ago MB-19813: [Refactor Test Code] Common func for min items sanity check 31/64431/8
Manu Dhundi [Fri, 3 Jun 2016 18:15:38 +0000 (11:15 -0700)]
MB-19813: [Refactor Test Code] Common func for min items sanity check

Change-Id: I46e4646af1188637bd5189a911213da14ae18647
Reviewed-on: http://review.couchbase.org/64431
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago MB-19813: Increase memory quota for a dgm test 01/64501/4
Manu Dhundi [Wed, 1 Jun 2016 18:32:58 +0000 (11:32 -0700)]
MB-19813: Increase memory quota for a dgm test

Test 'test_dcp_producer_stream_backfill_no_value' hits 80% resident
ratio and expects at least 1000 items to be present then. There were
a couple of test failures spotted because we did not have 1000 items
when 80% resident ratio was reached. Instead we had only 900 items.

Fixed by increasing mem quota by 25%.

Change-Id: I25b8a03e2e82d9fdc556e726e647579a94ea6fd0
Reviewed-on: http://review.couchbase.org/64501
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years ago Merge remote-tracking branch 'couchbase/4.5.0' into watson 19/64719/1
Dave Rigby [Fri, 3 Jun 2016 11:49:24 +0000 (12:49 +0100)]
Merge remote-tracking branch 'couchbase/4.5.0' into watson

* couchbase/4.5.0:
  MB-19815: doDcpVbTakeoverStats, addTakeoverStats: 0 deleted items on exception
  MB-19605: Add more tests for stats

Change-Id: Id0043615c6b2ab3d74122ddf7002e2989de3bd3a