4 years agoMB-21320: Notify memcached to visit readyQ and get any ready items 71/69171/7
Manu Dhundi [Mon, 31 Oct 2016 16:53:18 +0000 (09:53 -0700)]
MB-21320: Notify memcached to visit readyQ and get any ready items

We should notify memcached to visit readyQ and get any items that were
pushed there during stream creation. Also, we must notify the memcached
about cursor dropping so that it can visit ep-engine and stream any
pending items and do a subsequent stream state change.

This is not a functional fix. It improves performance however.
It is not absolutely necessary to notify immediately as conn manager
will notify eventually.

Change-Id: Id06fc450a20f6d0258fa7c687520dff5f4899a28
Reviewed-on: http://review.couchbase.org/69171
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20822: Print correct value for vb_%d:num_erroneous_entries_erased 32/69132/3
Dave Rigby [Mon, 24 Oct 2016 09:21:30 +0000 (10:21 +0100)]
MB-20822: Print correct value for vb_%d:num_erroneous_entries_erased

'num_erroneous_entries_erased' was incorrectly printing the total size
of the failover table, and not the number of elements which have been

Change-Id: I7bbd75ca4db962cba257a1883ce3635b475287c7
Reviewed-on: http://review.couchbase.org/69132
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21190: update log message for set max_cas 94/68994/5
Jim Walker [Thu, 20 Oct 2016 08:03:40 +0000 (09:03 +0100)]
MB-21190: update log message for set max_cas

There's a future desire to print vbucket's as "vb:"

Change-Id: I5ac58daa01a1dd2ed5a631558bf735c9a81ed821
Reviewed-on: http://review.couchbase.org/68994
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20725: Update drift accounting stats 28/69328/5
Jim Walker [Mon, 31 Oct 2016 12:55:23 +0000 (12:55 +0000)]
MB-20725: Update drift accounting stats

The bucket now reports a total of drift for all VBs which will
allow better tracking of drift (e.g. UI graphing).

Change-Id: I5600df323b32f3f1d11feb5be2cf8ddac5988be9
Reviewed-on: http://review.couchbase.org/69328
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20759: Fix false-positive race on DcpConnMap::numActiveSnoozingBackfills 35/69235/4
Dave Rigby [Wed, 26 Oct 2016 10:12:08 +0000 (11:12 +0100)]
MB-20759: Fix false-positive race on DcpConnMap::numActiveSnoozingBackfills

The existing code was safe, however ThreadSanitizer doesn't know about
our own Spinlocks. Given this shouldn't be a hot path, switch to
normal std:mutex.

ThreadSanitizer report:

WARNING: ThreadSanitizer: data race (pid=23569)
  Read of size 2 at 0x7d840000eca2 by thread T8 (mutexes: write M294, read M27095, write M66205, write M102676, write M95235):
    #0 DcpConnMap::canAddBackfillToActiveQ() ep-engine/src/connmap.cc:1308 (ep.so+0x000000045ac5)
    #1 BackfillManager::schedule() ep-engine/src/dcp/backfill-manager.cc:142 (ep.so+0x00000005b0eb)
    #2 DcpProducer::scheduleBackfillManager() ep-engine/src/dcp/producer.cc:702 (ep.so+0x000000078fe3)
    #3 ActiveStream::scheduleBackfill_UNLOCKED(bool) ep-engine/src/dcp/stream.cc:1016 (ep.so+0x00000008f280)
    #4 ActiveStream::transitionState(stream_state_t) ep-engine/src/dcp/stream.cc:1145 (ep.so+0x000000090589)
    #5 ActiveStream::setActive() ep-engine/src/dcp/stream.h:204 (ep.so+0x00000009958e)
    #6 DcpProducer::streamRequest() ep-engine/src/dcp/producer.cc:327 (ep.so+0x00000007f85d)
    #7 EvpDcpStreamReq ep-engine/src/ep_engine.cc:1573 (ep.so+0x0000000cea78)
    #8 dcp_stream_req_executor memcached/daemon/mcbp_executors.cc:2272 (memcached+0x00000045925c)
    #9 process_bin_packet memcached/daemon/mcbp_executors.cc:4650 (memcached+0x00000046481d)
    #10 mcbp_complete_nread(McbpConnection*) memcached/daemon/mcbp_executors.cc:4759 (memcached+0x00000046481d)
    #11 conn_nread(McbpConnection*) memcached/daemon/statemachine_mcbp.cc:314 (memcached+0x000000472678)
    #12 McbpStateMachine::execute(McbpConnection&) memcached/daemon/statemachine_mcbp.h:43 (memcached+0x000000447054)
    #13 McbpConnection::runStateMachinery() memcached/daemon/connection_mcbp.cc:1003 (memcached+0x000000447054)
    #14 McbpConnection::runEventLoop(short) memcached/daemon/connection_mcbp.cc:1274 (memcached+0x0000004470dd)
    #15 run_event_loop memcached/daemon/connections.cc:147 (memcached+0x00000044b9e9)
    #16 event_handler(int, short, void*) memcached/daemon/memcached.cc:851 (memcached+0x00000041466c)
    #17 event_persist_closure src/libevent/event.c:1319 (libevent_core-2.0.so.5+0x00000000b6b7)
    #18 event_process_active_single_queue src/libevent/event.c:1363 (libevent_core-2.0.so.5+0x00000000b6b7)
    #19 event_process_active src/libevent/event.c:1438 (libevent_core-2.0.so.5+0x00000000b6b7)
    #20 event_base_loop src/libevent/event.c:1639 (libevent_core-2.0.so.5+0x00000000b6b7)
    #21 CouchbaseThread::run() platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057a5)
    #22 platform_thread_wrap platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x0000000057a5)

  Previous write of size 2 at 0x7d840000eca2 by thread T55:
    #0 DcpConnMap::decrNumActiveSnoozingBackfills() ep-engine/src/connmap.cc:1319 (ep.so+0x000000045b7b)
    #1 BackfillManager::backfill() ep-engine/src/dcp/backfill-manager.cc:273 (ep.so+0x00000005a783)
    #2 BackfillManagerTask::run() ep-engine/src/dcp/backfill-manager.cc:62 (ep.so+0x00000005ac1c)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115 (ep.so+0x000000108d96)
    #4 launch_executor_thread ep-engine/src/executorthread.cc:33 (ep.so+0x000000109675)
    #5 CouchbaseThread::run() platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057a5)
    #6 platform_thread_wrap platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x0000000057a5)

Change-Id: I88df57b268c1e615b7c5d2b7caf5f038a53692ba
Reviewed-on: http://review.couchbase.org/69235
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19672: Fix intermittent failure of 'test producer stream request (partial)' 33/69233/5
Dave Rigby [Wed, 26 Oct 2016 09:48:33 +0000 (10:48 +0100)]
MB-19672: Fix intermittent failure of 'test producer stream request (partial)'

This test was making incorrect assumptions about how / when
checkpoints would be created - it wasn't taking the checkpoint period
(how often we forcefully create new checkpoints) into account in it's
assuptions of checkpoint layout. As such it would occasionally fail if
it took longer than expected to run.

Fix this by essentially disabling chk_period (setting it to an
arbitrarily large value). At the same time make the tests expectations
explicit and check them where possible.

Change-Id: I9944f19d41a0a33064c9c43c8673c7ef9c4a3ab9
Reviewed-on: http://review.couchbase.org/69233
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21109: Fix intermittent failure in 'temp item deletion' 37/69237/4
Dave Rigby [Wed, 26 Oct 2016 12:05:03 +0000 (13:05 +0100)]
MB-21109: Fix intermittent failure in 'temp item deletion'

Test suffers from intermittent failures due to it not accounting for
background fetch of deleted items. The issue is that the read of
curr_temp_items races with the background fetcher - if the background
fetcher has not completed then it will have a value of 1 (expected),
however if the BG fetcher completes before the stat is read then 0
will be returned (as the temp item has been cleaned up).

To address this, disable the reader threads before triggering the
background fetch. This requires that we manually handle the
EWOULDBLOCK that get_meta returns (otherwise we'll hang forever
waiting for the BG fetch task to run).

Change-Id: If1bca36a6517909259b4e3767fd58a7ff99944c3
Reviewed-on: http://review.couchbase.org/69237
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [18/N]: Remove now dead VBucket persist Tasks 51/69151/9
Dave Rigby [Mon, 24 Oct 2016 11:46:01 +0000 (12:46 +0100)]
MB-20852 [18/N]: Remove now dead VBucket persist Tasks

Delete all tasks (apart from Flusher) which used to persist VBucket
state to disk - the Flusher Task now handles all VBucket state

Also remove statistics which are no longer valid / relevent -
snapshotVbucketHisto / persistVBStateHisto - we no longer snapshot
separately from flushing.

Change-Id: Ia1b3f30f0f792a8ca739d377443f909138dcaa86
Reviewed-on: http://review.couchbase.org/69151
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [17/N]: Serialize VB state changes 50/69150/11
Dave Rigby [Wed, 21 Sep 2016 15:03:47 +0000 (16:03 +0100)]
MB-20852 [17/N]: Serialize VB state changes


MB-20852 exposed an issue with how VBucket state was persisted to disk
- specifically that state can be persisted to disk out of order, and
intermediate state changes could be dropped (not persisted at all).

These problems are caused by the asynchronous tasks which are
responsible for persisting state to disk -
VBStatePersistTask{Low,High}. There are essentially two interrelated

1. We allow multiple tasks which persist VBState to exist concurrently
   (VBStatePersistTask{Low,High}, Flusher,
   VBSnapshotTask{Low,High}. Furthermore, multiple instances of the
   same task (e.g. VBStatePersistTask) can also exist concurrenlty.

2. We do not maintain a queue of states to persist, instead we just
   mark that a given vbid's state is pending.

(1) Can occur when a VBPersistTask has started to run on a BG writer
    thread (and has cleared the schedule_vbstate_persist flag), but
    then another scheduleVBStatePersist() call is made. As the
    schedule_vbstate_persist flag is clear, this second task is
    allowed to be created. There is then no guarantee which task will
    complete first (they could be scheduled to different OS writer
    threads, the first thread could be descheduled and the second one
    then runs and completes first).

    We could attempt to solve this by changing when
    schedule_vbstate_persist is cleared (say move it later in the
    persistVBState() function), but then the inverse problem is
    exposed - we may fail to schedule a second (different) state to be
    persisted if the current task is just finishing up (and will exit
    without persisting the now-outstanding work).

(2) Presents a subtle problem relating to when the state of a VBucket
    is materialized. As we only record the vbid to persist (and not
    the state), by the time the VBucketPersistTask runs the actual
    state we /wanted/ to write may have moved forward. Even worse, the
    state could have "gone backwards" (as shown in the MB) if the
    state of the VBucket is read before the vbucket is deleted,
    meaning the task has a 'stale' view of the VBucket object (due to
    us using ref-counted pointers for VBucket objects).

    Additionally, not persisting all intermediate states makes
    debugging harder. We don't actually change VBucket state /that/
    often, and so having all the intermediate states a VBucket went
    through on disk is extremely valuable in debugging.


Instead of having multiple different tasks which can persist state
(and attempting to manage when they are created / when they run / what
state they persist), we instead use a single task (the Flusher task)
to persist state for a given vBucket. Note the Flusher *already*
persists vbucket state if necessary during commit (see
EventuallyPersistentStore::flushVBucket), so this path just adopts the
Flusher as the canonical Task to perform vbstate persistence.

To ensure that state is persisted even when there are no outstanding
'normal' items in the vbucket's checkpoint queue, we use the new
queue_op::set_vbucket_state meta-item to signify that a persist is
pending (and to essentially make the pending items queue non-empty so
the flusher will run).

After this patch all the other Tasks which used to persist vbucket
state are redundent - a subsequent patch will remove them.

Change-Id: Ic44360a1dd14fb882ebfa6f28f4ccfe6127d17a8
Reviewed-on: http://review.couchbase.org/69150
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [16/N]: Add queue_op::set_vbucket_state meta-item 49/69149/7
Dave Rigby [Mon, 24 Oct 2016 11:01:15 +0000 (12:01 +0100)]
MB-20852 [16/N]: Add queue_op::set_vbucket_state meta-item

Add a new meta-item to queue_op enum - set_vbucket_state. This will be
used to mark that a VBucket should have it's state persisted.

Change-Id: I108befd70933962d262529ee230382e47c64e31a
Reviewed-on: http://review.couchbase.org/69149
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [15/N]: Accurately track meta items within checkpoints 20/69020/8
Dave Rigby [Tue, 11 Oct 2016 16:14:03 +0000 (17:14 +0100)]
MB-20852 [15/N]: Accurately track meta items within checkpoints

Instead of assuming that a Checkpoint only contains 1 (Open) or 2
(Closed) meta-items, maintain a count of items within each Checkpoint,
and track how many meta-items a CheckpointCursor has read.

This allows us to support an arbitrary number of meta-items within a
Checkpoint, and in any sequence. This feature will be used to add
support for set_vbstate meta-items in a subsquent patches.

Change-Id: I8fb3040cbe64e316aae1f693afee65001b2b4b17
Reviewed-on: http://review.couchbase.org/69020
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [14/N]: Improve debug/logging in CheckpointManager 21/69021/8
Dave Rigby [Thu, 20 Oct 2016 13:58:01 +0000 (14:58 +0100)]
MB-20852 [14/N]: Improve debug/logging in CheckpointManager

Include information on the CheckpointCursors associated with a
CheckpointManager by adding a operator<< for CheckpointCursor. Add a
dump() method to CheckpointManager to assist in debugging their
contents (e.g. from gdb).

Also add some additional CheckpointManager/CouchKVStore logging.

Change-Id: I8c3de5b5ec0e8e297db8530dee87ac0edd869a91
Reviewed-on: http://review.couchbase.org/69021
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [13/N]: Checkpoint: Add getNumMetaItems() method 18/69018/8
Dave Rigby [Thu, 13 Oct 2016 16:08:53 +0000 (17:08 +0100)]
MB-20852 [13/N]: Checkpoint: Add getNumMetaItems() method

Add a new method to Checkpoint which returns the number of metaItems
in the checkpoint.

Initially this just returns a fixed value of 1 if the checkpoint is
open, or 2 if closed, as this matches the current checkpoint meta item

However subsequent patches will modify this to track the count of how
many meta items actually are in the checkpoint and hence allow us to
determine the meta item count when n queue_op::set_vbstate is added.

Change-Id: I411f2e97e16f9b11ac19a1b7165e4767d09f37d1
Reviewed-on: http://review.couchbase.org/69018
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [12/N]: Add VBucket::getVBucketState method, use vector for VBuckets in Map 89/67889/12
Dave Rigby [Wed, 21 Sep 2016 14:38:47 +0000 (15:38 +0100)]
MB-20852 [12/N]: Add VBucket::getVBucketState method, use vector for VBuckets in Map

Add a method which will encode the current state of the VBucket into a
vbucket_state struct.

Move from a raw array new to std::vector for VBuckets.

Change-Id: I4755568f977b6e97e7ed9e3bdc64a76d842b6ebd
Reviewed-on: http://review.couchbase.org/67889
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [11/N]: Move persistenceCheckpoint id to VBucket 77/67877/14
Dave Rigby [Tue, 13 Sep 2016 16:22:38 +0000 (17:22 +0100)]
MB-20852 [11/N]: Move persistenceCheckpoint id to VBucket

Currently the VBucketMap owns the persistence checkpoint IDs of all
VBuckets. Given these are properties of the VBuckets themselves (for
example one cannot form a vbucket_state record without one) it seems
more logical to have the VBucket class itself own this property.

Change-Id: I50260da628f543365081e736fdd55056a6004cd9
Reviewed-on: http://review.couchbase.org/67877
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [10/N]: Return by value from VBucket::getPersistedSnapshot 75/68275/8
Dave Rigby [Mon, 3 Oct 2016 09:23:39 +0000 (10:23 +0100)]
MB-20852 [10/N]: Return by value from VBucket::getPersistedSnapshot

Simplifies usage & makes it explicit that all members of the range are

Change-Id: I43159f7dc4836b4d30c98d0e07bf570f3f34cc1e
Reviewed-on: http://review.couchbase.org/68275
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-20852 [9/N]: Explicilty handle all queue_op uses 61/68161/8
Dave Rigby [Thu, 29 Sep 2016 14:05:14 +0000 (15:05 +0100)]
MB-20852 [9/N]: Explicilty handle all queue_op uses

Remove instances of switch statements where we don't check all
enumerations in queue_op, and instead handle them all. Makes it easier
/ safer to add any new queue_op types (and ensure they are handled

Change-Id: I9406c6d327d572268afa4d1830d690ee8ec153a1
Reviewed-on: http://review.couchbase.org/68161
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [8/N]: Improve documentation of putCursorsInCollapsedChk 17/69017/5
Dave Rigby [Tue, 11 Oct 2016 14:12:03 +0000 (15:12 +0100)]
MB-20852 [8/N]: Improve documentation of putCursorsInCollapsedChk

Change-Id: Ibe29d7507aea0f49593454416e9e58e4994ffa4e
Reviewed-on: http://review.couchbase.org/69017
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [7/N]: CheckpointManager::queueDirty: Pass vb by reference 16/69016/5
Dave Rigby [Wed, 19 Oct 2016 10:26:33 +0000 (11:26 +0100)]
MB-20852 [7/N]: CheckpointManager::queueDirty: Pass vb by reference

The VBucket argument to queueDirty is passed via a (reference to) a
ref-counted pointer. Nothing in this function modifies the arguments'
ref count, or passes it another owner; moreover it is also never null.

Therefore change to pass as a reference.

See also:

Change-Id: Iccfeb42922da558b6e9ab430b96829002e85af4a
Reviewed-on: http://review.couchbase.org/69016
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [6/N]: Simplify {start,stop}Flusher & flushVBucket, move to C++11 90/67890/8
Dave Rigby [Thu, 29 Sep 2016 14:24:38 +0000 (15:24 +0100)]
MB-20852 [6/N]: Simplify {start,stop}Flusher & flushVBucket, move to C++11

Move to more C++11 style, and add some comments as to what certain
members mean.

Change-Id: Ibdb900cfe63a9b1352fa1020f9dc58839e3e446d
Reviewed-on: http://review.couchbase.org/67890
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20852 [5/N]: Checkpoint: C++11-ification 15/69015/6
Dave Rigby [Tue, 11 Oct 2016 12:37:10 +0000 (13:37 +0100)]
MB-20852 [5/N]: Checkpoint: C++11-ification

Convert more of the checkpoint code to C++11 - mostly using auto for
long type names.

Change-Id: I16138f6b16f72e065ecc2c82d973caae7786f846
Reviewed-on: http://review.couchbase.org/69015
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-20852 [4/N]: Use named struct when moving cursors between checkpoints 14/69014/7
Dave Rigby [Tue, 11 Oct 2016 12:33:38 +0000 (13:33 +0100)]
MB-20852 [4/N]: Use named struct when moving cursors between checkpoints

Instead of an anonymous std::pair (which is hard to follow what the
two fields are), use a named struct `CursorPosition` and related
CursorIdToPositionMap map when recording cursor positions to move
between checkpoints.

Change-Id: I39d14dbdae34cd20fe880e23b95d01c5fbfdac91
Reviewed-on: http://review.couchbase.org/69014
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [3/N]: checkpoint_test enhancements 13/69013/6
Dave Rigby [Mon, 10 Oct 2016 13:44:44 +0000 (14:44 +0100)]
MB-20852 [3/N]: checkpoint_test enhancements

Tighten up existing checkpoint tests by adding additional EXPECT

Add a new CursorMovementReplicaMerge test which tests merging of
cursors on a replica VBucket.

Change-Id: Ia46812c2cd61eeea821a4b2ea46f645e1c66cf7e
Reviewed-on: http://review.couchbase.org/69013
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-20852 [2/N]: Convert queue_operation to scoped enum 60/68160/12
Dave Rigby [Thu, 29 Sep 2016 13:52:16 +0000 (14:52 +0100)]
MB-20852 [2/N]: Convert queue_operation to scoped enum

In preparation for adding new queue_op for setVBucketState, convert to
a typesafe C++11 scoped enum. Improve the documentation around
queue_op, and related classes Checkpoint, CheckpointCursor.

Also improve the output streaming (operator<<) for Checkpoint class.

Change-Id: I8f29b8e9e8915a68e31550b78bf3131b3737e2d7
Reviewed-on: http://review.couchbase.org/68160
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852 [1/N]: Update tests to facilitate set_vbucket_state changes 10/69010/10
Dave Rigby [Wed, 19 Oct 2016 11:58:11 +0000 (12:58 +0100)]
MB-20852 [1/N]: Update tests to facilitate set_vbucket_state changes

Use checkXX() macros instead of cb_assert() in a number of test which
check for number of items enqueued.

Also adjust the tests to find the initial value of ep_total_enqueued
before storing items, and use this when checking for expected
values. This facilitates the tests working once the changes for
MB-20852 add the set_vbucket_state meta-item to disk queues.

Change-Id: I0108cc5b635b5376d5852b513a6ed47e67f1e62b
Reviewed-on: http://review.couchbase.org/69010
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoAdd error checking to test_access_scanner 12/69012/6
Dave Rigby [Mon, 17 Oct 2016 13:32:26 +0000 (14:32 +0100)]
Add error checking to test_access_scanner

test_access_scanner doesn't check if store() has actually succeeded
when populating data. Fix this.

Change-Id: Ibd4f97b3b47c15b81a7671d5ed32c15aa4796a40
Reviewed-on: http://review.couchbase.org/69012
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agowait_for_flusher_to_settle: Use wait_for_stat_to_be() 11/69011/5
Dave Rigby [Fri, 14 Oct 2016 16:04:34 +0000 (17:04 +0100)]
wait_for_flusher_to_settle: Use wait_for_stat_to_be()

wait_for_stat_to_be() records the last value (in current) which makes
it easier to debug issues when values arn't as expected. Additionally
this simplifies the code in wait_for_flusher_to_settle.

Change-Id: I601ab4e8b605c937a86b802746cad691ac659308
Reviewed-on: http://review.couchbase.org/69011
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-21448: replace-with-CAS should return NOT_FOUND for deleted items 80/69080/4
Dave Rigby [Fri, 21 Oct 2016 14:36:38 +0000 (15:36 +0100)]
MB-21448: replace-with-CAS should return NOT_FOUND for deleted items

Regression test for MB-21448 - if an attempt is made to perform a CAS
operation on a logically deleted item we should return NOT_FOUND (aka

The reason for this "intermittent" behaviour is that we are
essentially racing with the writer thread. The sequence of events for
deleting an item is as follows:

1. User calls DELETE, which locates the key in the HashTable, and then
   sets the deleted flag on that StoredValue (but crucially leaves the
   StoredValue in the HashTable). A deletion operation is then
   enqueued in the CheckpointManager (to write the deletion to disk).

2. When the flusher runs for this vBucket, it calls
   CouchKVStore::commit which after committing invokes
   PersistenceCallback::callback to inform ep-engine that the item has
   been persisted to disk (and we can now remove from the HashTable).

3. The callback locates the item in the HashTable and finally
   physically removes the StoredValue.

The issue here is the client is essentially racing with (3) above. If
the client replace-with-CAS operation is processed before (3), then
the StoredValue is still in the HashTable, and hence unlocked_set
returns NOT_FOUND (aka KEY_ENOENT):

    } else if (cas && cas != v->getCas()) {  // line 1078
        if (v->isTempDeletedItem() || v->isTempNonExistentItem()) {
            return NOT_FOUND;

If on the other hand the client replace-with-CAS operation is
processed after (3), then the StoredValue no longer exists in the
HashTable, and we instead hit this case where v is NULL:

    if (v) {
    } else if (cas != 0) {  // line 1102
        rv = NOT_FOUND;

Solution is to make the unlocked_set behave as if an item doesn't
exist (i.e. post (3) state), even if it's currently present but

Change-Id: If353a310e5096dc838c6db9f673d85515d090842
Reviewed-on: http://review.couchbase.org/69080
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-21475: Account for memory alloc/dealloc in stopTaskGroup 84/69184/2
Dave Rigby [Tue, 25 Oct 2016 12:23:13 +0000 (13:23 +0100)]
MB-21475: Account for memory alloc/dealloc in stopTaskGroup

While stopping a task group, any memory allocations/deallocations made
should be accounted to the bucket in question.  Hence no
`onSwitchThread(NULL)` call.

(Note: This is very similar to MB-20054).

Change-Id: Ib5f88fe8e4061ac827a28e267717df32131aba23
Reviewed-on: http://review.couchbase.org/69184
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-21178: In 'test_dcp_producer_stream_backfill_no_value' ENOMEM = ETMPFAIL 14/69214/2
Manu Dhundi [Tue, 25 Oct 2016 20:51:49 +0000 (13:51 -0700)]
MB-21178: In 'test_dcp_producer_stream_backfill_no_value' ENOMEM = ETMPFAIL

Change-Id: I38d59b0618c435dd984b5af68b4c8b9c57345133
Reviewed-on: http://review.couchbase.org/69214
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-20746: Delete bloom filter only if temporary filter is created 00/69100/9
Sriram Ganesan [Sat, 22 Oct 2016 01:09:35 +0000 (18:09 -0700)]
MB-20746: Delete bloom filter only if temporary filter is created

During compaction, a temporary filter is only created in the
bloom filter callback function. There is a possibility that this
function will not get invoked if there is no data present during
compaction. So, before swapping the filter, ensure that a temporary
filter is created before deleting the original filter.

Change-Id: I4fcf11d32674be8c28dbda02724b40d3d37e335a
Reviewed-on: http://review.couchbase.org/69100
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
4 years agoMB-21320: Update handleSlowStream log to match the pattern of other DCP logs 73/69173/2
Manu Dhundi [Tue, 25 Oct 2016 01:20:38 +0000 (18:20 -0700)]
MB-21320: Update handleSlowStream log to match the pattern of other DCP logs

Change-Id: I2e9f1aecf66f7ee598d3c12d4be8ab875327bcf8
Reviewed-on: http://review.couchbase.org/69173
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-21320: Remove unused 'schedule' flag in DcpProducer::notifyStreamReady 72/69172/2
Manu Dhundi [Tue, 25 Oct 2016 01:13:55 +0000 (18:13 -0700)]
MB-21320: Remove unused 'schedule' flag in DcpProducer::notifyStreamReady

Change-Id: Ia562ec7baac012d401cc420982431df9920e92d3
Reviewed-on: http://review.couchbase.org/69172
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoLoggedAtomic: Expand set of operations supported, serialize logging 09/69009/4
Dave Rigby [Wed, 19 Oct 2016 10:20:40 +0000 (11:20 +0100)]
LoggedAtomic: Expand set of operations supported, serialize logging

Add support for increment & decrement (fetch_{add,sub}), assignment
and conversion to T.

Also add a mutex to serialize access to stderr, to prevent corruption
in log messages.

Change-Id: I0a617f702a22d287fb87bd1001ee0300b6551b9e
Reviewed-on: http://review.couchbase.org/69009
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Will Gardner <willg@rdner.io>
4 years agoMB-20247: Check callback return value for failure 37/68937/3
Jim Walker [Wed, 19 Oct 2016 16:04:18 +0000 (17:04 +0100)]
MB-20247: Check callback return value for failure

The batchWarmupCallback applies many items to the vbucket yet
is not coded to handle failures, e.g. ENOMEM.

These errors are now checked for and the warmup stops when an error
is found.

Testing of this is difficult as it was always quite theoretical:

1. The real failure case here is very hard to hit, I think if you
warmup close to DGM whilst racing deletes/evict from the frontend
you could encounter this situation.

2. warmup has very little unit-testing that can be adapted.

However I've built an instrumented ep-engine that would force the
error condition after n callbacks and I've ran that version under
valgrind and ASAN.

Change-Id: I05d35e10e577a3b5c2c7d21807996ab7b8455cc1
Reviewed-on: http://review.couchbase.org/68937
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-21143: Don't store conflict resolution mode per document 30/68930/4
Jim Walker [Wed, 19 Oct 2016 14:08:10 +0000 (15:08 +0100)]
MB-21143: Don't store conflict resolution mode per document

Some more deadcode that at the moment returns an unitialised
value as there's no conflict mode to return.

Change-Id: I7c0230ff4167d81dabb6dcbcaa5b69e461f7a850
Reviewed-on: http://review.couchbase.org/68930
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago[BP] MB-20585: Fix memory leak 36/68936/3
Jim Walker [Wed, 19 Oct 2016 15:28:39 +0000 (16:28 +0100)]
[BP] MB-20585: Fix memory leak

Only one of the MB-20585 fixes needs backporting and
a fix to two tests that came from watson->master merge

Orginal MB commit to master is a288f6c6
Test fix came from 0f532bce

Change-Id: I200706997df2d24a7629282e119b0ab8af2c4872
Reviewed-on: http://review.couchbase.org/68936
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-21428: Make sure all tasks in fake executor pool finish execution in teardown 68/68968/2
Manu Dhundi [Wed, 19 Oct 2016 23:11:09 +0000 (16:11 -0700)]
MB-21428: Make sure all tasks in fake executor pool finish execution in teardown

It is not sufficient to just run all tasks in future and ready queues while
shutdown, we also need to make sure they finish before delete
EventuallyPersistentEngine instance.

Change-Id: I658a2176b6f4a99beacbc8dd2e010b93c4480eaa
Reviewed-on: http://review.couchbase.org/68968
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-21300: [Address Sanitizer] Prevent underflow during hash table resize() 87/68887/2
Manu Dhundi [Tue, 18 Oct 2016 20:51:10 +0000 (13:51 -0700)]
MB-21300: [Address Sanitizer] Prevent underflow during hash table resize()


If i == 0 in line 144 of HashTable::resize(), reading prime_size_table[i - 1]
is spurious.

Change-Id: I7f3135efff7a0993fb939707b912d064514e9a45
Reviewed-on: http://review.couchbase.org/68887
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoReduce log level from WARNING to NOTICE on some events 57/68857/2
Trond Norbye [Fri, 14 Oct 2016 09:26:19 +0000 (11:26 +0200)]
Reduce log level from WARNING to NOTICE on some events

Change-Id: Ice0d3391778a0c4a82739b1a206cb4fe17bc23dc
Reviewed-on: http://review.couchbase.org/68857
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-20247: Check scan return code 01/68801/2
Jim Walker [Mon, 17 Oct 2016 10:21:40 +0000 (11:21 +0100)]
MB-20247: Check scan return code

kvstore->scan can fail, so check the return code to prevent trying
to continue loading whilst failing.

Extension of changes made for MB-16910

Change-Id: Ia890d685d7acb144eca374ca0964e3d65100f89e
Reviewed-on: http://review.couchbase.org/68801
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21320: Improve debug when handling slow streams 98/68798/3
Dave Rigby [Thu, 13 Oct 2016 11:16:49 +0000 (12:16 +0100)]
MB-21320: Improve debug when handling slow streams

Change-Id: Ia24e955fa975ea538680d5b80150823382a988b9
Reviewed-on: http://review.couchbase.org/68798
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
4 years agoMB-21325: Fix race in BloomFilter::getNumOfKeysInFilter 03/68803/2
Dave Rigby [Mon, 17 Oct 2016 10:36:50 +0000 (11:36 +0100)]
MB-21325: Fix race in BloomFilter::getNumOfKeysInFilter

Address race condition identified by ThreadSanitizer on

WARNING: ThreadSanitizer: data race (pid=341)
  Write of size 8 at 0x7d140000df80 by thread T11 (mutexes: write M20582, write M20817, write M19369):
    #0 BloomFilter::addKey() ep-engine/src/bloomfilter.cc:119 (ep.so+0x00000002453d)
    #1 VBucket::addToFilter() ep-engine/src/vbucket.cc:482 (ep.so+0x000000149704)
    #2 PersistenceCallback::callback() ep-engine/src/ep.cc:3211 (ep.so+0x0000000b27c1)
    #3 non-virtual thunk to PersistenceCallback::callback(int&) ep-engine/src/ep.cc:3167 (ep.so+0x0000000b36c2)
    #4 CouchKVStore::commitCallback() ep-engine/src/couch-kvstore/couch-kvstore.cc:1898 (ep.so+0x00000017aa51)
    #5 _ZN12CouchKVStore17commit2couchstoreEP8CallbackIJ10KVStatsCtxEE ep-engine/src/couch-kvstore/couch-kvstore.cc:1712 (ep.so+0x00000017208b)
    #6 _ZN12CouchKVStore6commitEP8CallbackIJ10KVStatsCtxEE ep-engine/src/couch-kvstore/couch-kvstore.cc:954 (ep.so+0x000000171c92)
    #7 EventuallyPersistentStore::commit(unsigned short) ep-engine/src/ep.cc:3451 (ep.so+0x0000000a29e6)
    #8 EventuallyPersistentStore::flushVBucket(unsigned short) ep-engine/src/ep.cc:3399 (ep.so+0x0000000a1935)
    #9 Flusher::flushVB() ep-engine/src/flusher.cc:293 (ep.so+0x0000001038ae)
    #10 Flusher::step(GlobalTask*) ep-engine/src/flusher.cc:183 (ep.so+0x000000101f7e)
    #11 FlusherTask::run() ep-engine/src/tasks.cc:138 (ep.so+0x00000013ac42)
    #12 ExecutorThread::run() ep-engine/src/executorthread.cc:115 (ep.so+0x0000000fd53f)
    #13 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fd095)
    #14 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:55 (libplatform_so.so.0.1.0+0x000000005deb)

  Previous read of size 8 at 0x7d140000df80 by main thread (mutexes: write M15538, write M1828880534441356728):
    #0 BloomFilter::getNumOfKeysInFilter() ep-engine/src/bloomfilter.cc:142 (ep.so+0x0000000246a0)
    #1 VBucket::addStats() ep-engine/src/vbucket.cc:597 (ep.so+0x00000014a527)
    #2 EventuallyPersistentEngine::doVBucketStats() ep-engine/src/ep_engine.cc:3837 (ep.so+0x0000000c69f6)
    #3 EventuallyPersistentEngine::doVBucketStats() ep-engine/src/ep_engine.cc:3861 (ep.so+0x0000000c664a)
    #4 EventuallyPersistentEngine::getStats() ep-engine/src/ep_engine.cc:4774 (ep.so+0x0000000cbbb3)
    #5 EvpGetStats() ep-engine/src/ep_engine.cc:231 (ep.so+0x0000000baafe)
    #6 mock_get_stats() memcached/programs/engine_testapp/engine_testapp.cc:215 (engine_testapp+0x0000004ce40d)
    #7 std::string get_stat<std::string>() ep-engine/tests/ep_test_apis.cc:1140 (ep_testsuite_xdcr.so+0x000000020c34)
    #8 unsigned long get_stat<unsigned long>() ep-engine/tests/ep_test_apis.cc:1179 (ep_testsuite_xdcr.so+0x00000002089b)
    #9 get_ull_stat() ep-engine/tests/ep_test_apis.cc:1174 (ep_testsuite_xdcr.so+0x00000001e73e)
    #10 test_del_with_meta_and_check_drift_stats() ep-engine/tests/ep_testsuite_xdcr.cc:1711 (ep_testsuite_xdcr.so+0x0000000143f1)
    #11 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1118 (engine_testapp+0x0000004cc997)
    #12 __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226 (libc.so.6+0x00000002176c)

Change-Id: I4e9a589e9286e72646df515db891c57b143d1fdd
Reviewed-on: http://review.couchbase.org/68803
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
4 years agoMB-21190: cbepctl support to change max_cas 36/68736/2
Jim Walker [Fri, 14 Oct 2016 10:30:29 +0000 (11:30 +0100)]
MB-21190: cbepctl support to change max_cas

The initial version of this commit blocks *all* set commands
as it required they all pass a vbucket-id even when not

This update means that changing max_cas is done via

 cbepctl set_vbucket_param

That command requires the vbucket-id.

This commit changes (reverts) to not requiring the extra CLI argument.

 cbepctl set

Change-Id: I04b8b607ccfbc56bca46fe55ca0360f0fe7874ef
Reviewed-on: http://review.couchbase.org/68736
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19746: Add xml output to ep_perfsuite 97/68097/5
Matt Carabine [Fri, 17 Jun 2016 11:16:11 +0000 (12:16 +0100)]
MB-19746: Add xml output to ep_perfsuite

- The new performance commit validation, CBNT, requires input data
  in XML format which mirrors that of the GTest XML output
- This commit adds this functionality to ep_perfsuite, providing
  the ability to output the results in textual format or xml
- It can be invoked by using the new `-f` flag, followed by either
  `xml` or `text`, if this flag is not specified then the default
  behaviour of textual output is assumed

Change-Id: I153a816493181acb091e8d4433f5be543742231f
Reviewed-on: http://review.couchbase.org/68097
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agodefragmenter_test: Run for fewer iterations under Valgrind 68/68568/3
Dave Rigby [Wed, 3 Aug 2016 13:06:19 +0000 (13:06 +0000)]
defragmenter_test: Run for fewer iterations under Valgrind

Currently when this test is run under Valgrind it takes ~150s to
complete; because we create a large (500,000) number of items to
operate on for the benchmark tests. There's little value in
benchmarking under Valgrind, however it is still useful to run the
tests under Valgrind to validate functional correctness.

Therefore reduce the item count to 10 when run under Valgrind. This
reduces the runtime to <10s

Change-Id: Idda46dff5963c29db6890b0c8841c80b1401c010
Reviewed-on: http://review.couchbase.org/68568
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21190: cbepctl support to change max_cas 17/68517/8
Jim Walker [Fri, 7 Oct 2016 10:37:48 +0000 (12:37 +0200)]
MB-21190: cbepctl support to change max_cas

If the max_cas of a vbucket is forced forward by a
peer with a 'bad' clock, there are limited recovery
options. Allowing the max_cas to be "reset" could
be useful to recover from such a problem.

Change-Id: I9235520283ee1cd0d5b49820190a9eed3daac3c2
Reviewed-on: http://review.couchbase.org/68517
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21154: set_param support for drift thresholds 25/68325/11
Jim Walker [Tue, 4 Oct 2016 14:56:20 +0000 (15:56 +0100)]
MB-21154: set_param support for drift thresholds

Enable the set_param command to change the ahead and behind drift
thresholds, enabling changes without a bucket restart.

cbepctl is updated so that it can send drift threshold changes,
but they're not publicised.

> cbepctl ... set vbucket_param hlc_drift_ahead_threshold_us 1

Change-Id: I4973d0c36bfa03ff33e50924b7c10434675d90da
Reviewed-on: http://review.couchbase.org/68325
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21150: A single total for drift ahead exceptions 72/68272/11
Jim Walker [Mon, 3 Oct 2016 13:33:25 +0000 (14:33 +0100)]
MB-21150: A single total for drift ahead exceptions

Report a stat that captures every time we've set a vbucket's HLC
from a peer who is ahead of the threshold of the node.

Change-Id: I7b7d73b5044afd2a4cfcc2d8ed43e9018b478f19
Reviewed-on: http://review.couchbase.org/68272
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20725: LWW monitoring 58/68158/13
Jim Walker [Thu, 29 Sep 2016 14:58:18 +0000 (15:58 +0100)]
MB-20725: LWW monitoring

Add a number of new stats to the bucket stats and vbucket-details.

Add drift +/- threshold config parameters.

engine stats:

1. The maximum absolute cummulative-drift of the active vbuckets. See
    vbucket-details total_abs_drift

2. The number of updates applied to ep_active_hlc_drift

3. The maximum absolute drift of the replica vbuckets

4. The number of counts applied to ep_replica_hlc_drift

5. The total number of times a setMaxCas was from a peer who is
   ahead of the ahead threshold (hlc_ahead_threshold_us).

6. The total number of times a setMaxCas was from a peer who is
   behind of the behind threshold (hlc_behind_threshold_us).

7. The total number of times a setMaxCas was from a peer who is
   ahead of the ahead threshold (hlc_ahead_threshold_us) for
   replica VBs.

8. The total number of times a setMaxCas was from a peer who is
   behind of the behind threshold (hlc_behind_threshold_us) for
   replica VBs.

vbucket-details stats:

1. The current max_hlc (reported as max_cas for compatibility with
   other modules)

2. The vbucket's absolute cummulative drift

3. How many updates have been applied to total_abs_drift

4. How many times the ahead threshold has been exceeded.

5. How many times the behind threshold has been exceeded.

6. How many logical clock ticks this vbucket's HLC has returned

Change-Id: I063782d4451b97f58a3c89208506bd8bd08b705e
Reviewed-on: http://review.couchbase.org/68158
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20819: Use std::chrono::system_clock for HLC 38/68038/13
Jim Walker [Thu, 29 Sep 2016 14:56:24 +0000 (15:56 +0100)]
MB-20819: Use std::chrono::system_clock for HLC

Use a clock that follows the system clock for generation of HLC

Change-Id: Ibb928c3ea4b83fbcf0862e8c610b1c49154eb0ed
Reviewed-on: http://review.couchbase.org/68038
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20798: Allow CAS and seqno to be generated consistently 70/67670/16
Jim Walker [Tue, 27 Sep 2016 11:26:44 +0000 (12:26 +0100)]
MB-20798: Allow CAS and seqno to be generated consistently

Add a new option to queueDirty so that it can generate
and assign the CAS to the StoredValue.

This allows us to create a seqno and CAS under the same
lock, thus seqno and CAS will be incrementing when
a checkpoint is serially observed.

Change-Id: Ic24619326a4e8722613824f2253b606d228e98c7
Reviewed-on: http://review.couchbase.org/67670
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20798: Update queueDirty options in prep for a new option 82/67582/16
Jim Walker [Tue, 13 Sep 2016 19:51:32 +0000 (20:51 +0100)]
MB-20798: Update queueDirty options in prep for a new option

Switch the bool 'genBySeqno' to a more descriptive enum, call
site will now be clearer regarding the parameter.

Change-Id: I2d6707df0360f02f7455b480383f5ca9d6e64261
Reviewed-on: http://review.couchbase.org/67582
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21143: Remove adjusted-time/drift and associated code 99/67999/10
Jim Walker [Mon, 26 Sep 2016 13:36:01 +0000 (14:36 +0100)]
MB-21143: Remove adjusted-time/drift and associated code

As part of simplfying the supported LWW code, remove the
adjusted-time API and associated code.

Change-Id: I4d1cb092d4fce3155d1cd1e0134333655bcb3a61
Reviewed-on: http://review.couchbase.org/67999
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21143: Don't store conflict resolution mode per document 55/67955/9
Jim Walker [Fri, 23 Sep 2016 13:45:40 +0000 (14:45 +0100)]
MB-21143: Don't store conflict resolution mode per document

Disk and memory have storage on a per document basis for the
conflict resolution mode.

LWW is now enabled globally so this meta data is wasted.

Change-Id: I7b54a96f453b4182e0976e6c18cb48ac964e5177
Reviewed-on: http://review.couchbase.org/67955
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21144: Simplify how LWW is enabled 54/67954/9
Jim Walker [Fri, 23 Sep 2016 16:01:27 +0000 (17:01 +0100)]
MB-21144: Simplify how LWW is enabled

Use existing conflict_resolution_type config flag to enable
LWW on a global bucket basis. Now ignoring the per document
LWW flag.

Change-Id: I2a19fd5633ec6cf28deead3cb5a371e242131fc6
Reviewed-on: http://review.couchbase.org/67954
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMove more unit tests into the ep_unit_tests binary 12/68212/7
Dave Rigby [Fri, 12 Aug 2016 14:53:16 +0000 (15:53 +0100)]
Move more unit tests into the ep_unit_tests binary

For GTest-style tests we have created a single test binary
(ep-engine_ep_unit_tests) to link all the tests into. This has the
advantage of not having to compile different variants of our source
files for multiple different test binaries (which is partly a
limitation / 'feature' of CMake's dependancy calculation).

However not all tests are in this binary. This patch moves an
additional 2 test suites - checkpoint & defragmenter - into the single
binary. This speeds up our build time, and also removes a bunch of
duplicated boilerplate test setup code.

Change-Id: I7a9b6f8166fe2dcb739bdf124b43d1de6abc1e42
Reviewed-on: http://review.couchbase.org/68212
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years agoRemove hooksApi global; reduce coupling with MemoryTracker 10/68210/6
Dave Rigby [Fri, 12 Aug 2016 13:45:20 +0000 (14:45 +0100)]
Remove hooksApi global; reduce coupling with MemoryTracker

MemoryTracker is somewhat tightly coupled with ep_engine.cc as it uses
the getHooksApi() function to obtain the memory allocator hooks.

Firstly this makes it hard to test - compile one file and you have to
include the other, and it's also difficult to provide a different
hooks api - either for injecting a mock one for testing, or simply to
use the 'normal' hooks API but without pulling in ep-engine.

Secondly, there is unnecessary indirection in NewHook / DeleteHook -
which are called on every new/delete so performance is relevant
there. By giving the MemoryTracker it's own copy of the alloc_hooks
(instead of calling getHooksApi() on every call) we can reduce the
amount of work in NewHook / DeleteHook by approx. 50% (measured in
terms of x64 instructions).

Change-Id: Ia0f8ebb0a5263567dc08b32fe6ff9b7ea9eefa92
Reviewed-on: http://review.couchbase.org/68210
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20822: When loading persisted failovers, remove any erroneous entries 02/67702/8
Manu Dhundi [Tue, 4 Oct 2016 20:31:33 +0000 (13:31 -0700)]
MB-20822: When loading persisted failovers, remove any erroneous entries

Due to bugs in older releases (and also possibly in future releases),
we may end up storing wrong failover table on disk. Hence during
warmup while loading failover table from disk we must prune out any
wrong entries.

Also, in the past we have seen erroneous entries with vb_uuid == 0.
So we make sure that vb_uuid is not generated as a valid value and
prune out any entry that has vb_uuid == 0

Change-Id: I630cb7fb1ea9a711432be64f36924d04fcd5e361
Reviewed-on: http://review.couchbase.org/67702
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-21193: Fix ConnectionTest failure when test runs for >21s 73/68273/3
Dave Rigby [Mon, 3 Oct 2016 16:41:14 +0000 (17:41 +0100)]
MB-21193: Fix ConnectionTest failure when test runs for >21s

When CV jobs take longer than normal (e.g. when running under
ThreadSanitizer) the ConnectionTest can fail:

    [ RUN      ] ConnectionTest.test_maybesendnoop_buffer_full
    /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-watson/ep-engine/tests/module_tests/dcp_test.cc:308: Failure
    Value of: ret
      Actual: 255
    Expected: ENGINE_E2BIG
    Which is: 8
    maybeSendNoop not returning ENGINE_E2BIG
    [  FAILED  ] ConnectionTest.test_maybesendnoop_buffer_full (656 ms)

The test is assuming that it starts running in less than 21 seconds
when setting the DCP noop timeout. Instead of using an absolute '21'
we need to use the current_time + 21.

Change-Id: Id4fe8b372da4c447e00128de2c94c72bb60373e4
Reviewed-on: http://review.couchbase.org/68273
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-21142: Ensure vbucket states are persisted before the test 64/68264/2
Sriram Ganesan [Mon, 3 Oct 2016 09:14:18 +0000 (14:44 +0530)]
MB-21142: Ensure vbucket states are persisted before the test

In test_tap_filter_stream, vb ids 0 to 3 are set to active
before the test. The test needs to ensure that all the vbucket
states are persisted and the respective couchstore files created
so that there is no race between file creation and operations
on the file

Context: test_tap_filter_stream

Change-Id: Idad34474bdf418504012731459cf4eb3272b1ba9
Reviewed-on: http://review.couchbase.org/68264
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-20744: Increment reject ops on insertion to reject queue 19/68219/10
Sriram Ganesan [Mon, 29 Aug 2016 22:48:40 +0000 (15:48 -0700)]
MB-20744: Increment reject ops on insertion to reject queue

In case of a failure to persist items, the items are added to the
reject queue in which the reject stats needs to be updated for
the associated vbucket.

Change-Id: I15b7ad26d8bcab5b6a437b8d172d8b914df8b683
Reviewed-on: http://review.couchbase.org/68219
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-21140: Fix race in test_access_scanner_settings (gmtime_r) 51/68151/4
Dave Rigby [Fri, 23 Sep 2016 13:59:34 +0000 (14:59 +0100)]
MB-21140: Fix race in test_access_scanner_settings (gmtime_r)

When calculating access scanner adjusted time values, use the
thread-safe variant (gmtime_r) so the test doesn't conflict with the
ep_engine code.

Also use std::chrono to perform the time manipulaton (which handles
any modulus of minutes -> hours etc).

Change-Id: Icf8505e4ce465f382904934dcaa05527efc57454
Reviewed-on: http://review.couchbase.org/68151
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: Warmup: Process the access log in chunks 45/68045/7
Dave Rigby [Tue, 23 Aug 2016 14:28:07 +0000 (15:28 +0100)]
MB-20623: Warmup: Process the access log in chunks

Instead of loading the entire Access log into memory, and then
applying in one (potentially very large) getMulti; process it in
warmup_batch_size chunks (default 10,000 items).

This sigificantly reduces the amount of temporary memory consumed
during warmup, which in turn means this memory can be used for
document values.

Results: in the following workload we manage to load 19.2M items (or
4.2GB) into memory during warmup, where previously only two items
(yes, *two*) were warmed up due to the size of the temporary data
structures - because the temporary data structures consumed ~4.2GB.

* 1 bucket, 10,000MB quota, 1 node.
* 30M items, 300bytes each; giving a residency ratio of ~60%. Dataset
  generated by:

    cbc-pillowfight -U couchbase://localhost:12000 -I 30000000 -m 300 -M 300 -t16

Change-Id: I511b70d5ea9c9c6b9556249a936030a67bf70c02
Reviewed-on: http://review.couchbase.org/68045
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: Warmup: Implement MutationLog::iterator copy assignment 44/68044/8
Dave Rigby [Tue, 23 Aug 2016 14:21:22 +0000 (15:21 +0100)]
MB-20623: Warmup: Implement MutationLog::iterator copy assignment

MutationLog::iterator doesn't follow the Rule of Three - it doesn't
implement the copy-assigment operator. This means that it's not a
complete iterator implementation.

Fix this, and add a unit test for it.

Change-Id: I12d67bc072d72e481e6a195e2d45b16c0318fdc0
Reviewed-on: http://review.couchbase.org/68044
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: Remove 'fetches' vector from MutationLogHarvester::apply 43/68043/7
Dave Rigby [Tue, 23 Aug 2016 11:12:05 +0000 (12:12 +0100)]
MB-20623: Remove 'fetches' vector from MutationLogHarvester::apply

MutationLogHarvester::apply() builds a vector of keys (`fetches`) to
fetch from disk. This is essentially identical to the contents of
`committed`, except it only contains keys which currently exist in the

We can optimize this by removing fetches, and instead erasing the
elements from `committed[vb]` which are no longer valid. This also
means we no longer have to check for duplicates in batchWarmupCallback
(which was kinda pointless even before), as the data structure passed
in is now a set (and hence cannot by definition contain dupes).

Results in a reduction in the memory used by these temporary warmup
data structures - from 3876MB to 3218MB (17%) for the following

* 1 bucket, 10,000MB quota, 1 node.
* 30M items, 300bytes each. Dataset generated by:
    cbc-pillowfight -U couchbase://localhost:12000 -I 30000000 -m 300 -M 300 -t16

Change-Id: I1c2e480e53626be41fa20dcd44abe4d6fa327e4c
Reviewed-on: http://review.couchbase.org/68043
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: Remove unused fields from temporary warmup data structures 42/68042/7
Dave Rigby [Mon, 22 Aug 2016 14:43:29 +0000 (15:43 +0100)]
MB-20623: Remove unused fields from temporary warmup data structures

Warmup currently creates three data structures (per vBucket) in memory
when warming up using the access.log:

1. A map of keys -> sequence number at the time the access log was
   generated - MutationLogHarvester::committed

2. A list of {key, hashTable_seqNo} pairs -

3. A map of key -> VBucketBGFetchItem for every key in the list above
- batchWarmupCallback::items2fetch

There are a number of inefficiencies in this implementation, the first
of which is that we record sequence numbers in the first two data
structures which are never actually used - the final BGfetch doesn't
need them.

This patch therefore removes the recording of sequence numbers. This
changes the data structures to:

1. MutationLogHarvester::committed - a set of keys (per vBucket).
2. MutationLogHarvester::apply::fetches - a vector of keys.

Results in a reduction in the memory used by these temporary warmup
data structures - from 4356MB to 3876MB (11%) for the following

* 1 bucket, 10,000MB quota, 1 node.
* 30M items, 300bytes each. Dataset generated by:
    cbc-pillowfight -U couchbase://localhost:12000 -I 30000000 -m 300 -M 300 -t16

Change-Id: I0666e2d1a8a0d9e996cdbdd61d41118d2c2d6dfc
Reviewed-on: http://review.couchbase.org/68042
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: MutationLog: Remove unused Delete and Uncommitted functionality 41/68041/7
Dave Rigby [Mon, 22 Aug 2016 13:10:17 +0000 (14:10 +0100)]
MB-20623: MutationLog: Remove unused Delete and Uncommitted functionality

The MutationLog (when actually used for Mutations) could record
Deletes (in addition to 'adds') of keys, and return a list of
'uncommitted' keys.  However this functionality has long been unused
(since March 2013, http://review.couchbase.org/26943).

Remove this unused code to simplify the way the MutationLog class is
now used - for the access log.

Change-Id: I5642553708c7d5cf320cec5524ae49643a414192
Reviewed-on: http://review.couchbase.org/68041
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: Convert MutationLog unit tests to GTest 40/68040/8
Dave Rigby [Tue, 27 Sep 2016 14:00:03 +0000 (15:00 +0100)]
MB-20623: Convert MutationLog unit tests to GTest

Convert to using the GoogleTest framework, and integrate into
ep-engine_ep_unit_tests binary.

Change-Id: I7caff50202abc4ecf55c513f4b6859d5c5d48d40
Reviewed-on: http://review.couchbase.org/68040
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20623: re-enable and fix MutationLog unit test 39/68039/10
Dave Rigby [Mon, 22 Aug 2016 13:46:01 +0000 (14:46 +0100)]
MB-20623: re-enable and fix MutationLog unit test

This test appears to have been not built/run a while back. Resurrect

Change-Id: I4b35291239882e5e58bc2c7d435e3c793a7ae6ba
Reviewed-on: http://review.couchbase.org/68039
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoAdd LoggedAtomic<> debug class 44/67944/6
Dave Rigby [Fri, 23 Sep 2016 10:29:54 +0000 (11:29 +0100)]
Add LoggedAtomic<> debug class

A Debugging wrapper around std::atomic which print all accesses to the
atomic value to stderr.

Drop-in compatible with AtomicValue for simple use-cases - currently
only implements load() / store().

Change-Id: I78cec4d8bad55588900573f201d81a38f16f97ee
Reviewed-on: http://review.couchbase.org/67944
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
4 years agoMB-20822: Erase all diverged branch entries correctly in failover table 81/67481/6
Manu Dhundi [Wed, 14 Sep 2016 19:12:29 +0000 (12:12 -0700)]
MB-20822: Erase all diverged branch entries correctly in failover table

When we add an entry in failover table we must erase all the other
entries with higher seqno because they are from a diverged branch.

This commit fixes a bug in the loop that was erasing diverged entires.
In a std::list when we erase an entry, the function returns the
iterator pointing to next element and hence we must be careful not to
double increment it.

Change-Id: I9275fba8057f26e2853a8d7d359f1d01f107f2be
Reviewed-on: http://review.couchbase.org/67481
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-16181: Add replicate/persist traits to Item 90/67990/4
Jim Walker [Mon, 26 Sep 2016 09:29:10 +0000 (10:29 +0100)]
MB-16181: Add replicate/persist traits to Item

Provide an abstraction for the flusher and DCP that tells them
if an item should be persisted or replicated.

This provides a stepping stone towards system owned items in

Change-Id: Ie5e65a2b20d0d162e1b8fe3e439219c34fb7b508
Reviewed-on: http://review.couchbase.org/67990
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20869: Simplify test to fix intermittent test_access_scanner failure 84/67884/11
Dave Rigby [Wed, 7 Sep 2016 14:40:08 +0000 (15:40 +0100)]
MB-20869: Simplify test to fix intermittent test_access_scanner failure

In test_access_scanner the check that the number of non-resident items
was less than 6% of the total is incorrect - it was using integer
division (6/100) and hence the RHS of the check was always zero, which
would always succeed. Fix the check so it calculates the percentage

Additionally we did not wait for the previous AccessScanner run to
finalize - which meant that the second start of the access acanner
could fail to start (and consequently we would timeout with the

    Exceeded maximum wait time of 60000000us waiting for stat
    'ep_num_access_scanner_runs' to be greater or equal than 8 -

To solve this, change to just one iteration of creating the file (not
sure why it did two before), and move the increment of stats.alogRuns
to /after/ the access log files are renamed.

Change-Id: I6c3b6eeed91302fd7fa67dfade6ae954078fc125
Reviewed-on: http://review.couchbase.org/67884
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago[BP] MB-20598: AddressSanitizer problems 06/68106/3
Jim Walker [Fri, 19 Aug 2016 15:19:51 +0000 (15:19 +0000)]
[BP] MB-20598: AddressSanitizer problems

Two leaks and one stack overflow.

The forest-kvstore code should use dynamic_cast (like couch-kvstore)
else when the incoming callback is not a RememberingCallback, we will
access outside of the incoming object.

ep_testsuite has a leak in tap code where we must release items
during iteration.

Change-Id: If0db9e936ee141299c5a579235e828c7309b8118
Reviewed-on: http://review.couchbase.org/68106
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years ago[BP] MB-20598: AddressSanitizer problem in perfsuite 07/68107/2
Jim Walker [Thu, 25 Aug 2016 09:47:14 +0000 (09:47 +0000)]
[BP] MB-20598: AddressSanitizer problem in perfsuite

This test was being OOM killed so was left out of
the orginal set of fixes. Tap iterator mutation/deletions
need to be released by the client.

Change-Id: Ib65f386f1080cb2a130cfdd7d90c85dd4a871989
Reviewed-on: http://review.couchbase.org/68107
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19578: Fail warmup only when explicitly configured 53/67353/9
Sriram Ganesan [Sat, 3 Sep 2016 23:49:53 +0000 (16:49 -0700)]
MB-19578: Fail warmup only when explicitly configured

- Instead of unconditionally failing warmup in case of a OOM failure,
  fail it only when it is configured to do so. Note that ns_server
  explicitly sets this value to false.
- Remove an unnecessary check in the bucket initialization path to
  check for warmup failures as soon as warmup is started. The warmup
  process is asynchronous and thus checking for failures right away
  is futile.

Change-Id: I5725644a3731e7cd75e98b0b77ca6ae8a3d88ab9
Reviewed-on: http://review.couchbase.org/67353
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20528: unconditionally increment get meta ops 07/67307/12
Sriram Ganesan [Thu, 1 Sep 2016 00:02:55 +0000 (17:02 -0700)]
MB-20528: unconditionally increment get meta ops

The get meta operations prior to introducing bloom filters was
always incremented whether the key was present or not present
in the bucket. Restore the behavior where the get meta ops stat
is always incremented.

Change-Id: I09fadedb71929d0d078d0ca6c49ef27de3fd5bef
Reviewed-on: http://review.couchbase.org/67307
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852: ep_unit_tests_main: Show DEBUG logs with -v 88/67888/5
Dave Rigby [Wed, 21 Sep 2016 14:33:18 +0000 (15:33 +0100)]
MB-20852: ep_unit_tests_main: Show DEBUG logs with -v

Previously we would only print INFO level log messages when the -v
flag was given, even though the code claimed to have enabled DEBUG

This was because we didn't get the log level early enough - the
extension defaults to INFO, and will propogate that down to ep_engine
(overwriting what we specify). Fix by setting the log level in the
server API to debug.

Change-Id: I8a192169721fad631965600b69975d882b6221f8
Reviewed-on: http://review.couchbase.org/67888
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoRemove default parameter from queueDirty 23/67623/12
Daniel Owen [Tue, 13 Sep 2016 11:14:32 +0000 (12:14 +0100)]
Remove default parameter from queueDirty

The EventuallyPersistentStore method queueDirty has a parameter called
notifyReplicator, which is defaulted to true.  The parameter is used
to specify whether or not to notify the replicator.

However the queueDirty method either uses the notifyReplicator default
of true, or is passed in true.

This patch removes the unrequired parameter from the queueDirty
definition and simpifies the body of the method.

Change-Id: I6340e522f0e6137bc744450ddc90e418ed7716a2
Reviewed-on: http://review.couchbase.org/67623
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20834: Use get_available_cpu_count() for executorpool #threads calculation 93/67493/3
Dave Rigby [Thu, 8 Sep 2016 13:32:45 +0000 (14:32 +0100)]
MB-20834: Use get_available_cpu_count() for executorpool #threads calculation

Use the new Couchbase::get_available_cpu_count() when calculating how
many executor pool threads to create. get_available_cpu_count()
accounts for the number of logical cores a process is allowed to run
on, which is the mechanism cgroups (and hence Docker) uses to limit
the CPUs available to a container.

This fixes the problem of us creating "too many" executorpool threads
under Docker containers which have the --cpuset-cpus option set.

Change-Id: I3e3b91eecc51aea298ae9aceb9e8c6d3f16b7612
Reviewed-on: http://review.couchbase.org/67493
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoRemove default parameter from deleteItem 24/67624/6
Daniel Owen [Tue, 13 Sep 2016 10:25:39 +0000 (11:25 +0100)]
Remove default parameter from deleteItem

The EventuallyPersistentStore method deleteItem has a parameter
called tapBackfill, which is defaulted to false.  The parameter
is used to specify if an item deletion is from a TAP backfill

However the deleteItem method is never passed a tapBackfill
parameter and therefore the default of false is always used.

This patch removes the unrequired parameter from the deleteItem
definition and simpifies the body of the method.

Change-Id: Ic1aa9a41bd68411f9b29b61333f66b4c1ae35278
Reviewed-on: http://review.couchbase.org/67624
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

4 years agoMB-21036: Fix intermittent failure in shutdown_snapshot_range 48/67948/6
Dave Rigby [Fri, 23 Sep 2016 16:52:52 +0000 (17:52 +0100)]
MB-21036: Fix intermittent failure in shutdown_snapshot_range

Issue is that the test attempts to create exactly 3 checkpoints,
however this is load-dependent (i.e. how quickly the flusher runs and
creates checkpoints).

Remove this intermediate checks in the test, and just check the
sequence numbers.

Change-Id: Ic7c0a9217afcdc8bd65680efb992b09db0f5023b
Reviewed-on: http://review.couchbase.org/67948
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years agoMB-21034: Fix race in test_expiry_pager_settings (gmtime_r) 46/67946/5
Dave Rigby [Fri, 23 Sep 2016 13:59:34 +0000 (14:59 +0100)]
MB-21034: Fix race in test_expiry_pager_settings (gmtime_r)

When calculating expiry pager adjusted time values, use the
thread-safe variant (gmtime_r) so the test doesn't conflict with the
ep_engine code.

Also use std::chrono to perform the time manipulaton (which handles
any modulus of minutes -> hours etc).

Change-Id: Iee39a86b73a71642b9dab4ff2821d589699731ac
Reviewed-on: http://review.couchbase.org/67946
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21029: Don't return ep_access_scanner_task_time until it has been scheduled 43/67943/7
Dave Rigby [Fri, 23 Sep 2016 10:23:11 +0000 (11:23 +0100)]
MB-21029: Don't return ep_access_scanner_task_time until it has been scheduled

Fix a latent race condition in ep_access_scanner_task_time stat
(exposed by intermittent failure in access_scanner_settings test),
whereby we can return a value for ep_access_scanner_task_time of the
Unix epoch (1970-01-01 00:00:00) if the AccessScanner has been enabled
but not yet scheduled.

Solution is to not return a value (and instead 'NOT SCHEDULED') until
scheduling has occurred.

Change-Id: Id85a602b3424c30c8795884207a1b0d31cb3c75a
Reviewed-on: http://review.couchbase.org/67943
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-21029: ep_test_apis: make wait_for_stat_XXX functions templated 42/67942/6
Dave Rigby [Fri, 23 Sep 2016 10:12:36 +0000 (11:12 +0100)]
MB-21029: ep_test_apis: make wait_for_stat_XXX functions templated

There is a range of helper functions to wait for a stat to meet some
criteria (equal, not equal, greater, ...), and currently these are
duplicated for each type (int, string, ...) required.

As the subsequent fix for MB-21029 needs a wait_for_stat_to_change for
the std::string type, genericize the current wait_for_stat function so
it can be used with any type.

Change-Id: I681218d31c4dcd1ef8b34de225efd13e99bbf8db
Reviewed-on: http://review.couchbase.org/67942
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852: ep_test_apis: report final value in wait_for_stat... 87/67887/8
Dave Rigby [Wed, 21 Sep 2016 12:32:14 +0000 (13:32 +0100)]
MB-20852: ep_test_apis: report final value in wait_for_stat...

Change-Id: I5aeae80ffcc0791701cf5982df4b11d00b288c8a
Reviewed-on: http://review.couchbase.org/67887
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20852: Remove unused 'force' parameter from scheduleVBStatePersist() 76/67876/6
Dave Rigby [Tue, 13 Sep 2016 13:06:04 +0000 (14:06 +0100)]
MB-20852: Remove unused 'force' parameter from scheduleVBStatePersist()

'force' is always false, so remove it.

Change-Id: I02e245b2a0237697c9abc4d55e2a9117cf4ca214
Reviewed-on: http://review.couchbase.org/67876
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20697: ep_testsuite: restore deleted DB dir before cleanup 81/67681/4
Dave Rigby [Wed, 14 Sep 2016 17:52:49 +0000 (18:52 +0100)]
MB-20697: ep_testsuite: restore deleted DB dir before cleanup

In some environments (seen on Windows VM), the regression test for
MB-20697 can hang forever during cleanup due to the writer thread being
stuck in an infinite loop trying to flush to disk.

To address this, re-create the database directory before shutting down
EPStore (after the test is complete).

Change-Id: I474e77bafbe5b30858d9a306669c52713890f846
Reviewed-on: http://review.couchbase.org/67681
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/4.5.1' into watson 00/68000/1 v4.5.1-MP1
Dave Rigby [Mon, 26 Sep 2016 13:54:02 +0000 (14:54 +0100)]
Merge remote-tracking branch 'couchbase/4.5.1' into watson

* couchbase/4.5.1:
  MB-20943: Set state to dead when deleting vbucket

Change-Id: I79ef4c2d0e66cf18e08f884ad7fecf2afe3b5578

4 years agoMB-20943: Set state to dead when deleting vbucket 73/67873/8 4.5.1 v4.5.1
Daniel Owen [Wed, 21 Sep 2016 09:50:33 +0000 (10:50 +0100)]
MB-20943: Set state to dead when deleting vbucket

When executing the VBucketMemoryDeletionTask the vbucket state is
unchanged.  notifyAllPendingConnsFailed is called in the run
function of VBucketMemoryDeletionTask.  This inturn calls fireAllOps,
which ensures all pending ops are cleared if the vbucket is in an
active state.

However if the vbucket is in a pending state is does nothing and
therefore the pending operations remain.  This results in connections
not being closed down.

The solution provided is to set the vbucket state to dead in
deleteVBucket, prior to calling scheduleVBDeletion.

A corresponding test has been added, which without the fix will hang.

Change-Id: I09cd4597b26576dd4b99d26f3a60c031e1b5f41d
Reviewed-on: http://review.couchbase.org/67873
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20771: Remove unnecessary defragmenter_test operator overloads 75/67675/3
Dave Rigby [Wed, 14 Sep 2016 14:49:42 +0000 (15:49 +0100)]
MB-20771: Remove unnecessary defragmenter_test operator overloads

The defragmenter & ep_unit tests have their own operator new/delete
overload, to improve interopability with Valgrind (so Valgrind saw
explicit calls our malloc / free).

However the global_new_replacement in platform does this anyway now, so
this code is no longer needed.

Change-Id: I49dbc9e14582d39b58b1db48a0c7772d4f3855c9
Reviewed-on: http://review.couchbase.org/67675
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years agoMB-20586: Update ep-engine to use cb_malloc memory API 13/67313/8
Dave Rigby [Fri, 26 Aug 2016 12:37:57 +0000 (13:37 +0100)]
MB-20586: Update ep-engine to use cb_malloc memory API

Similar to changes in memcached, update all C-style memory allocation
uses to use cb_malloc instead of raw system malloc.

Change-Id: Ic9bce029c34f74e161aed20b99129985264e0d4c
Reviewed-on: http://review.couchbase.org/67313
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20586: ep_unit_tests: Use real memory tracking code 42/67442/2
Dave Rigby [Fri, 12 Aug 2016 14:08:39 +0000 (15:08 +0100)]
MB-20586: ep_unit_tests: Use real memory tracking code

Use the 'real' memory tracking hooks instead of alloc_hooks_dummy in
the ep-engine unit tests. This more accurately reflects how our code
used in the 'real world'

Change-Id: I231a179e7765d46a63c72686c0279983db21cf0b
Reviewed-on: http://review.couchbase.org/67442
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20586: DefragmenterTest.MappedMemory: Fix on OS X 12/67312/6
Dave Rigby [Fri, 26 Aug 2016 12:39:59 +0000 (13:39 +0100)]
MB-20586: DefragmenterTest.MappedMemory: Fix on OS X

This test currently fails as it does not have a way of tracking memory
usage, due to us not initializing the memory hooks (which installs a
custom zone allocator).

Fix by initializing the allocator before we run any tests in this

Change-Id: I3b567001b070483d19d16ff1787d29ebd9669bfa
Reviewed-on: http://review.couchbase.org/67312
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20586: Configuration: Reduce use of C-style memory allocation 11/67311/6
Dave Rigby [Wed, 17 Aug 2016 14:00:20 +0000 (15:00 +0100)]
MB-20586: Configuration: Reduce use of C-style memory allocation

Remove C-style allocation where possible from Confiuration, replacing
with C++ owning containers.

Change-Id: I009a07064fb5c2727ab4d62f85aa9efd7f79684a
Reviewed-on: http://review.couchbase.org/67311
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20586: MemoryTracker: Use new[] instead of calloc() 10/67310/7
Dave Rigby [Thu, 18 Aug 2016 16:04:02 +0000 (17:04 +0100)]
MB-20586: MemoryTracker: Use new[] instead of calloc()

Change-Id: I44882ee3d385c3ab70787fd187105fcdeb26545e
Reviewed-on: http://review.couchbase.org/67310
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20586: Use cJSON_Free() to free allocations made by cJSON_Print 69/67069/6
Dave Rigby [Thu, 18 Aug 2016 16:11:29 +0000 (17:11 +0100)]
MB-20586: Use cJSON_Free() to free allocations made by cJSON_Print

Otherwise we can get mismatched malloc/free when using cbmalloc or
another custom allocator.

Change-Id: Ifdd2fe031cb6d6c785a77d552b966fa173de1593
Reviewed-on: http://review.couchbase.org/67069
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-20716: Ensure DCP consumers in EWOULDBLOCK are unpaused on bucket delete 75/67375/6
Dave Rigby [Tue, 6 Sep 2016 08:04:16 +0000 (09:04 +0100)]
MB-20716: Ensure DCP consumers in EWOULDBLOCK are unpaused on bucket delete

This is a follow-up to 8734958 - in addition to ensuring that DCP
producers are unpaused, also unpause DCP _consumers_.

Change-Id: I538e7bca865c4fa41240263da1c92312b3866bfa
Reviewed-on: http://review.couchbase.org/67375
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-20751: Fix lock cycle (deadlock) during bucket delete & client disconnect 52/67252/4
Dave Rigby [Thu, 1 Sep 2016 15:30:41 +0000 (16:30 +0100)]
MB-20751: Fix lock cycle (deadlock) during bucket delete & client disconnect

MB-20716 recently fixed an issue where idle DCP connections (in
EWOULDBLOCK state) would not be notified (woken up) in the frontend
when a bucket was deleted. The fix for this was to trigger a notify
(via producer->notifyPaused()) as part of ep-engine bucket delete.

Unfortunately this introduced a lock-order-inversion (deadlock)
between two mutxes, which caused memcached to hang during shutdown,
as one (or more) worker threads would never terminate.

The issue is between:

1. Frontend_worker thread mutex (threadLock)
2. ConnMap::connsLock

And at least two threads (although normally 3 in the wild):

T1: Frontend worker thread
T2: DestroyBucket thread
(optional T3: A NONIO thread running ConnManager)

During bucket delete, the follow sequence occurs which creates a cycle
between threadLock and connsLock:

    event_handler() ... conn_pending_close()
      -> LOCK(threadLock)
      -> ACQUIRE(connsLock)

    EventuallyPersistentEngine::handleDeleteBucket() ...
      -> LOCK(connsLock)
    notifyIOComplete() ... DcpProducer::notifyPaused()
      -> ACQUIRE(threadLock)

Part of the issue here is that DcpProducer::notifyPaused() *must* be
called with schedule==false, as there is no longer a ConnNotifier task
running on another thread (which never acquires the connsLock and
hence avoids any deadlock), as the ConnNotifier has been shutdown in
DcpConnMap::shutdownAllConnections previously. Therefore we need to
use the variant of notifyPaused which calls notify_IO_complete in the
same thread.

The solution chosen is to essentially drop the connsLock in
shutdownAllConnections before calling notify. We achive this by taking
a _copy_ of the connections map (under connsLock), and then iterating
over this copy and calling notify etc. This is safe as the elements of
the map are all ref-counted pointers.

Change-Id: I73f9b7576e42030a9f5219ae51e604e36fabcac7
Reviewed-on: http://review.couchbase.org/67252
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>