ep-engine.git
4 years agoMB-19635: Initialise failovers correctly from 2.5.x vbstate 55/64155/2
Jim Walker [Tue, 17 May 2016 16:41:10 +0000 (17:41 +0100)]
MB-19635: Initialise failovers correctly from 2.5.x vbstate

When loading a vb file, don't force the failover table data
to be ("[{\"id\":0,\"seq\":0}]"); if the file doesn't contain
any data.

Change-Id: I41673bf848fcbab9b616edec5c7fd2ab9a3ddd6b
Reviewed-on: http://review.couchbase.org/64155
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19503: Fix ConnMap so notifications don't go missing [2] 72/64072/3
Jim Walker [Mon, 16 May 2016 15:24:35 +0000 (16:24 +0100)]
MB-19503: Fix ConnMap so notifications don't go missing [2]

Previous patch[1] cleared the isNotificationScheduled flag
at the wrong place and meant things could then never
again get scheduled.

This is because we only cleared the flag if tp->isPaused()
yet we still pop the notification from the queue, so we
left tp->isNotificationScheduled yet the queue is empty.
Now no more notifications will ever get scheduled!

So we need to clear the notification scheduled boolean
unconditionally of the other flags on tp.

[1] - Commit 0856e0b3d3fc6

Change-Id: I11c9fd72f4b35102328022bd4c334a9e09a61cd0
Reviewed-on: http://review.couchbase.org/64072
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years agoMB-19503: Fix ConnMap so notifications don't go missing. 25/64025/2
Jim Walker [Wed, 11 May 2016 15:26:47 +0000 (16:26 +0100)]
MB-19503: Fix ConnMap so notifications don't go missing.

There's a reliance on an atomic bool and cmpxchg to
prevent the producer of notification from queueing
himself if he's already got a notification scheduled.

There's an ordering issue though where the producers code
can execute, see the flag is true and not bother queueing
a notification, yet the consumer side is about to clear the
flag and finish. The notification thus never gets queued
and the producer side thinks he will get a notification.

In my terminology:
producer is ConnMap::notifyPausedConnection
consumer is ConnMap::notifyAllPausedConnections

Change-Id: Id324b6369c5ee3a6b6758a7a93e017a4ff7c4a78
Reviewed-on: http://review.couchbase.org/64025
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call 25/62925/3 v4.1.1
Manu Dhundi [Fri, 15 Apr 2016 20:54:42 +0000 (13:54 -0700)]
MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call

For replica vbucket we must send snapshotEnd received in the last snapshotMarker
as the high seqno. Sending lastClosedChkSeqno can cause problems for view engine
which builds an index from replica vbucket.

Previously this was sent correctly in seqno stats, now adding it for
GET_ALL_VB_SEQNOS as well.

Change-Id: Ifad267521184c4976e1cb194e6814b56963298b0
Reviewed-on: http://review.couchbase.org/62925
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 98/62598/1
abhinavdangeti [Fri, 8 Apr 2016 03:03:22 +0000 (20:03 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * 771859f MB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios
| * 6811030 MB-19075: Remove printing of empty string in CouchKVStore::getMulti()

Change-Id: I06717dcaad2b764582ebe378a6f9023f7ca76d88

5 years agoMB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios 98/62498/4
abhinavdangeti [Fri, 15 Jan 2016 16:36:52 +0000 (08:36 -0800)]
MB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios

Acquire vbucket state lock only when really necessary
in the ActiveStream context. Also avoid acquiring one
lock within the other wherever possible in the ActiveStream
context again.

This change is to avert potential deadlocks due to
lock inversion that will be induced by upcoming changes,
here are the scenarios:
(i)     Locking between streamsMutex, streamMutex and
        vb_stateLock in the set operation - handle
        response scenario.
        (http://factory.couchbase.com/job/ep-engine-threadsanitizer-master/1225/console)
(ii)    In case of a set operation, vb_stateLock is
        acquired and then streamMutex is acquired for
        notification. During markDiskSnapshot, the
        streamMutex is acquired before the vb_stateLock
        lock is acquired.
        (http://factory.couchbase.com/job/ep-engine-threadsanitizer-master/1268/console)

(Already reviewed at: http://review.couchbase.org/58557)

Change-Id: I5e5a3e2cc5ba9ae17090e1a3ee4bde100d305f1c
Reviewed-on: http://review.couchbase.org/62498
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-19075: Remove printing of empty string in CouchKVStore::getMulti() 25/62325/7
Sriram Ganesan [Fri, 1 Apr 2016 23:12:10 +0000 (16:12 -0700)]
MB-19075: Remove printing of empty string in CouchKVStore::getMulti()

In case of an error in opening a file, an error message is logged.
But the string that is supposed to hold the name of the file is
not populated, thus resulting in an empty string getting printed.
Remove the string from printed as openDB already prints the name
of the file in case of an open failure.

Change-Id: Ife3aec8381ead4f2e0b84c921a3781efa39a2126
Reviewed-on: http://review.couchbase.org/62325
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-18476: Handle write failures more gracefully in the mutation log 16/61116/11
Sriram Ganesan [Tue, 8 Mar 2016 22:08:50 +0000 (14:08 -0800)]
MB-18476: Handle write failures more gracefully in the mutation log

Log and error message in case of a write failure and remove any unnecessary
asserts in that code path

Change-Id: I50b7e4de4d414e21bf00404a22863baff06c0f4f
Reviewed-on: http://review.couchbase.org/61116
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17517: Testcase for invalid max_cas in _local/vbstate 58/60858/3
Dave Rigby [Thu, 3 Mar 2016 15:03:27 +0000 (15:03 +0000)]
MB-17517: Testcase for invalid max_cas in _local/vbstate

Unit test for fe67145 - verify that if a vbucket is created with a max
CAS of -1, upon re-reading from disk it is detected and fixed.

Change-Id: Ifdf3e642589f93191786ac55c1cf02207159f657
Reviewed-on: http://review.couchbase.org/60858
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
5 years agoMB-17517: Don't create TAP_MUTATIONS with CAS of -1 01/60801/6
Dave Rigby [Wed, 2 Mar 2016 16:56:40 +0000 (16:56 +0000)]
MB-17517: Don't create TAP_MUTATIONS with CAS of -1

In TapProducer::getNextItem(), a TAP mutation is generated by asking
EPStore to get() an Item for the given key. However, if the key in
question is locked (for example, if there was a SET followed by a LOCK
operation), then get() returns an Item with a CAS of -1. This is
normally only intended for end-users (we 'hide' the CAS of locked
items so only the client which locked it, and has the correct CAS can
mutate it). However we incorrectly end up creating a TAP_MUTATION
packet with a CAS of -1; corrupting the data sent to the TAP consumer
(replica, backup, etc).

Fix by adding a new option to get() to make the CAS 'hiding' optional,
and choose to not hide it for the TapProducer.

Change-Id: I4e9f4963e77437c9dc5a9ebb482c727e8ef4beb7
Reviewed-on: http://review.couchbase.org/60801
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years ago[BP] MB-17517: Check for and disard an invalid max_cas in _local/vbstate 95/60795/3
Dave Rigby [Wed, 2 Mar 2016 16:43:41 +0000 (16:43 +0000)]
[BP] MB-17517: Check for and disard an invalid max_cas in _local/vbstate

If there is an invalid max CAS value on disk (in _local/vbstate),
discard it (resetting it to zero). Instead will need to rebuild the
max CAS value from the items we load from the file.

Change-Id: Ibe283160ab0b477b1a2a91985036269ce1d590af
Reviewed-on: http://review.couchbase.org/60795
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge "Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'"...
Chiyoung Seo [Tue, 1 Mar 2016 17:59:44 +0000 (17:59 +0000)]
Merge "Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'" into sherlock

5 years agoMB-17024: Add more logs during bucket deletion 96/60596/2
Sriram Ganesan [Sat, 27 Feb 2016 02:30:02 +0000 (18:30 -0800)]
MB-17024: Add more logs during bucket deletion

Add more logs and change certain logs to WARNING level while these
tasks are stopped during bucket deletion

Change-Id: Ia2ada9d8525cec4c3e50ed087d2d052a05663873
Reviewed-on: http://review.couchbase.org/60596
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 11/60711/1
Jim Walker [Tue, 1 Mar 2016 12:34:11 +0000 (12:34 +0000)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

* couchbase/3.0.x:
  MB-17889: No notify from streamRequest
  MB-17889: Address view_query latency regression

Change-Id: Ib1a5c25fccdfddc85e6072ec1d6831569c84128f

5 years agoMB-17889: No notify from streamRequest 52/60352/3 v3.1.4
Jim Walker [Mon, 22 Feb 2016 21:01:09 +0000 (21:01 +0000)]
MB-17889: No notify from streamRequest

The 3.1.3 (i.e. before the DCP churn) didn't ever notify from
streamRequest, so that is reverted and helps to bring
view_query latency down (view engine is constantly creating
streams).

A second tweak is to not call stream->next whilst holding
the streamMutex. This can block streamRequest again
affecting the view-engine's DCP stream requests.

Change-Id: I5b57fd7998003251fb32897f37c8a2f15f687a13
Reviewed-on: http://review.couchbase.org/60352
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17889: Address view_query latency regression 50/59750/3
Jim Walker [Wed, 10 Feb 2016 16:33:10 +0000 (16:33 +0000)]
MB-17889: Address view_query latency regression

Some of the stale=false query latency tests show a 2x increase
in the 80 and 95 percentile. This is observed when moving from
3.1.4 1835 to 1836.

The prime suspect is that the snapshot yield parameter is now
stalling the view engine's DCP data.

In the change from 1835 to 1836 this value was tuned based upon
queue lengths observed during rebalance, moving from 10 to 256.

It could be that the task now runs for longer periods blocking
DCP backfill from running, and view engine drives many backfills
due to the way it frequently closes and opens streams.

Note that DCP Backfill and the snapshot task both share the same
task type, so can block each other.

This patch moves this config value back to 10 (as it was in 1835).

The view-query latency is very difficult to relibably reproduce, observe
and tune, but there is evidence (a trend) that with this config value
at 10, the performance (view latency) is improved.

Some small scale rebalance tests (3 node cluster, swap 1 node for 1)
showed that with 10 rebalance was not adversly affected, but it's a risk.

A latency comparison is attached to the MB which hints that
the latency is better.

https://issues.couchbase.com/secure/attachment/29513/29513_benchmark.png

Change-Id: I6ecf8ff950f77638eb03e4fedaefb700cf945d54
Reviewed-on: http://review.couchbase.org/59750
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 11/60211/1
Manu Dhundi [Thu, 18 Feb 2016 20:04:11 +0000 (12:04 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

*
|\
| * 61ab952 2016-02-18 | MB-17889: DCP producer can leave an operation behind

Change-Id: Iad7e13aef50018432bedc17f776984c7a21a4cb4

5 years agoMB-17889: DCP producer can leave an operation behind 96/60196/3
Jim Walker [Thu, 18 Feb 2016 15:14:28 +0000 (15:14 +0000)]
MB-17889: DCP producer can leave an operation behind

In low-traffic setups there's a case in DcpProducer::getNextItem
where the function exits and pauses the task, yet an item was
waiting to be sent.

getNextItem() looked like

  1. setpaused=false;
  2. while(ready.pop(vbucket)) {
  3.   process(vbucket);
  4. }
  5. setpaused=true;
  6. return NULL;

The notifier

  a. if(ready.pushUnique(vbucket))
  b.   wakeupIfPaused;

If a and b execute between 4 and 5, then the producer will sleep
and not process the vbucket until the next wakeup (which maybe never).

This is not good, the first operation will have a long latency before
it can be seen on DCP. As long as it takes for the second operation.

If a and b occur between 5 and 6, that's fine, wakupIfPaused will re-wake
the producer.

a,b,5,6 is bad
5,a,6,b is ok
5,6,a,b is ok
5,a,b,6 is ok
5,6,a,b is ok
...

The fix.

getNextItem()

  0. do {
  1.   setpaused=false;
  2.   while(ready.pop(vbucket)) {
  3.     process(vbucket);
  4.   }
  5.   setpaused=true;
  6.  } while(!ready.empty());
  7. return NULL;

Now if ab occurs after 4, but before 5, it's ok as 6 will now consume
the vbucket.

5,a,b,6 is ok, as 6 will loop and consume
5,a,6,b is ok, "    "    "    "    "
6,a,b,7 is ok, paused is true (5), b will wake the task

Change-Id: Ib412a85ee10de0e2a2ca4116d0cc85bbad538da2
Reviewed-on: http://review.couchbase.org/60196
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 37/60137/1
abhinavdangeti [Wed, 17 Feb 2016 18:10:54 +0000 (10:10 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * 37d53b1 MB-18171: Break cyclic reference between ActiveStream & ChkptProcesser
| * 928ba39 MB-17766: Set maxNumAuxIO in stream_test to zero
| * 9ac7fd0 [BP] MB-17766: Fix intermittant stream_test failure

Change-Id: I5a01bbf360bd5e59a91ae8400f3de14cc50f5911

5 years agoMB-18171: Break cyclic reference between ActiveStream & ChkptProcesser 60/60060/4
abhinavdangeti [Tue, 16 Feb 2016 18:49:09 +0000 (10:49 -0800)]
MB-18171: Break cyclic reference between ActiveStream & ChkptProcesser

Removing circular dependency between ActiveStream and
ActiveStreamCheckpointProcesserTask where each holds a reference
to the other causing a memory leak during shutdown.

Also explicitly clear the queues of checkpointProcessor task upon
disconnection of the DcpProducer, so as to remove a cyclic reference
between DcpProducer, ActiveStream, and ActiveStreamCheckpointProcesserTask.

Change-Id: Ifac03a40132431476a6b5000725ce972068b47f4
Reviewed-on: http://review.couchbase.org/60060
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17766: Set maxNumAuxIO in stream_test to zero 59/60059/2
abhinavdangeti [Tue, 16 Feb 2016 19:12:53 +0000 (11:12 -0800)]
MB-17766: Set maxNumAuxIO in stream_test to zero

Setting maxNumAuxIO to zero will ensure that the producer's
ActiveStreamCheckpointProcesserTask will never run causing
unexpected results in the test context.

Change-Id: I5e7f4b18b1b72af1f99e83cadc5ee979dbcd4cae
Reviewed-on: http://review.couchbase.org/60059
Tested-by: buildbot <build@couchbase.com>
Well-Formed: Hari Kodungallur <hari.kodungallur@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[BP] MB-17766: Fix intermittant stream_test failure 58/60058/2
Dave Rigby [Tue, 16 Feb 2016 12:54:38 +0000 (12:54 +0000)]
[BP] MB-17766: Fix intermittant stream_test failure

Address two issues:

1) end sequence numbers were incorrect, which could result in not
   having any items in our cursor.
2) Don't check CheckpointMamager::registerCursor() return falue, we
   don't actually care if any other cursors are already registered for
   a given checkpoint (persistence cursor sometimes registers before
   us).

Change-Id: I1145d5fb61c0c12f019154c979afdd50b4060509
Reviewed-on: http://review.couchbase.org/60058
Tested-by: buildbot <build@couchbase.com>
Well-Formed: Hari Kodungallur <hari.kodungallur@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge branch 'couchbase/3.0.x' into 'couchbase/sherlock' 66/59866/1
abhinavdangeti [Fri, 12 Feb 2016 17:13:58 +0000 (09:13 -0800)]
Merge branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * b84d09d MB-17766: Regression test that checks for race during takeover
| * ba305c4 MB-17766: Incorrect ordering of messages during ActiveStream's takeover-send phase
| * 4f39683 MB-17766: Avoid copy overhead of std::deque in getOutstandingItems
| * e3f4855 MB-17766: Refactor nextCheckpointItemTask to allow testing

Change-Id: I57a2aa37abc4ab60f09648bd7b02740b4bf933e6

5 years agoMerge branch 'couchbase/3.0.x' into 'couchbase/sherlock' 65/59765/11
abhinavdangeti [Fri, 12 Feb 2016 16:54:56 +0000 (08:54 -0800)]
Merge branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * 0da7d42 MB-17885: Address compilation errors in ep_testsuite.cc
| * b7ee24c MB-17885: Update flow control bytesSent correctly on DCP producer

Change-Id: I70cda64395781a433a8e40720bdc5c75f5d0e3c2

5 years agoMB-17766: Regression test that checks for race during takeover 66/59666/8
Dave Rigby [Tue, 9 Feb 2016 18:21:25 +0000 (18:21 +0000)]
MB-17766: Regression test that checks for race during takeover

Module test: ep-engine_stream_test

Change-Id: I8e11722b1ed1029c8b969dcb88000c5903fbb0ca
Reviewed-on: http://review.couchbase.org/59666
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17766: Incorrect ordering of messages during ActiveStream's takeover-send phase 27/59627/7
abhinavdangeti [Tue, 9 Feb 2016 20:18:01 +0000 (12:18 -0800)]
MB-17766: Incorrect ordering of messages during ActiveStream's takeover-send phase

A race between the step() and ActiveStreamCheckpointProcessorTask
can cause mutations to be queued into the readyQ after the
setVBucketState(active) message.

Here's the scenario in chronology (T: Front-end thread, BT: IO thread):
1. T1: ActiveStream::setVBucketStateAckRecieved()
2. T1: transitionState(takeoverSend) => schedules ActiveStreamCheckpointProcessorTask
3. BT1: manageConnections() notifies memcached about specific conn (max idle time: 5s)
4. BT2: ActiveStreamCheckpointProcessorTask runs, gets all Items For Cursor
5. T1: step() -> takeoverSendPhase() -> readyQ is empty
        => nextCheckpointItem() return false, as getNumItemsForCursor returns 0
        => setVbucketState(active) added to readyQ
6. BT2: ActiveStreamCheckpointProcessorTask continues, adds mutations acquired into readyQ
        -> Note the mutations were acquired in step 4
        => Notified memcached connections
7. T1: step() .. ships messages in incorrect order

On the new master, the vbucket is promoted to active state and then more
mutations are received from the old master. If there were front end ops
at this time, there could be an inconsistency in highSeqno or in worst
cases crashes in checkpoint manager due to highSeqno not belonging in
the designated range.

The fix: Add an atomic flag that is also checked for along with
getNumItemsForCursor in nextCheckpointItem(). This flag is set before
retrieving all items for a cursor (getAllItemsForCursor) and unset after
all the retrieved items have been added to the ready queue of the stream.

Change-Id: I5c04d47cc99c7dd3b2d87cb68dd30d36473226e5
Reviewed-on: http://review.couchbase.org/59627
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17766: Avoid copy overhead of std::deque in getOutstandingItems 54/59754/2
abhinavdangeti [Wed, 10 Feb 2016 18:03:03 +0000 (10:03 -0800)]
MB-17766: Avoid copy overhead of std::deque in getOutstandingItems

Change-Id: I771182bd54a0f702f70287ff4728d26b7ffaa323
Reviewed-on: http://review.couchbase.org/59754
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17766: Refactor nextCheckpointItemTask to allow testing 64/59664/4
Dave Rigby [Tue, 9 Feb 2016 13:33:38 +0000 (13:33 +0000)]
MB-17766: Refactor nextCheckpointItemTask to allow testing

Split nextCheckpointItemTask() into two inner (protected) functions,
to allow testing of the fix for MB-17766.

Change-Id: I9d441d873cf7f727f90a966d4dda03043c7f6480
Reviewed-on: http://review.couchbase.org/59664
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17885: Address compilation errors in ep_testsuite.cc 78/59678/4
abhinavdangeti [Tue, 9 Feb 2016 22:27:10 +0000 (14:27 -0800)]
MB-17885: Address compilation errors in ep_testsuite.cc

3.0.x don't support C++11!

Change-Id: Ia3adf0c7ace9b771b999427811f0872774740386
Reviewed-on: http://review.couchbase.org/59678
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>

5 years agoMB-17517: Regenerate CAS for TAP/DCP mutations with invalid CAS 97/59597/4
Dave Rigby [Fri, 5 Feb 2016 11:47:27 +0000 (11:47 +0000)]
MB-17517: Regenerate CAS for TAP/DCP mutations with invalid CAS

When receiving mutations via TAP or DCP, validate the CAS if invalid
generate a new one.

This is instead of the simply dropping the mutaiton (returning an
error to the producer), as by dropping the mutation we could cause
data loss.

Change-Id: I3183aab7c5eecb090ccc560319a7aac915cb35b8
Reviewed-on: http://review.couchbase.org/59597
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17885: Update flow control bytesSent correctly on DCP producer 75/59575/4
Manu Dhundi [Mon, 8 Feb 2016 17:42:25 +0000 (09:42 -0800)]
MB-17885: Update flow control bytesSent correctly on DCP producer

This is a fix for a regression introduced recently. Also this adds
a DCP test case to test flow control behavior of DCP producer.

Change-Id: Ia56858cb9e687a0a045b582c18e4b68948cb460c
Reviewed-on: http://review.couchbase.org/59575
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16910: Stop logging multiple 'warmup is complete' messages 98/58898/8
Norair Khachiyan [Wed, 27 Jan 2016 23:54:30 +0000 (15:54 -0800)]
MB-16910: Stop logging multiple 'warmup is complete' messages

Fix prevents logging more than one 'Engine warmup is complete' message
for each bucket. Fix resolves problem of overloading log file with the
message mentioned above.

Change-Id: Icfdeb5def6e3f055159e1a57430c6dc661382e30
Reviewed-on: http://review.couchbase.org/58898
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 46/59146/1
abhinavdangeti [Wed, 27 Jan 2016 22:40:20 +0000 (14:40 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * ae39d7f MB-17502: DCP performance regression fixed.

Change-Id: Id3262008479ffe7e350a9a4d88438ed541ac242a

5 years agoMB-17502: DCP performance regression fixed. 86/58886/9
Jim Walker [Wed, 6 Jan 2016 15:40:57 +0000 (15:40 +0000)]
MB-17502: DCP performance regression fixed.

Many patches were added to speed up DCP, however some of that
performance was lost when doing some code tidying without
re-profiling.

With all the DCP performance patches (particuarly) 87869fd39 straight
DCP performance is a touch slower. This is because DCP used to take
one lock and then do work. The new code has more locks, but holds
them for fewer lines of code. This means that DCP is friendlier/fairer
to the other threads interacting with a DCP producer.
The front-end operation threads are no longer stalled for long periods
whilst DCP holds the one lock.

Frontend latency before locking changes:

 === Latency [With background DCP] - 100000 items
                                 Percentile
                   Median     95th     99th  Std Dev
 Add               16.337   34.894   45.241   25.627
 Get                1.226    1.524    1.745    0.435
 Replace           16.311   34.386   42.097    8.435
 Delete            15.636   32.915   41.999    7.408

Frontend latency after locking changes:

 === Latency [With background DCP] - 100000 items
                                 Percentile
                   Median     95th     99th  Std Dev
 Add                3.996   12.159   20.724   11.376
 Get                1.299    1.629    1.730    0.634
 Replace            4.274   12.831   22.988    4.523
 Delete             3.142   10.302   14.292    3.350

The average and 95th/99th are all improved.

Fix details:

The roundRobin/vbReady code has a bufferLog.pauseIfFull call on the
"hot" part of the loop, this is the main cause of the regression.

With that fixed CPU profiling and benchmarking shows that DCP is back
to 3.1.3 levels but highlighted that:

1. DcpProducer::getNextItem was hot (5% of a DcpProducer thread).
2. DcpConsumer::processBufferedItems was hitting SpinLock hard.
   20 to 30% at times was consumed by SpinLock code.
3. snapshot creation was frequently yielding even though it had work todo.

So to address 1. the fix is actually to remove the roundRobin/vbReady
code. It is actually no better and in some cases a little slower than
the orginal. This code is replaces with std:: structures *but* the
Mutex used has a much smaller scope.

Note the DcpProducerReadyQueue has been profiled and proven that having
the std::map powering find() is much faster than searching the list.
This is important because the find method is part of the front-end
operation thread.

To address 2. it was observed that the consumer code is constructing
a passive_stream_t frequently, then testing if there is a pointer.
The construction uses the SpinLock code and can be avoided just by
testing the streams[vb] directly and only then do we construct
a copy of the passive_stream_t. This avoids the SpinLock code on
every iteration of the for loop in the affected function.

To address 3. ensure that the snapshot tasks work queue doesn't have
duplicates, there's no need. Then raise the number of snapshots before
yield. Various rebalances showed that around 250 was enough, so let's go
with 256.

Change-Id: I8fb0bd30f8e07d000192675de425726ad26e403a
Reviewed-on: http://review.couchbase.org/58886
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17517: check for invalid CAS value in [set/delete]WithMeta 10/58910/3
Sriram Ganesan [Thu, 21 Jan 2016 20:00:36 +0000 (12:00 -0800)]
MB-17517: check for invalid CAS value in [set/delete]WithMeta

In a [set/delete]WithMeta call from either XDCR or from DCP,
a corrupt CAS value for the incoming item should return an
error

Change-Id: Id87627ae35ef711de4704a960b3aa7d1e9b9a48b
Reviewed-on: http://review.couchbase.org/58910
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17517: return EINVAL instead of assert in arithmetic 68/58868/3
Sriram Ganesan [Thu, 21 Jan 2016 02:15:56 +0000 (18:15 -0800)]
MB-17517: return EINVAL instead of assert in arithmetic

If a get performed on an item returns a CAS value of zero, then
return EINVAL as opposed to asserting

Change-Id: If3d43c270bcc627029d0954dab0e570c83ddca74
Reviewed-on: http://review.couchbase.org/58868
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 36/58536/1
Jim Walker [Tue, 12 Jan 2016 14:18:50 +0000 (14:18 +0000)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

* couchbase/3.0.x:
  MB-17086: Fix to performance regression.
  MB-16632: Use a background task to handle snapshot creation

Change-Id: Ib018040e29c699316b5519a1325fff1476b803eb

5 years agoMB-17289: Update existing temp item CAS to incoming CAS in deleteWithMeta 85/58385/9
Sriram Ganesan [Thu, 7 Jan 2016 18:38:38 +0000 (10:38 -0800)]
MB-17289: Update existing temp item CAS to incoming CAS in deleteWithMeta

If the incoming CAS is non-zero, a comparison is made with an
existing item CAS to check if the delete can be done or not.
But in case a temp existing item has to be persisted, then
update the existing temp item CAS to the incoming CAS in order
to bypass the above-mentioned comparison only for forced
deletions

Change-Id: Idca2615a7e3166e98b26c9d447b1161742226436
Reviewed-on: http://review.couchbase.org/58385
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17086: Fix to performance regression. 80/58380/4
Jim Walker [Fri, 8 Jan 2016 14:02:43 +0000 (14:02 +0000)]
MB-17086: Fix to performance regression.

Revert "MB-16632: As part of queueDirty schedule a DCP connections notifier task"

This reverts commit fa17728e7ca0c637c84a2208b5decfe7ba7e54f1.

Performance testing showed that a regression has been introduced and that
fa17728 was the cause.

The regression was introduced by some fixes made during review that weren't
re-profiled.

Performance can be improved by making some further changes but the investigation
revealed that performance is actually at its best without fa17728.

Change-Id: I7ac3ff49d0b9ce8563f3a932dd337a58d03a0153
Reviewed-on: http://review.couchbase.org/58380
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16632: Use a background task to handle snapshot creation 48/57148/17
Jim Walker [Thu, 12 Nov 2015 11:14:11 +0000 (11:14 +0000)]
MB-16632: Use a background task to handle snapshot creation

Frontend threads are delayed by large snaphots due to the time taken
in processing the items into the readyQ.

Moving this work to a background task frees frontend threads to
do other work.

Change-Id: Ic399ef06be996b7b7e179c4c8934a0f5a74cb8f7
Reviewed-on: http://review.couchbase.org/57148
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17231: conditionally delete temp items in get API 94/58294/4
Sriram Ganesan [Tue, 5 Jan 2016 23:26:59 +0000 (15:26 -0800)]
MB-17231: conditionally delete temp items in get API

There are some conditions in which there might be a need to retain
a temporary non-existent item. For example, in the context of an
arithmetic API call, a get call in full eviction results in a
temp non-existent item that needs to be retained in order for a
subsequent ADD operation to not perform another background fetch.

Change-Id: I357d28a91b1e7ce56006accf919e48e99a643a2e
Reviewed-on: http://review.couchbase.org/58294
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoLog DCP stream creation before closing the stream 15/56615/6
Mike Wiederhold [Mon, 2 Nov 2015 23:27:48 +0000 (15:27 -0800)]
Log DCP stream creation before closing the stream

Th endStream() function emits a log message saying the stream was
closed. This log message should always come after the one that says
the stream was created.

Change-Id: Ibcd259373f5675cf49ea7a95ba3f2276507abe7a
Reviewed-on: http://review.couchbase.org/56615
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 47/58247/2
abhinavdangeti [Tue, 5 Jan 2016 02:20:06 +0000 (18:20 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * e338e1b MB-17220: [BP] Seperate logs for notifying seqno/checkpoint persistence
| * 02064cd MB-17168: Log lastSentSeqno during takeover state change
| * 380307d MB-16656: Set the open chkpt id on replica to 0 when disk snapshot is recvd.
| * 6f869d1 MB-16656: Stream a full (disk+mem) snapshot from DCP producer on replica vb

Change-Id: I92464ddfa7a20e1db12d2ab432b4072f6f57c8d8

5 years agoMB-17220: [BP] Seperate logs for notifying seqno/checkpoint persistence 33/58233/3
abhinavdangeti [Tue, 1 Dec 2015 19:08:41 +0000 (11:08 -0800)]
MB-17220: [BP] Seperate logs for notifying seqno/checkpoint persistence

- Print different logs while notifying completion or timeouts
during seqno persistence and checkpoint persistence.
- Also adding additional information to the logs.

Change-Id: Idf29cab2197f37b180b0295b19f6b46542bdc6b6
Reviewed-on: http://review.couchbase.org/58233
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-17088: Fix the underflow in certain vb stats. 67/58067/3
Manu Dhundi [Mon, 4 Jan 2016 19:43:32 +0000 (11:43 -0800)]
MB-17088: Fix the underflow in certain vb stats.

Fix the potential underflow bug in dirtyQueueMem, dirtyQueueAge and
dirtyQueuePendingWrites stats.

Change-Id: If5df08e709404399b4e4595f150f5c4616897278
Reviewed-on: http://review.couchbase.org/58067
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-17168: Log lastSentSeqno during takeover state change 17/58217/3
abhinavdangeti [Mon, 4 Jan 2016 17:43:01 +0000 (09:43 -0800)]
MB-17168: Log lastSentSeqno during takeover state change

When an active vbucket state is changed to dead as part
of takeover, log a message that would indicate the last
sent seqno for the vbucket on the particular stream and
the vbucket's high seqno.

Change-Id: I7097b79cf41b2c62688ddb9345bc529ac08b2223
Reviewed-on: http://review.couchbase.org/58217
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16656: Set the open chkpt id on replica to 0 when disk snapshot is recvd. 34/57834/4
Manu Dhundi [Sat, 19 Dec 2015 02:15:16 +0000 (18:15 -0800)]
MB-16656: Set the open chkpt id on replica to 0 when disk snapshot is recvd.

Currently due to a bug in 3.0.x the open checkpoint id is not set to 0
when replica receives a disk snapshot from active.

Change-Id: Iffda89b8da713539a52d50aa4acc33458ae7150e
Reviewed-on: http://review.couchbase.org/57834
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-16656: Stream a full (disk+mem) snapshot from DCP producer on replica vb 16/57816/7
Manu Dhundi [Mon, 21 Dec 2015 21:42:07 +0000 (13:42 -0800)]
MB-16656: Stream a full (disk+mem) snapshot from DCP producer on replica vb

A replica vbucket receives items from an active vbucket, and until a full
snapshot is received the data on the replica vbucket is not consistent due
to de-duplication and other reasons. Hence while streaming items to a DCP
client from a replica vbucket we need to combine backfill and in memory
snapshots and send items in one snapshot. A caveat here is the replica vb
might not have received all the items in the latest (memory) snapshot, so the
DCP client streaming from replica will have to wait till the replica gets
all the items in the latest snapshot from the active.

Change-Id: I4db622f967316d120506dc9b125211578194bb60
Reviewed-on: http://review.couchbase.org/57816
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 20/58020/1
abhinavdangeti [Mon, 21 Dec 2015 23:34:48 +0000 (15:34 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
* 9b12d0f [DcpStream] Removing extra exception/abort that was added recently

Change-Id: Ied21f9fb093c403cbcef44fcc6d2c9229030a2ad

5 years ago[DcpStream] Removing extra exception/abort that was added recently 62/57962/3
abhinavdangeti [Mon, 21 Dec 2015 20:03:40 +0000 (12:03 -0800)]
[DcpStream] Removing extra exception/abort that was added recently

Exceptions in 3.0.x are unhandled which makes them pretty
much the same as aborts/asserts.

Although it is impossible for the event where an active stream
enters STREAM_READING state to occur , it may be the better
thing to do - to have the risk of hitting this assertion be ZERO
for the maintainance releases only.

Change-Id: I0a1eff5ab6c8cec8ad6d97e9a1c2201844c25fbd
Reviewed-on: http://review.couchbase.org/57962
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoLog destruction of Passive stream if it were live 61/57861/3
abhinavdangeti [Wed, 16 Dec 2015 17:26:09 +0000 (09:26 -0800)]
Log destruction of Passive stream if it were live

Transition state of passive stream to dead in its
destructor, and log a message if we're destroying
a live stream.

Change-Id: I39651ee022a321048409345b9d987dc7c15003cf
Reviewed-on: http://review.couchbase.org/57861
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 61/57961/1
abhinavdangeti [Fri, 18 Dec 2015 17:22:45 +0000 (09:22 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
* 5ffc61f MB-17051: [DcpProducer] Ensure no un-notified streams are left behind
* 5107095 [DcpProducer] Refactor function name to indicate intent

Change-Id: I9c6d50e478846aab1719575e97efe3106cbe8425

5 years agoMB-17051: [DcpProducer] Ensure no un-notified streams are left behind 67/57867/9
abhinavdangeti [Thu, 17 Dec 2015 18:57:56 +0000 (10:57 -0800)]
MB-17051: [DcpProducer] Ensure no un-notified streams are left behind

Reiterate vbReady list at the end of a DcpProducer step to
ensure un-notified vbuckets are not left unprocessed.

Change-Id: I21065cf99f8be0af6dedf506237ce3dbe683387d
Reviewed-on: http://review.couchbase.org/57867
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[DcpProducer] Refactor function name to indicate intent 72/57872/4
abhinavdangeti [Thu, 17 Dec 2015 03:56:44 +0000 (19:56 -0800)]
[DcpProducer] Refactor function name to indicate intent

unpauseIfSpace --> unpauseIfSpaceAvailable

Change-Id: Ifb4ec181e2228d819ab460bd03eccfefd75c48d6
Reviewed-on: http://review.couchbase.org/57872
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16949: BackfillManager: Fix leak due to circular reference 17/57417/5
Dave Rigby [Wed, 2 Dec 2015 14:55:32 +0000 (14:55 +0000)]
MB-16949: BackfillManager: Fix leak due to circular reference

There is a circular reference between BackfillManager and DcpProducer,
which means that BackfillManager is never deleted. This is caused by
BackfillManager holding an (unused) SingleThreadedRCPtr to
DcpProducer.

Fix this by Removing this unused smart pointer. Unfortunately exposes
a latent deadlock when destructing BackfillManager - ~BackfillManager
(indirectly) calls ~ActiveStream, which in turn calls
BackfillManager::bytesSent() which attempts to acquire the Backfill
lock; however the lock is already acquired in ~BackfillManager.

Fix this by removing the call to bytesSent() in ~ActiveStream - as
it's already been updated in other methods.

The now correct deletion of DcpProducer and the smart-pointer
triggered deletion of ActiveStream leads to a lock-inversion
problem in that different locks are being acquired in different
orders on the possible deletion paths.

Fix this by removing the lock acquisition from destructors.
The locks in destructors are providing no protection, no
code should be refering to the deleting object, they risk
accessing freed memory with or without the locks.

Change-Id: Id79fbf8bc39463d4874c2f61378b7f110345631b
Reviewed-on: http://review.couchbase.org/57417
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 33/57733/1
abhinavdangeti [Fri, 11 Dec 2015 19:36:51 +0000 (11:36 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
* c8ce24b Fix heap-use-after-free issue detected by thread sanitizer

Change-Id: I00298e6a311e87d11fb8dd15adace2ddbfdfcc3f

5 years agoFix heap-use-after-free issue detected by thread sanitizer 99/57699/5
abhinavdangeti [Thu, 10 Dec 2015 22:23:44 +0000 (14:23 -0800)]
Fix heap-use-after-free issue detected by thread sanitizer

No need to stop Producer Notififer in the destructor of
dcpConnMap. This is already taken care of when the executor
pool is unregistered.

WARNING: ThreadSanitizer: heap-use-after-free (pid=158780)
  Read of size 8 at 0x7d180000c1a0 by main thread:
    #0 DcpConnMap::~DcpConnMap() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/tasks.h:103 (ep.so+0x0000000453e1)
    #1 DcpConnMap::~DcpConnMap() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/connmap.cc:954 (ep.so+0x0000000456f5)
    #2 EventuallyPersistentEngine::~EventuallyPersistentEngine() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep_engine.cc:6410 (ep.so+0x0000000d0e5c)
    #3 EvpDestroy(engine_interface*, bool) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep_engine.cc:147 (ep.so+0x0000000b27f7)
    #4 mock_destroy(engine_interface*, bool) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/memcached/programs/engine_testapp/engine_testapp.cc:99 (engine_testapp+0x0000004cbd97)
    #5 destroy_bucket(engine_interface*, engine_interface_v1*, bool) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/memcached/programs/engine_testapp/engine_testapp.cc:996 (engine_testapp+0x0000004cbc19)
    #6 perf_latency_baseline_multi_thread_bucket(test*, int, int, int) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/tests/ep_perfsuite.cc:386 (ep_perfsuite.so+0x00000000dfc4)
    #7 perf_latency_baseline_multi_bucket_4(test*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/tests/ep_perfsuite.cc:429 (ep_perfsuite.so+0x0000000091ef)
    #8 execute_test(test, char const*, char const*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/memcached/programs/engine_testapp/engine_testapp.cc:1104 (engine_testapp+0x0000004cb21c)
    #9 __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226 (libc.so.6+0x00000002176c)

  Previous write of size 8 at 0x7d180000c1a0 by thread T15 (mutexes: write M11751):
    #0 operator delete(void*) <null> (engine_testapp+0x0000004641db)
    #1 DcpConnMap::DcpProducerNotifier::~DcpProducerNotifier() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/connmap.h:530 (ep.so+0x00000004ab85)
    #2 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/atomic.h:325 (ep.so+0x0000000f17cb)
    #3 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f15f5)
    #4 platform_thread_wrap(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x000000004e7b)

Change-Id: Ib458d0826cc33b4b233da5a422b90bcf08d408bb
Reviewed-on: http://review.couchbase.org/57699
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 26/57626/4
Dave Rigby [Thu, 10 Dec 2015 09:02:23 +0000 (09:02 +0000)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

* couchbase/3.0.x:
  MB-17006: [BP] DCP Producer could miss fetching items from a stream
  Fix test case test_dcp_early_termination
  Fix compilation issue on windows
  [BP] MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream.
  [BP] MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr
  [BP] MB-16915: Use refcounted pointers on producer/consumer
  MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream.
  MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr
  MB-16632: As part of queueDirty schedule a DCP connections notifier task

Change-Id: Iaf2ae12b97c92fd99b3b967cc47f8654b44dfc28

5 years agoMerge branch 'couchbase/3.0.x' into couchbase/sherlock 25/57625/2
Dave Rigby [Thu, 10 Dec 2015 08:48:03 +0000 (08:48 +0000)]
Merge branch 'couchbase/3.0.x' into couchbase/sherlock

* 3.0.x:
  MB-16915: Use refcounted pointers on producer/consumer
  MB-16632: Replace std::list with std:deque in DCP checkpoint processing

Change-Id: Ic754e821b486361640a5f409d589c0e61cf93847

5 years agoMB-17006: [BP] DCP Producer could miss fetching items from a stream 41/57641/5
abhinavdangeti [Mon, 7 Dec 2015 21:02:55 +0000 (13:02 -0800)]
MB-17006: [BP] DCP Producer could miss fetching items from a stream

Here's the scenario:
1. Stream currently in backfill phase
2. When backfill is received, 1 item added to readyQ
    a. itemsReady of stream set to true (Producer notified)
3. Front end op comes in
    a. item added to checkpoint queue
    b. itemsReady not set to true, as it already is (Producer not
      notified)
4. Producer calls stream::next()
    a. stream in backfillPhase(): 1 item popped from readyQ
    b. backfill task still running => no state transition to IN_MEMORY
    c. 1 op returned to producer, producer re-adds vbucket to ready list
5. Backfill completes
6. Producer calls stream::next()
    a. stream in backfillPhase(): no items in readyQ
    b. As backfill task has completed, state transitions to IN_MEMORY
    c. no items in readyQ => NULL returned
    d. As no op obtained, producer doesn't re-add vbucket to ready list

=> Front end item remains stuck in checkpoint queue, until more front
end ops come in - which would notify the producer

The proposed fix here is: In step 6b, when the producer sees the
backfill task has completed, and the state for the stream transitions
to IN_MEMORY, move checkpoint items into readyQ. This way the readyQ
will not be empty, and the producer would re-add the vbucket back into
the ready list.

Change-Id: I3403d3926f97788074990ef0e4c69cac902b2a93
Reviewed-on: http://review.couchbase.org/57516
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Change-Id: I0821402fefd01f0851572d7c22ccee5fc065778d
Reviewed-on: http://review.couchbase.org/57641
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoFix test case test_dcp_early_termination 42/57642/5
abhinavdangeti [Wed, 9 Dec 2015 19:42:43 +0000 (11:42 -0800)]
Fix test case test_dcp_early_termination

Account for tasks that are already in the future queue of the
auxIO dispatcher to ensure all DCP backfill tasks (auxIO) have
completed.

Change-Id: I9544a79436193f3ef42b08a2b6615eb4be4792ce
Reviewed-on: http://review.couchbase.org/57642
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge branch 'couchbase/3.0.x' into couchbase/sherlock 24/57624/2
Dave Rigby [Wed, 9 Dec 2015 11:19:26 +0000 (11:19 +0000)]
Merge branch 'couchbase/3.0.x' into couchbase/sherlock

* 3.0.x:
  MB-16632: Reducing locking contention in DCP-Producer/Stream
  Incorrect log paramenters while logging backfill completion

Change-Id: I29a3aa4b0a2a8d24b9fe94a566354da9a4304f5c

5 years agoMerge remote-tracking branch 'couchbase/3.1.3' into 'couchbase/3.0.x' 19/57619/1
Dave Rigby [Wed, 9 Dec 2015 10:12:25 +0000 (10:12 +0000)]
Merge remote-tracking branch 'couchbase/3.1.3' into 'couchbase/3.0.x'

* couchbase/3.1.3:
  Fix compilation issue on windows
  [BP] MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream.
  [BP] MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr
  [BP] MB-16915: Use refcounted pointers on producer/consumer

Change-Id: Ied8b262fef0eb06671277524e17f0b6cbf7acbeb

5 years agoFix compilation issue on windows 54/57454/4 3.1.3 v3.1.3
abhinavdangeti [Fri, 4 Dec 2015 00:15:47 +0000 (16:15 -0800)]
Fix compilation issue on windows

<http://factory.couchbase.com/job/win_cs_build/ws/couchbase\ep-engine\test
s\ep_testsuite.cc(4958)> : error C2782: 'void checkeqfn(T,T,const char
        ,const char ,const int)' : template parameter 'T' is ambiguous

<http://factory.couchbase.com/job/win_cs_build/ws/couchbase\ep-engine\test
s\ep_testsuite.cc(74)> : see declaration of 'checkeqfn'
could be 'unsigned __int64'
or       'unsigned long'
NMAKE : fatal error U1077:
'C:\PROGRA~2\MICROS~2.0\VC\bin\amd64\cl.exe' :
return code '0x2'

Change-Id: I9a0bf5bd74276ebe9ac6a709302704a2bab06c25
Reviewed-on: http://review.couchbase.org/57454
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago[BP] MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream. 49/57449/2
Manu Dhundi [Thu, 3 Dec 2015 21:32:35 +0000 (13:32 -0800)]
[BP] MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream.

DcpConsumer holds a reference to PassiveStream and vice versa. We must
make sure that one of them (DcpConsumer here) releases the reference
to another in a function other than the object destructor.

Change-Id: I8e5c262bc5ac50342f85ba80d481987a26a7a21d
Reviewed-on: http://review.couchbase.org/57429
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/57449
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[BP] MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr 48/57448/2
Manu Dhundi [Thu, 3 Dec 2015 01:19:51 +0000 (17:19 -0800)]
[BP] MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr

Rollback task is spawned when a DCP consumer is asked to rollback by a DCP
producer. Rollback runs in background and there is a possibility that the DCP
consumer object gets deleted before rollback task completes. We can avoid this
if RollbackTask holds a ref counted ptr of DCP consumer instead of a raw ptr.

Change-Id: I00c1bced0ec445226e64e6f7647a3bfbfb063f94
Reviewed-on: http://review.couchbase.org/57427
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/57448
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[BP] MB-16915: Use refcounted pointers on producer/consumer 47/57447/2
Jim Walker [Mon, 30 Nov 2015 13:31:59 +0000 (13:31 +0000)]
[BP] MB-16915: Use refcounted pointers on producer/consumer

Prevents a race/crash occuring when the DcpProducer is destroyed
and there are backfill tasks running/pending.

The test case reveals the probem when run under valgrind as
a series of invalid reads of freed memory. E.g.

==40673== Thread 17:
==40673== Invalid read of size 8
==40673==    at 0x71A3CEE: DCPBackfill::run() (dcp-stream.cc:175)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)
==40673==  Address 0x64c2380 is 48 bytes inside a block of size 384 free'd
==40673==    at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==40673==    by 0x718C4ED: DcpConnMap::manageConnections() (atomic.h:430)
==40673==    by 0x71906A5: ConnManager::run() (connmap.cc:151)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)

Change-Id: I32a7dfd10daa4565b9cbb4c8142ed8f71c13ca31
Reviewed-on: http://review.couchbase.org/57296
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/57447
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16915: Remove cyclic reference between DcpConsumer and PassiveStream. 29/57429/5
Manu Dhundi [Thu, 3 Dec 2015 21:32:35 +0000 (13:32 -0800)]
MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream.

DcpConsumer holds a reference to PassiveStream and vice versa. We must
make sure that one of them (DcpConsumer here) releases the reference
to another in a function other than the object destructor.

Change-Id: I8e5c262bc5ac50342f85ba80d481987a26a7a21d
Reviewed-on: http://review.couchbase.org/57429
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr 27/57427/3
Manu Dhundi [Thu, 3 Dec 2015 01:19:51 +0000 (17:19 -0800)]
MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr

Rollback task is spawned when a DCP consumer is asked to rollback by a DCP
producer. Rollback runs in background and there is a possibility that the DCP
consumer object gets deleted before rollback task completes. We can avoid this
if RollbackTask holds a ref counted ptr of DCP consumer instead of a raw ptr.

Change-Id: I00c1bced0ec445226e64e6f7647a3bfbfb063f94
Reviewed-on: http://review.couchbase.org/57427
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-16632: As part of queueDirty schedule a DCP connections notifier task 01/56301/28
abhinavdangeti [Thu, 22 Oct 2015 21:54:28 +0000 (14:54 -0700)]
MB-16632: As part of queueDirty schedule a DCP connections notifier task

This is how things are done for TAP.
This pretty much removed the notifications' lock overhead on
store/delete/(front-end) OP latencies.

Change-Id: I32c3c26daf6ea8cebeecc2a81fb1f0e957ba3e3d
Reviewed-on: http://review.couchbase.org/56301
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16915: Use refcounted pointers on producer/consumer 96/57296/11
Jim Walker [Mon, 30 Nov 2015 13:31:59 +0000 (13:31 +0000)]
MB-16915: Use refcounted pointers on producer/consumer

Prevents a race/crash occuring when the DcpProducer is destroyed
and there are backfill tasks running/pending.

The test case reveals the probem when run under valgrind as
a series of invalid reads of freed memory. E.g.

==40673== Thread 17:
==40673== Invalid read of size 8
==40673==    at 0x71A3CEE: DCPBackfill::run() (dcp-stream.cc:175)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)
==40673==  Address 0x64c2380 is 48 bytes inside a block of size 384 free'd
==40673==    at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==40673==    by 0x718C4ED: DcpConnMap::manageConnections() (atomic.h:430)
==40673==    by 0x71906A5: ConnManager::run() (connmap.cc:151)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)

Change-Id: I32a7dfd10daa4565b9cbb4c8142ed8f71c13ca31
Reviewed-on: http://review.couchbase.org/57296
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16915: Use refcounted pointers on producer/consumer 77/57377/2
Jim Walker [Mon, 30 Nov 2015 13:31:59 +0000 (13:31 +0000)]
MB-16915: Use refcounted pointers on producer/consumer

Prevents a race/crash occuring when the DcpProducer is destroyed
and there are backfill tasks running/pending.

The test case reveals the probem when run under valgrind as
a series of invalid reads of freed memory. E.g.

==40673== Thread 17:
==40673== Invalid read of size 8
==40673==    at 0x71A3CEE: DCPBackfill::run() (dcp-stream.cc:175)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)
==40673==  Address 0x64c2380 is 48 bytes inside a block of size 384 free'd
==40673==    at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==40673==    by 0x718C4ED: DcpConnMap::manageConnections() (atomic.h:430)
==40673==    by 0x71906A5: ConnManager::run() (connmap.cc:151)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)

Change-Id: I32a7dfd10daa4565b9cbb4c8142ed8f71c13ca31
Reviewed-on: http://review.couchbase.org/57377
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16632: Replace std::list with std:deque in DCP checkpoint processing 47/57147/8
Jim Walker [Wed, 4 Nov 2015 12:17:57 +0000 (12:17 +0000)]
MB-16632: Replace std::list with std:deque in DCP checkpoint processing

The algorithm does not need a std::list when it is implementing
nothing more than an queue.

This change brings some performance improvement to snaphot marker
creation.

Change-Id: I2f1ac82364737e9f56ff9c0c11b3cc1775b3f0d2
Reviewed-on: http://review.couchbase.org/57147
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-16632: Reducing locking contention in DCP-Producer/Stream 00/56300/15
abhinavdangeti [Thu, 22 Oct 2015 22:36:12 +0000 (15:36 -0700)]
MB-16632: Reducing locking contention in DCP-Producer/Stream

- Adding a new RWLock for streams in Producer and avoid queueLock
- Improving BufferLog and remove need for queueLock on access
- Adding an array of atomic bool for lockless vbucket ready notification
- Changing some ActiveStream variables to be atomic to allow for lockless
  updates.

Change-Id: I11c54f1058c4c8a3f013dfc858a39d17362c9531
Reviewed-on: http://review.couchbase.org/56300
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoIncorrect log paramenters while logging backfill completion 17/56417/3
abhinavdangeti [Mon, 23 Nov 2015 17:05:01 +0000 (09:05 -0800)]
Incorrect log paramenters while logging backfill completion

Change-Id: I877fd7067862f09801ffd16e7014a0c952e8c559
Reviewed-on: http://review.couchbase.org/56417
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' 39/57139/1 4.1.0-MP1 v4.1.0
Manu Dhundi [Wed, 18 Nov 2015 01:45:17 +0000 (17:45 -0800)]
Merge remote-tracking branch 'couchbase/3.0.x'

couchbase/3.0.x:
  MB-16836: Reset the stat 'ep_bg_fetched' to 0 on 'cbstats reset' command

Change-Id: I0ad9b7fdcd4b37cbffa38361ec33ccc4f3eec78b

5 years agoMB-16836: Reset the stat 'ep_bg_fetched' to 0 on 'cbstats reset' command 29/57129/2
Manu Dhundi [Tue, 17 Nov 2015 22:55:18 +0000 (14:55 -0800)]
MB-16836: Reset the stat 'ep_bg_fetched' to 0 on 'cbstats reset' command

Change-Id: I444bd6c76265788d4366061d4d2b25e3c5e60518
Reviewed-on: http://review.couchbase.org/57129
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-16357: Create a variable to get correct locking scope 77/56977/2
Jim Walker [Thu, 12 Nov 2015 12:21:41 +0000 (12:21 +0000)]
MB-16357: Create a variable to get correct locking scope

A mistake in 495e00acc24 means that no variable is
created for the ReaderLockHolder, the compiler either
optimises away the lock constructor/destructor or the lock
scope is wrong.

Either way we need to create a variable.

Change-Id: I156fcf6b0df7aee99b2c33c0e6e80d3c6ef7e546
Reviewed-on: http://review.couchbase.org/56977
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
5 years agoMB-16357: Create a variable to get correct locking scope 76/56976/2
Jim Walker [Thu, 12 Nov 2015 12:04:56 +0000 (12:04 +0000)]
MB-16357: Create a variable to get correct locking scope

A mistake in 495e00acc24 means that no variable is
created for the ReaderLockHolder, the compiler either
optimises away the lock constructor/destructor or the lock
scope is wrong.

Either way we need to create a variable.

Change-Id: I642ac64d71b73d3d78207ff50d33539a06ce0e7e
Reviewed-on: http://review.couchbase.org/56976
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 68/56568/1
abhinavdangeti [Fri, 30 Oct 2015 19:01:06 +0000 (12:01 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
* 8939f83 MB-16686: Remove sanity check while adding TAP over DCP
* 0f1ff18 MB-15171: [BP] Initialize dcpConnMap_ to NULL in engine constructor
* 44bd072 MB-14825: [BP] While trying to stream next checkpoint item, check if vbucket is valid

Change-Id: I9ec7c046aa16aeeb0999438ef0186109828f3044

5 years agoMB-16686: Remove sanity check while adding TAP over DCP 64/56564/3 v3.1.2
abhinavdangeti [Fri, 30 Oct 2015 17:11:46 +0000 (10:11 -0700)]
MB-16686: Remove sanity check while adding TAP over DCP

This check isn't accurate as certain TAP messages from
the producer carry no vbucket information - initialized to
zero (expected), as they aren't vbucket specific operations.
In such a scenario, if the TAP consumer needs to be created,
it wouldn't be allowed to if a DCP passive stream exists
for vbucket 0. This would break an online upgrade.

Change-Id: I310b9cf4dbaf652c233cba02de7ca72469efa89d
Reviewed-on: http://review.couchbase.org/56564
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16574: Log when cluster config is changed 19/56219/2
Dave Rigby [Mon, 19 Oct 2015 11:41:49 +0000 (12:41 +0100)]
MB-16574: Log when cluster config is changed

Change-Id: I0b234e6330ab3e5c0cde84c14977aff8034000d6
Reviewed-on: http://review.couchbase.org/56219
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15171: [BP] Initialize dcpConnMap_ to NULL in engine constructor 71/56171/3
Sriram Ganesan [Thu, 28 May 2015 01:51:23 +0000 (18:51 -0700)]
MB-15171: [BP] Initialize dcpConnMap_ to NULL in engine constructor

Not initializing this variable to NULL can cause access to an
invalid pointer during engine destroy.

Change-Id: Icc5d848f7826bb6331deb40b4832efcf64622dea
Reviewed-on: http://review.couchbase.org/51492
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/56171

5 years agoMB-14825: [BP] While trying to stream next checkpoint item, check if vbucket is valid 70/56170/3
Manu Dhundi [Thu, 7 May 2015 01:30:24 +0000 (18:30 -0700)]
MB-14825: [BP] While trying to stream next checkpoint item, check if vbucket is valid

If a vbucket is deleted in middle of a DCP connection streaming a checkpoint
item, we should handle such a scenario in a graceful manner.

Change-Id: I24fe52adc572f504f492f015f82fc8d5e0325925
Reviewed-on: http://review.couchbase.org/50674
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56170
Tested-by: buildbot <build@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 24/56124/1
abhinavdangeti [Wed, 14 Oct 2015 18:29:15 +0000 (11:29 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

couchbase/3.0.x:
* dec472a MB-16500 [BP]: Address data race in DcpConsumer, by acquiring readyMutex
* 6abf179 MB-16500 [BP]: Removing unnecessary locking in consumer code
* d388995 MB-16500 [BP]: MB-16496 Fix the race on vbucket state between persistVBState() and compactVB()
* 8828d4d MB-16500 [BP]: Address possible data race in checkpoint remover
* 52e40ed MB-16500 [BP]: Address possible data race in item/expiry pagers
* 5305a71 MB-16500 [BP]: Fix potenial deadlock around Connmap::relaseLock / connLock
* b155309 MB-16500 [BP]: Address possible data race in Notifiable: setSuspended
* 3b8a7be MB-16500 [BP]: Address possible data race in ConnHandler: lastWalkTime

Change-Id: Ie72bf67c10b0bed125a89534815474b1309763bb

5 years agoMB-16500 [BP]: Address data race in DcpConsumer, by acquiring readyMutex 65/56065/4
abhinavdangeti [Wed, 7 Oct 2015 21:49:41 +0000 (14:49 -0700)]
MB-16500 [BP]: Address data race in DcpConsumer, by acquiring readyMutex

WARNING: ThreadSanitizer: data race (pid=27652)

  Write of size 8 at 0x7d08000443c0 by main thread (mutexes: write M57876):
    #0 operator delete(void*) <null>:0 (engine_testapp+0x000000050e7b)
    #1 __gnu_cxx::new_allocator<std::_List_node<unsigned short> >::deallocate(std::_List_node<unsigned short>*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/ext/new_allocator.h:110 (ep.so+0x00000005d69a)
    #2 DcpConsumer::step(dcp_message_producers*) /home/abhinav/couchbase/ep-engine/src/dcp/consumer.cc:516 (ep.so+0x00000005c5cc)
    #3 EvpDcpStep(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:1479 (ep.so+0x0000000b480b)
    #4 mock_dcp_step(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:476 (engine_testapp+0x0000000bb055)
    #5 dcp_step(engine_interface*, engine_interface_v1*, void const*) /home/abhinav/couchbase/ep-engine/tests/ep_test_apis.cc:1219 (ep_testsuite.so+0x0000000b61bd)
    #6 test_chk_manager_rollback(engine_interface*, engine_interface_v1*) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:5526 (ep_testsuite.so+0x0000000809b4)
    #7 execute_test(test, char const*, char const*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000000b952c)
    #8 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

  Previous write of size 8 at 0x7d08000443c0 by thread T16:
    #0 operator new(unsigned long) <null>:0 (engine_testapp+0x00000005090d)
    #1 __gnu_cxx::new_allocator<std::_List_node<unsigned short> >::allocate(unsigned long, void const*) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/ext/new_allocator.h:104 (ep.so+0x00000005f265)
    #2 PassiveStream::reconnectStream(RCPtr<VBucket>&, unsigned int, unsigned long) /home/abhinav/couchbase/ep-engine/src/dcp/stream.cc:1104 (ep.so+0x000000076f5f)
    #3 DcpConsumer::doRollback(unsigned int, unsigned short, unsigned long) /home/abhinav/couchbase/ep-engine/src/dcp/consumer.cc:676 (ep.so+0x00000005db67)
    #4 RollbackTask::run() /home/abhinav/couchbase/ep-engine/src/dcp/consumer.cc:574 (ep.so+0x00000005d9d4)
    #5 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f834c)
    #6 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f7eb5)
    #7 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d71)

Change-Id: I196a78e54bf8014967a51cdb081126597153f77b
Reviewed-on: http://review.couchbase.org/55881
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56065
Reviewed-by: Dave Rigby <daver@couchbase.com>
5 years agoMB-16500 [BP]: Removing unnecessary locking in consumer code 80/56080/3
abhinavdangeti [Thu, 13 Aug 2015 18:46:35 +0000 (11:46 -0700)]
MB-16500 [BP]: Removing unnecessary locking in consumer code

streamMutex is to protect the ready list, but not the streams list.

The front end operations: addStream, closeStream, handleResponse, step
- wouldn't race with each other over the streams list, as multiple
memcached threads will not serve a single cookie.

The back end operations: processBufferedMessages (doesn't grab lock any
way), doRollback just read from streams list.

An addstream (front end op) is the only one that updates streams, and
this wouldn't update when a rollback is in progress.

Therefore, renaming the streamMutex lock in DCPConsumer to readyMutex
which is more apt for its operation - guarding the ready list.

Change-Id: Ia342d7243fef4b97b729aa94fdc64ad020711589
Reviewed-on: http://review.couchbase.org/54406
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-on: http://review.couchbase.org/56080
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMB-16500 [BP]: MB-16496 Fix the race on vbucket state between persistVBState() and... 68/56068/5
Chiyoung Seo [Fri, 9 Oct 2015 18:39:10 +0000 (11:39 -0700)]
MB-16500 [BP]: MB-16496 Fix the race on vbucket state between persistVBState() and compactVB()

The following data race is reported by thread sanitizer:

WARNING: ThreadSanitizer: data race (pid=29921)
  Write of size 8 at 0x7d680001f580 by thread T5 (mutexes: write M12734):
    #0 VBucket::setPurgeSeqno() ep-engine/src/vbucket.h:215:9 (ep.so+0x000000086204)
    #1 EventuallyPersistentStore::compactVBucket() ep-engine/src/ep.cc:1584 (ep.so+0x000000086204)
    #2 CompactVBucketTask::run() ep-engine/src/tasks.cc:94:12 (ep.so+0x00000012971e)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115:26 (ep.so+0x0000000ea41d)
    #4 launch_executor_thread() ep-engine/src/executorthread.cc:33:9 (ep.so+0x0000000e9fe5)
    #5 platform_thread_wrap platform/src/cb_pthreads.c:23:5 (libplatform.so.0.1.0+0x000000004161)

  Previous read of size 8 at 0x7d680001f580 by thread T7:
    #0 VBucket::getPurgeSeqno() ep-engine/src/vbucket.h:211:16 (ep.so+0x0000000821d3)
    #1 EventuallyPersistentStore::persistVBState() ep-engine/src/ep.cc:1217 (ep.so+0x0000000821d3)
    #2 VBStatePersistTask::run() ep-engine/src/tasks.cc:86:12 (ep.so+0x000000129636)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115:26 (ep.so+0x0000000ea41d)
    #4 launch_executor_thread() ep-engine/src/executorthread.cc:33:9 (ep.so+0x0000000e9fe5)
    #5 platform_thread_wrap platform/src/cb_pthreads.c:23:5 (libplatform.so.0.1.0+0x000000004161)

  Location is heap block of size 1392 at 0x7d680001f200 allocated by main thread:
    #0 operator new() <null> (engine_testapp+0x00000045cded)
    #1 EventuallyPersistentStore::setVBucketState() ep-engine/src/ep.cc:1300:30 (ep.so+0x000000082b1a)
    #2 EventuallyPersistentEngine::setVBucketState() ep-engine/src/ep_engine.h:718:16 (ep.so+0x0000000ca308)
    #3 setVBucket()) ep-engine/src/ep_engine.cc:884 (ep.so+0x0000000ca308)
    #4 processUnknownCommand()) ep-engine/src/ep_engine.cc:1178 (ep.so+0x0000000ca308)
    #5 EvpUnknownCommand()) ep-engine/src/ep_engine.cc:1389:38 (ep.so+0x0000000aafc8)
    #6 mock_unknown_command()) memcached/programs/engine_testapp/engine_testapp.cc:380:19 (engine_testapp+0x0000004c56b9)
    #7 set_vbucket_state() ep-engine/tests/ep_test_apis.cc:607:9 (ep_testsuite.so+0x0000000a3a4b)
    #8 test_setup() ep-engine/tests/ep_testsuite_common.cc:146:28 (ep_testsuite.so+0x00000009cdda)
    #9 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1085:47 (engine_testapp+0x0000004c4103)
    #10 main memcached/programs/engine_testapp/engine_testapp.cc:1439 (engine_testapp+0x0000004c4103)

To address the above issue, vbucket states should be read after grabbing
the vbucket writer lock in EPStore::persistVBState().

Change-Id: I5a42b3e15a1cf5c941d399897bc68d6f35746ff3
Reviewed-on: http://review.couchbase.org/55972
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-on: http://review.couchbase.org/56068
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16500 [BP]: Address possible data race in checkpoint remover 67/56067/4
abhinavdangeti [Thu, 8 Oct 2015 22:14:25 +0000 (15:14 -0700)]
MB-16500 [BP]: Address possible data race in checkpoint remover

WARNING: ThreadSanitizer: data race (pid=102986)

  Read of size 1 at 0x7d180000c298 by thread T17:
    #0 ClosedUnrefCheckpointRemoverTask::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/checkpoint_remover.cc:139 (ep.so+0x00000003d4a6)
    #1 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #2 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #3 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

  Previous write of size 1 at 0x7d180000c298 by thread T18:
    #0 CheckpointVisitor::complete() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/checkpoint_remover.cc:69 (ep.so+0x00000003e0a1)
    #1 VBCBAdaptor::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep.cc:3789 (ep.so+0x00000009d64a)
    #2 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #3 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #4 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

Change-Id: I8579d5af259490e41028f57f302e547b0826fa61
Reviewed-on: http://review.couchbase.org/55937
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/56067

5 years agoMB-16500 [BP]: Address possible data race in item/expiry pagers 66/56066/4
abhinavdangeti [Thu, 8 Oct 2015 22:05:58 +0000 (15:05 -0700)]
MB-16500 [BP]: Address possible data race in item/expiry pagers

WARNING: ThreadSanitizer: data race (pid=102450)

  Write of size 1 at 0x7d180000c2f8 by thread T17:
    #0 PagingVisitor::complete() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/item_pager.cc:175 (ep.so+0x000000106c57)
    #1 VBCBAdaptor::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep.cc:3789 (ep.so+0x00000009d64a)
    #2 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #3 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #4 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

  Previous read of size 1 at 0x7d180000c2f8 by thread T18:
    #0 ExpiredItemPager::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/item_pager.cc:334 (ep.so+0x0000001053c6)
    #1 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #2 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #3 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

Change-Id: Iebfe280c95847ee80b2d80d08b0eb340f40663d9
Reviewed-on: http://review.couchbase.org/55935
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/56066

5 years agoMB-16500 [BP]: Fix potenial deadlock around Connmap::relaseLock / connLock 64/56064/2
Dave Rigby [Wed, 7 Oct 2015 11:23:39 +0000 (11:23 +0000)]
MB-16500 [BP]: Fix potenial deadlock around Connmap::relaseLock / connLock

As reported by ThreadSanitizer (see below), we have a lock inversion
creating a potential deadlock in Connmap, related to how we shutdown
connections:

There exists a cycle in lock order graph:

 M2176 (Connmap::releaseLock in ConnMap::notifyPausedConnection() connmap.cc:235) =>
   M128093 (mock_server::conn_struct::mutex) =>
     M2177 (Connmap::connsLock in TapConnMap::shutdownAllConnections(), connmap.cc:770) =>
       M2176 (Connmap::releaseLock in TapConnMap::shutdownAllConnections(), connmap.cc:777) DEADLOCK!

The problem appears to be that in TapConnMap::shutdownAllConnections()
we first acquire {connsLock}, then acquire {releaseLock}; all while
holding the cookie lock.

Fix is to drop releaseLock in shutdownAllConnections() once we've
released any references, *then* acquire the connLock to actually clear
out the connection map.

ThreadSanitizer report follows (irrelevent parts removed):

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=1087)

  Cycle in lock order graph: M2176 (0x7d840001b810) => M128093 (0x7d280000efa0) => M2177 (0x7d840001b850) => M2176

  Mutex M128093 acquired here while holding mutex M2176 in thread T10:
    <cut>

  Mutex M2176 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 ConnMap::notifyPausedConnection() ep-engine/src/connmap.cc:235 ()
    <cut>

  Mutex M2177 acquired here while holding mutex M128093 in main thread:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 TapConnMap::newProducer() ep-engine/src/connmap.cc:378 ()
    #6 EventuallyPersistentEngine::createTapQueue() ep-engine/src/ep_engine.cc:2663:23 ()
    <cut>

  Mutex M128093 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> ()
    #1 cb_mutex_enter platform/src/cb_pthreads.c:115:14 ()
    #2 lock_mock_cookie memcached/programs/engine_testapp/mock_server.c:436:4 ()
    #3 test_tap_stream() ep-engine/tests/ep_testsuite.cc:6751:5 ()
    #4 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1090:19 ()
    #5 main memcached/programs/engine_testapp/engine_testapp.cc:1439 ()

  Mutex M2176 acquired here while holding mutex M2177 in main thread:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 TapConnMap::shutdownAllConnections() ep-engine/src/connmap.cc:777 ()
    #6 EventuallyPersistentEngine::destroy() ep-engine/src/ep_engine.cc:2190:9 ()
    <cut>

  Mutex M2177 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 TapConnMap::shutdownAllConnections() ep-engine/src/connmap.cc:770 ()
    #6 EventuallyPersistentEngine::destroy() ep-engine/src/ep_engine.cc:2190:9 ()
    <cut>

Change-Id: Ic9a4f028b202277729df025333ce630be056903d
Reviewed-on: http://review.couchbase.org/55863
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56064

5 years agoMB-16500 [BP]: Address possible data race in Notifiable: setSuspended 63/56063/2
abhinavdangeti [Tue, 6 Oct 2015 00:47:23 +0000 (17:47 -0700)]
MB-16500 [BP]: Address possible data race in Notifiable: setSuspended

WARNING: ThreadSanitizer: data race (pid=19078)

  Write of size 1 at 0x7d600000f078 by thread T10 (mutexes: write M21717):
    #0 Notifiable::setSuspended(bool) /home/abhinav/couchbase/ep-engine/src/tapconnection.h:764 (ep.so+0x00000005fe6a)
    #1 TapProducer::suspendedConnection_UNLOCKED(bool) /home/abhinav/couchbase/ep-engine/src/tapconnection.cc:717 (ep.so+0x00000013b24e)
    #2 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:112 (ep.so+0x0000000f89f6)
    #3 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f8595)
    #4 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d31)

  Previous read of size 1 at 0x7d600000f078 by main thread (mutexes: write M21715):
    #0 Notifiable::isSuspended() /home/abhinav/couchbase/ep-engine/src/tapconnection.h:768 (ep.so+0x0000000dfdf2)
    #1 EventuallyPersistentEngine::walkTapQueue(void const*, void**, void**, unsigned short*, unsigned char*, unsigned short*, unsigned int*, unsigned short*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:2531 (ep.so+0x0000000b7ecc)
    #2 EvpTapIterator(engine_interface*, void const*, void**, void**, unsigned short*, unsigned char*, unsigned short*, unsigned int*, unsigned short*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:1440 (ep.so+0x0000000d5ad7)
    #3 mock_tap_iterator(engine_interface*, void const*, void**, void**, unsigned short*, unsigned char*, unsigned short*, unsigned int*, unsigned short*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:81 (engine_testapp+0x0000000bbda2)
    #4 test_tap_ack_stream(engine_interface*, engine_interface_v1*) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:7353 (ep_testsuite.so+0x0000000504a7)
    #5 execute_test(test, char const*, char const*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000000b946c)
    #6 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

Change-Id: I596c93c048767911b052861193822ca17270a5dd
Reviewed-on: http://review.couchbase.org/55792
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56063

5 years agoMB-16500 [BP]: Address possible data race in ConnHandler: lastWalkTime 62/56062/2
abhinavdangeti [Mon, 5 Oct 2015 22:11:24 +0000 (15:11 -0700)]
MB-16500 [BP]: Address possible data race in ConnHandler: lastWalkTime

WARNING: ThreadSanitizer: data race (pid=26185)

  Write of size 4 at 0x7d4c0000a154 by main thread:
    #0 ConnHandler::setLastWalkTime() /home/abhinav/couchbase/ep-engine/src/tapconnection.h:356 (ep.so+0x000000065641)
    #1 EvpDcpStep(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:1478 (ep.so+0x0000000b4d5b)
    #2 mock_dcp_step(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:476 (engine_testapp+0x0000000baf95)
    #3 dcp_stream(engine_interface*, engine_interface_v1*, char const*, void const*, unsigned short, unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, int, int, int, int, bool, bool, unsigned char, bool, unsigned long*, bool) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:4164 (ep_testsuite.so+0x0000000990df)
    #4 test_dcp_producer_stream_req_dgm(engine_interface*, engine_interface_v1*) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:4564 (ep_testsuite.so+0x000000077634)
    #5 execute_test(test, char const*, char const*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000000b946c)
    #6 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

  Previous read of size 4 at 0x7d4c0000a154 by thread T10 (mutexes: write M1367):
    #0 ConnHandler::getLastWalkTime() /home/abhinav/couchbase/ep-engine/src/tapconnection.h:360 (ep.so+0x000000049cbe)
    #1 ConnManager::run() /home/abhinav/couchbase/ep-engine/src/connmap.cc:150 (ep.so+0x00000005031e)
    #2 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:112 (ep.so+0x0000000f8746)
    #3 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f82e5)
    #4 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d31)

Change-Id: I2c5024afde6cb749aad901572bfd68734f6d7d5d
Reviewed-on: http://review.couchbase.org/55780
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56062

5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'sherlock' 33/55933/1
abhinavdangeti [Thu, 8 Oct 2015 21:10:14 +0000 (14:10 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'sherlock'

couchbase/3.0.x:
* MB-16357: Interlock expiry and vbucket state changes

Change-Id: Ie4ba4138d23a4b811bd43ee182a641c99b5896ad

5 years agoMB-16357: Interlock expiry and vbucket state changes 46/55646/8
Jim Walker [Thu, 1 Oct 2015 16:12:19 +0000 (17:12 +0100)]
MB-16357: Interlock expiry and vbucket state changes

Expiry should only occur whilst the vbucket is active.
Background tasks performing expiry deletion must stop
driving deletions when the vb changes status to !active.

Using a reader/writer lock the core deleteExpiredItem
function which is used by both compaction driven expiry
and the item pager are now interlocked with vbucket::setState()

Change-Id: I19d30c3d7855778613ccb4534a042c0daf627b8c
Reviewed-on: http://review.couchbase.org/55646
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.x' into sherlock 40/55740/1
abhinavdangeti [Fri, 2 Oct 2015 20:36:21 +0000 (13:36 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock

couchbase/3.0.x:
* MB-16434: In consumer stream get bytes cleared atomically.
* MB-16434: In setDead release streamMutex before cleaning up stream buffer
* [BP] MB-14382 Increase the default number of ht_locks by a factor of approx. 10
* MB-16421: BGFetch to restore an item that is non-resident
* MB-16402:  Ensure objectregistry has an engine when deleting the VBucketMemoryDeletionTask.

Change-Id: I56bf61ddb324f2472682f3b3ac7f86e81e3f6636

5 years agoMB-16434: In consumer stream get bytes cleared atomically. 08/55708/2
Manu Dhundi [Thu, 1 Oct 2015 22:38:14 +0000 (15:38 -0700)]
MB-16434: In consumer stream get bytes cleared atomically.

When a comsumer stream is set to dead we clear the consumer buffer and
notify the producer of the number of bytes cleared. Clearing the
consumer buffer and the accounting of the bytes cleared should be done
atomically

Change-Id: I602d5307c6c2bbd3dc7f03f1d6c43cbe294389ac
Reviewed-on: http://review.couchbase.org/55708
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16434: In setDead release streamMutex before cleaning up stream buffer 07/55707/5
Manu Dhundi [Thu, 1 Oct 2015 22:35:47 +0000 (15:35 -0700)]
MB-16434: In setDead release streamMutex before cleaning up stream buffer

To avoid lock order inversion in DCP passive stream we must release streamMutex
before acquiring bufMutex. This is because while processing incoming mutations
on dcp consumer we acquire bufMutex first and then streamMutex.

Change-Id: I129d014dc3a7ec91416af04851419782b89cea23
Reviewed-on: http://review.couchbase.org/55707
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[BP] MB-14382 Increase the default number of ht_locks by a factor of approx. 10 05/55705/3
Chiyoung Seo [Fri, 27 Mar 2015 21:58:27 +0000 (14:58 -0700)]
[BP] MB-14382 Increase the default number of ht_locks by a factor of approx. 10

This change is backported from sherlock.

From the recent perf test results, we observed the lock contention on hash table
buckets when a hash table scanning task (i.e., expiry pager, item pager,
accecss scanner, etc.) is running.

As a long term solution, we may need to implement resizing the number of
hash table locks dynamically at runtime.

Change-Id: Ic7b9f951f58fe7190d8d0d23fb62979057e545ac
Reviewed-on: http://review.couchbase.org/48869
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/55705
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16421: BGFetch to restore an item that is non-resident 80/55680/3
abhinavdangeti [Thu, 1 Oct 2015 00:40:14 +0000 (17:40 -0700)]
MB-16421: BGFetch to restore an item that is non-resident

In a full eviction scenario, a bgfetch is to restore an item
not just if the hash table item is a temp-initial item, but
if the hash table item is non-resident as well.

Change-Id: I127b0cbe7034133a656b046cd4022635be23aedd
Reviewed-on: http://review.couchbase.org/55680
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>