ep-engine.git
4 years agoMB-19815: doDcpVbTakeoverStats, addTakeoverStats: 0 deleted items on exception 66/64666/3 4.5.0 v4.5.0
Manu Dhundi [Wed, 1 Jun 2016 21:31:47 +0000 (14:31 -0700)]
MB-19815: doDcpVbTakeoverStats, addTakeoverStats: 0 deleted items on exception

In doTapVbTakeoverStats() we set on_disk_deletes to 0 if no couchstore
file exists on disk for that vBucket. We need to handle the exception
in the same way if it occurs during doDcpVbTakeoverStats(),
addTakeoverStats() or BackfillDiskLoad::run() calls. (Similar to
http://review.couchbase.org/#/c/64297/)
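
As a rough illustration of the fallback described above (not the actual
ep-engine code), the sketch below wraps a reader that may throw when the
couchstore file is missing and reports zero deleted items instead. The
helper name deletesOrZero and the use of std::runtime_error are
assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>

// Hypothetical helper: 'readPersistedDeletes' stands in for a call such as
// CouchKVStore::getNumPersistedDeletes, which may throw if no file exists.
inline std::size_t deletesOrZero(
        const std::function<std::size_t()>& readPersistedDeletes) {
    try {
        return readPersistedDeletes();
    } catch (const std::runtime_error&) {
        // No file on disk for this vBucket: report zero deleted items
        // rather than letting the exception disconnect the client.
        return 0;
    }
}
```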

Note: there is more to be understood about the various scenarios where
a vBucket file does not exist on disk when a stats call is made.

Change-Id: Idde212db8ed5d7ed9a0eca02805a7ccc5a34e0b0
Reviewed-on: http://review.couchbase.org/64666
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19605: Add more tests for stats 30/64330/2
Trond Norbye [Fri, 20 May 2016 13:08:53 +0000 (15:08 +0200)]
MB-19605: Add more tests for stats

We've added range checking for the calls to snprintf to ensure
that we don't blow the buffer. This patch adds a bunch of tests
that call the various stats groups to ensure that they don't
crash or blow the buffer...

Change-Id: I3a16a59c7fad74504a86ff4c9c287b0259d41420
Reviewed-on: http://review.couchbase.org/64330
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19428: Don't schedule a backfill for streams on dead vbuckets 22/64322/5
Jim Walker [Tue, 24 May 2016 13:35:21 +0000 (14:35 +0100)]
MB-19428: Don't schedule a backfill for streams on dead vbuckets

MB-17230 added a new check to stop the creation of an ActiveStream
on a dead vbucket. Due to lock inversion issues this check is done
after creating the ActiveStream object and more importantly
after we've called "setActive".

setActive will drive the stream state machine once, moving from
pending to backfilling, yet we never put the new ActiveStream
into the streams map.

These dangling streams can still then pull data into memory and
increment the backfill manager's accounting, yet because the
stream is not connected to the producer it never goes through
the logic to decrement the accounting.

Change-Id: I31b43530a2f6bec2e0741ec0c5e1fedb1af50190
Reviewed-on: http://review.couchbase.org/64322
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19695: Log VBucket creation and state change at NOTICE 98/64298/2
Dave Rigby [Mon, 23 May 2016 09:53:15 +0000 (10:53 +0100)]
MB-19695: Log VBucket creation and state change at NOTICE

VBucket state changes are relatively rare in the grand scheme of
things so log when they change at NOTICE to assist in debugging.

Change-Id: I3b1e4b70b8d7aa8100cda6b0f1ba9cce5bc25923
Reviewed-on: http://review.couchbase.org/64298
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19695: doTapVbTakeoverStats: assume zero deleted items if exception caught 97/64297/2
Dave Rigby [Mon, 23 May 2016 14:42:24 +0000 (15:42 +0100)]
MB-19695: doTapVbTakeoverStats: assume zero deleted items if exception caught

We have observed instances where doTapVbTakeoverStats() is called when
no couchstore file exists on disk for the given vBucket. This results
in an exception being thrown, which is caught in the worker runloop
and disconnects the client - which is ns_server, resulting in
rebalance using TAP failing (e.g. a 2.5.x upgrade).

There have been a number of instances of essentially the same bug
(MB-16657, MB-18679, MB-19567, MB-19695), along with a number of
patches to fix it, however we are still seeing the same symptoms.

Given there is clearly more to be understood about the various
scenarios where a vBucket file does not exist on disk, this patch
reverts to pre-Watson (specifically http://review.couchbase.org/56237
) behaviour - if CouchKVStore::getNumPersistedDeletes does not find a
couchstore file on disk then zero will be used for `del_items` in
doTapVbTakeoverStats.

Change-Id: Ida47ec749546fc15aa66a9dc6d75ee31c767874e
Reviewed-on: http://review.couchbase.org/64297
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoRevert "MB-19567: Don't set bucket creation to false during vbucket reset" 96/64296/2
Dave Rigby [Mon, 23 May 2016 15:14:36 +0000 (16:14 +0100)]
Revert "MB-19567: Don't set bucket creation to false during vbucket reset"

This reverts commit 1cc0b30b7b04ee0c390fc4b3c4bae5b62fd6d900.

While the commit does address one possible edge-case where a vBucket
file is not immediately recreated after resetVBucket(), it is
incomplete (the bug still occurs), and having this partial fix only
complicates the code.

Reverting the fix and solving the issue in a different way in
follow-up.

Change-Id: Ifd0a18fd9062b0663c1fb28999ef7f44ef483ceb
Reviewed-on: http://review.couchbase.org/64296
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoRevert "MB-19695: Always persist VBstate when resetting VB" 95/64295/2
Dave Rigby [Mon, 23 May 2016 13:17:38 +0000 (14:17 +0100)]
Revert "MB-19695: Always persist VBstate when resetting VB"

This reverts commit cb6c7141366d533874c63cb9bc0a144f9ab0347c.

While the commit does address one possible edge-case where a vBucket
file is not being recreated after resetVBucket(), it is incomplete
(the bug still occurs), and having this partial fix only complicates
the code.

Reverting the fix and solving the issue in a different way in
follow-up.

Change-Id: Ia44d22a09829f65884cc807743be3a2723221833
Reviewed-on: http://review.couchbase.org/64295
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19695: Always persist VBstate when resetting VB 52/64252/4
Dave Rigby [Fri, 20 May 2016 17:00:53 +0000 (18:00 +0100)]
MB-19695: Always persist VBstate when resetting VB

During tapNotify / TAP_OPAQUE_INITIAL_VBUCKET_STREAM - when we are
setting up a replica vBucket which is populated by TAP - the vbucket
is reset. This deletes the current database file from disk, and
creates a new file.

Both the delete and create are done asynchronously via our Task
mechanism (on a writer thread). It has been observed that the vbucket
file is sometimes not re-created, which can result in the following
exception being thrown which disconnects the connection:

    WARNING 47: exception occurred in runloop - closing connection:
    CouchKVStore::getNumPersistedDeletes:Failed to open database
    file for vBucket = 96 rev = 1 with error:no such file

Interestingly this isn't a simple case of the filesystem being slow -
cbcollect logs taken ~5 minutes later still show the vbucket file not
existing.

This appears to be caused by an optimization in
EventuallyPersistentStore::scheduleVBStatePersist - which is used to
schedule a new VBstate to be saved (and in turn create the vb file on
disk). The optimization is to not schedule a persist if one is
already scheduled.

However, there exists the possibility that a VBStatePersistTask
already exists (causing us not to create one during resetVBucket()),
*and* the existing VBStatePersistTask is scheduled before the
delete. This results in the file being deleted, and not re-created as
part of the reset process.

The solution is to forcefully persist the VBState during resetVBucket,
essentially disabling the optimization in this case and ensuring the
file is created. Full unit regression test to follow in separate patch
(it needs test infrastructure which doesn't yet exist).

(Note: the vbucket datafile /will/ be created upon the next write
to the vbucket, but in the case of a vbucket with zero items this may
take some time).
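
The "skip if already scheduled" optimization, and the forced variant
described above, can be sketched roughly as follows. PersistScheduler and
its members are hypothetical names for illustration only; the real logic
lives in EventuallyPersistentStore::scheduleVBStatePersist and is more
involved.

```cpp
#include <atomic>

// Hypothetical stand-in for the per-vBucket persist scheduling state.
class PersistScheduler {
public:
    // Returns true if a new persist task was scheduled.
    // With force == false, a second request is skipped while one is pending.
    // With force == true (the resetVBucket case), a task is always scheduled.
    bool schedule(bool force = false) {
        bool expected = false;
        const bool claimed = pending.compare_exchange_strong(expected, true);
        if (!claimed && !force) {
            return false; // optimization: a persist is already scheduled
        }
        ++tasksScheduled;
        return true;
    }

    // Called when a persist task has run, allowing the next one to be queued.
    void taskRan() { pending.store(false); }

    int tasksScheduled = 0;

private:
    std::atomic<bool> pending{false};
};
```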

Change-Id: I3f4e76eee35a75d64f5e981529ab47fcf2007fcb
Reviewed-on: http://review.couchbase.org/64252
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19605: Increase buffer size for addSeqnoVbStats 09/64209/2
Daniel Owen [Thu, 19 May 2016 11:33:26 +0000 (12:33 +0100)]
MB-19605: Increase buffer size for addSeqnoVbStats

The SeqnoVbStats can be greater than 31 characters.
Therefore we need to increase the buffer size.
Increasing to a value of 64.

Change-Id: I43d482630444cb267e4a5f264844cc22a55cd750
Reviewed-on: http://review.couchbase.org/64209
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19503: Fully restore server cookie API in unit test 22/64122/3
Dave Rigby [Tue, 17 May 2016 17:04:10 +0000 (18:04 +0100)]
MB-19503: Fully restore server cookie API in unit test

In the regression / unit test for this MB we interpose our own
function in place of the normal server cookie API function
notify_io_complete. However we do not correctly restore the original
function when finished, which can cause other subsequent tests in the
same binary to fail.

As the cookie API is accessed via pointer from server API we need to
take a copy of the cookie API struct, and restore it when the test is
complete.
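
One way to make the save-and-restore hard to forget is a small scope
guard. The sketch below uses reduced, hypothetical stand-ins (CookieApi,
ServerApi, CookieApiGuard) rather than the real server API structs; only
the notify_io_complete name comes from the description above.

```cpp
// Hypothetical, reduced stand-ins for the server API structs.
struct CookieApi {
    void (*notify_io_complete)(const void* cookie, int status);
};

struct ServerApi {
    CookieApi* cookie; // accessed via pointer, as in the description above
};

// Copies the cookie API struct on construction and restores it on
// destruction, so later tests in the same binary see the original functions.
class CookieApiGuard {
public:
    explicit CookieApiGuard(ServerApi& serverApi)
        : api(serverApi), saved(*serverApi.cookie) {}
    ~CookieApiGuard() { *api.cookie = saved; }

    CookieApiGuard(const CookieApiGuard&) = delete;
    CookieApiGuard& operator=(const CookieApiGuard&) = delete;

private:
    ServerApi& api;
    CookieApi saved;
};
```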

Change-Id: I71045f3d7bd4d181db43876954b99c3ed0db688b
Reviewed-on: http://review.couchbase.org/64122
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years agoMB-19145: Merge backfill and in-memory snapshots correctly on replica vb 95/64095/4
Manu Dhundi [Tue, 17 May 2016 18:55:28 +0000 (11:55 -0700)]
MB-19145: Merge backfill and in-memory snapshots correctly on replica vb

When a DCP client, on replica vb, opens a stream which it intends to
keep open forever, merge the backfill and in-memory snapshots by using the
checkpoint snapshot_end as snapshot_end_seqno.

Change-Id: Ic05a59ccafa54bbee19882707404a78c47002be7
Reviewed-on: http://review.couchbase.org/64095
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19635: Initialise failovers correctly from 2.5.x vbstate 19/64119/2
Jim Walker [Tue, 17 May 2016 16:41:10 +0000 (17:41 +0100)]
MB-19635: Initialise failovers correctly from 2.5.x vbstate

When loading a vb file, don't force the failover table data
to be ("[{\"id\":0,\"seq\":0}]"); if the file doesn't contain
any data.

Change-Id: I41673bf848fcbab9b616edec5c7fd2ab9a3ddd6b
Reviewed-on: http://review.couchbase.org/64119
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19627: Log the actual last seqno sent before closing the stream. 87/64087/4
Manu Dhundi [Mon, 16 May 2016 23:33:09 +0000 (16:33 -0700)]
MB-19627: Log the actual last seqno sent before closing the stream.

When a DCP stream closes, we log the last sent seqno at the time when
stream transitions to dead state. However, we further stream items in
the readyQ from dead state as well. This commit adds the correct
last seqno sent.

Change-Id: I0f0bfd199544dc5bf20e0ca97b3c5ea8d207c6a8
Reviewed-on: http://review.couchbase.org/64087
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19503: Expand ConnMap notify unit test for unpaused case 04/64104/4
Dave Rigby [Tue, 17 May 2016 09:59:41 +0000 (10:59 +0100)]
MB-19503: Expand ConnMap notify unit test for unpaused case

Expand the unit/regression test for MB-19503 to check for the case
where notifyAllPausedConnections() is called when a connection is not
paused.

Checks for the case where:

1. notifyAllPausedConnections() is called on unpaused connection
2. the connection is later re-paused.
3. notifyPausedConnection() is called

We fail to correctly add a pending notification, meaning a subsequent
notifyAllPausedConnections() does not notify.

Change-Id: Ibe2e110736463eaf8311b01ebe631c96a28384ce
Reviewed-on: http://review.couchbase.org/64104
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/sherlock' into watson 14/64114/2
Dave Rigby [Tue, 17 May 2016 14:20:22 +0000 (15:20 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into watson

* couchbase/sherlock:
  MB-19503: Fix ConnMap so notifications don't go missing [2]

Change-Id: I263c23a25f746e85adbe8b44ba5398953b9d2f8e

4 years agoMB-19503: Fix ConnMap so notifications don't go missing. 02/64102/2
Jim Walker [Wed, 11 May 2016 15:26:47 +0000 (16:26 +0100)]
MB-19503: Fix ConnMap so notifications don't go missing.

There's a reliance on an atomic bool and cmpxchg to
prevent the producer of notification from queueing
himself if he's already got a notification scheduled.

There's an ordering issue though where the producer's code
can execute, see the flag is true and not bother queueing
a notification, yet the consumer side is about to clear the
flag and finish. The notification thus never gets queued
and the producer side thinks he will get a notification.

In my terminology:
producer is ConnMap::notifyPausedConnection
consumer is ConnMap::notifyAllPausedConnections

Change-Id: Id324b6369c5ee3a6b6758a7a93e017a4ff7c4a78
Reviewed-on: http://review.couchbase.org/64102
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19503: Fix ConnMap so notifications don't go missing [2] 72/64072/3
Jim Walker [Mon, 16 May 2016 15:24:35 +0000 (16:24 +0100)]
MB-19503: Fix ConnMap so notifications don't go missing [2]

Previous patch[1] cleared the isNotificationScheduled flag
at the wrong place and meant things could then never
again get scheduled.

This is because we only cleared the flag if tp->isPaused()
yet we still pop the notification from the queue, so we
left tp->isNotificationScheduled yet the queue is empty.
Now no more notifications will ever get scheduled!

So we need to clear the notification scheduled boolean
regardless of the other flags on tp.
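
A single-threaded reduction of the idea, for illustration only: the
consumer pops a connection and clears its scheduled flag regardless of
whether the connection is paused, so a later producer call can queue it
again. Conn, NotifyQueue and their members are hypothetical names; the
real ConnMap code also involves locking that is omitted here.

```cpp
#include <atomic>
#include <deque>

// Hypothetical reduction of a connection's notification state.
struct Conn {
    std::atomic<bool> notificationScheduled{false};
    bool paused = false;
    int notified = 0;
};

struct NotifyQueue {
    std::deque<Conn*> pending;

    // Producer side (cf. ConnMap::notifyPausedConnection): queue the
    // connection only if no notification is already scheduled for it.
    void notifyPaused(Conn& c) {
        bool expected = false;
        if (c.notificationScheduled.compare_exchange_strong(expected, true)) {
            pending.push_back(&c);
        }
    }

    // Consumer side (cf. ConnMap::notifyAllPausedConnections).
    void notifyAllPaused() {
        while (!pending.empty()) {
            Conn* c = pending.front();
            pending.pop_front();
            // Clear the flag for every popped entry, paused or not;
            // otherwise the flag stays set while the queue is empty and
            // nothing can ever be scheduled again.
            c->notificationScheduled.store(false);
            if (c->paused) {
                ++c->notified;
            }
        }
    }
};
```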

[1] - Commit 0856e0b3d3fc6

Change-Id: I11c9fd72f4b35102328022bd4c334a9e09a61cd0
Reviewed-on: http://review.couchbase.org/64072
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
4 years agoMB-19503: Enable connmap_notify unit test 69/64069/2
Dave Rigby [Mon, 16 May 2016 09:04:45 +0000 (10:04 +0100)]
MB-19503: Enable connmap_notify unit test

Now the fix for MB-19503 has been committed, test can be enabled.

Change-Id: Ibcd1fb6f740c9b03a3f3827dc3905c183e3b59c0
Reviewed-on: http://review.couchbase.org/64069
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
4 years agoMB-19567: Don't set bucket creation to false during vbucket reset 12/63912/10
Sriram Ganesan [Tue, 10 May 2016 22:50:31 +0000 (15:50 -0700)]
MB-19567: Don't set bucket creation to false during vbucket reset

There are 2 tasks that happen during vbucket reset, a vbucket deletion
followed by setting a vbucket state that creates the vbucket files.
Vbucket deletion sets bucket creation to false. A setVbucketState
call sets bucket creation to true only before scheduling a vbucket
persist task. During this window, a stat call for DCP takeover stats
finds that vbucket deletion is set to false and vbucket creation is
also set to false, thus resulting in an exception being thrown from
getNumPersistedDeletes from CouchKVStore

Fix: Because a reset involves a deletion followed by a recreation of
a vbucket, vbucket deletion and creation should both be set to true
at the beginning of reset and set to false once the respective tasks
complete

Change-Id: Idccb753124c85be2399020b436a737f176cc95ef
Reviewed-on: http://review.couchbase.org/63912
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoRevert "MB-19503: Fix ConnMap so notifications don't go missing." 92/64092/3
Sriram Ganesan [Mon, 16 May 2016 22:33:29 +0000 (15:33 -0700)]
Revert "MB-19503: Fix ConnMap so notifications don't go missing."

This reverts commit 0856e0b3d3fc62a50677a9be7963be3d5c04d041.

Change-Id: I2caffc440ce663771276668373d178d537c2926a
Reviewed-on: http://review.couchbase.org/64092
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Finlay <dave.finlay@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years agoMB-19605: Check return value for snprintf 20/64020/7
Trond Norbye [Fri, 13 May 2016 08:42:26 +0000 (10:42 +0200)]
MB-19605: Check return value for snprintf

snprintf does not return -1 if the buffer is too small, and the
buffer is not zero terminated on windows.
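
A minimal sketch of such a range check, assuming a C99-conforming
vsnprintf (which returns the length that would have been written). The
wrapper name checkedSnprintf and the exception types are illustrative,
not necessarily what the patch uses.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <stdexcept>

// Hypothetical wrapper: formats into 'buf' and throws instead of silently
// truncating. Returns the number of characters written (excluding the NUL).
inline int checkedSnprintf(char* buf, std::size_t size, const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    const int written = std::vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (written < 0) {
        throw std::runtime_error("checkedSnprintf: encoding error");
    }
    if (static_cast<std::size_t>(written) >= size) {
        // The output did not fit; the caller's buffer is too small.
        throw std::overflow_error("checkedSnprintf: output truncated");
    }
    return written;
}
```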

Change-Id: Icbb166385843df0a7d44f815c2c3e5fd8341ded4
Reviewed-on: http://review.couchbase.org/64020
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19503: Unit test to demonstrate notifications can go missing. 77/63977/4
Jim Walker [Thu, 12 May 2016 12:53:02 +0000 (13:53 +0100)]
MB-19503: Unit test to demonstrate notifications can go missing.

The test serially interleaves ConnMap::notifyPausedConnection into the
path of ConnMap::notifyAllPausedConnections so that we show a second
notification goes missing because of the ordering of clearing
the Notifiable::notificationScheduled atomic bool.

Test is currently marked as disabled because the fix is coming
from downstream via merge commits.

Fix -> http://review.couchbase.org/#/c/63934/

Change-Id: Icd7348364c393f154ad5db9c741fc1616a121805
Reviewed-on: http://review.couchbase.org/63977
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/sherlock' into watson 44/64044/1
Dave Rigby [Fri, 13 May 2016 19:49:17 +0000 (20:49 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into watson

* couchbase/sherlock:
  MB-19503: Fix ConnMap so notifications don't go missing.

Change-Id: Id3d547590c5edc88b466cc585a0b017edb48af9a

4 years agoMB-19503: Fix ConnMap so notifications don't go missing. 25/64025/2
Jim Walker [Wed, 11 May 2016 15:26:47 +0000 (16:26 +0100)]
MB-19503: Fix ConnMap so notifications don't go missing.

There's a reliance on an atomic bool and cmpxchg to
prevent the producer of notification from queueing
himself if he's already got a notification scheduled.

There's an ordering issue though where the producer's code
can execute, see the flag is true and not bother queueing
a notification, yet the consumer side is about to clear the
flag and finish. The notification thus never gets queued
and the producer side thinks he will get a notification.

In my terminology:
producer is ConnMap::notifyPausedConnection
consumer is ConnMap::notifyAllPausedConnections

Change-Id: Id324b6369c5ee3a6b6758a7a93e017a4ff7c4a78
Reviewed-on: http://review.couchbase.org/64025
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/sherlock' into watson 36/64036/1
Dave Rigby [Fri, 13 May 2016 17:40:14 +0000 (18:40 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into watson

* couchbase/sherlock:
  MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call
  MB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios
  MB-19075: Remove printing of empty string in CouchKVStore::getMulti()
  MB-18476: Handle write failures more gracefully in the mutation log

Change-Id: Ic827ee434a7149cb14d6d69358ccab23f204146a

4 years agoMB-19430: Handle temporary failure in BackfillDiskLoad::run() 29/63929/7
Daniel Owen [Wed, 11 May 2016 13:14:54 +0000 (14:14 +0100)]
MB-19430: Handle temporary failure in BackfillDiskLoad::run()

memcached.log contains the following error message:

    "CouchKVStore::getDbFileInfo: Failed to open database file
    for vBucket = 0 rev = 1 with error:no such file:

The reason for this failure is a race condition between the thread
which creates couch-db and the thread which opens this file for read
operation (to get DbFileInfo).

To fix, if BackfillDiskLoad::run fails to obtain the item count for a
vbucket (due to the file not being ready yet), then snooze the task
and retry later.
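
The snooze-and-retry shape can be sketched as below. BackfillTask, its
members and the one-second snooze are assumptions for the example; the
convention that run() returns true to be rescheduled is also assumed
here rather than taken from the patch.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical reduction of a disk backfill task.
class BackfillTask {
public:
    explicit BackfillTask(std::function<std::size_t()> itemCountReader)
        : readItemCount(std::move(itemCountReader)) {}

    // Returns true if the task should be run again later.
    bool run() {
        try {
            itemCount = readItemCount();
        } catch (const std::runtime_error&) {
            // File not ready yet: snooze and retry instead of failing.
            snoozeSeconds = 1.0;
            return true;
        }
        snoozeSeconds = 0.0;
        return false; // item count obtained; no retry needed
    }

    std::size_t itemCount = 0;
    double snoozeSeconds = 0.0;

private:
    std::function<std::size_t()> readItemCount;
};
```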

Change-Id: I4613ef4716b0a1dccd5928d776e1f20ecdfe129e
Reviewed-on: http://review.couchbase.org/63929
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19371: Exit warmup on OOM condition in valueOnly key loading phase 32/63732/7
Sriram Ganesan [Thu, 5 May 2016 01:15:58 +0000 (18:15 -0700)]
MB-19371: Exit warmup on OOM condition in valueOnly key loading phase

During the key loading phase of warmup for valueOnly eviction, there
is a possibility of hitting an out-of-memory condition. In that
case, we should not enable traffic and return ENOMEM.

Change-Id: I507e90aeec1392206198d39a8522c9457919f909
Reviewed-on: http://review.couchbase.org/63732
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18692: Wait until you receive an item in flow control test 64/63764/6
Manu Dhundi [Thu, 5 May 2016 18:52:06 +0000 (11:52 -0700)]
MB-18692: Wait until you receive an item in flow control test

If no items are added yet to the DCP ready queue, the step() wouldn't
send an item. That case is handled in this commit

Change-Id: I4255f97a117de59df93c0d55237802ea40167d46
Reviewed-on: http://review.couchbase.org/63764
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoRevert "MB-19527: Disable broken DCP compression perf tests" 92/63792/2
Dave Rigby [Fri, 6 May 2016 10:59:02 +0000 (11:59 +0100)]
Revert "MB-19527: Disable broken DCP compression perf tests"

The failures seen in DCP compression performance tests were actually
due to me not realising that different vBuckets are used for the test,
and hence the sentinel was created on the wrong bucket.

This has now been fixed, so these tests can be re-enabled.

This reverts commit 16dd6118febe0b40f615868780c8d7e585046570.

Change-Id: I80b2a31886ebce1c17165b2c0c1c54ea68f5c608
Reviewed-on: http://review.couchbase.org/63792
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18670: ep_perfsuite: Use correct vbid for sentinal doc 91/63791/2
Dave Rigby [Fri, 6 May 2016 11:19:55 +0000 (12:19 +0100)]
MB-18670: ep_perfsuite: Use correct vbid for sentinal doc

The sentinel document for ep_perfsuite tests was always using vbid
0. This meant that tests which use vbuckets other than zero (e.g. the
DCP compression tests) wouldn't have the sentinel set correctly.

Change-Id: I8a467e7067fdc2280d8a7eec8044f13568a4b799
Reviewed-on: http://review.couchbase.org/63791
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18669: Address sporadic test failure seen with a DCP test 66/63766/2
abhinavdangeti [Thu, 5 May 2016 19:08:23 +0000 (12:08 -0700)]
MB-18669: Address sporadic test failure seen with a DCP test

Context: test_dcp_producer_stream_req_partial

The number of mutations received is sporadically lower than
the expected number, very likely because of
de-duplication.

Change-Id: I89bcce14789dcc246921b460e4a04ecae193fe84
Reviewed-on: http://review.couchbase.org/63766
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years agoMB-18670: ep_perfsuite: Fix end of DCP/TAP stream calculation 49/63749/4
Dave Rigby [Thu, 5 May 2016 13:47:24 +0000 (14:47 +0100)]
MB-18670: ep_perfsuite: Fix end of DCP/TAP stream calculation

A number of ep_perfsuite tests check for a specific number of TAP/DCP
messages to be received, based on the number of mutations. However we
have seen intermittent test failures in these tests, similar to:

    Test failed: `Didn't receive expected number of mutations' (num_mutations <= static_cast<size_t>(ha->itemCount))

The problem is how the DCP stream (running in the background) is set up,
and how it detects the end of the test. Currently the code tries to
predict what sequence number it will see up to (i.e the total mutation
/ deletion count) when the stream is created, and once the stream ends
it compares how many mutations it has received with the expected
number. It uses a simple calculation of 3x the number of keys as the
expected number of operations (one per ADD, REPLACE and DELETE).

The first problem with this approach is it fails to account for the
updates created by the recently-added APPEND operations - we append
multiple times to a single key, which creates higher sequence numbers
than expected. The net effect of this is that what it thinks is the
last sequence number (should be the last document getting deleted) is
actually one of the append operations. As a consequence (and due to
potential de-duplication) we do not always see as many mutations as
expected.

 (For the record - I managed to significantly increase the chances of
  hitting the failure (to pretty much every time) by adding sleeps of
  1us between each append operation in perf_latency_core).

I initially looked at solving this by fixing the calculation of how
many updates will occur. This worked for DCP (where we have an
accurate sequence number) however the TAP tests do not have an
accurate sequence number, and so we cannot predict how many mutations
will occur.

The chosen solution is to instead create a sentinel mutation at the
end of the load phase, and have the DCP & TAP clients simply check a
mutation's key against this marker and stop when they encounter it.
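
In outline, the client-side check might look like the following sketch.
The function name consumeUntilSentinel and the representation of the
stream as a list of keys are assumptions for illustration; the actual
test clients receive DCP/TAP messages rather than a vector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts the mutations seen before the sentinel key and stops there, so the
// client never needs to predict how many mutations the load phase produces.
inline std::size_t consumeUntilSentinel(const std::vector<std::string>& keys,
                                        const std::string& sentinelKey) {
    std::size_t received = 0;
    for (const auto& key : keys) {
        if (key == sentinelKey) {
            break; // end of the load phase reached
        }
        ++received;
    }
    return received;
}
```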

Change-Id: I0d73a27cbe666e8e7446757940046060ed723a4b
Reviewed-on: http://review.couchbase.org/63749
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19527: Disable broken DCP compression perf tests 48/63748/2
Dave Rigby [Thu, 5 May 2016 13:45:36 +0000 (14:45 +0100)]
MB-19527: Disable broken DCP compression perf tests

See MB-19526 for the background. Given the currently broken tests
block an issue targeted for Watson, and DCP compression isn't enabled
in Watson, disable these tests for now to allow us to make forward
progress.

Change-Id: Ibba41fef178d6fe2d43ba4b8f91c25297a085aed
Reviewed-on: http://review.couchbase.org/63748
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18670: Add additional checks / diagnostic to perf_dcp_client 47/63747/2
Dave Rigby [Fri, 29 Apr 2016 12:24:23 +0000 (13:24 +0100)]
MB-18670: Add additional checks / diagnostic to perf_dcp_client

Add additional checks and diagnostic output to assist in tracking down
the intermittent failure in perf_dcp_client.

Change-Id: I0e0ebfa783baa922f8af6a20c1bf4802c7ebe4bf
Reviewed-on: http://review.couchbase.org/63747
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19303 Use error string and OS error for CouchKVStore::getAllKeys 40/63740/2
Will Gardner [Wed, 20 Apr 2016 18:15:04 +0000 (19:15 +0100)]
MB-19303 Use error string and OS error for CouchKVStore::getAllKeys

Previously just the error number was logged which is hard to read
and does not include the OS error in the event of an os-level issue.

This change amends this by making logging follow the rest of
CouchKVStore and get those two bits.

Change-Id: I911fce68738495271b667ba59b255b8c01949d79
Reviewed-on: http://review.couchbase.org/63740
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19305 Log couchstore_open retries at NOTICE rather than INFO 39/63739/2
Will Gardner [Wed, 20 Apr 2016 18:18:19 +0000 (19:18 +0100)]
MB-19305 Log couchstore_open retries at NOTICE rather than INFO

Retries were not previously being logged at a visible level which
means intermittent errors would not be logged. This change bumps
up the logging level to NOTICE.

Change-Id: I0a575e7f7734ab34e7affc4c20f2d3ca9f6bdf27
Reviewed-on: http://review.couchbase.org/63739
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19302 Use visible logging on couchstore_open_local_document error 38/63738/2
Will Gardner [Wed, 20 Apr 2016 18:03:15 +0000 (19:03 +0100)]
MB-19302 Use visible logging on couchstore_open_local_document error

This change ensures that CouchKVStore::readVBState will log an
error at a default visible level in the event that
couchstore_open_local_document fails. In addition any serious error
(i.e. any not DOC_NOT_FOUND) will be logged at WARNING.

Change-Id: I1479fc6ee2c830e5d1b1e52324617a81ae1434fe
Reviewed-on: http://review.couchbase.org/63738
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19301 Include error context in logging in CouchKVStore::rollback 37/63737/2
Will Gardner [Wed, 20 Apr 2016 18:01:18 +0000 (19:01 +0100)]
MB-19301 Include error context in logging in CouchKVStore::rollback

This change ensures both the Couchstore error string and OS error
string are included in rollback logging.

Change-Id: I1a531565030b64d162bac5d818a4c23e83aaca6e
Reviewed-on: http://review.couchbase.org/63737
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-19504: Remove full-eviction variant of ep_perfsuite 12/63712/3
Dave Rigby [Wed, 4 May 2016 14:45:15 +0000 (15:45 +0100)]
MB-19504: Remove full-eviction variant of ep_perfsuite

The full eviction variant of ep_perfsuite is essentially a waste of
time (and prone to intermittent failures).

The perfsuite tests run with persistence disabled (see the call to
stop_persistence at the start of perf_latency). As a consequence the
tests sometimes fail, as items are essentially getting evicted from
memory (and put onto the disk queue) but the queue will never be
persisted, and hence item counts will not be correctly updated. (Note:
this relates to the issues found in MB-19501).

Additionally, even if the tests did work, given we disable persistence
there's essentially no value in the results output as they aren't
representative of what full eviction operations might cost (as no disk
overhead is included).  For these reasons I'm disabling the full
eviction mode of ep_perfsuite. While in the abstract it would be
useful to have numbers from this suite for full eviction, it requires
a reasonable amount of work to make work in a sensible way.

Change-Id: I450dc7297b25d3c6b09408688d5d7706bc26b5fe
Reviewed-on: http://review.couchbase.org/63712
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19501: ep_test_apis.cc: Wait for outstanding flushers to complete 10/63710/3
Dave Rigby [Wed, 4 May 2016 14:14:41 +0000 (15:14 +0100)]
MB-19501: ep_test_apis.cc: Wait for outstanding flushers to complete

There is an issue in how the testsuites wait for items to be flushed
to disk when in full eviction mode, which results in intermittent test
failures as the item count (curr_items) is under-reported.

The issue is that we only wait for the {{ep_queue_size}} stat to be
zero (assuming that means that all items are now on disk), however in
the case of full eviction we need to update the count of how many
items are in a vBucket (HashTable) based on how many are now on disk;
which is done after the disk commit - see
EventuallyPersistentStore::commit().

We also need to wait for any outstanding flushes to disk to
complete - specifically so when in full eviction mode we have waited
for the item counts in each vBucket to be synced with the number of
items on disk.

Change-Id: Ibbffddd137fc17f1dfe80edcc35e177ee6915763
Reviewed-on: http://review.couchbase.org/63710
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19255: Record time for all DCP consumer messages 07/63307/12
Daniel Owen [Mon, 25 Apr 2016 13:06:41 +0000 (14:06 +0100)]
MB-19255: Record time for all DCP consumer messages

The DCP documentation states that the consumer should see
some sort of message or a No-Op message in a period
equal to twice the noop interval otherwise it should close
its connection.  See documentation/commands/no-op.md in
https://github.com/couchbaselabs/dcp-documentation

This patch changes from checking only for receipt of a
no-op message to checking for receipt of any of the following
messages:
- add stream
- close stream
- deletion
- expiration
- flush
- mutation
- set VBucket state
- snapshot Marker
- stream end
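
The rule above can be sketched as follows. This is a minimal
illustration only, not the ep-engine implementation: the class
ConsumerLiveness and its members are hypothetical names, and the
clock is passed in explicitly so the behaviour is easy to follow.

```cpp
#include <chrono>

// Hypothetical stand-in for the consumer-side liveness check; none of
// these names come from ep-engine.
class ConsumerLiveness {
public:
    using Clock = std::chrono::steady_clock;

    ConsumerLiveness(std::chrono::seconds interval, Clock::time_point now)
        : noopInterval(interval), lastMessageTime(now) {}

    // Call for every message received from the producer (mutation,
    // deletion, snapshot marker, ...), not only for no-ops.
    void recordMessage(Clock::time_point now) { lastMessageTime = now; }

    // True once nothing at all has arrived for more than twice the
    // noop interval, at which point the connection should be closed.
    bool shouldDisconnect(Clock::time_point now) const {
        return (now - lastMessageTime) > 2 * noopInterval;
    }

private:
    std::chrono::seconds noopInterval;
    Clock::time_point lastMessageTime;
};
```

With a 10 second interval, a mutation received at t=15s keeps the
connection alive at t=21s, whereas a check based only on no-ops
would have closed it.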

Change-Id: Ib2268dba339cbf3701f3c7782ee8256bddc79ba3
Reviewed-on: http://review.couchbase.org/63307
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19354 Make flow-control-manager thread safe 09/63409/4
Daniel Owen [Wed, 27 Apr 2016 10:31:54 +0000 (11:31 +0100)]
MB-19354 Make flow-control-manager thread safe

During shutdown the ConnsLock is not available.
Therefore make the flow-control-manager thread
safe as opposed to relying on an external lock.
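
As a rough sketch of what "thread safe without an external lock"
means here: the object owns a mutex and takes it in every method
that touches shared state. FlowControlRegistry and its members are
hypothetical names for illustration, not the real
flow-control-manager interface.

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical registry of per-consumer buffer sizes. Callers do not
// need to hold any external lock (such as a ConnsLock) to use it.
class FlowControlRegistry {
public:
    void add(const void* consumer, std::size_t bufferSize) {
        std::lock_guard<std::mutex> guard(mapMutex);
        buffers[consumer] = bufferSize;
    }

    void remove(const void* consumer) {
        std::lock_guard<std::mutex> guard(mapMutex);
        buffers.erase(consumer);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mapMutex);
        return buffers.size();
    }

private:
    mutable std::mutex mapMutex;  // guards 'buffers'
    std::map<const void*, std::size_t> buffers;
};
```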

Change-Id: Ia271a650e29983b8022850edfa193299ddd83f84
Reviewed-on: http://review.couchbase.org/63409
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19255: Modify return types in two DcpMockProducer functions 03/63303/3
Daniel Owen [Mon, 25 Apr 2016 11:20:35 +0000 (12:20 +0100)]
MB-19255: Modify return types in two DcpMockProducer functions

Relates to comments from the following commit that needed to be
addressed: e56a8faa594342eae4e8bfed83ee87bc5db5317f

For getNoopPendingRecv() and getNoopEnabled() return "bool"
instead of Couchbase::RelaxedAtomic<bool>.

Change-Id: I59f5505862d39521f37f424a462498271e8d01a3
Reviewed-on: http://review.couchbase.org/63303
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-19255: Simplify the control-flow of maybeSendNoop 29/63229/8
Daniel Owen [Fri, 22 Apr 2016 12:14:15 +0000 (13:14 +0100)]
MB-19255: Simplify the control-flow of maybeSendNoop

Change-Id: If0932bcc2faffdb633ca80c6ffa42a34683e9ef4
Reviewed-on: http://review.couchbase.org/63229
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19255: Only update sendTime if successfully send noop 72/63172/12
Daniel Owen [Thu, 21 Apr 2016 09:51:48 +0000 (10:51 +0100)]
MB-19255: Only update sendTime if successfully send noop

In the maybeSendNoop function when a DCP producer attempts
to send a noop to a consumer it can receive back
ENGINE_SUCCESS or ENGINE_E2BIG.

We should only set pendingRecv to true and update the
last sendTime if ENGINE_SUCCESS is returned.
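
A minimal sketch of that control flow, under stated assumptions:
Status stands in for the ENGINE_SUCCESS / ENGINE_E2BIG return codes,
and NoopState and maybeSendNoopSketch are hypothetical names rather
than the actual producer code.

```cpp
// Stand-ins for ENGINE_SUCCESS and ENGINE_E2BIG.
enum class Status { Success, TooBig };

// Hypothetical subset of the producer's noop bookkeeping.
struct NoopState {
    bool pendingRecv = false;
    long sendTime = 0;
};

// 'send' is any callable that attempts to send the noop and returns
// a Status. The bookkeeping is only touched if the send succeeded.
template <typename SendFn>
Status maybeSendNoopSketch(NoopState& state, long now, SendFn send) {
    const Status ret = send();
    if (ret == Status::Success) {
        state.pendingRecv = true;
        state.sendTime = now;
    }
    return ret;
}
```

If the send reports TooBig the state is left untouched, so the noop
is simply attempted again on a later call.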

Change-Id: Ice8a66dcae35505d7bab7d261f080d5ffb95c8e3
Reviewed-on: http://review.couchbase.org/63172
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19183: Clean-up connection handlers in DCP tests 52/62952/2
Daniel Owen [Mon, 18 Apr 2016 12:31:47 +0000 (13:31 +0100)]
MB-19183: Clean-up connection handlers in DCP tests

Disconnect the connection handler cleanly instead of
just calling reset.

This results in the mock_cookie_release being called
which will free the cookie.  Therefore we do not
need to explicitly delete the cookie at the end of
each test.

Change-Id: Icd1ea5732045c350471c067c6685a2364cd2a2c2
Reviewed-on: http://review.couchbase.org/62952
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call 25/62925/3 v4.1.1
Manu Dhundi [Fri, 15 Apr 2016 20:54:42 +0000 (13:54 -0700)]
MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call

For replica vbucket we must send snapshotEnd received in the last snapshotMarker
as the high seqno. Sending lastClosedChkSeqno can cause problems for view engine
which builds an index from replica vbucket.

Previously this was sent correctly in seqno stats, now adding it for
GET_ALL_VB_SEQNOS as well.

Change-Id: Ifad267521184c4976e1cb194e6814b56963298b0
Reviewed-on: http://review.couchbase.org/62925
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call 29/62929/3
Manu Dhundi [Fri, 15 Apr 2016 22:04:48 +0000 (15:04 -0700)]
MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call

For replica vbucket we must send snapshotEnd received in the last snapshotMarker
as the high seqno. Sending lastClosedChkSeqno can cause problems for view engine
which builds an index from replica vbucket.

Previously this was sent correctly in seqno stats, now adding it for
GET_ALL_VB_SEQNOS as well.

Change-Id: I245f345f2f85fe693831f0dbdfdeede31ae638ba
Reviewed-on: http://review.couchbase.org/62929
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
4 years agoMB-19029: Wait for consumer task to run before checking backoffs 08/62808/5
Manu Dhundi [Wed, 13 Apr 2016 22:43:08 +0000 (15:43 -0700)]
MB-19029: Wait for consumer task to run before checking backoffs

The backoff stat is updated only after the consumer processor task runs.
Hence it is better to wait for the stat to reach the expected value than
to query it once.
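
The difference between querying and waiting can be sketched with a
small polling helper. This is an illustration only: waitForStatSketch
is a hypothetical name, and the stat is read through a caller-supplied
function rather than through the real testsuite APIs.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll 'readStat' until it returns 'expected' or 'timeout' elapses.
// Returns true if the expected value was observed.
bool waitForStatSketch(const std::function<int()>& readStat, int expected,
                       std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (readStat() != expected) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up: the stat never reached 'expected'
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}
```

A single query races with the background task that updates the stat;
the loop tolerates the task running slightly later.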

Change-Id: I8c88f76d5ac6d6623ae5b3681438a3dd6c05ea65
Reviewed-on: http://review.couchbase.org/62808
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Manu Dhundi <manu@couchbase.com>
4 years agoMB-14988: Add helper funcs write_items() and write_items_upto_mem_perc() 79/62679/5
Manu Dhundi [Tue, 12 Apr 2016 21:34:29 +0000 (14:34 -0700)]
MB-14988: Add helper funcs write_items() and write_items_upto_mem_perc()

In ep_testsuite we repeatedly write a batch of items, or write items up
to a given memory usage on the server. Having helper functions do this
avoids unnecessary repetition of code.

Change-Id: Ia5b940390f35c828c0c208a79c6af7d5dbdc2bf4
Reviewed-on: http://review.couchbase.org/62679
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19124: Disconnect existing connection of the same name 44/62644/23
Daniel Owen [Mon, 11 Apr 2016 10:01:59 +0000 (11:01 +0100)]
MB-19124: Disconnect existing connection of the same name

If a DCP connection request has the same name as an
existing connection, mark the existing connection
as "doDisconnect" before creating the new connection.
ns_server relies on this behaviour.
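
A simplified sketch of this behaviour, assuming a connection map
keyed by cookie. ConnSketch and ConnMapSketch are hypothetical types
for illustration; only the "doDisconnect" flag name is taken from
the description above.

```cpp
#include <map>
#include <string>

struct ConnSketch {
    std::string name;
    bool doDisconnect = false;
};

class ConnMapSketch {
public:
    // Register a new connection for 'cookie'. Any existing connection
    // with the same name is flagged for disconnection first.
    ConnSketch& newConnection(const void* cookie, const std::string& name) {
        for (auto& entry : conns) {
            if (entry.second.name == name) {
                entry.second.doDisconnect = true;
            }
        }
        ConnSketch& conn = conns[cookie];
        conn.name = name;
        conn.doDisconnect = false;
        return conn;
    }

    const ConnSketch& find(const void* cookie) const {
        return conns.at(cookie);
    }

private:
    std::map<const void*, ConnSketch> conns;
};
```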

Change-Id: I008253ad9247a56db21baaaccce9f24df5ff7711
Reviewed-on: http://review.couchbase.org/62644
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19086: Do not read from backfill if the scanBuffer is full 63/62463/6
Manu Dhundi [Fri, 8 Apr 2016 18:15:29 +0000 (11:15 -0700)]
MB-19086: Do not read from backfill if the scanBuffer is full

While backfilling, we want to limit the number of bytes read in one
run of the backfill (for one vbucket). This commit addresses a bug in
that logic.

To test this we need to check how many times the backfill task runs.
To do this as part of the commit, code to read histogram stats in
ep_testsuite is added.

Change-Id: Ia5f653325583ebae32e1b858924c29327e035318
Reviewed-on: http://review.couchbase.org/62463
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19033: Release cookie if newProducer/newConsumer fails 15/62515/8
Daniel Owen [Mon, 4 Apr 2016 15:10:23 +0000 (16:10 +0100)]
MB-19033: Release cookie if newProducer/newConsumer fails

In the dcpOpen function we first call reserveCookie on a
connection object and then call newProducer/newConsumer
to create a new connection.

If the newProducer/newConsumer fails to create a new
connection and instead returns a nullptr we must call
releaseCookie before returning ENGINE_DISCONNECT.
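
The reserve/release pairing can be sketched as below. The types and
the reference count are simplified assumptions made for illustration
(the real cookie is opaque and managed by the server), and
dcpOpenSketch is a hypothetical name.

```cpp
#include <memory>

// Stand-ins for ENGINE_SUCCESS and ENGINE_DISCONNECT.
enum class Result { Success, Disconnect };

// Hypothetical cookie with a visible reference count.
struct Cookie {
    int refCount = 0;
};
void reserveCookie(Cookie& c) { ++c.refCount; }
void releaseCookie(Cookie& c) { --c.refCount; }

struct Connection {};

// 'newConnection' plays the role of newProducer/newConsumer and may
// return a null pointer on failure.
template <typename Factory>
Result dcpOpenSketch(Cookie& cookie, Factory newConnection,
                     std::unique_ptr<Connection>& out) {
    reserveCookie(cookie);
    out = newConnection();
    if (!out) {
        // Creation failed: undo the reservation before bailing out,
        // otherwise the cookie is never released.
        releaseCookie(cookie);
        return Result::Disconnect;
    }
    return Result::Success;
}
```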

A test is also added to ep_testsuite_dcp that provides a
regression test for the fix.

Change-Id: I1aceea01ae0e764f4118e4a5e5b29e2aa8ff30f0
Reviewed-on: http://review.couchbase.org/62515
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMerge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock' 98/62598/1
abhinavdangeti [Fri, 8 Apr 2016 03:03:22 +0000 (20:03 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x' into 'couchbase/sherlock'

couchbase/3.0.x:
|\
| * 771859f MB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios
| * 6811030 MB-19075: Remove printing of empty string in CouchKVStore::getMulti()

Change-Id: I06717dcaad2b764582ebe378a6f9023f7ca76d88

4 years agoMB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios 98/62498/4
abhinavdangeti [Fri, 15 Jan 2016 16:36:52 +0000 (08:36 -0800)]
MB-19093 [BP]: [ActiveStream] Address potential lock-inversion scenarios

Acquire the vbucket state lock only when really necessary
in the ActiveStream context. Also, avoid acquiring one
lock while holding the other wherever possible in that
context.

This change is to avert potential deadlocks due to
lock inversion that will be induced by upcoming changes.
The scenarios are:
(i)     Locking between streamsMutex, streamMutex and
        vb_stateLock in the set operation - handle
        response scenario.
        (http://factory.couchbase.com/job/ep-engine-threadsanitizer-master/1225/console)
(ii)    In case of a set operation, vb_stateLock is
        acquired and then streamMutex is acquired for
        notification. During markDiskSnapshot, the
        streamMutex is acquired before the vb_stateLock
        lock is acquired.
        (http://factory.couchbase.com/job/ep-engine-threadsanitizer-master/1268/console)
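
One way to avoid holding one lock inside the other, sketched under
simplified assumptions: copy what is needed under the first lock,
release it, and only then take the second. StreamSketch and its
members are hypothetical; stateLock merely stands in for
vb_stateLock.

```cpp
#include <mutex>

struct StreamSketch {
    std::mutex stateLock;    // stands in for vb_stateLock
    std::mutex streamMutex;
    int vbState = 1;             // guarded by stateLock
    int lastNotifiedState = 0;   // guarded by streamMutex

    void notify() {
        int snapshot;
        {
            std::lock_guard<std::mutex> guard(stateLock);
            snapshot = vbState;
        }  // stateLock is released here, before streamMutex is taken

        std::lock_guard<std::mutex> guard(streamMutex);
        lastNotifiedState = snapshot;
    }
};
```

Because the two locks are never held at the same time on this path,
it cannot take part in an inversion with a path that acquires them in
the opposite order. The trade-off is that the snapshot may be stale
by the time it is used.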

(Already reviewed at: http://review.couchbase.org/58557)

Change-Id: I5e5a3e2cc5ba9ae17090e1a3ee4bde100d305f1c
Reviewed-on: http://review.couchbase.org/62498
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-16337: Enable test_access_scanner unit test 20/62520/4
Norair Khachiyan [Wed, 6 Apr 2016 21:42:03 +0000 (14:42 -0700)]
MB-16337: Enable test_access_scanner unit test

This test was disabled a few weeks ago as it failed on CV runs.
With the latest builds the failure is reproducible neither on a
local AIX box nor in an Ubuntu container.
The failure is also not reproducible on Jenkins runs.
Enabling this test on regular CV runs for now.

Change-Id: Ic7852a0d90c01c9571de596567af7d61b3d31b92
Reviewed-on: http://review.couchbase.org/62520
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoAddress data race seen with the module dcp tests 27/62527/2
abhinavdangeti [Wed, 6 Apr 2016 23:56:02 +0000 (16:56 -0700)]
Address data race seen with the module dcp tests

16:50:07 WARNING: ThreadSanitizer: data race (pid=36686)
16:50:07   Write of size 8 at 0x7d0c00003ad8 by main thread (mutexes: write M130807):
16:50:07     #0 operator delete(void*) <null> (ep-engine_dcp_test+0x00000047357b)
16:50:07     #1 std::_Rb_tree<void const*, std::pair<void const* const, DcpConsumer*>, std::_Select1st<std::pair<void const* const, DcpConsumer*> >, std::less<void const*>, std::allocator<std::pair<void const* const, DcpConsumer*> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<void const* const, DcpConsumer*> >) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/ext/new_allocator.h:110 (ep-engine_dcp_test+0x00000052ad65)
16:50:07     #2 FlowControl::~FlowControl() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/flow-control.cc:44 (ep-engine_dcp_test+0x000000528ea1)
16:50:07     #3 DcpConsumer::~DcpConsumer() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/consumer.cc:133 (ep-engine_dcp_test+0x00000051b223)
16:50:07     #4 DcpConsumer::~DcpConsumer() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/consumer.cc:130 (ep-engine_dcp_test+0x00000051b815)
16:50:07     #5 Processer::~Processer() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/atomic.h:272 (ep-engine_dcp_test+0x00000051a51c)
16:50:07     #6 ~SingleThreadedRCPtr /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/atomic.h:272 (ep-engine_dcp_test+0x0000005ba662)
16:50:07     #7 ExecutorPool::stopTaskGroup(unsigned long, task_type_t, bool) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorpool.cc:594 (ep-engine_dcp_test+0x0000005badee)
16:50:07     #8 EventuallyPersistentStore::~EventuallyPersistentStore() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep.cc:565 (ep-engine_dcp_test+0x00000054f302)
16:50:07     #9 EventuallyPersistentEngine::~EventuallyPersistentEngine() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep_engine.cc:6349 (ep-engine_dcp_test+0x00000059bf7a)
16:50:07     #10 EvpDestroy(engine_interface*, bool) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep_engine.cc:148 (ep-engine_dcp_test+0x00000057e2b7)
16:50:07     #11 DCPTest::TearDown() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/tests/module_tests/dcp_test.cc:126 (ep-engine_dcp_test+0x0000004e5737)
16:50:07     #12 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:2402 (ep-engine_dcp_test+0x0000006d7792)
16:50:07     #13 testing::Test::Run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:2482 (ep-engine_dcp_test+0x0000006a4e31)
16:50:07     #14 testing::TestInfo::Run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:2656 (ep-engine_dcp_test+0x0000006a6e0b)
16:50:07     #15 testing::TestCase::Run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:2774 (ep-engine_dcp_test+0x0000006a78ba)
16:50:07     #16 testing::internal::UnitTestImpl::RunAllTests() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:4649 (ep-engine_dcp_test+0x0000006b84f3)
16:50:07     #17 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:2402 (ep-engine_dcp_test+0x0000006d8902)
16:50:07     #18 testing::UnitTest::Run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/src/gtest.cc:4257 (ep-engine_dcp_test+0x0000006b7ad0)
16:50:07     #19 main /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/googletest/googletest/include/gtest/gtest.h:2237 (ep-engine_dcp_test+0x0000004e2cc0)
16:50:07
16:50:07   Previous read of size 8 at 0x7d0c00003ad8 by thread T5 (mutexes: write M130950):
16:50:07     #0 DcpFlowControlManagerAggressive::handleDisconnect(DcpConsumer*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/flow-control-manager.cc:211 (ep-engine_dcp_test+0x00000052aea8)
16:50:07     #1 FlowControl::~FlowControl() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/flow-control.cc:44 (ep-engine_dcp_test+0x000000528ea1)
16:50:07     #2 DcpConsumer::~DcpConsumer() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/consumer.cc:133 (ep-engine_dcp_test+0x00000051b223)
16:50:07     #3 DcpConsumer::~DcpConsumer() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/dcp/consumer.cc:130 (ep-engine_dcp_test+0x00000051b815)
16:50:07     #4 Processer::~Processer() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/atomic.h:272 (ep-engine_dcp_test+0x00000051a51c)
16:50:07     #5 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/atomic.h:325 (ep-engine_dcp_test+0x0000005bfa5b)
16:50:07     #6 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep-engine_dcp_test+0x0000005bf885)
16:50:07     #7 platform_thread_wrap(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x00000000568b)

Change-Id: I84860030d78d0bc2e5010255e8ba30bec6109719
Reviewed-on: http://review.couchbase.org/62527
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
4 years agoMB-19033 Allow connections of the same name 61/62361/18
Daniel Owen [Mon, 4 Apr 2016 14:52:25 +0000 (15:52 +0100)]
MB-19033 Allow connections of the same name

It was thought that connections would not exist
in the connection map with the same name.

However the view code does create connections of
the same name.  As we index on the connection object
(cookie) as opposed to the name, we can relax the
constraints to allow connections of the same name.

Change-Id: I721c4d409d7f02119af534cbf1d887d9e65246c3
Reviewed-on: http://review.couchbase.org/62361
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-17230: Do not allow active stream creation over a dead VB 70/62470/5
abhinavdangeti [Wed, 6 Apr 2016 19:06:14 +0000 (12:06 -0700)]
MB-17230: Do not allow active stream creation over a dead VB

+ Active and Notifier streams will not be created for
  a vbucket whose state is DEAD, error response will be
  ENGINE_NOT_MY_VBUCKET.
+ Close streams after the vbucket's state has been changed,
  as part of the setVBucketState.
+ Note that acquiring the producer's streamsMutex within
  a vbucket's stateLock shouldn't cause a lock inversion as
  this pattern is followed in several other code paths, for
  example the set->queueDirty->notifyConnection codepath.
+ test case

Change-Id: I905787a74d6eafc2175f1635197bbf825988b8fb
Reviewed-on: http://review.couchbase.org/62470
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-17631: Fix intermittently failing CV test case 'test_est_vb_move' 25/62025/3
Norair Khachiyan [Fri, 25 Mar 2016 23:39:12 +0000 (16:39 -0700)]
MB-17631: Fix intermittently failing CV test case 'test_est_vb_move'

This fix was checked in to the master branch a few weeks ago. The test
occasionally fails in the watson branch as well, and this commit will
prevent that.
Here are some more details about the underlying problem.
1. Thread A.
The testcase (actually a couple of them) fails in
"CouchKVStore::getNumPersistedDeletes" right after calling
openDB() in RDONLY mode for bucket 0. It fails because the call
does not succeed: the couch-db file "0.couch.1" that it attempts
to open does not exist (to be exact, has not been created yet),
and it is not created by this call because the call uses RDONLY
mode.
2. Thread B.
This thread runs the "test_setup" proc, as part of which the
"0.couch.1" file for bucket 0 should be created. What we do here is
schedule a task which will create the file "0.couch.1" and, without
waiting for it to complete, start the testcase itself in Thread A.
So there is a race condition between threads A and B: although the
request to schedule the file-creation task in Thread B happens
before we start the testcase in Thread A, the task has not
necessarily completed by the time openDB is called in Thread A.
The code in this commit fixes the testcase by waiting until the
couch-db file "0.couch.1" has been created, so the testcase can
check that the content of this file is correct
(numPersistedDeletes == 0) for bucket 0.

Change-Id: I9573ffb86de770f98c366e13fe2866bd0002df21
Reviewed-on: http://review.couchbase.org/62025
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-19075: Remove printing of empty string in CouchKVStore::getMulti() 25/62325/7
Sriram Ganesan [Fri, 1 Apr 2016 23:12:10 +0000 (16:12 -0700)]
MB-19075: Remove printing of empty string in CouchKVStore::getMulti()

In case of an error in opening a file, an error message is logged.
But the string that is supposed to hold the name of the file is
not populated, thus resulting in an empty string getting printed.
Remove the string from the printed message, as openDB already
prints the name of the file in case of an open failure.

Change-Id: Ife3aec8381ead4f2e0b84c921a3781efa39a2126
Reviewed-on: http://review.couchbase.org/62325
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-14988: Split TestDcpConsumer::run() into multiple functions 08/62408/4
Manu Dhundi [Tue, 5 Apr 2016 18:53:48 +0000 (11:53 -0700)]
MB-14988: Split TestDcpConsumer::run() into multiple functions

Run method in TestDcpConsumer opens a DCP connection, opens streams
and streams all items from the DCP producer. However, there are cases
where you just want to open a connection but not open a stream or
open a connection and a stream but not stream items.

Hence making run() more modular enables more granular testing of DCP.

Change-Id: I9fcf3a62b6d0fa2cdba18dce4c2e3513143c5669
Reviewed-on: http://review.couchbase.org/62408
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
4 years agoMB-18974: Fix intermittant failure in test_expiration_on_warmup (3) 40/62440/2
Dave Rigby [Tue, 5 Apr 2016 14:45:21 +0000 (15:45 +0100)]
MB-18974: Fix intermittant failure in test_expiration_on_warmup (3)

There is *another* issue with the expiration_on_warmup test which
can cause it to fail intermittently - investigation shows that the
curr_items stat is not synchronous with respect to expired_pager.

Specifically, for non-temporary, expired items we call
unlocked_softDelete (soft-marking the item as deleted in the
hashtable), and then call queueDirty to queue a deletion, and then
increment the expired stat. Only when that delete is actually
persisted and the deleted callback is invoked -
PersistenceCallback::callback(int&) - is curr_items finally
decremented.

Therefore we need to wait for the flusher to settle (i.e. delete
callback to be called) for the curr_items stat to be accurate.

This is hopefully the last patch needed to address this issue (however
I did say that last time :/)

Change-Id: Iaec44e4149c6fef549036870b31c9e0631f3949b
Reviewed-on: http://review.couchbase.org/62440
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
4 years agoMB-19043: Fix memory leak in gencode 22/62422/2
Dave Rigby [Tue, 5 Apr 2016 08:08:52 +0000 (09:08 +0100)]
MB-19043: Fix memory leak in gencode

Change-Id: Ia696ac5f808cfb04b9ef7f9ebbf9360425925d54
Reviewed-on: http://review.couchbase.org/62422
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-18940: Make ExecutorPool::get() thread-safe 99/62299/4
Dave Rigby [Fri, 1 Apr 2016 15:46:17 +0000 (16:46 +0100)]
MB-18940: Make ExecutorPool::get() thread-safe

ExecutorPool::get() has a similar problem as
MemoryTracker::getInstance() (although not as bad) - there is a
potential data-race which could result in multiple ExecutorPools being
created.

The issue is that the 'instance' pointer is not atomic - this means
that the compiler /could/ reorder the assignment and object creation
so the pointer is assigned /before/ the object is constructed.
Paraphrasing from a Dr Dobbs article[1] about double-checked locking:

    Singleton* Singleton::instance() {
       if (pInstance == 0) {
           Lock lock;
           if (pInstance == 0) {
               pInstance = // Step 3
                  operator new(sizeof(Singleton)); // Step 1
               new (pInstance) Singleton; // Step 2
           }
       }
       return pInstance;
    }

    ... consider the following sequence of events:

    * Thread A enters instance, performs the first test of pInstance,
      acquires the lock, and executes the statement made up of Steps 1
      and 3. It is then suspended. At this point, pInstance is not
      null, but no Singleton object has yet been constructed in the
      memory pInstance points to.

    * Thread B enters instance, determines that pInstance is not null,
      and returns it to instance's caller. The caller then
      dereferences the pointer to access the Singleton that, oops, has
      not yet been constructed.

Fix in the same way as MemoryTracker::getInstance() - make `instance`
be an atomic variable and perform the object creation using a
temporary variable; which ensures that the ordering is correct.
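
A minimal sketch of that pattern using std::atomic, for illustration
only. PoolSketch is a hypothetical class, and the explicit memory
orderings are one reasonable choice rather than necessarily what the
patch uses.

```cpp
#include <atomic>
#include <mutex>

class PoolSketch {
public:
    static PoolSketch* get() {
        // First check, without the lock. The acquire load pairs with
        // the release store below, so a non-null pointer always refers
        // to a fully constructed object.
        PoolSketch* tmp = instance.load(std::memory_order_acquire);
        if (tmp == nullptr) {
            std::lock_guard<std::mutex> guard(initMutex);
            // Second check, under the lock: another thread may have
            // created the object while we were waiting.
            tmp = instance.load(std::memory_order_relaxed);
            if (tmp == nullptr) {
                // Construct via a temporary, then publish the pointer.
                tmp = new PoolSketch();
                instance.store(tmp, std::memory_order_release);
            }
        }
        return tmp;
    }

private:
    PoolSketch() = default;

    static std::atomic<PoolSketch*> instance;
    static std::mutex initMutex;
};

std::atomic<PoolSketch*> PoolSketch::instance{nullptr};
std::mutex PoolSketch::initMutex;
```

Repeated calls return the same pointer, and the store into `instance`
can no longer become visible before the constructor has finished.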

[1]: http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405726

Change-Id: Ic8b5c29e0c404e2a4ce0b1e62545776f97754acc
Reviewed-on: http://review.couchbase.org/62299
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
4 years agoMB-19002 Address data race in TapConnMap::shutdownAllConnections 47/62347/2
Daniel Owen [Mon, 4 Apr 2016 09:55:23 +0000 (10:55 +0100)]
MB-19002 Address data race in TapConnMap::shutdownAllConnections

Ensure the access to "deadConnections" and "all" are
protected by first taking the connsLock mutex lock.

Change-Id: I1d19ce610dc3b35edcad124c1961c5380e84eb8d
Reviewed-on: http://review.couchbase.org/62347
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
4 years agoMB-18940: Make MemoryTracker::getInstance() thread-safe 86/62286/10
Dave Rigby [Fri, 1 Apr 2016 12:26:27 +0000 (13:26 +0100)]
MB-18940: Make MemoryTracker::getInstance() thread-safe

The MemoryTracker::getInstance() function, used to obtain a pointer to
the Memory Tracker singleton is not thread-safe, and may result in two
(or more!)  MemoryTracker objects being created. This in turn can
result in multiple stats threads being spawned, which at shutdown time
can cause a crash when trying to join threads which we no longer have
correct threadIds for.

Solve this problem by making getInstance() thread-safe, by making the
instance variable atomic and using double-checked locking when creating the
singleton.

(As an aside, in C++11 we /should/ be able to simplify this code
 significantly by using 'magic statics' - e.g.:

    MemoryTracker* MemoryTracker::getInstance() {
        static MemoryTracker instance;
        return &instance;
    }

 However they are /not/ supported in MSVC 2013 - see 'Magic statics'
 row in [Support For C++11 Features][1] - so we need to use a more
 involved approach for now).

[1]: https://msdn.microsoft.com/en-us/library/hh567368(v=vs.120).aspx#concurrencytable

Change-Id: Id52c3fcd5430a8726fac03a05ff602ea073b6084
Reviewed-on: http://review.couchbase.org/62286
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-17042: Do not permit duplicate DCP producers/consumers 31/61931/12
Daniel Owen [Thu, 24 Mar 2016 12:24:28 +0000 (12:24 +0000)]
MB-17042: Do not permit duplicate DCP producers/consumers

If an attempt is made to create a new DCP producer/
consumer with the same name as an existing producer/
consumer, or there is already a producer/consumer
associated with the cookie, then return ENGINE_DISCONNECT.

Change-Id: I0ba523bae2045d62d56b50f4b22d103508b44392
Reviewed-on: http://review.couchbase.org/61931
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18974: Fix intermittant failure in test_expiration_on_warmup (2) 96/62296/2
Dave Rigby [Fri, 1 Apr 2016 15:34:40 +0000 (16:34 +0100)]
MB-18974: Fix intermittant failure in test_expiration_on_warmup (2)

There is an additional issue with the expiration_on_warmup test which
can cause it to fail intermittently - we increment the 'expired_pager'
statistic /before/ we actually expire the item. This means there is a
potential window in the test where the expired_pager statistic will
report '1', yet curr_items will not report 0 (as the item hasn't yet
been removed).

Fix by deferring the expired_pager statistic update until we have
actually expired the item.

Change-Id: Iab7bf30edbea049efccd0746d6208218e931c205
Reviewed-on: http://review.couchbase.org/62296
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18135: Ensure manageConnections cleans up all deadConnections 42/61942/11
Daniel Owen [Thu, 24 Mar 2016 15:02:13 +0000 (15:02 +0000)]
MB-18135: Ensure manageConnections cleans up all deadConnections

Dead connections are cleaned-up by manageConnections.
manageConnections is invoked in the run() of ConnManager,
which is a NONIO Task.  The task has a MIN_SLEEP_TIME of 2s,
which means dead connections will only be cleaned up at most
once every two seconds.

The ConnManager task stops being put back on the run list if
isShutdown is set to true.

isShutdown is set to true at the start of the destroy function,
just before calling shutdownAllConnections. shutdownAllConnections
will cause connections to be disconnected and hence added to the
deadConnections list.  It is therefore essential that
manageConnections is called to clean up these dead connections.

It was originally thought that the clean-up of dead connections
could be best achieved by calling manageConnections at the end
of the shutdownAllConnections.  However this is not sufficient.

Instead we need to ensure that ConnManager task keeps running
until the all list and deadConnections list are empty.
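
The rescheduling condition can be sketched as follows. This is a
simplified model, not the real task: ConnManagerSketch is a
hypothetical type, connections are represented by plain integers,
and run() returns whether the task should be put back on the run
list.

```cpp
#include <list>

struct ConnManagerSketch {
    bool isShutdown = false;
    std::list<int> all;              // live connections
    std::list<int> deadConnections;  // awaiting clean-up

    // Stand-in for the real clean-up of dead connections.
    void manageConnections() { deadConnections.clear(); }

    // Returns true if the task should be rescheduled. After shutdown
    // has been requested it keeps running until both lists are empty.
    bool run() {
        manageConnections();
        return !isShutdown || !all.empty() || !deadConnections.empty();
    }
};
```

With isShutdown set and a connection still in the all list, run()
keeps returning true; once that connection has moved to
deadConnections and been cleaned up, the task stops.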

In addition the shutdownAllConnections function clears the
all list and map_.  It then calls releaseReference on each
connection waiting for them to disconnect.  However the
disconnect function checks to see if the connection is in the
map_ before disconnecting.  Therefore we should just let the
connections disconnect using the disconnect function, and the
connections will then be added to the deadConnections list and
cleaned-up in manageConnections.

Change-Id: I7e6c577f30b862e22437f381a3c0106cb72b3e96
Reviewed-on: http://review.couchbase.org/61942
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18974: Fix intermittent failure in test_expiration_on_warmup 34/62234/2
Dave Rigby [Thu, 31 Mar 2016 11:28:38 +0000 (11:28 +0000)]
MB-18974: Fix intermittent failure in test_expiration_on_warmup

This test attempts to check that after a restart, the expiry pager is
run and will remove items which have now expired. However, it checks
the 'ep_num_expiry_pager_runs' stat to determine when the expiry pager has
been run - except this stat is incremented at the /start/ of the
expiry pager Task, and hence there is a race condition whereby even if
this stat is non-zero it may not yet have visited all items.

Fix by instead checking the 'ep_expired_pager' stat which counts how
many items have been expired - and crucially is updated when the item
is actually expired.

Change-Id: I0bc3fb79b8bed78e88d1d6ed7fd468b113705846
Reviewed-on: http://review.couchbase.org/62234
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18666 Add bytes written/read stats for compaction 43/61943/9
Will Gardner [Thu, 24 Mar 2016 16:18:40 +0000 (16:18 +0000)]
MB-18666 Add bytes written/read stats for compaction

The io_total_read_bytes and io_total_write_bytes
stats do not currently include the destination file
during compaction.

This commit amends those stats to include compaction and
additionally adds separate stats for the total bytes
written/read for compaction alone, io_compaction_read_bytes
and io_compaction_write_bytes.

Change-Id: I8c33cce5d2049329f88b445e9f7812b3310a12c4
Reviewed-on: http://review.couchbase.org/61943
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18956: Fix data race with ActiveStream::takeoverStart 67/62167/2
Dave Rigby [Wed, 30 Mar 2016 13:13:33 +0000 (14:13 +0100)]
MB-18956: Fix data race with ActiveStream::takeoverStart

As reported by ThreadSanitizer:

    WARNING: ThreadSanitizer: data race (pid=64147)
    Read of size 4 at 0x7d480000b9d4 by thread T11 (mutexes: write M13543, read M17306):
    #0 ActiveStream::addStats(void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) ep-engine/src/dcp/stream.cc:545 (ep.so+0x000000076f03)
    #1 DcpProducer::addStats(void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) ep-engine/src/dcp/producer.cc:710 (ep.so+0x00000006b1d7)
    #2 ConnStatBuilder::operator()(SingleThreadedRCPtr<ConnHandler>&) ep-engine/src/ep_engine.cc:3880 (ep.so+0x0000000e58d1)
    #3 EventuallyPersistentEngine::doDcpStats(void const*, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) ep-engine/src/ep_engine.cc:4137 (ep.so+0x0000000c518a)
    ...
    Previous write of size 4 at 0x7d480000b9d4 by main thread (mutexes: write M18112):
    #0 ActiveStream::transitionState(stream_state_t) ep-engine/src/dcp/stream.cc:1033 (ep.so+0x000000073888)
    #1 ActiveStream::setVBucketStateAckRecieved() ep-engine/src/dcp/stream.cc:394 (ep.so+0x000000076185)
    #2 DcpProducer::handleResponse(protocol_binary_response_header*) ep-engine/src/dcp/producer.cc:608 (ep.so+0x00000006a3c3)

Change-Id: Idfd91e996aa94be0dab3e8137e189e2f5435be41
Reviewed-on: http://review.couchbase.org/62167
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years ago[BP] Increase grace period for tap tests & xdcr tests 74/62174/3
abhinavdangeti [Wed, 16 Mar 2016 19:34:54 +0000 (12:34 -0700)]
[BP] Increase grace period for tap tests & xdcr tests

- Tap tests: From 120 to 180
- XDCR tests: From 30 to 120

Change-Id: Ie2228aa53684e4ad94ea5d8979521a77bca8b30b
Reviewed-on: http://review.couchbase.org/62174
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-12029: Add memcached log message when we delete access log file 39/62139/2
Manu Dhundi [Tue, 29 Mar 2016 23:29:45 +0000 (16:29 -0700)]
MB-12029: Add memcached log message when we delete access log file

Change-Id: I33169acdd1ebb7ead0a30f186fdc591fcb4cbab6
Reviewed-on: http://review.couchbase.org/62139
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
4 years agoMB-18679: Check for vbucket file creation or deletion 58/61658/11
Sriram Ganesan [Thu, 17 Mar 2016 18:01:32 +0000 (11:01 -0700)]
MB-18679: Check for vbucket file creation or deletion

Before making statistics call to a vbucket file, we need
to ensure that call is not made when the vbucket is
being created or deleted.

Change-Id: Id20fbffd93dc502b7584f0e4f1244c2be88de1a7
Reviewed-on: http://review.couchbase.org/61658
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoFix memory errors in tests detected by AddressSanitizer 44/61644/2
Dave Rigby [Thu, 17 Mar 2016 15:04:08 +0000 (15:04 +0000)]
Fix memory errors in tests detected by AddressSanitizer

Fix two issues detected by AddressSanitizer:

* Incorrect size used for checkpoint_it for TAP_CHECKPOINT_START
* Incorrect offset for byseq in test_est_vb_move

Change-Id: Ife2b49de0a978c133b9735ce4f8332dc25880569
Reviewed-on: http://review.couchbase.org/61644
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18650: Remove 250ms wait in statsThread - 50% speedup in tests 93/61493/4
Dave Rigby [Tue, 15 Mar 2016 16:49:24 +0000 (16:49 +0000)]
MB-18650: Remove 250ms wait in statsThread - 50% speedup in tests

The MemoryTracker stats thread updates statistics every 250ms. To
implement this it uses usleep() to sleep for 250ms between
updates. This has the disadvantage that it only checks whether it should
shut itself down between sleeps, which has the consequence that
ep_engine essentially takes a minimum of 250ms to shut down.

Change the implementation of this to use a timed condition variable,
which allows us to immediately notify the stats thread that it should
shutdown, but still allows us to wait for 250ms between stat updates.
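
A minimal sketch of the timed-condition-variable pattern described above, assuming a standalone worker class. PeriodicUpdater and its members are hypothetical names for illustration and are not the MemoryTracker implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical periodic worker: does one "update" per interval, but can
// be woken immediately when shutdown is requested.
class PeriodicUpdater {
public:
    explicit PeriodicUpdater(std::chrono::milliseconds interval)
        : period(interval), worker([this] { loop(); }) {}

    ~PeriodicUpdater() { shutdown(); }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stop = true;
        }
        cv.notify_one();  // wake the worker without waiting out the period
        if (worker.joinable()) {
            worker.join();
        }
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mutex);
        while (!stop) {
            ++updates;  // stand-in for the real statistics update
            // Sleeps for up to one period, but returns early if stop is set.
            cv.wait_for(lock, period, [this] { return stop; });
        }
    }

    const std::chrono::milliseconds period;
    std::mutex mutex;
    std::condition_variable cv;
    bool stop = false;
    int updates = 0;
    // Declared last so the members above exist before the thread starts.
    std::thread worker;
};
```

With a plain sleep the destructor would block for the rest of the current period; here it should return as soon as the worker observes the stop flag.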

The net effect of this is *all* shutdowns of ep_engine are 250ms
faster. For ep_testsuite.so alone (which has 163 testcases) this
reduces the runtime from 79s to 32s. Similar speedups are seen on
other suites. In total this reduces the total runtime of ep-engine's
unit tests (measured as `ctest -j8`) from:

    Total Test time (real) =  95.15 sec

to:

    Total Test time (real) =  62.42 sec

Change-Id: I021fd4169a386175299df5a335363e46c345d58c
Reviewed-on: http://review.couchbase.org/61493
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18510: Force a copy of cachedTableJSON (std::string) 88/61488/5
Jim Walker [Tue, 15 Mar 2016 14:39:12 +0000 (14:39 +0000)]
MB-18510: Force a copy of cachedTableJSON (std::string)

ThreadSanitizer flagged that there is a data-race accessing
the cachedTableJSON string in the failover tables. The code
is correct but we are falling foul of an "optimisation" that
certain implementations[1] of std::string perform, that is to
provide a COW implementation. Thus we look like we are creating
a copy of an object, but under the covers a shared string is
being accessed.

A fix is to explicitly construct a new std::string which is a copy
of cachedTableJSON.
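
One way to force a deep copy is sketched below, under the assumption that constructing from a pointer and length always allocates a fresh buffer, even with a copy-on-write std::string. FailoverTableSketch and its methods are hypothetical, and the constructor actually used by the fix is not stated here.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical holder of a cached JSON string guarded by a mutex.
class FailoverTableSketch {
public:
    void setJSON(const std::string& json) {
        std::lock_guard<std::mutex> lock(mutex);
        cachedTableJSON = json;
    }

    // Returns a string with its own buffer. Building it from
    // (pointer, length) copies the characters rather than sharing a
    // reference-counted representation with cachedTableJSON.
    std::string toJSON() const {
        std::lock_guard<std::mutex> lock(mutex);
        return std::string(cachedTableJSON.c_str(), cachedTableJSON.size());
    }

private:
    mutable std::mutex mutex;
    std::string cachedTableJSON;
};
```

The returned value can then be read or destroyed on another thread without touching storage owned by the cached member.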

TSAN output:

==================
WARNING: ThreadSanitizer: data race (pid=18190)
  Write of size 8 at 0x7d100009e018 by thread T10:
    #0 operator delete(void*) <null>:0 (engine_testapp+0x00000009378b)
    #1 <null> <null>:0 (libstdc++.so.6+0x0000000cd8af)
    #2 VBucket::~VBucket() /home/couchbase/couchbase/ep-engine/src/vbucket.cc:143 (ep.so+0x0000003651c7)
    #3 RCPtr<VBucket>::swap(VBucket*) /home/couchbase/couchbase/ep-engine/src/atomic.h:245 (ep.so+0x00000008c37c)
    #4 RCPtr<VBucket>::reset(VBucket*) /home/couchbase/couchbase/ep-engine/src/atomic.h:198 (ep.so+0x000000212995)
    #5 VBucketMemoryDeletionTask::run() /home/couchbase/couchbase/ep-engine/src/ep.cc:235 (ep.so+0x0000002126f1)
    #6 ExecutorThread::run() /home/couchbase/couchbase/ep-engine/src/executorthread.cc:115 (ep.so+0x0000002bdb8c)
    #7 launch_executor_thread(void*) /home/couchbase/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000002bd3b8)
    #8 CouchbaseThread::run() /home/couchbase/couchbase/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x00000000d0a6)
    #9 platform_thread_wrap(void*) /home/couchbase/couchbase/platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x00000000984d)

  Previous read of size 8 at 0x7d100009e018 by thread T6 (mutexes: write M13528):
    #0 memcmp <null>:0 (engine_testapp+0x00000009428e)
    #1 <null> <null>:0 (libstdc++.so.6+0x0000000cd252)
    #2 KVStore::updateCachedVBState(unsigned short, vbucket_state const&) /home/couchbase/couchbase/ep-engine/src/kvstore.cc:81 (ep.so+0x0000003ae46e)
    #3 _ZN12CouchKVStore15snapshotVBucketEtR13vbucket_stateP8CallbackIJ10KVStatsCtxEEb /home/couchbase/couchbase/ep-engine/src/couch-kvstore/couch-kvstore.cc:980 (ep.so+0x0000003c1a5d)
    #4 EventuallyPersistentStore::persistVBState(Priority const&, unsigned short) /home/couchbase/couchbase/ep-engine/src/ep.cc:1271 (ep.so+0x0000001c2ae1)
    #5 VBStatePersistTask::run() /home/couchbase/couchbase/ep-engine/src/tasks.cc:88 (ep.so+0x000000351b6e)
    #6 ExecutorThread::run() /home/couchbase/couchbase/ep-engine/src/executorthread.cc:115 (ep.so+0x0000002bdb8c)
    #7 launch_executor_thread(void*) /home/couchbase/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000002bd3b8)
    #8 CouchbaseThread::run() /home/couchbase/couchbase/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x00000000d0a6)
    #9 platform_thread_wrap(void*) /home/couchbase/couchbase/platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x00000000984d)

  Mutex M13528 (0x7db40000a000) created at:
    #0 pthread_mutex_trylock <null>:0 (engine_testapp+0x000000097cb0)
    #1 __gthread_mutex_trylock(pthread_mutex_t*) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:757 (ep.so+0x0000000983d0)
    #2 std::mutex::try_lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/mutex:146 (ep.so+0x00000009d361)
    #3 LockHolder::trylock() /home/couchbase/couchbase/ep-engine/src/locks.h:81 (ep.so+0x00000009d1c3)
    #4 LockHolder::LockHolder(std::mutex&, bool) /home/couchbase/couchbase/ep-engine/src/locks.h:48 (ep.so+0x00000009cb51)
    #5 EventuallyPersistentStore::persistVBState(Priority const&, unsigned short) /home/couchbase/couchbase/ep-engine/src/ep.cc:1249 (ep.so+0x0000001c26a7)
    #6 VBStatePersistTask::run() /home/couchbase/couchbase/ep-engine/src/tasks.cc:88 (ep.so+0x000000351b6e)
    #7 ExecutorThread::run() /home/couchbase/couchbase/ep-engine/src/executorthread.cc:115 (ep.so+0x0000002bdb8c)
    #8 launch_executor_thread(void*) /home/couchbase/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000002bd3b8)
    #9 CouchbaseThread::run() /home/couchbase/couchbase/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x00000000d0a6)
    #10 platform_thread_wrap(void*) /home/couchbase/couchbase/platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x00000000984d)

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334#c47

Change-Id: I91be44ea26750a8ce9b7e6060219b80b12f38ad5
Reviewed-on: http://review.couchbase.org/61488
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18561: Initialize processerNotification variable in DCP consumer. 18/61518/3
Manu Dhundi [Tue, 15 Mar 2016 22:01:40 +0000 (15:01 -0700)]
MB-18561: Initialize processerNotification variable in DCP consumer.

If the boolean variable processerNotification is left uninitialized,
its value is indeterminate and it sometimes reads as true, which results
in the processor task not being woken up.
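
A small sketch of why the initial value matters, assuming the flag means "a wake-up is already pending". ConsumerSketch and its methods are hypothetical and only model the behaviour described above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical consumer with a "notification pending" flag.
class ConsumerSketch {
public:
    // Explicitly start with no notification pending. Without this, the
    // flag could start as true and every wake-up would be skipped.
    ConsumerSketch() : processerNotification(false) {}

    // Returns true if the caller should wake the processor task, i.e.
    // only when no wake-up was already pending.
    bool notifyProcesser() {
        bool expected = false;
        return processerNotification.compare_exchange_strong(expected, true);
    }

    // Called by the processor task once it has run.
    void processerRan() { processerNotification.store(false); }

private:
    std::atomic<bool> processerNotification;
};
```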

Change-Id: I91450b19fde7a7908c8ee7eb7135155f7d78996a
Reviewed-on: http://review.couchbase.org/61518
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18661: DcpConsumer to ACK back whenever necessary 95/61295/6
abhinavdangeti [Thu, 10 Mar 2016 23:54:33 +0000 (15:54 -0800)]
MB-18661: DcpConsumer to ACK back whenever necessary

With immediate processing of received items at the consumer,
memcached needs to be notified to visit the consumer's
step function to send buffer acknowledgement to the producer,
whenever data is processed.

Change-Id: I5c0c90a0018ce662746fce46ae68e4a5f604ca60
Reviewed-on: http://review.couchbase.org/61295
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18650: Optimize get_stat() calls in ep_test_apis 18/61418/7
Dave Rigby [Fri, 11 Mar 2016 16:53:03 +0000 (16:53 +0000)]
MB-18650: Optimize get_stat() calls in ep_test_apis

The get_{str,int,bool,...}_stat() functions in ep_test_apis currently
construct a map of every stat returned back from ep_engine, and then
find the one stat the user requested at the end. This is one of the
main reasons why the get_XXX_stat calls are expensive, and for example
we have tried to minimise their use in tight loops by only checking
stats every N iterations.

While there are some cases where tests need multiple stats in a given
stat group, the vast majority of use-cases only care about a single
stat. This patch therefore:

* Optimizes the get_XXX_stat functions to use a callback which checks
  the key, and only records the result for the requested key. This
  speeds them up, particularly when running under tools like Valgrind
  or ThreadSanitizer.

* Adds a new get_all_stats() function for use-cases where all stats
  from a given group are needed; it returns a map of the stats so the
  test can check multiple keys.
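
The two callback styles can be sketched as follows. The callback signature mirrors the add-stat function pointer type visible in the ThreadSanitizer traces in this log; emit_stats(), SingleStatRequest and the stat values are hypothetical stand-ins for the engine and test harness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>

using AddStat = void (*)(const char* key, std::uint16_t klen,
                         const char* val, std::uint32_t vlen,
                         const void* cookie);

// Hypothetical stand-in for the engine emitting one stat group.
void emit_stats(AddStat add_stat, const void* cookie) {
    static const char* const stats[][2] = {
        {"curr_items", "42"}, {"mem_used", "1024"}, {"ep_max_size", "2097152"}};
    for (const auto& kv : stats) {
        add_stat(kv[0], static_cast<std::uint16_t>(std::strlen(kv[0])),
                 kv[1], static_cast<std::uint32_t>(std::strlen(kv[1])), cookie);
    }
}

struct SingleStatRequest {
    std::string key;
    std::string value;
    bool found = false;
};

// Records a value only when the key matches; other stats are ignored,
// so no per-stat strings or map nodes are allocated.
void add_single_stat(const char* key, std::uint16_t klen, const char* val,
                     std::uint32_t vlen, const void* cookie) {
    auto* req = static_cast<SingleStatRequest*>(const_cast<void*>(cookie));
    if (req->key.size() == klen && std::memcmp(req->key.data(), key, klen) == 0) {
        req->value.assign(val, vlen);
        req->found = true;
    }
}

// Records every stat, for tests that need several keys from one group.
void add_all_stats(const char* key, std::uint16_t klen, const char* val,
                   std::uint32_t vlen, const void* cookie) {
    auto* out = static_cast<std::map<std::string, std::string>*>(
        const_cast<void*>(cookie));
    (*out)[std::string(key, klen)] = std::string(val, vlen);
}

int get_int_stat(const std::string& key) {
    SingleStatRequest req;
    req.key = key;
    emit_stats(add_single_stat, &req);
    if (!req.found) {
        throw std::out_of_range("stat not found: " + key);
    }
    return std::stoi(req.value);
}

std::map<std::string, std::string> get_all_stats() {
    std::map<std::string, std::string> all;
    emit_stats(add_all_stats, &all);
    return all;
}
```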

This reduces the total runtime of ep-engine unit tests under
ThreadSanitizer from:

    user    22m21.218s

to:

    user    19m45.035s

or a 12% reduction in user time.

Change-Id: Ic8847943ec631c865734d9e873d453db283e3d86
Reviewed-on: http://review.couchbase.org/61418
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18476: Handle write failures more gracefully in the mutation log 68/61468/5
Sriram Ganesan [Tue, 8 Mar 2016 22:08:50 +0000 (14:08 -0800)]
MB-18476: Handle write failures more gracefully in the mutation log

Log an error message in case of a write failure and remove any unnecessary
asserts in that code path.

Change-Id: Ia2be0f21686bee72596857f7f129105b67834aae
Reviewed-on: http://review.couchbase.org/61468
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoFix unit-test failures on Win32 76/61476/2
Dave Rigby [Tue, 15 Mar 2016 10:58:58 +0000 (10:58 +0000)]
Fix unit-test failures on Win32

Due to CBD-1740, unit tests have been disabled on Windows. That
issue has now been resolved (by moving to the newer Jenkins on
cv.jenkins). However in the interim a couple of bugs have entered our unit
tests meaning they fail on Windows:

* ep_perfsuite uses the non-standard '%n' formatter which MSVC doesn't
  support. Fix by removing the use of the formatter.

* gethrtime() has lower precision on Windows than on other platforms,
  which means two calls can return the same value. This resulted in warmup
  duration being calculated as zero, which prevents the ep_warmup_time stat
  from being generated. Fix by adding the hrtime_period to the warmup duration.
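
The second fix can be illustrated with the arithmetic below. It is only a sketch: warmup_duration() and its parameter names are hypothetical, and the clock period is passed in rather than queried from the platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed time between two clock readings, padded
// by one clock period so that two identical readings from a coarse
// clock still give a non-zero duration.
std::uint64_t warmup_duration(std::uint64_t start_ns, std::uint64_t end_ns,
                              std::uint64_t clock_period_ns) {
    return (end_ns - start_ns) + clock_period_ns;
}
```

With a coarse clock, warmup_duration(t, t, period) evaluates to period rather than zero, so a stat that is only emitted for a non-zero duration is still produced.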

All ep-engine unit tests now pass on my local Windows 7 VM.

Change-Id: If9575fdeb4f614c6940f96b4f6fdcff97388192d
Reviewed-on: http://review.couchbase.org/61476
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18669: Disable intermittently failing test 'stream_req_partial_with_time_sync' 77/61477/2
Dave Rigby [Tue, 15 Mar 2016 11:09:58 +0000 (11:09 +0000)]
MB-18669: Disable intermittently failing test 'stream_req_partial_with_time_sync'

Temporarily disable this test as it intermittently fails - e.g.

    ep-engine/tests/ep_testsuite_dcp.cc:559 Test failed: `' (Expected `100', got `25' - Invalid number of deletes)

MB-18669 is tracking this issue and the test will be fixed under it.

Change-Id: I8e3982cf8d4db63e2c6a93ffb12e026853ee71b6
Reviewed-on: http://review.couchbase.org/61477
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18650: Speedup test 'producer stream request (DGM)' by ~5x 76/61176/9
Dave Rigby [Wed, 9 Mar 2016 16:49:12 +0000 (16:49 +0000)]
MB-18650: Speedup test 'producer stream request (DGM)' by ~5x

test_dcp_producer_stream_req_dgm operates by storing items such that
mem_used reaches 50% of the total memory quota. However the test
currently runs with a quota of 6MB, creating Items with a null body,
resulting in ~65,000 items being created and later streamed. When
running under ThreadSanitizer this test takes 25 seconds.

Given that the test doesn't actually need a particular number of items
(but just needs to ensure that it has enough such that a desired
residency ratio can be reached), tune the max_size down to
2MB. This creates far fewer items, and the test now completes in
under 5 seconds (under ThreadSanitizer). Note this still gives at least
0.1% precision in residency ratio calculations, which should be
sufficient for this test.

Change-Id: If53ba71adf9d1ed61f82f5e7322d897713db1a32
Reviewed-on: http://review.couchbase.org/61176
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-18669: Disable dcp_producer_stream_req_par test temporarily 61/61461/3
Norair Khachiyan [Mon, 14 Mar 2016 23:59:06 +0000 (16:59 -0700)]
MB-18669: Disable dcp_producer_stream_req_par test temporarily

Temporarily skip this testcase to prevent CV regression failures
until the complete fix is implemented and committed.

Change-Id: I5893dfe5ce962d6f6ca798fe382fb7c705f93d62
Reviewed-on: http://review.couchbase.org/61461
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-16337: Disable test_acccess_scanner unit test temporarily 60/61460/4
Norair Khachiyan [Mon, 14 Mar 2016 23:54:52 +0000 (16:54 -0700)]
MB-16337: Disable test_acccess_scanner unit test temporarily

Temporarily skip this testcase to prevent CV regression run failures
until the complete fix is implemented and committed.

Change-Id: I62bac8b90395cf5af5cf788e1ce5deb785ab111e
Reviewed-on: http://review.couchbase.org/61460
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18476: Handle write failures more gracefully in the mutation log 16/61116/11
Sriram Ganesan [Tue, 8 Mar 2016 22:08:50 +0000 (14:08 -0800)]
MB-18476: Handle write failures more gracefully in the mutation log

Log an error message in case of a write failure and remove any unnecessary
asserts in that code path

Change-Id: I50b7e4de4d414e21bf00404a22863baff06c0f4f
Reviewed-on: http://review.couchbase.org/61116
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoCMake: Simplify specification of testsuites for full & value eviction 36/61336/3
Dave Rigby [Thu, 10 Mar 2016 18:05:46 +0000 (18:05 +0000)]
CMake: Simplify specification of testsuites for full & value eviction

Remove a lot of the duplication in the CMakeLists.txt when specifying
the various testsuite tests and their full & value eviction variants.

Change-Id: Ic1d48b0eb8ca6033dba4b5b41850eb6ed1e3bf54
Reviewed-on: http://review.couchbase.org/61336
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18650: Move XDCR tests into ep_testsuite_xdcr 35/61335/3
Dave Rigby [Thu, 10 Mar 2016 15:36:50 +0000 (15:36 +0000)]
MB-18650: Move XDCR tests into ep_testsuite_xdcr

Move the XDCR tests into their own suite. We are into smaller
payoffs now: the 31 XDCR tests only take ~10s under ThreadSanitizer,
but that's 10s off the wall-clock time of ep_testsuite...

Change-Id: I9cd994f78bc0cbf6463e8066bc12f6ded1362733
Reviewed-on: http://review.couchbase.org/61335
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18650: Move checkpoint tests into ep_testsuite_checkpoint 34/61334/3
Dave Rigby [Thu, 10 Mar 2016 15:36:50 +0000 (15:36 +0000)]
MB-18650: Move checkpoint tests into ep_testsuite_checkpoint

Move the checkpoint tests into their own suite. The 8 tests
take ~28s to run; by moving them to their own suite we have better
maintainability - ep_testsuite is /only/ 8000 lines now ;) - and allow
them to run in parallel (and hence reduce overall wall-clock time).

Change-Id: I2f59e2f08a9d16c61577af50371d94fc3244cb65
Reviewed-on: http://review.couchbase.org/61334
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18650: Move TAP tests into ep_testsuite_tap 33/61333/3
Dave Rigby [Thu, 10 Mar 2016 15:36:50 +0000 (15:36 +0000)]
MB-18650: Move TAP tests into ep_testsuite_tap

The TAP tests (26 of them) are only ~10% of the number of tests in
ep_testsuite, but take ~20% of the runtime. Move them into their own
suite, for better maintainability and to allow them to run in parallel
(and hence reduce overall wall-clock time).

Change-Id: Idcd1564f59fdd45b4b5172c6eeb3757d01d302b4
Reviewed-on: http://review.couchbase.org/61333
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18561: Add stats to track unacked bytes in consumer. 04/61304/4
Manu Dhundi [Fri, 11 Mar 2016 21:39:56 +0000 (13:39 -0800)]
MB-18561: Add stats to track unacked bytes in consumer.

It would be good to have stats indicating the bytes processed by the
consumer but not acked yet. This helps us to track any delays in sending
acks to the producer.

Change-Id: Iaf803d7943e11729c6d0aee93430e64e2d399a96
Reviewed-on: http://review.couchbase.org/61304
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18256: Disabling cursor dropping temporarily. 01/61301/4
Manu Dhundi [Fri, 11 Mar 2016 01:29:07 +0000 (17:29 -0800)]
MB-18256: Disabling cursor dropping temporarily.

When cursor dropping closes a slow stream and the consumer then reconnects
the closed stream, there is a race condition where ns-server also tries to
add the same stream, causing problems.

We can solve the problem by switching the stream state to backfill state from
in-memory state. But that change requires a good amount of testing before we
commit it into the Watson branch.

To unblock QA and others in the meantime, we are temporarily disabling
cursor dropping. We will reopen the cursor dropping issue MB-9897 and
decide whether to fix it for Watson or Spock.

Change-Id: I7d023873eda085bb07cd07a208dd945b584ec092
Reviewed-on: http://review.couchbase.org/61301
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
4 years agoMB-18650: Speedup test 'dcp cursor dropping' by ~15x 75/61175/8
Dave Rigby [Wed, 9 Mar 2016 16:49:12 +0000 (16:49 +0000)]
MB-18650: Speedup test 'dcp cursor dropping' by ~15x

test_dcp_cursor_dropping operates by storing items such that mem_used
reaches 90% of the total memory quota. However the test currently runs
with a quota of 25MB, creating Items with a null body, resulting in
~117,000 items being created and later streamed. When running under
ThreadSanitizer this test takes 59 seconds.

Given that the test doesn't actually need a particular number of items
(but just needs to ensure that it has enough such that a desired
residency ratio can be reached), tune the max_size down to ~2MB. This
creates only ~6000 items. Additionally, only check the statistics
every 100 iterations (stats are expensive). The test now completes in
under 4 seconds (under ThreadSanitizer). Note this still gives at least
0.1% precision in residency ratio calculations, which should be
sufficient for this test.

Change-Id: I3f898a4e84446ea3fbd431550f23bae35eb100ce
Reviewed-on: http://review.couchbase.org/61175
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-17955: Move defragmenter test to VBucket-level unit test 92/61092/12
Dave Rigby [Tue, 8 Mar 2016 15:43:57 +0000 (15:43 +0000)]
MB-17955: Move defragmenter test to VBucket-level unit test

The defragmenter test in ep_testsuite has turned out to be not 100%
reliable - most of the time it passes but it fails in perhaps ~5% of
runs. This is unacceptable for a commit-validation test.

After some investigation the problem /appears/ to be related to the
various background (asynchronous) tasks which run in a "full"
ep-engine, i.e. ep_testsuite. These cause problems when trying to
assert that the defragmenter has correctly reduced the overall mapped
memory footprint - essentially we are trying to measure the drop in
mapped memory due to the defragmenter running, but sometimes measure
the effect of mapped memory *increasing* due to a background task
running at the wrong time.

To solve this, the test has been moved to a VBucket-level unit test,
where there are no background tasks running. This means that memory
usage should be stable, and we can purely focus on the effect the
defragmenter has.

Change-Id: Ia364cdecd8cbe2b824c774af87e56344601ac313
Reviewed-on: http://review.couchbase.org/61092
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18650: Speedup test 'producer stream request backfill no value' by 10x 74/61174/7
Dave Rigby [Wed, 9 Mar 2016 16:49:12 +0000 (16:49 +0000)]
MB-18650: Speedup test 'producer stream request backfill no value' by 10x

test_dcp_producer_stream_backfill_no_value operates by storing items
such that a resident ratio of 80% is reached, and then checks that
backfill is handled correctly. However the test currently runs with a
quota of 6MB, and creates Items with a null body, resulting in ~68,000
items being created and later streamed. When running under
ThreadSanitizer this test takes over 60 seconds.

Given that the test doesn't actually need a particular number of items
(but just needs to ensure that it has enough such that a desired
residency ratio can be reached), reduce the max_size down to
~2MB. This creates only ~6000 items, and the test now completes in
under 6 seconds (under ThreadSanitizer). Note this still gives at least
0.1% precision in residency ratio calculations, which should be
sufficient for this test.

Change-Id: I98d6507daccd4de68724f8c0bca4cc5cc84286f2
Reviewed-on: http://review.couchbase.org/61174
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
4 years agoMB-18650: Speedup test 'dcp consumer's processer task behavior' by ~50x 78/61178/6
Dave Rigby [Wed, 9 Mar 2016 16:49:12 +0000 (16:49 +0000)]
MB-18650: Speedup test 'dcp consumer's processer task behavior' by ~50x

test_dcp_consumer_processer_behavior operates by storing items such
that mem_used reaches a given fraction of ep_max_size. However it
checks this condition on every loop iteration; looking up a full set
of stats.

get_int_stat() is surprisingly expensive - that call alone accounts
for 97%(!) of the test runtime (including its callees).

Simplify by only calling get_int_stat() every 100 iterations (similar
to other tests) - we don't need to be at the exact fraction of
ep_max_size, as long as we are over the threshold.
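
The loop shape described above might look like the following sketch. fill_until(), FillResult and the callbacks are hypothetical; in the real test the two callbacks would correspond to storing an item and calling get_int_stat().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

struct FillResult {
    std::size_t itemsStored;
    std::size_t statLookups;
};

// Hypothetical helper: keep storing items until the measured memory use
// reaches the threshold, but only pay for a stat lookup once every
// check_interval items. The loop may overshoot the threshold slightly.
FillResult fill_until(const std::function<void()>& store_item,
                      const std::function<std::size_t()>& get_mem_used,
                      std::size_t threshold,
                      std::size_t check_interval = 100) {
    FillResult result{0, 0};
    while (true) {
        store_item();
        ++result.itemsStored;
        if (result.itemsStored % check_interval == 0) {
            ++result.statLookups;
            if (get_mem_used() >= threshold) {
                return result;
            }
        }
    }
}
```

The trade-off is that up to check_interval - 1 extra items may be stored past the threshold, which is acceptable when the test only needs to be over it.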

This reduces the test runtime under ThreadSanitizer (which suffers
particularly with all the memory allocations get_int_stat() triggers)
from 99s to 1.8s.

Change-Id: Iceb8bda9b83196404499886a42f574ebc9aba1db
Reviewed-on: http://review.couchbase.org/61178
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
4 years agoMB-18650: Speedup hash_table_test (particularly under TSan) by ~40x 27/61227/6
Dave Rigby [Thu, 10 Mar 2016 11:49:05 +0000 (11:49 +0000)]
MB-18650: Speedup hash_table_test (particularly under TSan) by ~40x

The hash_table_test takes ~45s to run when run under ThreadSanitizer
during commit-validation jobs, and sometimes (under machine load) can
hit its 60s timeout. Looking at the test itself, there are a few
adjustments which can be made to the implementation to speed it up,
without removing any of the test functionality / scope:

* Use shorter keys (%d instead of key%d). While this may seem unlikely
  to speed the test up, hash table searching requires a memcmp of keys
  and hence this halves the amount of data which needs to be
  compared. (Note that TSan intercepts all memcmp() calls, so these are
  much slower than normal when running under TSan.)

* Remove unnecessary copying of std::string.

* Reduce the number of threads for the concurrent test from 16 to
  4. Note that while this does reduce the amount of concurrency, these
  tests were written before we started using Valgrind /
  ThreadSanitizer which do a very good job of detecting data races
  without necessarily "brute-forcing" by running a large number of
  threads.

* Reduce the key count for some tests - the actual number doesn't
  matter here, as long as there's a representative sample.

Combined, these changes reduce the wall-clock runtime down to under 1s
when run with ThreadSanitizer.

Change-Id: I17c6e20d0c31f1071e23c3db5c8ff2a9d464f1cf
Reviewed-on: http://review.couchbase.org/61227
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>