Jim Walker [Tue, 13 Sep 2016 19:51:32 +0000 (20:51 +0100)]
MB-20798: Update queueDirty options in prep for a new option
Switch the bool 'genBySeqno' to a more descriptive enum, call
site will now be clearer regarding the parameter.
Change-Id: I2d6707df0360f02f7455b480383f5ca9d6e64261
Reviewed-on: http://review.couchbase.org/67582
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Mon, 26 Sep 2016 13:36:01 +0000 (14:36 +0100)]
MB-21143: Remove adjusted-time/drift and associated code
As part of simplfying the supported LWW code, remove the
adjusted-time API and associated code.
Change-Id: I4d1cb092d4fce3155d1cd1e0134333655bcb3a61
Reviewed-on: http://review.couchbase.org/67999
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Fri, 23 Sep 2016 13:45:40 +0000 (14:45 +0100)]
MB-21143: Don't store conflict resolution mode per document
Disk and memory have storage on a per document basis for the
conflict resolution mode.
LWW is now enabled globally so this meta data is wasted.
Change-Id: I7b54a96f453b4182e0976e6c18cb48ac964e5177
Reviewed-on: http://review.couchbase.org/67955
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Fri, 23 Sep 2016 16:01:27 +0000 (17:01 +0100)]
MB-21144: Simplify how LWW is enabled
Use existing conflict_resolution_type config flag to enable
LWW on a global bucket basis. Now ignoring the per document
LWW flag.
Change-Id: I2a19fd5633ec6cf28deead3cb5a371e242131fc6
Reviewed-on: http://review.couchbase.org/67954
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Fri, 12 Aug 2016 14:53:16 +0000 (15:53 +0100)]
Move more unit tests into the ep_unit_tests binary
For GTest-style tests we have created a single test binary
(ep-engine_ep_unit_tests) to link all the tests into. This has the
advantage of not having to compile different variants of our source
files for multiple different test binaries (which is partly a
limitation / 'feature' of CMake's dependancy calculation).
However not all tests are in this binary. This patch moves an
additional 2 test suites - checkpoint & defragmenter - into the single
binary. This speeds up our build time, and also removes a bunch of
duplicated boilerplate test setup code.
Change-Id: I7a9b6f8166fe2dcb739bdf124b43d1de6abc1e42
Reviewed-on: http://review.couchbase.org/68212
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Dave Rigby [Fri, 12 Aug 2016 13:45:20 +0000 (14:45 +0100)]
Remove hooksApi global; reduce coupling with MemoryTracker
MemoryTracker is somewhat tightly coupled with ep_engine.cc as it uses
the getHooksApi() function to obtain the memory allocator hooks.
Firstly this makes it hard to test - compile one file and you have to
include the other, and it's also difficult to provide a different
hooks api - either for injecting a mock one for testing, or simply to
use the 'normal' hooks API but without pulling in ep-engine.
Secondly, there is unnecessary indirection in NewHook / DeleteHook -
which are called on every new/delete so performance is relevant
there. By giving the MemoryTracker it's own copy of the alloc_hooks
(instead of calling getHooksApi() on every call) we can reduce the
amount of work in NewHook / DeleteHook by approx. 50% (measured in
terms of x64 instructions).
Change-Id: Ia0f8ebb0a5263567dc08b32fe6ff9b7ea9eefa92
Reviewed-on: http://review.couchbase.org/68210
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Manu Dhundi [Tue, 4 Oct 2016 20:31:33 +0000 (13:31 -0700)]
MB-20822: When loading persisted failovers, remove any erroneous entries
Due to bugs in older releases (and also possibly in future releases),
we may end up storing wrong failover table on disk. Hence during
warmup while loading failover table from disk we must prune out any
wrong entries.
Also, in the past we have seen erroneous entries with vb_uuid == 0.
So we make sure that vb_uuid is not generated as a valid value and
prune out any entry that has vb_uuid == 0
Change-Id: I630cb7fb1ea9a711432be64f36924d04fcd5e361
Reviewed-on: http://review.couchbase.org/67702
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Dave Rigby [Mon, 3 Oct 2016 16:41:14 +0000 (17:41 +0100)]
MB-21193: Fix ConnectionTest failure when test runs for >21s
When CV jobs take longer than normal (e.g. when running under
ThreadSanitizer) the ConnectionTest can fail:
[ RUN ] ConnectionTest.test_maybesendnoop_buffer_full
/home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-watson/ep-engine/tests/module_tests/dcp_test.cc:308: Failure
Value of: ret
Actual: 255
Expected: ENGINE_E2BIG
Which is: 8
maybeSendNoop not returning ENGINE_E2BIG
[ FAILED ] ConnectionTest.test_maybesendnoop_buffer_full (656 ms)
The test is assuming that it starts running in less than 21 seconds
when setting the DCP noop timeout. Instead of using an absolute '21'
we need to use the current_time + 21.
Change-Id: Id4fe8b372da4c447e00128de2c94c72bb60373e4
Reviewed-on: http://review.couchbase.org/68273
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Sriram Ganesan [Mon, 3 Oct 2016 09:14:18 +0000 (14:44 +0530)]
MB-21142: Ensure vbucket states are persisted before the test
In test_tap_filter_stream, vb ids 0 to 3 are set to active
before the test. The test needs to ensure that all the vbucket
states are persisted and the respective couchstore files created
so that there is no race between file creation and operations
on the file
Context: test_tap_filter_stream
Change-Id: Idad34474bdf418504012731459cf4eb3272b1ba9
Reviewed-on: http://review.couchbase.org/68264
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Sriram Ganesan [Mon, 29 Aug 2016 22:48:40 +0000 (15:48 -0700)]
MB-20744: Increment reject ops on insertion to reject queue
In case of a failure to persist items, the items are added to the
reject queue in which the reject stats needs to be updated for
the associated vbucket.
Change-Id: I15b7ad26d8bcab5b6a437b8d172d8b914df8b683
Reviewed-on: http://review.couchbase.org/68219
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Dave Rigby [Fri, 23 Sep 2016 13:59:34 +0000 (14:59 +0100)]
MB-21140: Fix race in test_access_scanner_settings (gmtime_r)
When calculating access scanner adjusted time values, use the
thread-safe variant (gmtime_r) so the test doesn't conflict with the
ep_engine code.
Also use std::chrono to perform the time manipulaton (which handles
any modulus of minutes -> hours etc).
Change-Id: Icf8505e4ce465f382904934dcaa05527efc57454
Reviewed-on: http://review.couchbase.org/68151
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Tue, 23 Aug 2016 14:28:07 +0000 (15:28 +0100)]
MB-20623: Warmup: Process the access log in chunks
Instead of loading the entire Access log into memory, and then
applying in one (potentially very large) getMulti; process it in
warmup_batch_size chunks (default 10,000 items).
This sigificantly reduces the amount of temporary memory consumed
during warmup, which in turn means this memory can be used for
document values.
Results: in the following workload we manage to load 19.2M items (or
4.2GB) into memory during warmup, where previously only two items
(yes, *two*) were warmed up due to the size of the temporary data
structures - because the temporary data structures consumed ~4.2GB.
* 1 bucket, 10,000MB quota, 1 node.
* 30M items, 300bytes each; giving a residency ratio of ~60%. Dataset
generated by:
cbc-pillowfight -U couchbase://localhost:12000 -I
30000000 -m 300 -M 300 -t16
Change-Id: I511b70d5ea9c9c6b9556249a936030a67bf70c02
Reviewed-on: http://review.couchbase.org/68045
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Tue, 23 Aug 2016 14:21:22 +0000 (15:21 +0100)]
MB-20623: Warmup: Implement MutationLog::iterator copy assignment
MutationLog::iterator doesn't follow the Rule of Three - it doesn't
implement the copy-assigment operator. This means that it's not a
complete iterator implementation.
Fix this, and add a unit test for it.
Change-Id: I12d67bc072d72e481e6a195e2d45b16c0318fdc0
Reviewed-on: http://review.couchbase.org/68044
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Tue, 23 Aug 2016 11:12:05 +0000 (12:12 +0100)]
MB-20623: Remove 'fetches' vector from MutationLogHarvester::apply
MutationLogHarvester::apply() builds a vector of keys (`fetches`) to
fetch from disk. This is essentially identical to the contents of
`committed`, except it only contains keys which currently exist in the
VBucket.
We can optimize this by removing fetches, and instead erasing the
elements from `committed[vb]` which are no longer valid. This also
means we no longer have to check for duplicates in batchWarmupCallback
(which was kinda pointless even before), as the data structure passed
in is now a set (and hence cannot by definition contain dupes).
Results in a reduction in the memory used by these temporary warmup
data structures - from 3876MB to 3218MB (17%) for the following
workload:
* 1 bucket, 10,000MB quota, 1 node.
* 30M items, 300bytes each. Dataset generated by:
cbc-pillowfight -U couchbase://localhost:12000 -I
30000000 -m 300 -M 300 -t16
Change-Id: I1c2e480e53626be41fa20dcd44abe4d6fa327e4c
Reviewed-on: http://review.couchbase.org/68043
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Mon, 22 Aug 2016 14:43:29 +0000 (15:43 +0100)]
MB-20623: Remove unused fields from temporary warmup data structures
Warmup currently creates three data structures (per vBucket) in memory
when warming up using the access.log:
1. A map of keys -> sequence number at the time the access log was
generated - MutationLogHarvester::committed
2. A list of {key, hashTable_seqNo} pairs -
MutationLogHarvester::apply::fetches
3. A map of key -> VBucketBGFetchItem for every key in the list above
- batchWarmupCallback::items2fetch
There are a number of inefficiencies in this implementation, the first
of which is that we record sequence numbers in the first two data
structures which are never actually used - the final BGfetch doesn't
need them.
This patch therefore removes the recording of sequence numbers. This
changes the data structures to:
1. MutationLogHarvester::committed - a set of keys (per vBucket).
2. MutationLogHarvester::apply::fetches - a vector of keys.
Results in a reduction in the memory used by these temporary warmup
data structures - from 4356MB to 3876MB (11%) for the following
workload:
* 1 bucket, 10,000MB quota, 1 node.
* 30M items, 300bytes each. Dataset generated by:
cbc-pillowfight -U couchbase://localhost:12000 -I
30000000 -m 300 -M 300 -t16
Change-Id: I0666e2d1a8a0d9e996cdbdd61d41118d2c2d6dfc
Reviewed-on: http://review.couchbase.org/68042
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Mon, 22 Aug 2016 13:10:17 +0000 (14:10 +0100)]
MB-20623: MutationLog: Remove unused Delete and Uncommitted functionality
The MutationLog (when actually used for Mutations) could record
Deletes (in addition to 'adds') of keys, and return a list of
'uncommitted' keys. However this functionality has long been unused
(since March 2013, http://review.couchbase.org/26943).
Remove this unused code to simplify the way the MutationLog class is
now used - for the access log.
Change-Id: I5642553708c7d5cf320cec5524ae49643a414192
Reviewed-on: http://review.couchbase.org/68041
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Tue, 27 Sep 2016 14:00:03 +0000 (15:00 +0100)]
MB-20623: Convert MutationLog unit tests to GTest
Convert to using the GoogleTest framework, and integrate into
ep-engine_ep_unit_tests binary.
Change-Id: I7caff50202abc4ecf55c513f4b6859d5c5d48d40
Reviewed-on: http://review.couchbase.org/68040
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Mon, 22 Aug 2016 13:46:01 +0000 (14:46 +0100)]
MB-20623: re-enable and fix MutationLog unit test
This test appears to have been not built/run a while back. Resurrect
it.
Change-Id: I4b35291239882e5e58bc2c7d435e3c793a7ae6ba
Reviewed-on: http://review.couchbase.org/68039
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Fri, 23 Sep 2016 10:29:54 +0000 (11:29 +0100)]
Add LoggedAtomic<> debug class
A Debugging wrapper around std::atomic which print all accesses to the
atomic value to stderr.
Drop-in compatible with AtomicValue for simple use-cases - currently
only implements load() / store().
Change-Id: I78cec4d8bad55588900573f201d81a38f16f97ee
Reviewed-on: http://review.couchbase.org/67944
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Manu Dhundi [Wed, 14 Sep 2016 19:12:29 +0000 (12:12 -0700)]
MB-20822: Erase all diverged branch entries correctly in failover table
When we add an entry in failover table we must erase all the other
entries with higher seqno because they are from a diverged branch.
This commit fixes a bug in the loop that was erasing diverged entires.
In a std::list when we erase an entry, the function returns the
iterator pointing to next element and hence we must be careful not to
double increment it.
Change-Id: I9275fba8057f26e2853a8d7d359f1d01f107f2be
Reviewed-on: http://review.couchbase.org/67481
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Mon, 26 Sep 2016 09:29:10 +0000 (10:29 +0100)]
MB-16181: Add replicate/persist traits to Item
Provide an abstraction for the flusher and DCP that tells them
if an item should be persisted or replicated.
This provides a stepping stone towards system owned items in
checkpoints.
Change-Id: Ie5e65a2b20d0d162e1b8fe3e439219c34fb7b508
Reviewed-on: http://review.couchbase.org/67990
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Wed, 7 Sep 2016 14:40:08 +0000 (15:40 +0100)]
MB-20869: Simplify test to fix intermittent test_access_scanner failure
In test_access_scanner the check that the number of non-resident items
was less than 6% of the total is incorrect - it was using integer
division (6/100) and hence the RHS of the check was always zero, which
would always succeed. Fix the check so it calculates the percentage
correctly.
Additionally we did not wait for the previous AccessScanner run to
finalize - which meant that the second start of the access acanner
could fail to start (and consequently we would timeout with the
message:
Exceeded maximum wait time of 60000000us waiting for stat
'ep_num_access_scanner_runs' to be greater or equal than 8 -
aborting.
To solve this, change to just one iteration of creating the file (not
sure why it did two before), and move the increment of stats.alogRuns
to /after/ the access log files are renamed.
Change-Id: I6c3b6eeed91302fd7fa67dfade6ae954078fc125
Reviewed-on: http://review.couchbase.org/67884
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Fri, 19 Aug 2016 15:19:51 +0000 (15:19 +0000)]
[BP] MB-20598: AddressSanitizer problems
Two leaks and one stack overflow.
The forest-kvstore code should use dynamic_cast (like couch-kvstore)
else when the incoming callback is not a RememberingCallback, we will
access outside of the incoming object.
ep_testsuite has a leak in tap code where we must release items
during iteration.
Change-Id: If0db9e936ee141299c5a579235e828c7309b8118
Reviewed-on: http://review.couchbase.org/68106
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Thu, 25 Aug 2016 09:47:14 +0000 (09:47 +0000)]
[BP] MB-20598: AddressSanitizer problem in perfsuite
This test was being OOM killed so was left out of
the orginal set of fixes. Tap iterator mutation/deletions
need to be released by the client.
Change-Id: Ib65f386f1080cb2a130cfdd7d90c85dd4a871989
Reviewed-on: http://review.couchbase.org/68107
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Sriram Ganesan [Sat, 3 Sep 2016 23:49:53 +0000 (16:49 -0700)]
MB-19578: Fail warmup only when explicitly configured
- Instead of unconditionally failing warmup in case of a OOM failure,
fail it only when it is configured to do so. Note that ns_server
explicitly sets this value to false.
- Remove an unnecessary check in the bucket initialization path to
check for warmup failures as soon as warmup is started. The warmup
process is asynchronous and thus checking for failures right away
is futile.
Change-Id: I5725644a3731e7cd75e98b0b77ca6ae8a3d88ab9
Reviewed-on: http://review.couchbase.org/67353
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Sriram Ganesan [Thu, 1 Sep 2016 00:02:55 +0000 (17:02 -0700)]
MB-20528: unconditionally increment get meta ops
The get meta operations prior to introducing bloom filters was
always incremented whether the key was present or not present
in the bucket. Restore the behavior where the get meta ops stat
is always incremented.
Change-Id: I09fadedb71929d0d078d0ca6c49ef27de3fd5bef
Reviewed-on: http://review.couchbase.org/67307
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Wed, 21 Sep 2016 14:33:18 +0000 (15:33 +0100)]
MB-20852: ep_unit_tests_main: Show DEBUG logs with -v
Previously we would only print INFO level log messages when the -v
flag was given, even though the code claimed to have enabled DEBUG
level.
This was because we didn't get the log level early enough - the
extension defaults to INFO, and will propogate that down to ep_engine
(overwriting what we specify). Fix by setting the log level in the
server API to debug.
Change-Id: I8a192169721fad631965600b69975d882b6221f8
Reviewed-on: http://review.couchbase.org/67888
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Daniel Owen [Tue, 13 Sep 2016 11:14:32 +0000 (12:14 +0100)]
Remove default parameter from queueDirty
The EventuallyPersistentStore method queueDirty has a parameter called
notifyReplicator, which is defaulted to true. The parameter is used
to specify whether or not to notify the replicator.
However the queueDirty method either uses the notifyReplicator default
of true, or is passed in true.
This patch removes the unrequired parameter from the queueDirty
definition and simpifies the body of the method.
Change-Id: I6340e522f0e6137bc744450ddc90e418ed7716a2
Reviewed-on: http://review.couchbase.org/67623
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Thu, 8 Sep 2016 13:32:45 +0000 (14:32 +0100)]
MB-20834: Use get_available_cpu_count() for executorpool #threads calculation
Use the new Couchbase::get_available_cpu_count() when calculating how
many executor pool threads to create. get_available_cpu_count()
accounts for the number of logical cores a process is allowed to run
on, which is the mechanism cgroups (and hence Docker) uses to limit
the CPUs available to a container.
This fixes the problem of us creating "too many" executorpool threads
under Docker containers which have the --cpuset-cpus option set.
Change-Id: I3e3b91eecc51aea298ae9aceb9e8c6d3f16b7612
Reviewed-on: http://review.couchbase.org/67493
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Daniel Owen [Tue, 13 Sep 2016 10:25:39 +0000 (11:25 +0100)]
Remove default parameter from deleteItem
The EventuallyPersistentStore method deleteItem has a parameter
called tapBackfill, which is defaulted to false. The parameter
is used to specify if an item deletion is from a TAP backfill
stream.
However the deleteItem method is never passed a tapBackfill
parameter and therefore the default of false is always used.
This patch removes the unrequired parameter from the deleteItem
definition and simpifies the body of the method.
Change-Id: Ic1aa9a41bd68411f9b29b61333f66b4c1ae35278
Reviewed-on: http://review.couchbase.org/67624
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Dave Rigby [Fri, 23 Sep 2016 16:52:52 +0000 (17:52 +0100)]
MB-21036: Fix intermittent failure in shutdown_snapshot_range
Issue is that the test attempts to create exactly 3 checkpoints,
however this is load-dependent (i.e. how quickly the flusher runs and
creates checkpoints).
Remove this intermediate checks in the test, and just check the
sequence numbers.
Change-Id: Ic7c0a9217afcdc8bd65680efb992b09db0f5023b
Reviewed-on: http://review.couchbase.org/67948
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Dave Rigby [Fri, 23 Sep 2016 13:59:34 +0000 (14:59 +0100)]
MB-21034: Fix race in test_expiry_pager_settings (gmtime_r)
When calculating expiry pager adjusted time values, use the
thread-safe variant (gmtime_r) so the test doesn't conflict with the
ep_engine code.
Also use std::chrono to perform the time manipulaton (which handles
any modulus of minutes -> hours etc).
Change-Id: Iee39a86b73a71642b9dab4ff2821d589699731ac
Reviewed-on: http://review.couchbase.org/67946
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Fri, 23 Sep 2016 10:23:11 +0000 (11:23 +0100)]
MB-21029: Don't return ep_access_scanner_task_time until it has been scheduled
Fix a latent race condition in ep_access_scanner_task_time stat
(exposed by intermittent failure in access_scanner_settings test),
whereby we can return a value for ep_access_scanner_task_time of the
Unix epoch (1970-01-01 00:00:00) if the AccessScanner has been enabled
but not yet scheduled.
Solution is to not return a value (and instead 'NOT SCHEDULED') until
scheduling has occurred.
Change-Id: Id85a602b3424c30c8795884207a1b0d31cb3c75a
Reviewed-on: http://review.couchbase.org/67943
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Fri, 23 Sep 2016 10:12:36 +0000 (11:12 +0100)]
MB-21029: ep_test_apis: make wait_for_stat_XXX functions templated
There is a range of helper functions to wait for a stat to meet some
criteria (equal, not equal, greater, ...), and currently these are
duplicated for each type (int, string, ...) required.
As the subsequent fix for MB-21029 needs a wait_for_stat_to_change for
the std::string type, genericize the current wait_for_stat function so
it can be used with any type.
Change-Id: I681218d31c4dcd1ef8b34de225efd13e99bbf8db
Reviewed-on: http://review.couchbase.org/67942
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Wed, 21 Sep 2016 12:32:14 +0000 (13:32 +0100)]
MB-20852: ep_test_apis: report final value in wait_for_stat...
Change-Id: I5aeae80ffcc0791701cf5982df4b11d00b288c8a
Reviewed-on: http://review.couchbase.org/67887
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Tue, 13 Sep 2016 13:06:04 +0000 (14:06 +0100)]
MB-20852: Remove unused 'force' parameter from scheduleVBStatePersist()
'force' is always false, so remove it.
Change-Id: I02e245b2a0237697c9abc4d55e2a9117cf4ca214
Reviewed-on: http://review.couchbase.org/67876
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Wed, 14 Sep 2016 17:52:49 +0000 (18:52 +0100)]
MB-20697: ep_testsuite: restore deleted DB dir before cleanup
In some environments (seen on Windows VM), the regression test for
MB-20697 can hang forever during cleanup due to the writer thread being
stuck in an infinite loop trying to flush to disk.
To address this, re-create the database directory before shutting down
EPStore (after the test is complete).
Change-Id: I474e77bafbe5b30858d9a306669c52713890f846
Reviewed-on: http://review.couchbase.org/67681
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Mon, 26 Sep 2016 13:54:02 +0000 (14:54 +0100)]
Merge remote-tracking branch 'couchbase/4.5.1' into watson
* couchbase/4.5.1:
MB-20943: Set state to dead when deleting vbucket
Change-Id: I79ef4c2d0e66cf18e08f884ad7fecf2afe3b5578
Daniel Owen [Wed, 21 Sep 2016 09:50:33 +0000 (10:50 +0100)]
MB-20943: Set state to dead when deleting vbucket
When executing the VBucketMemoryDeletionTask the vbucket state is
unchanged. notifyAllPendingConnsFailed is called in the run
function of VBucketMemoryDeletionTask. This inturn calls fireAllOps,
which ensures all pending ops are cleared if the vbucket is in an
active state.
However if the vbucket is in a pending state is does nothing and
therefore the pending operations remain. This results in connections
not being closed down.
The solution provided is to set the vbucket state to dead in
deleteVBucket, prior to calling scheduleVBDeletion.
A corresponding test has been added, which without the fix will hang.
Change-Id: I09cd4597b26576dd4b99d26f3a60c031e1b5f41d
Reviewed-on: http://review.couchbase.org/67873
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Wed, 14 Sep 2016 14:49:42 +0000 (15:49 +0100)]
MB-20771: Remove unnecessary defragmenter_test operator overloads
The defragmenter & ep_unit tests have their own operator new/delete
overload, to improve interopability with Valgrind (so Valgrind saw
explicit calls our malloc / free).
However the global_new_replacement in platform does this anyway now, so
this code is no longer needed.
Change-Id: I49dbc9e14582d39b58b1db48a0c7772d4f3855c9
Reviewed-on: http://review.couchbase.org/67675
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Dave Rigby [Fri, 26 Aug 2016 12:37:57 +0000 (13:37 +0100)]
MB-20586: Update ep-engine to use cb_malloc memory API
Similar to changes in memcached, update all C-style memory allocation
uses to use cb_malloc instead of raw system malloc.
Change-Id: Ic9bce029c34f74e161aed20b99129985264e0d4c
Reviewed-on: http://review.couchbase.org/67313
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Daniel Owen <owend@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Fri, 12 Aug 2016 14:08:39 +0000 (15:08 +0100)]
MB-20586: ep_unit_tests: Use real memory tracking code
Use the 'real' memory tracking hooks instead of alloc_hooks_dummy in
the ep-engine unit tests. This more accurately reflects how our code
used in the 'real world'
Change-Id: I231a179e7765d46a63c72686c0279983db21cf0b
Reviewed-on: http://review.couchbase.org/67442
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Fri, 26 Aug 2016 12:39:59 +0000 (13:39 +0100)]
MB-20586: DefragmenterTest.MappedMemory: Fix on OS X
This test currently fails as it does not have a way of tracking memory
usage, due to us not initializing the memory hooks (which installs a
custom zone allocator).
Fix by initializing the allocator before we run any tests in this
suite.
Change-Id: I3b567001b070483d19d16ff1787d29ebd9669bfa
Reviewed-on: http://review.couchbase.org/67312
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Wed, 17 Aug 2016 14:00:20 +0000 (15:00 +0100)]
MB-20586: Configuration: Reduce use of C-style memory allocation
Remove C-style allocation where possible from Confiuration, replacing
with C++ owning containers.
Change-Id: I009a07064fb5c2727ab4d62f85aa9efd7f79684a
Reviewed-on: http://review.couchbase.org/67311
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Thu, 18 Aug 2016 16:04:02 +0000 (17:04 +0100)]
MB-20586: MemoryTracker: Use new[] instead of calloc()
Change-Id: I44882ee3d385c3ab70787fd187105fcdeb26545e
Reviewed-on: http://review.couchbase.org/67310
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Thu, 18 Aug 2016 16:11:29 +0000 (17:11 +0100)]
MB-20586: Use cJSON_Free() to free allocations made by cJSON_Print
Otherwise we can get mismatched malloc/free when using cbmalloc or
another custom allocator.
Change-Id: Ifdd2fe031cb6d6c785a77d552b966fa173de1593
Reviewed-on: http://review.couchbase.org/67069
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Tue, 6 Sep 2016 08:04:16 +0000 (09:04 +0100)]
MB-20716: Ensure DCP consumers in EWOULDBLOCK are unpaused on bucket delete
This is a follow-up to 8734958 - in addition to ensuring that DCP
producers are unpaused, also unpause DCP _consumers_.
Change-Id: I538e7bca865c4fa41240263da1c92312b3866bfa
Reviewed-on: http://review.couchbase.org/67375
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Thu, 1 Sep 2016 15:30:41 +0000 (16:30 +0100)]
MB-20751: Fix lock cycle (deadlock) during bucket delete & client disconnect
MB-20716 recently fixed an issue where idle DCP connections (in
EWOULDBLOCK state) would not be notified (woken up) in the frontend
when a bucket was deleted. The fix for this was to trigger a notify
(via producer->notifyPaused()) as part of ep-engine bucket delete.
Unfortunately this introduced a lock-order-inversion (deadlock)
between two mutxes, which caused memcached to hang during shutdown,
as one (or more) worker threads would never terminate.
The issue is between:
1. Frontend_worker thread mutex (threadLock)
2. ConnMap::connsLock
And at least two threads (although normally 3 in the wild):
T1: Frontend worker thread
T2: DestroyBucket thread
(optional T3: A NONIO thread running ConnManager)
During bucket delete, the follow sequence occurs which creates a cycle
between threadLock and connsLock:
T1<Worker>:
event_handler() ... conn_pending_close()
-> LOCK(threadLock)
DcpConnMap::disconnect()
-> ACQUIRE(connsLock)
T2<DeleteBucket>:
EventuallyPersistentEngine::handleDeleteBucket() ...
DcpConnMap::shutdownAllConnections()
-> LOCK(connsLock)
notifyIOComplete() ... DcpProducer::notifyPaused()
-> ACQUIRE(threadLock)
Part of the issue here is that DcpProducer::notifyPaused() *must* be
called with schedule==false, as there is no longer a ConnNotifier task
running on another thread (which never acquires the connsLock and
hence avoids any deadlock), as the ConnNotifier has been shutdown in
DcpConnMap::shutdownAllConnections previously. Therefore we need to
use the variant of notifyPaused which calls notify_IO_complete in the
same thread.
The solution chosen is to essentially drop the connsLock in
shutdownAllConnections before calling notify. We achive this by taking
a _copy_ of the connections map (under connsLock), and then iterating
over this copy and calling notify etc. This is safe as the elements of
the map are all ref-counted pointers.
Change-Id: I73f9b7576e42030a9f5219ae51e604e36fabcac7
Reviewed-on: http://review.couchbase.org/67252
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Dave Rigby [Fri, 26 Aug 2016 18:29:43 +0000 (19:29 +0100)]
MB-20697: Return false if commit2couchstore fails
This ensures that callers are notified of the failure, and
specifically we correctly increment the ep_item_commit_failed stat
Change-Id: I56f2591479c45c03fba184236aa3790a67290b38
Reviewed-on: http://review.couchbase.org/67093
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Dave Rigby [Tue, 30 Aug 2016 14:10:32 +0000 (15:10 +0100)]
MB-20716: Ensure DCP connections in EWOULDBLOCK are unpaused on bucket delete
When a bucket delete occurs, memcached notifies the deleted engine via
the ON_DELETE_BUCKET callback, which in turn calls
DCPConnmap::shutdownAllConnections(). This correctly shuts down all
the DCP streams associated with DCP connections, however if any of
these DCP connections are in the EWOULDBLOCK state - i.e. the frontend
is waiting for a notify_IO_complete call to "wake" them up, then the
frontend will be blocked waiting for a notify_IO_complete which will
never arrive.
This behaviour is essentially a latent bug, however prior to the fix
for MB-20549, memcached would (incorrectly) call signalIfIdle on
connections in the EWOULDBLOCK state, forcing them to wake up. With
that fix in place this longer occurs.
The solution here is to explictly unpause all producer connections
when all streams are closed.
Change-Id: Ia105e78304f5481bb56a0c0ff1cfc973959e1016
Reviewed-on: http://review.couchbase.org/67169
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Wed, 24 Aug 2016 10:54:13 +0000 (11:54 +0100)]
MB-20645: Don't request stats from null DCP backfill manager
If a DCP Producer has DcpProducer::addStats called on it after its
been disconnected (but before it's removed from the connMap) then we
end up dereferencing a null backfillMgr pointer.
Fix by adding a guard that the manager is valid before including its
stats.
Change-Id: Idc97b447090f5390054a9c40f207dae5494e63b9
Reviewed-on: http://review.couchbase.org/67025
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Daniel Owen [Fri, 5 Aug 2016 11:50:09 +0000 (12:50 +0100)]
MB-20425: Change options parameter to correct values
Updates epstore get to use the options passed in.
Requires the call to ep_engine get from ep_engine
arithmetic to be updated to use the following
options:
QUEUE_BG_FETCH | HONOR_STATES |
TRACK_REFERENCE | HIDE_LOCKED_CAS
Requires the call to ep_engine get from epstore
store to be updated to use the following options:
QUEUE_BG_FETCH | HONOR_STATES | TRACK_REFERENCE |
DELETE_TEMP | HIDE_LOCKED_CAS
Also adds an associated test, where the bloom filter
is disabled which in the presense of the bug will
cause the test to hang.
Change-Id: I8fd275c3e14b0050e172b32f15fb7ed555e4b0c2
Reviewed-on: http://review.couchbase.org/66537
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Daniel Owen [Fri, 5 Aug 2016 10:20:30 +0000 (11:20 +0100)]
MB-20425: Remove default options parameter from get functions
The ep_engine get function defaults the option parameter.
The ep_store get function also defaults the option parameter.
These multiple levels of defaulting has made it difficult to
track the value of the options parameter for different calls.
Therefore the use of defaults are removed for these cases.
This will make the change that addresses the regression of
MB-20425 much easier to understand. This patch makes no
functional change.
Change-Id: I69aaa31a9a437f13299eb019956aa0488f13b95a
Reviewed-on: http://review.couchbase.org/66531
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Jim Walker [Wed, 3 Aug 2016 13:54:49 +0000 (14:54 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
1301ca609be559248af78d6fa52ce766dd8e4915':
MB-20307: Re-enable dcp ep_dcp_dead_conn_count
MB-20312: Initialise snapshot task priority
MB-20330: ReaderLockHolder with no lvalue
Change-Id: I5878d95f8d792971fbb4ab5342baf4b017b6614a
Jim Walker [Wed, 3 Aug 2016 13:53:54 +0000 (14:53 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
62f88138da834e216b953d3cf8064accb521c205':
MB-19837: Increase number of NONIO threads
MB-18453: Make task scheduling fairer
[BP] MB-18452: Single threaded test harness improvements
Change-Id: If16ed42aed060f94d3180e832aaae0a7f5c5f052
Jim Walker [Tue, 2 Aug 2016 09:08:25 +0000 (10:08 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
36d772883b9bf2179694f2ca9d0575ed52135a66':
MB-20182: Update checkpoint snapshot correctly during TAP backfill
MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction
MB-20054: Fix windows build error by adding size() func in class AtomicQueue
MB-20054: Fix windows build error by including a missing header file
MB-20054: Regression test - bucket is deleted with DCPBackfill running
MB-20054: Account for memory alloc/dealloc in unregisterBucket
MB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main
MB-20054: Backport ep-engine_unit_tests from watson to 3.0.x
Change-Id: I4e82985e4ed7c506faa44b19b456b98d1067ed6a
Jim Walker [Tue, 2 Aug 2016 09:07:21 +0000 (10:07 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
6403bc0e8bbd7e94bb03672f505d99ff68d56c36':
MB-18453: Give all tasks their own stats and priority
[BP] MB-18580: Wait for VB state to be persisted before starting tests
Change-Id: Ia573467344f4c9ee2e2092322e54f7788201310e
Jim Walker [Tue, 2 Aug 2016 09:05:49 +0000 (10:05 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
c509624bda146dfdc26ebaf8044657ecc1160912':
MB-19948: enable disabled meta-data tests.
MB-19948: Handle 18 bytes of metadata
MB-19948: CouchKVStore metadata tests
MB-19897: Fix the data race on lastSendTime between stats and dcp worker threads
Change-Id: Ia8b67427b969eefc50310762b92e9aa6a4662003
Dave Rigby [Mon, 1 Aug 2016 13:24:39 +0000 (14:24 +0100)]
MB-20351: Fix lock-order inversion in ~CheckpointManager
As identified by TSan. Seen whilst testing sherlock->watson merge,
analysed the code and it seems this is a latent issue and hard to
re-produce.
The issue is that when the executor thread does a reset on the current
task, the VBCBAdapator is the last one holding the ref-counted
vbucket, so destruction occurs and ~VBucket calls the destructor of
the checkpoint manager, which is the reverse locks ordering of a
previous code path.
Number of threads in play here, but main ones of interest:
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=170834)
Cycle in lock order graph: M_checkpoint (0x7d640002e9a8) => M_exepool (0x7d4c00008288) => M_taskqueue (0x7d4400008e80) => M_exethread (0x7d380000df60) => M_checkpoint
Mutex M_exepool acquired here while holding mutex M_checkpoint in thread T35:
#0 pthread_mutex_lock <null> (engine_testapp+0x000000486760)
#1 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:748 (ep.so+0x0000000f47b0)
#2 ExecutorPool::wake(unsigned long) ep-engine/src/executorpool.cc:355 (ep.so+0x0000000f48f1)
#3 Flusher::wake() ep-engine/src/flusher.cc:155 (ep.so+0x000000101ee6)
#4 NotifyFlusherCB::callback(unsigned short&) ep-engine/src/flusher.h:88 (ep.so+0x00000010d194)
#5 Checkpoint::queueDirty(SingleThreadedRCPtr<Item> const&, CheckpointManager*) ep-engine/src/checkpoint.h:675 (ep.so+0x0000000271b0)
#6 CheckpointManager::closeOpenCheckpoint_UNLOCKED() ep-engine/src/checkpoint.cc:454 (ep.so+0x000000028dcb)
#7 CheckpointManager::addNewCheckpoint_UNLOCKED(unsigned long, unsigned long, unsigned long) ep-engine/src/checkpoint.cc:371 (ep.so+0x00000002881f)
#8 CheckpointManager::checkOpenCheckpoint_UNLOCKED(bool, bool) ep-engine/src/checkpoint.cc:361 (ep.so+0x00000002bd71)
#9 CheckpointVisitor::visitBucket(RCPtr<VBucket>&) ep-engine/src/checkpoint_remover.cc:43 (ep.so+0x00000003c3bd)
#10 VBCBAdaptor::run() ep-engine/src/ep.cc:3924 (ep.so+0x0000000a6174)
#11 ExecutorThread::run() ep-engine/src/executorthread.cc:115 (ep.so+0x0000000fe1b6)
#12 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fdd15)
#13 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057fb)
...
Mutex M_checkpoint acquired here while holding mutex M_exethread in thread T36:
#0 pthread_mutex_lock <null> (engine_testapp+0x000000486760)
#1 CheckpointManager::~CheckpointManager() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9/bits/gthr-default.h:748 (ep.so+0x000000027fdd)
#2 VBucket::~VBucket() ep-engine/src/vbucket.cc:152 (ep.so+0x00000014a018)
#3 PagingVisitor::~PagingVisitor() ep-engine/src/atomic.h:190 (ep.so+0x00000010a5e6)
#4 PagingVisitor::~PagingVisitor() ep-engine/src/item_pager.cc:43 (ep.so+0x00000010a645)
#5 std::_Sp_counted_ptr<PagingVisitor*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:373 (ep.so+0x00000010a2b0)
#6 VBCBAdaptor::~VBCBAdaptor() /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:149 (ep.so+0x0000000aea7e)
#7 ExecutorThread::run() ep-engine/src/atomic.h:325 (ep.so+0x0000000fdee4)
#8 launch_executor_thread(void*) ep-engine/src/executorthread.cc:33 (ep.so+0x0000000fdd15)
#9 platform_thread_wrap(void*) platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057fb)
Change-Id: I0a966b3d112963243e17647184123fd8b3200656
Reviewed-on: http://review.couchbase.org/66357
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Jim Walker [Thu, 28 Jul 2016 18:29:17 +0000 (19:29 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
dd3b6ae5e919bf51adaf5183fc8f1a076eac5357':
MB-19982: Don't hold connsLock for duration of dcp stats
MB-19982: Fix potential deadlock between DcpConsumer::bufMutex & connsLock
MB-14859: Handle quick successive BG Fetch of a key interleaved with exp pager
Change-Id: Ie620baa1dc2151124f072084868020d3067c5fb2
Jim Walker [Thu, 28 Jul 2016 10:38:45 +0000 (11:38 +0100)]
MB-20307: Re-enable dcp ep_dcp_dead_conn_count
The call to collect this stat was dropped in
a recent merge commit. This commit adds it back.
Change-Id: I06d1d18cb4479edb2a74d899d4c3a8089a0c4656
Reviewed-on: http://review.couchbase.org/66284
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Jim Walker [Thu, 28 Jul 2016 10:36:53 +0000 (11:36 +0100)]
MB-20312: Initialise snapshot task priority
The internal priority of VBSnapshotTask is not
intitialised, it is likely tasks requested to run at
low may actually become high (or vice versa).
Note this is not the GlobalTask priority, just an internal
one to this particular task.
Change-Id: Iabf91a8fe6fee0a8cf8bce99e72e4b22dd57040b
Reviewed-on: http://review.couchbase.org/66283
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Jim Walker [Thu, 28 Jul 2016 10:32:46 +0000 (11:32 +0100)]
MB-20330: ReaderLockHolder with no lvalue
3.x merge brought in the wrong version of some
code meaning that a read lock is never acquired.
Change-Id: I139ac041d54fdf8d459f4309a9c2be22e40afb8e
Reviewed-on: http://review.couchbase.org/66282
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Jim Walker [Wed, 27 Jul 2016 09:20:31 +0000 (10:20 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
9b194271f12e9b620c803a11b77a62e5402fb346': (22 commits)
MB-19886: Fix data race on ActiveStream::curChkSeqno by making it atomic
MB-19886: In markDiskSnapshot() get current vb snapshot info outside streamMutex
MB-19843: Modify the end_seqno in DCP stream request after checking for rollback
MB-19732: Fix the data race on lastSendTime between stats and dcp worker threads
MB-19732: Record time for all DCP consumer messages
MB-19732: Only update sendTime if successfully send noop
MB-19691: Address data race on vb_state::high_seqno
MB-19678: Merge backfill and in-memory snapshots correctly on replica vb
MB-19636: Initialise failovers correctly from 2.5.x vbstate
MB-19673: Log the actual last seqno sent before closing the stream.
MB-19503: Fix ConnMap so notifications don't go missing [2]
MB-19503: Fix ConnMap so notifications don't go missing.
MB-19404: [BP] Address data race in DCP-Producer seen while making a stats request
MB-19405: [BP] Address possible data races in PassiveStream context
MB-19359: [3] Address lock inversion with vb's state lock and snapshot lock
MB-19383: [BP] Address possible data race with startuptime
MB-19380: Address data race observed with vb's pendingBGFetches
MB-19360: Init mock server in stream module tests
MB-19382: [BP] Create a variable to get correct locking scope
MB-19359: [2] Address lock inversion with vb's state lock and snapshot lock
...
Change-Id: I70a7a29f33c7ec276e3bc99bc80a9e6fd739281a
Jim Walker [Wed, 27 Jul 2016 09:19:50 +0000 (10:19 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
ac78070d8dae90427c4bd3030a7be4ab09f920ae':
[BP] MB-16366: Obtain vbstate readlock in numerous operations
MB-19280: Fix data race in CouchKVStore stats access
MB-19279: Fix race in use of gmtime()
MB-19113: Suppress test_mb16357 when on thread sanitizer
MB-19278: Fix lock-order inversion on ActiveStream::streamMutex
MB-19277: Set executorThread's waketime to atomic
MB-19276: Fix data race on ExecutorThread::taskStart
MB-19275: Address data race on a DCP stream's state
MB-19273: Fix data race on PassiveStream::buffer.{bytes,items}
MB-19260: Make cookie atomic to serialize set/get in ConnHandler
MB-19259: Fix data race on DcpConsumer::backoffs
MB-19258: Address data race with replicationThrottle parameters
MB-19281: [BP] Add template class RelaxedAtomic<>
MB-19257: Fix data race on ExecutorThread::now
MB-19256: Address possible data race on VBCBAdaptor::currentvb
Change-Id: Ie8194d570b1d367a90d277ed086dec90eb99d6e9
Jim Walker [Wed, 27 Jul 2016 09:19:09 +0000 (10:19 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
644c09df35ac4e05006347140240819704848d0f':
MB-19253: Fix race in void ExecutorPool::doWorkerStat
MB-19252: Fix data race on Stream::readyQueueMemory
MB-19251: Fix race in updating Vbucket.file{SpaceUsed,Size}
MB-19249: Address possible data races in ConnHandler context
MB-19248: Fix race in TaskQueue.{ready,future,pending}Queue access
MB-19247: Fix possible data race in workload.h: workloadPattern
MB-19246: Fix potentially incorrect persist_time in OBSERVE response
MB-19229: Address possible data race in vbucket.cc: numHpChks
MB-19228: Address possible data races in ActiveStream context
MB-19227: Fix race in ConnNotifier.task access
MB-19226: Address potential data races in the warmup code
MB-19225: Fix data race on Flusher::taskId
MB-19225: Fix race in Flusher._state
MB-19224: Address possible data race with global task's waketime
MB-19223: Switch to hrtime from timeval in Global Thread Pool
Change-Id: I6cab0405f2ce779e2cf9849fa5364a9549382905
Jim Walker [Wed, 27 Jul 2016 08:03:35 +0000 (09:03 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into 'couchbase/watson'
* commit '
8cbe913fa9a2f78388adb2d2ce6dbfeee1e23e6e':
MB-19222: Fix race condition in TaskQueue shutdown
MB-19220: Ensure HashTable::size is atomic
MB-19204: ep_testsuite: Don't release the item while we're using it
MB-19204: Address data race in ep_test_apis/testsuite
MB-19204: ep_testsuite: Use std::string for last_key/body
MB-19204: Remove alarm() call from atomic_ptr_test, reduce iteration count
MB-19204: hash_table_test: Fix TSan issues
MB-16656: Send snapshotEnd as highSeqno for replica vb in GET_ALL_VB_SEQNOS call
MB-19153: Break circular dependency while deleting bucket
MB-19113: Address false positive lock inversion seen with test_mb16357
Change-Id: I708f67379ab38ea1af8c1602b790e590c3038806
Will Gardner [Tue, 29 Mar 2016 12:26:35 +0000 (13:26 +0100)]
MB-20224: [BP] Replace ThreadLocal '#define' with a using
Using a define causes issues inside of GoogleTest which has its
own ThreadLocal class. By replacing it with 'using' we avoid
the collision.
*Required to build MB-20224 on watson.*
Change-Id: I05d3e25efc0eb361f7dbe82074d806ba116781c5
Reviewed-on: http://review.couchbase.org/66210
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Jim Walker [Tue, 19 Jul 2016 14:18:48 +0000 (15:18 +0100)]
MB-19837: Increase number of NONIO threads
We've found that the NONIO threads can become
heavily utilised. On smaller systems only 1 or 2
threads were created, easily overwhelmed during
rebalance leading to rebalance failures.
This commit changes the code from creating
NONIO as 10% of nCPU to be 30% of nCPU and
ensuring at least 2 are always present.
The % is still hardwired because the thread count is global
and would be intialised by the first bucket's config.
Given that we can already override with a config flag the changes
are hardwired to give better throughput for nearly all users.
Comparison of old vs new.
CPU count = 1 NONIO threads old{1} new{2}
CPU count = 2 NONIO threads old{1} new{2}
CPU count = 4 NONIO threads old{1} new{2}
CPU count = 8 NONIO threads old{1} new{2}
CPU count = 16 NONIO threads old{2} new{3}
CPU count = 32 NONIO threads old{3} new{7}
CPU count = 36 NONIO threads old{3} new{8}
CPU count = 64 NONIO threads old{5} new{8}
CPU count = 128 NONIO threads old{8} new{8}
Change-Id: Ifa56730ad934ca9ae83993b3c539f4a725872696
Reviewed-on: http://review.couchbase.org/65934
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Thu, 30 Jun 2016 10:23:20 +0000 (11:23 +0100)]
MB-18453: Make task scheduling fairer
The MB identified that we can starve tasks by scheduling
a higher priority task via ExecutorPool::wake().
This occurs because ExecutorPool::wake() pushes tasks
into the readyQueue enabling frequent wakes to trigger
the starvation bug.
The fix is to remove readyQueue.push from wake, so that we only
push to the readyQueue. The fetch side of scheduling only looks at
the futureQueue once the readyQueue is empty, thus the identified
starvation won't happen.
A unit-test demonstrates the fix using the single-threaded harness and
expects that two tasks of differing priorities get executed, rather
than the wake() starving the low-priority task.
This test drives:
- ExecutorPool::schedule
- ExecutorPool::reschedule
- ExecutorPool::wake
These are all the methods which can add tasks into the scheduler
queue.
The fetch side is also covered:
- ExecutorPool::fetchNextTask
This commit is an update to a previous commit that was reverted due
to performance issues. The original commit was reverted to minimise
disruption.
- original commit is
e22c9ebeda1aac
- revert is
27cb1120e3e37
Change-Id: I70a4dcf7cd1c3a6f04548e9bbc3f95e24cdf50ad
Reviewed-on: http://review.couchbase.org/65929
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Thu, 2 Jun 2016 15:05:50 +0000 (16:05 +0100)]
[BP] MB-18452: Single threaded test harness improvements
Refactor parts of the very new evp_store_single_threaded_test so that
it's simpler to drive tasks making new tests easier to write.
The main change is to provide helper methods for running any task from
a queue (with some checks) and a way to push a clean shutdown.
Change-Id: I7add574f0768c642f3c6c7c64293e882337a1cdc
Reviewed-on: http://review.couchbase.org/65928
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Manu Dhundi [Tue, 19 Jul 2016 18:21:11 +0000 (11:21 -0700)]
MB-20182: Update checkpoint snapshot correctly during TAP backfill
When we do a TAP backfill we must update checkpoint snapshot start
and end correctly. Otherwise, if we immediately proceed to DCP
after TAP backfill, the checkpoint mgr will have incorrect combination
of {snap_start, snap_end, vb_high_seqno}
Change-Id: I2b738fd3b24486dadbd2962e81e0c3820c5a8786
Reviewed-on: http://review.couchbase.org/65866
Tested-by: buildbot <build@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Jim Walker [Thu, 14 Jul 2016 17:07:29 +0000 (10:07 -0700)]
Revert "MB-18453: Make task scheduling fairer"
When running in a >1 node cluster memcached CPU is running
very high. The original fix has introduced a problem which
needs further investigation (fetchTask is very very cpu hot).
This reverts commit
e22c9ebeda1aac2fc8f4325cc39a93c3bcefffab.
Change-Id: If53a3a60692fbaaef4e54462f99284a8044cd899
Reviewed-on: http://review.couchbase.org/65780
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Manu Dhundi [Mon, 11 Jul 2016 22:10:48 +0000 (15:10 -0700)]
Merge remote-tracking branch 'couchbase/3.0.x'
*
|\
| * 6e10f8a 2016-07-11 | MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction
| * 1fe3aac 2016-07-07 | MB-20054: Fix windows build error by adding size() func in class AtomicQueue
| * 536e32f 2016-07-07 | MB-20054: Fix windows build error by including a missing header file
Change-Id: Ib01ecb053ffa4b6fe2a6bac6cfbe6eccc3630549
Manu Dhundi [Mon, 11 Jul 2016 21:38:15 +0000 (14:38 -0700)]
MB-20105: Ensure purge_seq is not reset when no items are purged in a compaction
When a compaction request is made, we initially set the purge_seqno in the req
to 0, hoping to update it when we purge items. However, if there are no purged
items in a compaction call, then we end up reseting the purge_seqno
(correct one) set by the previous compaction call.
This commit addresses the problem by setting the purge seqno in the request
to current purge seqno in the ep-engine.
Change-Id: I9581abe7a4cb9d7cd84c1bf5563b98c91dc67525
Reviewed-on: http://review.couchbase.org/65626
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Dave Rigby [Fri, 8 Jul 2016 13:44:29 +0000 (14:44 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-20054: Regression test - bucket is deleted with DCPBackfill running
MB-20054: Account for memory alloc/dealloc in unregisterBucket
MB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main
Change-Id: I5f05dd3355cc0d581350db65463c6c1dc155f3c6
Dave Rigby [Fri, 8 Jul 2016 09:49:51 +0000 (10:49 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-20054: Backport ep-engine_unit_tests from watson to 3.0.x
Change-Id: I811e0b796be8611a4a574ab6b6a488ef50219bbf
Dave Rigby [Thu, 7 Jul 2016 15:03:05 +0000 (16:03 +0100)]
docs/Testing.md: Document the different test types
Change-Id: I3fcdb9e7cb347fa63f94e5c7a760bb54749aa375
Reviewed-on: http://review.couchbase.org/65585
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: Dave Rigby <daver@couchbase.com>
Manu Dhundi [Thu, 7 Jul 2016 21:28:52 +0000 (14:28 -0700)]
MB-20054: Fix windows build error by adding size() func in class AtomicQueue
Change-Id: I808e31c9a9ba97b67e75c07534350aa91cb040a2
Reviewed-on: http://review.couchbase.org/65596
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Manu Dhundi [Thu, 7 Jul 2016 17:50:45 +0000 (10:50 -0700)]
MB-20054: Fix windows build error by including a missing header file
Change-Id: Ifdc8d6a09e8ff1bd68218066cffa44bc9de0a5a3
Reviewed-on: http://review.couchbase.org/65589
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Dave Rigby [Thu, 7 Jul 2016 08:23:25 +0000 (09:23 +0100)]
MB-20054: Regression test - bucket is deleted with DCPBackfill running
Regression test for MB-20054 - the following abort is encountered when
a DCPBackfill task is still running when a bucket is deleted:
Assertion failed: (engine), function verifyEngine, file
ep-engine/src/objectregistry.cc, line 58.
This issue occurs because the DCPBackfill object (and associated
objects ActiveStream and importantly ActiveStreams' readyQ of Items)
is not deleted earlier in the shutdown sequence (via EvpDestroy), as
we use ref-counted pointers for it and there is a still an outstanding
reference by the AuxIO Thread which is running the task. Hence the
DCPBackfill object is only deleted when we finally unregister the
deleted bucket from the shared ExecutorPool - see the following
backtrace:
#1 0x00007f513b75a085 in abort () from /lib64/libc.so.6
#2 0x00007f51337034e2 in ObjectRegistry::onDeleteItem (pItem=<value optimized out>) at ep-engine/src/objectregistry.cc:157
#3 0x00007f5133652094 in ~Item (this=<value optimized out>) at ep-engine/src/item.h:352
#4 SingleThreadedRCPtr<Item>::~SingleThreadedRCPtr (this=<value optimized out>) at ep-engine/src/atomic.h:430
#5 0x00007f51336c7f47 in ~MutationResponse (this=0x3cd87880) at ep-engine/src/dcp-response.h:275
#6 MutationResponse::~MutationResponse (this=0x3cd87880) at ep-engine/src/dcp-response.h:275
#7 0x00007f51336d86aa in clear_UNLOCKED (this=0x7a3f5fa0) at ep-engine/src/dcp-stream.cc:201
#8 ~ActiveStream (this=0x7a3f5fa0) at ep-engine/src/dcp-stream.h:178
#9 ActiveStream::~ActiveStream (this=0x7a3f5fa0) at ep-engine/src/dcp-stream.h:179
#10 0x00007f51336cc808 in RCPtr<Stream>::~RCPtr (this=0xb1823780) at ep-engine/src/atomic.h:348
#11 0x00007f51336d77c7 in ~DCPBackfill (this=0xb1823740) at ep-engine/src/dcp-stream.cc:114
#12 DCPBackfill::~DCPBackfill (this=0xb1823740) at ep-engine/src/dcp-stream.cc:114
#13 0x00007f513368d95f in ~SingleThreadedRCPtr (this=0x5b55a20, e=0x59c4000, taskType=NO_TASK_TYPE) at ep-engine/src/atomic.h:430
#14 ExecutorPool::_stopTaskGroup (this=0x5b55a20, e=0x59c4000, taskType=NO_TASK_TYPE) at ep-engine/src/executorpool.cc:532
#15 0x00007f513368dad3 in ExecutorPool::_unregisterBucket (this=0x5b55a20, engine=0x59c4000) at ep-engine/src/executorpool.cc:551
#16 0x00007f513368e143 in ExecutorPool::unregisterBucket (this=0x5b55a20, engine=0x59c4000) at ep-engine/src/executorpool.cc:602
#17 0x00007f5133655f82 in EventuallyPersistentStore::~EventuallyPersistentStore (this=0x59e6000)
at ep-engine/src/ep.cc:365
#18 0x00007f5133672a25 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0x59c4000)
at ep-engine/src/ep_engine.cc:5791
#19 0x00007f5133672c95 in EvpDestroy (handle=0x59c4000, force=<value optimized out>) at ep-engine/src/ep_engine.cc:143
To actually reproduce the issue is somewhat involved - we need to
orchestrate the world such that we delete the engine while a
DCPBackfill task is still running. We spin up a separate thread which
will run the DCPBackfill task concurrently with destroy - specifically
DCPBackfill must start running (and add items to the readyQ) before
destroy(), it must then continue running (stop after) _stopTaskGroup
is invoked. To achieve this we use a couple of condition variables to
synchronise between the two threads - the timeline needs to look like:
auxIO thread: [------- DCPBackfill ----------]
main thread: [--destroy()--] [ExecutorPool::_stopTaskGroup]
--------------------------------------------------------> time
Change-Id: Ic64c419cb8e4e0af2378efba9711b121aacee15b
Reviewed-on: http://review.couchbase.org/65520
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Tue, 5 Jul 2016 20:34:56 +0000 (21:34 +0100)]
MB-20054: Account for memory alloc/dealloc in unregisterBucket
While unregistering a bucket, any memory allocations/deallocations
made should be accounted to the bucket in question. Hence no
`onSwitchThread(NULL)` call.
Change-Id: I5c260e3aa7e2c8d1fd4ff0a1ca20f2185a7362a8
Reviewed-on: http://review.couchbase.org/65525
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Tue, 5 Jul 2016 11:31:59 +0000 (12:31 +0100)]
MB-20054: [BP] Add verbose (logging) output to ep_unit_tests_main
Not originally part of MB-20054, but needed for test development for
this MB.
Change-Id: Ia38db00d4f8cd84b2c90b5bddbd0bc01f51b61de
Reviewed-on: http://review.couchbase.org/65516
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Thu, 16 Jun 2016 11:19:34 +0000 (12:19 +0100)]
MB-20054: Backport ep-engine_unit_tests from watson to 3.0.x
In Watson we have created a set of 'unit' (i.e. class-level) tests for
ep-engine. To assist in backporting bug fixes, and specifically their
unit tests (to demonstrate they are correct), this patch backports the
test infrastructure itself.
Note these tests require GTest, so the CMake changes necessary for it
have also been included.
Tests are a backport from couchbase/watson as of commit feda304.
Modified to handle changes in APIs etc, and to remove tests
which fail on 3.0.x as we never chose to fix them in the 3.0.x
branch.
Change-Id: Iaaf59b0d8d6ba0a2211b630ba00fd837ca01614a
Reviewed-on: http://review.couchbase.org/64979
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Dave Rigby [Thu, 7 Jul 2016 13:59:04 +0000 (14:59 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19982: Don't hold connsLock for duration of dcp stats
MB-19982: Fix potential deadlock between DcpConsumer::bufMutex & connsLock
MB-14859: Handle quick successive BG Fetch of a key interleaved with exp pager
Change-Id: Ie192ce93370c3218948434794b335732a6a7ff18
Dave Rigby [Thu, 7 Jul 2016 13:48:01 +0000 (14:48 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19886: Fix data race on ActiveStream::curChkSeqno by making it atomic
MB-19886: In markDiskSnapshot() get current vb snapshot info outside streamMutex
MB-19843: Modify the end_seqno in DCP stream request after checking for rollback
Change-Id: I32c52689b4c78f3416af180818df772d217db882
Dave Rigby [Thu, 7 Jul 2016 13:13:19 +0000 (14:13 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19732: Fix the data race on lastSendTime between stats and dcp worker threads
MB-19732: Record time for all DCP consumer messages
MB-19732: Only update sendTime if successfully send noop
MB-19691: Address data race on vb_state::high_seqno
Change-Id: I2d994dd799c8fe5ee5779d3916e374aa3fa9615b
Dave Rigby [Thu, 7 Jul 2016 13:20:43 +0000 (14:20 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19678: Merge backfill and in-memory snapshots correctly on replica vb
Change-Id: I1b0adbafe45e4f4414da62846019b55c7dd05833
Dave Rigby [Thu, 7 Jul 2016 12:59:10 +0000 (13:59 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19636: Initialise failovers correctly from 2.5.x vbstate
MB-19673: Log the actual last seqno sent before closing the stream.
Change-Id: If0aae515a9fb3232a390b8228cf92274fcc81456
Dave Rigby [Thu, 7 Jul 2016 10:57:50 +0000 (11:57 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19503: Fix ConnMap so notifications don't go missing [2]
MB-19503: Fix ConnMap so notifications don't go missing.
MB-19404: [BP] Address data race in DCP-Producer seen while making a stats request
MB-19405: [BP] Address possible data races in PassiveStream context
Change-Id: I241ebd07f9e6177d557dd0ea37da97d6b4cc1489
Dave Rigby [Thu, 7 Jul 2016 09:34:12 +0000 (10:34 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19359: [3] Address lock inversion with vb's state lock and snapshot lock
MB-19383: [BP] Address possible data race with startuptime
MB-19380: Address data race observed with vb's pendingBGFetches
MB-19360: Init mock server in stream module tests
MB-19382: [BP] Create a variable to get correct locking scope
Change-Id: Ice3e97c12ee1423923ffeda47bc30890332a1770
Jim Walker [Thu, 30 Jun 2016 10:23:20 +0000 (11:23 +0100)]
MB-18453: Make task scheduling fairer
The MB identified that we can starve tasks by scheduling
a higher priority task via ExecutorPool::wake().
This occurs because ExecutorPool::wake() pushes tasks
into the readyQueue enabling frequent wakes to trigger
the starvation bug.
The fix is to remove readyQueue.push from wake, so that we only
push to the readyQueue. The fetch side of scheduling only looks at
the futureQueue once the readyQueue is empty, thus the identified
starvation won't happen.
A unit-test demonstrates the fix using the single-threaded harness and
expects that two tasks of differing priorities get executed, rather
than the wake() starving the low-priority task.
This test drives:
- ExecutorPool::schedule
- ExecutorPool::reschedule
- ExecutorPool::wake
These are all the methods which can add tasks into the scheduler
queue.
The fetch side is also covered:
- ExecutorPool::fetchNextTask
Change-Id: Ie797a637ce4e7066e3155751ff467bc65d083646
Reviewed-on: http://review.couchbase.org/65385
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Dave Rigby [Tue, 5 Jul 2016 17:08:16 +0000 (17:08 +0000)]
Merge "Merge remote-tracking branch 'couchbase/sherlock' into watson" into watson
Dave Rigby [Tue, 14 Jun 2016 13:49:09 +0000 (14:49 +0100)]
MB-20046: ep_store_test: Use the correct dbname instead of 'test'
While EventuallyPersistentStoreTest declares a test_dbname variable,
and attempts to delete any files in this directory at the start of the
run, the variable isn't added to the actual config string pased into
EPEngine, resulting in us using the default dbname ('test'), and hence
failing to delete any previous data files.
Fix by adding the dbname to the test config.
Change-Id: I768850277ee3888c0d02bb823203569ff968ee3a
Reviewed-on: http://review.couchbase.org/65300
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Sun, 26 Jun 2016 19:52:10 +0000 (20:52 +0100)]
MB-18453: Give all tasks their own stats and priority
MB-18453 identified that tasks have copied & pasted
constructors which leads to some tasks all having the
same Priority object.
The fallout of this is that many tasks now all contribute
to the same histogram for runtime and scheduling waittime.
When debugging issues which lead to MB-18453 it is near
impossible at times to know which real task was delayed
as the stats can be attributed to many tasks.
This commit introduces makes all tasks have their own ID
and thus their own histograms and also makes it easier
for new tasks to be created without forgetting to create
a new Priority instance.
tasks.defs.h is a new file that captures every sub-class
of GlobalTask and shows the priority of all tasks.
TASK macros are now used to generate various switch
statements and enums used in task accounting.
The new system is not strict, MyTask could still be
assigned the priority/id of OldTask, however this
flexibility can be useful in some circumstances.
Note this patch has changed ep_testsuite test_item_pager
to increase the max_size value in the test config. This
is because this patch increases the baseline heap usage
of a bucket as we've increased the number of Histogram
object allocated by EventuallyPersistentStore.
Prior to this patch 28 were allocated, with this patch
51 are allocated (1 per task). Each Histogram<hrtime_t
is approx 1568 bytes (on OSX clang build).
The new max_size is 2.5MiB
Change-Id: I209c67945b964023615af37a12f83ca97142ce53
Reviewed-on: http://review.couchbase.org/65253
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Wed, 4 May 2016 09:49:59 +0000 (10:49 +0100)]
[BP] MB-18580: Wait for VB state to be persisted before starting tests
Intermittent test failures (across multiple tests) have been seen
where we fail to read the number of items in vbucket disk file:
terminate called after throwing an instance of 'std::invalid_argument'
what(): CouchKVStore::getDbFileInfo: Failed to open database file for vBucket = 1 rev = 1 with error:no such file
The issue is that we do not correctly wait for the vBucket files to be
created before starting a test. We /attempt/ to wait in test_setup,
waiting for ep_vb_snapshot_total to be non-zero, however this stat is
not updated when vBuckets are written to disk, instead only when the
vb state snapshot occurs.
To fix this, create a new histogram stat - ep_vb_persist_state_total -
which records how long the actual persist takes (and counts then at
the same time). Change test_setup to check for this stat becoming 1
before continuing.
Results in two new stats:
* disk_persist_vbstate - timing histogram of how long vbState
operations took.
* ep_persist_vbstate_total - count of how many VBStatePersists have
occurred.
Change-Id: Ic24e6cdb51a98ea6fa65005158242bfcf44225d0
Reviewed-on: http://review.couchbase.org/65298
Reviewed-by: Dave Rigby <daver@couchbase.com>
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Jim Walker [Tue, 14 Jun 2016 15:16:10 +0000 (16:16 +0100)]
MB-18452: Extra refactoring and single-threaded test
Some extra refactoring applied to watson branch and
a single threaded test utilising the watson+
single-threaded unit test harness.
Change-Id: I3028c079e448552987268206ed2664c10933085a
Reviewed-on: http://review.couchbase.org/64956
Well-Formed: buildbot <build@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Jim Walker [Wed, 29 Jun 2016 07:57:47 +0000 (08:57 +0100)]
MB-19948: enable disabled meta-data tests.
With MB-19948 these tests no longer fail
valgrind, so can be enabled.
Change-Id: Ida628a3dd48de243703ebc282b84dc23d5a69ac6
Reviewed-on: http://review.couchbase.org/65324
Well-Formed: buildbot <build@couchbase.com>
Reviewed-by: Will Gardner <will.gardner@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Dave Rigby [Thu, 30 Jun 2016 14:30:04 +0000 (15:30 +0100)]
Merge remote-tracking branch 'couchbase/sherlock' into watson
* couchbase/sherlock:
MB-19843: Modify the end_seqno in DCP stream request after checking for rollback
Change-Id: I6ebfed2f2046c2e6079125f7b015fc9e3ac032cd
Dave Rigby [Wed, 29 Jun 2016 09:17:13 +0000 (10:17 +0100)]
Merge remote-tracking branch 'couchbase/3.0.x' into sherlock
* couchbase/3.0.x:
MB-19359: [2] Address lock inversion with vb's state lock and snapshot lock
MB-19359: [1] Address lock inversion with vb's state lock and snapshot lock
Change-Id: Ia068af9b26e8a4b980bf22341fa43ac5452aca60