ep-engine.git
5 years ago Fix compilation issue on windows 54/57454/4 3.1.3 v3.1.3
abhinavdangeti [Fri, 4 Dec 2015 00:15:47 +0000 (16:15 -0800)]
Fix compilation issue on windows

http://factory.couchbase.com/job/win_cs_build/ws/couchbase\ep-engine\tests\ep_testsuite.cc(4958) : error C2782: 'void checkeqfn(T,T,const char *,const char *,const int)' : template parameter 'T' is ambiguous

http://factory.couchbase.com/job/win_cs_build/ws/couchbase\ep-engine\tests\ep_testsuite.cc(74) : see declaration of 'checkeqfn'
        could be 'unsigned __int64'
        or       'unsigned long'
NMAKE : fatal error U1077: 'C:\PROGRA~2\MICROS~2.0\VC\bin\amd64\cl.exe' : return code '0x2'
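The ambiguity arises because `checkeqfn` deduces a single `T` from its first two arguments. On Windows (LLP64), `uint64_t` is `unsigned __int64` while `unsigned long` is a distinct 32-bit type, so a call mixing the two offers two candidates for `T`. The sketch below assumes a simplified `checkeqfn` whose signature follows the diagnostic (its body and the `compare_stat` caller are hypothetical). It shows two common ways to resolve the deduction and is not necessarily the change made in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Simplified stand-in for the test helper named in the diagnostic; the
// signature follows the error text, the body is hypothetical.
// T is deduced from BOTH exp and got, so their types must agree exactly.
template <typename T>
static void checkeqfn(T exp, T got, const char *msg, const char *file,
                      const int linenum) {
    if (exp != got) {
        std::fprintf(stderr, "%s:%d: %s\n", file, linenum, msg);
        std::abort();
    }
}

// Hypothetical caller comparing a 64-bit stat with an unsigned long.
static bool compare_stat(uint64_t expected, unsigned long actual) {
    // checkeqfn(expected, actual, ...) fails with C2782 wherever uint64_t
    // and unsigned long are distinct types (MSVC: unsigned __int64 vs a
    // 32-bit unsigned long), since T could be either one.

    // Fix 1: convert one argument so both deduce the same T.
    checkeqfn(expected, static_cast<uint64_t>(actual), "stat mismatch",
              __FILE__, __LINE__);
    // Fix 2: name T explicitly, which turns off deduction for this call.
    checkeqfn<uint64_t>(expected, actual, "stat mismatch", __FILE__,
                        __LINE__);
    return true;
}
```

On LP64 Linux, `uint64_t` is typically `unsigned long` itself. That is why such a mixed call can compile there and only fail on MSVC.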

Change-Id: I9a0bf5bd74276ebe9ac6a709302704a2bab06c25
Reviewed-on: http://review.couchbase.org/57454
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago [BP] MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream. 49/57449/2
Manu Dhundi [Thu, 3 Dec 2015 21:32:35 +0000 (13:32 -0800)]
[BP] MB-16915: Remove cyclic reference between DcpConsumer and PassiveStream.

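DcpConsumer holds a reference to PassiveStream and vice versa. We must
make sure that one of them (DcpConsumer here) releases its reference
to the other in a function other than its destructor: while the cycle
exists, neither object's reference count drops to zero, so neither
destructor would ever run.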
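Below is a minimal sketch of the cycle and its explicit break. It assumes `std::shared_ptr` in place of ep-engine's own ref-counted pointer type. `Consumer`, `Stream`, and `closeAllStreams()` are hypothetical stand-ins for DcpConsumer, PassiveStream, and whatever close path releases the streams.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

class Consumer;

// Hypothetical stand-in for PassiveStream: owns a reference to its consumer.
class Stream {
public:
    explicit Stream(std::shared_ptr<Consumer> c) : consumer_(std::move(c)) {}
private:
    std::shared_ptr<Consumer> consumer_;
};

// Hypothetical stand-in for DcpConsumer: owns references to its streams.
class Consumer : public std::enable_shared_from_this<Consumer> {
public:
    void addStream(int vbucket) {
        streams_[vbucket] = std::make_shared<Stream>(shared_from_this());
    }
    // Breaks the cycle from outside the destructor. ~Consumer() cannot do
    // it: while any Stream still holds the Consumer, the count never
    // reaches zero, so the destructor never runs.
    void closeAllStreams() { streams_.clear(); }
private:
    std::map<int, std::shared_ptr<Stream>> streams_;
};
```

If `closeAllStreams()` is skipped, the consumer and its streams keep each other alive after the last external reference is gone.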

Change-Id: I8e5c262bc5ac50342f85ba80d481987a26a7a21d
Reviewed-on: http://review.couchbase.org/57429
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/57449
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago [BP] MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr 48/57448/2
Manu Dhundi [Thu, 3 Dec 2015 01:19:51 +0000 (17:19 -0800)]
[BP] MB-16915: RollbackTask to hold ref count ptr for DCP consumer instead of raw ptr

A RollbackTask is spawned when a DCP consumer is asked to roll back by a DCP
producer. The rollback runs in the background, so the DCP consumer object may
be deleted before the rollback task completes. We can avoid this if
RollbackTask holds a ref-counted ptr to the DCP consumer instead of a raw ptr.
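The ownership change can be sketched as follows. Again, `std::shared_ptr` stands in for the ref-counted pointer type. The `Consumer` and `RollbackTask` classes are simplified, hypothetical versions, and the task runs synchronously rather than on an executor thread so the lifetime argument is easy to follow.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical consumer; only the rollback entry point matters here.
class Consumer {
public:
    void doRollback() { ++rollbacks_; }
    int rollbacks() const { return rollbacks_; }
private:
    int rollbacks_ = 0;
};

// Sketch of a background task that co-owns the consumer. With a raw
// Consumer* the task could dereference freed memory; with a shared_ptr the
// consumer lives at least as long as the task.
class RollbackTask {
public:
    explicit RollbackTask(std::shared_ptr<Consumer> c)
        : consumer_(std::move(c)) {}
    void run() { consumer_->doRollback(); }
private:
    std::shared_ptr<Consumer> consumer_;
};

// Simulates the connection being torn down before the task gets to run.
inline bool rollbackSurvivesTeardown() {
    auto consumer = std::make_shared<Consumer>();
    std::weak_ptr<Consumer> probe = consumer;
    RollbackTask task(consumer);
    consumer.reset();          // last external owner goes away
    if (probe.expired()) {
        return false;          // would mean the task was left dangling
    }
    task.run();                // safe: the task still owns the consumer
    return probe.lock()->rollbacks() == 1;
}
```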

Change-Id: I00c1bced0ec445226e64e6f7647a3bfbfb063f94
Reviewed-on: http://review.couchbase.org/57427
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/57448
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago [BP] MB-16915: Use refcounted pointers on producer/consumer 47/57447/2
Jim Walker [Mon, 30 Nov 2015 13:31:59 +0000 (13:31 +0000)]
[BP] MB-16915: Use refcounted pointers on producer/consumer

Prevents a race/crash occurring when the DcpProducer is destroyed
and there are backfill tasks running/pending.

The test case reveals the problem when run under valgrind as
a series of invalid reads of freed memory. E.g.

==40673== Thread 17:
==40673== Invalid read of size 8
==40673==    at 0x71A3CEE: DCPBackfill::run() (dcp-stream.cc:175)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)
==40673==  Address 0x64c2380 is 48 bytes inside a block of size 384 free'd
==40673==    at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==40673==    by 0x718C4ED: DcpConnMap::manageConnections() (atomic.h:430)
==40673==    by 0x71906A5: ConnManager::run() (connmap.cc:151)
==40673==    by 0x717215C: ExecutorThread::run() (executorthread.cc:110)
==40673==    by 0x7172868: launch_executor_thread (executorthread.cc:34)
==40673==    by 0x503EC67: platform_thread_wrap (cb_pthreads.c:24)
==40673==    by 0x524A181: start_thread (pthread_create.c:312)
==40673==    by 0x555A47C: clone (clone.S:111)

Change-Id: I32a7dfd10daa4565b9cbb4c8142ed8f71c13ca31
Reviewed-on: http://review.couchbase.org/57296
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/57447
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16686: Remove sanity check while adding TAP over DCP 64/56564/3 v3.1.2
abhinavdangeti [Fri, 30 Oct 2015 17:11:46 +0000 (10:11 -0700)]
MB-16686: Remove sanity check while adding TAP over DCP

This check isn't accurate, as certain TAP messages from
the producer carry no vbucket information: the vbucket field
is initialized to zero (expected), since they aren't vbucket-specific
operations. In such a scenario, if the TAP consumer needs to be
created, creation would be refused whenever a DCP passive stream
exists for vbucket 0. This would break an online upgrade.

Change-Id: I310b9cf4dbaf652c233cba02de7ca72469efa89d
Reviewed-on: http://review.couchbase.org/56564
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-15171: [BP] Initialize dcpConnMap_ to NULL in engine constructor 71/56171/3
Sriram Ganesan [Thu, 28 May 2015 01:51:23 +0000 (18:51 -0700)]
MB-15171: [BP] Initialize dcpConnMap_ to NULL in engine constructor

Not initializing this variable to NULL can cause access to an
invalid pointer during engine destroy.
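Below is a minimal sketch of the failure mode and the fix. It assumes a hypothetical `Engine` class whose connection map is created in an `initialize()` step that may never run, while `destroy()` always does. C++11 `nullptr` is used in place of `NULL`.

```cpp
#include <cassert>

struct ConnMapStub { int connections = 0; };  // hypothetical placeholder

// Hypothetical engine with the same lifecycle shape as described above.
class Engine {
public:
    Engine() : dcpConnMap_(nullptr) {}  // without this, the pointer is garbage
    Engine(const Engine&) = delete;
    Engine& operator=(const Engine&) = delete;
    ~Engine() { destroy(); }

    void initialize() { dcpConnMap_ = new ConnMapStub(); }
    void destroy() {
        delete dcpConnMap_;     // deleting a null pointer is a safe no-op
        dcpConnMap_ = nullptr;  // makes a second destroy() harmless too
    }
    bool hasConnMap() const { return dcpConnMap_ != nullptr; }
private:
    ConnMapStub* dcpConnMap_;
};
```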

Change-Id: Icc5d848f7826bb6331deb40b4832efcf64622dea
Reviewed-on: http://review.couchbase.org/51492
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/56171

5 years ago MB-14825: [BP] While trying to stream next checkpoint item, check if vbucket is valid 70/56170/3
Manu Dhundi [Thu, 7 May 2015 01:30:24 +0000 (18:30 -0700)]
MB-14825: [BP] While trying to stream next checkpoint item, check if vbucket is valid

If a vbucket is deleted in the middle of a DCP connection streaming a
checkpoint item, the scenario should be handled gracefully.

Change-Id: I24fe52adc572f504f492f015f82fc8d5e0325925
Reviewed-on: http://review.couchbase.org/50674
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56170
Tested-by: buildbot <build@couchbase.com>
5 years ago MB-16500 [BP]: Address data race in DcpConsumer, by acquiring readyMutex 65/56065/4
abhinavdangeti [Wed, 7 Oct 2015 21:49:41 +0000 (14:49 -0700)]
MB-16500 [BP]: Address data race in DcpConsumer, by acquiring readyMutex

WARNING: ThreadSanitizer: data race (pid=27652)

  Write of size 8 at 0x7d08000443c0 by main thread (mutexes: write M57876):
    #0 operator delete(void*) <null>:0 (engine_testapp+0x000000050e7b)
    #1 __gnu_cxx::new_allocator<std::_List_node<unsigned short> >::deallocate(std::_List_node<unsigned short>*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/ext/new_allocator.h:110 (ep.so+0x00000005d69a)
    #2 DcpConsumer::step(dcp_message_producers*) /home/abhinav/couchbase/ep-engine/src/dcp/consumer.cc:516 (ep.so+0x00000005c5cc)
    #3 EvpDcpStep(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:1479 (ep.so+0x0000000b480b)
    #4 mock_dcp_step(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:476 (engine_testapp+0x0000000bb055)
    #5 dcp_step(engine_interface*, engine_interface_v1*, void const*) /home/abhinav/couchbase/ep-engine/tests/ep_test_apis.cc:1219 (ep_testsuite.so+0x0000000b61bd)
    #6 test_chk_manager_rollback(engine_interface*, engine_interface_v1*) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:5526 (ep_testsuite.so+0x0000000809b4)
    #7 execute_test(test, char const*, char const*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000000b952c)
    #8 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

  Previous write of size 8 at 0x7d08000443c0 by thread T16:
    #0 operator new(unsigned long) <null>:0 (engine_testapp+0x00000005090d)
    #1 __gnu_cxx::new_allocator<std::_List_node<unsigned short> >::allocate(unsigned long, void const*) /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/ext/new_allocator.h:104 (ep.so+0x00000005f265)
    #2 PassiveStream::reconnectStream(RCPtr<VBucket>&, unsigned int, unsigned long) /home/abhinav/couchbase/ep-engine/src/dcp/stream.cc:1104 (ep.so+0x000000076f5f)
    #3 DcpConsumer::doRollback(unsigned int, unsigned short, unsigned long) /home/abhinav/couchbase/ep-engine/src/dcp/consumer.cc:676 (ep.so+0x00000005db67)
    #4 RollbackTask::run() /home/abhinav/couchbase/ep-engine/src/dcp/consumer.cc:574 (ep.so+0x00000005d9d4)
    #5 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f834c)
    #6 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f7eb5)
    #7 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d71)
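The report shows `DcpConsumer::step()` freeing a `std::list<unsigned short>` node while `PassiveStream::reconnectStream()`, on the rollback task's thread, allocates one, with no lock in common. Below is a minimal sketch of the fix named in the subject. It assumes the ready list holds vbucket ids and that both paths share a single readyMutex; the `ReadyQueue` class is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>

// Hypothetical ready queue touched from both the front-end step() path
// and a background rollback/reconnect path.
class ReadyQueue {
public:
    // Background path (e.g. a stream reconnect after rollback).
    void push(uint16_t vbucket) {
        std::lock_guard<std::mutex> lh(readyMutex_);
        ready_.push_back(vbucket);
    }
    // Front-end path; returns false if nothing is ready.
    bool pop(uint16_t& vbucket) {
        std::lock_guard<std::mutex> lh(readyMutex_);
        if (ready_.empty()) {
            return false;
        }
        vbucket = ready_.front();
        ready_.pop_front();  // the list-node delete that TSan flagged
        return true;
    }
private:
    std::mutex readyMutex_;
    std::list<uint16_t> ready_;
};
```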

Change-Id: I196a78e54bf8014967a51cdb081126597153f77b
Reviewed-on: http://review.couchbase.org/55881
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56065
Reviewed-by: Dave Rigby <daver@couchbase.com>
5 years ago MB-16500 [BP]: Removing unnecessary locking in consumer code 80/56080/3
abhinavdangeti [Thu, 13 Aug 2015 18:46:35 +0000 (11:46 -0700)]
MB-16500 [BP]: Removing unnecessary locking in consumer code

streamMutex is meant to protect the ready list, not the streams list.

The front-end operations (addStream, closeStream, handleResponse and
step) won't race with each other over the streams list, as multiple
memcached threads will not serve a single cookie.

The back-end operations only read from the streams list:
processBufferedMessages (which doesn't grab the lock anyway) and
doRollback.

addStream (a front-end op) is the only operation that updates streams,
and it won't do so while a rollback is in progress.

Therefore, rename the streamMutex lock in DcpConsumer to readyMutex,
which better reflects its purpose: guarding the ready list.

Change-Id: Ia342d7243fef4b97b729aa94fdc64ad020711589
Reviewed-on: http://review.couchbase.org/54406
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-on: http://review.couchbase.org/56080
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago MB-16500 [BP]: MB-16496 Fix the race on vbucket state between persistVBState() and... 68/56068/5
Chiyoung Seo [Fri, 9 Oct 2015 18:39:10 +0000 (11:39 -0700)]
MB-16500 [BP]: MB-16496 Fix the race on vbucket state between persistVBState() and compactVB()

The following data race is reported by thread sanitizer:

WARNING: ThreadSanitizer: data race (pid=29921)
  Write of size 8 at 0x7d680001f580 by thread T5 (mutexes: write M12734):
    #0 VBucket::setPurgeSeqno() ep-engine/src/vbucket.h:215:9 (ep.so+0x000000086204)
    #1 EventuallyPersistentStore::compactVBucket() ep-engine/src/ep.cc:1584 (ep.so+0x000000086204)
    #2 CompactVBucketTask::run() ep-engine/src/tasks.cc:94:12 (ep.so+0x00000012971e)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115:26 (ep.so+0x0000000ea41d)
    #4 launch_executor_thread() ep-engine/src/executorthread.cc:33:9 (ep.so+0x0000000e9fe5)
    #5 platform_thread_wrap platform/src/cb_pthreads.c:23:5 (libplatform.so.0.1.0+0x000000004161)

  Previous read of size 8 at 0x7d680001f580 by thread T7:
    #0 VBucket::getPurgeSeqno() ep-engine/src/vbucket.h:211:16 (ep.so+0x0000000821d3)
    #1 EventuallyPersistentStore::persistVBState() ep-engine/src/ep.cc:1217 (ep.so+0x0000000821d3)
    #2 VBStatePersistTask::run() ep-engine/src/tasks.cc:86:12 (ep.so+0x000000129636)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115:26 (ep.so+0x0000000ea41d)
    #4 launch_executor_thread() ep-engine/src/executorthread.cc:33:9 (ep.so+0x0000000e9fe5)
    #5 platform_thread_wrap platform/src/cb_pthreads.c:23:5 (libplatform.so.0.1.0+0x000000004161)

  Location is heap block of size 1392 at 0x7d680001f200 allocated by main thread:
    #0 operator new() <null> (engine_testapp+0x00000045cded)
    #1 EventuallyPersistentStore::setVBucketState() ep-engine/src/ep.cc:1300:30 (ep.so+0x000000082b1a)
    #2 EventuallyPersistentEngine::setVBucketState() ep-engine/src/ep_engine.h:718:16 (ep.so+0x0000000ca308)
    #3 setVBucket()) ep-engine/src/ep_engine.cc:884 (ep.so+0x0000000ca308)
    #4 processUnknownCommand()) ep-engine/src/ep_engine.cc:1178 (ep.so+0x0000000ca308)
    #5 EvpUnknownCommand()) ep-engine/src/ep_engine.cc:1389:38 (ep.so+0x0000000aafc8)
    #6 mock_unknown_command()) memcached/programs/engine_testapp/engine_testapp.cc:380:19 (engine_testapp+0x0000004c56b9)
    #7 set_vbucket_state() ep-engine/tests/ep_test_apis.cc:607:9 (ep_testsuite.so+0x0000000a3a4b)
    #8 test_setup() ep-engine/tests/ep_testsuite_common.cc:146:28 (ep_testsuite.so+0x00000009cdda)
    #9 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1085:47 (engine_testapp+0x0000004c4103)
    #10 main memcached/programs/engine_testapp/engine_testapp.cc:1439 (engine_testapp+0x0000004c4103)

To address the above issue, vbucket states should be read after grabbing
the vbucket writer lock in EPStore::persistVBState().
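Below is a sketch of the fix under simplifying assumptions. A single `std::mutex` plays the role of the vbucket writer lock, and `Store` / `VBucketState` are hypothetical, reduced to the purge seqno from the report. The point is only that the reader takes the same lock as the writer before it reads the state.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical per-vbucket state, reduced to the field in the report.
struct VBucketState {
    uint64_t purgeSeqno = 0;
};

class Store {
public:
    // Compaction path: updates the state under the vbucket writer lock.
    void compactVBucket(uint64_t newPurgeSeqno) {
        std::lock_guard<std::mutex> lh(vbWriterLock_);
        state_.purgeSeqno = newPurgeSeqno;
    }
    // Persist path: takes the same lock BEFORE reading, so the snapshot
    // cannot race with compaction. Writing it to disk is omitted here.
    VBucketState persistVBState() {
        std::lock_guard<std::mutex> lh(vbWriterLock_);
        VBucketState snapshot = state_;
        return snapshot;
    }
private:
    std::mutex vbWriterLock_;
    VBucketState state_;
};
```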

Change-Id: I5a42b3e15a1cf5c941d399897bc68d6f35746ff3
Reviewed-on: http://review.couchbase.org/55972
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-on: http://review.couchbase.org/56068
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16500 [BP]: Address possible data race in checkpoint remover 67/56067/4
abhinavdangeti [Thu, 8 Oct 2015 22:14:25 +0000 (15:14 -0700)]
MB-16500 [BP]: Address possible data race in checkpoint remover

WARNING: ThreadSanitizer: data race (pid=102986)

  Read of size 1 at 0x7d180000c298 by thread T17:
    #0 ClosedUnrefCheckpointRemoverTask::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/checkpoint_remover.cc:139 (ep.so+0x00000003d4a6)
    #1 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #2 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #3 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

  Previous write of size 1 at 0x7d180000c298 by thread T18:
    #0 CheckpointVisitor::complete() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/checkpoint_remover.cc:69 (ep.so+0x00000003e0a1)
    #1 VBCBAdaptor::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep.cc:3789 (ep.so+0x00000009d64a)
    #2 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #3 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #4 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)
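The report shows a one-byte flag written by `CheckpointVisitor::complete()` on one executor thread and read by `ClosedUnrefCheckpointRemoverTask::run()` on another, with no lock in common. One common remedy is to make the flag `std::atomic<bool>`. The sketch below assumes the flag is a simple "visitor available" boolean; `RemoverTask` and its methods are hypothetical.

```cpp
#include <atomic>
#include <cassert>

class RemoverTask {
public:
    // Remover's run(): start a visitor only if none is in flight.
    bool tryStartVisitor() {
        bool expected = true;
        // Check-and-clear as one atomic step.
        return available_.compare_exchange_strong(expected, false);
    }
    // Visitor's complete(), called from a different executor thread.
    void visitorComplete() { available_.store(true); }
private:
    // A plain bool here is the kind of race TSan reports above.
    std::atomic<bool> available_{true};
};
```

Using `compare_exchange_strong` also closes the check-then-set window that separate load and store calls would leave open.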

Change-Id: I8579d5af259490e41028f57f302e547b0826fa61
Reviewed-on: http://review.couchbase.org/55937
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/56067

5 years ago MB-16500 [BP]: Address possible data race in item/expiry pagers 66/56066/4
abhinavdangeti [Thu, 8 Oct 2015 22:05:58 +0000 (15:05 -0700)]
MB-16500 [BP]: Address possible data race in item/expiry pagers

WARNING: ThreadSanitizer: data race (pid=102450)

  Write of size 1 at 0x7d180000c2f8 by thread T17:
    #0 PagingVisitor::complete() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/item_pager.cc:175 (ep.so+0x000000106c57)
    #1 VBCBAdaptor::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/ep.cc:3789 (ep.so+0x00000009d64a)
    #2 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #3 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #4 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

  Previous read of size 1 at 0x7d180000c2f8 by thread T18:
    #0 ExpiredItemPager::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/item_pager.cc:334 (ep.so+0x0000001053c6)
    #1 ExecutorThread::run() /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:115 (ep.so+0x0000000f9c4c)
    #2 launch_executor_thread(void*) /home/couchbase/jenkins/workspace/ep-engine-threadsanitizer-master/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f9815)
    #3 platform_thread_wrap /home/couchbase/.ccache/tmp/cb_pthread.tmp.9515862b2292.83537.i:0 (libplatform.so.0.1.0+0x0000000041b1)

Change-Id: Iebfe280c95847ee80b2d80d08b0eb340f40663d9
Reviewed-on: http://review.couchbase.org/55935
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-on: http://review.couchbase.org/56066

5 years ago MB-16500 [BP]: Fix potential deadlock around Connmap::releaseLock / connLock 64/56064/2
Dave Rigby [Wed, 7 Oct 2015 11:23:39 +0000 (11:23 +0000)]
MB-16500 [BP]: Fix potential deadlock around Connmap::releaseLock / connLock

As reported by ThreadSanitizer (see below), we have a lock inversion
creating a potential deadlock in Connmap, related to how we shut down
connections:

There exists a cycle in lock order graph:

 M2176 (Connmap::releaseLock in ConnMap::notifyPausedConnection() connmap.cc:235) =>
   M128093 (mock_server::conn_struct::mutex) =>
     M2177 (Connmap::connsLock in TapConnMap::shutdownAllConnections(), connmap.cc:770) =>
       M2176 (Connmap::releaseLock in TapConnMap::shutdownAllConnections(), connmap.cc:777) DEADLOCK!

The problem appears to be that in TapConnMap::shutdownAllConnections()
we first acquire {connsLock}, then acquire {releaseLock}, all while
holding the cookie lock.

The fix is to drop releaseLock in shutdownAllConnections() once we've
released any references, *then* acquire connsLock to actually clear
out the connection map.
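The reordered shutdown can be sketched as below. `ConnMapSketch` and its members are hypothetical. What matters is that `releaseLock_` is released before `connsLock_` is taken, so `shutdownAllConnections()` never holds both locks and cannot close the cycle shown in the report.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class ConnMapSketch {
public:
    void addConnection(int id) {
        std::lock_guard<std::mutex> lh(connsLock_);
        conns_.push_back(id);
    }
    // Analogous to the notify/release path that takes releaseLock.
    void queueRelease(int id) {
        std::lock_guard<std::mutex> lh(releaseLock_);
        pendingRelease_.push_back(id);
    }
    // Returns the number of connections cleared.
    std::size_t shutdownAllConnections() {
        {
            // Step 1: release references under releaseLock only.
            std::lock_guard<std::mutex> rl(releaseLock_);
            pendingRelease_.clear();
        }  // releaseLock dropped here
        // Step 2: only now take connsLock to clear the map.
        std::lock_guard<std::mutex> cl(connsLock_);
        std::size_t cleared = conns_.size();
        conns_.clear();
        return cleared;
    }
private:
    std::mutex connsLock_;
    std::mutex releaseLock_;
    std::vector<int> conns_;
    std::vector<int> pendingRelease_;
};
```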

ThreadSanitizer report follows (irrelevant parts removed):

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=1087)

  Cycle in lock order graph: M2176 (0x7d840001b810) => M128093 (0x7d280000efa0) => M2177 (0x7d840001b850) => M2176

  Mutex M128093 acquired here while holding mutex M2176 in thread T10:
    <cut>

  Mutex M2176 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 ConnMap::notifyPausedConnection() ep-engine/src/connmap.cc:235 ()
    <cut>

  Mutex M2177 acquired here while holding mutex M128093 in main thread:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 TapConnMap::newProducer() ep-engine/src/connmap.cc:378 ()
    #6 EventuallyPersistentEngine::createTapQueue() ep-engine/src/ep_engine.cc:2663:23 ()
    <cut>

  Mutex M128093 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> ()
    #1 cb_mutex_enter platform/src/cb_pthreads.c:115:14 ()
    #2 lock_mock_cookie memcached/programs/engine_testapp/mock_server.c:436:4 ()
    #3 test_tap_stream() ep-engine/tests/ep_testsuite.cc:6751:5 ()
    #4 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1090:19 ()
    #5 main memcached/programs/engine_testapp/engine_testapp.cc:1439 ()

  Mutex M2176 acquired here while holding mutex M2177 in main thread:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 TapConnMap::shutdownAllConnections() ep-engine/src/connmap.cc:777 ()
    #6 EventuallyPersistentEngine::destroy() ep-engine/src/ep_engine.cc:2190:9 ()
    <cut>

  Mutex M2177 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> ()
    ...
    #5 TapConnMap::shutdownAllConnections() ep-engine/src/connmap.cc:770 ()
    #6 EventuallyPersistentEngine::destroy() ep-engine/src/ep_engine.cc:2190:9 ()
    <cut>

Change-Id: Ic9a4f028b202277729df025333ce630be056903d
Reviewed-on: http://review.couchbase.org/55863
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56064

5 years ago MB-16500 [BP]: Address possible data race in Notifiable: setSuspended 63/56063/2
abhinavdangeti [Tue, 6 Oct 2015 00:47:23 +0000 (17:47 -0700)]
MB-16500 [BP]: Address possible data race in Notifiable: setSuspended

WARNING: ThreadSanitizer: data race (pid=19078)

  Write of size 1 at 0x7d600000f078 by thread T10 (mutexes: write M21717):
    #0 Notifiable::setSuspended(bool) /home/abhinav/couchbase/ep-engine/src/tapconnection.h:764 (ep.so+0x00000005fe6a)
    #1 TapProducer::suspendedConnection_UNLOCKED(bool) /home/abhinav/couchbase/ep-engine/src/tapconnection.cc:717 (ep.so+0x00000013b24e)
    #2 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:112 (ep.so+0x0000000f89f6)
    #3 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f8595)
    #4 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d31)

  Previous read of size 1 at 0x7d600000f078 by main thread (mutexes: write M21715):
    #0 Notifiable::isSuspended() /home/abhinav/couchbase/ep-engine/src/tapconnection.h:768 (ep.so+0x0000000dfdf2)
    #1 EventuallyPersistentEngine::walkTapQueue(void const*, void**, void**, unsigned short*, unsigned char*, unsigned short*, unsigned int*, unsigned short*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:2531 (ep.so+0x0000000b7ecc)
    #2 EvpTapIterator(engine_interface*, void const*, void**, void**, unsigned short*, unsigned char*, unsigned short*, unsigned int*, unsigned short*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:1440 (ep.so+0x0000000d5ad7)
    #3 mock_tap_iterator(engine_interface*, void const*, void**, void**, unsigned short*, unsigned char*, unsigned short*, unsigned int*, unsigned short*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:81 (engine_testapp+0x0000000bbda2)
    #4 test_tap_ack_stream(engine_interface*, engine_interface_v1*) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:7353 (ep_testsuite.so+0x0000000504a7)
    #5 execute_test(test, char const*, char const*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000000b946c)
    #6 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

Change-Id: I596c93c048767911b052861193822ca17270a5dd
Reviewed-on: http://review.couchbase.org/55792
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56063

5 years ago MB-16500 [BP]: Address possible data race in ConnHandler: lastWalkTime 62/56062/2
abhinavdangeti [Mon, 5 Oct 2015 22:11:24 +0000 (15:11 -0700)]
MB-16500 [BP]: Address possible data race in ConnHandler: lastWalkTime

WARNING: ThreadSanitizer: data race (pid=26185)

  Write of size 4 at 0x7d4c0000a154 by main thread:
    #0 ConnHandler::setLastWalkTime() /home/abhinav/couchbase/ep-engine/src/tapconnection.h:356 (ep.so+0x000000065641)
    #1 EvpDcpStep(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/ep-engine/src/ep_engine.cc:1478 (ep.so+0x0000000b4d5b)
    #2 mock_dcp_step(engine_interface*, void const*, dcp_message_producers*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:476 (engine_testapp+0x0000000baf95)
    #3 dcp_stream(engine_interface*, engine_interface_v1*, char const*, void const*, unsigned short, unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, int, int, int, int, bool, bool, unsigned char, bool, unsigned long*, bool) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:4164 (ep_testsuite.so+0x0000000990df)
    #4 test_dcp_producer_stream_req_dgm(engine_interface*, engine_interface_v1*) /home/abhinav/couchbase/ep-engine/tests/ep_testsuite.cc:4564 (ep_testsuite.so+0x000000077634)
    #5 execute_test(test, char const*, char const*) /home/abhinav/couchbase/memcached/programs/engine_testapp/engine_testapp.cc:1090 (engine_testapp+0x0000000b946c)
    #6 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021ec4)

  Previous read of size 4 at 0x7d4c0000a154 by thread T10 (mutexes: write M1367):
    #0 ConnHandler::getLastWalkTime() /home/abhinav/couchbase/ep-engine/src/tapconnection.h:360 (ep.so+0x000000049cbe)
    #1 ConnManager::run() /home/abhinav/couchbase/ep-engine/src/connmap.cc:150 (ep.so+0x00000005031e)
    #2 ExecutorThread::run() /home/abhinav/couchbase/ep-engine/src/executorthread.cc:112 (ep.so+0x0000000f8746)
    #3 launch_executor_thread(void*) /home/abhinav/couchbase/ep-engine/src/executorthread.cc:33 (ep.so+0x0000000f82e5)
    #4 platform_thread_wrap /home/abhinav/couchbase/platform/src/cb_pthreads.c:23 (libplatform.so.0.1.0+0x000000003d31)

Change-Id: I2c5024afde6cb749aad901572bfd68734f6d7d5d
Reviewed-on: http://review.couchbase.org/55780
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/56062

5 years ago MB-16357: Interlock expiry and vbucket state changes 46/55646/8
Jim Walker [Thu, 1 Oct 2015 16:12:19 +0000 (17:12 +0100)]
MB-16357: Interlock expiry and vbucket state changes

Expiry should only occur whilst the vbucket is active.
Background tasks performing expiry deletion must stop
driving deletions when the vb changes status to !active.

Using a reader/writer lock, the core deleteExpiredItem function, which
is used by both compaction-driven expiry and the item pager, is now
interlocked with VBucket::setState().
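Below is a sketch of the interlock, using C++17 `std::shared_mutex` as the reader/writer lock (an assumption; ep-engine has its own lock types). Expiry takes the lock shared and checks the state. `setState()` takes it exclusively, so a state change waits for in-flight expiry deletions, and later deletions see the new state. `VBucketSketch` and its members are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

enum class VBState { Active, Replica, Pending, Dead };

class VBucketSketch {
public:
    // Writer side: blocks until in-flight expiry deletions have finished.
    void setState(VBState s) {
        std::unique_lock<std::shared_mutex> wl(stateLock_);
        state_ = s;
    }
    // Reader side: expiry deletions may run concurrently with each other,
    // but never overlap a state change. Returns true if deleted.
    bool deleteExpiredItem() {
        std::shared_lock<std::shared_mutex> rl(stateLock_);
        if (state_ != VBState::Active) {
            return false;  // only an active vbucket may drive expiry
        }
        expired_.fetch_add(1);  // atomic, since readers share the lock
        return true;
    }
    int expiredCount() const { return expired_.load(); }
private:
    std::shared_mutex stateLock_;
    VBState state_ = VBState::Active;
    std::atomic<int> expired_{0};
};
```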

Change-Id: I19d30c3d7855778613ccb4534a042c0daf627b8c
Reviewed-on: http://review.couchbase.org/55646
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years ago MB-16434: In consumer stream get bytes cleared atomically. 08/55708/2
Manu Dhundi [Thu, 1 Oct 2015 22:38:14 +0000 (15:38 -0700)]
MB-16434: In consumer stream get bytes cleared atomically.

When a consumer stream is set to dead, we clear the consumer buffer and
notify the producer of the number of bytes cleared. Clearing the
consumer buffer and accounting for the bytes cleared should be done
atomically.

Change-Id: I602d5307c6c2bbd3dc7f03f1d6c43cbe294389ac
Reviewed-on: http://review.couchbase.org/55708
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16434: In setDead release streamMutex before cleaning up stream buffer 07/55707/5
Manu Dhundi [Thu, 1 Oct 2015 22:35:47 +0000 (15:35 -0700)]
MB-16434: In setDead release streamMutex before cleaning up stream buffer

To avoid lock order inversion in the DCP passive stream, we must release
streamMutex before acquiring bufMutex. This is because, while processing
incoming mutations on the DCP consumer, we acquire bufMutex first and then
streamMutex.
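The sketch below combines this change with the related MB-16434 change above (clearing and counting the bytes atomically). It assumes a hypothetical `PassiveStreamSketch` whose message path nests bufMutex, then streamMutex. `setDead()` updates its state under streamMutex and releases it. Only then does it take bufMutex, clearing the buffer and counting the cleared bytes in a single critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

class PassiveStreamSketch {
public:
    // Message path: bufMutex first, then streamMutex.
    void bufferMessage(const std::string& msg) {
        std::lock_guard<std::mutex> bl(bufMutex_);
        std::lock_guard<std::mutex> sl(streamMutex_);
        if (alive_) {
            buffer_.push_back(msg);
            bufferedBytes_ += msg.size();
        }
    }
    // Marks the stream dead and returns the bytes cleared, which the
    // caller would then report to the producer.
    std::size_t setDead() {
        {
            std::lock_guard<std::mutex> sl(streamMutex_);
            alive_ = false;
        }  // streamMutex released here, before bufMutex is taken
        std::lock_guard<std::mutex> bl(bufMutex_);
        // Clearing and counting in one critical section keeps the
        // reported byte count equal to what was actually dropped.
        std::size_t cleared = bufferedBytes_;
        buffer_.clear();
        bufferedBytes_ = 0;
        return cleared;
    }
private:
    std::mutex bufMutex_;
    std::mutex streamMutex_;
    bool alive_ = true;
    std::deque<std::string> buffer_;
    std::size_t bufferedBytes_ = 0;
};
```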

Change-Id: I129d014dc3a7ec91416af04851419782b89cea23
Reviewed-on: http://review.couchbase.org/55707
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago [BP] MB-14382 Increase the default number of ht_locks by a factor of approx. 10 05/55705/3
Chiyoung Seo [Fri, 27 Mar 2015 21:58:27 +0000 (14:58 -0700)]
[BP] MB-14382 Increase the default number of ht_locks by a factor of approx. 10

This change is backported from sherlock.

Recent perf test results showed lock contention on hash table buckets
while a hash table scanning task (e.g., expiry pager, item pager,
access scanner) is running.

As a long-term solution, we may need to resize the number of hash
table locks dynamically at runtime.
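Hash table locks are commonly striped: many buckets share a smaller pool of mutexes, and bucket `b` is guarded by lock `b % numLocks`. The hypothetical sketch below assumes that layout (`StripedTable` is illustrative, not ep-engine's HashTable; no real default values are implied). It shows why raising the lock count helps: a scanner holding one lock then blocks a smaller fraction of the buckets.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

class StripedTable {
public:
    StripedTable(std::size_t numBuckets, std::size_t numLocks)
        : buckets_(numBuckets), locks_(numLocks) {}

    std::size_t bucketFor(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }
    // Many buckets map onto each lock; more locks, fewer buckets per lock.
    std::size_t lockFor(std::size_t bucket) const {
        return bucket % locks_.size();
    }
    void insert(const std::string& key) {
        std::size_t b = bucketFor(key);
        std::lock_guard<std::mutex> lh(locks_[lockFor(b)]);
        buckets_[b].push_back(key);
    }
    bool contains(const std::string& key) {
        std::size_t b = bucketFor(key);
        std::lock_guard<std::mutex> lh(locks_[lockFor(b)]);
        for (const std::string& k : buckets_[b]) {
            if (k == key) {
                return true;
            }
        }
        return false;
    }
private:
    std::vector<std::vector<std::string>> buckets_;
    std::vector<std::mutex> locks_;  // sized once; std::mutex is not movable
};
```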

Change-Id: Ic7b9f951f58fe7190d8d0d23fb62979057e545ac
Reviewed-on: http://review.couchbase.org/48869
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/55705
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16421: BGFetch to restore an item that is non-resident 80/55680/3
abhinavdangeti [Thu, 1 Oct 2015 00:40:14 +0000 (17:40 -0700)]
MB-16421: BGFetch to restore an item that is non-resident

In a full eviction scenario, a bgfetch should restore an item not only
when the hash table item is a temp-initial item, but also when the hash
table item is non-resident.
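The rule above reduces to a two-clause predicate. Below is a minimal sketch, assuming a hypothetical `StoredValueView` that exposes only the two properties the commit message mentions.

```cpp
#include <cassert>

// Hypothetical view of a hash-table entry's state.
struct StoredValueView {
    bool tempInitial;  // placeholder created while the item is fetched
    bool resident;     // value currently held in memory
};

// Full-eviction rule: restore the fetched item if the in-memory entry is
// a temp-initial placeholder OR is non-resident. Before this change only
// the temp-initial case was restored.
inline bool shouldRestoreFromBgFetch(const StoredValueView& v) {
    return v.tempInitial || !v.resident;
}
```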

Change-Id: I127b0cbe7034133a656b046cd4022635be23aedd
Reviewed-on: http://review.couchbase.org/55680
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16402: Ensure objectregistry has an engine when 94/55594/2
Jim Walker [Tue, 29 Sep 2015 09:04:31 +0000 (10:04 +0100)]
MB-16402: Ensure objectregistry has an engine when
deleting the VBucketMemoryDeletionTask.

Ensure the VBucketMemoryDeletionTask is finished before shutting down,
to avoid the vbucket deletion occurring on the task with no engine pointer
in thread-local storage (for ObjectRegistry).

This is a backport to 3.0.x of MB-14041

Change-Id: I63caf59bce0e89ed166bffcbd2d0965a91656725
Reviewed-on: http://review.couchbase.org/55594
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16310: Use metaKeyIndex for the chk_pt meta items 01/55401/2
Manu Dhundi [Mon, 21 Sep 2015 17:32:11 +0000 (10:32 -0700)]
MB-16310: Use metaKeyIndex for the chk_pt meta items

Fixes a bug in http://review.couchbase.org/#/c/55324/

Change-Id: I67f6c70e5d497a12ff70a836215d44a2f908d261
Reviewed-on: http://review.couchbase.org/55401
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago MB-16310: Use separate key index for chk_pt meta keys and application keys 24/55324/4
Manu Dhundi [Fri, 18 Sep 2015 18:04:34 +0000 (11:04 -0700)]
MB-16310: Use separate key index for chk_pt meta keys and application keys

Checkpoint uses meta items "dummy_key", "checkpoint_start", "checkpoint_end".
If an application tries to store keys with the same names, we get runtime
errors. This change addresses the problem by maintaining separate
key indices for the checkpoint meta keys and application keys.
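Below is a minimal sketch of the two-index idea. It borrows the name `metaKeyIndex` from the follow-up commit above. `CheckpointIndex`, `keyIndex_`, and the seqno-valued maps are hypothetical simplifications. An application key literally named "checkpoint_start" no longer collides with the checkpoint's own marker.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

class CheckpointIndex {
public:
    using Index = std::unordered_map<std::string, uint64_t>;

    void addMetaItem(const std::string& key, uint64_t seqno) {
        metaKeyIndex_[key] = seqno;
    }
    void addAppItem(const std::string& key, uint64_t seqno) {
        keyIndex_[key] = seqno;
    }
    bool findMeta(const std::string& key, uint64_t& seqno) const {
        return lookup(metaKeyIndex_, key, seqno);
    }
    bool findApp(const std::string& key, uint64_t& seqno) const {
        return lookup(keyIndex_, key, seqno);
    }
private:
    static bool lookup(const Index& index, const std::string& key,
                       uint64_t& seqno) {
        auto it = index.find(key);
        if (it == index.end()) {
            return false;
        }
        seqno = it->second;
        return true;
    }
    Index keyIndex_;      // application keys
    Index metaKeyIndex_;  // "dummy_key", "checkpoint_start", "checkpoint_end"
};
```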

Change-Id: I38aa22ac007bcfe4c9064b73930d08827604a923
Reviewed-on: http://review.couchbase.org/55324
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16093: Add logs for closing passive stream and stream end status 46/55146/3 v3.1.1
Manu Dhundi [Fri, 11 Sep 2015 00:07:08 +0000 (17:07 -0700)]
MB-16093: Add logs for closing passive stream and stream end status

Change-Id: I7afd17e6a86c90ffae302c13f3269a587da674ca
Reviewed-on: http://review.couchbase.org/55146
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago MB-16159: DCP consumer to explicitly notify memcached to get flow ctl buffer ack 96/54996/2
Manu Dhundi [Wed, 2 Sep 2015 20:47:19 +0000 (13:47 -0700)]
MB-16159: DCP consumer to explicitly notify memcached to get flow ctl buffer ack

In the DCP consumer, when sufficient bytes have been drained, it is better to
notify memcached right away so that the flow control buffer ack goes out. The
previous method of waiting for the Connection Manager daemon task to notify
memcached causes delays while streaming large items.

Change-Id: If71c9186f3062755d5c301817ec76f9f7eca5dc7
Reviewed-on: http://review.couchbase.org/54996
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16178: Removing sanity check while adding DCP on TAP 64/54964/7
abhinavdangeti [Wed, 2 Sep 2015 17:47:03 +0000 (10:47 -0700)]
MB-16178: Removing sanity check while adding DCP on TAP

Remove the sanity check that prevented:
- Adding a DCP passive stream for a vbucket while a TAP consumer is
still connected.

The check is removed because there is no real risk of the server
going into an inconsistent state during the upgrade.

Change-Id: If7643b2ebc21404dd4edb984718b322e411e28bc
Reviewed-on: http://review.couchbase.org/54964
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Aliaksey Artamonau <aliaksiej.artamonau@gmail.com>
5 years agoMB-16042: Do not process invalid snapshot markers 12/54712/5
abhinavdangeti [Thu, 20 Aug 2015 19:09:49 +0000 (12:09 -0700)]
MB-16042: Do not process invalid snapshot markers

Snapshot markers with start > end are to be
considered INVALID.
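
The rule can be sketched as a small check. MarkerStatus and checkSnapshotMarker are hypothetical names; a real consumer would report one of the engine's error codes rather than this enum:

```cpp
#include <cstdint>

// Hypothetical result type for the sketch.
enum class MarkerStatus { Accepted, Invalid };

// A marker whose start seqno is greater than its end seqno is rejected
// instead of being processed.
MarkerStatus checkSnapshotMarker(uint64_t startSeqno, uint64_t endSeqno) {
    if (startSeqno > endSeqno) {
        return MarkerStatus::Invalid;
    }
    return MarkerStatus::Accepted;
}
```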

Change-Id: Ibe1922dc388992b830cec7687e0010e5fd26e982
Reviewed-on: http://review.couchbase.org/54712
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[BP] MB-16045: Dcp Mutations/Deletions with seq number 0 are invalid 11/54711/5
abhinavdangeti [Thu, 20 Aug 2015 00:19:58 +0000 (17:19 -0700)]
[BP] MB-16045: Dcp Mutations/Deletions with seq number 0 are invalid

If a DCP consumer receives mutations or deletions whose
sequence numbers are ZERO (malicious), it must drop them
and return EINVAL.
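
A sketch of the check, assuming it runs before the item is applied; EngineStatus is a hypothetical stand-in for the engine's error codes and validateIncomingSeqno is an illustrative name:

```cpp
#include <cstdint>

// Hypothetical stand-in for the engine's error codes.
enum class EngineStatus { Success, Einval };

// A mutation or deletion carrying bySeqno == 0 is never legitimate, so the
// consumer drops it and reports EINVAL instead of applying it.
EngineStatus validateIncomingSeqno(uint64_t bySeqno) {
    return bySeqno == 0 ? EngineStatus::Einval : EngineStatus::Success;
}
```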

Change-Id: I920bf969027fae912a5e86164d235d1110f7688b
Reviewed-on: http://review.couchbase.org/54711
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-16042: [BP] Handling erroneous DCP snapshot markers 10/54710/5
abhinavdangeti [Fri, 14 Aug 2015 20:24:05 +0000 (13:24 -0700)]
MB-16042: [BP] Handling erroneous DCP snapshot markers

If both the start seqno and the end seqno of a snapshot
marker are less than the sequence number of the last
received mutation, the marker is dropped.
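
Expressed as a predicate (the function and parameter names are hypothetical), a marker is dropped only when it lies entirely before the last received mutation:

```cpp
#include <cstdint>

// Returns true when a snapshot marker should be dropped because both of
// its bounds are below the last seqno the stream has received.
bool isStaleSnapshotMarker(uint64_t startSeqno, uint64_t endSeqno,
                           uint64_t lastReceivedSeqno) {
    return startSeqno < lastReceivedSeqno && endSeqno < lastReceivedSeqno;
}
```

Under this reading, a marker that ends exactly at, or after, the last received seqno is still processed.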

Change-Id: Ic33abae37eb164f212d4306f99c9029535dcb42c
Reviewed-on: http://review.couchbase.org/54710
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years ago[BP] MB-16044: Handling erroneous DCP mutations/deletions 09/54709/5
abhinavdangeti [Mon, 17 Aug 2015 17:47:45 +0000 (10:47 -0700)]
[BP] MB-16044: Handling erroneous DCP mutations/deletions

Add sanity checks to ensure that erroneous mutations and
deletions sent by an autonomous producer are dropped,
since they could otherwise trigger assertions.
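
The commit does not spell out what makes a mutation "erroneous". The sketch below assumes two conditions, a seqno that does not advance past the stream's last_seqno and a seqno outside the current snapshot range; PassiveStreamState and acceptMutation are hypothetical:

```cpp
#include <cstdint>

// Hypothetical per-stream state assumed by this sketch.
struct PassiveStreamState {
    uint64_t lastSeqno;  // last seqno applied (initially the vbucket high_seqno)
    uint64_t snapStart;  // start of the snapshot currently being received
    uint64_t snapEnd;    // end of that snapshot
};

// Returns true if an incoming mutation/deletion should be applied, false if
// it should be dropped rather than allowed to trip an assertion later.
bool acceptMutation(const PassiveStreamState& s, uint64_t bySeqno) {
    if (bySeqno <= s.lastSeqno) {
        return false;  // does not advance the stream
    }
    if (bySeqno < s.snapStart || bySeqno > s.snapEnd) {
        return false;  // outside the advertised snapshot
    }
    return true;
}
```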

Change-Id: I48b68783314133e3cf3e1e5b77a61ee043e73115
Reviewed-on: http://review.couchbase.org/54709
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16131: [BP] Initialize last_seqno for a passive stream accurately 08/54708/4
abhinavdangeti [Fri, 14 Aug 2015 17:42:28 +0000 (10:42 -0700)]
MB-16131: [BP] Initialize last_seqno for a passive stream accurately

last_seqno for a passive stream should be initialized to the
vbucket's high_seqno so that erroneous packets are handled
correctly when received at the consumer.

Change-Id: I077ad5b2ca08c3d4bfb81237b46f259a60d3c4dc
Reviewed-on: http://review.couchbase.org/54708
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-16125: Do not wait for certain tasks to shutdown 06/54706/2
abhinavdangeti [Thu, 20 Aug 2015 18:08:09 +0000 (11:08 -0700)]
MB-16125: Do not wait for certain tasks to shutdown

- Access scanner
- Vbucket compaction

+ Additional refactoring in tasks.h so that a parameter's
name, completeBeforeShutdown, indicates its meaning.
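
A sketch of how a completeBeforeShutdown flag can decide what shutdown waits for. Only the flag name comes from the commit; TaskSketch and tasksToAwait are hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical task descriptor.
struct TaskSketch {
    std::string name;
    bool completeBeforeShutdown;  // must this task finish before shutdown?
};

// Returns the names of the tasks that shutdown has to wait for. Tasks such
// as the access scanner or vbucket compaction would be created with the
// flag set to false and are not waited on.
std::vector<std::string> tasksToAwait(const std::vector<TaskSketch>& tasks) {
    std::vector<std::string> result;
    for (const TaskSketch& t : tasks) {
        if (t.completeBeforeShutdown) {
            result.push_back(t.name);
        }
    }
    return result;
}
```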

Change-Id: I68ac8364177733559926f0ee87acd3d2852e3585
Reviewed-on: http://review.couchbase.org/54706
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15926: Do not add passive stream if tap consumer exists & vice-versa 56/54556/6
abhinavdangeti [Wed, 19 Aug 2015 21:28:58 +0000 (14:28 -0700)]
MB-15926: Do not add passive stream if tap consumer exists & vice-versa

Do not allow the creation of a DCP passive stream for a vbucket
if a TAP consumer is still connected. Similarly, do not allow
the creation of a TAP consumer if a DCP passive stream still
exists for the vbucket.
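
A minimal sketch of the mutual-exclusion check, assuming a single registry keyed by vbucket id; ReplicationRegistry and its methods are hypothetical and not the engine's connection map:

```cpp
#include <cstdint>
#include <set>

// Hypothetical registry of which vbuckets currently have a TAP consumer
// or a DCP passive stream attached.
class ReplicationRegistry {
public:
    // Refuses a DCP passive stream while a TAP consumer owns the vbucket.
    bool addPassiveStream(uint16_t vbid) {
        if (tapConsumers_.count(vbid) != 0) {
            return false;
        }
        return passiveStreams_.insert(vbid).second;
    }

    // Refuses a TAP consumer while a DCP passive stream owns the vbucket.
    bool addTapConsumer(uint16_t vbid) {
        if (passiveStreams_.count(vbid) != 0) {
            return false;
        }
        return tapConsumers_.insert(vbid).second;
    }

private:
    std::set<uint16_t> tapConsumers_;
    std::set<uint16_t> passiveStreams_;
};
```

The insert(...).second test also rejects a second passive stream for the same vbucket, which matches the first-request-wins behavior of the MB-15926 entry that follows.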

Change-Id: Ie7ea059cb512ac550fece437a6526081a4ee3fdd
Reviewed-on: http://review.couchbase.org/54556
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-15926: Do not allow multiple passive streams for the same vbucket 54/54154/4
Manu Dhundi [Thu, 6 Aug 2015 01:44:44 +0000 (18:44 -0700)]
MB-15926: Do not allow multiple passive streams for the same vbucket

If there are multiple requests (across different DCP consumer connections)
to add a stream for the same vbucket, only the first request is honored.

Change-Id: I488e23d69174f20f4913d072484420bc450f4168
Reviewed-on: http://review.couchbase.org/54154
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15914: File deletions to be handled only by RW instances 50/54150/6
abhinavdangeti [Wed, 5 Aug 2015 23:17:40 +0000 (16:17 -0700)]
MB-15914: File deletions to be handled only by RW instances

Add sanity checks to ensure that file deletions are
handled only by read-write instances of the underlying
store and not by read-only instances. Log a warning
when a read-only instance is denied permission to delete
a file.

Change-Id: I166e8a5f2664b7d40fc184ef70573a027c07715a
Reviewed-on: http://review.couchbase.org/54150
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15914: [Backport] Don't find files in couchkvstore lazily 45/53945/3
Mike Wiederhold [Fri, 10 Oct 2014 22:51:11 +0000 (15:51 -0700)]
MB-15914: [Backport] Don't find files in couchkvstore lazily

We no longer need to do this because we now have full control of
the engine and this code path is much more deterministic. This
will also fix potential races in updating from dbFileRevMap.

Change-Id: I5c4b0552f279b1e7d0a695071ae503f464891b32
Reviewed-on: http://review.couchbase.org/43118
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
Reviewed-on: http://review.couchbase.org/53945
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-15837 Fix to tap_notify_set_vbucket_state unit test. 34/53834/3
Chiyoung Seo [Tue, 28 Jul 2015 23:22:18 +0000 (16:22 -0700)]
MB-15837 Fix to tap_notify_set_vbucket_state unit test.

Change-Id: Ia7af2243a41b86987a76f06b1b1e30dda6a479b0
Reviewed-on: http://review.couchbase.org/53834
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15837 Generate a new vbucket UUID for TAP-based VBucket takeover. 84/53784/4
Chiyoung Seo [Tue, 28 Jul 2015 05:27:10 +0000 (22:27 -0700)]
MB-15837 Generate a new vbucket UUID for TAP-based VBucket takeover.

A new vbucket UUID should be generated for each TAP-based vbucket takeover
completion. Otherwise, it can cause potential data loss after fully
switching from TAP to DCP, because TAP-based replication doesn't synchronize
sequence numbers between active and replica vbuckets.

For more details, please refer to
https://issues.couchbase.com/browse/MB-15837

Change-Id: Id8931bd110417065b244f10c71e18d0b5d47f6d2
Reviewed-on: http://review.couchbase.org/53784
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMB-15826: Handle lower values of snap_st_seqno by DCP client more effectively 35/53635/6
Manu Dhundi [Sat, 25 Jul 2015 01:18:29 +0000 (18:18 -0700)]
MB-15826: Handle lower values of snap_st_seqno by DCP client more effectively

If a DCP client passes snap_start_seqno < start_seqno when
start_seqno == snap_end_seqno, DCP can be more efficient by setting
snap_start_seqno = start_seqno.
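
A sketch of the adjustment, with a hypothetical function name and the seqnos passed as plain integers:

```cpp
#include <cstdint>

// When the client is already at the end of its snapshot
// (startSeqno == snapEndSeqno) but reports an earlier snapStartSeqno,
// treat the snapshot as starting at startSeqno instead.
void normalizeSnapshotStart(uint64_t startSeqno, uint64_t& snapStartSeqno,
                            uint64_t snapEndSeqno) {
    if (snapStartSeqno < startSeqno && startSeqno == snapEndSeqno) {
        snapStartSeqno = startSeqno;
    }
}
```
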
Change-Id: Ie59cfed23e9e3855ef0eca6d3b609a53db65c36f
Reviewed-on: http://review.couchbase.org/53635
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15609 Don't create a new checkpoint on replica vbucket for each snapshot 33/53433/3 v3.1.0
Chiyoung Seo [Mon, 20 Jul 2015 19:18:25 +0000 (12:18 -0700)]
MB-15609 Don't create a new checkpoint on replica vbucket for each snapshot

The fix in http://review.couchbase.org/#/c/51682/ caused a regression in
the disk write queue size. The main reason was that more frequent checkpoint
creation in replica vbuckets reduced deduplication in the disk write queue.
To resolve this regression, this change makes sure that a new checkpoint in
a replica vbucket is created only when memory usage comes under pressure.

Change-Id: I9db5e1336c9950387f9d492b864fc3c88333c379
Reviewed-on: http://review.couchbase.org/53433
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15404: Update stat call in ep testsuite to track the items in DCP readyQ. 02/52602/4
Manu Dhundi [Fri, 26 Jun 2015 23:16:34 +0000 (16:16 -0700)]
MB-15404: Update stat call in ep testsuite to track the items in DCP readyQ.

Change-Id: Idc10ef1ba39d86062597307daf0c5db800407946
Reviewed-on: http://review.couchbase.org/52602
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
Tested-by: Manu Dhundi <manu@couchbase.com>
5 years agoMB-15404: Add stat to track the items in DCP readyQ. 17/52517/8
Manu Dhundi [Thu, 25 Jun 2015 23:50:52 +0000 (16:50 -0700)]
MB-15404: Add stat to track the items in DCP readyQ.

To help debug memory used by DCP, add stats for the DCP readyQ.
Also add "lastReadSeqNo" (the last seqno read from disk) to the stats.

Change-Id: If0dbb64944549d846084219acc6a793218b4ad23
Reviewed-on: http://review.couchbase.org/52517
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15367: Use simplejson instead of json for backward compatibility 16/52516/2
Bin Cui [Thu, 18 Jun 2015 16:45:47 +0000 (09:45 -0700)]
MB-15367: Use simplejson instead of json for backward compatibility

Change-Id: I602e04ba1e71f6a325ac8fec1684b67025b8e230
Reviewed-on: http://review.couchbase.org/52234
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Dave Rigby <daver@couchbase.com>
Reviewed-on: http://review.couchbase.org/52516
Reviewed-by: Bin Cui <bin.cui@gmail.com>
Tested-by: Bin Cui <bin.cui@gmail.com>
5 years agoMB-15413: Consumer to handle snapshot end correctly 02/52402/2
abhinavdangeti [Tue, 23 Jun 2015 00:33:08 +0000 (17:33 -0700)]
MB-15413: Consumer to handle snapshot end correctly

The DCPConsumer should handle the snapshot end correctly
and create a new checkpoint only if the last mutation or
deletion returned ENGINE_SUCCESS.

Change-Id: I52c437d6cf28fd9a8150de6770885a4a2308b34d
Reviewed-on: http://review.couchbase.org/52402
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoSet the logging level to WARNING if memUsed is too high 65/52265/4
abhinavdangeti [Thu, 18 Jun 2015 23:06:45 +0000 (16:06 -0700)]
Set the logging level to WARNING if memUsed is too high

Context: DCPBackfill

Change-Id: Ica0f730c6fea98f684f8809f194896ea937e2b1c
Reviewed-on: http://review.couchbase.org/52265
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-15292: Fix issue in module_tests/atomic_test 61/52261/2
abhinavdangeti [Thu, 18 Jun 2015 22:32:07 +0000 (15:32 -0700)]
MB-15292: Fix issue in module_tests/atomic_test

Change-Id: I8ef42c7bfd971260f36f2aaa13e40686a7b3dd82
Reviewed-on: http://review.couchbase.org/52261
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Tested-by: buildbot <build@couchbase.com>
5 years agoMB-15292: Make CouchbaseAtomic::store() atomic 73/51973/4
Dave Rigby [Thu, 11 Jun 2015 18:56:49 +0000 (11:56 -0700)]
MB-15292: Make CouchbaseAtomic::store() atomic

As identified by ThreadSanitizer:

WARNING: ThreadSanitizer: data race (pid=59833)
  Write of size 8 at 0x7d240000d3f0 by thread T8:
    #0 CouchbaseAtomic<unsigned long>::store(unsigned long const&, memory_order) /root/couchbase-3.0/ep-engine/src/atomic.h:84 (ep.so+0x0000000414ef)
    #1 CouchbaseAtomic<unsigned long>::operator=(unsigned long const&) /root/couchbase-3.0/ep-engine/src/atomic.h:105 (ep.so+0x0000000401f5)
    #2 Warmup::scheduleEstimateDatabaseItemCount() /root/couchbase-3.0/ep-engine/src/warmup.cc:500 (ep.so+0x000000277991)
    #3 Warmup::step() /root/couchbase-3.0/ep-engine/src/warmup.cc:816 (ep.so+0x000000275124)
    #4 Warmup::transition(int, bool) /root/couchbase-3.0/ep-engine/src/warmup.cc:853 (ep.so+0x0000002754ff)
    #5 Warmup::createVBuckets(unsigned short) /root/couchbase-3.0/ep-engine/src/warmup.cc:491 (ep.so+0x00000027785f)
    #6 WarmupCreateVBuckets::run() /root/couchbase-3.0/ep-engine/src/warmup.h:234 (ep.so+0x00000028cde9)
    #7 ExecutorThread::run() /root/couchbase-3.0/ep-engine/src/executorthread.cc:110 (ep.so+0x0000001a2581)
    #8 launch_executor_thread(void*) /root/couchbase-3.0/ep-engine/src/executorthread.cc:34 (ep.so+0x0000001a1a5a)
    #9 platform_thread_wrap /root/couchbase-3.0/platform/src/cb_pthreads.c:19 (libplatform.so.0.1.0+0x000000002f14)

  Previous atomic write of size 8 at 0x7d240000d3f0 by main thread (mutexes: write M670470284968504712):
    #0 __tsan_atomic64_fetch_add <null>:0 (engine_testapp+0x00000008cb48)
    #1 CouchbaseAtomic<unsigned long>::load(memory_order) const /root/couchbase-3.0/ep-engine/src/atomic.h:77 (ep.so+0x0000000446b4)
    #2 CouchbaseAtomic<unsigned long>::operator unsigned long() const /root/couchbase-3.0/ep-engine/src/atomic.h:101 (ep.so+0x000000044575)
    #3 Warmup::addStats(void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) const /root/couchbase-3.0/ep-engine/src/warmup.cc:901 (ep.so+0x00000027d4ea)
    #4 EventuallyPersistentEngine::getStats(void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /root/couchbase-3.0/ep-engine/src/ep_engine.cc:4405 (ep.so+0x0000001155a9)
    #5 EvpGetStats(engine_interface*, void const*, char const*, int, void (*)(char const*, unsigned short, char const*, unsigned int, void const*)) /root/couchbase-3.0/ep-engine/src/ep_engine.cc:214 (ep.so+0x0000000fa102)
    #6 mock_get_stats /root/couchbase-3.0/memcached/programs/engine_testapp/engine_testapp.c:194 (engine_testapp+0x0000000aeecd)
    #7 wait_for_warmup_complete(engine_interface*, engine_interface_v1*) /root/couchbase-3.0/ep-engine/tests/ep_test_apis.cc:864 (ep_perfsuite.so+0x00000001b1cb)
    #8 test_setup(engine_interface*, engine_interface_v1*) /root/couchbase-3.0/ep-engine/tests/ep_testsuite_common.cc:128 (ep_perfsuite.so+0x00000000dc03)
    #9 execute_test /root/couchbase-3.0/memcached/programs/engine_testapp/engine_testapp.c:1037 (engine_testapp+0x0000000ab85a)
    #10 main /root/couchbase-3.0/memcached/programs/engine_testapp/engine_testapp.c:1296 (engine_testapp+0x0000000a9a19)

Change-Id: I260942712ca471c0d2e0fa3ebc4793d694f9b237
Reviewed-on: http://review.couchbase.org/51973
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15292: Make CouchbaseAtomic::load() atomic 68/51968/4
Dave Rigby [Thu, 11 Jun 2015 18:36:44 +0000 (11:36 -0700)]
MB-15292: Make CouchbaseAtomic::load() atomic

Identified by ThreadSanitizer running ep_perfsuite.so:

WARNING: ThreadSanitizer: data race (pid=51118)
  Atomic write of size 1 at 0x7d4400009d58 by main thread (mutexes: write M5599):
    #0 __tsan_atomic8_compare_exchange_val <null>:0 (engine_testapp+0x000000093f50)
    #1 CouchbaseAtomic<bool>::compare_exchange_strong(bool&, bool, memory_order) /root/couchbase-3.0/ep-engine/src/atomic.h:92 (ep.so+0x00000005575d)
    #2 Flusher::notifyFlushEvent() /root/couchbase-3.0/ep-engine/src/flusher.h:88 (ep.so+0x0000000c6fec)
    #3 EventuallyPersistentStore::queueDirty(RCPtr<VBucket>&, StoredValue*, LockHolder*, bool, bool, bool) /root/couchbase-3.0/ep-engine/src/ep.cc:2826 (ep.so+0x00000009c796)
    #4 EventuallyPersistentStore::add(Item const&, void const*) /root/couchbase-3.0/ep-engine/src/ep.cc:728 (ep.so+0x00000009f673)
    #5 EventuallyPersistentEngine::store(void const*, void*, unsigned long*, ENGINE_STORE_OPERATION, unsigned short) /root/couchbase-3.0/ep-engine/src/ep_engine.cc:2135 (ep.so+0x000000100980)
    #6 EvpStore(engine_interface*, void const*, void*, unsigned long*, ENGINE_STORE_OPERATION, unsigned short) /root/couchbase-3.0/ep-engine/src/ep_engine.cc:229 (ep.so+0x0000000fa21d)
    #7 mock_store /root/couchbase-3.0/memcached/programs/engine_testapp/engine_testapp.c:227 (engine_testapp+0x0000000ade2e)
    #8 storeCasVb11(engine_interface*, engine_interface_v1*, void const*, ENGINE_STORE_OPERATION, char const*, char const*, unsigned long, unsigned int, void**, unsigned long, unsigned short, unsigned int, unsigned char) /root/couchbase-3.0/ep-engine/tests/ep_test_apis.cc:654 (ep_perfsuite.so+0x000000018ec3)
    #9 perf_latency(engine_interface*, engine_interface_v1*, char const*) /root/couchbase-3.0/ep-engine/tests/ep_perfsuite.cc:104 (ep_perfsuite.so+0x0000000097e2)
    #10 perf_latency_baseline(engine_interface*, engine_interface_v1*) /root/couchbase-3.0/ep-engine/tests/ep_perfsuite.cc:169 (ep_perfsuite.so+0x0000000090b7)
    #11 execute_test /root/couchbase-3.0/memcached/programs/engine_testapp/engine_testapp.c:1042 (engine_testapp+0x0000000ab933)
    #12 main /root/couchbase-3.0/memcached/programs/engine_testapp/engine_testapp.c:1296 (engine_testapp+0x0000000a9a19)

  Previous read of size 1 at 0x7d4400009d58 by thread T20:
    #0 CouchbaseAtomic<bool>::load(memory_order) const /root/couchbase-3.0/ep-engine/src/atomic.h:79 (ep.so+0x00000005288c)
    #1 Flusher::canSnooze() /root/couchbase-3.0/ep-engine/src/flusher.h:104 (ep.so+0x00000018e92b)
    #2 Flusher::computeMinSleepTime() /root/couchbase-3.0/ep-engine/src/flusher.cc:239 (ep.so+0x00000018dc87)
    #3 Flusher::step(GlobalTask*) /root/couchbase-3.0/ep-engine/src/flusher.cc:187 (ep.so+0x00000018cb35)

Change-Id: Ie32ca8fa4e662e1244362cbdb0cb2573f80665e2
Reviewed-on: http://review.couchbase.org/51968
Tested-by: buildbot <build@couchbase.com>
Reviewed-by: Jim Walker <jim@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-14256: Add method to get all sequence numbers 58/51858/4
Trond Norbye [Wed, 15 Apr 2015 08:55:28 +0000 (10:55 +0200)]
MB-14256: Add method to get all sequence numbers

This is a backport of commit cb1bd68b8b771b86d7da310f5b42a9ca417570d0 [1]
from 4.x to 3.x which makes `getAllVBucketSequenceNumbers` C++03 compatible.

[1]: https://github.com/couchbase/ep-engine/commit/cb1bd68b8b771b86d7da310f5b42a9ca417570d0

Conflicts:
        src/ep_engine.cc
        src/ep_engine.h

Change-Id: I1c3181b249035f75f9afe891b049304114e9e496
Reviewed-on: http://review.couchbase.org/51858
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-15213: Disable DCP flow control if buffer size passed by client is zero 06/51706/2
Manu Dhundi [Wed, 3 Jun 2015 01:21:42 +0000 (18:21 -0700)]
MB-15213: Disable DCP flow control if buffer size passed by client is zero

It is documented that if a DCP client sets the flow control buffer size to
zero, the DCP producer does not do flow control. So if the flow control buffer
size is set to zero, the producer does not start flow control, or disables any
flow control that was set up before.
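
A sketch of that behavior on a hypothetical per-connection flow-control object (FlowControlSketch is illustrative, not the producer's real type):

```cpp
#include <cstdint>

// Hypothetical flow-control state for one DCP producer connection.
struct FlowControlSketch {
    bool enabled = false;
    uint32_t bufferSize = 0;

    // Applies a client-supplied buffer size: zero switches flow control off
    // (tearing down any earlier setup); a non-zero value turns it on.
    void setBufferSize(uint32_t bytes) {
        bufferSize = bytes;
        enabled = (bytes != 0);
    }

    // With flow control disabled the producer never pauses for acks.
    bool mustPause(uint32_t unackedBytes) const {
        return enabled && unackedBytes >= bufferSize;
    }
};
```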

Change-Id: I8746c7b65e82f59c268ed4aa6081b35d04571006
Reviewed-on: http://review.couchbase.org/51706
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Manu Dhundi <manu@couchbase.com>
5 years agoMB-15206: Check and add new checkpoint upon receiving snapshot end 82/51682/3
abhinavdangeti [Tue, 2 Jun 2015 18:44:14 +0000 (11:44 -0700)]
MB-15206: Check and add new checkpoint upon receiving snapshot end

In the DCP consumer, upon receiving a snapshot end message,
check and add a new checkpoint for the replica vbucket, to
ensure that items that are in the current checkpoint do not
take up a lot of memory. Any old unreferenced checkpoints
will be removed by the checkpoint-remover and item-pager
daemons.

Change-Id: I9eb61fb9e71661e4245de9f92f595a9300abffb9
Reviewed-on: http://review.couchbase.org/51682
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMB-14496 Fix to the type mismatch bug in a file's rev number. 90/50390/2
Chiyoung Seo [Wed, 29 Apr 2015 17:18:39 +0000 (10:18 -0700)]
MB-14496 Fix to the type mismatch bug in a file's rev number.

Change-Id: I3e455d6cee9ac24651510badc116a8284418b118
Reviewed-on: http://review.couchbase.org/50390
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years ago[Backport] MB-14374: Properly find the latest rollback point 10/50010/3
Mike Wiederhold [Fri, 10 Apr 2015 19:06:20 +0000 (12:06 -0700)]
[Backport] MB-14374: Properly find the latest rollback point

If we have received a full snapshot on disk then we want to use the
snapshot end sequence number, but if we are in the middle of a
snapshot then we want to use the snapshot start sequence number. We
can figure out what to use by checking the last sequence number
persisted.
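
The selection rule as a sketch; PersistedSnapshot and latestRollbackPoint are hypothetical, and the three fields are assumed to be read from the persisted vbucket state:

```cpp
#include <cstdint>

// Hypothetical snapshot bookkeeping persisted alongside the vbucket.
struct PersistedSnapshot {
    uint64_t snapStart;  // start seqno of the last snapshot written to disk
    uint64_t snapEnd;    // end seqno of that snapshot
    uint64_t lastSeqno;  // last seqno actually persisted
};

// If the whole snapshot reached disk, its end seqno is a safe rollback
// point; if we are in the middle of the snapshot, use its start instead.
uint64_t latestRollbackPoint(const PersistedSnapshot& s) {
    return s.lastSeqno == s.snapEnd ? s.snapEnd : s.snapStart;
}
```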

Change-Id: I4da5f8078e5021c22ba28ca5c8ff8f1ece44731e
Reviewed-on: http://review.couchbase.org/50010
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-14003: Set the bySeqno of the HashTable item correctly 60/48560/6
abhinavdangeti [Fri, 20 Mar 2015 20:31:52 +0000 (13:31 -0700)]
MB-14003: Set the bySeqno of the HashTable item correctly

Context: addTAPBackfillItem

Change-Id: I825635cb50b4130dca311ee247cf157c09a76d92
Reviewed-on: http://review.couchbase.org/48560
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago[BP] MB-13359: Get DCP next snapshot if there are meta items, but no mutations 62/47962/2
Mike Wiederhold [Fri, 13 Feb 2015 02:23:54 +0000 (18:23 -0800)]
[BP] MB-13359: Get DCP next snapshot if there are meta items, but no mutations

If we only get a snapshot end message when we call getItemsForCursor,
then we will consider the snapshot to be empty and pause the stream.
Since a snapshot end message can only be in a closed checkpoint, this
means that we may be pausing the stream for no reason.

Change-Id: I29b8603287b41401fd6f5c1e4d4f185611d5b583
Reviewed-on: http://review.couchbase.org/47087
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
Reviewed-on: http://review.couchbase.org/47962
Tested-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
5 years agoMB-13757: Acquire snapshot lock before updating highSeq & snapshot seqs 11/47811/5
abhinavdangeti [Fri, 6 Mar 2015 01:24:09 +0000 (17:24 -0800)]
MB-13757: Acquire snapshot lock before updating highSeq & snapshot seqs

For TAP, acquire the snapshot lock before updating highSeqno and then the
snapshot sequence numbers. This prevents the flusher from racing with
queueDirty and seeing a state where highSeqno has been updated but the
snapshot sequence numbers have not.
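
A sketch of the locking pattern using std::mutex; VBucketSeqnosSketch is hypothetical, and the real code may use its own lock wrappers:

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical vbucket fragment: highSeqno and the snapshot range are
// written and read under one lock, so a reader such as the flusher never
// sees a new highSeqno paired with a stale snapshot range.
class VBucketSeqnosSketch {
public:
    void update(uint64_t seqno, uint64_t snapStart, uint64_t snapEnd) {
        std::lock_guard<std::mutex> lh(snapshotMutex_);
        highSeqno_ = seqno;
        snapStart_ = snapStart;
        snapEnd_ = snapEnd;
    }

    // Copies out a consistent view of all three values.
    void read(uint64_t& seqno, uint64_t& snapStart, uint64_t& snapEnd) const {
        std::lock_guard<std::mutex> lh(snapshotMutex_);
        seqno = highSeqno_;
        snapStart = snapStart_;
        snapEnd = snapEnd_;
    }

private:
    mutable std::mutex snapshotMutex_;
    uint64_t highSeqno_ = 0;
    uint64_t snapStart_ = 0;
    uint64_t snapEnd_ = 0;
};
```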

Change-Id: I2d7c5932c1d4bda316047236218f4ca9336946a3
Reviewed-on: http://review.couchbase.org/47811
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMB-13792: Acquire lock before changing checkpointManager's lastBySeqno 65/47665/5
abhinavdangeti [Thu, 5 Mar 2015 20:23:19 +0000 (12:23 -0800)]
MB-13792: Acquire lock before changing checkpointManager's lastBySeqno

Change-Id: I55e381418b0a5b89704f2e9912caabfa4df8d15c
Reviewed-on: http://review.couchbase.org/47665
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMerge remote-tracking branch 'gerrit/3.0.1' into 3.0.3 48/47248/2
abhinavdangeti [Tue, 24 Feb 2015 02:20:49 +0000 (18:20 -0800)]
Merge remote-tracking branch 'gerrit/3.0.1' into 3.0.3

+ MB-13386: Ensure that purging the highSeqno doesn't happen

Change-Id: Ieacb1d86d747a275181bb238145fd206a14be140

5 years agoMB-13386: Ensure that purging the highSeqno doesn't happen 41/47241/5 3.0.1
abhinavdangeti [Tue, 24 Feb 2015 01:04:19 +0000 (17:04 -0800)]
MB-13386: Ensure that purging the highSeqno doesn't happen

+ The item with the highest seqno must not be purged,
as DCP relies on it.
+ This change ensures that items are still queued
for deletion if found to be expired.

Change-Id: I8102d5f61989523efc4f39b70b225c05cdd1b128
Reviewed-on: http://review.couchbase.org/47241
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoFix build breakage on ubuntu 07/47107/2
abhinavdangeti [Fri, 20 Feb 2015 02:38:14 +0000 (18:38 -0800)]
Fix build breakage on ubuntu

Change-Id: Ic8d7b9c17086a612b55e2c84a26259081d29c8d4
Reviewed-on: http://review.couchbase.org/47107
Reviewed-by: Manu Dhundi <manu@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMB-13479: Indicate rollback to DCP clients if purge_seqno > snap_start_seqno 94/47094/4
Manu Dhundi [Thu, 19 Feb 2015 23:10:31 +0000 (15:10 -0800)]
MB-13479: Indicate rollback to DCP clients if purge_seqno > snap_start_seqno

The replica may not get all the items if items have been purged on the active
node. Hence this change indicates a rollback to seqno 0 whenever
purge_seqno > snap_start_seqno.
Note: since this scenario occurs rarely, rolling back to 0 is an acceptable
cost.
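
The check can be sketched as follows; needsRollbackToZero and its parameters are hypothetical:

```cpp
#include <cstdint>

// If items beyond the start of the client's snapshot have been purged on
// the active node, the client cannot be given a complete history, so it
// is told to roll back to seqno 0. rollbackSeqno is only written when a
// rollback is required.
bool needsRollbackToZero(uint64_t purgeSeqno, uint64_t snapStartSeqno,
                         uint64_t& rollbackSeqno) {
    if (purgeSeqno > snapStartSeqno) {
        rollbackSeqno = 0;
        return true;
    }
    return false;
}
```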

Change-Id: I5c8403115110be136df5d4cb4e2704edc2a4c9e4
Reviewed-on: http://review.couchbase.org/47094
Tested-by: Manu Dhundi <manu@couchbase.com>
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
5 years agoMerge remote-tracking branch 'couchbase/3.0.1' into couchbase/3.0.3 41/47041/1
Manu Dhundi [Thu, 19 Feb 2015 01:11:30 +0000 (17:11 -0800)]
Merge remote-tracking branch 'couchbase/3.0.1' into couchbase/3.0.3

Change-Id: I3d214e5b9b7a60831dcb685a6ef85a4ee0996f3e

5 years agoMB-13386: Do not purge the item with highest sequence number from db. 35/47035/2
Manu Dhundi [Wed, 18 Feb 2015 23:07:18 +0000 (15:07 -0800)]
MB-13386: Do not purge the item with highest sequence number from db.

When the highest seqno that DCP is supposed to read from the db
is the last seqno in the db and that item has been purged,
DCP backfill does not know about it and keeps waiting for it.
This results in a DCP connection hang.
To solve this problem, compaction does not purge the item
with the last (highest) sequence number in the db.
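
A sketch of the compaction predicate under these assumptions: an item is purgeable only if it is deleted and its purge time has elapsed, and maxSeqnoInFile is known to the compactor. shouldPurge is a hypothetical name:

```cpp
#include <cstdint>

// A deleted item whose purge time has passed is normally dropped, but the
// item holding the highest seqno in the file is always kept so that DCP
// backfill can still reach that seqno.
bool shouldPurge(bool isDeleted, uint64_t itemSeqno, uint64_t maxSeqnoInFile,
                 bool purgeTimeElapsed) {
    if (!isDeleted || !purgeTimeElapsed) {
        return false;
    }
    return itemSeqno != maxSeqnoInFile;
}
```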

Change-Id: Ib83c335da3f7c0a952e4b760309276f73bff4ccf
Reviewed-on: http://review.couchbase.org/47035
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: Manu Dhundi <manu@couchbase.com>
5 years agoMB-10370: Replications' set/delWithMetas to use replication threshold 07/46907/3
abhinavdangeti [Mon, 16 Feb 2015 23:55:42 +0000 (15:55 -0800)]
MB-10370: Replications' set/delWithMetas to use replication threshold

The setWithMetas and deleteWithMetas issued by consumers for
intra-cluster replication need to use tapThrottleThreshold
as opposed to mutation_memory_threshold.

Change-Id: I576c9f9961e03e70430d58a192854c6faa14156d
Reviewed-on: http://review.couchbase.org/46907
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
5 years agoMB-11527: Allow tuning of compaction_write_queue_cap 42/46642/3
abhinavdangeti [Wed, 11 Feb 2015 01:02:21 +0000 (17:02 -0800)]
MB-11527: Allow tuning of compaction_write_queue_cap

Configuration parameter: compaction_write_queue_cap
Can be set at runtime through cbepctl.

Change-Id: Id35899865509f5de13a9565802aeec7f84a71f3d
Reviewed-on: http://review.couchbase.org/46642
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoMB-13205: Setting appropriate memory thresholds 39/46639/4
abhinavdangeti [Tue, 10 Feb 2015 03:16:25 +0000 (19:16 -0800)]
MB-13205: Setting appropriate memory thresholds

mutation_mem_threshold: 93%
backfill_mem_threshold: 96%
replication_mem_threshold: 99%
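
Assuming the percentages are applied to the bucket memory quota (max_size), the resulting byte limits can be sketched as follows; the struct and function names are hypothetical:

```cpp
#include <cstdint>

// Byte limits derived from the percentages above (illustrative only).
struct MemThresholdsSketch {
    uint64_t mutation;     // mutation threshold: 93% of the quota
    uint64_t backfill;     // backfill threshold: 96% of the quota
    uint64_t replication;  // replication threshold: 99% of the quota
};

// maxSize is the bucket memory quota in bytes. Multiplying before dividing
// avoids truncating small quotas to zero.
MemThresholdsSketch computeThresholds(uint64_t maxSize) {
    MemThresholdsSketch t;
    t.mutation = maxSize * 93 / 100;
    t.backfill = maxSize * 96 / 100;
    t.replication = maxSize * 99 / 100;
    return t;
}
```

With a quota of 1000 bytes this yields 930, 960 and 990, so under this assumption mutations would be throttled first, backfill next and replication last.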

Change-Id: Ic2abb80266c1e32d55e21e5359fe0ce99e53551f
Reviewed-on: http://review.couchbase.org/46639
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years agoConditionally delete expired items during compaction 38/46638/4
abhinavdangeti [Tue, 10 Feb 2015 19:59:00 +0000 (11:59 -0800)]
Conditionally delete expired items during compaction

Delete expired items during compaction if and only if
memory usage is less than the threshold
(compaction_exp_mem_threshold) and the disk queue size is
less than tap_throttle_queue_cap.
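
A sketch of the gating condition, assuming compaction_exp_mem_threshold is a percentage of the bucket quota; the function and parameter names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Expired items are deleted during compaction only while memory usage is
// below the expiry threshold AND the disk queue is below its cap.
bool canDeleteExpiredDuringCompaction(uint64_t memUsed, uint64_t maxSize,
                                      uint64_t expMemThresholdPercent,
                                      size_t diskQueueSize,
                                      size_t throttleQueueCap) {
    bool memoryOk = memUsed < maxSize * expMemThresholdPercent / 100;
    bool queueOk = diskQueueSize < throttleQueueCap;
    return memoryOk && queueOk;
}
```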

Change-Id: I256b127b32050dc0a1e395cacb369353a2fe0565
Reviewed-on: http://review.couchbase.org/46638
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago[Backport] Disable the disk write queue size cap for replication backoff. 14/46614/3
Chiyoung Seo [Wed, 4 Feb 2015 06:44:34 +0000 (22:44 -0800)]
[Backport] Disable the disk write queue size cap for replication backoff.

The disk write queue size cap for replication backoff was determined in
the very early versions of Couchbase Server, which were deployed in
small-to-medium sized clusters with spinning disks.

Our recent benchmark results show that disabling the disk write queue
cap, or setting it to a large value, gives much better performance in
large-scale clusters with SSDs. As Couchbase Server is increasingly
deployed in such environments, this change disables the write queue
size cap, but still supports replication backoff by checking the memory
usage.

Note that the disk write queue size cap is still configurable at runtime.

Change-Id: Iedf711ad1c3bef61ca954f83f802b4a647b9ec88
Reviewed-on: http://review.couchbase.org/46367
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-on: http://review.couchbase.org/46614
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
5 years ago[Backport] MB-13286: New DcpProducer to have paused status set to true 11/46511/2
abhinavdangeti [Wed, 4 Feb 2015 19:15:52 +0000 (11:15 -0800)]
[Backport] MB-13286: New DcpProducer to have paused status set to true

DcpOpen creates a new DcpProducer. If one with the same
name already exists (e.g. during rebalance), the new
producer replaces the older one, and its paused status
needs to be set to true so that the notification for the
new connection is sent to memcached.

Change-Id: I40c22601a7d29141741608339c58caa486a698b2
Reviewed-on: http://review.couchbase.org/46374
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
Reviewed-on: http://review.couchbase.org/46511

6 years agoMB-12673: Check for new checkpoint items after takeoverSend 18/43918/2 v3.0.2
abhinavdangeti [Wed, 3 Dec 2014 03:35:25 +0000 (19:35 -0800)]
MB-12673: Check for new checkpoint items after takeoverSend

Check for any new checkpoint items when the readyQ is found to
be empty in the takeoverSendPhase, before setting the old
vbucket to dead and the new vbucket to active.

Change-Id: I2a6ddacc711f5db42a1e3c575ae18d0b2b3126bd
Reviewed-on: http://review.couchbase.org/43918
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-12751: Return KEY EEXISTS for locked objects 25/43625/2
Trond Norbye [Mon, 24 Nov 2014 21:57:29 +0000 (22:57 +0100)]
MB-12751: Return KEY EEXISTS for locked objects

Change-Id: I5eb8a24337a81fed861cc8140bbdcf1a132acd90
Reviewed-on: http://review.couchbase.org/43625
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
Tested-by: Trond Norbye <trond.norbye@gmail.com>
6 years agoMB-12647: Ensure CAS value will always be unique 71/43271/3
Sriram Ganesan [Fri, 14 Nov 2014 22:45:58 +0000 (14:45 -0800)]
MB-12647: Ensure CAS value will always be unique

On Windows, gethrtime() returns the same timestamp when two requests
are made within a very short interval, causing the same CAS value to be
returned. This can cause race conditions between two consecutive
requests, resulting in data corruption.
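One common way to guarantee uniqueness, sketched here on the assumption that CAS values are derived from a high-resolution clock: remember the last value handed out and, if the clock has not advanced, return the previous value plus one. The `CasGenerator` class and its injectable clock are hypothetical; this is not necessarily how the fix was implemented.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical CAS generator that never returns the same value twice,
// even when the underlying clock returns identical timestamps.
class CasGenerator {
public:
    explicit CasGenerator(std::function<uint64_t()> clock)
        : clock_(std::move(clock)), last_(0) {}

    uint64_t next() {
        const uint64_t now = clock_();
        uint64_t prev = last_.load();
        uint64_t candidate;
        do {
            // Use the clock if it moved forward, otherwise bump the last value.
            candidate = (now > prev) ? now : prev + 1;
            // On failure, prev is reloaded and the candidate recomputed.
        } while (!last_.compare_exchange_weak(prev, candidate));
        return candidate;
    }

private:
    std::function<uint64_t()> clock_;
    std::atomic<uint64_t> last_;
};
```

The compare-and-swap loop keeps the generator safe when several threads request a CAS at the same instant.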

Change-Id: I4f396de3f14129504ca406ebb8d4c7a9f3a89bd8
Reviewed-on: http://review.couchbase.org/43271
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Chiyoung Seo <chiyoung@couchbase.com>
6 years agoMB-12576: Limit writer threads to 4 60/43060/7
Sundar Sridharan [Tue, 11 Nov 2014 06:59:44 +0000 (22:59 -0800)]
MB-12576: Limit writer threads to 4

Having more than 4 writers increases bgfetch latencies in DGM
scenarios. This change selectively reverts commit
32a166c511d7b242433011a875402e1278300add.

Change-Id: Icdb996622237747e759c52751f2c8e613c9ba262
Reviewed-on: http://review.couchbase.org/43060
Tested-by: Sundararaman Sridharan <sundar@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
6 years agoMB-12562: Avoid terminating all writer threads 80/42880/6
abhinavdangeti [Thu, 6 Nov 2014 05:10:29 +0000 (21:10 -0800)]
MB-12562: Avoid terminating all writer threads

With incremental writer thread scheduling, we
need to make sure that the writer threads are not
all terminated while any bucket still exists.

Scenario
- maxWriters is 4
- create bucket 1, numWriters=1
- create bucket 2, numWriters=2
- create bucket 3, numWriters=3
- create bucket 4, numWriters=4
- create bucket 5, numWriters=4
- delete bucket 5, numWriters=3 (with this change: 4)
- delete bucket 4, numWriters=2 (with this change: 3)
- delete bucket 3, numWriters=1 (with this change: 2)
- delete bucket 2, numWriters=0 (with this change: 1)
- delete bucket 1, numWriters=0 (with this change: 0)
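The "with change" values in the scenario follow a simple rule: the writer count tracks the number of live buckets, capped at maxWriters, rather than being decremented on every deletion. A minimal sketch of that rule, using a hypothetical `WriterPool` class (not ep-engine's ExecutorPool API):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical bookkeeping for writer threads shared across buckets.
class WriterPool {
public:
    explicit WriterPool(std::size_t maxWriters) : maxWriters_(maxWriters) {}

    void bucketCreated() { ++numBuckets_; }
    void bucketDeleted() {
        if (numBuckets_ > 0) {
            --numBuckets_;
        }
    }

    // Derive the writer count from the live bucket count, so deleting a
    // bucket that never got its own writer does not remove one.
    std::size_t numWriters() const {
        return std::min(numBuckets_, maxWriters_);
    }

private:
    std::size_t maxWriters_;
    std::size_t numBuckets_ = 0;
};
```

Replaying the scenario against this sketch gives 4, 3, 2, 1, 0 after the successive deletions.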

Change-Id: Ib9d45f7acb9177924612547538aa74ca9dd49c20
Reviewed-on: http://review.couchbase.org/42880
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-12551: Add all arguments to noop log message 62/42862/2
Mike Wiederhold [Thu, 6 Nov 2014 00:58:19 +0000 (16:58 -0800)]
MB-12551: Add all arguments to noop log message

Change-Id: Id6e8aac2bf9e398dc8c929eb122651884d27b7c1
Reviewed-on: http://review.couchbase.org/42862
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12483: Make sure vbucket_state initializes all variables 29/42729/3
Mike Wiederhold [Mon, 3 Nov 2014 22:00:15 +0000 (14:00 -0800)]
MB-12483: Make sure vbucket_state initializes all variables

We need to make sure that this data is always completely filled
out so that we don't accidentally write garbage data to disk.

Change-Id: I196e7ca9f5bada8e0df90ddb01b6e952650bed56
Reviewed-on: http://review.couchbase.org/42729
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12483: Remove the default constructor in the vbucket_state struct 28/42728/2
Mike Wiederhold [Mon, 3 Nov 2014 20:19:04 +0000 (12:19 -0800)]
MB-12483: Remove the default constructor in the vbucket_state struct

The default constructor can leave uninitialized fields and we persist
this structure to disk. In order to prevent garbage from being written
we should remove the default constructor.
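A sketch of the idea in C++11 terms: deleting the default constructor forces every caller to supply every field, so no uninitialized member can be persisted. `vbucket_state_sketch` is a hypothetical, simplified stand-in; its field list and the use of `int` for the state are assumptions, not the real struct definition.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for the persisted vbucket_state struct.
// Without a default constructor, every field must be supplied explicitly.
struct vbucket_state_sketch {
    vbucket_state_sketch() = delete;
    vbucket_state_sketch(int vbState, uint64_t chkId, int64_t seqno,
                         uint64_t snapStart, uint64_t snapEnd,
                         std::string failoverLog)
        : state(vbState), checkpointId(chkId), highSeqno(seqno),
          lastSnapStart(snapStart), lastSnapEnd(snapEnd),
          failovers(std::move(failoverLog)) {}

    int state;
    uint64_t checkpointId;
    int64_t highSeqno;
    uint64_t lastSnapStart;
    uint64_t lastSnapEnd;
    std::string failovers;
};

// Compile-time guard: code that tries to default-construct it won't build.
static_assert(!std::is_default_constructible<vbucket_state_sketch>::value,
              "vbucket_state_sketch must be constructed with explicit values");
```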

Change-Id: I028fac1dc112bb454779a30f695eb180278455df
Reviewed-on: http://review.couchbase.org/42728
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12483: Don't allow recreation during vbucket deletion 77/42677/2
Mike Wiederhold [Fri, 31 Oct 2014 18:44:12 +0000 (11:44 -0700)]
MB-12483: Don't allow recreation during vbucket deletion

When we delete a vbucket we have an option to recreate the file
immediately. Doing this is incorrect because we will not know
what the failover log of the new vbucket looks like until we
actually create it in memory. This can lead to a situation where
there is no failover log and as a result the local doc json will
be invalid. If the server is shut down right after this happens and
is then restarted, the vbuckets might be created with garbage values
in some of their fields.

Change-Id: I70e6335af68746aeac49a336da5e33b70dfcfe0e
Reviewed-on: http://review.couchbase.org/42677
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12279: Incrementally spawn writer threads for buckets 52/42552/3
Sundar Sridharan [Tue, 28 Oct 2014 23:02:16 +0000 (16:02 -0700)]
MB-12279: Incrementally spawn writer threads for buckets

This change is needed to mitigate the high bgfetch latency
observed in heavy Data-Greater-than-Memory scenarios, because
having a high number of writer threads slows down disk read
performance.
It also mitigates the MB-11143 slowdown in the single-HDD case.

NOTE: This may slow down disk persistence on fast SSDs by default
NOTE: cbepctl can still be used to dynamically tune writers at runtime

Change-Id: Iddf0d3094f38b66ba8c0e09d6d6a307d15b38d56
Reviewed-on: http://review.couchbase.org/42552
Tested-by: Sundararaman Sridharan <sundar@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
6 years agoMB-12305: Don't update the snap start/end seqno in tap mutation 03/42503/3
Mike Wiederhold [Thu, 23 Oct 2014 17:57:48 +0000 (10:57 -0700)]
MB-12305: Don't update the snap start/end seqno in tap mutation

We already do this in the queueDirty function, so this code is not
needed. It also appeared to cause a race when updating the snapshot
start and end sequence numbers, so removing it should fix MB-12305.

Change-Id: Ia8fa36df958be9147ea208ba9ebd78496048ebb4
Reviewed-on: http://review.couchbase.org/42503
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12398: log incorrect timeout values 87/42387/2
Trond Norbye [Thu, 23 Oct 2014 12:31:12 +0000 (14:31 +0200)]
MB-12398: log incorrect timeout values

Messages logged at DEBUG level don't appear in any logs, so customers
running in production never get this information (the server just
changes their requested lock time without telling the user).

A better behavior here would arguably be to return ERANGE in
this case, but that could break user applications
and isn't something we should do in a patch release.
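A sketch of the kind of adjustment that is now made visible, assuming the rule that a zero or over-limit lock timeout is replaced by a default. The `effectiveLockTimeout` function, its limits, and the use of `std::cerr` in place of the engine's logger are all hypothetical; the point is only that the substitution is reported at a level customers actually see.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical: returns the lock timeout that will actually be used and
// warns (rather than logging at DEBUG) whenever the request was changed.
uint32_t effectiveLockTimeout(uint32_t requested, uint32_t defaultTimeout,
                              uint32_t maxTimeout) {
    if (requested == 0 || requested > maxTimeout) {
        std::cerr << "WARNING: illegal lock timeout " << requested
                  << "s requested, using default " << defaultTimeout << "s\n";
        return defaultTimeout;
    }
    return requested;
}
```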

Change-Id: Ib6a37deceb2755b9fea53b0542a13fb2e16a3261
Reviewed-on: http://review.couchbase.org/42387
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoHide the -a argument in cbstats 97/42297/4
Mike Wiederhold [Mon, 20 Oct 2014 20:51:10 +0000 (13:51 -0700)]
Hide the -a argument in cbstats

The -a argument is used by cbcollectinfo so we can't just
get rid of it.

Change-Id: I0f04c2d454c117271b44176df904e90a938965e9
Reviewed-on: http://review.couchbase.org/42297
Reviewed-by: Michael Wiederhold <mike@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12226: Handle ENOENT error during unlink 73/42173/3
Sriram Ganesan [Wed, 15 Oct 2014 21:12:06 +0000 (14:12 -0700)]
MB-12226: Handle ENOENT error during unlink

If the file is not found during an unlink, it shouldn't be added
to the pending file deletions queue.
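A minimal sketch of that check, assuming a POSIX-like `std::remove` that sets `errno` on failure: a file that is already gone needs no retry, so only other failures are queued. `removeOrQueue` and the `std::deque` queue are hypothetical stand-ins for the engine's pending-deletion structure.

```cpp
#include <cerrno>
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical helper: try to delete a file and queue it for a later retry
// only if the failure was something other than "file does not exist".
void removeOrQueue(const std::string& path, std::deque<std::string>& pending) {
    errno = 0;
    if (std::remove(path.c_str()) == 0) {
        return;  // deleted
    }
    if (errno == ENOENT) {
        return;  // already gone: nothing left to retry
    }
    pending.push_back(path);  // e.g. another process holds the file open
}
```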

Change-Id: Ief306277bbbc946ae18e39dd4819f811f12ea76c
Reviewed-on: http://review.couchbase.org/42173
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Sriram Ganesan <sriram@couchbase.com>
6 years agoMB-12271: Set the default dcp producer noop interval to 20 seconds 36/42136/4
Mike Wiederhold [Tue, 14 Oct 2014 18:10:21 +0000 (11:10 -0700)]
MB-12271: Set the default dcp producer noop interval to 20 seconds

This is a fix to support backward compatibility between 3.0 and
3.0.1+ versions of Couchbase. The problem is that 3.0 has a default
noop interval of 20 seconds, while 3.0.1 has a default noop
interval of 200 seconds. In 3.0.1 the consumer explicitly sets
the producer's noop interval, so in 3.0.1+ clusters the 20-second
default will be overridden by the consumer, but when a 3.0.1+ node
connects to a 3.0 node, the 20-second default will remain.

Change-Id: I2e18e9ad68037f3a82abe5167f2bca89f381f318
Reviewed-on: http://review.couchbase.org/42136
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-11454: Add an option to output json in cbstats 84/42084/2
Mike Wiederhold [Mon, 13 Oct 2014 17:56:14 +0000 (10:56 -0700)]
MB-11454: Add an option to output json in cbstats

Change-Id: I0adcb16adde82ddf3671e4edc0118061d684f2d4
Reviewed-on: http://review.couchbase.org/42084
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMB-12271: Set noop interval individually for dcp connections 49/42049/3
Mike Wiederhold [Sat, 11 Oct 2014 00:00:53 +0000 (17:00 -0700)]
MB-12271: Set noop interval individually for dcp connections

We need to do this because two servers might have their noop
intervals set to different values. If they do, the connection can
be dropped because each side expects to see a noop at a different
time.
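A minimal sketch of per-connection noop intervals: each producer connection starts from an engine-wide default and the consumer may override it for that connection only. The `DcpProducerConn` class is hypothetical, and the `set_noop_interval` control key is an assumption about the DCP control message used for this.

```cpp
#include <cstdint>
#include <string>

// Hypothetical producer-side connection state.
class DcpProducerConn {
public:
    explicit DcpProducerConn(uint32_t defaultNoopSecs)
        : noopIntervalSecs_(defaultNoopSecs) {}

    // Apply a control message from the consumer. Only this connection's
    // interval changes; other connections keep their own values.
    // std::stoul throws if the value is not numeric.
    bool control(const std::string& key, const std::string& value) {
        if (key == "set_noop_interval") {
            noopIntervalSecs_ = static_cast<uint32_t>(std::stoul(value));
            return true;
        }
        return false;  // unknown key
    }

    uint32_t noopIntervalSecs() const { return noopIntervalSecs_; }

private:
    uint32_t noopIntervalSecs_;
};
```

A consumer that never sends the control message keeps the producer's default, which is why the default itself matters for mixed-version clusters.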

Change-Id: I6ff475ccba407547e7285fa431b86ad9bf9cdc24
Reviewed-on: http://review.couchbase.org/42049
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoAccount for incorrect entries through cbepctl 83/42083/3
abhinavdangeti [Mon, 13 Oct 2014 17:44:14 +0000 (10:44 -0700)]
Account for incorrect entries through cbepctl

Context: access_scanner_enabled
Relates to: http://review.couchbase.org/#/c/40884/

Change-Id: I35134c91ed03f6ba6093cfd71270484beca1a4cd
Reviewed-on: http://review.couchbase.org/42083
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-11642: Change the priority based on the type of dcp connection 58/41758/3
Mike Wiederhold [Tue, 23 Sep 2014 23:06:48 +0000 (16:06 -0700)]
MB-11642: Change the priority based on the type of dcp connection

Change-Id: I1b9a6846879385308bee3920bfa182fc41e39b4f
Reviewed-on: http://review.couchbase.org/41758
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agouse optparse instead of arg_parse cbvdiff for compatibility 52/41752/4
Sundar Sridharan [Mon, 29 Sep 2014 21:32:52 +0000 (14:32 -0700)]
use optparse instead of arg_parse cbvdiff for compatibility

Change-Id: I604354823c71b91a167646c01e2bb6ea9d8c8822
Reviewed-on: http://review.couchbase.org/41752
Tested-by: Sundararaman Sridharan <sundar@couchbase.com>
Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-12268: Adding a debug log, when clusterConfig is updated 46/41746/2
abhinavdangeti [Mon, 29 Sep 2014 18:38:23 +0000 (11:38 -0700)]
MB-12268: Adding a debug log, when clusterConfig is updated

Change-Id: I0cbb70789f84bddc644bdc4ee59c1e86e1821220
Reviewed-on: http://review.couchbase.org/41746
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-12160: Use appropriate CAS for comparison with locked items 94/41594/8
Sriram Ganesan [Tue, 23 Sep 2014 23:27:16 +0000 (16:27 -0700)]
MB-12160: Use appropriate CAS for comparison with locked items

In the case of setWithMeta/deleteWithMeta commands, the locked item's
CAS value needs to be compared with the incoming mutation's CAS.

Change-Id: Id12a3c4717b18bc41c3d4b7dded99ea215179e9d
Reviewed-on: http://review.couchbase.org/41594
Tested-by: Sriram Ganesan <sriram@couchbase.com>
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
6 years agoAccount for keylength while allocating buffer for allKeys 95/41595/3
abhinavdangeti [Wed, 24 Sep 2014 00:17:21 +0000 (17:17 -0700)]
Account for keylength while allocating buffer for allKeys

When the AllKeysAPI is invoked and the buffer needs to grow,
we need to account for two additional bytes per key, which
hold the key's length.
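A sketch of the sizing rule, assuming each key is written as a 2-byte length prefix followed by the key bytes (the big-endian byte order here is an assumption). `appendKey` is a hypothetical helper; the point is that growth must reserve `2 + key.size()` bytes, not just `key.size()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: append one key as <uint16 length><key bytes>.
void appendKey(std::vector<char>& buf, std::size_t& used,
               const std::string& key) {
    const std::size_t needed = sizeof(uint16_t) + key.size();  // 2 extra bytes
    if (used + needed > buf.size()) {
        buf.resize(std::max(buf.size() * 2, used + needed));
    }
    const uint16_t len = static_cast<uint16_t>(key.size());
    buf[used++] = static_cast<char>(len >> 8);    // high byte
    buf[used++] = static_cast<char>(len & 0xff);  // low byte
    key.copy(buf.data() + used, key.size());
    used += key.size();
}
```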

Change-Id: Iba68c4ae7bccae20f97d4e98350d5105093c3487
Reviewed-on: http://review.couchbase.org/41595
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-12117: Release hashtable locks before disk IO 72/41572/4
abhinavdangeti [Wed, 24 Sep 2014 18:29:10 +0000 (11:29 -0700)]
MB-12117: Release hashtable locks before disk IO

During access log generation, we need to release
all hashtable partition locks before creating new
entries in the mutation log.

Change-Id: Ic3dd0a02452b51ee742e30a0f268b86f9ab6205b
Reviewed-on: http://review.couchbase.org/41572
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMB-11999: Preferential loading of active/replica vbuckets 68/41468/3
abhinavdangeti [Thu, 18 Sep 2014 22:44:32 +0000 (15:44 -0700)]
MB-11999: Preferential loading of active/replica vbuckets

During warmup, order the vbucket list such that
active vbuckets get 60% preference while replica
vbuckets get 40% preference.

Example:
In a 4-node cluster (DGM), approximate resident ratios per node:
1. Before warmup
    active:     36%     34%     34%     41%
    replica:    33%     35%     35%     27%
2. After warmup
    active:     40%     42%     41%     46%
    replica:    31%     29%     29%     24%
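One way to realize a 60/40 preference, sketched on the assumption that the ordering simply interleaves the two lists so that about 60% of the vbuckets scheduled so far are active. `orderForWarmup` is hypothetical; the actual warmup code may weight the lists differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: interleave active and replica vbucket ids with a 60/40
// preference for actives while both lists are non-empty.
std::vector<uint16_t> orderForWarmup(const std::vector<uint16_t>& active,
                                     const std::vector<uint16_t>& replica) {
    std::vector<uint16_t> order;
    std::size_t a = 0, r = 0;
    while (a < active.size() || r < replica.size()) {
        const bool haveActive = a < active.size();
        const bool haveReplica = r < replica.size();
        // Take an active vbucket if the actives already placed fill less
        // than 60% of the first n + 1 slots (a / (n + 1) < 0.6).
        const bool takeActive =
            haveActive && (!haveReplica || a * 5 < (order.size() + 1) * 3);
        if (takeActive) {
            order.push_back(active[a++]);
        } else {
            order.push_back(replica[r++]);
        }
    }
    return order;
}
```

With actives {0..4} and replicas {5..9}, this yields 0, 1, 5, 2, 6, 3, 4, 7, 8, 9: three of the first five slots are active.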

Change-Id: I60e0427bca58530247086d730135ebb4be70bb84
Reviewed-on: http://review.couchbase.org/41468
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: abhinav dangeti <abhinav@couchbase.com>
6 years agoMerge remote-tracking branch 'gerrit/2.5.0' into 3.0.1 60/41460/2
Mike Wiederhold [Wed, 17 Sep 2014 19:32:20 +0000 (12:32 -0700)]
Merge remote-tracking branch 'gerrit/2.5.0' into 3.0.1

Change-Id: I59c45877fc783e30c86dcfd7d303c731cde433fc

6 years agoFix deadlock in checkpoint persistence command 63/41363/2
Mike Wiederhold [Thu, 11 Sep 2014 18:46:00 +0000 (11:46 -0700)]
Fix deadlock in checkpoint persistence command

We need to release the hpChkMutex before notifying memcached in
order to prevent a deadlock. The deadlock occurs when the flusher
tries to notify memcached of checkpoint persistence at the same
time as a memcached worker thread is trying to add a new checkpoint
persistence request.
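The fix follows a general rule: collect the work while holding the mutex, release it, and only then call code that may take the same lock. A minimal sketch with a hypothetical `HighPriorityRequests` class standing in for the hpChkMutex-protected request list; the `notify_` callback stands in for the call into memcached, and the real request and notification types differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical list of pending checkpoint-persistence requests.
class HighPriorityRequests {
public:
    using Notify = std::function<void(const void* cookie)>;

    explicit HighPriorityRequests(Notify notify) : notify_(std::move(notify)) {}

    // Called from a front-end (memcached worker) thread.
    void add(const void* cookie, uint64_t checkpointId) {
        std::lock_guard<std::mutex> lh(mutex_);
        requests_.emplace_back(cookie, checkpointId);
    }

    // Called by the flusher once checkpointId has been persisted.
    void checkpointPersisted(uint64_t checkpointId) {
        std::vector<const void*> toNotify;
        {
            std::lock_guard<std::mutex> lh(mutex_);
            auto it = requests_.begin();
            while (it != requests_.end()) {
                if (it->second <= checkpointId) {
                    toNotify.push_back(it->first);
                    it = requests_.erase(it);
                } else {
                    ++it;
                }
            }
        }  // mutex released here, before any notification
        for (const void* cookie : toNotify) {
            notify_(cookie);  // may safely call add() again
        }
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> lh(mutex_);
        return requests_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::pair<const void*, uint64_t>> requests_;
    Notify notify_;
};
```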

Change-Id: Ida313f5b39ef0e063dee9882410cd0a19ce55292
Reviewed-on: http://review.couchbase.org/41363
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoMerge remote-tracking branch 'gerrit/3.0' into 3.0.1 18/41318/1
Mike Wiederhold [Wed, 10 Sep 2014 18:34:38 +0000 (11:34 -0700)]
Merge remote-tracking branch 'gerrit/3.0' into 3.0.1

Change-Id: I195452c3d0684dff5ff673efd00a18b9d70da3cf

6 years agoMB-12137: Don't update the current snapshot during persistence 89/41289/2 3.0 v3.0.0
Mike Wiederhold [Tue, 9 Sep 2014 21:31:07 +0000 (14:31 -0700)]
MB-12137: Don't update the current snapshot during persistence

If we update the current snapshot only when we persist items to
disk, then we may miss updates to the current snapshot
when there are vbucket state changes.

Change-Id: Iea3139afb669bdf32e6b4d98e8b3515cafe578dc
Reviewed-on: http://review.couchbase.org/41289
Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
Tested-by: Michael Wiederhold <mike@couchbase.com>
6 years agoFix build breakage on Linux 91/41291/2
Sriram Ganesan [Tue, 9 Sep 2014 23:25:11 +0000 (16:25 -0700)]
Fix build breakage on Linux

Remove usage of constant strings for maintaining file deletion queue

Change-Id: I1992343ff65f923b6d3ffbf7a25932b936e3ff7c
Reviewed-on: http://review.couchbase.org/41291
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>
Tested-by: Sriram Ganesan <sriram@couchbase.com>
6 years agoHandle unlink/remove failures 68/41268/5
Sriram Ganesan [Fri, 5 Sep 2014 23:01:46 +0000 (16:01 -0700)]
Handle unlink/remove failures

The unlink/remove function can fail if there is another process that
has an open file handle on that file. In such cases, we need to retry
the unlink periodically in the flusher task until we get rid of those
files.
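A minimal sketch of the periodic retry, assuming a POSIX-like `std::remove` that sets `errno`: each pass retries every queued path once, drops those that were removed or no longer exist, and keeps the rest for the next pass. `retryPendingDeletions` is hypothetical and only illustrates the flusher-side sweep.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical: called periodically (e.g. from the flusher task) to retry
// deletions that failed earlier.
void retryPendingDeletions(std::deque<std::string>& pending) {
    const std::size_t n = pending.size();  // visit each queued path once
    for (std::size_t i = 0; i < n; ++i) {
        std::string path = pending.front();
        pending.pop_front();
        errno = 0;
        if (std::remove(path.c_str()) != 0 && errno != ENOENT) {
            pending.push_back(path);  // still in use; try again next pass
        }
    }
}
```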

Change-Id: I4bfcf29b3fa866ec4946db658a245c722f3725ce
Reviewed-on: http://review.couchbase.org/41268
Tested-by: Sriram Ganesan <sriram@couchbase.com>
Reviewed-by: Sundararaman Sridharan <sundar@couchbase.com>