# Eventually Persistent Engine

Code in ep-engine executes in a multithreaded environment. Two classes of
thread exist:

1. memcached's threads, for servicing a client and calling in via the
   [engine API](https://github.com/couchbase/memcached/blob/master/include/memcached/engine.h).
2. ep-engine's threads, for running tasks such as the document expiry pager
   (see subclasses of `GlobalTask`).

## Synchronisation Primitives

There are three mutual-exclusion primitives available in ep-engine.

1. `Mutex` exclusive lock - [mutex.h](./src/mutex.h)
2. `RWLock` shared, reader/writer lock - [rwlock.h](./src/rwlock.h)
3. `SpinLock` 1-byte exclusive lock - [atomic.h](./src/atomic.h)

A condition variable is also available, called `SyncObject`
([syncobject.h](./src/syncobject.h)). `SyncObject` combines a `Mutex` and a
condition variable in one object.

These primitives are managed via RAII wrappers - [locks.h](./src/locks.h).

1. `LockHolder` - for acquiring a `Mutex` or `SyncObject`.
2. `MultiLockHolder` - for acquiring an array of `Mutex` or `SyncObject`.
3. `WriterLockHolder` - for acquiring write access to a `RWLock`.
4. `ReaderLockHolder` - for acquiring read access to a `RWLock`.
5. `SpinLockHolder` - for acquiring a `SpinLock`.

The general style is to create a `LockHolder` when you need to acquire a
`Mutex`: the constructor acquires the `Mutex`, and when the `LockHolder` goes
out of scope its destructor releases it. For certain use-cases the caller can
also explicitly lock/unlock the `Mutex` via the `LockHolder`.
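
```cpp
{
    LockHolder lockHolder(&mutex); // the constructor acquires mutex
    // ... critical section ...
} // lockHolder goes out of scope: the destructor releases mutex
```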

```cpp
LockHolder lockHolder(&mutex);
// ... the caller may also unlock and re-lock mutex manually via lockHolder ...
```
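
For comparison with the C++ standard library, `std::unique_lock` follows the same RAII pattern as `LockHolder`, including optional manual `unlock()`/`lock()`. The sketch below is a hypothetical analogue, not ep-engine code: `Journal`, `append` and `sumWithManualUnlock` are illustrative names. It shows a common reason for unlocking early: copy the shared state under the lock, then release the lock before doing work that does not need it.

```cpp
#include <mutex>
#include <vector>

// Hypothetical example class; std::unique_lock plays the role of LockHolder.
class Journal {
public:
    void append(int value) {
        std::unique_lock<std::mutex> lockHolder(mutex); // constructor acquires
        entries.push_back(value);
    } // lockHolder's destructor releases the mutex here

    int sumWithManualUnlock() {
        std::unique_lock<std::mutex> lockHolder(mutex);
        std::vector<int> snapshot = entries; // copy the shared state under the lock
        lockHolder.unlock();                 // release before work that needs no lock
        int sum = 0;
        for (int v : snapshot) {
            sum += v;
        }
        return sum; // destructor sees the mutex is no longer owned and does nothing
    }

private:
    std::mutex mutex;
    std::vector<int> entries;
};
```

Because `std::unique_lock` tracks whether it still owns the mutex, its destructor does not unlock a second time after the manual `unlock()`.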

A `MultiLockHolder` allows an array of locks to be conveniently acquired and
released. As with `LockHolder`, the caller can choose to lock/unlock manually
at any time (all locks are locked/unlocked via one call).

```cpp
MultiLockHolder lockHolder(&mutexes, 10); // acquires all 10 locks
for (int ii = 0; ii < 10; ii++) {
    objects[ii].doStuff();
}
// all 10 locks are released when lockHolder goes out of scope
```
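
A `MultiLockHolder`-style wrapper can be sketched over `std::mutex`. This is a hypothetical illustration (`MultiMutexHolder` is not an ep-engine class). It acquires the locks in ascending array order and releases them in reverse. Always acquiring in one fixed order is what stops two holders of overlapping lock sets from deadlocking on each other.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical analogue of MultiLockHolder for an array of std::mutex.
class MultiMutexHolder {
public:
    MultiMutexHolder(std::mutex* ms, std::size_t n)
        : mutexes(ms), count(n), locked(false) {
        lock();
    }
    ~MultiMutexHolder() {
        unlock();
    }
    MultiMutexHolder(const MultiMutexHolder&) = delete;
    MultiMutexHolder& operator=(const MultiMutexHolder&) = delete;

    // Acquire every lock, always in ascending index order.
    void lock() {
        if (locked) {
            return;
        }
        for (std::size_t i = 0; i < count; ++i) {
            mutexes[i].lock();
        }
        locked = true;
    }

    // Release every lock, in reverse order.
    void unlock() {
        if (!locked) {
            return;
        }
        for (std::size_t i = count; i > 0; --i) {
            mutexes[i - 1].unlock();
        }
        locked = false;
    }

    bool isLocked() const { return locked; }

private:
    std::mutex* mutexes;
    std::size_t count;
    bool locked;
};
```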

`RWLock` allows many readers to hold it concurrently, or a single writer to
hold it exclusively. `ReaderLockHolder` acquires the lock for a reader and
`WriterLockHolder` acquires it for a writer. Neither class supports manual
lock/unlock; acquisition and release are performed only by the constructor and
destructor.

```cpp
ReaderLockHolder rlh(&rwLock);
if (thing.getData()) {
    // ... read-only access to the shared state ...
}
```

```cpp
WriterLockHolder wlh(&rwLock);
// ... exclusive (write) access to the shared state ...
```
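
The C++17 `std::shared_mutex` offers the same reader/writer semantics as `RWLock`: `std::shared_lock` plays the role of `ReaderLockHolder` and `std::unique_lock` that of `WriterLockHolder`. The `Thing` class below is a hypothetical example, not ep-engine code.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical class guarded by a reader/writer lock.
class Thing {
public:
    int getData() const {
        std::shared_lock<std::shared_mutex> rlh(rwLock); // many readers at once
        return data;
    }

    void setData(int value) {
        std::unique_lock<std::shared_mutex> wlh(rwLock); // one exclusive writer
        data = value;
    }

private:
    mutable std::shared_mutex rwLock; // mutable so const readers can lock it
    int data = 0;
};
```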

`SyncObject` inherits from `Mutex` and is thus managed via a `LockHolder` or
`MultiLockHolder`. The `SyncObject` provides the condition-variable
synchronisation primitive, enabling threads to block and be woken.

The `wait`, `notifyOne` and `notify` methods are provided by the `SyncObject`.
Note that `notifyOne` wakes up a single blocked thread, whereas `notify` wakes
up every thread that is blocked on the `SyncObject`.

```cpp
SyncObject syncObject;
bool sleeping = false;

// Waiting thread:
{
    LockHolder lockHolder(&syncObject);
    sleeping = true;
    syncObject.wait(); // the mutex is released and the thread put to sleep
    // when wait returns the mutex is reacquired
    sleeping = false;
}

// Waking thread:
{
    LockHolder lockHolder(&syncObject);
    if (sleeping) {
        syncObject.notifyOne();
    }
}
```
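
The same wait/notify pattern can be sketched with `std::mutex` and `std::condition_variable`, the pairing that `SyncObject` bundles into one object. In this hypothetical `WakeFlag` class (not ep-engine code) the waiter checks a predicate (`woken`) instead of relying on a bare `wait()`, because condition variables may wake spuriously, and a notification sent before the waiter blocks would otherwise be lost.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical analogue of SyncObject usage: a mutex plus a condition variable.
class WakeFlag {
public:
    // Blocks until notifyOne() has been called since the last wait() returned.
    void wait() {
        std::unique_lock<std::mutex> lockHolder(mutex);
        sleeping = true;
        // Releases the mutex while blocked and reacquires it before returning;
        // the predicate is re-checked after every wake-up, spurious or not.
        cond.wait(lockHolder, [this] { return woken; });
        woken = false;
        sleeping = false;
    }

    void notifyOne() {
        std::lock_guard<std::mutex> lockHolder(mutex);
        woken = true;
        cond.notify_one();
    }

    bool isSleeping() {
        std::lock_guard<std::mutex> lockHolder(mutex);
        return sleeping;
    }

private:
    std::mutex mutex;
    std::condition_variable cond;
    bool sleeping = false;
    bool woken = false;
};
```

In practice `wait()` and `notifyOne()` are called from different threads. Because the flag records the wake-up, calling `notifyOne()` before `wait()` is also safe.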

A `SpinLock` uses a single byte for the lock and our own code to spin until
the lock is acquired. It is intended for low-contention locks.

The RAII pattern is the same as for a `Mutex`:
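
```cpp
SpinLockHolder lockHolder(&spinLock); // spins until the lock is acquired
// ... short critical section; released when lockHolder goes out of scope ...
```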
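
To illustrate the spinning idea, here is a minimal spin lock built on `std::atomic_flag`. It is a hypothetical sketch (`SimpleSpinLock` and `SimpleSpinLockHolder` are not ep-engine classes), and unlike the `SpinLock` described above it does not guarantee a 1-byte footprint, since the size of `std::atomic_flag` is implementation-defined.

```cpp
#include <atomic>

// Hypothetical minimal spin lock: busy-waits on an atomic flag.
class SimpleSpinLock {
public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the current holder clears the flag
        }
    }

    bool tryLock() { return !flag.test_and_set(std::memory_order_acquire); }

    void unlock() { flag.clear(std::memory_order_release); }

private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// Hypothetical RAII holder, in the style of SpinLockHolder.
class SimpleSpinLockHolder {
public:
    explicit SimpleSpinLockHolder(SimpleSpinLock* l) : spinLock(l) { spinLock->lock(); }
    ~SimpleSpinLockHolder() { spinLock->unlock(); }
    SimpleSpinLockHolder(const SimpleSpinLockHolder&) = delete;
    SimpleSpinLockHolder& operator=(const SimpleSpinLockHolder&) = delete;

private:
    SimpleSpinLock* spinLock;
};
```

Busy-waiting burns CPU for as long as another thread holds the lock, which is why this kind of lock only suits short, low-contention critical sections.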

## `_UNLOCKED` convention

ep-engine has a function naming convention that indicates the function should
be called with a lock already acquired.

For example, the following `doStuff_UNLOCKED` method expects a lock to be held
before it is called. Which lock should be acquired is not defined by the
convention.

```cpp
void Object::doStuff_UNLOCKED() {
    // ... assumes the caller already holds the required lock ...
}

// Caller:
LockHolder lockHolder(&mutex);
doStuff_UNLOCKED();
```
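
One situation where the convention helps: with a non-recursive mutex such as `std::mutex`, a method that already holds the lock cannot call another method that acquires the same lock without deadlocking. The hypothetical `Object` below (a standalone sketch, not ep-engine code) splits the logic into locking wrappers and an `_UNLOCKED` helper, so both single and combined operations take the lock exactly once.

```cpp
#include <mutex>

// Hypothetical class illustrating the _UNLOCKED naming convention.
class Object {
public:
    void doStuff() {
        std::lock_guard<std::mutex> lockHolder(mutex);
        doStuff_UNLOCKED();
    }

    // Two steps under one acquisition; calling doStuff() twice here would
    // try to re-acquire the non-recursive mutex and deadlock.
    void doStuffTwice() {
        std::lock_guard<std::mutex> lockHolder(mutex);
        doStuff_UNLOCKED();
        doStuff_UNLOCKED();
    }

    int count() {
        std::lock_guard<std::mutex> lockHolder(mutex);
        return counter;
    }

private:
    // Caller must hold `mutex`.
    void doStuff_UNLOCKED() { ++counter; }

    std::mutex mutex;
    int counter = 0;
};
```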

## Thread Local Storage (ObjectRegistry)

Threads in ep-engine service buckets. When a thread is dispatched to serve a
bucket, the pointer to the `EventuallyPersistentEngine` representing the bucket
is placed into thread local storage. This avoids the need for the pointer to be
passed along the chain of execution as a formal parameter.

Both threads servicing frontend operations (memcached's threads) and
ep-engine's own task threads save the bucket's engine pointer before calling
down into the engine.

Calling `ObjectRegistry::onSwitchThread(enginePtr)` saves the `enginePtr` in
thread-local storage so that subsequent task code can retrieve the pointer
with `ObjectRegistry::getCurrentEngine()`.
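
A minimal sketch of the thread-local idea, using C++11 `thread_local`. `Engine`, `ToyRegistry`, `currentBucket` and `runOnNewThread` are hypothetical stand-ins for `EventuallyPersistentEngine` and `ObjectRegistry`, not their real definitions. The point is that each thread sees only the pointer it saved itself, and code deep in the call chain needs no engine parameter.

```cpp
#include <string>
#include <thread>

// Hypothetical stand-in for EventuallyPersistentEngine: one instance per bucket.
struct Engine {
    std::string bucketName;
};

// Hypothetical analogue of ObjectRegistry's thread-local engine pointer.
class ToyRegistry {
public:
    static void onSwitchThread(Engine* engine) { currentEngine = engine; }
    static Engine* getCurrentEngine() { return currentEngine; }

private:
    static thread_local Engine* currentEngine; // one copy per thread
};
thread_local Engine* ToyRegistry::currentEngine = nullptr;

// Deep in the call chain: the engine is found without being passed in.
std::string currentBucket() {
    Engine* e = ToyRegistry::getCurrentEngine();
    return e ? e->bucketName : std::string("<none>");
}

// Switches a fresh thread to `engine` and reports what it sees.
std::string runOnNewThread(Engine* engine) {
    std::string result;
    std::thread t([&] {
        ToyRegistry::onSwitchThread(engine);
        result = currentBucket();
    });
    t.join();
    return result;
}
```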

## Tasks

A task is created by subclassing `GlobalTask` (the `run()` method is the entry
point of the task) and scheduling it onto one of four task queue types. Each
task should be declared in `src/tasks.defs.h` using the `TASK` macro. Using
this macro ensures correct generation of a task-type ID, priority and task
name, and ultimately ensures that each task gets its own scheduling statistics.

The recipe is simple.

### Add your task's class name with its priority into `src/tasks.defs.h`
* A lower priority value means a higher priority.

```cpp
TASK(MyNewTask, 1) // MyNewTask has priority 1.
```
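
A `TASK(name, priority)` list lends itself to the classic "X-macro" technique: one list, expanded several times with different definitions of `TASK`. The sketch below assumes that shape to show how an ID, a priority and a name could all be generated from a single line per task. The list macro `HYPOTHETICAL_TASK_DEFS`, the `TaskId` enum, the second task and this definition of `MY_TASK_ID` are illustrative only, not ep-engine's actual `tasks.defs.h` machinery.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for src/tasks.defs.h: one TASK(name, priority) per task.
// A lower value means a higher priority.
#define HYPOTHETICAL_TASK_DEFS \
    TASK(MyNewTask, 1)         \
    TASK(MyOtherTask, 5)

// Expansion 1: a task-type ID per task.
enum class TaskId {
#define TASK(name, prio) name,
    HYPOTHETICAL_TASK_DEFS
#undef TASK
    Count
};

#define MY_TASK_ID(name) TaskId::name // hypothetical definition

// Expansion 2: the priority of each task type.
inline int taskPriority(TaskId id) {
    switch (id) {
#define TASK(name, prio) case TaskId::name: return prio;
        HYPOTHETICAL_TASK_DEFS
#undef TASK
    case TaskId::Count: break;
    }
    return -1;
}

// Expansion 3: the name of each task type, e.g. for per-task statistics.
inline std::string taskName(TaskId id) {
    switch (id) {
#define TASK(name, prio) case TaskId::name: return #name;
        HYPOTHETICAL_TASK_DEFS
#undef TASK
    case TaskId::Count: break;
    }
    return "unknown";
}

// Number of task types, e.g. to size an array of statistics counters.
inline std::size_t taskTypeCount() { return static_cast<std::size_t>(TaskId::Count); }
```

Adding a task then means adding one line to the list; every expansion picks it up automatically.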

### Create your class and set its ID using `MY_TASK_ID`.

```cpp
class MyNewTask : public GlobalTask {
public:
    MyNewTask(EventuallyPersistentEngine* e)
        : GlobalTask(e /*engine*/,
                     MY_TASK_ID(MyNewTask)
                     /* , ... remaining constructor arguments ... */) {
    }
    // ... run() and getDescription() ...
};
```

### Define pure-virtual methods in `MyNewTask`

* Define the `run` method. It is invoked when the task is executed and should
  return `true` if the task should be scheduled again. If `false` is returned,
  the task instance is never rescheduled and will be deleted once no references
  to it remain.
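
```cpp
bool run() {
    // ... do the task's work ...
    bool scheduleAgain = false; // true: run again; false: never rescheduled
    return scheduleAgain;
}
```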

* Define the `getDescription` method to aid debugging and statistics.

```cpp
std::string getDescription() {
    return "A brief description of what MyNewTask does";
}
```
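
The contract of `run()` can be modelled with a small hypothetical base class. `ToyGlobalTask`, `CountdownTask` and `runUntilDone` are illustrative names, and `std::shared_ptr` stands in for `ExTask`, on the assumption that it is a reference-counted handle (as the deletion rule above suggests). The toy driver keeps invoking `run()` for as long as it returns `true`.

```cpp
#include <memory>
#include <string>

// Hypothetical minimal analogue of a GlobalTask-style base class.
class ToyGlobalTask {
public:
    virtual ~ToyGlobalTask() = default;
    // Return true to be scheduled again, false to be dropped.
    virtual bool run() = 0;
    virtual std::string getDescription() = 0;
};

// Hypothetical task that wants to run a fixed number of times.
class CountdownTask : public ToyGlobalTask {
public:
    explicit CountdownTask(int runs) : remaining(runs) {}

    bool run() override {
        --remaining;
        return remaining > 0; // reschedule while there is work left
    }

    std::string getDescription() override {
        return "Countdown task with " + std::to_string(remaining) + " run(s) remaining";
    }

private:
    int remaining;
};

// Toy driver: calls run() until it declines rescheduling and returns the
// number of invocations. `task` shares ownership; the object is destroyed
// when its last shared_ptr owner releases it.
int runUntilDone(std::shared_ptr<ToyGlobalTask> task) {
    int invocations = 0;
    bool again = true;
    while (again) {
        again = task->run();
        ++invocations;
    }
    return invocations;
}
```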

### Schedule your task to the desired queue.

```cpp
ExTask myNewTask = new MyNewTask(&engine);
myNewTaskId = ExecutorPool::get()->schedule(myNewTask, NONIO_TASK_IDX);
```

The four task queue types are:

* Readers - `READER_TASK_IDX`
  * Tasks that should primarily only read from 'disk'. They generally read from
    the vbucket database files, for example background fetch of a non-resident
    document.
* Writers (they are allowed to read too) - `WRITER_TASK_IDX`
  * Tasks that should primarily only write to 'disk'. They generally write to
    the vbucket database files, for example when flushing the write queue.
* Auxiliary IO - `AUXIO_TASK_IDX`
  * Tasks that read and write 'disk', but not necessarily the vbucket data files.
* Non IO - `NONIO_TASK_IDX`
  * Tasks that do not perform 'disk' I/O.
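
To make the role of the queue index concrete, here is a toy pool with one FIFO per queue type. It is a hypothetical sketch: `ToyExecutorPool`, `ToyQueueIdx`, `ToyTask` and the id numbering are assumptions rather than the behaviour of ep-engine's `ExecutorPool`, and a real pool also has to run the queued tasks.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>

// Hypothetical queue indices mirroring the four queue types listed above.
enum ToyQueueIdx { READER_IDX = 0, WRITER_IDX, AUXIO_IDX, NONIO_IDX, NUM_QUEUES };

struct ToyTask {
    std::string name;
};

// Hypothetical executor: one FIFO per queue type; schedule() returns a task id.
class ToyExecutorPool {
public:
    std::size_t schedule(std::shared_ptr<ToyTask> task, ToyQueueIdx idx) {
        queues[idx].push_back(task);
        return nextId++;
    }

    std::size_t queued(ToyQueueIdx idx) const { return queues[idx].size(); }

private:
    std::array<std::deque<std::shared_ptr<ToyTask>>, NUM_QUEUES> queues;
    std::size_t nextId = 1;
};
```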

The snooze value of a task determines when the task should be executed. The
initial snooze value is set when constructing the `GlobalTask`: a value of 0.0
means attempt to execute the task as soon as it is scheduled, and 5.0 means
5 seconds after it is scheduled (that is, after
`ExecutorPool::get()->schedule(...)` is called).

The `run()` function can also call `snooze(double snoozeAmount)` to set how
long before the task is rescheduled.
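
The snooze arithmetic (a snooze in seconds, measured from the moment of scheduling or of the `snooze` call) can be sketched with `std::chrono::steady_clock`. `SnoozeTimer` is a hypothetical helper, not part of `GlobalTask`; it converts a relative snooze into an absolute wake time.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: tracks when a task is next due to run.
class SnoozeTimer {
public:
    // initialSnooze is measured from the moment the task is scheduled.
    SnoozeTimer(double initialSnooze, Clock::time_point scheduledAt) {
        snooze(initialSnooze, scheduledAt);
    }

    // Due `seconds` after `now`; 0.0 means due immediately.
    void snooze(double seconds, Clock::time_point now) {
        waketime = now + std::chrono::duration_cast<Clock::duration>(
                             std::chrono::duration<double>(seconds));
    }

    bool isDue(Clock::time_point now) const { return now >= waketime; }

private:
    Clock::time_point waketime;
};
```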

It is **best practice** for most tasks to snooze forever (`snooze(INT_MAX)`)
from their `run` function. Using `INT_MAX` means sleep forever; tasks should
always sleep until they have real work to do. Tasks **should not periodically
poll for work** by snoozing for a fixed interval.

When a task has work to do, some other function should wake the task using the
`wake` method:

```cpp
ExecutorPool::get()->wake(myNewTaskId);
```
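
The snooze-forever-then-wake pattern is sketched below in a single-threaded, hypothetical form: `ToyWorkerTask` (not an ep-engine class) models its snooze as a number of seconds, and its `wake()` stands in for `ExecutorPool::get()->wake(myNewTaskId)`. A real task would need synchronisation between the producer that adds work and `run()`.

```cpp
#include <climits>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical event-driven task: it never polls; it snoozes "forever"
// and relies on a producer to wake it when there is work.
class ToyWorkerTask {
public:
    // Invoked by a (hypothetical) scheduler whenever the task is due.
    bool run() {
        while (!pending.empty()) {
            pending.pop_front();
            ++processed;
        }
        snooze(INT_MAX); // no work left: sleep until explicitly woken
        return true;     // stay scheduled so that wake() can revive the task
    }

    // Producer side: enqueue work, then wake the task.
    void addWork(const std::string& item) {
        pending.push_back(item);
        wake();
    }

    void wake() { snoozeSeconds = 0.0; } // due immediately
    bool isDue() const { return snoozeSeconds <= 0.0; }
    std::size_t processedCount() const { return processed; }

private:
    void snooze(double seconds) { snoozeSeconds = seconds; }

    std::deque<std::string> pending;
    std::size_t processed = 0;
    double snoozeSeconds = 0.0; // initial snooze 0.0: run as soon as scheduled
};
```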