    # Cancellation
    Sometimes, requests take a long time to service and clients disconnect
    before Synapse produces a response. To avoid wasting resources, Synapse
    can cancel request processing for select endpoints marked with the
    `@cancellable` decorator.
    
    Synapse makes use of Twisted's `Deferred.cancel()` feature to make
    cancellation work. The `@cancellable` decorator does nothing by itself
    and merely acts as a flag, signalling to developers and other code alike
    that a method can be cancelled.
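
As a rough illustration of a decorator that acts purely as a flag, the sketch below tags a function with an attribute and otherwise leaves it unchanged. The names `mark_cancellable`, `is_marked_cancellable` and the `_cancellable` attribute are hypothetical; this is not claimed to be how Synapse implements `@cancellable`.

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def mark_cancellable(function: F) -> F:
    """Hypothetical flag decorator: tag the function and return it unchanged."""
    function._cancellable = True  # type: ignore[attr-defined]
    return function


def is_marked_cancellable(function: Callable[..., Any]) -> bool:
    """Report whether `function` carries the flag set by `mark_cancellable`."""
    return getattr(function, "_cancellable", False)


@mark_cancellable
async def on_GET() -> int:
    return 200


async def on_PUT() -> int:
    return 200
```

Code that services requests could then consult `is_marked_cancellable(on_GET)` when a client disconnects and decide whether to cancel.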
    
    ## Enabling cancellation for an endpoint
    1. Check that the endpoint method, and any `async` functions in its call
       tree, handle cancellation correctly. See
       [Handling cancellation correctly](#handling-cancellation-correctly)
       for a list of things to look out for.
    2. Add the `@cancellable` decorator to the `on_GET/POST/PUT/DELETE`
       method. It's not recommended to make non-`GET` methods cancellable,
       since cancellation midway through some database updates is less
       likely to be handled correctly.
    
    ## Mechanics
    There are two stages to cancellation: downward propagation of a
    `cancel()` call, followed by upward propagation of a `CancelledError`
    out of a blocked `await`.
    Both Twisted and asyncio have a cancellation mechanism.
    
    |               | Method              | Exception                               | Exception inherits from |
    |---------------|---------------------|-----------------------------------------|-------------------------|
    | Twisted       | `Deferred.cancel()` | `twisted.internet.defer.CancelledError` | `Exception` (!)         |
    | asyncio       | `Task.cancel()`     | `asyncio.CancelledError`                | `BaseException`         |
    
    ### Deferred.cancel()
    When Synapse starts handling a request, it runs the async method
    responsible for handling it using `defer.ensureDeferred`, which returns
    a `Deferred`. For example:
    
    ```python
    def do_something() -> Deferred[None]:
        ...
    
    @cancellable
    async def on_GET() -> Tuple[int, JsonDict]:
        d = make_deferred_yieldable(do_something())
        await d
        return 200, {}
    
    request = defer.ensureDeferred(on_GET())
    ```
    
    When a client disconnects early, Synapse checks for the presence of the
    `@cancellable` decorator on `on_GET`. Since `on_GET` is cancellable,
    `Deferred.cancel()` is called on the `Deferred` from
    `defer.ensureDeferred`, i.e. `request`. Twisted knows which `Deferred`
    `request` is waiting on and passes the `cancel()` call on to `d`.
    
    The `Deferred` being waited on, `d`, may have its own handling for
    `cancel()` and pass the call on to other `Deferred`s.
    
    Eventually, a `Deferred` handles the `cancel()` call by resolving itself
    with a `CancelledError`.
    
    ### CancelledError
    The `CancelledError` gets raised out of the `await` and bubbles up, as
    per normal Python exception handling.
    
    ## Handling cancellation correctly
    In general, when writing code that might be subject to cancellation, two
    things must be considered:
     * The effect of `CancelledError`s raised out of `await`s.
     * The effect of `Deferred`s being `cancel()`ed.
    
    Examples of code that handles cancellation incorrectly include:
     * `try-except` blocks which swallow `CancelledError`s.
     * Code that shares the same `Deferred`, which may be cancelled, between
       multiple requests.
     * Code that starts some processing that's exempt from cancellation, but
       uses a logging context from cancellable code. The logging context
       will be finished upon cancellation, while the uncancelled processing
       is still using it.
    
    Some common patterns are listed below in more detail.
    
    ### `async` function calls
    Most functions in Synapse are relatively straightforward from a
    cancellation standpoint: they don't do anything with `Deferred`s and
    purely call and `await` other `async` functions.
    
    An `async` function handles cancellation correctly if its own code
    handles cancellation correctly and all the `async` functions it calls
    handle cancellation correctly. For example:
    ```python
    async def do_two_things() -> None:
        check_something()
        await do_something()
        await do_something_else()
    ```
    `do_two_things` handles cancellation correctly if `do_something` and
    `do_something_else` handle cancellation correctly.
    
    That is, when checking whether a function handles cancellation
    correctly, its implementation and all its `async` function calls need to
    be checked, recursively.
    
    As `check_something` is not `async`, it does not need to be checked.
    
    ### CancelledErrors
    Because Twisted's `CancelledError`s are `Exception`s, it's easy to
    accidentally catch and suppress them. Care must be taken to ensure that
    `CancelledError`s are allowed to propagate upwards.
    
    <table width="100%">
    <tr>
    <td width="50%" valign="top">
    
    **Bad**:
    ```python
    try:
        await do_something()
    except Exception:
        # `CancelledError` gets swallowed here.
        logger.info(...)
    ```
    </td>
    <td width="50%" valign="top">
    
    **Good**:
    ```python
    try:
        await do_something()
    except CancelledError:
        raise
    except Exception:
        logger.info(...)
    ```
    </td>
    </tr>
    <tr>
    <td width="50%" valign="top">
    
    **OK**:
    ```python
    try:
        check_something()
        # A `CancelledError` won't ever be raised here.
    except Exception:
        logger.info(...)
    ```
    </td>
    <td width="50%" valign="top">
    
    **Good**:
    ```python
    try:
        await do_something()
    except ValueError:
        logger.info(...)
    ```
    </td>
    </tr>
    </table>
    
    #### defer.gatherResults
    `defer.gatherResults` produces a `Deferred` which:
     * broadcasts `cancel()` calls to every `Deferred` being waited on.
     * wraps the first exception it sees in a `FirstError`.
    
    Together, this means that a `CancelledError` arrives wrapped in a
    `FirstError`. A `FirstError` does not match `except CancelledError` and
    is liable to be swallowed by a broad `except Exception`, so it must be
    unwrapped first.
    
    <table width="100%">
    <tr>
    <td width="50%" valign="top">
    
    **Bad**:
    ```python
    async def do_something() -> None:
        await make_deferred_yieldable(
            defer.gatherResults([...], consumeErrors=True)
        )
    
    try:
        await do_something()
    except CancelledError:
        raise
    except Exception:
        # `FirstError(CancelledError)` gets swallowed here.
        logger.info(...)
    ```
    
    </td>
    <td width="50%" valign="top">
    
    **Good**:
    ```python
    async def do_something() -> None:
        await make_deferred_yieldable(
            defer.gatherResults([...], consumeErrors=True)
        ).addErrback(unwrapFirstError)
    
    try:
        await do_something()
    except CancelledError:
        raise
    except Exception:
        logger.info(...)
    ```
    </td>
    </tr>
    </table>
    
    ### Creation of `Deferred`s
    If a function creates a `Deferred`, the effect of cancelling it must be
    considered. `Deferred`s that get shared are likely to have unintended
    behaviour when cancelled.
    
    <table width="100%">
    <tr>
    <td width="50%" valign="top">
    
    **Bad**:
    ```python
    cache: Dict[str, Deferred[None]] = {}
    
    def wait_for_room(room_id: str) -> Deferred[None]:
        deferred = cache.get(room_id)
        if deferred is None:
            deferred = Deferred()
            cache[room_id] = deferred
        # `deferred` can have multiple waiters.
        # All of them will observe a `CancelledError`
        # if any one of them is cancelled.
        return make_deferred_yieldable(deferred)
    
    # Request 1
    await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
    # Request 2
    await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
    ```
    </td>
    <td width="50%" valign="top">
    
    **Good**:
    ```python
    cache: Dict[str, Deferred[None]] = {}
    
    def wait_for_room(room_id: str) -> Deferred[None]:
        deferred = cache.get(room_id)
        if deferred is None:
            deferred = Deferred()
            cache[room_id] = deferred
        # `deferred` will never be cancelled now.
        # A `CancelledError` will still come out of
        # the `await`.
        # `delay_cancellation` may also be used.
        return make_deferred_yieldable(stop_cancellation(deferred))
    
    # Request 1
    await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
    # Request 2
    await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
    ```
    </td>
    </tr>
    <tr>
    <td width="50%" valign="top">
    </td>
    <td width="50%" valign="top">
    
    **Good**:
    ```python
    cache: Dict[str, List[Deferred[None]]] = {}
    
    def wait_for_room(room_id: str) -> Deferred[None]:
        if room_id not in cache:
            cache[room_id] = []
        # Each request gets its own `Deferred` to wait on.
        deferred = Deferred()
        cache[room_id].append(deferred)
        return make_deferred_yieldable(deferred)
    
    # Request 1
    await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
    # Request 2
    await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
    ```
    </td>
    </tr>
    </table>
    
    ### Uncancelled processing
    Some `async` functions may kick off some `async` processing which is
    intentionally protected from cancellation, by `stop_cancellation` or
    other means. If the `async` processing inherits the logcontext of the
    request which initiated it, care must be taken to ensure that the
    logcontext is not finished before the `async` processing completes.
    
    <table width="100%">
    <tr>
    <td width="50%" valign="top">
    
    **Bad**:
    ```python
    cache: Optional[ObservableDeferred[None]] = None
    
    async def do_something_else(
        to_resolve: Deferred[None]
    ) -> None:
        await ...
        logger.info("done!")
        to_resolve.callback(None)
    
    async def do_something() -> None:
        global cache
        if not cache:
            to_resolve = Deferred()
            cache = ObservableDeferred(to_resolve)
            # `do_something_else` will never be cancelled and
            # can outlive the `request-1` logging context.
            run_in_background(do_something_else, to_resolve)
    
        await make_deferred_yieldable(cache.observe())
    
    with LoggingContext("request-1"):
        await do_something()
    ```
    </td>
    <td width="50%" valign="top">
    
    **Good**:
    ```python
    cache: Optional[ObservableDeferred[None]] = None
    
    async def do_something_else(
        to_resolve: Deferred[None]
    ) -> None:
        await ...
        logger.info("done!")
        to_resolve.callback(None)
    
    async def do_something() -> None:
        global cache
        if not cache:
            to_resolve = Deferred()
            cache = ObservableDeferred(to_resolve)
            run_in_background(do_something_else, to_resolve)
            # We'll wait until `do_something_else` is
            # done before raising a `CancelledError`.
            await make_deferred_yieldable(
                delay_cancellation(cache.observe())
            )
        else:
            await make_deferred_yieldable(cache.observe())
    
    with LoggingContext("request-1"):
        await do_something()
    ```
    </td>
    </tr>
    <tr>
    <td width="50%">
    
    **OK**:
    ```python
    cache: Optional[ObservableDeferred[None]] = None
    
    async def do_something_else(
        to_resolve: Deferred[None]
    ) -> None:
        await ...
        logger.info("done!")
        to_resolve.callback(None)
    
    async def do_something() -> None:
        global cache
        if not cache:
            to_resolve = Deferred()
            cache = ObservableDeferred(to_resolve)
            # `do_something_else` will get its own independent
            # logging context. `request-1` will not count any
            # metrics from `do_something_else`.
            run_as_background_process(
                "do_something_else",
                do_something_else,
                to_resolve,
            )
    
        await make_deferred_yieldable(cache.observe())
    
    with LoggingContext("request-1"):
        await do_something()
    ```
    </td>
    <td width="50%">
    </td>
    </tr>
    </table>