Skip to content
Snippets Groups Projects
  1. Oct 19, 2022
    • Erik Johnston's avatar
      1.70.0rc1 · 15a240f1
      Erik Johnston authored
      15a240f1
    • Eric Eastwood's avatar
      Fix MSC3030 `/timestamp_to_event` returning `outliers` that it has no idea... · fa8616e6
      Eric Eastwood authored
      Fix MSC3030 `/timestamp_to_event` returning `outliers` that it has no idea whether are near a gap or not (#14215)
      
      Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](https://github.com/matrix-org/synapse/blob/2c63cdcc3f1aa4625e947de3c23e0a8133c61286/docs/development/room-dag-concepts.md#outliers).
      
      Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake.  Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
      
      Fix  https://github.com/matrix-org/synapse/issues/13944
      
      
      ### Why did this fail before? Why was it flakey?
      
      Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`
      
      This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](https://github.com/matrix-org/synapse/blob/cb20b885cb4bd1648581dd043a184d86fc8c7a00/synapse/handlers/room.py#L1470-L1496) so it always runs after `sync_partial_state` completes.
      
      ```py
      from twisted.internet import task as twisted_task
      d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
      await d
      ```
      
      In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.
      
      With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
      Unverified
      fa8616e6
  2. Oct 18, 2022
  3. Oct 17, 2022
  4. Oct 15, 2022
    • Eric Eastwood's avatar
      Stop getting missing `prev_events` after we already know their signature is invalid (#13816) · 40bb37eb
      Eric Eastwood authored
      While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.
      
      Related to
       - https://github.com/matrix-org/synapse/issues/13622
          - https://github.com/matrix-org/synapse/pull/13635
       - https://github.com/matrix-org/synapse/issues/13676
      
      Part of https://github.com/matrix-org/synapse/issues/13356
      
      Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.
      
      With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.
      
      For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761
      
      To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.
      
       - `backfill`
          - `outgoing-federation-request` `/backfill`
          - `_check_sigs_and_hash_and_fetch`
             - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
                - :exclamation: `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
                - :exclamation: `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
         - `_process_pulled_events`
            - `_process_pulled_event` for each validated event
               - :exclamation: Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
                  - `_get_state_ids_after_missing_prev_event`
                     - `outgoing-federation-request` `/state_ids`
                     - :exclamation: `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
                     - :exclamation: `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
      Unverified
      40bb37eb
  5. Oct 14, 2022
Loading