  1. Jan 28, 2025
  2. Jan 27, 2025
      Fix join being denied after being invited over federation (#18075) · 6ec5e13e
      Eric Eastwood authored
      This also happens for rejecting an invite. Basically, any out-of-band membership transition where we first get the membership as an `outlier` and then rely on federation filling us in to de-outlier it.
      
      This PR mainly addresses automated test flakiness, bots/scripts, and options within Synapse like [`auto_accept_invites`](https://element-hq.github.io/synapse/v1.122/usage/configuration/config_documentation.html#auto_accept_invites) that are able to react quickly (before federation is able to push us events), but also helps in generic scenarios where federation is lagging.
      
      I initially thought this might be a Synapse consistency issue (see issues labeled with [`Z-Read-After-Write`](https://github.com/matrix-org/synapse/labels/Z-Read-After-Write)) but it seems to be an event auth logic problem. Workers probably do increase the number of possible race condition scenarios that make this visible though (replication and cache invalidation lag).
      
      Fix https://github.com/element-hq/synapse/issues/15012
      (probably fixes https://github.com/matrix-org/synapse/issues/15012 (https://github.com/element-hq/synapse/issues/15012))
      Related to https://github.com/matrix-org/matrix-spec/issues/2062
      
      Problems:
      
       1. We don't consider [out-of-band membership](https://github.com/element-hq/synapse/blob/develop/docs/development/room-dag-concepts.md#out-of-band-membership-events) (outliers) in our `event_auth` logic even though we expose them in `/sync`.
       1. (This PR doesn't address this point) Perhaps we should consider authing events in the persistence queue as events already in the queue could allow subsequent events to be allowed (events come through many channels: federation transaction, remote invite, remote join, local send). But this doesn't save us in the case where the event is more delayed over federation.
      
      
      ### What happened before?
      
      I wrote a Complement test that stresses this exact scenario and reproduces the problem: https://github.com/matrix-org/complement/pull/757
      
      ```
      COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestSynapseConsistency
      ```
      
      
      We have `hs1` and `hs2` running in monolith mode (no workers):
      
       1. `@charlie1:hs2` is invited and joins the room:
           1. `hs1` invites `@charlie1:hs2` to a room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
           1. `@charlie1:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is not yet in the room, this happens as a remote join `make_join`/`send_join` which comes back with all of the auth events needed to auth successfully and now `@charlie1:hs2` is successfully joined to the room.
       1. `@charlie2:hs2` is invited and tries to join the room:
           1. `hs1` invites `@charlie2:hs2` to the room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
           1. Because `hs2` is already participating in the room, we also see the invite come over federation in a transaction and we start processing it (not done yet, see below)
           1. `@charlie2:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is already in the room, this happens as a local join, but we deny the event because our `event_auth` logic thinks that we have no membership in the room :x: (we expected to be able to join because we saw the invite down `/sync`).
           1. We finally finish processing the `@charlie2:hs2` invite event from the federation transaction and de-outlier it.
               - If this had finished before we tried to join, we would have been fine; this is the race condition that makes the problem visible.
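
The race above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Synapse's code and not the actual patch: `ToyServer`, its two dictionaries, and the `consider_outliers` flag are hypothetical names. The `out_of_band` dictionary stands in for what is visible down `/sync` as soon as the invite arrives, and `room_state` stands in for what the auth check sees once the invite has been de-outliered.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyServer:
    # Membership known from fully processed (non-outlier) room state.
    room_state: Dict[str, str] = field(default_factory=dict)
    # Membership recorded as soon as the invite arrives, while the event is
    # still an outlier (hypothetical stand-in for what /sync exposes).
    out_of_band: Dict[str, str] = field(default_factory=dict)

    def on_invite_request(self, user: str) -> None:
        # The invite is persisted as an outlier: visible to the client,
        # but not yet part of the room state used for auth.
        self.out_of_band[user] = "invite"

    def finish_federation_pdu(self, user: str) -> None:
        # De-outliering: the invite now becomes part of the room state.
        self.room_state[user] = "invite"

    def membership_for_auth(
        self, user: str, consider_outliers: bool
    ) -> Optional[str]:
        membership = self.room_state.get(user)
        if membership is None and consider_outliers:
            # Fall back to the out-of-band membership we already exposed.
            membership = self.out_of_band.get(user)
        return membership

    def local_join(self, user: str, consider_outliers: bool) -> bool:
        # Simplified invite-only rule: prior membership must be "invite".
        if self.membership_for_auth(user, consider_outliers) != "invite":
            return False
        self.room_state[user] = "join"
        return True


hs2 = ToyServer()
hs2.on_invite_request("@charlie2:hs2")
# The join races ahead of finish_federation_pdu(...).
denied = hs2.local_join("@charlie2:hs2", consider_outliers=False)
allowed = hs2.local_join("@charlie2:hs2", consider_outliers=True)
```

In this model the first join is denied because only `room_state` is consulted, while the second succeeds because the out-of-band invite is taken into account, which is the idea behind problem 1 in the list above.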
      
      
      Logs for `hs2`:
      
      ```
      :ballot_box: on_invite_request: handling event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=False>
      :flashlight: _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
      :flashlight: _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
      :incoming_envelope: Notifying about new event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
      :white_check_mark: on_invite_request: handled event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
      :magnet: do_invite_join for @user-2-charlie1:hs2 in !sfZVBdLUezpPWetrol:hs1
      :flashlight: _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$bwv8LxFnqfpsw_rhR7OrTjtz09gaJ23MqstKOcs7ygA, type=m.room.member, state_key=@user-1-alice:hs1, membership=join, outlier=True>
      :flashlight: _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False>
      :incoming_envelope: Notifying about new event <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False>
      
      ...
      
      :ballot_box: on_invite_request: handling event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
      :flashlight: _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
      :flashlight: _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
      :incoming_envelope: Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
      :white_check_mark: on_invite_request: handled event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
      :mailbox_with_mail: handling received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
      :postbox: handle_new_client_event: handling <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False>
      :x: Denying new event <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False> because 403: You are not invited to this room.
      synapse.http.server - 130 - INFO - POST-16 - <SynapseRequest at 0x7f460c91fbf0 method='POST' uri='/_matrix/client/v3/join/%21sfZVBdLUezpPWetrol:hs1?server_name=hs1' clientproto='HTTP/1.0' site='8080'> SynapseError: 403 - You are not invited to this room.
      :incoming_envelope: Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
      :white_check_mark: handled received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
      ```
  3. Jan 24, 2025
  4. Jan 21, 2025
  5. Jan 08, 2025
  6. Jan 06, 2025
  7. Jan 03, 2025
  8. Dec 18, 2024
      Fix mypy errors on Twisted 24.11.0 (#17998) · 3eb92369
      Andrew Morgan authored
      Fixes various `mypy` errors associated with Twisted `24.11.0`.
      
      Hopefully addresses https://github.com/element-hq/synapse/issues/17075,
      though I've yet to test against `trunk`.
      
      Changes should be compatible with our currently pinned Twisted version
      of `24.7.0`.
      Bump mypy from 1.11.2 to 1.12.1 and fix new typechecking errors (#17999) · f1b0f9a4
      Andrew Morgan authored
      Supersedes https://github.com/element-hq/synapse/pull/17958.
      
      Awkwardly, the changes made to fix the mypy errors in 1.12.1 cause
      errors in 1.11.2. So you'll need to update your mypy version to 1.12.1
      to eliminate typechecking errors during development.
      Add email.tlsname config option (#17849) · f1ecf466
      cynhr authored
      The existing `email.smtp_host` config option is used for two distinct
      purposes: it is resolved into the IP address to connect to, and used to
      (request via SNI and) validate the server's certificate if TLS is
      enabled. This new option allows specifying a different name for the
      second purpose.
      
      This is especially helpful if `email.smtp_host` isn't a global FQDN,
      but something that resolves only locally (e.g. "localhost" to connect
      through the loopback interface, or some other internally routed name),
      that one cannot get a valid certificate for.
      Alternatives would of course be to specify a global FQDN as
      `email.smtp_host`, or to disable TLS entirely, both of which might be
      undesirable, depending on the SMTP server configuration.
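
The split between "name to connect to" and "name to validate" can be illustrated with Python's standard `ssl` module. This is a minimal sketch, not how Synapse implements the option: the helper names are hypothetical, the fallback to `smtp_host` when no TLS name is given is an assumption, and the sketch uses implicit TLS rather than STARTTLS for brevity.

```python
import socket
import ssl
from typing import Optional


def effective_tls_name(smtp_host: str, tlsname: Optional[str]) -> str:
    # Assumption: with no separate TLS name configured, the host name we
    # connect to is also the name used for SNI and certificate validation.
    return tlsname if tlsname else smtp_host


def open_tls_connection(
    smtp_host: str, port: int, tlsname: Optional[str] = None
) -> ssl.SSLSocket:
    # Purpose 1: `smtp_host` is resolved and used to open the TCP connection.
    raw = socket.create_connection((smtp_host, port))
    # Purpose 2: `server_hostname` is sent via SNI and checked against the
    # server's certificate; it does not have to match `smtp_host`.
    context = ssl.create_default_context()
    return context.wrap_socket(
        raw, server_hostname=effective_tls_name(smtp_host, tlsname)
    )
```

For example, `open_tls_connection("localhost", 465, tlsname="mail.example.com")` would connect over the loopback interface while validating a certificate issued for the hypothetical name `mail.example.com`.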
  9. Dec 17, 2024
      Add `macaroon_secret_key_path` config option (#17983) · 57bf4494
      V02460 authored
      Another config option on my quest for a `*_path` variant for every
      secret. This time it’s `macaroon_secret_key_path`.
      
      Reading secrets from files has the security advantage of separating the secrets from the config. It also simplifies secrets management in Kubernetes. Also useful to NixOS users.
  10. Dec 16, 2024
      Add some useful endpoints to Admin API (#17948) · 8208186e
      Shay authored
      - Fetch the number of invites the provided user has sent after a given
      timestamp
      - Fetch the number of rooms the provided user has joined after a given
      timestamp, regardless of whether they have left/been banned from the rooms
      subsequently
      - Get report IDs of event reports where the provided user was the sender
      of the reported event
  11. Dec 13, 2024
  12. Dec 04, 2024
  13. Dec 03, 2024
  14. Dec 02, 2024
  15. Nov 29, 2024
  16. Nov 28, 2024
      Fix new scheduled tasks jumping the queue (#17962) · d80cd57c
      Richard van der Hoff authored
      Currently, when a new scheduled task is added and its scheduled time has
      already passed, we set it to ACTIVE. This is problematic, because it
      means it will jump the queue ahead of all other SCHEDULED tasks;
      furthermore, if the Synapse process gets restarted, it will jump ahead
      of any ACTIVE tasks which have been started but are taking a while to
      run.
      
      Instead, we leave it set to SCHEDULED, but kick off a call to
      `_launch_scheduled_tasks`, which will decide if we actually have
      capacity to start a new task, and start the newly-added task if so.
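
The behaviour described above can be sketched with a toy scheduler. This is an illustration under stated assumptions, not Synapse's task scheduler: `ToyScheduler`, `Task`, `max_concurrent`, and ordering by `run_at` are hypothetical choices made for the example.

```python
import heapq
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskStatus(Enum):
    SCHEDULED = "scheduled"
    ACTIVE = "active"


@dataclass(order=True)
class Task:
    run_at: int  # only this field takes part in ordering
    name: str = field(compare=False)
    status: TaskStatus = field(default=TaskStatus.SCHEDULED, compare=False)


class ToyScheduler:
    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self.scheduled: List[Task] = []  # min-heap ordered by run_at
        self.active: List[Task] = []

    def add_task(self, task: Task, now: int) -> None:
        # Always queue the task as SCHEDULED, even if it is already overdue.
        heapq.heappush(self.scheduled, task)
        if task.run_at <= now:
            # Instead of marking it ACTIVE directly, ask the launcher
            # whether there is capacity to start anything.
            self.launch_scheduled_tasks(now)

    def launch_scheduled_tasks(self, now: int) -> None:
        while (
            self.scheduled
            and len(self.active) < self.max_concurrent
            and self.scheduled[0].run_at <= now
        ):
            task = heapq.heappop(self.scheduled)
            task.status = TaskStatus.ACTIVE
            self.active.append(task)


scheduler = ToyScheduler(max_concurrent=1)
first = Task(run_at=0, name="first")
second = Task(run_at=5, name="second")
scheduler.add_task(first, now=10)   # capacity available: starts right away
scheduler.add_task(second, now=10)  # overdue, but must wait its turn
```

Here `second` is overdue when it is added, yet it stays `SCHEDULED` because the single slot is taken by `first`.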
  17. Nov 25, 2024
      Fix up logic for delaying sending read receipts over federation. (#17933) · 3943d2fd
      Erik Johnston authored
      For context of why we delay read receipts, see
      https://github.com/matrix-org/synapse/issues/4730.
      
      Element Web often sends read receipts in quick succession: if it reloads
      the timeline, it sends one for the last message in the old timeline and
      another for the last message in the new timeline. This caused remote users
      to see a read receipt for older messages come through quickly, while
      the second read receipt, for the most recent message, took a while to
      arrive.
      
      There are two things going on in this PR:
      1. Fix a mismatch between seconds and milliseconds, which meant we
      ended up delaying for far longer than intended.
      2. Change the logic to reuse the `DestinationWakeupQueue` (used for
      presence)
      
      The changes in logic are:
      - Treat the first receipt and subsequent receipts in a room in the same
      way
      - Whitelist certain classes of receipts to never delay being sent, i.e.
      receipts in small rooms, receipts for events that were sent within the
      last 60s, and sending receipts to the event sender's server.
      - The maximum delay a receipt can have before being sent to a server is
      30s, and we'll send out receipts to remotes at least at 50Hz (by
      default)
      
      The upshot is that this should make receipts feel more snappy over
      federation.
      
      This new logic should send roughly 10%–20% of transactions
      immediately on matrix.org.
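
The exemption rules listed above can be sketched as a small predicate. This is only an illustration, not the code from the PR: the `Receipt` fields, the function names, and the small-room threshold are hypothetical, while the 60s and 30s figures come from the description above. Keeping the unit in every name is one way to avoid the seconds/milliseconds mix-up mentioned in point 1.

```python
from dataclasses import dataclass

MAX_DELAY_MS = 30_000     # maximum delay before a receipt is sent (30s)
RECENT_EVENT_MS = 60_000  # events sent within the last 60s are not delayed
SMALL_ROOM_MEMBERS = 10   # hypothetical threshold for a "small room"


@dataclass
class Receipt:
    destination: str          # server we would send the receipt to
    event_sender_server: str  # server of the user who sent the event
    event_age_ms: int         # how long ago the event was sent
    room_member_count: int


def seconds_to_ms(seconds: float) -> int:
    # Convert explicitly at the boundary so the units never get mixed.
    return int(seconds * 1000)


def should_send_immediately(receipt: Receipt) -> bool:
    # Any one of the three exemptions is enough to skip the delay.
    return (
        receipt.room_member_count <= SMALL_ROOM_MEMBERS
        or receipt.event_age_ms <= RECENT_EVENT_MS
        or receipt.destination == receipt.event_sender_server
    )
```

A receipt for an old event in a large room, sent to a server other than the sender's, is the only case this sketch delays, and then for at most `MAX_DELAY_MS`.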
  18. Nov 22, 2024
  19. Nov 20, 2024
  20. Nov 19, 2024
  21. Nov 13, 2024
  22. Nov 12, 2024
  23. Nov 08, 2024
      Fix MSC4222 returning full state (#17915) · cacd4fd7
      Erik Johnston authored
      There was a bug that meant we would return the full state of the room on
      incremental syncs when using lazy loaded members and there were no
      entries in the timeline.
      
      This was due to trying to use `state_filter or state_filter.all()` as
      shorthand for handling the `None` case; however, `state_filter` implements
      `__bool__`, so an empty state filter was falsy and got replaced with the
      full filter.
      
      cf. MSC4222 and #17888
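
The pitfall is easy to reproduce in isolation. The class below is a hypothetical stand-in, not Synapse's `StateFilter`; it only assumes what the description states, namely that the filter defines `__bool__` and that an empty filter is falsy.

```python
from typing import FrozenSet, Optional


class ToyStateFilter:
    def __init__(self, types: Optional[FrozenSet[str]]) -> None:
        # None means "match everything"; an empty set means "match nothing".
        self.types = types

    @classmethod
    def all(cls) -> "ToyStateFilter":
        return cls(None)

    @classmethod
    def none(cls) -> "ToyStateFilter":
        return cls(frozenset())

    def is_full(self) -> bool:
        return self.types is None

    def __bool__(self) -> bool:
        # Falsy when the filter matches nothing at all.
        return self.is_full() or bool(self.types)


def buggy_default(state_filter: Optional[ToyStateFilter]) -> ToyStateFilter:
    # `or` tests truthiness, so an empty filter is treated like None.
    return state_filter or ToyStateFilter.all()


def fixed_default(state_filter: Optional[ToyStateFilter]) -> ToyStateFilter:
    # Only substitute the full filter when no filter was given.
    if state_filter is None:
        return ToyStateFilter.all()
    return state_filter


empty = ToyStateFilter.none()
buggy_is_full = buggy_default(empty).is_full()
fixed_is_full = fixed_default(empty).is_full()
```

With the `or` shorthand the empty filter silently turns into the full one, whereas the explicit `is None` check preserves it.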
  24. Nov 07, 2024
  25. Nov 06, 2024
  26. Nov 05, 2024
  27. Nov 04, 2024
  28. Oct 31, 2024