  1. Aug 30, 2022
    • Fix bug where we wedge media plugins if clients disconnect early (#13660) · 1c26acd8
      Erik Johnston authored
      We incorrectly failed to use the returned `Responder` when the client had
      already disconnected, which meant that the resource held by the `Responder`
      was never released.
      
      In particular, this exhausted the thread pools so that *all* requests
      timed out.
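
      As a rough illustration of the failure mode described above (a minimal sketch, not Synapse's actual media code; `fetch`, `write_to`, `close`, and `request_has_disconnected` are assumed names):

      ```python
      # Sketch: the provider hands back a Responder that holds a pooled resource
      # (e.g. a file handle served from a thread pool). It must be released even
      # if the client has already gone away.
      async def respond_with_media(request, provider, media_id):
          responder = await provider.fetch(media_id)  # may hold a pooled resource
          if responder is None:
              return
          try:
              if request_has_disconnected(request):
                  # The bug: the old code bailed out at this point without ever
                  # using or closing the responder, so the pooled resource leaked
                  # until the thread pool was exhausted and all requests timed out.
                  return
              await responder.write_to(request)
          finally:
              # Release the resource whether or not we streamed anything.
              await responder.close()
      ```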
    • Do not wait for background updates to complete to expire URL cache. (#13657) · 303b40b9
      Patrick Cloke authored
      Media downloaded as part of a URL preview is normally deleted after two days.
      However, while a background database migration is running, this expiry is
      skipped. A long-running database migration can therefore cause the media
      store to fill up with old preview files.
      
      This logic was added in #2697 to make sure that we didn't try to run the expiry
      without an index on `local_media_repository.created_ts`; the original logic that
      needs that index was added in #2478 (in `get_url_cache_media_before`, as
      amended by 93247a42), and is still present.
      
      Given that the background update was added before Synapse v1.0.0, just drop
      this check and assume the index exists.
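
      For context, a rough sketch of the expiry pass this relies on (illustrative only: the table and `created_ts` column come from the commit message; the `url_cache` filter, the psycopg2-style cursor, and everything else here is assumed):

      ```python
      import time

      TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000

      def get_expired_url_cache_media(cur):
          """Find URL-preview media older than two days. The `created_ts < %s`
          predicate is why the index on local_media_repository.created_ts
          matters for this query."""
          cutoff_ts = int(time.time() * 1000) - TWO_DAYS_MS
          cur.execute(
              "SELECT media_id FROM local_media_repository"
              " WHERE url_cache IS NOT NULL AND created_ts < %s",
              (cutoff_ts,),
          )
          return [media_id for (media_id,) in cur.fetchall()]
      ```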
    • Speed up inserting into `event_push_actions_staging`. (#13634) · 20df96a7
      Patrick Cloke authored
      By using `execute_values` instead of `execute_batch`.
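
      For reference, a hedged sketch of the difference with psycopg2 (the column list and function names are illustrative, not Synapse's actual schema or code):

      ```python
      from psycopg2.extras import execute_batch, execute_values

      # rows: (event_id, user_id, actions) tuples such as
      # ("$event1", "@alice:example.org", '["notify"]'); cur: an open psycopg2 cursor.

      def insert_with_execute_batch(cur, rows):
          # Runs the same single-row INSERT once per tuple, batching statements
          # to reduce round trips.
          execute_batch(
              cur,
              "INSERT INTO event_push_actions_staging (event_id, user_id, actions)"
              " VALUES (%s, %s, %s)",
              rows,
          )

      def insert_with_execute_values(cur, rows):
          # Builds a single INSERT with one multi-row VALUES list, which is
          # typically much faster for bulk inserts.
          execute_values(
              cur,
              "INSERT INTO event_push_actions_staging (event_id, user_id, actions)"
              " VALUES %s",
              rows,
          )
      ```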
    • Optimize how we calculate `likely_domains` during backfill (#13575) · 51d732db
      Eric Eastwood authored
      Optimize how we calculate `likely_domains` during backfill, because I've seen this take 17s in production just to `get_current_state`, which is then fed into `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)).
      
      There are 3 ways we currently calculate hosts that are in the room:
      
       1. `get_current_state` -> `get_domains_from_state`
          - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill`
          - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳️
       1. `get_current_hosts_in_room`
          - Used for other federation things like sending read receipts and typing indicators
       1. `get_hosts_in_room_at_events`
          - Used when pushing out events over federation to other servers in the `_process_event_queue_loop`
      
      Fix https://github.com/matrix-org/synapse/issues/13626
      
      Part of https://github.com/matrix-org/synapse/issues/13356
      
      Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)
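
      As a rough Python sketch (not Synapse's actual code) of what the option 1 path computes in Python, and what the optimized query below pushes down into Postgres:

      ```python
      from collections import defaultdict

      def domains_from_member_events(joined_members):
          """joined_members: (state_key, depth) pairs for m.room.member join
          events, e.g. ("@alice:example.org", 12)."""
          min_depth = defaultdict(lambda: float("inf"))
          for state_key, depth in joined_members:
              # A Matrix user ID is "@localpart:server_name"; everything after
              # the first colon is the host, the same split the SQL regex
              # '@[^:]*:(.*)$' performs below.
              host = state_key.split(":", 1)[1]
              min_depth[host] = min(min_depth[host], depth)
          # Hosts whose members joined earliest come first, mirroring
          # ORDER BY min(e.depth) ASC in the optimized query.
          return sorted(min_depth, key=min_depth.get)
      ```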
      
      
      ### Query performance
      
      #### Before
      
      The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):
      ```
      synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
      Time: 16035.612 ms (00:16.036)
      
      synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
      Time: 4243.237 ms (00:04.243)
      ```
      
      But what about `get_current_hosts_in_room`: when there are 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added), but takes 930ms after a Postgres restart or 390ms when run back to back.
      
      ```sh
      $ psql synapse
      synapse=# \timing on
      synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
      FROM current_state_events
      WHERE
          type = 'm.room.member'
          AND membership = 'join'
          AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
       count
      -------
        4130
      (1 row)
      
      Time: 13181.598 ms (00:13.182)
      
      synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
       count
      -------
       80814
      
      synapse=# SELECT COUNT(*) from current_state_events;
        count
      ---------
       8162847
      
      synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
       pg_size_pretty
      ----------------
       4702 MB
      ```
      
      #### After
      
      I'm not sure how long it takes from complete freshness, as I only really get that opportunity once (short of restarting the computer, which is cumbersome), and it's not really relevant to normal operating times. You probably get closer to the fresh times the more access variability there is, so that Postgres's caches aren't as exact. Update: the longest I've seen this run is 6.4s, and 4.5s after a computer restart.
      
      After a Postgres restart, it takes 330ms and running back to back takes 260ms.
      
      ```sh
      $ psql synapse
      synapse=# \timing on
      Timing is on.
      synapse=# SELECT
          substring(c.state_key FROM '@[^:]*:(.*)$') as host
      FROM current_state_events c
      /* Get the depth of the event from the events table */
      INNER JOIN events AS e USING (event_id)
      WHERE
          c.type = 'm.room.member'
          AND c.membership = 'join'
          AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
      GROUP BY host
      ORDER BY min(e.depth) ASC;
      Time: 333.800 ms
      ```
      
      #### Going further
      
      To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we either a) get an answer or b) give up.
      
      Another thing we can do is optimize the query to use an index skip scan:
      
       - https://wiki.postgresql.org/wiki/Loose_indexscan
       - Index Skip Scan, https://commitfest.postgresql.org/37/1741/
       - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
      