Skip to content
Snippets Groups Projects
Unverified Commit 74aa4782 authored by Devon Hudson's avatar Devon Hudson Committed by GitHub
Browse files

Add log message when worker lock timeouts get large (#18124)

This is to help track down a possible, but very rare, worker deadlock
that was seen on matrix.org.
In theory, you could work back from an instance of these new logs to the
approximate time when the lock was obtained and focus the diagnostic
efforts there.

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
parent 816054b0
No related branches found
No related tags found
No related merge requests found
Add log message when worker lock timeouts get large.
...@@ -19,6 +19,7 @@ ...@@ -19,6 +19,7 @@
# #
# #
import logging
import random import random
from types import TracebackType from types import TracebackType
from typing import ( from typing import (
...@@ -269,6 +270,10 @@ class WaitingLock: ...@@ -269,6 +270,10 @@ class WaitingLock:
def _get_next_retry_interval(self) -> float: def _get_next_retry_interval(self) -> float:
next = self._retry_interval next = self._retry_interval
self._retry_interval = max(5, next * 2) self._retry_interval = max(5, next * 2)
if self._retry_interval > 5 * 2 ^ 7: # ~10 minutes
logging.warning(
f"Lock timeout is getting excessive: {self._retry_interval}s. There may be a deadlock."
)
return next * random.uniform(0.9, 1.1) return next * random.uniform(0.9, 1.1)
...@@ -344,4 +349,8 @@ class WaitingMultiLock: ...@@ -344,4 +349,8 @@ class WaitingMultiLock:
def _get_next_retry_interval(self) -> float: def _get_next_retry_interval(self) -> float:
next = self._retry_interval next = self._retry_interval
self._retry_interval = max(5, next * 2) self._retry_interval = max(5, next * 2)
if self._retry_interval > 5 * 2 ^ 7: # ~10 minutes
logging.warning(
f"Lock timeout is getting excessive: {self._retry_interval}s. There may be a deadlock."
)
return next * random.uniform(0.9, 1.1) return next * random.uniform(0.9, 1.1)
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment