Skip to content
Snippets Groups Projects
Commit e1170d4e authored by Matthew Hodgson's avatar Matthew Hodgson
Browse files

move matrix-generic content to new matrix-doc git project

parent 81b956c7
No related branches found
No related tags found
No related merge requests found
All matrix-generic documentation now lives in its own project at
github.com/matrix-org/matrix-doc.git
Only Synapse implementation-specific documentation lives here now
(together with some older stuff will be shortly migrated over to matrix-doc)
This diff is collapsed.
Definitions
===========
# *Event* -- A JSON object that represents a piece of information to be
distributed to the the room. The object includes a payload and metadata,
including a `type` used to indicate what the payload is for and how to process
them. It also includes one or more references to previous events.
# *Event graph* -- Events and their references to previous events form a
directed acyclic graph. All events must be a descendant of the first event in a
room, except for a few special circumstances.
# *State event* -- A state event is an event that has a non-null string valued
`state_key` field. It may also include a `prev_state` key referencing exactly
one state event with the same type and state key, in the same event graph.
# *State tree* -- A state tree is a tree formed by a collection of state events
that have the same type and state key (all in the same event graph.
# *State resolution algorithm* -- An algorithm that takes a state tree as input
and selects a single leaf node.
# *Current state event* -- The leaf node of a given state tree that has been
selected by the state resolution algorithm.
# *Room state* / *state dictionary* / *current state* -- A mapping of the pair
(event type, state key) to the current state event for that pair.
# *Room* -- An event graph and its associated state dictionary. An event is in
the room if it is part of the event graph.
# *Topological ordering* -- The partial ordering that can be extracted from the
event graph due to it being a DAG.
(The state definitions are purposely slightly ill-defined, since if we allow
deleting events we might end up with multiple state trees for a given event
type and state key pair.)
Federation specific
-------------------
# *(Persistent data unit) PDU* -- An encoding of an event for distribution of
the server to server protocol.
# *(Ephemeral data unit) EDU* -- A piece of information that is sent between
servers and doesn't encode an event.
Client specific
---------------
# *Child events* -- Events that reference a single event in the same room
independently of the event graph.
# *Collapsed events* -- Events that have all child events that reference it
included in the JSON object.
This document outlines the format for human-readable IDs within matrix.
Overview
--------
UTF-8 is quickly becoming the standard character encoding set on the web. As
such, Matrix requires that all strings MUST be encoded as UTF-8. However,
using Unicode as the character set for human-readable IDs is troublesome. There
are many different characters which appear identical to each other, but would
identify different users. In addition, there are non-printable characters which
cannot be rendered by the end-user. This opens up a security vulnerability with
phishing/spoofing of IDs, commonly known as a homograph attack.
Web browers encountered this problem when International Domain Names were
introduced. A variety of checks were put in place in order to protect users. If
an address failed the check, the raw punycode would be displayed to disambiguate
the address. Similar checks are performed by home servers in Matrix. However,
Matrix does not use punycode representations, and so does not show raw punycode
on a failed check. Instead, home servers must outright reject these misleading
IDs.
Types of human-readable IDs
---------------------------
There are two main human-readable IDs in question:
- Room aliases
- User IDs
Room aliases look like ``#localpart:domain``. These aliases point to opaque
non human-readable room IDs. These pointers can change, so there is already an
issue present with the same ID pointing to a different destination at a later
date.
User IDs look like ``@localpart:domain``. These represent actual end-users, and
unlike room aliases, there is no layer of indirection. This presents a much
greater concern with homograph attacks.
Checks
------
- Similar to web browsers.
- blacklisted chars (e.g. non-printable characters)
- mix of language sets from 'preferred' language not allowed.
- Language sets from CLDR dataset.
- Treated in segments (localpart, domain)
- Additional restrictions for ease of processing IDs.
- Room alias localparts MUST NOT have ``#`` or ``:``.
- User ID localparts MUST NOT have ``@`` or ``:``.
Rejecting
---------
- Home servers MUST reject room aliases which do not pass the check, both on
GETs and PUTs.
- Home servers MUST reject user ID localparts which do not pass the check, both
on creation and on events.
- Any home server whose domain does not pass this check, MUST use their punycode
domain name instead of the IDN, to prevent other home servers rejecting you.
- Error code is ``M_FAILED_HUMAN_ID_CHECK``. (generic enough for both failing
due to homograph attacks, and failing due to including ``:`` s, etc)
- Error message MAY go into further information about which characters were
rejected and why.
- Error message SHOULD contain a ``failed_keys`` key which contains an array
of strings which represent the keys which failed the check e.g::
failed_keys: [ user_id, room_alias ]
Other considerations
--------------------
- Basic security: Informational key on the event attached by HS to say "unsafe
ID". Problem: clients can just ignore it, and since it will appear only very
rarely, easy to forget when implementing clients.
- Moderate security: Requires client handshake. Forces clients to implement
a check, else they cannot communicate with the misleading ID. However, this is
extra overhead in both client implementations and round-trips.
- High security: Outright rejection of the ID at the point of creation /
receiving event. Point of creation rejection is preferable to avoid the ID
entering the system in the first place. However, malicious HSes can just allow
the ID. Hence, other home servers must reject them if they see them in events.
Client never sees the problem ID, provided the HS is correctly implemented.
- High security decided; client doesn't need to worry about it, no additional
protocol complexity aside from rejection of an event.
\ No newline at end of file
===================
Documentation Style
===================
A brief single sentence to describe what this file contains; in this case a
description of the style to write documentation in.
Sections
========
Each section should be separated from the others by two blank lines. Headings
should be underlined using a row of equals signs (===). Paragraphs should be
separated by a single blank line, and wrap to no further than 80 columns.
[[TODO(username): if you want to leave some unanswered questions, notes for
further consideration, or other kinds of comment, use a TODO section. Make sure
to notate it with your name so we know who to ask about it!]]
Subsections
-----------
If required, subsections can use a row of dashes to underline their header. A
single blank line between subsections of a single section.
Bullet Lists
============
* Bullet lists can use asterisks with a single space either side.
* Another blank line between list elements.
Definition Lists
================
Terms:
Start in the first column, ending with a colon
Definitions:
Take a two space indent, following immediately from the term without a blank
line before it, but having a blank line afterwards.
In C-S API > Registration/Login:
Captcha-based
~~~~~~~~~~~~~
:Type:
``m.login.recaptcha``
:Description:
Login is supported by responding to a captcha, in this case Google's
Recaptcha.
To respond to this type, reply with::
{
"type": "m.login.recaptcha",
"challenge": "<challenge token>",
"response": "<user-entered text>"
}
The Recaptcha parameters can be obtained in Javascript by calling::
Recaptcha.get_challenge();
Recaptcha.get_response();
The home server MUST respond with either new credentials, the next stage of the
login process, or a standard error response.
In Events:
Common event fields
-------------------
All events MUST have the following fields:
``event_id``
Type:
String.
Description:
Represents the globally unique ID for this event.
``type``
Type:
String.
Description:
Contains the event type, e.g. ``m.room.message``
``content``
Type:
JSON Object.
Description:
Contains the content of the event. When interacting with the REST API, this is the HTTP body.
``room_id``
Type:
String.
Description:
Contains the ID of the room associated with this event.
``user_id``
Type:
String.
Description:
Contains the fully-qualified ID of the user who *sent* this event.
State events have the additional fields:
``state_key``
Type:
String.
Description:
Contains the state key for this state event. If there is no state key for this state event, this
will be an empty string. The presence of ``state_key`` makes this event a state event.
``required_power_level``
Type:
Integer.
Description:
Contains the minimum power level a user must have before they can update this event.
``prev_content``
Type:
JSON Object.
Description:
Optional. Contains the previous ``content`` for this event. If there is no previous content, this
key will be missing.
.. TODO-spec
How do "age" and "ts" fit in to all this? Which do we expose?
Matrix Specification NOTHAVEs
=============================
This document contains sections of the main specification that have been
temporarily removed, because they specify intentions or aspirations that have
in no way yet been implemented. Rather than outright-deleting them, they have
been moved here so as to stand as an initial version for such time as they
become extant.
Presence
========
Idle Time
---------
As well as the basic ``presence`` field, the presence information can also show
a sense of an "idle timer". This should be maintained individually by the
user's clients, and the home server can take the highest reported time as that
to report. When a user is offline, the home server can still report when the
user was last seen online.
Device Type
-----------
Client devices that may limit the user experience somewhat (such as "mobile"
devices with limited ability to type on a real keyboard or read large amounts of
text) should report this to the home server, as this is also useful information
to report as "presence" if the user cannot be expected to provide a good typed
response to messages.
This diff is collapsed.
State Resolution
================
This section describes why we need state resolution and how it works.
Motivation
-----------
We want to be able to associate some shared state with rooms, e.g. a room name
or members list. This is done by having a current state dictionary that maps
from the pair event type and state key to an event.
However, since the servers involved in the room are distributed we need to be
able to handle the case when two (or more) servers try and update the state at
the same time. This is done via the state resolution algorithm.
State Tree
------------
State events contain a reference to the state it is trying to replace. These
relations form a tree where the current state is one of the leaf nodes.
Note that state events are events, and so are part of the PDU graph. Thus we
can be sure that (modulo the internet being particularly broken) we will see
all state events eventually.
Algorithm requirements
----------------------
We want the algorithm to have the following properties:
- Since we aren't guaranteed what order we receive state events in, except that
we see parents before children, the state resolution algorithm must not depend
on the order and must always come to the same result.
- If we receive a state event whose parent is the current state, then the
algorithm will select it.
- The algorithm does not depend on internal state, ensuring all servers should
come to the same decision.
These three properties mean it is enough to keep track of the current state and
compare it with any new proposed state, rather than having to keep track of all
the leafs of the tree and recomputing across the entire state tree.
Current Implementation
----------------------
The current implementation works as follows: Upon receipt of a newly proposed
state change we first find the common ancestor. Then we take the maximum
across each branch of the users' power levels, if one is higher then it is
selected as the current state. Otherwise, we check if one chain is longer than
the other, if so we choose that one. If that also fails, then we concatenate
all the pdu ids and take a SHA1 hash and compare them to select a common
ancestor.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment