Deploying Draupnir on my Matrix Test Rig
This week I'm setting up Draupnir on my matrix test rig, in order to become familiar with Draupnir deployment before I integrate it with PubHubs.
Very glad to be able to use matrix-docker-ansible-deploy's Draupnir setup to automate the majority of the Draupnir deployment.
I also want to automate, with Ansible, as much as possible of the set-up that is required before running that playbook. I aim to document here what I have done and open questions about it. The numbered steps here correspond to the manual instructions in that documentation linked above.
1. Register the bot account
I register a matrix account for the bot, using a little Ansible role I wrote, calling it like this in my Ansible playbook (personal one, specific to this installation).
- include_role: name=matrix-synapse-register-user
  vars:
    matrix_synapse_user:
      name: "{{ matrix_draupnir_bot_username }}"
      pw: "{{ matrix_draupnir_bot_password }}"
      admin: "{{ matrix_draupnir_bot_admin }}"
defining those vars in my inventory file inventories/prod/host_vars/<host>/matrix-draupnir.yml:
matrix_draupnir_bot_username: "bot.draupnir"
matrix_draupnir_bot_password: !vault [... I use vault encoded values here ...]
matrix_draupnir_bot_admin: false
I published my matrix-synapse-register-user role. I thought I should, if there isn't already a more widely available alternative. It looks like this, in tasks/main.yml:
---
- name: "register a matrix synapse user"
  command:
    argv: "{{ (cmd_check if ansible_check_mode else cmd_real) + args }}"
  vars:
    cmd_real:
      - /matrix/synapse/bin/register-user
    cmd_check:
      - echo
      - "Would run: /matrix/synapse/bin/register-user"
    args:
      - "{{ matrix_synapse_user.name }}"
      - "{{ matrix_synapse_user.pw }}"
      - "{{ '1' if matrix_synapse_user.admin|default(false) else '0' }}"
  check_mode: false
  register: result
  changed_when: result.rc == 0
  failed_when: result.rc != 0 and 'User ID already taken.' not in result.stdout
  # if already registered, errors with (on stdout):
  # > Sending registration request...
  # > ERROR! Received 400 Bad Request
  # > User ID already taken.
2. Get an access token
Manually get an access token. OK I know how to do that. First question, though: as that's a speed bump in the road, and seems fragile, has anyone shown interest in making either Draupnir itself or the playbook do a login automatically? Can't see any issue filed about "token" or "login".
Basically these days I know I'm going to have to repeat my steps on a different rig later, tear things down and build them up again, so I always look to automate the whole procedure. (Aware it's sometimes more efficient to go manually at first and automate only when worth the effort. And aware I'm at risk of volunteering myself to contribute.)
Well, I asked about this. The kind folks let me know their thoughts. There are several issues.
When I said "fragile" I meant I wouldn't expect an access token to remain valid forever. At the time this was designed, access tokens were considered permanent until explicitly revoked ("logged out"). Putting an access token into a bot's config was taken for granted, and perhaps still is. However, nowadays the client authentication spec is more complex and tokens may need refreshing. To me that suggests hard-coding a token is no longer good practice, though it can still work if we take care that the token is not invalidated.
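A stale token can at least be detected before the bot is started. The sketch below is not part of my published roles. It asks the homeserver who the token belongs to, using the client-server API's whoami endpoint, and stops the play if the token is rejected. It assumes the variables matrix_draupnir_hs_cs_api, matrix_draupnir_bot_user_id and matrix_draupnir_bot_access_token are defined as elsewhere in this post.

```yaml
# Sketch only: fail early if the stored access token is no longer valid.
- name: "check that the Draupnir bot's access token is still valid"
  ansible.builtin.uri:
    method: GET
    url: "{{ matrix_draupnir_hs_cs_api }}/_matrix/client/v3/account/whoami"
    headers:
      Authorization: "Bearer {{ matrix_draupnir_bot_access_token }}"
    # 200 = token accepted; 401 = unknown, expired or logged-out token
    status_code: [200, 401]
  register: _whoami
  # a read-only request, so it is safe to run in check mode too
  check_mode: false
  # keep the token out of the task output
  no_log: true

- name: "stop if the token was rejected or belongs to another account"
  ansible.builtin.assert:
    that:
      - _whoami.status == 200
      - _whoami.json.user_id == matrix_draupnir_bot_user_id
    fail_msg: "Draupnir access token is not valid; log in again to obtain a new one"
```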
In one sense using an access token is considered more secure than knowing an account's password. That's the sense in which many systems allow getting an API access token and giving it to an external service that will call the API. However, I think this argument applies to accessing other accounts, not to the bot's own account.
Also, if Draupnir connects through Pantalaimon to get E2EE (as makes sense when it's being run by someone other than the server operator), then Pantalaimon needs the account password to create an E2EE device.
Currently I'm feeling the bot should know its password and we should automate its login. As a lesser step, I will do this in Ansible.
- include_role: name=matrix-login-password
  vars:
    matrix_login:
      hs_cs_api: "{{ matrix_draupnir_hs_cs_api }}"
      user: "{{ matrix_draupnir_bot_user_id }}"
      password: "{{ matrix_draupnir_bot_password }}"
  # output: matrix_login_result
- set_fact:
    matrix_draupnir_bot_access_token: "{{ matrix_login_result.access_token }}"
with additional inventory vars:
matrix_draupnir_hs_cs_api: "https://matrix.example.net"
matrix_draupnir_bot_user_id: "@bot.draupnir:example.net"
My matrix-login-password role has this in its tasks/main.yml:
- name: "log in to matrix"
  uri:
    method: POST
    url: "{{ matrix_login.hs_cs_api }}/_matrix/client/r0/login"
    body:
      type: "m.login.password"
      identifier:
        type: "m.id.user"
        user: "{{ matrix_login.user }}"
      password: "{{ matrix_login.password }}"
    body_format: json
  register: _result
  # 'is defined' also copes with check mode, where there is no response at all
  changed_when: _result.json.access_token is defined
- set_fact:
    matrix_login_result: "{{ _result.json }}"
  when: not ansible_check_mode
# matrix_login_result contains at least: access_token, device_id, user_id
# see e.g.: https://spec.matrix.org/v1.8/client-server-api/#login and older versions
[Update: now published too.] (I hesitated because I thought I saw about a year ago someone had already published a set of ansible roles for matrix admin tasks like this. Maybe. Can't find it now.)
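One refinement I would consider, sketched here and not taken from the published role: when Ansible runs with -v, the registered result (including the access token) is printed. The variant below hides it with no_log and gives the new device a readable name. The input matrix_login.device_name is a hypothetical addition; initial_device_display_name is an optional field of the login request in the spec. The sketch uses the v3 path, where the role above uses r0.

```yaml
# Sketch only: a quieter variant of the login task.
# "matrix_login.device_name" is a hypothetical extra input, not in the role above.
- name: "log in to matrix (without printing secrets)"
  ansible.builtin.uri:
    method: POST
    url: "{{ matrix_login.hs_cs_api }}/_matrix/client/v3/login"
    body_format: json
    body:
      type: "m.login.password"
      identifier:
        type: "m.id.user"
        user: "{{ matrix_login.user }}"
      password: "{{ matrix_login.password }}"
      # optional; shown in the account's device list
      initial_device_display_name: "{{ matrix_login.device_name | default('ansible') }}"
  register: _result
  # keep the password and the returned access token out of logs and -v output
  no_log: true
```

The cost of no_log is that a failed login shows no detail either, so it may be worth switching it off temporarily while debugging.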
3. Make sure the account is free from rate limiting
Same for this 'override_ratelimit' step of course. Would be nice to automate.
One way to change this permission is through some other admin account. In that case, "know an access token" is an appropriate way to use that other admin account.
In the case where the bot account itself is configured to be a (matrix) server admin account, then in Synapse's case at least it would already have sufficient permission to use Synapse's admin API to override its own rate limit.
Now about admin APIs. Unfortunately matrix admin APIs are not standardised. Synapse has its admin API, Dendrite has another, and Conduit I gather doesn't have a REST admin API. On Dendrite, "the username has to be specified in dendrite.yaml to disable rate limiting, and personally i hate when anything other than the sysadmins write to configs" said bones_was_here. The playbook, if instructed to install Dendrite, controls that config file, but Draupnir would (or should) not be able to do that by itself, unlike with Synapse.
So, there are lots of cases. Different homeservers, and the playbook being used to install Draupnir with or without also its homeserver, and choice of whether the playbook or Draupnir itself performs this configuration.
Currently I think I will automate it for the Synapse admin set-ratelimit API only, as that's the server we use in PubHubs.
- name: "disable rate limiting for Draupnir bot"
  uri:
    method: POST
    url: "{{ matrix_draupnir_hs_cs_api }}/_synapse/admin/v1/users/{{ matrix_draupnir_bot_user_id }}/override_ratelimit"
    headers:
      # access token must be for a user with synapse admin access;
      # can be the bot's if it is an admin, else of another account.
      Authorization: "Bearer {{ matrix_draupnir_bot_access_token if matrix_draupnir_bot_admin else matrix_draupnir_some_synapse_admin_account_access_token }}"
    body_format: json
    body:
      messages_per_second: 0
      burst_count: 0
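To confirm the override took effect, the same Synapse admin endpoint can be read back with GET. This is a sketch under the same assumptions as the task above (same variables, same choice of admin token). As far as I understand Synapse's admin API, the response is an empty JSON object when no override is set, which is why the assertions fall back to a default.

```yaml
# Sketch only: read back the override and check that both limits are zero.
- name: "read back the rate limit override for Draupnir bot"
  ansible.builtin.uri:
    method: GET
    url: "{{ matrix_draupnir_hs_cs_api }}/_synapse/admin/v1/users/{{ matrix_draupnir_bot_user_id | urlencode }}/override_ratelimit"
    headers:
      Authorization: "Bearer {{ matrix_draupnir_bot_access_token if matrix_draupnir_bot_admin else matrix_draupnir_some_synapse_admin_account_access_token }}"
  register: _ratelimit
  # read-only, so safe to run in check mode
  check_mode: false

- name: "confirm rate limiting is disabled for Draupnir bot"
  ansible.builtin.assert:
    that:
      # default(-1) makes a missing override fail the check cleanly
      - (_ratelimit.json.messages_per_second | default(-1)) == 0
      - (_ratelimit.json.burst_count | default(-1)) == 0
    fail_msg: "rate limit override is not set for {{ matrix_draupnir_bot_user_id }}"
```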
4. Create a management room
I'm on a roll now. I'll just check the matrix spec for room creation, take a guess at which parameters make most sense for my case, and write it out in Ansible language.
- name: "create a management room for Draupnir bot"
  uri:
    method: POST
    url: "{{ matrix_draupnir_hs_cs_api }}/_matrix/client/v3/createRoom"
    headers:
      Authorization: "Bearer {{ matrix_draupnir_bot_access_token }}"
    body:
      name: "{{ matrix_draupnir_management_room_name }}"
      creation_content:
        m.federate: false
      visibility: private
      preset: trusted_private_chat
      invite: "{{ matrix_draupnir_operator_user_ids }}"
    body_format: json
  register: _result
  when: matrix_draupnir_management_room_id is undefined
  # output: _result.json.room_id
- set_fact:
    matrix_draupnir_management_room_id: "{{ _result.json.room_id }}"
  when: matrix_draupnir_management_room_id is undefined
with inventory vars:
# the management room (bot and its operators); create if room_id undefined
matrix_draupnir_management_room_name: "Draupnir management"
matrix_draupnir_management_room_id: '!xxxxxxxxxxxx:example.net'
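The room ID, like the access token from step 2, has to reach the inventory so that later runs can use it. One possible way to avoid copying the values by hand, offered as a sketch only, is to have the playbook write them to a generated host_vars file on the control machine. The file name matrix-draupnir-generated.yml is hypothetical. inventory_dir and inventory_hostname are standard Ansible variables. Unlike the vaulted password, the token ends up in plain text in this file, so treat the file accordingly.

```yaml
# Sketch only: persist the results of the set-up steps next to the inventory.
# Assumes the host_vars/<host>/ directory already exists, as in my layout.
- name: "save Draupnir set-up results as inventory vars"
  ansible.builtin.copy:
    dest: "{{ inventory_dir }}/host_vars/{{ inventory_hostname }}/matrix-draupnir-generated.yml"
    mode: "0600"
    content: |
      # written by the set-up playbook
      matrix_draupnir_bot_access_token: {{ matrix_draupnir_bot_access_token | to_json }}
      matrix_draupnir_management_room_id: {{ matrix_draupnir_management_room_id | to_json }}
  # write on the machine running Ansible, not on the matrix server
  delegate_to: localhost
  # the file content includes the access token
  no_log: true
```

With the room ID saved this way, the "is undefined" condition on the room creation task is false on the next run, so the room is not created twice.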
Starting Up
After running my playbook with the above set-up, and pasting the resulting access token and room id into the corresponding inventory vars (TODO: join the two parts together in a better way than cut-n-paste), here we go with matrix-docker-ansible-deploy:
ansible-playbook .../matrix-docker-ansible-deploy/setup.yml -l example.net --tags=setup-bot-draupnir,start -Dv
In the logs, journalctl -n100 -fu matrix-bot-draupnir.service:
Starting Matrix Draupnir bot...
Started Matrix Draupnir bot.
[INFO] [index] Starting bot...
[INFO] [index] Resolving management room...
[INFO] [index] Mjolnir is starting up. Use !mjolnir to query status.
[INFO] [ProtectedRoomsConfig] Resolving protected rooms...
[WARN] [ProtectedRoomsConfig] Couldn't find any explicitly protected rooms from Mjolnir's account data, assuming first start. MatrixError: Error during MatrixClient request GET /_matrix/client/v3/user/%40bot.draupnir%3Aexample.net/account_data/org.matrix.mjolnir.protected_rooms: 404 Not Found -- {"errcode":"M_NOT_FOUND","error":"Account data not found"}
[... three of these 'Account data not found' errors ...]
[INFO] [Mjolnir@startup] Checking permissions...
[INFO] [Mjolnir@startup] Syncing lists...
[INFO] [Mjolnir@startup] Startup complete. Now monitoring rooms.
And then I found and joined the management room. I used a Hydrogen web client.
_@bot.draupnir:example.net joined the room_
_bot.draupnir named the room "Draupnir management"_
_admin1 was invited to the room by bot.draupnir_
**bot.draupnir:**
Mjolnir is starting up. Use !mjolnir to query status.
Checking permissions...
All permissions look OK.
Syncing lists...
Done updating rooms - no errors
Startup complete. Now monitoring rooms.
_admin1 joined the room_
It's alive! Perhaps a little confused about its new name. Responds to both !mjolnir and !draupnir, either way replying:
Old Commands:
`!mjolnir - Print status information`
`!mjolnir status - Print status information`
[...]
mjolnir commands:
`ban <entity> <list> [...reason]` - Bans an entity from the policy list.
Parameters:
entity - no description
list - no description
[...]
Well, there we are. The bot's alive. Next it's time for me to learn its commands and put it to work.
Update: All these Ansible roles are now published. TODO: contribute some of this to matrix-docker-ansible-deploy? TODO: re-implement these roles as Ansible modules instead, using a matrix python API such as synadm or mautrix-python?