Failure to remove temp role due to falsely detecting that the role is not present #229

Closed
opened 2026-02-05 16:23:35 -05:00 by sirdog3355 · 2 comments
sirdog3355 commented 2026-02-05 16:23:35 -05:00 (Migrated from github.com)

Problem

On Feb 4 at 1:41 PM, Fleff gave himself the @L role using the /trole command. He never removed or adjusted this role.

Today, when the role was scheduled for removal, EnduraBot reported the following:

[2026-02-05 18:41:43]:INFO:endurabot.tasks.temp_role_monitor: fleff (137356883664175104) was given [@L] temporarily and the duration has ended. Role detected to have been removed early. Removed status from database.

A review of the server audit logs shows that Fleff did indeed have the role at the time and that it was never manipulated. This suggests a bug in the logic of tasks/temp_role_monitor.py.
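One way a false "removed early" result could arise is if the monitor trusts a negative answer from discord.py's local member cache, for example `guild.get_member()` returning `None`, or a `Member.roles` list that is stale after a gateway reconnect. This is an assumption; the thread does not confirm it. A minimal defensive sketch under that assumption: only treat the role as removed after a fresh API lookup (in discord.py, the `guild.fetch_member()` coroutine), and if that lookup fails, keep the database row and retry on the next pass. `RoleCheck`, `confirm_role_state`, and `fetch_role_ids` are hypothetical names, not part of EnduraBot. The function is synchronous and works on plain role IDs so it runs without discord.py.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional


class RoleCheck(Enum):
    """Outcome of checking whether a member still holds a temporary role."""
    PRESENT = auto()        # role confirmed on the member: safe to remove it now
    REMOVED_EARLY = auto()  # role confirmed absent: safe to drop the DB row
    UNKNOWN = auto()        # could not confirm: keep the DB row and retry later


def confirm_role_state(
    role_id: int,
    cached_role_ids: Optional[Iterable[int]],
    fetch_role_ids: Callable[[], Optional[Iterable[int]]],
) -> RoleCheck:
    """Decide whether a temporary role is still assigned (hypothetical helper).

    cached_role_ids: role IDs from the local member cache, or None if the
        member is not cached (e.g. the cache is incomplete after a reconnect).
    fetch_role_ids: hypothetical callable that asks the API for the member's
        current role IDs. Returns None if the member has left the guild and
        raises ConnectionError on a network failure.
    """
    # Fast path: a positive answer from the cache is trusted as-is.
    if cached_role_ids is not None and role_id in set(cached_role_ids):
        return RoleCheck.PRESENT

    # The cache says "missing" or has no entry. A negative answer from a
    # possibly stale cache is not trusted; confirm with a fresh lookup first.
    try:
        fresh = fetch_role_ids()
    except ConnectionError:
        # Network trouble: do not touch the database, try again next loop.
        return RoleCheck.UNKNOWN

    if fresh is None:
        # Member is no longer in the guild, so there is no role to remove.
        return RoleCheck.REMOVED_EARLY
    return RoleCheck.PRESENT if role_id in set(fresh) else RoleCheck.REMOVED_EARLY


if __name__ == "__main__":
    def offline_fetch():
        raise ConnectionError("gateway reconnecting")

    print(confirm_role_state(42, [], offline_fetch))        # cache stale, API down
    print(confirm_role_state(42, None, lambda: [7, 42]))    # API confirms the role
    print(confirm_role_state(42, [7], lambda: [7]))         # API confirms removal
```

The design choice is asymmetric trust: a cached "present" is cheap and safe to act on, while a cached "absent" is what leads to deleting the database row, so only that case pays for an extra API call.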

Possible cause

Earlier that day, the bot also went through a series of unprompted disconnects and reconnects, which produced large error stacks that I removed here for brevity:

[2026-02-05 05:02:40]:ERROR:discord.client: Attempting a reconnect in 1.85s
[2026-02-05 05:03:44]:ERROR:discord.client: Attempting a reconnect in 3.46s
[2026-02-05 05:03:48]:ERROR:discord.client: Attempting a reconnect in 1.11s
[2026-02-05 05:04:49]:ERROR:discord.client: Attempting a reconnect in 4.99s
[2026-02-05 05:05:54]:ERROR:discord.client: Attempting a reconnect in 26.96s
[2026-02-05 05:06:29]:INFO:endurabot: Synced 15 commands.

It is possible that these reconnects left the bot's internal state inconsistent. At the time of writing, two users have temporary roles scheduled for removal tomorrow. I'm going to reboot EnduraBot to give it a kick in the pants.

If those roles are removed correctly after the reboot, I'll consider this issue resolved and chalk it up to a reboot being needed after those errors. If not, I'll have to investigate further.

sirdog3355 commented 2026-02-06 06:36:47 -05:00 (Migrated from github.com)

The issue occurred again this morning, and I've restarted the bot again, which will likely cause the roles to be removed correctly. A more in-depth review of the logs suggests the issue is network-related. I've come across some possible fixes involving the Docker image configuration that look promising.

I am going to manually adjust the Docker-related files in production and give it 48 hours to see if the issue is mitigated. If so, I will make a PR to add the changes to the codebase.

sirdog3355 commented 2026-02-07 19:50:19 -05:00 (Migrated from github.com)

Well, this is definitely a networking issue and isn't related to EnduraBot. Twenty-four hours have passed without another instability event.
