A Minute from the Moderators
Hello Hachydermia! It’s time for, you guessed it, the monthly Moderator Minute! Recently, our founder and former admin stepped down. As sad as we are to see her go, this does provide an excellent segue into the topics that we wanted to cover in this month’s Moderator Minute!
- The big question: will Hachyderm be staying online?
- Moderation on a large instance
- Harm prevention and mitigation on a large instance
The big question: will Hachyderm be staying online?
At Hachyderm we have been scaling our Moderation and Infrastructure teams since the Twitter Migration started to land in November 2022. Everyone on both teams is a volunteer, so we intentionally oversized our teams to accommodate high fluctuations in availability. Each team has a team lead and 4 to 10 active members at any given time, meaning Hachyderm is a ~20 person org. What this means for the moderation team specifically is the topic of today’s blog post!
Moderation on a large instance
Large instances like ours are mostly powered by humans and processes. And computers, of course. But mostly humans.
And for a large instance, there needs to be quite a few humans. In our case, most of our current mods volunteered as part of our Call for Volunteers back in December 2022.
How moderators are selected
Moderators must first and foremost be aligned with the ethos of our server. That means they must agree with our stances on no racism, no white supremacy, no homophobia, no transphobia, etc. Beyond that baseline, moderators are also chosen for:
- Their lived experiences and demographics
- Their experience with community moderation
For demographics: it is important to ensure that a wide variety of lived experiences are represented on the moderation team. These collective experiences and voices allow us to discuss, build, and enforce the policies that govern our server. Ensuring that there are multiple backgrounds, including race, gender, orientation, country of origin, language, and so on helps to ensure that multiple perspectives contribute.
For prior experience: there was also an intentional mix of experience levels on the moderation team. Ensuring that there are experienced mods also means that we have the bandwidth to onboard new, less experienced moderators. Being able to lay the groundwork for mentorship and onboarding is crucial for a self-sustaining organization.
Moderation onboarding and continuous improvement
Before moderators can begin acting in their full capacity, they must:
- Agree to the Moderator Covenant
- Be trained on our policies
- This includes the server rules, account policies, etc.
- Practice on inbound tickets
- Practice tickets have feedback from the Head Moderator and the group
The first two points go hand-in-hand. Our server rules outline the allowed and disallowed actions on our server. Our Moderator Covenant governs how we interpret and enforce those rules.
When practicing, new moderators are expected to write their analysis of the ticket including how they understand the situation, what action(s) they would or would not take in the given scenario, as well as why they are making that recommendation. When training the first group of moderators, this portion of the process was intended to last a week, but it worked out so well that we have kept the process. This means it is common for multiple moderators to see an individual ticket and asynchronously discuss prior to taking an action. This informal, consensus-style review has led to our team being able to continue to learn from each individual’s experiences and expertise.
When moderators make a mistake
We do our best, but are human and are thus prone to mistakes. Whenever something like this happens, we:
- May follow up with the impacted user or users
- Will review the policy that led to the error
When we make mistakes, we will always do our best by the user(s) directly impacted. This means that we will take ownership for our mistakes, apologize to the impacted user(s) if doing so will not cause further harm, and also review any relevant policies to ensure it doesn’t happen again.
Harm prevention and mitigation on a large instance
How we handle moderation reports is driven by the enforced stance that moderation reports are harm already done. (We mentioned this stance in our recent postmortem as well.) Essentially, this means that if someone has filed a report because harm has been done, then that harm has been done.
How we determine what to do next depends on several factors, including the scope and severity of what has been done. It also depends on the source of the harm (local vs remote).
Using research as a tool for harm prevention
To say it first: in cases of egregious harm that originates on our server, the user is suspended from our server.
Reports of abuse originating from one of our own are exceedingly rare. More commonly, the reports of egregious harm come from remote sources. The worst cases are what you’d likely expect and are easy to suspend federation with.
The Hachyderm Moderation team also does a lot of proactive research regarding the origins of abusive behavior. The goal of our research is to minimize, and perhaps one day fully prevent, these instances’ ability to interact with our instance either directly or indirectly. To achieve this, we research not only the instances that are the sources of abusive behavior and content, but also those actively federating with them. To be clear: active here means active participation. We take this research very seriously and are doing it continuously.
Nurturing safe spaces by requiring active participation in moderation
The previous section focused on how we handle the worst offenders. What about everyone else? When someone Well Actuallys or doesn’t listen, or doesn’t respond properly when a boundary has been specified? (For information about setting and maintaining boundaries, please see our Mental Health doc.)
When the reported situation is not an egregious source of harm, the Hachyderm Moderation team makes heavy use of Mastodon’s freeze feature. We send the user a message that details what they were reported for, including their posts as needed, and use the freeze to tell them what we need from them to restore normal activity on their account. To prevent moderation issues from dragging on, users must respond within a given time frame and then perform the required action(s) within a given time frame.
What actions we require are situationally dependent. Most commonly, we request that users delete their own posts. We do this because we want to nurture a community where individuals are aware of, and accountable for, their actions. When moderators simply delete posts, the person who made the post is not required to even give the situation a second thought.
Occasionally, we may nudge the reported user a little further. In these cases, we include some introductory information in our message and also request that the person do a brief search on the topic as a condition of reinstating the account. The request usually looks a bit like this:
We ask that you do a light search on these topics. Only 5-15 min is fine.
The goal of this request is that we can all help make our community a safer place just by taking small steps to increase our awareness of others in our shared space.
If you agree to the above by filing an appeal, we will unfreeze your account. If you have further questions, please reach out to us at email@example.com.
– The Hachyderm Mods
The reason we ask for such a brief search is because we do not expect someone to become an expert overnight. We do expect that all of us take small steps together to learn and grow.
As a point of clarification: we only engage in the freeze and restore pattern if the reported situation does not warrant a more immediate and severe action, such as suspending the user from Hachyderm.
And that’s it for this month’s Moderator Minute! Please feel free to ask the moderation team any questions about the above, either using Hachyderm’s Hachyderm account, email, or our Community Issues. We’ll see you next month! ❤️
Stepping Down From Hachyderm
Recently I abruptly removed myself from the “Admin” position of Hachyderm. This has surfaced a number of threads about me, Hachyderm, and the broader Fediverse. Today I would like to offer an apology as well as provide some clarity. There are a lot of rumors going around and I want to address some of them.
Before I get into why I am stepping down as admin, I want to be crystal clear: Hachyderm allows mutual aid. Hachyderm has always allowed mutual aid.
There has never been a point in Hachyderm’s history when mutual aid was not allowed. What we have never allowed on Hachyderm is spam, which we see a lot of, including phishing. What we have changed positions on is corporate and organizational fundraising.
For us to say “Hachyderm does allow mutual aid,” and “Hachyderm does not allow spam or organizational fundraising,” requires the mods to define “Mutual Aid,” “Spam,” and “Organizational fundraising.” Once defined, Hachyderm needs to come up with policies for these concepts, and instruct moderators to manage them. We do not always get this right, and we rely on the community’s help to tell us when we get it wrong, and how we can be more precise in our language and our actions.
One of the things I’ve found difficult about engaging in moderation discussions on the Fediverse, is the inability to agree on the facts at hand. I’ve always been very willing to take accountability for my actions, and take feedback on how to improve and make things better for all of our users. But that’s not possible to do when we can’t even agree on the topic at hand, and when we don’t engage with each other. In this most recent incident, I’ve had to spend significantly more time re-stating that Hachyderm does support mutual aid, and less time focusing on improving our policies and communications, so that our anti-spam and anti-organizational fundraising policies don’t harm members of our community that we want to support.
I would be very happy to have conversations about how Hachyderm’s policy, language, and enforcement were wrong, caused harm, and need to be fixed. That is a conversation worth having. I was also happy to have conversations around our stance on organizational fundraising. But the conversations about how I personally “contribute to trans genocide because I don’t support mutual aid,” are untrue, hurtful, and in my opinion, unproductive. Yes, this is extra hurtful to me, as I have experienced homelessness, and my own dependency on mutual aid as a transgender person. This is dear to my heart.
In many cases, decisions by other admins and mods were made on the assumption that I don’t support mutual aid, and this has been hard for me to reconcile.
I want to address something else, as my departure has some folks in our Hachyderm community and the broader Fediverse concerned about who will lead moderation after I leave. Moderation at Hachyderm will continue to be led by our lead moderator, as it has always been. I have never been the moderator of Hachyderm. In fact, on several occasions, I was moderated by the moderation team, including being asked to remove my post on Capitalism. This is as it should be, and I accepted the decisions of the moderation team on each occasion. In a healthy community, no one is above the rules.
Regarding the post on BlueSky, my hope was to push BlueSky towards an open identity provider such that the rest of the Fediverse could leverage the work. This would address the problem with people not owning their own identity/authentication, which is something that is important to many Hachydermians. People are already asking about this, and the AT protocol in general.
Effectively my intention was to utilize BlueSky’s hype and resources on behalf of federated identity such that people can own their own identity. Mastodon also has open issues about this. At this point I am exhausted and am abandoning BlueSky as well. I am taking a break from all social media at this time to protect my mental health.
The Hachyderm service has grown unexpectedly and I have tried my best to build a strong organization to live on without me. I have always intended on stepping down such that the collective could continue without me. Part of relinquishing control involved slowly stepping back one position at a time. It is up to the collective now to manage the service moving forward, and I deeply believe there are wonderful people in place to manage the service.
A Minute from the Moderators
Hello and welcome to April! This month we’ll be reviewing the account verification process we rolled out as well as two more classic moderation topics: how to file a report and what to do if you’re moderated.
- Account Verification
- How to file outstanding moderation reports
- Meter yourself when filing reports
- When you’ve been moderated
Throughout the month of March we launched an account verification process and started circulating it. What does this mean, how do we use it, and what does it tell Hachydermians?
Mastodon account verification is like an identity service
Verification in the Mastodon context is similar to an ID verification service.
When you build your profile you have four fields that are labeled “profile metadata”. When you include a URL to a page that has a rel=me link back to your Mastodon profile, that URL is highlighted in green with a corresponding green checkmark. In that case, the URL is verified: confirming that the person who has control of the account also has control of the domain.
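To make the rel=me check more concrete, here is a minimal sketch of the idea in Python: parse a page and see whether it contains a link marked rel="me" that points at a given profile URL. This is only an illustration of the concept, not Mastodon's actual verification code; the names `RelMeCollector` and `links_back_to` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser


class RelMeCollector(HTMLParser):
    """Collect href values from <a> and <link> tags whose rel includes 'me'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "me noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "me" in rel_tokens and href:
            self.hrefs.append(href)


def links_back_to(page_html, profile_url):
    """Return True if the page has a rel=me link to profile_url.

    Hypothetical helper: compares URLs as plain strings, ignoring only
    a trailing slash. A real verifier may normalize URLs differently.
    """
    collector = RelMeCollector()
    collector.feed(page_html)
    wanted = profile_url.rstrip("/")
    return any(href.rstrip("/") == wanted for href in collector.hrefs)


# Example with made-up URLs:
page = '<p><a rel="me" href="https://hachyderm.io/@example">Find me on Mastodon</a></p>'
print(links_back_to(page, "https://hachyderm.io/@example"))
```

In practice the page would be fetched from the URL in the profile metadata field; the sketch takes the HTML as a string so that it stays self-contained.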
Hachyderm verification makes verification visible on an account profile
Since some specialized accounts are restricted on Hachyderm, we decided to make it more immediately visible which accounts are approved or not. As part of these discussions, we also extended the verification process to even non-restricted specialized accounts.
In order to verify, specialized accounts use the process outlined on our Account Verification page which includes agreeing to the Specialized Account Expectations and using our Community GitHub issues to submit the request. Once approved, we add their Hachyderm account to an approval page we created for this process. For an example of what the end result looks like, take a look at one of our first corporate accounts, Tailscale:
Specialized accounts should be verified
As a reminder, the only accounts we’re currently requiring to be verified are:
- Corporate accounts
- Bot accounts
- Curated accounts
That said, the account verification process is open to all specialized accounts. This includes but is not limited to: non-profits, conferences, meetups, working groups, and other “entity” based accounts.
Account verification is not open to individual users at this time. That said, if you are an independent contractor or similar type of individual / self-run business please read on.
We support small orgs, startups, self-run businesses, non-profits, etc.
Please email us at firstname.lastname@example.org if this applies to your account or an account you would like to create. This is the grey area for all accounts that due to size, model, or “newness” don’t fit cleanly into the account categories we’ve tried to create.
In particular, if you suspect you might fit our criteria for a corporate account but the pricing model would be a burden for you: please still reach out! We’re happy to help and try to figure something out.
How to file outstanding moderation reports
First of all: thank you to everyone for putting your trust in us and for sending reports our way. Reports on any given day or week can vary and include mixtures of spam, on and off server bad behavior, and so on. When you send reports our way, here are the main things to keep in mind so that your reports are effective.
Please see our Reporting and Communication doc, which details Hachyderm specific information, and our Report Feature doc, which shows what we see when we receive a report, for reference.
Always include a description with your own words
You should always include a description with your report. It can be as succinct as “spam” or more descriptive like “account is repeatedly following / unfollowing other users”. You should include a description even if the posts, when included, seem to speak for themselves. If you are reporting content in a language other than English, please supply translations for any dog whistles or other commentary that a translation site will likely miss in a word-for-word translation.
Mastodon also deletes posts from reports more than 30 days old. So in the event that we need to check on a user and/or domain that has been reported more than once, but infrequently, the added context can also help us capture information that is no longer present.
(Almost) Always include relevant posts
If you are reporting a user because of something they have posted, you should (almost) always include the posts themselves. When a post is reported, the post is saved in the report even if the user’s home instance deletes the posts. If the posts are not included, and the user and/or their instance mods delete the posts, then we have an empty report with no additional context.
Please feel free to use your best judgement when choosing to attach posts to a report or not. In the rare situation where you are reporting extreme content, especially with imagery, you can submit a report without posts but please ensure that you have included the context for what we can expect when we investigate the user and/or domain.
Be clear when you are forwarding a report (or not)
When you file reports for users that are off-server, you will have the option to forward the report to the user’s server admins. When a report is not forwarded, only the Hachyderm moderation team sees it. Reports forward to remote instance admins by default. If you are choosing not to forward a report for a remote user, please call it out in your comments. Although we can see when a report isn’t forwarded, the added visibility helps.
There will be times when a reported user’s infraction falls under the purview of their instance moderators and whatever server rules that user has agreed to and may be in violation of. Typically, we will only step in to moderate these situations when we need to de-federate with a remote user and/or instance completely.
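For anyone filing reports from a client or script instead of the web interface, the three habits above map onto fields of Mastodon's REST report endpoint (`POST /api/v1/reports`): `comment` for the description, `status_ids` for the attached posts, and `forward` for forwarding to the remote admins. The sketch below only builds the request payload and does not send anything. The helper name `build_report` and the IDs are hypothetical, and the field names reflect our reading of the public Mastodon API docs, so please check them against the version your instance runs.

```python
def build_report(account_id, comment, status_ids=(), forward=False):
    """Build a payload for POST /api/v1/reports (hypothetical helper).

    Enforces the advice above: a description in your own words is
    always required, posts are attached when available, and the
    forwarding choice is always stated explicitly.
    """
    comment = comment.strip()
    if not comment:
        raise ValueError("include a description, even one as short as 'spam'")

    payload = {
        "account_id": str(account_id),
        "comment": comment,
        "forward": bool(forward),
    }
    # Attach posts when there are any; extreme content may be omitted
    # as long as the comment explains what moderators should expect.
    if status_ids:
        payload["status_ids"] = [str(status_id) for status_id in status_ids]
    return payload


# Example with made-up IDs: report a remote account without forwarding,
# and say so in the comment as requested above.
report = build_report(
    account_id="109999",
    comment="Repeated follow / unfollow spam. Not forwarding to remote admins.",
    status_ids=["110001", "110002"],
    forward=False,
)
print(report)
```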
Meter yourself when filing reports
We appreciate everyone who takes the time to send us a report so we can work towards keeping the Hachyderm community safe. Make sure when you are doing so that you are being mindful of your own mental health as well. As a moderation team, we are able to load balance the reports that come through to protect us individually from burnout or from seeing content that can strongly, negatively, impact us on a personal level.
Even in situations where there is yet another damaging news cycle, which in turn creates a lot of downstream effects, individuals should avoid taking on what it takes a team to tackle. In these situations, please balance the reports you send with taking steps to separate yourself from continued exposure to that content. For tips and suggestions about how to do this, please see our March Moderator Minute and our Mental Health doc.
When you’ve been moderated
Being moderated is stressful! We understand and do our best to intervene only when required to maintain community safety or when accounts need to be nudged to be in alignment with rules for their account type and/or server rules.
For additional information on the below, please see both our Reporting and Communication doc and our Moderation Actions and Appeals doc.
Take warnings to heart, but they do not require an appeal
Warnings are only used as a way to communicate with you using the admin tools. They are not accrued like a “strike” system, where something happens if you exceed a certain number. Since we only send warnings when an account needs a nudge, either a small rule clarification or similar, they do not need to be appealed. Appeals to warnings will typically receive either no action or a rejection for this reason.
Always include your email when appealing an account restriction
If your account has been restricted in some way, e.g. either frozen or suspended, then you will need to file an appeal to open a dialogue for us to reverse that decision. You should always include how we can email you in your appeal: the admin UI does not let us respond to appeals. We can only accept (repeal) or reject (keep) the decision.
Let us know if we’ve made a mistake
If we’ve made an error in moderating your account: apologies! We do our best, but mistakes can and will happen. If your account has been restricted, please file an appeal the same as in the above: by including the error and your email so we can follow up with you as needed. Once we have the information we need we can reverse the error.
A Minute from the Moderators
We’ve been working hard to build out more of the Community Documentation to help everyone to create a wonderful experience on Hachyderm. For the past month, we’ve focused most heavily on our new How to Hachyderm section. The docs in this section are:
When you are looking at these sections, please be aware that the docs under the How to Hachyderm section are for the socialized norms around each topic and the subset of those norms that we moderate. Documentation around how to implement the features are both under our Mastodon docs section and on the main Mastodon docs. This is particularly relevant to our Content Warning sections: How To Hachyderm Content Warnings is about how content warnings are used here and on the Fediverse, whereas Mastodon User Interface Content Warnings is about where in the post composition UI you click to create a content warning.
Preserving your mental health
In our new Mental Health doc, we focus on ways that you can use the Mastodon tools for constraining content and other information. We structured the doc to answer two specific questions:
- How can people be empowered to set and maintain their own boundaries in a public space (the Fediverse)?
- What are the ways that people can toggle the default “opt-in”?
By default, social media like Mastodon / the Fediverse opts users in to all federating content. This includes posts, likes, and boosts. Depending on your needs, you may want to opt out of some subsets of that content either on a case-by-case basis, by topic, by source, or by type. Remember:
You can opt out of any content for any reason.
For example, you may want to opt out of displaying media by default because it is a frequent trigger. Perhaps the specific content warnings you need aren’t well socialized. Maybe you are sensitive to animated or moving media. That said, perhaps media isn’t a trigger - you just don’t like it. Regardless of your reason, you can change this setting (outlined in the doc) whenever you wish and as often as meets your needs.
Hashtags and Content Warnings
Our Hashtags and Content Warnings docs are to help Hachydermians better understand both what these features are and the social expectations around them. In both cases, there are some aspects of the feature that people have encountered before: hashtags in particular are very common in social media and content warnings mirror other features that obscure underlying text on sites like Reddit (depending on the subreddit) and tools like Discord.
Both of these features have nuance to how they’re used on the Fediverse that might be new for some. On the Fediverse, and on Hachyderm, there are “reserved hashtags”. These are hashtags that are intended only for a specific, narrow, use. The ones we moderate on Hachyderm are FediBlock, FediHire, and HachyBots. For more about this, please see the doc.
Content warnings are possibly less new in concept. The content warning doc focuses heavily on how to write an effective content warning. Effective content warnings are important as you are creating a situation for someone else to opt in to your content. This requires consent, specifically informed consent. A well written content warning should inform people of the difference between “spoilers”, “Doctor Who spoilers”, and “Doctor Who New Year’s Special Spoilers”. The art of crafting an effective content warning is balancing what information to include while also not making the content warning so transparent that the content warning is the post.
Notably, effective content warnings feature heavily in our Accessible Posting doc.
Our Accessible Posting doc is an introductory guide to different ways to improve inclusion. It is important to recognize there are two main constraints for this guide:
- It is an introductory guide
- The Mastodon tools
As an introductory guide, it does not cover all topics of accessibility. As a guide that focuses on Mastodon, the guide discusses the current Mastodon tools and how to fully utilize them.
As an introductory guide, our Accessibility doc primarily seeks to help users develop more situational awareness for why there are certain socialized patterns for hashtags, content warnings, and posting media. We, as moderators of Hachyderm, do not expect anyone to be an expert on any issue that the doc covers. Rather, we want to help inspire you to continue to learn about others unlike yourself and see ways that you can be an active participant in creating and maintaining a healthy, accessible, space on the Fediverse.
Content warnings feature heavily in this doc. The reason for this is that Mastodon is a very visual platform, so the main way that you connect with others who do not have the same experience of visual content is by supplying relevant information.
There will always be more to learn and more, and better, ways to build software. For those interested in improving the accessibility features of Mastodon, we recommend reviewing Mastodon’s CONTRIBUTING document.
More to come
We are always adding more docs! Please check the docs pages frequently for information that may be useful to you. If you have an idea for the docs, or wish to submit a PR for the docs, please do so on our Community repo on GitHub.
April will mark one month since we launched the Nivenly Foundation, Hachyderm’s parent org. Nivenly’s website is continuing to be updated with information about how to sponsor or become a member. For more information about Nivenly, please see Nivenly’s Hello World blog post.
The creation of Nivenly also allowed us to start taking donations for Hachyderm and sell swag. If you are interested in donating, please use either our GitHub Sponsors or one of the other methods that we outline on our Thank You doc. For Hachyderm swag, please check out Nivenly’s swag store.
Decaf Ko-Fi: Launching GitHub Sponsors et al
Since our massive growth at the end of last year, many of you have asked about ways to donate beyond Nóva’s Ko-Fi. There were a few limitations there, notably the need to create an account in order to donate. There were also a few milestones we needed to hit before we could do this properly: notably, we needed to have an EIN in order to properly receive donations and pay for services (as an entity).
Well that time has come! Read on to learn about how you can support Hachyderm either directly or via Hachyderm’s parent organization, the Nivenly Foundation.
First things first: GitHub Sponsors
Actual Octocat from our approval email
As of today the Hachyderm GitHub Sponsors page is up and accepting donations! Using GitHub Sponsors you can add a custom amount and donate either once or monthly. There are a couple of donation tiers that you can choose from as well if you are interested in shoutouts / thank yous either on Hachyderm or on our Funding and Thank You page. In both cases we’d use your GitHub handle for the shoutout.
The shoutouts and Thank You page
#ThankYouThursday is a hashtag we’re creating today to thank users for their contributions. Most posts for #ThankYouThursday happen on Hachyderm’s Hachyderm account, but higher donations will be eligible for shoutouts on Kris Nóva’s Hachyderm account.
- $7/mo. and higher
- Get a sponsor badge on your GitHub profile
- $25/mo. and higher or $100 one-time and higher
- Get a sponsor badge on your GitHub profile
- Get a shoutout on the Hachyderm account’s quarterly #ThankYouThursday
- $50/mo. and higher or $300 one-time and higher
- Get a sponsor badge on your GitHub profile
- Get a shoutout on Kris Nóva’s account’s quarterly #ThankYouThursday
- $1000 one-time and higher
- Get a sponsor badge on your GitHub profile
- Get a shoutout on the Hachyderm account’s quarterly #ThankYouThursday
- Be added to the Thank You List on our Funding page
- $2500 one-time and higher
- Get a sponsor badge on your GitHub profile
- Get a shoutout on Kris Nóva’s quarterly #ThankYouThursday
(All above pricing in USD.)
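As a quick way to read the tiers above, here is a small Python lookup that transcribes the list exactly as written and returns the perks for the highest tier a donation reaches. It is only a reading aid: the names (`perks_for`, `MONTHLY_TIERS`, `ONE_TIME_TIERS`) are hypothetical, and the sketch assumes each tier grants only the perks listed under it, since the list does not say whether higher tiers inherit perks from lower ones.

```python
BADGE = "Sponsor badge on your GitHub profile"
HACHYDERM_SHOUTOUT = "Shoutout on the Hachyderm account's quarterly #ThankYouThursday"
NOVA_SHOUTOUT = "Shoutout on Kris Nóva's quarterly #ThankYouThursday"
THANK_YOU_LIST = "Added to the Thank You List on the Funding page"

# (minimum amount in USD, perks), highest tier first, as listed above.
MONTHLY_TIERS = [
    (50, [BADGE, NOVA_SHOUTOUT]),
    (25, [BADGE, HACHYDERM_SHOUTOUT]),
    (7, [BADGE]),
]
ONE_TIME_TIERS = [
    (2500, [BADGE, NOVA_SHOUTOUT]),
    (1000, [BADGE, HACHYDERM_SHOUTOUT, THANK_YOU_LIST]),
    (300, [BADGE, NOVA_SHOUTOUT]),
    (100, [BADGE, HACHYDERM_SHOUTOUT]),
]


def perks_for(amount_usd, monthly):
    """Return the perks of the highest tier the donation reaches."""
    tiers = MONTHLY_TIERS if monthly else ONE_TIME_TIERS
    for minimum, perks in tiers:
        if amount_usd >= minimum:
            return perks
    return []  # below the lowest listed tier


print(perks_for(30, monthly=True))
print(perks_for(1000, monthly=False))
```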
A couple of important things about the above:
- All public announcements are optional. You can choose to opt-out by having your donation set to private.
- By default we’ll use your GitHub handle for shoutouts. This is easier than reconciling GitHub and Hachyderm handles.
- We may adjust the tiers to make the Thank Yous more frequent.
Right now the above tiers are our best guess, but we may edit the #ThankYouThursday thresholds in particular so that we can keep a sustainable cadence. Thank you for your patience and understanding with this ❤️
And now an update for the Nivenly Foundation
For those who don’t know: the Nivenly Foundation is the non-profit co-op that we’re founding for Hachyderm and other open source projects like Aurae. The big milestone we reached here is that 1) we’re an official non-profit with the State of Washington and 2) we have a nice, shiny, EIN which allowed us to start accepting donations to both the Nivenly Foundation as well as its two projects: Aurae and Hachyderm. For visibility, here are all the GitHub sponsor links in one place:
It is also possible to give a custom one-time donation to Nivenly via Stripe:
Right now only donations are open for Nivenly, Aurae, and Hachyderm. After we finalize Nivenly’s launch, Nivenly memberships will also be available for individuals, maintainers, and what we call trade memberships for companies, businesses, and business-like entities.
What do Nivenly Memberships mean for donations?
Right now, donations and memberships are separate. That means that you can donate to Hachyderm and, once available, join Nivenly as two separate steps. As Nivenly’s largest project, providing governance and funding for Hachyderm uses almost all of Nivenly’s donations. As we grow and include more projects this is likely to shift over time. As such, we are spinning up an Open Collective page for Nivenly that will manage the memberships and also provide a way for us to be transparent about our budget as we grow. Our next two big milestones:
- What you’ve all been waiting for: the public release of the governance model (almost complete)
- What we definitely need: the finalization of our 501(c)3 paperwork with the IRS (in progress)
As we grow we’ll continue to post updates. Thank you all so much for your patience and participation 💕
P.S. and update: What’s happening with Ko-fi?
We are currently moving away from Kris Nóva’s Ko-fi as a funding source for Nivenly and Hachyderm et al. We’ve created a new Ko-fi account for the Nivenly Foundation itself:
Kris Nóva’s Ko-fi is still live to give people time to migrate Nivenly-specific donations (including those for Hachyderm and Aurae) from her Ko-fi to GitHub Sponsors, Nivenly’s Ko-fi, Stripe, or a Nivenly co-op general membership via Nivenly’s Open Collective page as those become ready (which should be soon). We’ll still be using Nivenly-specific funds from her Ko-fi for Nivenly for the next 30-60 days and will follow up with an update as we wind down that (manual 😅) process.
Growth and Sustainability
Thank you to everyone who has been patient with Hachyderm as we have had to make some adjustments to how we do things. Finding ourselves launched into scale has impacted our people more than it has impacted our systems.
I wanted to provide some visibility into our intentions with Hachyderm, our priorities, and immediate initiatives.
We intend on offering transparency reports similar to the November Transparency Report from SFBA Social. It will take us several weeks before we will be able to publish our first one.
The immediate numbers from the administration dashboard are below.
On January 1st, 2023 we will be changing our financial model.
Hachyderm has been operating successfully since April of 2022 by funding our infrastructure from the proceeds of Kris Nóva’s Twitch presence.
In January 2023 we will be rolling out a new financial model intended to be sustainable and transparent for our users. We will be looking into donation and subscription models such as Patreon at that time.
From now until the end of the year, Hachyderm will continue to operate using the proceeds of Kris Nóva’s Twitch streams, and our donations through the ko-fi donation page.
We are considering forming a legal entity to control Hachyderm in January 2023.
At this time we are not considering a for-profit corporation for Hachyderm.
The exact details of our decision will be announced as we come to conviction and seek legal advice.
At this time we do not have any plans to “cap” or limit user registration for Hachyderm.
There is a small chance we might temporarily close registration for short, limited periods of time during events such as the DDoS Security Threat.
To be clear, we do not plan on rolling out a formal registration closure for any substantial or planned period of time. Any closure will be as short as possible, and will be opened up as soon as it is safe to do so.
We will be reevaluating this decision continuously. If at any point Hachyderm becomes bloated or unreasonably large we will likely change our decision.
User Registration and Performance
At this time we do not believe that user registration will have an immediate or noticeable impact on the performance of our systems. We do not believe that closing registration will somehow “make Hachyderm faster” or “make the service more reliable”.
We will be reevaluating this decision continuously. If at any point the growth patterns of Hachyderm change we will likely change our decision.
Call for Volunteers
We will be onboarding new moderators and operators in January to help with our service. To help with that, we have created a short Typeform to consolidate all the volunteer offers so it is easier for us to reach back out to you when we’re ready:
The existing teams will be spending the rest of December cleaning up documentation, and building out this community resource in a way that makes it easy for newcomers to be self-sufficient with our services.
As moderators and infrastructure teams reach a point of sustainability, each will announce the path forward for volunteers when they feel the time is right.
The announcements page on this website will be the source of truth.
Our Promise to Our Users
Hachyderm has signed The Mastodon Server Covenant which means we have given our commitment to give users at least 3 months of advance warning in case of shutting down.
My personal promise is that I will do everything in my power to support our users any way I can that does not jeopardize the safety of other users or myself.
We will be forming a broader set of governance and expectation setting for our users as we mature our services and documentation.
I wanted to share a few thoughts on sustainability with Hachyderm.
Part of creating a sustainable service for our users will involve participation from everyone. We are asking that all Hachydermians remind themselves that time, patience, and empathy are some of the most effective ways to create sustainable services.
There will be some situations where we will have to make difficult decisions with regard to priority. Often times the reason we aren’t immediately responding to an issue isn’t because we are ignoring the issue or oblivious to it. It is because we have to spend our time and effort wisely in order to keep a sustainable posture for the service. We ask for patience as it will sometimes take days or weeks to respond to issues, especially during production infrastructure issues.
We ask that everyone remind themselves that pressuring our teams is likely counterproductive to creating a sustainable environment.
Leaving the Basement
This post has taken several weeks to compile. My hope is that it captures the vast majority of questions people have been asking recently with regard to Hachyderm.
To begin, I will introduce the state of Hachyderm before the migration, along with the problems we were experiencing. Next, I will cover the root causes of the problems and how we found them. Finally, I will discuss the migration strategy, the problems we experienced, what we got right, and what can be better. I will end with an accurate depiction of how Hachyderm exists today.
State of Hachyderm: Before
Hachyderm obtained roughly 30,000 users in 30 days; or roughly 1 new user every 1.5 minutes for the duration of the month of November.
I wrote 3 Medium articles during the month, each with the assumption that it would be my last for the month.
- November 3rd, 720 users Operating Mastodon, Privacy, and Content
- November 13th, 6,000 users Hachyderm Infrastructure
- November 25th, 25,000 users Experimenting with Federation and Migrating Accounts
Here are the servers that were hosting Hachyderm in the rack in my basement, which later became known as “The Watertower”.
| Hardware | DELL PowerEdge R630 2x Intel Xeon E5-2680 v3 | DELL PowerEdge R620 2x Intel Xeon E5-2670 | DELL PowerEdge R620 2x Intel Xeon E5-2670 | DELL PowerEdge R620 2x Intel Xeon E5-2670 |
| Compute | 48 Cores (each 12 cores, 24 threads) | 32 Cores (each 8 cores, 16 threads) | 32 Cores (each 8 cores, 16 threads) | 32 Cores (each 8 cores, 16 threads) |
| Memory | 128 GB RAM | 64 GB RAM | 64 GB RAM | 64 GB RAM |
| Network | 4x 10Gbps Base-T 2x | 4x 1Gbps Base-T (intel I350) | 4x 1Gbps Base-T (intel I350) | 4x 1Gbps Base-T (intel I350) |
| SSDs | 238 GiB (sda/sdb) 4x 931 GiB (sdc/sdd/sde/sdf) 2x 1.86 TiB (sdg/sdh) | 558 GiB Harddrive (sda/sdb) | 558 GiB Harddrive (sda/sdb) | 558 GiB Harddrive (sda/sdb) |
It is important to note that all of the servers are used hardware, and all of the drives are SSDs.
“The Watertower” sat behind a few pieces of network hardware, including a large business fiber connection in Seattle, WA. Here are the traffic patterns we measured during November, and the advertised limitations from our ISP.
| Egress Advertised | Egress in Practice | Ingress Advertised | Ingress in Practice |
|---|---|---|---|
| 200 Mbps | 217 Mbps | 1 Gbps | 112 Mbps |
Our busiest traffic day was 11/21/22 where we processed 999.80 GiB in RX/TX traffic in a single day. During the month of November we averaged 36.86 Mbps in traffic with samples taken every hour.
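To put the busiest day in context, the daily total can be converted into an average bit rate and compared with the monthly average. The snippet below is a rough back-of-the-envelope check, assuming the 999.80 GiB figure is in binary gibibytes and was spread evenly over 24 hours (real traffic is bursty, so short-term peaks would have been higher). The function name is made up for illustration.

```python
def gib_per_day_to_mbps(gib: float) -> float:
    """Average rate in megabits per second for `gib` GiB moved in 24 hours."""
    bits = gib * 1024**3 * 8          # GiB -> bytes -> bits
    seconds_per_day = 24 * 60 * 60
    return bits / seconds_per_day / 1_000_000


busiest_day_mbps = gib_per_day_to_mbps(999.80)
monthly_average_mbps = 36.86          # figure quoted in the post

print(f"busiest day average: {busiest_day_mbps:.1f} Mbps")
print(f"ratio to monthly average: {busiest_day_mbps / monthly_average_mbps:.1f}x")
```

Under those assumptions the busiest day works out to roughly 99 Mbps sustained, about 2.7 times the monthly average.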
The server service layout is detailed below.
Problems in Production
For the vast majority of November, Hachyderm had been stable. Most users reported excellent experience, and our systems remained relatively healthy.
On November 27th, I filed the 1st of what would become 21 changelogs for our production infrastructure.
The initial report was failing images in production. The initial investigation led our team to discover that our NFS clients were behaving unreasonably slowly.
We were able to prove that NFS was “slow” by trying to navigate to a mounted directory and list files. In the best cases results would come back in less than a second. In the worst cases results would take 10-20 seconds. In some cases the server would lock up and a new shell would need to be established; NFS would never return.
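The manual check described above (change into a mounted directory, list it, see how long it takes) can be turned into a small repeatable probe. This is only a sketch of the idea, not the tooling the team used: the function name and the timeout are assumptions. The listing runs in a daemon thread so that a hung mount, where the call never returns, does not hang the probe itself.

```python
import os
import threading
import time


def time_listing(path, timeout_s=30.0):
    """Return the seconds taken to list `path`, or None if it did not finish in time."""
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.listdir(path)
            result["elapsed"] = time.monotonic() - start
        except OSError as exc:
            result["error"] = exc

    # Daemon thread: if the mount is hung, the thread is abandoned
    # instead of blocking interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return None                   # the listing never came back
    if "error" in result:
        raise result["error"]
    return result["elapsed"]
```

Calling `time_listing` on a mount point once a minute and logging the result would give a time series to line up against other alerts. A return value of `None` corresponds to the "NFS would never return" case.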
I filed a changelog, and mutated production. This became the first minor change in a week-long crisis to evacuate the basement.
We were unable to fix the perceived slowness with NFS with my first change.
However we did determine that we had scaled our compute nodes very high in the process of investigating NFS. Load averages on Yakko, Wakko, and Dot were well above 1,000 at this time.
Yakko, Wakko, and Dot were each housing multiple systemd units for our ingress, default, push, pull, and mailing queues, as well as the puma web server hosting Mastodon itself.
At this point Alice was serving our media over NFS, and was also running Postgres, Redis, and a lightweight Nginx proxy to load balance across the Animaniacs (Yakko, Wakko, and Dot).
The problems began to cascade the night of the 27th, and continued to grow worse by the hour into the night.
- HTTP(s) response times began to degrade.
- Postgres response times began to degrade.
- NFS was still measurably slow on the client side.
The main observation was that the service would “flap”, almost as if it was deliberately toying with our psychology and our hope.
We would see long periods of “acceptable” performance when the site would “settle down”. Then, without warning, our alerts would begin to go off.
Hachyderm hosts a network of edge or point of presence (PoP) nodes that serve as a frontend caching mechanism in front of core.
During the “spikes” of failure, the edge Nginx logs began to record “Connection refused” messages.
The trend of “flapping” availability continued into the night. The service would recover and level out, then we would see a spike in 5XX-level responses, and then ultimately a complete outage on the edge.
This continued for several days.
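One way to make this kind of flapping visible is to bucket access-log lines by minute and track the share of 5XX responses in each bucket. The sketch below assumes Nginx's default "combined" access-log format, which the post does not confirm, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Captures the timestamp truncated to the minute and the status code from a
# line in Nginx's default "combined" format (an assumption about the format).
LINE_RE = re.compile(r'\[(?P<minute>[^\]]{17})[^\]]*\] "[^"]*" (?P<status>\d{3}) ')


def error_rate_by_minute(lines):
    """Map each minute to the fraction of its requests that returned 5XX."""
    totals, errors = Counter(), Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue                  # skip lines that are not access-log entries
        minute = match["minute"]
        totals[minute] += 1
        if match["status"].startswith("5"):
            errors[minute] += 1
    return {minute: errors[minute] / totals[minute] for minute in totals}


# Hypothetical sample lines, not real Hachyderm logs.
SAMPLE = [
    '203.0.113.7 - - [27/Nov/2022:23:14:05 -0800] "GET /api/v1/timelines/home HTTP/1.1" 200 512 "-" "curl/7.85.0"',
    '203.0.113.7 - - [27/Nov/2022:23:15:01 -0800] "GET / HTTP/1.1" 502 157 "-" "curl/7.85.0"',
    '203.0.113.8 - - [27/Nov/2022:23:15:40 -0800] "GET / HTTP/1.1" 200 1024 "-" "curl/7.85.0"',
]
rates = error_rate_by_minute(SAMPLE)
```

A service that flaps would show long runs of minutes near 0.0 interrupted by bursts approaching 1.0.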
A Note on Empathy
It is important to note that Hachyderm had grown organically over the month of November. Every log that was being captured, every graph that was consuming data, every secret, every config file, every bash script – all – were a consequence of reacting to the “problem” of growth and adoption.
I call this out, because this is very akin to most of the production systems I deal with in my career. It is important to have empathy for the systems and the people who work on them. Every large production is a consequence of luck. This means that something happened that caused human beings to flock to your service.
I am a firm believer that no system is ever “designed” for the consequences of high adoption. This is especially true with regard to Mastodon, as most of our team has never operated a production Mastodon instance before. To be candid, it would appear that most of the internet is in a similar situation.
We are all experimenting here. Hachyderm was just “lucky” to see adoption.
There is no such thing as both a mechanistic and highly adopted system. All systems that are a consequence of growth will be organic, and prone to the symptoms of reactive operations.
In other words, every ugly system is also a successful system. Every beautiful system has never seen spontaneous adoption.
Finding Root Causes
By the 3rd day we had roughly 20 changelogs filed.
Each changelog captured the story of a highly motivated and extremely hopeful member of the team believing they had once and for all identified the bottleneck. Each ultimately failed to stop the flapping of Hachyderm.
I cannot say enough good things about the team who worked around the clock on Hachyderm. In many cases we were sleeping for 4 hours a night, and bringing our laptops to bed with us.
- @Quintessence wins the “Universe’s best incident commander” award.
- @Taniwha wins the “Best late night hacker and cyber detective” award.
- @hazelweakly wins the “Extreme research and googling cyberhacker” award.
- @malte wins the “Best architect and most likely to remain calm in a crisis” award.
- @dma wins the “Best scientist and graph enthusiast” award.
After all of our research, science, and detection work we had narrowed down our problem to 2 disks on Alice.
/dev/sdg # 2Tb "new" drive
/dev/sdh # 2Tb "new" drive
The I/O utilization on these two particular drives would max out at 100% a few moments before the cascading failure in the rack would begin. We had successfully identified the “root cause” of our production problems.
Here is a graphic that captures the moment well. Screenshot taken from 2am Pacific on November 30th, roughly 3 days after production began to intermittently fail.
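The post does not say which tool produced the disk measurements. As an illustration of what "100%" means for a drive, the sketch below derives a utilization percentage the way tools such as iostat do: take two snapshots of Linux's /proc/diskstats and divide the growth in the "time spent doing I/Os" counter by the elapsed wall-clock time. The snapshot values are made up, and the function names are hypothetical.

```python
def io_ticks_ms(diskstats_text: str, device: str) -> int:
    """Return the 'time spent doing I/Os' counter (milliseconds) for `device`."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the per-device counters.
        # The tenth counter (index 12) is milliseconds spent doing I/O.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def utilization_percent(before: str, after: str, device: str, interval_s: float) -> float:
    """Share of the interval during which the device had I/O in flight."""
    busy_ms = io_ticks_ms(after, device) - io_ticks_ms(before, device)
    return 100.0 * busy_ms / (interval_s * 1000.0)


# Hypothetical snapshots taken 5 seconds apart (not real measurements).
BEFORE = "   8  96 sdg 1000 0 8000 500 2000 0 16000 900 0 40000 1400"
AFTER = "   8  96 sdg 1500 0 12000 4200 2600 0 20800 5100 12 45000 14000"

util = utilization_percent(BEFORE, AFTER, "sdg", 5.0)
```

In this made-up example the counter grows by 5,000 ms over a 5 second window, so the drive was busy for the whole interval. On a live host the two snapshots would come from reading /proc/diskstats twice with a sleep in between.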
It is important to note that our entire production system was dependent on these 2 disks, as well as on our ZFS pool, which was managing the data on the disks:
[novix@alice]: ~>$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
dev                         63G     0   63G   0% /dev
run                         63G  1.7G   62G   3% /run
/dev/sda3                  228G  149G   68G  69% /
tmpfs                       63G  808K   63G   1% /dev/shm
tmpfs                       63G   11G   53G  16% /tmp
/dev/sdb1                  234G  4.6G  218G   3% /home
/dev/sda1                 1022M  288K 1022M   1% /boot/EFI
data/novix                 482G  6.5G  475G   2% /home/novix
data                       477G  1.5G  475G   1% /data
data/mastodon-home         643G  168G  475G  27% /var/lib/mastodon
data/mastodon-postgresql   568G   93G  475G  17% /var/lib/postgres/data
data/mastodon-storage      1.4T  929G  475G  67% /var/lib/mastodon/public/system
tmpfs                       10G  7.5G  2.6G  75% /var/log
Both our main media block storage and our main Postgres database were housed on ZFS at the time. The more we tested the theory, the more we could correlate slow disks to slow database responses and slow media storage. Eventually our compute servers and web servers would max out our connection pool against the database and time out. Eventually our web servers would overload the media server and time out.
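The link between slow disks and an exhausted connection pool follows from Little's law: the average number of connections in use equals the query arrival rate multiplied by the time each query holds a connection. The numbers below are purely hypothetical, chosen only to show the shape of the effect, and are not measurements from Hachyderm.

```python
def connections_in_use(queries_per_second: float, query_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in the system."""
    return queries_per_second * query_seconds


POOL_SIZE = 100             # hypothetical pool limit
QUERIES_PER_SECOND = 200    # hypothetical steady query rate

for label, latency_s in [("healthy disk", 0.01), ("saturated disk", 2.0)]:
    needed = connections_in_use(QUERIES_PER_SECOND, latency_s)
    state = "fits in the pool" if needed <= POOL_SIZE else "pool exhausted, callers queue and time out"
    print(f"{label}: ~{needed:.0f} connections needed ({state})")
```

With the request rate held constant, a jump in per-query latency alone is enough to push demand past the pool size, which matches the observation that the failures tracked disk saturation and not user growth.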
The timeouts would cascade out to the edge nodes and eventually cause:
- 5XX responses in production.
- Users hitting the “submit” button as our HTTP(s) servers would hang “incomplete” resulting in duplicate posts.
- Connection refused errors for every hop in our systems.
We had found the root cause. Our disks on Alice were failing.
Migration 1: Digital Ocean
We had made the decision to evacuate The Watertower and migrate to Hetzner weeks prior to the incident. However it was becoming obvious that our “slow and steady” approach to setting up picture-perfect infrastructure in Hetzner wasn’t going to happen.
We needed off Alice, and we needed off now.
A few notable caveats about leaving The Watertower.
- Transferring data off The Watertower was going to take several days with the current available speed of the disks.
- We were fairly confident that shutting down production for several days wasn’t an option.
- Our main problem was getting data off the disks.
Unexpectedly I received a phone call from an old colleague of mine @Gabe Monroy at Digital Ocean. Gabe offered to support Hachyderm altruistically and was able to offer the solution of moving our block storage to Digital Ocean Spaces for object storage.
Thank you to Gabe Monroy, Ado Kukic, and Daniel Hix for helping us with this path forward! Hachyderm will forever be grateful for your support!
There was one concern: how were we going to transfer over 1 TB of data to Digital Ocean on already failing disks?
One of our infrastructure volunteers @malte had helped us come up with an extremely clever solution to the problem.
We could leverage Hachyderm’s users to help us perform the most meaningful work first.
Solution: NGINX try_files
Malte’s model was simple:
- We begin writing data that is cached in our edge nodes directly to the object store instead of back to Alice.
- As users access data, we can ensure that it will be taken off Alice and delivered to the user.
- We can then leverage Mastodon’s S3 feature to write the “hot” data directly back to Digital Ocean using a reverse Nginx proxy.
We can point the try_files directive back to Alice, and only serve the files from Alice once, as they would be written back to S3 by the edge node accessing the files. See the try_files documentation.
In other words, the more that our users accessed Hachyderm, the faster our data would replicate to Digital Ocean. Conveniently this also meant that we would copy the data that was being immediately used first.
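The flow can be modeled in a few lines. This is a Python model of the idea only, not the Nginx configuration that was deployed: two dictionaries stand in for the old server and the object store, and the class and attribute names are invented for illustration.

```python
class LazyMigratingStore:
    """Serve from the new store first; on a miss, read the old origin once and copy the file over."""

    def __init__(self, origin: dict, object_store: dict):
        self.origin = origin              # stands in for media on the old server
        self.object_store = object_store  # stands in for the S3-compatible bucket
        self.origin_reads = 0             # how often the old disks were touched

    def get(self, key: str) -> bytes:
        if key in self.object_store:
            return self.object_store[key]
        data = self.origin[key]           # raises KeyError if the file exists nowhere
        self.origin_reads += 1
        self.object_store[key] = data     # write back so the next read skips the origin
        return data


store = LazyMigratingStore({"a.png": b"A", "b.png": b"B"}, {})
store.get("a.png")   # first access reads the origin and copies the file
store.get("a.png")   # second access is served from the object store
```

After these two calls the origin has been read exactly once, and `b.png`, which nobody requested, is still only on the origin. That cold remainder is what a background bulk copy has to pick up.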
We could additionally run a slow rclone for the remaining data that is still running 2+ days later as I write this blog post.
This was the most impressive solution I have seen to a crisis problem in my history of operating distributed systems. Our users were able to help us transfer our data to Digital Ocean just by using the service. The more they used Hachyderm, the more we migrated off Alice’s bad disks.
Migration 2: Hetzner
By the time the change had been in production for a few hours, we all had noticed a substantial increase in our performance. We were able to remove NFS from the system, and shuffle around our Puma servers, and sidekiq queues to reduce load on Postgres.
Alice was serving files from the bad disks, however all of our writes were now going to Digital Ocean.
While our system’s performance did “improve”, it was still far from perfect. HTTP(s) requests were still very slow, and in some cases would time out and flap.
At this point it was easy to determine that Postgres (and its relationship to the bad disks) was the next bottleneck in the system.
Note: We still have an outstanding theory that ZFS, specifically the unbalanced mirrors, is also a contributing factor. We will not be able to validate this theory until the service is completely off Alice.
It would be slightly more challenging coming up with a clever solution to get Postgres off Alice.
On the morning of December 1st we finished replicating our postgres data across the Atlantic onto our new fleet of servers in Hetzner.
- Nixie (Alice replacement)
- Freud (Yakko)
- Fritz (Wakko)
- Franz (Dot)
We will be publishing a detailed architecture on the current system in Hetzner as we have time to finalize it.
Our team made an announcement that we were shutting production down, and scheduled a live stream to perform the work.
The video of the cutover is available to watch directly on Twitch.
NodeJS and Mastodon
The migration would not be complete without calling out that I was unable to build the Mastodon code base on our new primary Puma HTTP server.
After what felt like an eternity we discovered that we needed to recompile the NodeJS assets.
cd /var/lib/mastodon
NODE_OPTIONS=--openssl-legacy-provider RAILS_ENV=production bundle exec rails assets:precompile
Eventually we were able to build and bring up the Puma server which was connected to the new postgres server.
We moved our worker queues over to the new servers in Hetzner.
The migration was complete.
State of Hachyderm: After
To be candid, Hachyderm “just works” now and we are serving our core content within the EU in Germany.
There is an ever-shrinking amount of traffic still routing through Alice as our users begin to access more and more obscure files.
Today we have roughly 700 GB out of 1.2 TB of data transferred to Digital Ocean.
We will be destroying the ZFS server in Alice, and replacing the disks as soon as we can completely take The Watertower offline.
On our list of items to cover moving forward:
- Offer a detailed public resource of our architecture in Hetzner complete with Mastodon specific service breakdowns.
- Build a blog and community resource such that we can begin documenting our community and bringing new volunteers on board.
- Take a break, and install better monitoring on our systems.
- Migrate to NixOS or Kubernetes depending on the needs of the system.
- Get back to working on Aurae, now with a lot more product requirements than we had before.
We suffered from pretty common pitfalls in our system. Our main operational problems stemmed from scaling humans, and not our knowledge of how to build effective distributed systems. We have observability, security, and infrastructure experts from across Silicon Valley working on Hachyderm and we were still SSHing into production and sharing passwords.
In other words, our main limitations to scale were managing people, processes, and organizational challenges. Even determining who was responsible for what was a problem in itself.
We had a team of experts without any formal precedent working together, and no legal structure or corporate organization to glue us together. We defaulted back to some bad habits in a pinch, and also uncovered some exciting new patterns that were only made possible because of the constraints of the fediverse.
Ultimately, I and the entire team are convinced that the future of the internet and social media lies in large collaborative operational systems that operate in a decentralized network.
We made some decisions during the process, such as keeping registrations open, that I agree with. I think I would make the same decisions again. Our limiting factor in Hachyderm had far less to do with the number of users accessing the system than with the amount of data we were federating. Our system would have flapped if we had 100 users, or if we had 1,000,000 users. We were nowhere close to hitting limits of DB size, storage size, or network capacity. We just had bad disks.
I think the biggest risk we took was onboarding new people to a slow/unresponsive service for a few days. I am willing to accept that as a good decision as we are openly experimenting with this entire process.
I have said it before, and I will say it again. I believe in Hachyderm. I believe we can build a beautiful and effective social media service for the tech industry.
The key to success will be how well we experiment. This is the time for good old-fashioned computer science, complete with thoughtful hypotheses and detailed observability to validate them.