Posts in 2023

  • Security incident: Redis cache exposed to public internet

    Sunday, July 16, 2023 in Incidents

    Background: Between July 9 and July 16, 2023, one of Hachyderm’s Redis cache servers was exposed to the public internet. On July 16, 2023, the Hachyderm Infrastructure team identified a misconfiguration of our firewall on the cache server which …

    Read more

  • Moderation Postmortem

    Tuesday, May 02, 2023 in Incidents

    Hello Hachydermians! There has been a lot of confusion this week, so we’re writing up this blog post to be both a postmortem of sorts and a single source of truth. This is partly to combat some of the problems generated by hearsay: hearsay generates …

    Read more

  • TLS Expires: media.hachyderm.io

    Monday, February 27, 2023 in Incidents

    On February 28th, 2023 at approximately 01:55 UTC Hachyderm experienced a service degradation in which images failed to load in production. We were able to quickly identify the root cause as expired TLS certificates in production for …

    Read more

  • Fritz Timeouts

    Saturday, January 07, 2023 in Incidents

    On January 7th, 2023 at approximately 22:26 UTC Hachyderm experienced a spike in HTTP response times as well as a spike in 504 Timeouts across the CDN. Working backwards from the CDN to fritz, we discovered another cascading failure. Context: There is …

    Read more

  • Fritz on the fritz

    Tuesday, January 03, 2023 in Incidents

    On January 3rd, 2023 at approximately 12:30 UTC Hachyderm experienced a spike in response times. This appeared to be due to a certificate that had not been renewed on fritz, which runs the Mastodon Puma and Streaming services. The service appeared to …

    Read more

Posts in 2022

  • The Queues ☃️ down in Queueville

    Tuesday, December 20, 2022 in Incidents

    Every Queue down in Queueville liked ActivityPub a lot. But John Mastodon, who lived just north of Queueville, did not! John Mastodon hated ActivityPub, the whole Activity season! Now please don’t ask why. No one quite knows the reason. It could …

    Read more

  • Degraded Service: Media Caching and Queue Latency

    Sunday, December 18, 2022 in Incidents

    On Saturday, December 17th, 2022 at roughly 12:43 UTC Hachyderm received our first report of media failures, which started a two-day investigation of our systems by @hazelweakly, @quintessence, @dma, and @nova. The investigation coincidentally …

    Read more

  • Global Outage: 504 Timeouts

    Tuesday, December 13, 2022 in Incidents

    On Tuesday, December 13th, 2022 at roughly 18:52 UTC Hachyderm experienced a 7-minute cascading failure that impacted our users around the globe, resulting in unresponsive HTTP(S) requests and 5XX-level responses. The service has not experienced …

    Read more