Mastodon's Implementation of FTS is a Shitshow

Author's note: If you have no idea what Mastodon is, here's some info.

Also note: I have been called a professional before, and have been doing things like deploying JVM shit and other Godawful web apps for 11 years as of this writing, but don't take my word for it blindly. I'm sometimes kind of an idiot. Think for yourself.

Mastodon 2.3.0 has a long-awaited feature: searching your own toots.

Great! Except Gargron completely fucked everything up. I'm not even going to mince words or be nice on this one. As an admin, and president of a small non-profit, I simply do not have the resources to run this new feature. The reason is simple: ElasticSearch.

ElasticSearch... really?

Why on Earth did he choose a platform literally designed for data warehousing? It's the engine behind Kibana, a platform literally designed to invade people's privacy for corporate gain.

That in and of itself doesn't make it bad, but it gives admins something dangerous: power. A lot more than people realise. It's also designed for the enterprise and big companies like Facebook, who can just throw more resources at things than your average person can.

ElasticSearch is a resource hog

ElasticSearch recommends 64 GB of RAM, or at least 32 GB. This is well beyond the reach of your average Masto admin, in terms of costs. The only people who can truly afford to run it correctly are instances like mastodon.social which already get considerable donations, instances whose admins are already working in hosting and have infrastructure to spare, or those with deep pockets.

I am none of those things. I'm just president of a small non-profit that owns an instance. We don't even have 16 GB to blow on this, not when that same amount of RAM could go to 4 Mastodon instances (sadly, yeah, that's... about it).

Sure, I could run it in less, but I've run JVM apps before. I know this thing isn't gonna run right in less. It might appear to be fine, but given enough time, it will eventually lag, followed by no longer responding. This thing will need babysitting for most admins. Certainly, this will not run on a $12/month VPS from Linode that is also running the resource hog that is Ruby on Rails, which sucks up a shitload of RAM.

But what about alternatives?

I have yet to see any convincing argument that Mastodon needs a third database (it also needs Redis and Postgres) other than Gargron's laziness, lack of knowledge, simply "following the crowd," or some other more nefarious motive I can't rule out.

One thing is clear: having to run this makes Mastodon even harder to run.

Sure, ostensibly this feature is optional for admins, but users will expect it. Users do not want to hear "we don't have the RAM." They don't want to hear "we don't have the money to let you search your toots." This sounds like a cop-out.

Has he even tried just using plain Lucene? Wikipedia's used it for ages for searching, and it works just fine for them. It also doesn't eat up a shitload of resources.

Has he tried PostgreSQL's TSEARCH with indexing? Surely this would be adequate.
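To make the Postgres idea concrete, here is a rough sketch of what I have in mind, not anything Mastodon actually ships. The table and column names (statuses, text, account_id), the index name, and the 'english' text search configuration are assumptions for illustration; the helper takes any DB-API connection that uses pyformat parameters (psycopg2, for instance).

```python
# Sketch only: "statuses", "text", "account_id" and the index name are
# assumptions for illustration, not a confirmed schema.

# A GIN index over the tsvector expression lets Postgres answer
# full-text queries without scanning every row.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS statuses_text_fts_idx
    ON statuses
    USING GIN (to_tsvector('english', text));
"""

# The WHERE clause repeats the indexed expression exactly, so the
# planner is able to use the index above.
SEARCH_SQL = """
SELECT id, text
  FROM statuses
 WHERE account_id = %(account_id)s
   AND to_tsvector('english', text) @@ plainto_tsquery('english', %(query)s)
 ORDER BY id DESC
 LIMIT %(limit)s;
"""


def search_own_toots(conn, account_id, query, limit=20):
    """Search one account's toots over a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute(
            SEARCH_SQL,
            {"account_id": account_id, "query": query, "limit": limit},
        )
        return cur.fetchall()
    finally:
        cur.close()
```

No third database, no JVM. Whether it would hold up at mastodon.social's scale is something I can't vouch for, but for a small instance it seems like the obvious first thing to try.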

Ah who am I kidding, Mastodon didn't even have column constraints until the middle of last year. Because what is even referential integrity? It's better to assume lack of knowledge than malice; incompetence is simply not knowing (although the Dunning-Kruger Effect is dangerous), whereas malice assumes bad intent which I can't prove (then again, I can't disprove it).

BY THE POWER OF THE WEBSHIT... I HAVE THE POWER!!!

ElasticSearch gives admins a unique kind of power, and in my view, a bad kind. The power to literally search everyone's toots (not just their users' toots), in ways that you can't even imagine doing with Google. This is such a terrible idea in and of itself, given the amount of hostile behaviour I've seen from some admins.

For example: Alda, former admin of witches.town, once told me to jump into traffic (very classy). Although she later deleted those toots, she never apologised, and later called me the "worst person [she] ever met." Do I really think Alda should be given easy-to-use power to do that? Hell no!

What do I mean by this? I mean that you can literally search toots that match specific patterns, or do analytics on the data. You can search specific users for uses of hashtags in some combination, or search for specific phrases, or even key words. You can count the number of times someone has said "tankies suck" or "down with cis" in slightly different ways.

Google can do a lot of this, but it's clunky and unreliable. LIKE queries to PostgreSQL can be very slow. ElasticSearch is bloated, but one thing it is not is slow. This has infinite blackmail potential. What if some admin with a grudge digs up some post you made in a drunken stupor a year ago or something that portrays you in an unflattering light?

Datamining remote users for fun and profit

The way federation works, all toots get sent to a server and are stored locally. It has to work this way, there's no way around it, and even if it didn't have to store them, it would still be possible to warehouse them. This is an unavoidable side effect of federation. Sadly, this means your toots are open to the same scrutiny by admins as everyone else's.

This would be true with or without ElasticSearch, I have to confess. But this point bears repeating.

Remember: Mastodon is only as trustworthy as the admins of the instances your followers are on.

I put a hundred thousand pictures of my ass on the Internet, so the NSA could spy on it

(Subheading inspired by Crudbump)

This doesn't even get into the potential for open ElasticSearch servers to be used for searching. Authentication isn't configured by default in ElasticSearch, and as of this writing, the Mastodon docs recommend listening on 0.0.0.0 (all IPs). THIS IS INSANITY! Holy shit. (By the way, if you're using this, for the love of god firewall your server, or only listen on localhost if it runs on the same box, or on a local-only IP, and do not follow the advice blindly!) This means anyone, including nation-state actors, can now search everyone's toots publicly over ElasticSearch if some admin doesn't secure their shit, which is basically the way that Mastodon encourages you to set it up. Classy!
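If you run an instance and want a quick sanity check, something like the following sketch will tell you whether a port answers TCP connections from wherever you run it. It assumes ElasticSearch is on its usual HTTP port, 9200; the address 203.0.113.10 is a documentation placeholder, so swap in your own server's public IP and run it from a machine outside your network.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address (assumption): replace with your server's public IP.
    public_ip = "203.0.113.10"
    if is_port_open(public_ip, 9200):
        print("Port 9200 is reachable from here. Firewall it or bind to localhost.")
    else:
        print("Port 9200 did not answer from here.")
```

A port that doesn't answer from one vantage point is not proof that you're safe, but one that does answer from the open Internet is proof that you're not.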

How many admins are going to secure their stuff, especially blindly following this advice and not realising anything is wrong? Think of all the MongoDB servers on the Internet that have been used for data mining...

Potential for harassment

Then there's the potential for users who aren't even admins to abuse this feature.

I remember when Gargron wasn't going to implement FTS because of the abuse potential. How quickly people forget!

As a sop I presume, he chose to only allow searching "authorized" toots, which apparently he defines as:

  • Your toots
  • The toots of anyone you boost, favourite, or get mentioned in

But this really doesn't matter, since if you want to harass someone later, you just fave their toot, and then search it at your convenience. And boom goes the dynamite.
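The rule as I've described it boils down to something like this toy model. To be clear, this is my own sketch of the behaviour and not Mastodon's actual code; the Toot and Account classes and the can_search function are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Toot:
    id: int
    author: str
    text: str
    mentions: set = field(default_factory=set)  # names mentioned in the toot


@dataclass
class Account:
    name: str
    boosted: set = field(default_factory=set)     # ids of boosted toots
    favourited: set = field(default_factory=set)  # ids of favourited toots


def can_search(account, toot):
    """A toot is searchable if you wrote it, boosted it, faved it,
    or were mentioned in it."""
    return (
        toot.author == account.name
        or toot.id in account.boosted
        or toot.id in account.favourited
        or account.name in toot.mentions
    )


victim_toot = Toot(id=1, author="victim", text="something embarrassing")
harasser = Account(name="harasser")

before = can_search(harasser, victim_toot)  # False: no interaction yet
harasser.favourited.add(victim_toot.id)     # one click
after = can_search(harasser, victim_toot)   # True: now it's searchable
```

Every condition in that check except authorship is something the searcher controls, or in the case of mentions, something anyone who tags them controls. The person being searched gets no say.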

With Mastodon, you can turn off indexing of toots by Google, which can help mitigate public exposure issues (although anyone who tells you online privacy can ever be perfect either doesn't have a clue or is lying). You cannot turn off indexing of toots by ElasticSearch, not even locally. And when anyone interacts with your status, your toot can then be searched by them. Not as powerfully as raw ElasticSearch, but still good enough. Enough to harass you with, certainly.

It can be argued that the way federation works makes the idea that something can't be abused a pipe dream. This is true. It is also true that you can write a scraper that uses the Mastodon API to scrape someone's toots in like 5 minutes using something like Mastodon.py. But these are all high-effort and sophisticated attacks, relative to the intelligence of your average asshole. Your average disgruntled Mastodon user with an axe to grind can use this with no special knowledge of anything. No Google-fu. Nothing. They just have to fave your toots and remember them later, when they decide to smear you and create outrage.

This is all ignoring that admins can mod the instance to search anyone's toots, which is certainly an appealing mod from a usability standpoint.

What do I really think of FTS?

Essentially, the "protections" here are simply selling rotten fruit in an opaque bag.

I really don't mind FTS, but what I do mind is not fully disclosing the implications, and essentially ignoring them. Distributed systems can be good, but one must be aware of their limitations and users must be mindful there are dickheads everywhere in the world, and some run Mastodon instances.

It is always worth bearing in mind that Gargron didn't create Mastodon for vulnerable people, he created it to replace identi.ca. If you don't know what that is, think your average HackerNews user. It just so happens that minorities are a peripheral demographic he has happened to attract. All well and good, but he needs to be mindful that people are on here who are often under constant attack in this era of the alt-right. I wish he would have used a little more care, is all.

I also wish he would have chosen something that didn't literally introduce what amounts to gatekeeping for Mastodon instance creation. Having to set up all this stuff costs money. The resources to run this are becoming more and more expensive as time goes on as Mastodon becomes more and more bloated. If this continues, even the Interlinked Foundation's instance, mst3k, may go dark at some point, because we simply can't afford it and achieve our mission at the same time.

I honestly can't discount more nefarious motives like running Kibana to aid in searching toots for data mining or going on witch hunts on mastodon.social (given they do shady things like shadowban people). I sincerely hope this is crankery from my diseased mind rather than something that's actually happening.
