Hey fellow nerds, I have an idea that I'd like to discuss with you. All feedback, positive or negative, is welcome. Consider this a baby RFC (Request for Comments).
So. I've been having a think on how to implement the right to be forgotten (one of the cornerstones of eg. the GDPR) in the context of federated services. Currently, it's not possible to remove your comments, posts, etc. from the whole Fediverse, and not just from your "home instance", without manually contacting every node in the network. In my opinion, this is a fairly pressing problem, and there would already be a GDPR case here if someone were to bring the "eye of Sauron" (ie. a national data protection authority) upon us.
Please note that this is very much a draft and it does have some issues and downsides, some of which I've outlined towards the end.
The problem
In a nutshell, the problem I'm trying to solve is how to guarantee that "well-behaved" instances, which support this proposal, will delete user content even in the most common exceptional cases, such as changes in network topology, network errors, and server downtime. These are situations where you'd typically expect messages about content or user deletion to be lost. It's important to note that I've specifically approached this from the "right to be forgotten" perspective, so the current version of the proposal solely deals with "mass deletion" when user accounts are deleted. It doesn't currently integrate or work with the normal content deletion flow (I'll further discuss this below).
I understand that in a federated or decentralized network it's impossible to guarantee that your content will be deleted, but we can't let "perfect be the enemy of good enough". Making a concerted effort to ensure that in most cases user content is deleted when the user so wishes (initially this could even just be a Lemmy thing and not a wider Fediverse thing) would already be a big step in the right direction.
I haven't yet looked into "prior art" except for some very cursory searches, and I had banged out the outline of this proposal before I even went looking, but I now know that eg. Mastodon has the ability to set TTLs on posts. This proposal is sort of adjacent and could be massaged a bit to support this on Lemmy (or any other service) too.
1. The proposal: TTLs on user content
- Every comment, post, etc. (content) must have an associated TTL (eg. a `live_until` timestamp). This TTL can be fairly long, on the order of weeks or even a couple of months
- well before the content's TTL runs out (eg. even halfway through the TTL, with some random jitter to prevent "thundering herds"), an instance asks the "home instance" of the user who created the content whether the user account is still live. If it is, great, update the TTL and go on with life
- in cases where the "home instance" of a content creator can't be reached due to eg. network problems, this "liveness check" must be repeated at random long-ish intervals (eg. every 20 to 30h) until an answer is gotten or the TTL runs out
- information about user liveness should be cached, but with a much shorter TTL than content
- in cases where the user's home instance isn't in an instance's linked instance list or is in their blocked instance list, this liveness check may be skipped
- when content TTL runs out and a user liveness check hasn't succeeded, or when a user liveness check specifically comes back as negative, the content must be deleted
- when a liveness check comes back as negative and the user has been removed, instances must delete the rest of that user's content and not just the one whose TTL ran out
- when a liveness check fails (eg. the user's home instance doesn't respond), instances may delete the rest of that user's content. Or I guess they probably should?
- user accounts must have a TTL, on the order of several years
- when a user performs any activity on the instance, this TTL must be updated
- when this TTL runs out, the account and all of its related content on the instance must be deleted
- instances may eg. ping users via email to remind them about their account expiring before the TTL runs out
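To make the content TTL rules above a bit more concrete, here is a rough Python sketch of the scheduling and decision logic. It is only an illustration: all names (`Liveness`, `Action`, `first_check_time`, `next_retry_time`, `decide`) are made up for this example, and the 60-day TTL and the ±10% jitter are assumed values, not part of the proposal.

```python
import random
from datetime import datetime, timedelta
from enum import Enum


class Liveness(Enum):
    ALIVE = "alive"              # home instance confirmed the account exists
    GONE = "gone"                # home instance confirmed the account was deleted
    UNREACHABLE = "unreachable"  # no answer (network error, downtime, ...)


class Action(Enum):
    REFRESH = "refresh"                                  # extend the content's TTL
    RETRY_LATER = "retry_later"                          # schedule another liveness check
    DELETE_CONTENT = "delete_content"                    # delete this piece of content
    DELETE_ALL_USER_CONTENT = "delete_all_user_content"  # delete everything by this user


CONTENT_TTL = timedelta(days=60)  # assumed value, "a couple of months"


def first_check_time(created_at: datetime, ttl: timedelta = CONTENT_TTL) -> datetime:
    """Roughly halfway through the TTL, with jitter to avoid thundering herds."""
    jitter = random.uniform(-0.1, 0.1)  # assumed +/-10% of the TTL
    return created_at + ttl * (0.5 + jitter)


def next_retry_time(now: datetime) -> datetime:
    """Retry at a random long-ish interval of 20 to 30 hours."""
    return now + timedelta(hours=random.uniform(20, 30))


def decide(now: datetime, live_until: datetime, result: Liveness) -> Action:
    """Map a liveness check result to what the instance must do next."""
    if result is Liveness.GONE:
        # Negative answer: remove all of the user's content, not just this item.
        return Action.DELETE_ALL_USER_CONTENT
    if result is Liveness.ALIVE:
        return Action.REFRESH
    # Home instance unreachable: keep retrying until the TTL runs out.
    if now >= live_until:
        return Action.DELETE_CONTENT
    return Action.RETRY_LATER
```

Whether an unreachable home instance should eventually trigger deletion of all of the user's content (the open "may" vs "should" question above) is left out here; the sketch only deletes the item whose TTL ran out.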
2. Advantages of this proposal
- guarantees that user content is deleted from "well-behaved" instances, even in the face of changing network topologies when instances defederate or disappear, hiccups in message delivery, server downtime and so on
- would allow supporting Mastodon-like general content TTLs with a little modification, hence why it has TTLs per content and not just per user. Maybe something like a `refresh_liveness` boolean field on content that says whether an instance should do user liveness checks and refresh the content's TTL based on it or not?
- with some modification this probably could (and should) be made to work with and support the regular content deletion flow. Something for draft v0.2 in case this gets any traction?
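As a sketch of how the `refresh_liveness` idea could look, assuming a hypothetical `Content` record (the field `author_home_instance`, the helper names, and the 60-day default are all made up for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Content:
    author_home_instance: str      # hypothetical field: where to send liveness checks
    live_until: datetime           # the content's TTL as an absolute timestamp
    refresh_liveness: bool = True  # False = plain Mastodon-like TTL, never extended


def on_author_alive(content: Content, now: datetime,
                    ttl: timedelta = timedelta(days=60)) -> Content:
    """Extend the TTL after a positive liveness check, but only if the content opted in."""
    if content.refresh_liveness:
        content.live_until = now + ttl
    return content


def is_expired(content: Content, now: datetime) -> bool:
    """Content whose TTL has run out must be deleted."""
    return now >= content.live_until
```

With `refresh_liveness=False` the content simply expires at `live_until` no matter what the home instance says, which is roughly the "general content TTL" behaviour.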
3. Disadvantages of this proposal
- more network traffic, DB activity, and CPU usage, even during "normal" operation and not just when something gets deleted. Not a huge amount but the impact should probably be estimated so we'd have at least an idea of what it'd mean
- however, considering the nature of the problem, some extra work is to be expected
- as noted, the current form of this proposal does not support or work with the regular deletion flow for individual comments or posts, and only addresses the more drastic scenario when a user account is deleted or disappears
- spurious deletions of content are theoretically possible, although with long TTLs and persistent liveness check retries they shouldn't happen except in rare cases. Whether this is actually a problem requires more thinkifying
- requires buy-in from the rest of the Fediverse as long as it's not a protocol-level feature (and there are more protocols than just ActivityPub). This same disadvantage would naturally apply to all proposals that aren't protocol-level. The end goal would definitely be to have this feature be a protocol thing and not just a Lemmy thing, but one step at a time
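For the "impact should be estimated" point, a very rough back-of-the-envelope helper could look like the one below. It assumes liveness is cached per user, so that an instance makes about one outgoing check per remote user per cache period and ignores retries. The function name and any numbers you feed it are hypothetical, not measurements.

```python
def liveness_checks_per_day(remote_users: int, cache_ttl_days: float) -> float:
    """Rough average number of outgoing liveness checks per day.

    Assumes one check per remote user per cache period and no retries.
    """
    if cache_ttl_days <= 0:
        raise ValueError("cache_ttl_days must be positive")
    return remote_users / cache_ttl_days


# Example with made-up numbers: 70,000 remote users, liveness cached for 7 days.
per_day = liveness_checks_per_day(70_000, 7)
per_second = per_day / 86_400  # seconds in a day
```

Retries against unreachable instances and the DB writes for TTL refreshes would come on top of this, so treat it as a lower bound on the extra work.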
3.1 "It's a feature, not a bug"
- when an instance defederates or otherwise leaves the network, content from users on that instance will eventually disappear from instances no longer connected to its network. This is a feature: when you lose contact with an instance for a long time, you have to assume that it's been "lost at sea" to make sure that the users' right to be forgotten is respected. As a side note, this would also help prune content from long gone instances
- content can't be assumed to be forever. This is by design: in my opinion Lemmy shouldn't try to be a permanent archive of all content, like the Wayback Machine
- this solution is more complex than simply actually deleting content when the user so wishes, instead of just hiding it from view like it's done now in Lemmy. While "true deletion" definitely needs to also be implemented, it's not enough to guarantee eventual content deletion in cases like defederation, or network and server errors leading to an instance not getting the message about content or a user being deleted
I don't fully understand the "right to be forgotten".
I mean, it's very useful when you want to make sure a corporation which profits from your data actually deletes that data, but from the perspective of forums like this one I struggle to understand people's need to delete everything at some point.
The only result I see from this is useful knowledge being lost.
Imagine I make a useful post which people come back to from time to time to solve their issue. People would probably link to beehaw and not my instance, since I posted in this community. After a couple of years I can no longer maintain my instance and it goes down, and then my useful post silently self-destructs. People won't know this and will keep linking to it, and eventually it'll end up like a lot of forums:
"The solution is in this link"
"Thanks, that solved my issue"
But now the link is dead and the solution is gone.
With how Lemmy works now, people will still be able to find the content even if the instance it originated from dies.
I see this as a very useful feature to preserve knowledge.
If you don't want something to be on the internet forever, then don't post it. As you said, the Wayback Machine exists, so even then you're acknowledging that the GDPR request you made to the instance was useless: you still need to go to every archiver there is to be sure your data has been properly deleted.
I think there is a difference between agreeing with the law itself and agreeing with the usefulness. GDPR gives users incredible power over their data, and in the case of Reddit it allows you to leave the platform very effectively for example.
This is sadly the case with everything on the internet and life in general tbh.
Don't quote me on this, but I don't think GDPR says they have to delete every instance of your content across the internet, just the ones they have power over.
Also, I'm mainly adding some of my thoughts, don't take this as criticism of your post or your viewpoint. I fully agree that there is no solution that pleases everyone here.