Ghost deactivation was performed by running scripts manually in a Django management shell on the directory-server service.

The first step is always to retrieve the subscriber ids. This is done by querying the SubscribersReachData model with the chosen criteria: either click or delivery behavior, plus the number of days over which that behavior must have been observed (for deliveries, deliveries_idle_days):

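from datetime import datetime
import pandas as pd

from apps.subscribers.caches import SubscribersReachData
from apps.subscribers.models import Subscriber

# The output file name carries today's date, e.g. /tmp/delivery_ghosts_2024-04-18.csv
today = datetime.now().date()
output_file = f'/tmp/delivery_ghosts_{today.strftime("%Y-%m-%d")}.csv'

# Reach-data rows whose deliveries_idle_days is at least 21
subscriber_ids = [r['subscriber_id'] for r in SubscribersReachData.objects.filter(deliveries_idle_days__gte=21).only('subscriber_id')]
# Only subscribers that belong to these owners are considered
owner_ids = [2675, 234, 333, 1913, 1846, 1080, 2708, 1]

# Keep only subscribers that are still active and belong to the listed owners
result = set(Subscriber.objects.filter(id__in=subscriber_ids, is_active=True, owner_id__in=owner_ids).values_list('id', flat=True))
# Building the frame from a column dict keeps the 'id' header even when result is empty
df = pd.DataFrame({'id': sorted(result)})
df.to_csv(output_file, index=False)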

The retrieved subscriber ids are written to a dated CSV file, for example /tmp/delivery_ghosts_2024-04-18.csv. A second script then reads that file (input_file must point to the file produced by the first step) and deactivates the ghosts:

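import pandas as pd

from apps.subscribers.models import Subscriber


# Must point to the CSV produced by the retrieval script
input_file = '/tmp/delivery_ghosts_2024-04-17.csv'
# True when the CSV was already filtered by owner during retrieval
owners_verified = True
owner_ids = [2675, 234, 333, 1913, 1846, 1080, 2708, 1]

# .tolist() converts numpy integers to plain Python ints
subscriber_ids = pd.read_csv(input_file)['id'].unique().tolist()
print(f'Total {len(subscriber_ids)} users to be deactivated')

# Subscribers deactivated in an earlier run are skipped by the is_active filter
query = Subscriber.objects.filter(id__in=subscriber_ids, is_active=True)
if not owners_verified:
    query = query.filter(owner_id__in=owner_ids)

total = query.count()

# start=1 so that the last progress line reads "total / total"
for i, subscriber in enumerate(query.iterator(chunk_size=100000), start=1):
    print(f'{i} / {total}')
    assert isinstance(subscriber, Subscriber)
    if subscriber.is_active:
        subscriber.make_inactive()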

The script takes the previously retrieved subscriber ids and iterates over them, deactivating each user together with all necessary post-deactivation actions (such as producing a subscription-event with unsubscribe_count).
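One possible way to keep each query bounded is to split the id list into fixed-size batches before filtering. The sketch below is only an illustration under assumptions: `chunked` and `deactivate_in_batches` are hypothetical helpers that do not exist in directory-server, the batch size of 1000 is arbitrary, and the `deactivate_batch` callback stands in for a loop over `Subscriber.objects.filter(id__in=batch, is_active=True)` like the one above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` ids (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def deactivate_in_batches(
    ids: Iterable[int],
    deactivate_batch: Callable[[List[int]], None],
    size: int = 1000,  # assumed value, tune for the real database
) -> int:
    """Hand each batch to `deactivate_batch`; return how many ids were handed over."""
    processed = 0
    for batch in chunked(ids, size):
        # In the real shell this callback would filter Subscriber by id__in=batch
        # and call make_inactive() on each active subscriber.
        deactivate_batch(batch)
        processed += len(batch)
        print(f'{processed} ids processed')
    return processed
```

In the management shell this could be called as `deactivate_in_batches(subscriber_ids, some_callback)`, where `some_callback` is whatever function wraps the per-subscriber loop. Whether this actually shortens the run would need to be measured.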

Problems during script execution

There were two major problems during script execution. The first was the time it took to retrieve and then deactivate all the users. The process was too slow, so further optimization is needed to speed it up.

The second problem was that the script kept shutting down for unknown reasons, so it had to be re-run several times to finish the process. Deactivating all the ghosts currently takes a long time. That is acceptable if we do it only once in a while, but if we need to do it more often, we should rework the scripts and database queries to improve this.
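Because the deactivation script filters on is_active=True, re-running it already skips subscribers that were deactivated before a shutdown. If the repeated runs themselves become a burden, one option is to record progress in a checkpoint file so that a restart continues from where the previous run stopped. The following is a minimal sketch under assumptions, not the existing implementation: `load_done`, `run_resumable` and the checkpoint path are hypothetical, and `deactivate_one` stands in for fetching a subscriber by id and calling make_inactive().

```python
import os
from typing import Callable, Iterable, Set


def load_done(checkpoint_path: str) -> Set[int]:
    """Read ids recorded by earlier runs; empty set if the file does not exist."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path) as fh:
        return {int(line) for line in fh if line.strip()}


def run_resumable(
    ids: Iterable[int],
    deactivate_one: Callable[[int], None],
    checkpoint_path: str,
) -> int:
    """Process ids not yet in the checkpoint; return how many this run handled."""
    done = load_done(checkpoint_path)
    handled = 0
    with open(checkpoint_path, 'a') as fh:
        for sid in ids:
            if sid in done:
                continue  # finished in a previous run
            deactivate_one(sid)
            # Record the id only after the deactivation succeeded,
            # and flush so that a sudden shutdown loses at most one entry.
            fh.write(f'{sid}\n')
            fh.flush()
            done.add(sid)
            handled += 1
    return handled
```

A failed or interrupted run leaves the checkpoint with every id completed so far, and the next call with the same path picks up the remainder.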