Problem
There has clearly been a decrease in the number of subscriptions since 2024-05-30. It is unclear how this happened, but it seems that only the statistics are affected; the subscribers themselves still seem to be fine.
Investigation
The statistics are definitely losing some data. I pulled the subscription counts from subscription_events, and they clearly show that the problem started on 2024-05-30.
select toDate(timestamp) as date, sum(subscribe_count), sum(unsubscribe_count)
from subscription_events
where date > '2024-05-25'
group by date
order by date;
┌───────date─┬─sum(subscribe_count)─┬─sum(unsubscribe_count)─┐
│ 2024-05-26 │                71776 │                  51958 │
│ 2024-05-27 │                70035 │                  60527 │
│ 2024-05-28 │                66456 │                  46801 │
│ 2024-05-29 │                67245 │                  46755 │
│ 2024-05-30 │                65771 │                  45128 │
│ 2024-05-31 │                52411 │                  35305 │
│ 2024-06-01 │                43668 │                  27825 │
│ 2024-06-02 │                46025 │                  28777 │
│ 2024-06-03 │                41321 │                  28881 │
│ 2024-06-04 │                40939 │                  28778 │
│ 2024-06-05 │                30068 │                  19112 │
└────────────┴──────────────────────┴────────────────────────┘
At the same time, the application database shows a much larger number of new subscribers than the statistics do:
>>> subs = Subscriber.objects.filter(subscribed_date__gte='2024-06-05')
>>> len([s for s in subs if bool(s.token)])
48796
It is also not related to the unsubscription process: the unsubscribe counts in the statistics have dropped as well, as the ClickHouse query above shows.
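The argument that unsubscriptions are not the culprit can be made concrete by comparing both columns with their pre-drop level. A minimal sketch using the numbers from the table above; treating 2024-05-26 to 2024-05-29 as the unaffected baseline is an assumption, and 2024-06-05 is likely still a partial day. `relative_to_baseline` is a hypothetical helper, not existing tooling.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple

# Daily totals from the subscription_events query:
# date -> (sum(subscribe_count), sum(unsubscribe_count))
DAILY: Dict[str, Tuple[int, int]] = {
    "2024-05-26": (71776, 51958),
    "2024-05-27": (70035, 60527),
    "2024-05-28": (66456, 46801),
    "2024-05-29": (67245, 46755),
    "2024-05-30": (65771, 45128),
    "2024-05-31": (52411, 35305),
    "2024-06-01": (43668, 27825),
    "2024-06-02": (46025, 28777),
    "2024-06-03": (41321, 28881),
    "2024-06-04": (40939, 28778),
    "2024-06-05": (30068, 19112),
}

# Days assumed to be unaffected, used as the reference level.
BASELINE_DAYS = ("2024-05-26", "2024-05-27", "2024-05-28", "2024-05-29")


def relative_to_baseline(
    daily: Dict[str, Tuple[int, int]], baseline_days: Iterable[str]
) -> Dict[str, Tuple[float, float]]:
    """Return date -> (subscribe ratio, unsubscribe ratio) relative to the baseline average."""
    baseline_days = list(baseline_days)
    base_sub = mean(daily[d][0] for d in baseline_days)
    base_unsub = mean(daily[d][1] for d in baseline_days)
    return {
        day: (round(sub / base_sub, 2), round(unsub / base_unsub, 2))
        for day, (sub, unsub) in sorted(daily.items())
    }


for day, (sub_ratio, unsub_ratio) in relative_to_baseline(DAILY, BASELINE_DAYS).items():
    print(f"{day}  subscribe x{sub_ratio:.2f}  unsubscribe x{unsub_ratio:.2f}")
```

On 2024-06-04, for example, this gives about 0.59 of the baseline for subscriptions and 0.56 for unsubscriptions. Both event types shrank by a similar fraction, which fits events being lost on their way into the statistics rather than a change in how users unsubscribe.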
I also checked how many of these subscribers were unsubscribed today:
>>> len([s for s in subs if s.unsubscribed_date is not None])
9885
This did not provide any useful insight either.
At this point, the only remaining suspect is the Kafka library that produces the events:
def _register_subscription_event(self) -> None:
    kafka_producer = Producer(SubscriptionEvent)
    event = SubscriberEventsFactory.make_subscribe_event(self.subscriber)
    # async_mode=False is expected to send the event synchronously
    kafka_producer.send(event, async_mode=False)

Perhaps async_mode=False is not working as expected. I could try switching it to async_mode=True, followed by an explicit kafka_producer.poll() call.
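If the asynchronous route is taken, it only helps if delivery results are actually checked. The notes do not say which client the internal Producer wraps; the sketch below assumes it is built on confluent-kafka, whose Producer exposes produce(topic, value, on_delivery=...), poll(timeout) and flush(timeout). send_async, DeliveryReport, InMemoryProducer and the topic name are hypothetical. Counting sent versus delivered messages is also a direct way to check whether every event is actually produced.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Optional, Protocol


class ProducerLike(Protocol):
    """Subset of confluent-kafka's Producer API used below (assumed underlying client)."""

    def produce(self, topic: str, value: bytes, on_delivery: Callable[[Any, Any], None]) -> None: ...

    def poll(self, timeout: float) -> int: ...

    def flush(self, timeout: float) -> int: ...


@dataclass
class DeliveryReport:
    sent: int = 0
    delivered: int = 0
    errors: List[Any] = field(default_factory=list)

    def on_delivery(self, err: Optional[Any], msg: Any) -> None:
        # Invoked from poll()/flush() once a message is acknowledged (err is None) or failed
        if err is None:
            self.delivered += 1
        else:
            self.errors.append(err)


def send_async(producer: ProducerLike, topic: str, payloads: Iterable[bytes], timeout: float = 10.0) -> DeliveryReport:
    report = DeliveryReport()
    for payload in payloads:
        producer.produce(topic, payload, on_delivery=report.on_delivery)
        report.sent += 1
        producer.poll(0)  # serve pending delivery callbacks without blocking
    # flush() waits for outstanding messages and returns how many are still queued
    still_queued = producer.flush(timeout)
    if still_queued:
        report.errors.append(f"{still_queued} message(s) still queued after {timeout}s")
    return report


class InMemoryProducer:
    """Hypothetical stand-in for illustration: rejects payloads for which reject(payload) is true."""

    def __init__(self, reject: Callable[[bytes], bool]):
        self._pending: List[tuple] = []
        self._reject = reject

    def produce(self, topic: str, value: bytes, on_delivery: Callable[[Any, Any], None]) -> None:
        self._pending.append((value, on_delivery))

    def poll(self, timeout: float) -> int:
        served = len(self._pending)
        for value, callback in self._pending:
            callback("rejected" if self._reject(value) else None, value)
        self._pending.clear()
        return served

    def flush(self, timeout: float) -> int:
        self.poll(timeout)
        return len(self._pending)
```

For example, send_async(InMemoryProducer(reject=lambda p: b"subscribers_age" in p), "subscription-events", [b'{"id": 1}', b'{"subscribers_age": null}']) returns a report with sent=2, delivered=1 and errors=["rejected"]. Against a real producer, delivered < sent or a non-empty errors list would point to lost events.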
I also restarted the service to rule out any issue with the Kafka library cache. Tomorrow's results will show whether this helped.
Possible causes
- Kafka library cache issue
- not all subscriber events are being produced
- the schema registry rejects some of the events (the subscribers_age field?)
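The schema hypothesis can be checked offline by running a sample of events through a type check before they reach the serializer. This is a simplified stand-in, not Avro or Schema Registry validation: EVENT_SCHEMA, the subscriber_id field and find_schema_violations are hypothetical, and only the subscribers_age field name comes from these notes.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Tuple

# Hypothetical, simplified view of the event schema:
# field name -> (accepted Python types, nullable).
# The real SubscriptionEvent schema is not shown in these notes.
EVENT_SCHEMA: Mapping[str, Tuple[Tuple[type, ...], bool]] = {
    "subscriber_id": ((int,), False),
    "subscribers_age": ((int,), False),
}


def find_schema_violations(
    events: Iterable[Mapping[str, Any]],
    schema: Mapping[str, Tuple[Tuple[type, ...], bool]] = EVENT_SCHEMA,
) -> Counter:
    """Count, per field, how many events would not fit the simplified schema."""
    violations: Counter = Counter()
    for event in events:
        for name, (types, nullable) in schema.items():
            value = event.get(name)
            if value is None:
                if not nullable:
                    violations[name] += 1  # missing or null value in a non-nullable field
            elif type(value) not in types:
                violations[name] += 1  # exact type check, so True/False are not accepted as int
    return violations


sample = [
    {"subscriber_id": 1, "subscribers_age": 30},
    {"subscriber_id": 2, "subscribers_age": None},
    {"subscriber_id": 3, "subscribers_age": "31"},
]
print(find_schema_violations(sample))  # Counter({'subscribers_age': 2})
```

A non-zero count concentrated on one field would support the hypothesis; an empty Counter on a representative sample would argue against it.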
Solution
None of the above helped. The problem turned out to be caused by an issue in ClickHouse: after ClickHouse was restarted, the statistics started returning to normal. We should wait another day to confirm that the issue is gone.
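Confirming the fix comes down to checking that the daily statistics once again match an independent count, such as the Subscriber queries run in the Django shell. A minimal sketch, assuming both counts are fetched per day beforehand and are expected to agree closely (re-subscriptions or other legitimate differences would need to be accounted for); days_with_missing_stats and the 5% tolerance are illustrative choices, not existing tooling.

```python
from typing import Dict, List


def days_with_missing_stats(
    stats_counts: Dict[str, int],
    source_counts: Dict[str, int],
    tolerance: float = 0.05,
) -> List[str]:
    """Return the days on which the statistics fall short of the source count by more than `tolerance`.

    stats_counts:  date -> subscriptions counted in ClickHouse (e.g. sum(subscribe_count)).
    source_counts: date -> subscriptions counted in the application database.
    """
    short_days = []
    for day, expected in sorted(source_counts.items()):
        if expected <= 0:
            continue  # nothing to compare against on this day
        observed = stats_counts.get(day, 0)  # a day missing from the statistics counts as zero
        if (expected - observed) / expected > tolerance:
            short_days.append(day)
    return short_days


# Hypothetical numbers, only to show the call shape.
print(days_with_missing_stats({"2024-06-06": 50_000}, {"2024-06-06": 51_000, "2024-06-07": 49_000}))
# prints ['2024-06-07']
```

An empty result for the days after the restart would be the confirmation the notes are waiting for.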