Updates

The update provides the tools to collect and return the average age of deactivated subscribers, grouped by a chosen period of time (e.g. day, week, month). More about the feature can be found on the deactivated-users-age page.
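A minimal sketch of the kind of query this enables, assuming deactivations are the rows with unsubscribe_count > 0 and using monthly grouping (the actual query shipped with the feature may differ):

SELECT
    toStartOfMonth(toDateTime(timestamp)) AS period,
    avg(subscriber_age) AS avg_subscriber_age
FROM statistics.subscription_events
WHERE unsubscribe_count > 0
GROUP BY period
ORDER BY period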

Since the subscription-event schema has been updated with a subscriber_age field, we need to be careful with data consistency and make sure that all migrations are applied properly.

There is also a bug fix for unsubscriptions that were not handled correctly after the new error types were introduced.

Notes

The statistics-api service already contains the basic migrations that have to be applied to update the ClickHouse stream collections with the new schema. Since subscription_events_stream is attached to the subscription_events table, the migrations will back up the old data and apply the new schema to a new (empty) table. Further data migration has to be done manually.
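In ClickHouse terms the backup step boils down to renaming the current table out of the way and letting the migration create a fresh, empty one; a rough sketch of that step (the actual DDL is part of the statistics-api migrations and may differ):

RENAME TABLE statistics.subscription_events TO statistics.subscription_events_old
-- the migration then creates a new, empty statistics.subscription_events
-- that includes the subscriber_age column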

After a successful deployment, the data from the old table has to be migrated into the new one. The following query generates one INSERT statement per partition of subscription_events_old:

SELECT
    concat('INSERT INTO subscription_events (timestamp, subscriber_id, message_example_id, user_country, user_browser, user_browser_language, channel_id, active_campaigns, campaign_id, subscribe_count, unsubscribe_count, window_id, window_pool_id, firebase_app, firebase_id, subscriber_age) SELECT timestamp, subscriber_id, message_example_id, user_country, user_browser, user_browser_language, channel_id, active_campaigns, campaign_id, subscribe_count, unsubscribe_count, window_id, window_pool_id, firebase_app, firebase_id, 0 FROM subscription_events_old WHERE toYYYYMM(toDateTime64(timestamp, 0)) = ', partition) AS cmd,
    database,
    table,
    partition,
    sum(rows),
    sum(bytes_on_disk),
    count()
FROM system.parts
WHERE database = 'statistics' AND table = 'subscription_events_old'
GROUP BY database, table, partition
ORDER BY partition

Execute the generated INSERT statements above one by one to finish the data migration.
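A quick way to sanity-check the migration is to compare row counts between the two tables; a small helper query (not part of the shipped migrations, just a suggestion):

SELECT table, sum(rows) AS rows
FROM system.parts
WHERE database = 'statistics'
  AND table IN ('subscription_events', 'subscription_events_old')
  AND active
GROUP BY table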

Update map

deployment render all [[templates/items/deployment]] where page = @page.name

Problems during deployment

After the statistics-api deployment, we faced data consistency issues on subscription_events_stream:

Code: 8. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: Field subscriber_age not found in Avro schema: while executing SourceFromInputStream.

The cause of this error was old subscription-event messages in the topic that had not been produced with the new schema.

We needed to update the materialized view with the new schema and reapply the migrations:

CREATE MATERIALIZED VIEW statistics.subscription_stream_to_table TO statistics.subscription_events
(
    `timestamp` Int32,
    `subscriber_id` Int64,
    `message_example_id` Int64,
    `user_country` String,
    `user_browser` String,
    `user_browser_language` String,
    `channel_id` Int64,
    `active_campaigns` Array(Int64),
    `campaign_id` Int64,
    `subscribe_count` Int32,
    `unsubscribe_count` Int32,
    `window_id` Int32,
    `window_pool_id` Int32,
    `firebase_app` String,
    `firebase_id` Int32,
    `subscriber_age` Int32
) AS
SELECT *
FROM statistics.subscription_events_stream
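To verify that the recreated view delivers events into the target table, a simple check is to look at the freshest rows (assuming timestamp is a unix timestamp, as the Int32 type suggests):

SELECT count() AS rows_last_hour, max(toDateTime(timestamp)) AS latest_event
FROM statistics.subscription_events
WHERE timestamp > toUnixTimestamp(now() - INTERVAL 1 HOUR)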

The old data also had to be removed so that it would not interfere with the new data. A Python script was executed to clean the old messages from the subscription_events Kafka topic for the statistics consumer group:

import gc

from webpush.lib.kafka.clients import Consumer
from webpush.lib.kafka.models.subscriptions import SubscriptionEvent

i = 0
# Drain the old-schema messages from the topic for the 'statistics' consumer group.
with Consumer([SubscriptionEvent], 'statistics', prepare_topics=False, with_manual_commits=False) as consumer:
    while True:
        print('.')
        gc.collect()
        for _ in range(2500):
            try:
                req = consumer.get(timeout=1)
            except TimeoutError:
                continue