Problem

The problem is that the sent amount of messages on sender_firebase_statistics table does not correspond to the actual sent messages taken from channels_stat_daily table. Here is the example:

channels stat daily output

2024-08-24 - 23833877937
2024-08-25 - 23653522348
2024-08-26 - 23751405481
2024-08-27 - 16153471015

sender_firebase_statistics output

2024-08-24 - 15243943345
2024-08-25 - 15063587756
2024-08-26 - 15161470889
2024-08-27 - 11893535303

Investigation

The difference is quite a bit. The interesting thing is that the amount of deliveries is equal on both tables. A mechanism by which data is being written o the sender_firebase_statistics table is different for sent and deliveries, because:

  • sent_count is taken from message_events table
  • delivered_count is taken from messagecallbackevents table
SELECT
    toDate(timestamp) AS date,
    firebase_id,
    toInt32(0) AS sender_id,
    toInt32(0) AS delivered_count,
    toInt32(0) AS clicked_count,
    toInt32(0) AS unique_click_count,
    toInt32(0) AS closed_count,
    sum(subscribe_count) AS subscribe_count,
    sum(unsubscribe_count) AS unsubscribe_count,
    toInt32(0) AS sent_count,
    toInt32(0) AS errors_count
FROM statistics.subscription_events WHERE date > '2024-08-20'
GROUP BY toDate(timestamp), firebase_id
 
SELECT
    toDate(timestamp) as date,
    firebase_id,
    sender_id,
    toInt32(0) as delivered_count,
    toInt32(0) as clicked_count,
    toInt32(0) as unique_click_count,
    toInt32(0) as closed_count,
    toInt32(0) as subscribe_count,
    toInt32(0) as unsubscribe_count,
    toInt32(0) as sent_count,
    sum(errors_count) as errors_count
FROM message_errors_events GROUP BY date, firebase_id, sender_id ORDER BY date
 
SELECT date, sum(sent_count) as sent FROM (
    SELECT
        toDate(timestamp) as date,
        firebase_id,
        sender_id,
        sum(sent_count) as sent_count
    FROM message_events WHERE date = '2024-08-26' GROUP BY toDate(timestamp), firebase_id, sender_id ORDER BY date
) GROUP BY date ORDER BY date
 

Quering message_events table in the same manner that is should be taken by materialized view to pass data to the sender_firebase_statistics gives me the correct amount of sent messages in total. Let’s take it closer by comparing a single day and a signle firebase + sender pair:

date       | firebase_id | sender_id | sent_count
2024-08-26 | 2205        |  400      | 250757829
           | 1984        |  433      | 263732410
           | 2197        |  433      | 355744845 

They are all the same for sender_firebase_statistics table. But there is a unique record living in message_events grouped by firebase_id and sender_id which refers to firebase=0, sender=0 pair with a huge amount of sent messages (8775431127) which is the exact piece we’re missing for the sender_firebase_statistics table.

Such data is definitely missing for our table:

select * from sender_firebase_statistics where sender_id = 0 and firebase_id = 0 and date = '2024-08-26'

Questions

  • why does this data is not being written to the sender_firebase_statistics table?
  • why do we have such a record in the message_events table?

Solution