When was a 48-hour PITR limit added to the Firehose?


#1

After the April 5th update, our application that connects to the Firehose was unable to reconnect at some point. We did not notice this until today, but normally it would not be a problem: in the past we have been able to recover the missing stream data, because our application remembers the last PITR record and uses it when it retries the connection. This time, however, the connection was refused because the PITR timestamp it supplied was more than 48 hours old. When did this change? I do not see it in the Firehose documentation or in any of the release notes. It used to be possible to replay the stream all the way back to the creation of the account. I know this not only because this is not the first time we have been disconnected from the stream for more than 48 hours, but also because I remember reading this or being told it by a representative, and then verifying the behavior at some point in the past.
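The reconnect logic described above (remember the last PITR record, resume from it) can be adapted to a replay limit by clamping the requested start time. The following is a minimal sketch, assuming the limit is 48 hours measured back from the current time. `resume_timestamp`, `MAX_REPLAY`, and `SAFETY_MARGIN` are hypothetical names, the 30-minute margin is an arbitrary choice, and no Firehose client API is used or implied.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed replay limit, taken from the 48-hour rejection described above.
MAX_REPLAY = timedelta(hours=48)
# Hypothetical margin so the requested start is not right at the window edge.
SAFETY_MARGIN = timedelta(minutes=30)


def resume_timestamp(last_pitr: datetime, now: datetime,
                     max_replay: timedelta = MAX_REPLAY,
                     margin: timedelta = SAFETY_MARGIN) -> Tuple[datetime, bool]:
    """Return (start, gap): the PITR timestamp to request on reconnect.

    If the remembered PITR record is older than the assumed window allows,
    clamp to the oldest permitted point plus a margin and report gap=True,
    meaning some data between last_pitr and start cannot be replayed.
    """
    oldest_allowed = now - max_replay + margin
    if last_pitr >= oldest_allowed:
        return last_pitr, False
    return oldest_allowed, True


# Usage: a client that was offline for 72 hours gets clamped.
now = datetime.now(timezone.utc)
start, gap = resume_timestamp(now - timedelta(hours=72), now)
```

Reporting the gap separately, rather than silently clamping, lets the application log or alert on the unrecoverable interval instead of only discovering it later.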

In addition, after I adjusted our application to request only the last 48 hours, I noticed that if, while catching up, the stream ever returns data that has become more than 48 hours old, the Firehose seems to disconnect our application again. If that is true, then even requesting 48 hours' worth of data on connection is not worthwhile, since it would require the Firehose to send the data extremely fast and our side to process it extremely fast. We can process upwards of 800 to 1,000 messages per second. At what rate is the Firehose able to send?

Is the expectation that a connected application that falls more than 48 hours behind will be disconnected automatically? And what about a catch-up scenario like ours? Are clients expected to process the stream at such a high rate without any leniency? I understand that holding on to that amount of data for long periods is a challenge, but some communication about the change would have been nice. It would also make sense to me to limit how far back a connection may request data, but, while a client is working through the backlog to catch up, to allow some leniency when data slips past the 48-hour window, or simply to skip that data instead of disconnecting the client.
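Whether a catch-up can finish inside the window depends on the live stream rate as well as the processing rate. A rough sketch, assuming both rates are constant: the backlog is `lag_s * stream_rate` messages, and it drains at the net rate `drain_rate - stream_rate`. The function names are hypothetical, and the live stream rate is an unknown input here, not a figure from the Firehose.

```python
from typing import Optional


def catch_up_seconds(lag_s: float, stream_rate: float,
                     drain_rate: float) -> Optional[float]:
    """Wall-clock seconds until the client reaches the live edge.

    lag_s: how far behind real time the first replayed message is.
    stream_rate: messages/s produced live (assumed constant).
    drain_rate: messages/s delivered and processed (assumed constant).
    Returns None if the client never catches up.
    """
    if drain_rate <= stream_rate:
        return None
    backlog = lag_s * stream_rate  # messages already queued at connect time
    return backlog / (drain_rate - stream_rate)


def lag_after(lag_s: float, stream_rate: float, drain_rate: float,
              t_s: float) -> float:
    """Age, in seconds, of the message being processed t_s seconds in.

    Each wall-clock second consumes drain_rate / stream_rate seconds of
    stream time, so the lag changes by (1 - drain_rate / stream_rate) per second.
    """
    return lag_s + t_s * (1 - drain_rate / stream_rate)
```

For example, with a hypothetical live rate of 200 messages/s and 1,000 messages/s processed, a 48-hour backlog is 172,800 × 200 = 34,560,000 messages and drains in 34,560,000 / 800 = 43,200 s (12 hours). Under these constant-rate assumptions, `lag_after` shrinks whenever `drain_rate > stream_rate`, so the oldest message still being delivered would not age past its starting lag unless delivery falls below the live rate.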

tl;dr

When did the Firehose start limiting data fetching to 48 hours, and is the connection severed if any data about to be sent over the stream is more than 48 hours old?


#2

Hi Aaron, I emailed you a few moments ago with an answer to your question. Firehose support is generally best handled by emailing your account rep, which ensures a quicker reply. Thanks.