Here are just a couple of notes about Compression. You will notice that some sets of data, just by their very nature, are highly compressible, and some are not. Just take a look at this trend right here. It is constantly spiking up and down, so there is not a lot of data that can be compressed out of it; it simply varies too much from one iteration to the next. We are probably going to generate Compression Events, Archive Events, for each value that comes in. So, in this case, using the Swinging Door Algorithm, all the values would be archived.

Now, compare that to this data over here. This data is fairly uninteresting. It does not change that much, and it is obviously not going to violate the deadband a great deal, so it is probably going to be very highly compressible. Another way of describing this is that when we look at things like scan rate, the number of times you scan a value does not really affect how much you store, because scan rate is simply how often you look. You can look at a dull value all you want, and if it does not change, we are simply not going to store it. We are going to store it based on the Compression Deviation.
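To make that concrete, here is a minimal sketch in Python with illustrative names (the actual Compression Test is the Swinging Door Algorithm, sketched at the end of this section) of why scan rate alone does not drive storage: a scanned value is only archived when it moves outside the Compression Deviation, so scanning an unchanging value more often stores nothing extra.

```python
def archive_scans(scanned_values, comp_dev=0.5):
    """Toy deviation test: archive a scan only if it differs from the last
    archived value by more than the compression deviation (comp_dev)."""
    archived = []
    for value in scanned_values:
        if not archived or abs(value - archived[-1]) > comp_dev:
            archived.append(value)
    return archived

# Scanning a "dull" value 1,000 times still stores only one event...
print(len(archive_scans([50.0] * 1000)))                    # -> 1
# ...while a rapidly swinging signal stores nearly every scan.
print(len(archive_scans([50.0, 53.0, 47.0, 52.0, 46.0])))   # -> 5
```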
Now, some other things. Remember, we store only actual Process Values. Every single thing you see in the Archive, all of these values right here, they are actual values; we never make up data. As for data that might come in out of sequence: let us say, for some strange reason, values come in at this time, and then again at this time, and then for some reason we get a value over here at that time, and it actually arrives after this value over here. Well, if that happens, that value is going to go in without Compression. We call that out-of-sequence data, and all out-of-sequence data like that is archived. It is rare to see an interface that gets its data so out of order that it does that, but if it does, then that data will be written without Compression.
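As a rough sketch of that out-of-sequence rule, here is a small Python illustration with assumed, simplified names (this is not the actual PI Data Archive interface): an event whose timestamp is older than the newest archived event bypasses the compression test and is written as-is.

```python
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float   # event time
    value: float       # process value

def store(archive, event, comp_dev=0.5):
    """Append an event, bypassing compression when it is out of sequence."""
    if archive and event.timestamp < archive[-1].timestamp:
        # Out-of-sequence: archived without compression, kept in time order.
        archive.append(event)
        archive.sort(key=lambda e: e.timestamp)
        return
    # In-sequence: apply a simplified compression deviation test.
    if not archive or abs(event.value - archive[-1].value) > comp_dev:
        archive.append(event)

archive = []
store(archive, Event(timestamp=10.0, value=50.0))
store(archive, Event(timestamp=20.0, value=51.0))
store(archive, Event(timestamp=15.0, value=50.2))  # late arrival: archived anyway
```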
Also, a point to note, as I have said many times now: each and every Tag has its own Compression settings. I am only looking at the Compression for the Tag CDT185 right now. So that is something to consider; each Tag has its own. And, again, you can always opt out by turning Compressing off.
I do want to reiterate that it is absolutely a trade-off when you set Data Compression. We understand that and admit it. It is a trade-off, so I can understand the impulse to turn Compression off entirely. If you want raw data, you can do that simply by setting Compressing equal to Off.

The trade-off is between the accuracy of the data and the speed of retrieval. It is really not the disk space we are concerned about; it is mostly performance, especially when people are using a live tool like ProcessBook. We have learned over the years that if it takes longer than a second or two for people to retrieve their data, they just stop going to PI. They will not use it as much if it takes a long time. So we would like everybody to have well-tuned data so that their performance is very fast as they retrieve it.

The trade-off, basically, starts with the raw data. Here is the raw data, complete with all the line noise. If you trail this, there is a tremendous amount of line noise in here that probably does not even represent useful information to anybody. That line noise is eliminated during the Exception Test. So, if you look at this point here, we are now looking at data that has passed the Exception Test, so the line noise is gone.
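To illustrate that Exception Test, here is a minimal sketch assuming a simple exception deviation band; the real interfaces also honor time limits such as ExcMin and ExcMax, which are omitted here.

```python
def exception_filter(snapshots, exc_dev=0.2):
    """Forward a snapshot only if it differs from the last forwarded value
    by more than the exception deviation, dropping small 'line noise'."""
    forwarded = []
    last = None
    for value in snapshots:
        if last is None or abs(value - last) > exc_dev:
            forwarded.append(value)
            last = value
    return forwarded

raw = [50.00, 50.05, 49.98, 50.02, 50.60, 50.58, 50.62, 51.30]
print(exception_filter(raw))   # -> [50.0, 50.6, 51.3]: the noise is gone
```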
And, after that line noise has been removed, we still have the opportunity to tighten up even more. A good example is this sequence right here. That sequence of values all fits fairly closely to a certain vector, so if we simply record the beginning, the mid-point, and the end, we can represent all of these values while storing a lot fewer of them. The point of the Compression Test is to take this data and see how efficiently we can store it in the Archives without losing any accuracy. But, admittedly, it is a trade-off.
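To close, here is a hedged sketch of a swinging-door style Compression Test in Python. It shows the idea of fitting a run of values to a line within the Compression Deviation and keeping only the points needed to reconstruct the trend; it is a simplified illustration, not the PI Server's exact implementation.

```python
def swinging_door(points, comp_dev=0.5):
    """points: list of (time, value) in time order; returns the archived subset."""
    if len(points) < 2:
        return list(points)
    archived = [points[0]]
    anchor = points[0]                  # last archived point
    max_lower = float("-inf")           # steepest lower-door slope seen so far
    min_upper = float("inf")            # shallowest upper-door slope seen so far
    prev = points[0]
    for t, v in points[1:]:
        dt = t - anchor[0]
        max_lower = max(max_lower, (v - comp_dev - anchor[1]) / dt)
        min_upper = min(min_upper, (v + comp_dev - anchor[1]) / dt)
        if max_lower > min_upper:
            # The "doors" have closed: archive the previous point and restart.
            archived.append(prev)
            anchor = prev
            dt = t - anchor[0]
            max_lower = (v - comp_dev - anchor[1]) / dt
            min_upper = (v + comp_dev - anchor[1]) / dt
        prev = (t, v)
    archived.append(prev)               # the most recent point is always kept
    return archived

# A nearly linear run compresses down to its endpoints...
smooth = [(0, 10.0), (1, 10.2), (2, 10.5), (3, 10.7), (4, 11.0)]
print(swinging_door(smooth))            # -> [(0, 10.0), (4, 11.0)]
# ...while a noisy, spiky run keeps nearly every value.
noisy = [(0, 10.0), (1, 12.0), (2, 9.0), (3, 13.0), (4, 8.0)]
print(swinging_door(noisy))
```

The smooth run behaves like the uninteresting, highly compressible data from earlier, while the noisy run corresponds to the spiky trend where essentially every value ends up archived.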