First, I will explain what I am trying to do:
I need to know how many visitors have visited at least two articles from a given set of articles (or rather, at least two articles in each printed issue). This could be done with content groups (or so it is said), but that does not work with old data. It could also be done in BigQuery, but that does not work with old data either (and I did not get clean results when I tried it with new data). For reporting reasons I need to use old data as well.
How I do it with the Google API:
First I check how many users have read any article in a given issue. Let's call that number of users A (and the set of articles S).
Then, for a given article s_n in S, I check how many users have read at least one of the other articles (let's call that A_ñ). To see how many users have read only s_n (call it A_n), I subtract A_ñ from A.
Now the sum of those A_n (the users who read only article n, for each n) is the number of users who have read exactly one article from the issue, and if I subtract that sum from A, I get the number of users who have read at least two articles.
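To make the arithmetic above concrete, here is a minimal Python sketch of it. It is not the PHP code that makes the real calls: the function names and the toy data are made up for illustration, and the API is replaced by a small in-memory table of who read what. It only shows that subtracting the "read only this article" counts from A gives the same number as counting the multi-article readers directly.

```python
from typing import Dict, Set, Tuple


def at_least_two_from_counts(total_any: int, readers_of_others: Dict[str, int]) -> int:
    """total_any is A; readers_of_others[s] is A_ñ for article s,
    i.e. the number of users who read at least one article other than s."""
    # A_n = A - A_ñ: users who read article n and nothing else in the issue.
    exactly_one = sum(total_any - others for others in readers_of_others.values())
    return total_any - exactly_one


def simulate_counts(reads: Dict[str, Set[str]]) -> Tuple[int, Dict[str, int]]:
    """Stand-in for the API calls (hypothetical): derive A and every A_ñ
    from a toy mapping of article -> set of user ids."""
    total_any = len(set().union(*reads.values()))
    readers_of_others = {}
    for article in reads:
        other_users = set().union(
            *(users for name, users in reads.items() if name != article)
        )
        readers_of_others[article] = len(other_users)
    return total_any, readers_of_others


# Toy issue with three articles (made-up user ids).
reads = {
    "article_1": {"u1", "u2", "u3"},
    "article_2": {"u2", "u4"},
    "article_3": {"u2", "u3", "u5"},
}

total_any, readers_of_others = simulate_counts(reads)
result = at_least_two_from_counts(total_any, readers_of_others)

# Direct count for comparison: users who appear in two or more articles.
all_users = set().union(*reads.values())
direct = sum(
    1 for user in all_users
    if sum(user in users for users in reads.values()) >= 2
)
print(result, direct)
```

Because the result is a difference of counts, it is only exact when every count is taken over the same users and the same date range; any error in an individual count carries straight through the subtraction.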
This method works fine, until for some reason it starts to get inconsistent results from the Google Data API.
The inconsistency is predictable and follows some kind of wave-like pattern. Not all issues of the year have this problem, but all that do share the wave structure (though the periodicity seems to differ; there is not enough data to tell for certain).
I've tried changing the collection time of the data. In the beginning I asked for yesterday's data and thought the cause might be that the servers were not updated yet. But if I go back retroactively and ask for the data from a month back, I get the same result as I did for yesterday (and for the data from two days ago).
As there are a lot of articles, and the method works neatly on a set of only a few articles, it is very hard to scale things up far enough in the graphical interface to replicate the error.
If you deem it necessary, I will update the question with the PHP code making the calls. But given the nature of the error and its pattern, I would prefer replies from a systematic point of view.