I have a website that takes two primary GET query strings:
?type=GAME&id=SomeGameID
?type=SCENARIO&id=SomeScenarioID
For reasons unknown, I have recently begun receiving requests with erroneous query strings from both Yandex and Baidu. They are always of the form:
?type=GAME&id=SomeScenarioID
None of my users are triggering these errors, so I am (sort of) confident that this is not due to an HTML template error somewhere on my part. There is also no HTTP_REFERER showing up in the $_SERVER array, so I’m guessing these are direct requests from bad dbase data on their part.
I see two options for dealing with these bad requests, and would like to know which is recommended… or if there are other, better options I have not thought of:
- simply 404 the request, since it is incorrect
- redirect the request to ?type=SCENARIO&id=SomeScenarioID, because the scenario IDs are always valid; the breakage is due to asking for the wrong type.
2
I would stick with rejecting the request; however, since the URL being requested does exist, error code 400 seems more appropriate.
From the W3C copy of the HTTP/1.1 standard (RFC 2616):
10.4.1 400 Bad Request
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.
4
I would favor 404 over 400 in this situation. 400 means the request is malformed, but the request you described is not malformed. The request is for a GAME that does not exist; this is the definition of error 404.
If there were a way to tell that a request for a GAME carried what is clearly a scenario ID, that would constitute malformed syntax. However, based on your statements in the comments, there is no discernible difference between game IDs and scenario IDs. The ID being requested is well-formed; the game it names simply doesn’t exist.
This is the same reason your redirect idea would not be good. You can’t know that someone is actually looking for a SCENARIO if they have specifically asked for a GAME. If a consumer asked for a GAME but accidentally passed a bad ID and got redirected to a SCENARIO, they would be terribly confused.
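To make the distinction concrete, here is a minimal sketch of the status-code decision described above. It is written in Python only for illustration; the ID sets and the `status_for` helper are hypothetical stand-ins for whatever lookups the real site performs. It treats a missing or unknown `type` as malformed (400), and a well-formed request for an ID that does not exist under the requested type as 404, with no cross-type redirect.

```python
from http import HTTPStatus
from urllib.parse import parse_qsl

# Hypothetical stand-ins for the site's real game and scenario lookups.
GAMES = {"chess", "go"}
SCENARIOS = {"opening-01", "endgame-07"}
CATALOG = {"GAME": GAMES, "SCENARIO": SCENARIOS}


def status_for(query: str) -> HTTPStatus:
    """Pick a status code for a query string such as '?type=GAME&id=chess'."""
    params = dict(parse_qsl(query.lstrip("?")))
    kind = params.get("type")
    item_id = params.get("id")

    # Missing parameters or an unknown type: the request itself is malformed.
    if kind not in CATALOG or not item_id:
        return HTTPStatus.BAD_REQUEST

    # Well-formed request, but nothing with that ID exists under that type.
    # Deliberately no fallback to the other type: the client asked for this one.
    if item_id not in CATALOG[kind]:
        return HTTPStatus.NOT_FOUND

    return HTTPStatus.OK
```

With this sketch, `status_for("?type=GAME&id=opening-01")` yields 404 even though `opening-01` is a known scenario, which is the behaviour argued for above.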
I concur that 400 is a better response than 404.
When you say “I’m guessing these are direct requests from bad dbase data on their part”, I interpret that as meaning that the requests are coming from web crawlers. If that is the case, then another alternative would be to configure your site’s robots.txt file to tell the Yandex and Baidu crawlers to stay out of the part of your site with the queries. Or maybe just block them.
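As a rough illustration of the robots.txt idea, the sketch below uses Python's standard `urllib.robotparser` to check what a candidate rule set would allow before deploying it. The `/view` path is purely hypothetical (the question does not say which script handles the queries), and the `Yandex` and `Baiduspider` user-agent tokens are assumptions that should be checked against each crawler's own documentation. Note that robots.txt is only advisory; a crawler that ignores it would have to be blocked some other way.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt: keep two crawlers out of a hypothetical /view path
# while leaving every other user agent unrestricted.
ROBOTS_TXT = """\
User-agent: Yandex
User-agent: Baiduspider
Disallow: /view
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
bad_url = "/view?type=GAME&id=SomeScenarioID"

# True means the named crawler is permitted to fetch the URL.
yandex_ok = parser.can_fetch("Yandex", bad_url)
baidu_ok = parser.can_fetch("Baiduspider", bad_url)
other_ok = parser.can_fetch("Googlebot", bad_url)
```

Under these rules the two named crawlers are told to stay away from the query pages, while other agents are unaffected.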
Assuming that this is crawler activity, the problem might not be bad data (which would have had to come from somewhere). It is possible that the engines are trying alternatives in a rather lame fashion in order to harvest stuff that isn’t directly linked. (It sounds like a stupid thing to do, but you never know …)
On the other hand, if this is not web crawler activity, perhaps it is the result of some users trying to access your site using some custom client rather than using the UI implemented by your web templates. That would also explain the lack of referrer headers.
1