Inevitably I'll stop using an antiquated CSS, script, or image file, especially when a separate designer is tinkering with things and testing out a few versions of images. Before I build one myself, are there any tools out there that will drill through a website and list unlinked files? Specifically, I'm interested in ASP.NET MVC sites, so detecting calls to @Url.Content(…) (among many other things) is important.
Aside from strictly static websites, the results of such a tool would be rather unreliable:
-
You can’t scan the source code in order to find the links, since links can be generated. Imagine the following case:
On a page, when a user performs an action, an image is added to the DOM (so there is no <img/> element in the original HTML). The link to the image is assigned by JavaScript. To build one part of this link, JavaScript makes an AJAX request; the other part is hardcoded in the JavaScript code. The final URI is http://example.com/photos/nature/polar-bear.jpg?width=800. The server receives the request for the image and rewrites the URL to http://example.com/generate-photo.aspx?category=nature&name=polar-bear.jpg&width=800. The new URI turns out to point to a dynamic resource which generates the image by taking an existing one (/photos/catalog/133d6566-3c98-4690-be4a-caad41c0e21d.jpg) and adding a copyright notice.
Could you possibly track this situation automatically?
-
You can’t rely on logs, since the fact that the resource was not requested for a while doesn’t mean that it will never be requested.
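To illustrate the first point, here is a minimal Python sketch of a naive static scan. The regular expression, the function name `literal_references`, and the sample view are all hypothetical; the only thing it shows is that a scan of this kind sees literal paths and nothing that is assembled at run time.

```python
import re

# Matches literal app-relative paths such as "~/Content/site.css", whether
# they appear inside Url.Content("...") or directly in an attribute.
LITERAL_PATH = re.compile(r"""["'](~/[^"']+)["']""")


def literal_references(source: str) -> set[str]:
    """Return the app-relative paths that appear verbatim in the source."""
    return set(LITERAL_PATH.findall(source))


# Hypothetical Razor view: two literal references and one URL built by JavaScript.
view = '''
<link href="@Url.Content("~/Content/site.css")" rel="stylesheet" />
<img src="~/Images/logo.png" />
<script>
  var img = new Image();
  img.src = "/photos/" + category + "/" + name + "?width=800";
</script>
'''

found = literal_references(view)
print(sorted(found))
```

The two literal paths are reported, but the photo URL never shows up, because it does not exist as a string anywhere in the source.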
The only viable alternative is to:
-
List every resource on the website,
-
Collect statistics from the logs in order to filter out the resources which were used in the past N months. Don't forget the many small issues which can arise: there is URL rewriting, you need to canonicalize the requests, there are default pages (http://example.com/index.html will mostly be requested as http://example.com/), etc.
-
Based on those statistics, forget about the resources which are in use: you don’t need to remove them.
-
For the remaining resources, try to guess for each one the context in which it could be used, and check whether it is. This last step is extremely complex for a program and requires a human brain (or years and years of R&D).
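The first three steps can be roughly sketched in Python as follows. This is only an illustration under several assumptions: the default page is index.html, paths are compared case-insensitively, query strings are ignored, and the list of requested URLs has already been extracted from the logs. The names `canonicalize` and `unused_candidates` are made up for the example, and URL rewriting is deliberately not handled.

```python
from urllib.parse import unquote, urlsplit

DEFAULT_PAGE = "index.html"  # assumption: the server's default document


def canonicalize(url: str) -> str:
    """Reduce a requested URL (or a site-relative file path) to a comparable path."""
    # Drop scheme, host, and query string; decode %-escapes.
    # Lowercasing is an assumption about a case-insensitive file system.
    path = unquote(urlsplit(url).path).lower() or "/"
    if path.endswith("/"):
        path += DEFAULT_PAGE  # http://example.com/ -> /index.html
    return path


def unused_candidates(site_files, requested_urls):
    """Return the files that no logged request maps to directly."""
    used = {canonicalize(u) for u in requested_urls}
    return sorted(f for f in site_files if canonicalize(f) not in used)


# Hypothetical inventory and log extract.
site_files = [
    "/index.html",
    "/css/site.css",
    "/css/old.css",
    "/photos/catalog/133d6566-3c98-4690-be4a-caad41c0e21d.jpg",
]
requested_urls = [
    "http://example.com/",
    "http://example.com/css/site.css?v=2",
    "http://example.com/photos/nature/polar-bear.jpg?width=800",
]

candidates = unused_candidates(site_files, requested_urls)
for path in candidates:
    print(path)
```

Note that the catalog photo ends up among the candidates even though it is in use through the rewritten URL. The output is a list to review by hand, not a list of files that are safe to delete.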
As a side note, did you know that instead of Url.Content, ASP.NET MVC 4 lets you use ~ directly, like this:
<a href="~/Products/Edit/458">Edit</a>