Tools for checking your website



I've already mentioned checking your website using a different browser.  There are other ways of checking it.

Link checker

This checks all the links in your website, both internal and external.  No doubt they all worked when you set them up, but things change.  The page you're linking to may have been moved or may no longer exist at all.  You may have deleted one of your own pages and forgotten that there was still a link to it from your site map.

So go to http://home.snafu.de/tilman/xenulink.html and download a free copy of Xenu Link Checker.  Open the zip file and run setup.exe.  You can accept the offered location for the program or choose your own.  Now run the program.  Click File | Check URL, enter the address of your website and press [Enter].  Eventually it will say “Link sleuth finished.  Do you want a report?”  and when you press [Enter] it will display in your browser a web page showing you all the broken links and the pages which contain them.

Occasionally it will report trouble accessing a link, yet when you click on the link yourself there's no problem; perhaps the server took too long to respond.  But in my experience, if it says the page doesn't exist it's never wrong.  Near the bottom of the page is HTML which you could grab for a Site Map if you want one.  Below this is a list of broken links within your pages: the Anchor business I've already mentioned in the section on Links.  Very useful stuff, so don't hesitate!
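If you're curious what a link checker does under the hood, here is a minimal Python sketch.  It is not how Xenu works internally (I don't know that); it simply collects the links from one page and asks the server for the status of each.  The names LinkCollector, extract_links and check_link are my own inventions for illustration.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag and the src of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return the absolute http(s) URLs in html, in page order, without #anchors."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for link in collector.links:
        absolute, _anchor = urldefrag(urljoin(base_url, link))
        if absolute.startswith(("http://", "https://")) and absolute not in found:
            found.append(absolute)
    return found


def check_link(url, timeout=10):
    """Return the HTTP status code for url, or a short error string."""
    request = Request(url, method="HEAD",
                      headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code                 # e.g. 404: the page no longer exists
    except OSError as err:              # covers URLError and timeouts
        return f"unreachable: {err}"
```

To try it, pass the text of one of your pages and its address to extract_links, then call check_link on each result.  A slow server shows up as "unreachable" rather than 404, which matches the occasional false alarm described above.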

HTML Validator

There are many of these, free and otherwise.  I use the web-based validator at http://www.htmlhelp.org/tools/validator/ and understand most of the messages it produces.  It's a real shock the first time you validate the website you have so painstakingly created and are greeted by dozens of error messages.  Specify your full website name and don't forget to tick the box that says “Validate entire site”.  There's a limit of 100 pages, which I started hitting while producing these notes.

If you don't have a doctype you'll get the message: “Error: missing document type declaration; assuming HTML 4.01 Transitional” but it will go ahead and validate the page anyway.  If there are errors in your doctype you'll get the message: “Error: unrecognized DOCTYPE; unable to check document” and it won't do anything — I had several attempts before I got this right and I still don't understand it.

For a long time I had an error message: “Error: element div not allowed here; possible cause is an inline element containing a block-level element”.  I had a div within an H2 tag.  According to the list linked to by the error message, these are both block-level elements, but the validator still objects: a heading, although block-level itself, is only allowed to contain inline content.  I tried various other possibilities and eventually settled on replacing div by span.  Similarly it objected to HR within H2 so I moved it.

You can just press [F5] (Refresh) to revalidate once you have uploaded a new version.

There is also an HTML Validator at http://validator.w3.org/ but it will only validate the specified page, not an entire site.  Having spent a long time cleaning things up using the first validator I mentioned, I then tried one of my pages on this one and got:

No Character Encoding Found!  Falling back to UTF-8.

The document located at http://colinhume.com/american.htm was checked and found to be tentatively valid XHTML 1.0 Transitional.  This means that with the use of some fallback or override mechanism, we successfully performed a formal validation using an SGML or XML Parser.  In other words, the document would validate as XHTML 1.0 Transitional if you changed the markup to match the changes we have performed automatically, but it will not be valid until you make these changes.

What does this mean?  Well, in my page index.htm (which did validate successfully) I have a line in the Head section:  <Meta Charset="utf-8"> but I didn't have this in any other page.  WebEdit now automatically adds it to all of them.  The “u” in “utf-8” stands for “Unicode”, a universal character set covering virtually every writing system (Chinese included), so that's the encoding I recommend.
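Before uploading, you can catch the two complaints above (no doctype, no character encoding) with a few lines of Python.  This is only a rough sketch using pattern matching, not a validator, and the function name check_head is my own.

```python
import re


def check_head(html):
    """Return a list of warnings about a missing doctype or charset declaration."""
    warnings = []
    # The doctype must be the first thing in the file (leading blanks allowed here).
    if not re.match(r"\s*<!doctype\s", html, re.IGNORECASE):
        warnings.append("no doctype at the start of the document")
    # Accept <meta charset="utf-8"> as well as the older
    # <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> form.
    if not re.search(r"<meta\b[^>]*charset\s*=", html, re.IGNORECASE):
        warnings.append("no character encoding declared in a meta tag")
    return warnings
```

An empty list means both declarations were found.  It does not tell you whether the doctype itself is one the validator recognises.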

CSS Validator

I use two.  The first is from the same site:

http://www.htmlhelp.org/tools/csscheck/

You can specify whether or not you want warnings.  As with the HTML Validator, I got a lot of output: mainly warnings but a couple of errors which I then corrected.  Here are a few of the warnings.

Warning: The shorthand background property is more widely supported than background-color.

Warning: To help avoid conflicts with user style sheets, background and color properties should be specified together.

Warning: To help avoid conflicts with user style sheets, background-image should be specified whenever background-color is used.  In most cases, background-image: none is suitable.

background is a shorthand property which allows you to give background-color, background-image, background-repeat, background-attachment and background-position in one go.  You don't need to give all the values in the list, and they don't need to be in this sequence.  So to avoid the first warning, instead of background-color: red you can use background: red.

The second warning points out that a user can have his own style sheet, and if this specifies color: red while I have specified only a red background, he won't see anything!  I was taking it for granted that body text would be black, but I'd do better to say so.  The third warning is more far-fetched: would a user really add a background image to everything?  And if he really wants to, should I stop him?  But since I'm now using the shorthand, I might as well say background: red none and have done with it.

There were two which were errors rather than warnings:

BlockQuote.Table.TD     Error: Only one class is allowed per simple selector.

Background: LightBlue none     Error: Invalid keyword.

I've already mentioned the first.  I was surprised that it didn't accept LightBlue as a colour, since it had worked on both the browsers I had tried, but I changed it to #ADD8E6 and got rid of the error message.

The second CSS Validator is at:

http://jigsaw.w3.org/css-validator/validator.html

W3.Org is the World Wide Web Consortium, which is the group which maintains web standards, so if anyone is in charge of the web, they are.  One of their directors is Tim Berners-Lee, inventor of the World Wide Web.  So their validator should be right, though it's interesting that each validator gives warnings that the other doesn't.

WebEdit will call these validators automatically for a single file or the whole site using Web|On-line Validate one and Web|On-line Validate all, and will abstract the information you need, so you don't have to scroll through lots of “Congratulations, no errors!” messages.  It also gets round the 100 page limit.  You specify which one of each pair of validators you want, and whether you want to see warnings or just errors, in the Options|Validators menu.

Firefox

The Firefox browser has lots of free add-ins.  One of the best was called Firebug; its features are now built into Firefox itself and you press [F12] to activate them.  It's amazing!  In particular, if you select an element in your HTML it will tell you which CSS rules are affecting it, and it will put a horizontal line through declarations which have been overridden.  So you can see that a particular element has picked up color: white from one rule and ignored color: black from a more general rule.  This could save you a lot of time!  Other browsers now have similar tools, but I still prefer Firefox.

Log file analyser

If you are on a server that you pay for, you may well have log files generated automatically, and it's worth looking at them.

I found I had a folder called “logs” on my server.  This had no files in it, but another folder with a random-looking name, presumably so that no-one else could guess where to find my log files.  This contained many files with names like ex120229.log, which are log files produced to tell me which parts of my website have been accessed on that date (29th February 2012).  After a time the log files are zipped to save space, so they will have names like ex120229.zip, but the program I'm recommending copes with those too.  This is how you can analyse the data.
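If you ever want to read these files yourself rather than through a program, here is a small Python sketch.  It assumes the naming scheme described above (ex followed by a two-digit year, month and day) and that a zipped log simply contains the original text file; both function names are my own.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def log_date(path):
    """Return the date encoded in a name like ex120229.log (exYYMMDD)."""
    stem = Path(path).stem                     # e.g. "ex120229"
    return datetime.strptime(stem[2:], "%y%m%d").date()


def read_log_lines(path):
    """Yield the text lines of a plain .log file, or of every file inside a .zip."""
    path = Path(path)
    if path.suffix.lower() == ".zip":
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                with archive.open(member) as handle:
                    for raw in handle:
                        yield raw.decode("utf-8", errors="replace").rstrip("\r\n")
    else:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\r\n")
```

With this, log_date("ex120229.log") gives 29th February 2012, and read_log_lines treats ex120229.log and ex120229.zip alike.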

Find your log files using the Server menu, and allow WebEdit to create the necessary folders.  Right-click in the main area and click Select All.  Now right-click on any of the selected log files and click Download.  Eventually all the log files will be downloaded to your local drive.

Go to http://www.weblogexpert.com/lite.htm and download WebLog Expert Lite, which is freeware.  Install it and run it.  (Once you're using it successfully you can delete the Sample profile and also the file sample.log in the program directory.)  Click New and give the requested information.  On the next screen, give the path to your log files on the local drive — for instance C:\External\Colin\Logs\W3SVC1198\*.* — there's a “Browse” button so you don't have to remember and type in the path.

Click Analyze and in a few seconds you will see lots of interesting information!  For instance, the Referrers item of the Contents shows you how people got to your website and what they were looking for.  I found that the most popular search engine (by a factor of 200) was Google, which I would have predicted.  I discovered that by far the highest search phrase was “waltz steps”, which I certainly would not have predicted; “dance technique” came second, and most of the others were to do with waltz.  So maybe I should devote more attention to this page.  Maybe I should produce a downloadable video on waltzing and charge people for it.  Whatever reason you had for creating your website, this is where you can see why people really visit it.
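To give an idea of where the search phrases come from, here is a hedged Python sketch that pulls the phrase out of a referrer address.  It assumes the search engine passes the phrase in a q= parameter, which was the usual convention when these logs were written; many search engines no longer include it.  The function names and the example addresses are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def search_phrase(referrer):
    """Return (host, phrase) for a referrer with a q= parameter, else None."""
    parts = urlsplit(referrer)
    phrases = parse_qs(parts.query).get("q")
    if not parts.hostname or not phrases:
        return None
    return parts.hostname, phrases[0].strip().lower()


def top_phrases(referrers, count=10):
    """Return the most common search phrases as (phrase, occurrences) pairs."""
    tally = Counter()
    for referrer in referrers:
        hit = search_phrase(referrer)
        if hit:
            tally[hit[1]] += 1
    return tally.most_common(count)


# Hypothetical referrer values, not taken from my real logs.
examples = [
    "http://www.google.com/search?q=waltz+steps&hl=en",
    "http://www.google.co.uk/search?q=Waltz%20Steps",
    "http://example.org/links.htm",
]
print(top_phrases(examples))
```

Referrers without a q= parameter, such as a link from another site, are simply skipped.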

All of the sections are worth looking through.  For instance, look at Browsers and decide which you can ignore and which you need to concentrate on.

And what about the final “Not found” section?  In the space of 21 days I had 6,475 occurrences of “Code 404: Not found” — the others were 57 or less.  What's going on?  What were people looking for and not finding?  The program doesn't tell you this — but now that I've realised I need to, I can search all the log files for “404” using WebEdit.  What I found was that most of the lines containing “404” were of the form

“2006-09-28 02:35:49 10.2.5.20 GET /robots.txt - 80 - 219.142.118.81 - - 404 0 64”
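The same search can be done with a short script that counts which addresses produced a 404.  This sketch assumes the layout of the line quoted above (fields separated by spaces, with plain hyphens for empty fields, the requested path fifth and the status code third from the end); check your own logs before relying on it.  The function name is my own.

```python
from collections import Counter


def not_found_urls(lines):
    """Count the requested paths whose status code is 404.

    Assumes the field layout of the sample line: the path is the 5th
    space-separated field and the status code is the 3rd from the end.
    """
    tally = Counter()
    for line in lines:
        if line.startswith("#"):        # skip header or comment lines, if any
            continue
        fields = line.split()
        if len(fields) >= 8 and fields[-3] == "404":
            tally[fields[4]] += 1
    return tally


sample = "2006-09-28 02:35:49 10.2.5.20 GET /robots.txt - 80 - 219.142.118.81 - - 404 0 64"
print(not_found_urls([sample]).most_common())
```

Sorting the result with most_common() puts the worst offenders first, so a file requested thousands of times stands out immediately.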

robots.txt is a file telling search-engine robots (not browsers) which pages you do not want indexed.  I checked the official documentation and found

The presence of an empty “/robots.txt” file has no explicit associated semantics; it will be treated as if it was not present, i.e. all robots will consider themselves welcome.
which was what I was hoping for.  So I created a robots.txt file in my root directory containing a single blank line.  In future I'll be able to find the 404 errors that really matter.
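You can confirm this behaviour without waiting for a robot to visit.  Python's standard library includes a robots.txt parser, and the sketch below feeds it the text of the file directly.  The helper name allowed and the example addresses are my own; the parser is one implementation of the rules, not proof of how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A file containing a single blank line: everything is allowed.
print(allowed("\n", "http://example.com/american.htm"))

# A file that excludes one folder (hypothetical rule for comparison).
rules = "User-agent: *\nDisallow: /private/\n"
print(allowed(rules, "http://example.com/private/notes.htm"))
```

The first call should report that the page may be fetched and the second that it may not.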

Later I discovered that some browsers look for a file called favicon.ico in the root directory.  I do have one of these files, but I'd tidily moved it to my images directory.  I moved it back!  This is the icon which is used if someone puts a shortcut to one of your website pages on their desktop, and may appear in the address bar for the page and in bookmarks.  Read about it at http://en.wikipedia.org/wiki/Favicon.

When you've finished studying the logs, you can reclaim the space on the server and your local drive.  Back in the Server display of the log files, right-click the main page, click Select All, right-click any selected file and click Delete.  The log files will be deleted — apart from the one currently in use by the server.

Google Analytics

If you don't have log files automatically generated (or even if you do) there are web sites which will do a similar job.  Read the Wikipedia article on Google Analytics and decide whether you want to use it.  If so, go to http://www.google.com/analytics/ and create an account.  You'll then be shown the code which you need to paste into each of your web pages (or all those you want to keep track of) just before the end of the Body.  Don't immediately check that it has done anything, because it will say the code hasn't been found.  Wait a day or so and then check: log in and click on the appropriate “View report”.

You can also use the Google Webmaster Tools at https://www.google.com/webmasters/tools.  I think to use this you need to upload a sitemap to Google — there's an option in WebEdit to scan your sitemap.htm and generate sitemap.xml which is the form Google wants the information in.  You can then see, for instance, the two websites which refer to a page you have deleted, and you can email the owners of these sites asking them to change or remove the failing link.
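If you don't use WebEdit, the conversion from an HTML site map to sitemap.xml is easy to sketch in Python.  This is my own illustration, not WebEdit's code: it assumes your sitemap.htm is an ordinary page of links and it writes only the required loc entry for each page on your own site.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def sitemap_xml(sitemap_html, site_root):
    """Build the text of sitemap.xml from the links in an HTML site map."""
    collector = HrefCollector()
    collector.feed(sitemap_html)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for href in collector.hrefs:
        absolute = urljoin(site_root, href)
        if absolute.startswith(site_root):      # keep only pages on this site
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = absolute
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Call sitemap_xml with the text of sitemap.htm and your site's root address (for example "http://example.com/"), save the result as sitemap.xml in your root directory and submit that address to Google.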

Another site is www.statcounter.com and there are many more, but I suspect Google is the best (as usual).

Website Grabber

If you want to see how somebody else's website works (or if you've been asked to take over a website and have no idea what it consists of), a very useful free tool is WebReaper from www.webreaper.net.  This will download the entire site to your hard disk so that you can search for particular items or whatever.  When I ran it I found it was missing file msvcr70.dll, so I downloaded this from www.dll-files.com and unzipped it into C:\Windows\System32 without any problems.

After you've downloaded the site, you will find all the pages and images under Desktop (from the WebEdit Open dialogue click the Windows Open button), then Reaped Sites, then the URL of the website.

Website Grader

If you're hoping to make money from your site, or if you're simply hoping that people will find it, you might find it worthwhile running the free Website Grader at marketing.grader.com.  It will give you lots of hints about Search Engine Optimisation, and will compare your site with any rival sites you specify.