I've already mentioned checking your website using a different browser. There are other ways of checking it.
If you don't have a doctype you'll get the message: “Error: missing document type declaration; assuming HTML 4.01 Transitional” but the validator will go ahead and check the page anyway. If there are errors in your doctype you'll get the message: “Error: unrecognized DOCTYPE; unable to check document” and it won't do anything. It took me several attempts to get this right, and I still don't fully understand it.
For a long time I had an error message: “Error: element div not allowed here; possible cause is an inline element containing a block-level element”. I had a div within an H2 tag. According to the list linked to by the error message, these are both block-level elements, but the validator still objects, because a heading may contain only inline content even though it is itself block-level. I tried various other possibilities and eventually settled on replacing div by span. Similarly it objected to HR within H2, so I moved the HR outside the heading.
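If you want to hunt for this kind of nesting before uploading, here is a minimal sketch in Python using only the standard library. It is not how either validator works; the list of block-level tags is my own shortened assumption, and the names HeadingChecker and find_block_in_heading are made up for illustration.

```python
from html.parser import HTMLParser

# Assumed, incomplete list of block-level tags that a heading must not contain.
BLOCK_TAGS = {"div", "p", "hr", "table", "ul", "ol", "blockquote", "pre", "form"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingChecker(HTMLParser):
    """Records (heading, block_tag, line) for each block tag opened inside a heading."""

    def __init__(self):
        super().__init__()
        self.open_heading = None   # name of the heading we are currently inside
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <H2> arrives as "h2".
        if tag in HEADING_TAGS:
            self.open_heading = tag
        elif self.open_heading and tag in BLOCK_TAGS:
            line, _ = self.getpos()
            self.problems.append((self.open_heading, tag, line))

    def handle_endtag(self, tag):
        if tag == self.open_heading:
            self.open_heading = None


def find_block_in_heading(html):
    """Return a list of (heading, block_tag, line_number) tuples."""
    checker = HeadingChecker()
    checker.feed(html)
    return checker.problems
```

Calling find_block_in_heading on the text of a page gives an empty list when every heading holds only inline elements such as span.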
You can just press [F5] (Refresh) to revalidate once you have uploaded a new version.
There is also an HTML Validator at http://validator.w3.org/ but it will only validate the specified page, not an entire site. Having spent a long time cleaning things up using the first validator I mentioned, I then tried one of my pages on this one and got:
No Character Encoding Found! Falling back to UTF-8.
The document located at http://colinhume.com/american.htm was checked and found to be tentatively valid XHTML 1.0 Transitional. This means that with the use of some fallback or override mechanism, we successfully performed a formal validation using an SGML or XML Parser. In other words, the document would validate as XHTML 1.0 Transitional if you changed the markup to match the changes we have performed automatically, but it will not be valid until you make these changes.
What does this mean? Well, in my page index.htm (which did validate successfully) I have a line in the Head section:
<Meta Charset="utf-8">
but I didn't have this in any other page. WebEdit now automatically adds it to all of them. The “u” in “utf-8” stands for “Unicode”, a universal character set which covers virtually every writing system (Chinese included), so that's the one I recommend.
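To find which local pages still lack the declaration, something like the following Python sketch would do. It is only an illustration: the function names are hypothetical, it assumes the pages end in .htm or .html, and it uses a simple regular expression rather than a real HTML parser.

```python
import re
from pathlib import Path

# Matches a charset declaration inside a <meta ...> tag, in any letter case,
# with or without quotes (covers both the short form and the http-equiv form).
CHARSET_RE = re.compile(r'<meta\s+[^>]*charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(html):
    """Return the declared character encoding in lower case, or None."""
    match = CHARSET_RE.search(html)
    return match.group(1).lower() if match else None


def pages_without_charset(folder):
    """List the names of .htm/.html files under folder that declare no charset."""
    missing = []
    for path in sorted(Path(folder).rglob("*.htm*")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if declared_charset(text) is None:
            missing.append(path.name)
    return missing
```

Run pages_without_charset on the local copy of the site and add the Meta line to every page it names.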
You can specify whether or not you want warnings. As with the HTML Validator, I got a lot of output: mainly warnings but a couple of errors which I then corrected. Here are a few of the warnings.
Warning: The shorthand background property is more widely supported than background-color.
Warning: To help avoid conflicts with user style sheets, background and color properties should be specified together.
Warning: To help avoid conflicts with user style sheets, background-image should be specified whenever background-color is used. In most cases, background-image: none is suitable.
background is a shorthand property which allows you to give background-color, background-image, background-repeat, background-attachment and background-position in one go. You don't need to give all the values in the list, and they don't need to be in this sequence. So to avoid the first warning, instead of background-color: red you can use background: red.
The second warning points out that users can have their own style sheets, and if a user's style sheet specifies color: red while my page sets only a red background, he won't see anything! I was taking it for granted that body text would be black, but I'd do better to say so. The third warning is more far-fetched: would a user really add a background image to everything? And if he really wants to, should I stop him? But since I'm now using the shorthand, I might as well say background: red none and have done with it.
There were two which were errors rather than warnings:
BlockQuote.Table.TD Error: Only one class is allowed per simple selector.
Background: LightBlue none Error: Invalid keyword.
I've already mentioned the first. I was surprised that it didn't accept LightBlue as a colour, since it had worked on both the browsers I had tried, but I changed it to #ADD8E6 and got rid of the error message.
The second CSS Validator is at:
http://jigsaw.w3.org/css-validator/validator.html
W3.Org is the World Wide Web Consortium, the group which maintains web standards, so if anyone is in charge of the web, they are. One of their directors is Tim Berners-Lee, inventor of the World Wide Web. So their validator should be right, though it's interesting that each validator gives warnings that the other doesn't.
WebEdit will call these validators automatically for a single file or the whole site using Web|On-line Validate one and Web|On-line Validate all, and will abstract the information you need, so you don't have to scroll through lots of “Congratulations, no errors!” messages. It also gets round the 100 page limit. You specify which one of each pair of validators you want, and whether you want to see warnings or just errors, in the Options|Validators menu.
I found I had a folder called “logs” on my server. This had no files in it, but another folder with a random-looking name, presumably so that no-one else could guess where to find my log files. This contained many files with names like ex120229.log, which are log files produced to tell me which parts of my website have been accessed on that date (29th February 2012). After a time the log files are zipped to save space, so they will have names like ex120229.zip, but the program I'm recommending copes with those too. This is how you can analyse the data.
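If you would rather look inside the log files yourself, here is a small Python sketch. It assumes the naming pattern described above (two letters followed by a yymmdd date) and that each zip simply contains the original text log; the function names are my own, and this is not how WebLog Expert reads the files.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def log_date(filename):
    """Turn a name like 'ex120229.log' or 'ex120229.zip' into a date."""
    stem = Path(filename).stem                    # e.g. 'ex120229'
    return datetime.strptime(stem[2:], "%y%m%d").date()


def read_log_lines(path):
    """Yield text lines from a plain log file, or from every file inside a zip."""
    path = Path(path)
    if path.suffix.lower() == ".zip":
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue                      # skip folder entries
                with archive.open(member) as handle:
                    for raw in handle:
                        yield raw.decode("latin-1").rstrip("\r\n")
    else:
        with open(path, encoding="latin-1") as handle:
            for line in handle:
                yield line.rstrip("\r\n")
```

For example, log_date("ex120229.log") gives the date 29th February 2012, and read_log_lines works the same way whether the server has zipped the file or not.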
Find your log files using the Server menu, and allow WebEdit to create the necessary folders. Right-click in the main area and click Select All. Now right-click on any of the selected log files and click Download. Eventually all the log files will be downloaded to your local drive.
Go to http://www.weblogexpert.com/lite.htm and download WebLog Expert Lite, which is freeware. Install it and run it. (Once you're using it successfully you can delete the Sample profile and also the file sample.log in the program directory.) Click New and give the requested information. On the next screen, give the path to your log files on the local drive — for instance C:\External\Colin\Logs\W3SVC1198\*.* — there's a “Browse” button so you don't have to remember and type in the path.
Click Analyze and in a few seconds you will see lots of interesting information! For instance, the Referrers item of the Contents shows you how people got to your website and what they were looking for. I found that the most popular search engine (by a factor of 200) was Google, which I would have predicted. I discovered that by far the highest search phrase was “waltz steps”, which I certainly would not have predicted; “dance technique” came second, and most of the others were to do with waltz. So maybe I should devote more attention to this page. Maybe I should produce a downloadable video on waltzing and charge people for it. Whatever reason you had for creating your website, this is where you can see why people really visit it.
All of the sections are worth looking through. For instance, look at Browsers and decide which you can ignore and which you need to concentrate on.
And what about the final “Not found” section? In the space of 21 days I had 6,475 occurrences of “Code 404: Not found”, while the others were 57 or fewer. What's going on? What were people looking for and not finding? The program doesn't tell you this, but now that I've realised I need to know, I can search all the log files for “404” using WebEdit. What I found was that most of the lines containing “404” were of the form
“2006-09-28 02:35:49 10.2.5.20 GET /robots.txt - 80 - 184.108.40.206 - - 404 0 64”
robots.txt is a file telling search-engine robots (crawlers) which pages you do not want indexed. I checked the official documentation and found
The presence of an empty “/robots.txt” file has no explicit associated semantics; it will be treated as if it was not present, i.e. all robots will consider themselves welcome.
which was what I was hoping for. So I created a robots.txt file in my root directory containing a single blank line. In future I'll be able to find the 404 errors that really matter.
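Rather than reading through every line containing “404”, you could count them by requested file. The Python sketch below is one way to do it, on two assumptions: that the log has a “#Fields:” header naming the columns cs-uri-stem and sc-status (as W3C extended logs normally do), and, failing that, that the columns are laid out as in the sample line above. The function name count_not_found is made up for illustration.

```python
from collections import Counter


def count_not_found(lines):
    """Count the requests that returned 404, keyed by the requested path."""
    fields = None
    counts = Counter()
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]     # column names for the rows that follow
            continue
        if not line.strip() or line.startswith("#"):
            continue                      # blank lines and other header lines
        parts = line.split()
        if fields and len(parts) == len(fields):
            row = dict(zip(fields, parts))
            path, status = row.get("cs-uri-stem"), row.get("sc-status")
        elif len(parts) >= 5:
            # Assumed fallback layout, copied from the sample line in the text:
            # the path is the fifth item and the status is third from the end.
            path, status = parts[4], parts[-3]
        else:
            continue
        if status == "404":
            counts[path] += 1
    return counts
```

Calling count_not_found(lines).most_common(10) on the lines of a log lists the ten paths that failed most often, which makes it easy to separate files like robots.txt from pages that visitors really wanted.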
Later I discovered that some browsers look for a file called favicon.ico in the root directory. I do have one of these files, but I'd tidily moved it to my images directory. I moved it back! This is the icon which is used if someone puts a shortcut to one of your website pages on their desktop, and may appear in the address bar for the page and in bookmarks. Read about it at http://en.wikipedia.org/wiki/Favicon.
When you've finished studying the logs, you can reclaim the space on the server and your local drive. Back in the Server display of the log files, right-click in the main area, click Select All, right-click any selected file and click Delete. The log files will be deleted, apart from the one currently in use by the server.
You can also use the Google Webmaster Tools at https://www.google.com/webmasters/tools. I think to use this you need to upload a sitemap to Google; there's an option in WebEdit to scan your sitemap.htm and generate sitemap.xml, which is the format Google wants the information in. You can then see, for instance, the two websites which refer to a page you have deleted, and you can email the owners of these sites asking them to change or remove the failing link.
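If you don't use WebEdit, the same idea can be sketched in a few lines of Python. This is not WebEdit's own method, just an illustration: it assumes sitemap.htm is an ordinary page of links, it writes only the minimal urlset/url/loc elements of the sitemap format, and the names LinkCollector and build_sitemap_xml are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from xml.sax.saxutils import escape


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the order found."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def build_sitemap_xml(sitemap_html, base_url):
    """Return sitemap XML text listing the site's own pages linked from the HTML."""
    collector = LinkCollector()
    collector.feed(sitemap_html)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    seen = set()
    for href in collector.links:
        # Make the link absolute and drop any #fragment part.
        url, _ = urldefrag(urljoin(base_url, href))
        if not url.startswith(base_url) or url in seen:
            continue                      # skip external links and duplicates
        seen.add(url)
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Pass it the text of sitemap.htm and the address of the site (for instance "http://colinhume.com/"), then save the result as sitemap.xml in the root directory.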
Another site is www.statcounter.com and there are many more, but I suspect Google is the best (as usual).
After you've downloaded the site, you will find all the pages and images under Desktop (from the WebEdit Open dialogue click the Windows Open button), then Reaped Sites, then the URL of the website.