Sitemap generators

jeanm

Member
I've just uploaded my "new" site and have been organizing the sitemap. I used http://www.sitemapdoc.com/Default.aspx

The results keep telling me I have 6 duplicate pages. All bar one are .htm pages which no longer exist on my site. The other is a .php page, but I can find no trace of it being duplicated anywhere. I've checked and double-checked everything, so I don't know why I'm getting these warnings.

My question is: Are these sitemap generators really accurate? If they are, I'm at a loss to know how to fix the problems.
 

smoovo

New Member
Open your sitemap file, which is at http://www.railway-train-travel.com.au/sitemap.xml, and delete this entry.

HTML:
<url>
    <loc>http://www.railway-train-travel.com.au/index.php</loc>
    <lastmod>2011-01-14</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
</url>

Your sitemap already lists the same page without "index.php" (as you have already figured out).
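If you would rather not edit the file by hand, here is a minimal sketch of the same clean-up in Python. It assumes the sitemap uses the standard sitemaps.org namespace and that the bare URL (without "index.php") is listed as its own entry; the function name `drop_index_duplicates` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace (assumed to be the one used by the file).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_index_duplicates(sitemap_xml, index_name="index.php"):
    """Remove <url> entries ending in /index.php when the bare URL is also listed."""
    ET.register_namespace("", NS)  # keep the output free of ns0: prefixes
    root = ET.fromstring(sitemap_xml)
    url_tag = "{%s}url" % NS
    loc_tag = "{%s}loc" % NS

    # Collect every listed location first so we can look up the bare form.
    locs = {u.findtext(loc_tag, default="").strip() for u in root.findall(url_tag)}

    for u in list(root.findall(url_tag)):
        loc = u.findtext(loc_tag, default="").strip()
        if loc.endswith("/" + index_name):
            bare = loc[: -len(index_name)]  # e.g. http://example.com/
            if bare in locs or bare.rstrip("/") in locs:
                root.remove(u)

    return ET.tostring(root, encoding="unicode")
```

An "index.php" entry is only dropped when its bare counterpart is present, so a sitemap that lists the home page solely as "index.php" is left alone.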

The other pages have duplicate content. That doesn't mean the URL itself is duplicated; it means you have 2 (or more) pages with exactly the same content.

The robot scans your pages and reads their source. If it finds more than one page with exactly the same source, it raises an alert.
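To give a rough idea of what "same exact content" can mean in practice, here is a small sketch that groups pages by a hash of their source. This is only an illustration: how sitemapdoc.com actually detects duplicates is not known here, and the function name `find_duplicate_content` and the page names in the usage example are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicate_content(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose page source is identical after whitespace normalization."""
    groups = defaultdict(list)
    for url, body in pages.items():
        # Collapse runs of whitespace so trivial formatting differences are ignored.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL count as duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

For example, calling it with `{"tours.htm": "<p>Hi</p>", "tours.php": "<p>Hi</p>\n", "contact.php": "<p>Other</p>"}` puts "tours.htm" and "tours.php" in one group, because the two bodies differ only in trailing whitespace.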

You can leave it, but it will be bad for your SEO. Search engines don't like it at all. My advice is to replace the content on your duplicate pages with new content.

- Good Luck.
 

jeanm

Member
Thanks Smoovo. I have just deleted the reference to the index.php page from sitemap.xml. Regarding the other 5 problem pages:

I've just looked through every folder on the remote server and there is not one .htm file left there any more. When I redesigned the whole site I changed all the pages from .htm to .php. The content of each old .htm page was virtually identical to what it is now as a new .php page. Apart from the .php extension, the name of each current page is identical to what it was on the "old" site.

During the transition period, which lasted a couple of weeks, virtually every page was duplicated on the remote server. When I finally had all the .php pages finished, I added some rewrite code so that all the existing .htm links (out there on search engines and on other people's sites) would bring up the new .php page instead. I checked that everything was working well, then went through the whole site and deleted every single old .htm page.

I've now been through the generated sitemap.xml with a fine-tooth comb. The 5 problem pages are all ones where the sitemap generator has picked up both a .htm and a .php page of the same name. I don't know how this can be so if there are no .htm pages left on the site. As you can see, I can't delete those five .htm pages from my web site because they were already deleted 2 days ago.
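For anyone wanting to do the same comparison without reading the file line by line, here is a minimal sketch that lists page names appearing with both extensions. It only inspects a list of URLs and does not explain why the generator still reports the .htm ones; the function name `find_extension_pairs` and the example URLs are hypothetical.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def find_extension_pairs(urls, old_ext=".htm", new_ext=".php"):
    """Return URL paths (without extension) that appear with both extensions."""
    by_stem = defaultdict(set)
    for url in urls:
        path = urlsplit(url).path  # the host is ignored; only the path is compared
        stem, ext = posixpath.splitext(path)
        by_stem[stem].add(ext.lower())
    return sorted(
        stem for stem, exts in by_stem.items() if old_ext in exts and new_ext in exts
    )
```

Given "http://example.com/tours.htm", "http://example.com/tours.php" and "http://example.com/contact.php", it reports only "/tours", since that is the one name listed under both extensions.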

I've got rid of the 5 non-existent .htm references in sitemap.xml, but what will solve the problem if the sitemap generator is picking up pages that haven't existed for a couple of days?
 