
Swoogle Manual


Swooglebot: Swoogle's Semantic Web Crawler

Swooglebot collects Semantic Web Documents from the Web to build Swoogle, a Semantic Web search engine. This page describes our crawling strategy.

How do I prevent Swooglebot from crawling parts or all of my site?

Before crawling your website, Swooglebot tries to read the file robots.txt, which can be used to prevent crawler programs from downloading some or all of the content on your web server. The format and usage of robots.txt are defined by the Robots Exclusion Protocol. Swooglebot analyzes all the records in robots.txt and obeys the rules for the User-Agent "Swooglebot". If no record is specified for Swooglebot, it follows the rules in the first record for User-Agent "*". Please note that changes to your robots.txt file (or to META tags in your web pages) will not affect Swoogle's indexing results immediately; it may take a week or more before the changes are reflected.
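The record-selection rule above can be illustrated with Python's standard `urllib.robotparser` module, which applies a similar policy: records that name the crawler are checked first, and otherwise the first `User-agent: *` record is used as the fallback. This is a sketch, not Swooglebot's own code; the robots.txt content, the agent name `OtherBot`, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for Swooglebot, then two "*" records.
# Only the FIRST "*" record is used for crawlers without their own record.
ROBOTS_TXT = """\
User-agent: Swooglebot
Disallow: /private/

User-agent: *
Disallow: /

User-agent: *
Disallow: /tmp/
"""


def make_parser(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the file as read,
    # so can_fetch() gives real answers instead of refusing everything.
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    # Swooglebot matches its own record: only /private/ is blocked.
    print(rp.can_fetch("Swooglebot", "http://example.org/data/foaf.rdf"))  # True
    print(rp.can_fetch("Swooglebot", "http://example.org/private/a.owl"))  # False
    # Any other crawler falls back to the first "*" record, which blocks all paths.
    print(rp.can_fetch("OtherBot", "http://example.org/data/foaf.rdf"))  # False
```

Because the second `*` record is never consulted, a site owner who wants to restrict Swooglebot specifically should add a record naming it rather than appending more `*` records.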

(Future work: Swooglebot will check the standard Robots META tag, which tells robots not to index a web page or not to follow the links in it.)
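Since this check is listed as future work, the following is only a sketch of how a crawler could read the Robots META tag, assuming the common form `<meta name="robots" content="noindex, nofollow">`. The function `robots_meta` and its return convention are hypothetical, and only the `noindex` and `nofollow` directives are handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_meta(html: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow) for a page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


if __name__ == "__main__":
    page = '<html><head><meta name="ROBOTS" content="NOINDEX, FOLLOW"></head></html>'
    print(robots_meta(page))  # (False, True)
```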

How do I suggest my site for Swooglebot to index?

Go to the Submit URL page.

Why is Swooglebot downloading the same page on my site multiple times?

In general, Swooglebot downloads each file only once. Occasionally the crawler is stopped or crashes and is then restarted; in that case its crawl job may be rolled back, and some files are downloaded again.
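To illustrate why a rollback causes repeat downloads, here is a toy model, assuming (hypothetically) that the crawler persists its "already downloaded" set only at periodic checkpoints. It is not Swooglebot's actual design; the class name and the `checkpoint_every` parameter are invented for illustration.

```python
class CheckpointedCrawler:
    """Toy crawler that saves its 'already downloaded' set only at checkpoints."""

    def __init__(self, checkpoint_every: int):
        self.checkpoint_every = checkpoint_every
        self.downloaded: set[str] = set()  # in-memory state
        self.saved: set[str] = set()  # last persisted copy of `downloaded`
        self.fetch_log: list[str] = []  # every actual download, in order

    def fetch(self, url: str) -> None:
        if url in self.downloaded:
            return  # normal case: each file is downloaded only once
        self.fetch_log.append(url)
        self.downloaded.add(url)
        if len(self.downloaded) % self.checkpoint_every == 0:
            self.saved = set(self.downloaded)  # checkpoint

    def crash_and_restart(self) -> None:
        # Progress since the last checkpoint is lost, i.e. the job is rolled back.
        self.downloaded = set(self.saved)


if __name__ == "__main__":
    crawler = CheckpointedCrawler(checkpoint_every=2)
    for url in ["a.rdf", "b.rdf", "c.rdf"]:
        crawler.fetch(url)
    crawler.crash_and_restart()  # c.rdf was fetched after the last checkpoint
    for url in ["a.rdf", "b.rdf", "c.rdf"]:
        crawler.fetch(url)
    print(crawler.fetch_log)  # ['a.rdf', 'b.rdf', 'c.rdf', 'c.rdf']
```

Only the file fetched after the last checkpoint is downloaded twice; more frequent checkpoints shrink that window at the cost of more writes.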

What kinds of links does Swooglebot follow?

Swooglebot follows HREF and SRC links in HTML files and some URIs in XML files.
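The HTML part of this rule can be sketched with Python's standard `html.parser` and `urllib.parse.urljoin`: collect the values of `href` and `src` attributes on any tag and resolve them against the page URL. This is an illustration, not Swooglebot's code; the helper `extract_links` and the example URLs are hypothetical, and the XML case is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from href and src attributes of any tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attribute names arrive lowercased; value is None for a bare attribute.
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="people/foaf.rdf">FOAF</a> <img src="/logo.png"> <link href="onto.owl">'
    print(extract_links(page, "http://example.org/home/index.html"))
```

Relative links are resolved against the page's own URL, so `people/foaf.rdf` on `http://example.org/home/index.html` becomes `http://example.org/home/people/foaf.rdf`.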

My Swooglebot question is not answered here. Where do I send my question?

Please send questions about Swooglebot to swoogle-developers@cs.umbc.edu.


Swoogle © 2004-2007, ebiquity group at UMBC
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.