DMac
9th December 2011, 08:18 AM
Most people know the internet only through major search sites such as Google, Bing, and Yahoo. What they don't know is that the web as a whole is much bigger than the amount of data the mainstream search engines are able to collect.
Take Google, for example. Google uses web-crawling technology that follows public links, one after another, to build an index of the web. The other mainstream search engines use similar technologies.
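To make the "link after link" idea concrete, here is a toy sketch in Python. It assumes a small, made-up link graph held in memory instead of real web pages, and it is in no way how Google's crawler is actually built; it only shows the core breadth-first idea, and why a page that nothing links to never gets found.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# "members-archive" exists on the server, but no other page links to it.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["blog", "about"],
    "members-archive": ["post-1"],
}


def crawl(seed: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl: visit the seed, then every page reachable by following links."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # don't queue the same page twice
                seen.add(target)
                queue.append(target)
    return order


indexed = crawl("home", LINKS)
print(indexed)                                            # ['home', 'about', 'blog', 'post-1', 'post-2']
print("never found:", sorted(set(LINKS) - set(indexed)))  # never found: ['members-archive']
```

The unlinked page is perfectly reachable if you know its address; the crawler simply has no path to it.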
Something to keep in mind, though, is that there are ways to keep a website and its data from being indexed by web crawlers such as Google's.
This information is not necessarily private; it is accessible to any anonymous user on the web.
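The post doesn't say which mechanisms it has in mind, but one common one is a robots.txt file, which asks crawlers to skip certain paths. The sketch below uses Python's standard `urllib.robotparser` module against a hypothetical robots.txt for example.com; the paths and the choice of "Googlebot" as the user agent are just for illustration. Note that robots.txt is a polite request, not a lock: the "blocked" pages are still open to any visitor, which is how content can be public yet unindexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: all crawlers are asked
# to stay out of /search and /archive/; everything else is allowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/about",
            "https://example.com/search?q=weather",
            "https://example.com/archive/2011/report.html"):
    # can_fetch() only reports what the site *asks* crawlers to do;
    # it does not stop anyone from opening the URL in a browser.
    print(url, "->", parser.can_fetch("Googlebot", url))
```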
Enter the "deep web" or invisible internet.
Let's take a look at what Wikipedia has to say about the deep web (or darknet) for a basic introduction to the subject:
Introduction
The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet, or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
Mike Bergman, founder of BrightPlanet, credited with coining the phrase, has said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed. Most of the Web's information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot "see" or retrieve content in the deep Web—those pages do not exist until they are created dynamically as the result of a specific search. The deep Web is several orders of magnitude larger than the surface Web.
Size
Estimates based on extrapolations from a study done at University of California, Berkeley in the year 2000, speculate that the deep Web consists of about 91,000 terabytes. By contrast, the surface Web (which is easily reached by search engines) is about 167 terabytes; the Library of Congress, in 1997, was estimated to have 3,000 terabytes. More accurate estimates are available for the number of resources in the deep Web: He et al. detected around 300,000 deep web sites in the entire Web in 2004, and, according to Shestakov, around 14,000 deep web sites existed in the Russian part of the Web in 2006.
That's a tremendous amount of data hidden from most people's view! It is also worth noting that these statistics are at least five years old, so the amount of data hidden from the mainstream public today is likely much higher.
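For a sense of scale, here is a quick back-of-the-envelope calculation using only the figures quoted above (no new data):

```python
import math

# Figures from the quoted estimates: 2000 Berkeley extrapolation and 1997 Library of Congress estimate.
DEEP_WEB_TB = 91_000
SURFACE_WEB_TB = 167
LIBRARY_OF_CONGRESS_TB = 3_000

deep_vs_surface = DEEP_WEB_TB / SURFACE_WEB_TB
print(f"deep web / surface web: about {deep_vs_surface:.0f}x")   # about 545x
print(f"orders of magnitude: {math.log10(deep_vs_surface):.2f}")  # 2.74
print(f"deep web / Library of Congress: about {DEEP_WEB_TB / LIBRARY_OF_CONGRESS_TB:.0f}x")  # about 30x
```

By these numbers the deep Web comes out roughly 545 times larger than the surface Web, a little under three orders of magnitude, and about 30 times the 1997 Library of Congress estimate.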
Here are some search engines that use different technology from the mainstream ones, which you can use to explore different sections of the 'darknet':
10 Search Engines to Explore the Invisible Web (http://www.makeuseof.com/tag/10-search-engines-explore-deep-invisible-web/)
Infomine (http://infomine.ucr.edu/)
Infomine was built by a pool of libraries in the United States, among them the University of California, Wake Forest University, California State University, and the University of Detroit. Infomine ‘mines’ information from databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and many other resources.
You can search by subject category and further tweak your search using the search options. Infomine is not only a standalone search engine for the Deep Web but also a staging point for a lot of other reference information. Check out its Other Search Tools and General Reference links at the bottom.
The WWW Virtual Library (http://vlib.org/)
This is considered to be the oldest catalog on the web and was started by Tim Berners-Lee, the creator of the web. So, isn’t it strange that it finds a place in a list of Invisible Web resources? Maybe, but the WWW Virtual Library lists a great many relevant resources on a wide range of subjects. You can drill down through the categories or use the search bar. The subjects covered at the site are arranged alphabetically.
Intute (http://www.intute.ac.uk/)
Intute is UK-centric, but some of the most esteemed universities in the region provide its resources for study and research. You can browse by subject or do a keyword search on academic topics ranging from agriculture to veterinary medicine. The service has subject specialists who review and index other websites that cater to those topics.
Intute also provides over 60 free online tutorials for learning effective internet research skills. The tutorials are step-by-step guides arranged around specific subjects.
Complete Planet (http://aip.completeplanet.com/)
Complete Planet calls itself the ‘front door to the Deep Web’. This free, well-designed directory makes it easy to access the mass of dynamic databases that are hidden from general-purpose search. Complete Planet indexes around 70,000 databases, ranging from Agriculture to Weather, with categories such as Food & Drink and Military thrown in as well.
For a really effective Deep Web search, try the Advanced Search options, where, among other things, you can set a date range.
Infoplease (http://www.infoplease.com/index.html)
Infoplease is an information portal with a host of features. Using the site, you can tap into a good number of encyclopedias, almanacs, an atlas, and biographies. Infoplease also has a few nice offshoots like Factmonster.com for kids and Biosearch, a search engine just for biographies.
DeepPeep (http://www.deeppeep.org/)
DeepPeep aims to enter the Invisible Web through the forms that query databases and web services for information. Typed queries open up dynamic but short-lived results that normal search engines cannot index. By indexing databases, DeepPeep hopes to track 45,000 forms across 7 domains.
The domains covered by DeepPeep (Beta) are Auto, Airfare, Biology, Book, Hotel, Job, and Rental. Because it is a beta service, there are occasional glitches, and some results don’t load in the browser.
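Here is a small sketch of why form results are invisible to a link-following crawler. It assumes a hypothetical airfare search form at example.com that is submitted with the GET method; the form address, field names, and values are all made up, and this is not how DeepPeep itself works. The result page only has an address once someone fills in the form, and no ordinary page links to it, so a crawler that only follows links never requests it.

```python
from urllib.parse import urlencode

# Hypothetical airfare search form (Airfare is one of DeepPeep's domains), submitted with method="GET".
FORM_ACTION = "https://example.com/flights/search"


def build_query_url(action: str, fields: dict[str, str]) -> str:
    """Return the URL a browser would request when the form is submitted via GET."""
    return f"{action}?{urlencode(fields)}"


url = build_query_url(FORM_ACTION, {"from": "SFO", "to": "BOS", "date": "2011-12-20"})
print(url)  # https://example.com/flights/search?from=SFO&to=BOS&date=2011-12-20
```

Every different combination of form values yields a different result page, which is why the number of such pages can dwarf the linked surface of a site.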
IncyWincy (http://www.incywincy.com/)
IncyWincy is an Invisible Web search engine that behaves as a meta-search engine, tapping into other search engines and filtering their results. It searches the web, its directory, forms, and images. With a free registration, you can track search results with alerts.
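The post doesn't describe how IncyWincy combines results, so the following is only an illustration of the general meta-search idea: query several engines, then merge and de-duplicate their ranked lists. It uses reciprocal rank fusion (each result scores 1/(k + rank) in every list it appears in, with k = 60 as a commonly used default), one well-known merging technique; the engine names and result lists are hypothetical.

```python
from collections import defaultdict

# Hypothetical ranked results from two underlying engines for the same query.
RESULTS = {
    "engine_a": ["site1", "site2", "site3"],
    "engine_b": ["site3", "site1", "site4"],
}


def merge_results(results: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion; duplicates collapse into one entry."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in results.values():
        for rank, url in enumerate(ranked, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


print(merge_results(RESULTS))  # ['site1', 'site3', 'site2', 'site4']
```

Results that rank well in several engines rise to the top, while each duplicate appears only once in the merged list.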
DeepWebTech (http://www.deepwebtech.com/)
DeepWebTech gives you five search engines (and browser plugins) for specific topics. The search engines cover science, medicine, and business. Using these topic-specific search engines, you can query the underlying databases in the Deep Web.
Scirus (http://www.scirus.com/srsapp/)
Scirus has a pure scientific focus. It is a far-reaching research engine that can scour journals, scientists’ homepages, courseware, pre-print server material, patents, and institutional intranets.
TechXtra (http://www.techxtra.ac.uk/index.html)
TechXtra concentrates on engineering, mathematics, and computing. It gives you industry news, job announcements, technical reports, technical data, full-text e-prints, and teaching and learning resources, along with articles and relevant website information.
Just as with general web search, searching the Invisible Web is about looking for a needle in a haystack; only here, the haystack is much bigger. The Invisible Web is definitely not for the casual searcher. It is deep but not dark: if you know what you are searching for, enlightenment is a few keywords away.