
The Secret Web  Tags: secret_web web_secrets hidden_web  

Advanced web mining.
Last update: Apr 11th, 2009 URL: http://lanecc.libguides.com/secretweb


The secret web

Contents

Non-indexed special databases
Search engines
Indexes and directories
Deep web searching
Search engine special features
The archived web
RSS feeds
Customized information blogs

External resources

UC Berkeley Invisible/Deep Web Workshops

Wikipedia "Deep Web" article


Non-indexed special databases

Much of the web is "hidden" or "invisible" - some estimates say up to 60%. One reason is that much information is valuable and proprietary, and so commands a price. Many people think, for example, that all magazine and news articles are freely available on the web - in most cases they are not. Those articles cost time, money and effort to write and publish, and companies want to extract maximum value from them. So most published material resides in commercial databases, which are packaged and resold to companies and libraries. These databases are part of the web in a sense, but accessible only to a restricted clientele. At Lane, we provide a number of these databases, but there are thousands out there - some highly restricted and valuable, some quite readily available. To see our databases - about 45 of them - go to Lane's Find an Article page:

http://lanecc.edu/library/find/article.htm

Additionally, much of the web is hidden for secrecy and privacy - intranets for organizations, governments and companies are designed for a limited set of users, and the public is kept out. Hackers spend a lot of time and energy trying to crack these intranets and databases.

At its simplest, any web file is hidden unless it's probed by a web spider or bot - and to be found this way, it must be externally linked. So much of the internet is also lost inadvertently, through careless web design, or deliberately, through cloaking.
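To see why an unlinked page stays hidden, here is a minimal sketch of a spider following links. The page names and the tiny `LINKS` map are made up for illustration, and real spiders are far more elaborate; the point is only that a page nobody links to is never reached.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
LINKS = {
    "home.html": ["about.html", "articles.html"],
    "about.html": ["home.html"],
    "articles.html": ["article1.html"],
    "article1.html": [],
    "orphan.html": ["home.html"],  # links out, but nothing links to it
}


def crawl(start, links):
    """Breadth-first crawl: return every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl("home.html", LINKS)
hidden = sorted(set(LINKS) - found)
print("Found by the spider:", sorted(found))
print("Hidden from the spider:", hidden)
```

In this sketch `orphan.html` exists on the server but is never discovered, because the spider starts at `home.html` and no page it visits points there.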

For more on the invisible web, check out the Invisible/Deep Web Workshops from UC Berkeley Libraries.


Search engines

Search engines are the way most people use the Internet. About 80% of web searching is through Google, with Yahoo and MSN making up most of the rest. All these engines work by sending out bots or spiders to prowl the 'net, looking for keywords and meta descriptions, and sending the information back to the company's servers. Each engine's picture of the web is therefore only as current as the most recent visit from its spider - we think about monthly, although the schedule is a guarded secret.
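As a rough illustration of what a spider might pick up from a page, the sketch below reads the keywords and description meta tags from a small sample page using Python's standard `html.parser` module. The sample page and the `MetaReader` and `read_meta` names are invented for this example; how any real engine weighs this information is not public.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Collects the content of <meta name="keywords"> and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""


# Made-up sample page, standing in for a page a spider has fetched.
SAMPLE_PAGE = """<html><head>
<title>Sample library page</title>
<meta name="keywords" content="library, databases, articles">
<meta name="description" content="A sample page used to illustrate meta tags.">
</head><body><p>Hello</p></body></html>"""


def read_meta(html):
    """Return a dict of the keywords/description meta tags found in html."""
    reader = MetaReader()
    reader.feed(html)
    reader.close()
    return reader.meta


meta = read_meta(SAMPLE_PAGE)
print(meta)
```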

Google
http://google.com/

MSN Search
http://search.msn.com/

Search Engine Colossus
http://www.searchenginecolossus.com/

Search Engines Worldwide
http://home.inter.net/takakuwa/search/

SurfWax
http://www.surfwax.com/

Vivisimo
http://vivisimo.com/

Yahoo!
http://search.yahoo.com/

A Collection of Special Search Engines
http://www.leidenuniv.nl/ub/biv/specials.htm


Indexes and directories

Most people are used to looking for information in directories and indexes - that's what a phone book is. The Internet also has a number of directories that allow you to search by traveling through menus. Sometimes these are called subject trees or subject directories.

BUBL Link by Subject
http://bubl.ac.uk/link/subjectbrowse.cfm

Directory Resources
http://www.DirectoryResources.info/

Dmoz Open Directory Project
http://www.dmoz.org/

INFOMINE
http://infomine.ucr.edu/

Librarians' Index to the Internet
http://www.lii.org/

MegaSources
http://www.ryerson.ca/~dtudor/megasources.htm

Resource Discovery Network
http://www.rdn.ac.uk//

Web Based Resources
http://www.i8.com/

Yahoo!
http://www.Yahoo.com/



Custom search and deep web research

Searching the Internet with your own SearchBot, or building a resource list of deep web search tools, will help you discover new information and go where no search engine has traveled. Here are a few of these resources:

Bot Research
http://www.BotResearch.info/

Deep Web Research
http://www.DeepWebResearch.info/

Finding Information on the Internet - Internet Tutorials
http://www.academicinfo.net/reffind.html

Finding What You Need With the Best Search Engines
http://www.philb.com/whichengine.htm

Invisible-Web
http://www.invisible-web.net/

Knowledge Discovery
http://www.knowledgediscovery.info/

Mining The Invisible Web
http://www.miningtheinvisibleweb.com/

ProFusion
http://www.profusion.com/

Tool Kit for the Expert Web Searcher
http://www.ala.org/ala/lita/litaresources/toolkitforexpert/toolkitexpert.htm


Search engine special features

There's a lot going on in Google that's not immediately apparent. Here's a good book:

Google hacks by Tara Calishain
http://library.lanecc.edu/search/t?SEARCH=Google+hacks

This book covers:

  • Searching Google: great hints on constructing good searches
  • Google special services and collections
  • Third-party Google services
  • Non-API Google applications
  • Introducing the Google Web API
  • Google Web applications
  • Google pranks and games
  • The Webmaster side of Google

However, Google is constantly evolving. What's new?

Google Images
http://www.google.com/advanced_image_search?hl=en
will connect you to millions of images to enhance your paper or web presentation. Use the advanced search feature.

Google Groups is the old Usenet system, taken over by Google:
http://groups-beta.google.com/
Search here for up-to-the-minute comments that'll keep your perspective fresh.

But also remember Yahoo Groups, which is bigger and brighter (in my view):
http://groups.yahoo.com/

Another huge groups collective is the blogging world of LiveJournal:
http://www.livejournal.com/
Here thousands of virtual communities share ideas, hints and information in the most esoteric areas imaginable.

Google Suggest is a new hidden feature that is not yet available from the Google main page:
http://www.google.com/webhp?complete=1
Google Suggest provides you with search suggestions, in real time, as you type. Key in a few letters of a particular search term and Google Suggest displays a list of words it thinks match. Google Suggest works in Internet Explorer 6.0+, Netscape 7.1+, Mozilla 1.4+, Firefox 0.8+, Opera 7.54+, or Safari 1.2.2+. Both JavaScript and cookies must be enabled in your browser.
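The idea of suggesting terms as you type can be sketched with simple prefix matching. This is not Google's method, which is not published; it is only a small illustration, and the `TERMS` list and `suggest` function are invented for the example.

```python
from bisect import bisect_left

# Hypothetical list of known search terms, kept sorted so we can binary-search it.
TERMS = sorted([
    "deep web",
    "deep web research",
    "directory",
    "database",
    "rss feed",
    "rss reader",
])


def suggest(prefix, terms=TERMS, limit=5):
    """Return up to `limit` terms that start with the letters typed so far."""
    prefix = prefix.lower()
    start = bisect_left(terms, prefix)  # first term that sorts at or after the prefix
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match either
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


print(suggest("de"))
print(suggest("rss"))
```

Typing "de" in this sketch offers "deep web" and "deep web research", and the list narrows as more letters are keyed in.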

Try Google Labs for the beta versions of new Google products:
http://labs.google.com/

such as Google Video - search TV programs!
http://video.google.com/

and finally Google's most sensational recent innovation, the satellite search:
http://maps.google.com/

Click on "Satellite" in the top right and find your house!

Here is Eugene-Springfield


The archived web


Sometimes it's useful to peek back in time at long-dead web pages - the information may have been deleted, but it could still be useful. To do this, go to the Wayback Machine at:

http://archive.org/

This is what Lane's web page looked like in Jan. 1997!

http://web.archive.org/web/*/http://lanecc.edu/
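The archive address above follows a simple pattern: the archive's own prefix, then a `*` (meaning "list all saved copies"), then the address of the page you want. A small sketch of building such addresses is below. The `wayback_url` function is made up for this example, and the option of replacing `*` with a run of digits for a date (year, month, day, and so on) is described here from general use of the Wayback Machine rather than from this guide.

```python
def wayback_url(page_url, timestamp="*"):
    """Build a Wayback Machine address for page_url.

    timestamp is "*" to list all saved copies, or 1 to 14 digits
    (YYYYMMDDhhmmss, possibly shortened) to ask for a copy near that date.
    """
    if timestamp != "*" and not (timestamp.isdigit() and len(timestamp) <= 14):
        raise ValueError("timestamp must be '*' or up to 14 digits")
    return f"http://web.archive.org/web/{timestamp}/{page_url}"


# Same address as the one given above for Lane's page.
print(wayback_url("http://lanecc.edu/"))
# Assumed date form: ask for a copy from around January 1997.
print(wayback_url("http://lanecc.edu/", "199701"))
```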


RSS (Really Simple Syndication)

A new way to keep up with web information is to subscribe to an RSS feed. Feeds work with your browser. The Firefox browser has a built-in reader, and there are several add-ons such as Sage. Safari also has a built-in RSS reader. For IE there are several readers, including Pluck http://www.pluck.com/ and TinyRSS http://www.codeproject.com/jscript/TinyRSS.asp.
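Under the hood, an RSS feed is just a small XML file listing recent items, which is what a reader checks for you. The sketch below pulls the titles and links out of a made-up RSS 2.0 feed with Python's standard `xml.etree.ElementTree` module; the feed contents and the `read_items` name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed, standing in for a file fetched from a real site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample Library News</title>
    <link>http://example.org/</link>
    <description>A made-up feed for illustration.</description>
    <item>
      <title>New database added</title>
      <link>http://example.org/news/1</link>
    </item>
    <item>
      <title>Holiday hours</title>
      <link>http://example.org/news/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


for title, link in read_items(SAMPLE_FEED):
    print(title, "-", link)
```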


Customized blogs

Finally, here are some customized information blogs (aka subject tracers) assembled by Marcus Zillman:

Agriculture Resources
http://www.AgricultureResources.info/

Artificial Intelligence Resources
http://www.AIResources.info/

Astronomy Resources
http://www.AstronomyResources.info/

Auction Resources
http://www.AuctionResources.info/

Biological Informatics
http://BiologicalInformatics.info/

Bot Research
http://www.BotResearch.info/

Business Intelligence Resources
http://www.BIResources.info/

ChatterBots
http://www.ChatterBots.info/

Data Mining Resources
http://www.DataMiningResources.info/

Deep Web Research
http://www.DeepWebResearch.info/

Directory Resources
http://www.DirectoryResources.info/

eCommerce Resources
http://www.eCommerceResources.info/

Elder Resources
http://www.ElderResources.info/

Employment Resources
http://www.EmploymentResources.info/

Entrepreneurial Resources
http://www.EntrepreneurialResources.info/

Financial Sources
http://www.FinancialSources.info/

Finding People
http://www.FindingPeople.info/

Games Resources
http://www.GamesResources.info/

Genealogy Resources
http://www.GenealogyResources.info/

Grant Resources
http://www.GrantResources.info/

Grid Resources
http://www.GridResources.info/

Healthcare Resources
http://www.HealthcareResources.info/

Information Quality Resources
http://www.InformationQualityResources.info/

Internet Alerts
http://www.InternetAlerts.info/

Internet Demographics
http://www.InternetDemographics.info/

Internet Experts
http://www.InternetExperts.info/

Internet Hoaxes
http://www.InternetHoaxes.info/

Knowledge Discovery
http://www.KnowledgeDiscovery.info/

Military Resources
http://www.MilitaryResources.info/

Outsourcing/Offshoring Information and Resources
http://www.OutsourcingOffshore.us/

Privacy Resources
http://www.PrivacyResources.info/

Reference Resources
http://www.ResearchResources.info/

Research Resources
http://www.ResearchResources.info/

Script Resources
http://www.ScriptResources.info/

ShoppingBots
http://www.ShoppingBots.info/

Statistics Resources
http://www.StatisticsResources.info/

Student Research
http://www.StudentResearch.info/

Theology Resources
http://www.TheologyResources.info/

Tutorial Resources
http://www.TutorialResources.info/

World Wide Web Reference
http://www.WWWReference.info/


Constructing a bibliography

We have detailed instructions on how to construct a bibliography according to the MLA format. For further information, go to:

http://lanecc.edu/library/instruction/mla.htm


Questions or comments? Please contact Don Macnaughtan

 
Lane Community College Library
4000 E 30th Ave., Eugene OR 97405
2nd floor, Center Bldg - (541) 463-5220
Please direct comments about this site to library@lanecc.edu
Revised 12/31/08 (ljg)
© 1996-present Lane Community College