Friday, May 11, 2007

Review: Implementing the Google Search Appliance in an Intranet environment

Our corporate intranet is a non-framed environment with both Lotus Domino and IIS (.Net and classic ASP) applications and content. We have between 300,000 and 500,000 pages of web content and documents across more than 1,200 “sites” on approximately 30 unique domains. We used to run Inktomi’s UltraSeek Server 3.0 as our intranet search engine, which was beginning to show its age (purchased in 1998). The Inktomi product did not handle attachments well (DOC, PPT, PDF, etc.), would not crawl our secured sites, and was no longer supported by the vendor. We did a cursory review of the search vendors and were immediately attracted to Google’s 30-day trial offer for their Google Search Appliance (GSA). After we signed a standard agreement, they shipped us a brand new shiny yellow unit which we could test for 30 days before returning or purchasing.

Product info
The GSA is a “black box” 1U standard rack-mountable server. By “black box” I mean that Google gives you a web interface to administer the device but does not want you to access the operating system (a heavily Google-customized version of Linux). In fact, the license agreement stipulates that you will not tamper with the hardware or OS of the appliance in any way. The device has no need for a keyboard, mouse or video – all you need for normal operation is a network cable and standard power input.

The GSA comes in different flavors to fit different needs, varying by the size of the hardware and, correspondingly, the size of the license. (Licensing is based on the number of URLs crawled by the appliance.) There are 3 different hardware configurations: the GB-1001, GB-5005, and GB-8008. These are broken down as follows:

* GB-1001 – 150K documents for $28K, 300K documents for $50K
* GB-5005 – 1.5M documents for $230K
* GB-8008 – 4M documents for $450K

Why Google?
As advertised, the GSA met all of our needs: it indexes the large variety of file types we have in our environment, accesses secured content, has a documented API, and so on. The Google brand power was another big selling factor. When we told our users that they were going to get a Google-based search engine, they knew their days of troubled searching were over. Lastly, the 30-day trial run experience we had with the GSA sealed the deal. The appliance is the easiest enterprise solution I’ve ever had to install, configure and maintain. We were literally up and running within an hour of opening the shipping box.

Installation
The appliance has two network ports on the back panel; one for normal operation and the other used exclusively for network configuration. To configure the network settings we connected a laptop to the appliance via a special (some pin-outs are non-standard) orange Ethernet cable which is included. The installation process was about as easy as one can imagine for a “black box.”

First we plugged in the normal operation network cable and then the power. The power plug on the appliance IS the power switch; plug it in to turn it on and unplug it to turn it off. After plugging it in, we waited about 5 minutes for the appliance to play a tune, which is the signal to continue. Next, we hooked up our laptop (already set to DHCP mode) to the appliance and powered it up. After logging in to our laptop and making sure we had the correct IP assigned by the appliance’s built-in DHCP server, we were ready to configure the network settings. Total elapsed time (excluding rack mounting): 10 minutes.

Configuration
Network configuration, like normal administration, is done entirely through a browser and is a simple five-step process. The first screens ask you for basic network information: IP address, subnet mask, default gateway, and DNS. Subsequent screens collect the SMTP server, the “From” address for GSA notification messages, the time zone, NTP (time) servers and the admin account name/password. The last step is to test a few of the URLs you will be crawling to make sure you’ve done the setup correctly. After a final settings review screen, configuration is complete and you can unplug your laptop and get to the good part: start crawling. Total elapsed time: 10 minutes.

Crawling the site(s)
Using the URL provided, all administration of the GSA is done remotely. After logging in with the ID/password we provided in the previous step, we were presented with the Administration console. We created a new collection to hold our index, put in the “Start crawling from” URL, copied that same URL into the “Follow and Crawl only URLs with the Following patterns” box and we were done. We saved our settings and then clicked the “Start crawling” button. We then went over to the “Crawl status” screen and watched the “Crawled URLs” counter increase. Google advertises that it can crawl about 4,000 URLs in about 15 minutes. We found the crawl time increased significantly when documents (Word, PDF, Excel, etc.) were linked to from those URLs.
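As a rough back-of-the-envelope check, the advertised rate can be turned into a crawl-time estimate for an intranet of our size. The sketch below is plain arithmetic and assumes the rate of 4,000 URLs per 15 minutes holds constant for the whole crawl, which it will not if many URLs link to large documents; the function name is made up for illustration.

```python
# Advertised figures quoted above: about 4,000 URLs in about 15 minutes.
ADVERTISED_URLS = 4_000
ADVERTISED_MINUTES = 15


def estimated_crawl_hours(total_urls,
                          urls_per_batch=ADVERTISED_URLS,
                          minutes_per_batch=ADVERTISED_MINUTES):
    """Return a naive crawl-time estimate in hours.

    Assumes a constant crawl rate; real crawls slow down when the
    crawler has to fetch and convert Word, PDF or Excel documents.
    """
    total_minutes = total_urls * minutes_per_batch / urls_per_batch
    return total_minutes / 60


if __name__ == "__main__":
    # Lower and upper bounds of our page count (300,000 to 500,000).
    for pages in (300_000, 500_000):
        hours = estimated_crawl_hours(pages)
        print(f"{pages:,} URLs: about {hours:.2f} hours")
```

At the advertised rate, a full crawl of our content would take somewhere between 19 and 31 hours, so a first complete crawl is best planned as an overnight or weekend job.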

After the crawl is done the collection is automatically indexed and then checked against the Serving Prerequisites (any criteria you wish to use to determine whether to move an indexed collection to production) and the collection will either be moved to Production (and consequently searchable) or be moved to Staging. The Staging area lets you validate new crawls before letting users search against them.

Crawling configuration
After your first crawl you may find the need to go back and tweak the crawling parameters. Google gives you a good amount of control over how sites are crawled, the frequency, how many threads are used, etc. For sites with security, the GSA supports Basic Authentication and an additional security module is available which supports Forms Authentication. The most challenging configuration aspect for us was determining the right combination of URL patterns to exclude from the search. If you are a Domino shop looking to use the GSA, you may need to spend some time getting the crawler configuration just right to support the sometimes convoluted Domino query string parameters.
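To illustrate the kind of include/exclude reasoning involved, here is a minimal sketch of URL filtering. It uses Python regular expressions, not the GSA’s own pattern syntax, and the host name, the list names and the `should_crawl` function are all assumptions made up for this example. The excluded query strings are typical Domino view-navigation commands that tend to produce many near-duplicate URLs.

```python
import re

# Hypothetical "follow" prefix; substitute your own intranet hosts.
FOLLOW_PREFIXES = ["http://intranet.example.com/"]

# Hypothetical exclusions for Domino-style query strings.
EXCLUDE_PATTERNS = [
    # Expanding/collapsing view categories yields the same content again.
    re.compile(r"[?&](Expand|Collapse)", re.IGNORECASE),
    # Paging through a view (&Start=31, &Start=61, ...).
    re.compile(r"[?&]Start=\d+", re.IGNORECASE),
    # Edit and create forms are not content worth indexing.
    re.compile(r"\?(EditDocument|OpenForm)", re.IGNORECASE),
]


def should_crawl(url):
    """Return True if the URL is under a followed prefix and not excluded."""
    if not any(url.startswith(prefix) for prefix in FOLLOW_PREFIXES):
        return False
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    samples = [
        "http://intranet.example.com/hr.nsf/policies/vacation?OpenDocument",
        "http://intranet.example.com/hr.nsf/policies?OpenView&Start=31",
        "http://intranet.example.com/hr.nsf/policies?OpenView&Expand=3",
        "http://other.example.com/index.html",
    ]
    for sample in samples:
        print(should_crawl(sample), sample)
```

Running a list of known URLs through a filter like this before changing the appliance settings is one way to check that an exclusion does not accidentally hide real content.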

After we got the crawl parameters tuned and the first complete crawl done, we did some testing to see if the crawler grabbed all the content. Browsing our site and testing with some strings buried deep inside the taxonomy, we always found the GSA had crawled them accurately. We also did some testing with strings inside PDF documents, PowerPoint presentations and the like. When we did come across something that hadn’t been crawled, a careful analysis showed that we needed to do some more tweaking of the crawl settings.

Other notable features
Google also gives you a KeyMatch tool that allows you to specify which indexed documents should appear at the top of the results page for a given query. These manifest themselves almost identically to the Sponsored Links at the top of the results page of the Google we all use. A Synonym tool allows you to specify alternate words or phrases for search queries. For example, if someone searches for WCM, you can suggest “Web Content Management” at the top of the results page.
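The behavior of these two tools can be pictured as simple lookups applied on top of the normal results. The sketch below is only a conceptual model, not the GSA’s implementation or configuration format; the dictionaries, the example URL and the `decorate_results` function are hypothetical.

```python
# Hypothetical KeyMatch table: query -> (title, URL) shown above the results.
KEYMATCHES = {
    "vacation policy": ("Vacation Policy",
                        "http://intranet.example.com/hr/vacation"),
}

# Hypothetical synonym table: query -> phrase suggested to the user.
SYNONYMS = {
    "wcm": "Web Content Management",
}


def decorate_results(query, organic_results):
    """Return a results page with any promoted link and suggestion added."""
    key = query.strip().lower()
    page = {
        "suggestion": SYNONYMS.get(key),   # None when no synonym is defined
        "promoted": [],                    # KeyMatch entries, shown first
        "results": list(organic_results),  # normal results, order unchanged
    }
    if key in KEYMATCHES:
        page["promoted"].append(KEYMATCHES[key])
    return page


if __name__ == "__main__":
    print(decorate_results("WCM", ["http://intranet.example.com/cms/intro"]))
    print(decorate_results("vacation policy", []))
```

Note that in this model neither tool changes the organic results; both only add something above them, which matches how the promoted links and suggestions appear on the results page.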

An output format feature lets you control (via an XSLT) the presentation of the search results. You can use this for changing the fonts, colors, logo, header, etc. of the results page. We were able to easily remove the “Cached” feature on the results page with some XSLT modifications.

The Reporting tool lets you run reports on search queries over various time ranges. It will show you the number of searches per day, per hour, the top 100 keywords and top 100 queries for the time period specified.

Downsides
The GSA is not for organizations looking to index their shared network drives, as the appliance has no facility for crawling file systems. This is really too bad, as many companies struggle with the huge quantities of unstructured content stored on their networks. Of course, there is a plethora of other products out there for exactly this issue.

Direct access to databases (e.g. SQL, Oracle, etc.) is another area which is off-limits for the GSA, as is any kind of integration with content or document management systems.

Conclusion
The Google Search Appliance (GSA) is an excellent search product for HTTP-accessible content. It gives great control over administrative features such as crawler configuration and results serving, and offers sufficient reporting capabilities as well. Those looking for a solution to integrate directly with a content/document management system or databases, or to index network drives, should look to another product. However, if you have an intranet or Internet site with plenty of HTML-based content, the GSA may be just what you need.

ABOUT THE AUTHOR

Bryan Mjaanes is the creator/editor of Intranet101.com, a community-based forum for Intranet professionals.

Are You Looking For Some Discount Ugg Boots?

If you find yourself caught up in the recent boot craze, but think that these trendy boots are way out of touch with your wallet, there is something that you should know. Ugg boots, those oh-so-soft, multi-hued boots made from pure Australian sheepskin, are available at a discount if you know where to look. The best place to begin checking for discount ugg boots is on the Internet. The Internet is a wonderful invention. With a good search engine and a few clicks of a button, you can locate just about anything on the Internet, even long-lost distant relatives you had hoped you would never see again. Just begin your search for discount ugg boots by typing an appropriate phrase into the search area. Faster than you can blink an eye, your computer screen will fill with pages upon pages of links you can click to start your search for your very own pair of these trendy boots.

The most obvious place to begin your search for discount ugg boots, even without using a search engine, is to visit your favorite web auction site. There you can search for boots, or more specifically, ugg boots, or even more specifically, three-quarter length, tan ugg boots in size seven. Press a button, find a pair that matches your criteria and start naming your price. Before you know it, you will have at your doorstep your very own pair of size seven, tan, three-quarter length ugg boots. See, that was not so hard!

Also available on the Internet are different types of shopping mall clubs which you can join by paying the stated membership fee. These types of clubs offer members the ability to purchase various items including clothing, home accessories and other consumables that are available through the particular virtual shopping mall program at a discount. The product lines that these types of virtual shopping malls carry typically include the types of items that one would expect to find in a normal shopping mall environment. So, search around for a shopping mall program which offers discount ugg boots, join the program, and shop around until you find the perfect pair. Who knows, you may just get hooked on the virtual shopping mall experience. Most credit cards are accepted!

The reality is that because this style of boot is just so popular and the product seems to literally fly off the shelf, discount ugg boots may be difficult to locate, even on the Internet. From a retailer’s point of view, it does not make good financial sense to discount the price of a product that is easily and continuously selling at full retail price. With the fall season in full swing and the holiday season right around the corner, this style of boot likely will continue its brisk selling pace making discounts difficult to locate.

Eventually, however, stores will begin to discount their prices in an effort to draw consumers into their stores and encourage them to make purchases. If you are patient and are willing to take time to research the Internet and other retail outlets, you will eventually be able to find a pair of discount ugg boots that fit your needs and your feet, and won’t cause too much damage to your wallet.

Just be aware that the market is quickly flooding with lower quality imitations of this boot style. You may find that some discount ugg boots are not made from pure Australian sheepskin. These boots will likely look pretty close to the real thing, but your feet will know the difference.


ABOUT THE AUTHOR

Brian Fong
http://www.Sheep-Skin-Boots-Guide.com

For WordPress users: the new “Pages” feature in version 1.5

After many requests from WordPress users, the latest version of WordPress has a built-in option to create static pages. You can use static pages for an about page, a contact page, a links page, etc. This dreamhost review page is an example of a stand-alone static page. The advantage is that you can add stand-alone content on its own page, outside the normal weblog hierarchy. Pages have the same editing options, plug-in functionality and themes as posts, or you can customize a stand-alone page as much as you want. Static pages also help search engine indexing, as opposed to dynamic URLs.

The comprehensive overview and instructions are on the WordPress Codex “Pages” page.

ABOUT THE AUTHOR

Ron Robinson maintains a personal weblog about science fiction writing and a weblog about profitable online publishing.

Thomas R. Cutler: Manufacturing Journalist Thrives as Contributing Editor for Industry 2.0

For nearly a year now, Thomas R. Cutler has been a contributing editor for Industry 2.0. Cutler, who founded the Manufacturing Media Consortium in 1999, has grown participation from 300 journalists to nearly 3000 key clients, journalists, editors, trendsetters, and business leaders worldwide. Cutler has authored more than 1000 articles for a wide range of manufacturing periodicals, industrial publications, and business journals, including most of the leading monthly trade publications. Cutler is the author of The Manufacturers’ Public Relations and Media Guide. Cutler was voted #1 Manufacturing Journalist for the third year in a row.

Industry 2.0 is published monthly. With a controlled circulation base of 30,000, more than two hundred thousand monthly readers across India have relied upon this vital publication since 2002. Covering twenty industry sectors, Industry 2.0 reaches key decision-makers. 65% of the publication’s readers are managing directors, CEOs, vice presidents and other senior executives.

TR Cutler, Inc., (www.trcutlerinc.com), is the nation's largest manufacturing marketing and public relations firm, based in Fort Lauderdale, Florida. According to the website, “Cutler tells the extraordinary stories of manufacturers. There are great companies making great products. Cutler is quite vocal that there are too many manufacturers and companies serving the manufacturing sector that have simply neglected to tell their story. Cutler's goal is to tell these manufacturing stories in an interesting, dynamic, understandable, and relevant way.”

Cutler also contributes to such publications as Automation.com, Quality Digest, Manufacturing.net, IFSQN, Manufacturing & Technology, Food & Beverage Journal, Software Magazine, Food & Drug Packaging, Food Quality, Fabricating & Metalworking, World Trade, American Machinist, Industrial Distribution, The Manufacturers, and hundreds of other leading periodicals.

TR Cutler, Inc.
www.trcutlerinc.com
Thomas Cutler
trcutler@trcutlerinc.com
888-902-0300

http://www.articlesfactory.com/articles/marketing/thomas-r-cutler-manufacturing-journalist-thrives-as-contributing-editor-for-industry-20.html


Finding and Managing Quality Reciprocal Links: A Tutorial for The Newbie

All of us want to increase traffic to our web sites. It helps our search engine rankings, and is a very cost-effective way to provide us with potential new customers. One of the best, and certainly least expensive, ways to do that is by exchanging links with sites similar to our own, or that contain content our own visitors are likely to find interesting and useful. It is important to restrict our exchanges to such sites because if we indiscriminately exchange with everybody and anybody we become what is known as a “link farm” and wind up being banned by the search engines. No one wants to be banned by the search engines, so this article will discuss how to find relevant sites with whom to exchange links, and how to keep track of them after you have exchanged the links. Even though there are software programs that will do most if not all of this for you, they have various flaws and inadequacies. If you are one of those people who prefer the personal touch, this article is for you!

The first thing to do is to type your key words or phrases into a search engine and see what comes up. The sites you see on the first few pages have made it to the top of the search engine rankings. Go to these sites and look around on the home page for the phrase “links exchange” or sometimes just “links.” Some sites now use “resources.” They will have instructions for how to place a link to their site on your site, and instructions on the information they need from you in order for them to place a link to your site on theirs. Some have a form for you to fill out; others want you to e-mail it to them. Have your information (your site Title, URL (your home page where you want them to point their link), Description, the URL of your links page where you have placed their link, and your name and e-mail address) saved in a document ready to cut and paste into both forms and e-mails. It will save you tons of time. Always remember that the Golden Rule applies here. You want them to place a link to you on their site, so you need to reciprocate. If you are uncomfortable with the content of their links page and would prefer not to be associated with that site, then just move on. There are plenty of others.

Some of the higher ranked sites will have non-content-related restrictions about with whom they will or will not exchange links. They will only exchange with you if your site and/or links pages have achieved a certain Google PageRank, which you can check with the Google Toolbar. Even if you do not have such a page rank yourself, you can request an exchange with such sites, but do not be surprised if they decline or ignore you. If your site is new, be sure not to stop looking after the first page or two in the search engines. There are many wonderful sites not on the first few pages, and most of them would be more than happy to exchange links with you.

Repeat this search frequently because nothing is static on search engines; you might get different results every day searching for the very same word or phrase!

Once you have exchanged links with a site, check out the other sites on their links pages. Some sites have links pages loaded with other sites that would be a good match for you. Go ahead and offer to exchange links with them, too. Beware, however, that some sites may not have been very discerning in their choices of exchange partners. You, however, will be very discerning and will choose wisely the sites with whom you will exchange links! Be careful not to be too narrow, though. Remember that you want to exchange with sites that have content your visitors may find interesting and useful. If you only exchange with clones or near clones of your site, nobody will be interested for very long. For instance, if your site is about a particular breed of dog, don’t limit your exchanges to other sites about that breed, or even other breeds. Try sites that deal in dog food, dog care in general, grooming products, training methods, dog accessories, shows and other events, etc. Set your site up so that each category has its own page or set of pages, and alphabetize the links on each page by site title if your software does not do that automatically. What you want to avoid are sites that have nothing whatsoever to do with the main topic of your site. In our example of a site about a breed of dog, avoid exchanging links with sites offering bargain vacations on the other side of the globe, casinos, real estate, music, etc. You get the idea. Remember that people who visit your site are looking for dog-related information, not that other stuff.

Now that you have accumulated several pages of links, you need to be able to keep track of them so that you don’t request exchanges from the same webmasters more than once. It would be embarrassing for them to respond to your request with “I exchanged with you 2 months ago” or something like that. So, what do you do if your software does not do it for you?

Set up a simple spreadsheet. Don’t worry about setting it up to print out on neat pages, because you shouldn’t have to print it out. Make the columns as wide as you need to. You will need to set them up as follows: Title / Description / Category / URL / Reciprocal URL. You can add optional columns for webmaster name and e-mail if you want. Your categories in our hypothetical site would be food, training, accessories, etc. As you add a link to your site, add it to the bottom of your spreadsheet. You can then alphabetize your spreadsheet by whichever column you need. Not only will you know which sites you already have, but you can see at a glance where on your site you have put each one! I have used this system extensively and it works extremely well.
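If you would rather keep the same record in a script than in a spreadsheet program, the idea can be sketched in a few lines of Python. This is only one possible way to do it, using the column names given above; the function names and the example site are made up for illustration, and the duplicate check simply compares URLs ignoring case and a trailing slash.

```python
import csv
import io

# Column layout described above.
COLUMNS = ["Title", "Description", "Category", "URL", "Reciprocal URL"]


def add_link(rows, title, description, category, url, reciprocal_url):
    """Add a link to the bottom of the sheet unless its URL is already there.

    Returns True if the link was added, False if it was a duplicate.
    """
    normalized = url.rstrip("/").lower()
    if any(row["URL"].rstrip("/").lower() == normalized for row in rows):
        return False
    rows.append({
        "Title": title,
        "Description": description,
        "Category": category,
        "URL": url,
        "Reciprocal URL": reciprocal_url,
    })
    return True


def sorted_by(rows, column):
    """Return the rows alphabetized by the chosen column."""
    return sorted(rows, key=lambda row: row[column].lower())


def to_csv(rows):
    """Render the rows as CSV text that any spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    links = []
    add_link(links, "Happy Paws Grooming", "Grooming products", "accessories",
             "http://grooming.example.com/",
             "http://grooming.example.com/links.html")
    # A second request for the same site is caught before you e-mail anyone.
    print(add_link(links, "Happy Paws", "", "accessories",
                   "http://grooming.example.com", ""))
    print(to_csv(sorted_by(links, "Title")))
```

The duplicate check is the part that saves you from asking the same webmaster twice; the sorting and CSV export just mirror what you would do by hand in the spreadsheet.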

One last thing. Please don’t ever refuse to exchange links with someone without a high enough page rank because they are new. Everyone was new once and started out with a page rank of “0”, including those who are now at the top of the list for their chosen key words and phrases. As people help you to get started when you are new, turn around and help others behind you as they get started. In case you might be wondering how to tell the difference between a new site with a page rank of “0” and a link farm with a page rank of “0”, the new site will have a cached page and the link farm will not. The link farm will have a bazillion links and the new site will have few if any. To find this information, look for the icon on the Google Toolbar that looks like a blue circle with a lower case white “i” in the middle, located right next to the little green bar that shows you your current page rank. When you click on that icon you will see a drop-down menu that includes both “cached snapshot page” (what your page looked like the last time Google checked out your site) and “backward links”, which shows how many links Google shows pointing to your site from other sites. If, when you click on one of those drop-down menu choices, you find that Google has no record of that site, it’s best to wait to exchange links until you find out why there is no record. If the site has been up for more than a few days, there should be at least a cached page. If there isn’t, the site may have been banned, and you should not associate with banned sites.

ABOUT THE AUTHOR

Sandi Moses has been involved in internet marketing since November 2003. Visit her sites at
http://www.123iwork4me.com
http://www.123-home-based-business-works-4-me.com