Last night was our third SEMNE event (Search Engine Marketing New England), and we were humbled to have Dan Crow, director of crawl systems at Google, spilling the beans about how to get your site into Google. He talked for a half hour or so, and then proceeded to answer audience questions for at least another hour. As I sat there listening to him (yes, I actually listened to this one!), I was struck by what an awesome opportunity it was for everyone in that room to be provided with such important information -- straight from Google. It was clear that the 100 or so people in the room agreed. In fact, at 7:30 on the dot, everyone spontaneously stopped their networking activities and simply took their seats without being asked to. These folks definitely came to hear Google!
What Is Indexing?
Dan started out his presentation discussing what "indexing" means and how Google goes about it. Basically, the process for the Google crawler is to first look at the robots.txt file in order to learn where it shouldn't go, and then it gets down to business visiting the pages it is allowed to visit. As the crawler lands on a page, it finds the relevant information contained on it, then follows each link and repeats the process.
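To make that visit-then-follow loop concrete, here is a minimal sketch in Python. It is not Google's crawler: the three-page in-memory site, the `fetch` callable, and the `is_allowed` rule are all assumptions made up for illustration, and a real crawler would fetch over HTTP and consult the actual robots.txt file.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, is_allowed):
    """Breadth-first crawl: visit a page, record it, then queue its links."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        if not is_allowed(url):      # skip anything the site has excluded
            continue
        html = fetch(url)
        if html is None:             # page could not be retrieved
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical site held in memory instead of fetched over the network.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/cgi-bin/form">Form</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/cgi-bin/form": "private",
}

pages = crawl(
    "http://example.com/",
    fetch=SITE.get,
    is_allowed=lambda url: "/cgi-bin/" not in url,  # stand-in for a robots.txt check
)
print(pages)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.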
Robots.txt Explored
Dan proceeded to explain how to use your robots.txt file to exclude pages and directories that you might not want indexed, such as the cgi-bin folder. He told us that each of the major search engines has its own commands for this file but that they're working to standardize things a bit more in the future.
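As a rough illustration of the cgi-bin example, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` module and asks whether a crawler may fetch a given URL. The file contents, the `/private/` directory, and the example.com URLs are assumptions for demonstration only; Dan did not show a specific file.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps all crawlers out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="Googlebot"):
    """Return True if the rules above let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("http://example.com/index.html"))
print(may_crawl("http://example.com/cgi-bin/search"))
```

Keep in mind that this only models the standard `User-agent` and `Disallow` lines; any engine-specific extensions would need their own handling.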
In terms of what the crawler looks at on the page, he said there are over 200 factors, with "relevance" playing a big part in many of them.
Google Still Loves Its PageRank
Dan also discussed the importance of PageRank (the real one that only Google knows about, not the "for-amusement-purposes-only" toolbar PR that many obsess over). He let us know that having high-quality links is still one of the greatest factors towards being indexed and ranked, and then he proceeded to explain how building your site with unique content for your users is one of the best approaches to take. (Now, where have you heard that before? ;) He explained how creating a community of like-minded individuals that builds up its popularity over time is a perfect way to enhance your site.
Did You Know About These Tags?
We were also treated to some additional tips that many people may not have known about. For instance, did you know that you could stop Google from showing any snippet of your page in the search engine results by using a "nosnippet" tag? And you can also stop Google from showing a cached version of your page via the "noarchive" tag. Dan doesn't recommend these for most pages since snippets are extremely helpful to visitors, as is showing the cache. However, Google understands that there are certain circumstances where you may want to turn those off.
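For anyone who hasn't seen these, both values go in a robots meta tag in the page's head section. The small Python helper below just builds that tag as a string; the function name `robots_meta` is my own invention for illustration, not something Dan presented.

```python
from html import escape


def robots_meta(*directives, name="robots"):
    """Build a robots meta tag carrying the given directives."""
    content = ", ".join(directives)
    return f'<meta name="{escape(name)}" content="{escape(content)}">'


# Hide the snippet only.
print(robots_meta("nosnippet"))
# Hide both the snippet and the cached copy.
print(robots_meta("nosnippet", "noarchive"))
```

Passing `name="googlebot"` instead of the default would address the tag to Google's crawler alone rather than to all engines.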
Breaking News!
Google is coming out with a new tag called "unavailable_after" which will allow people to tell Google when a particular page will no longer be available for crawling. For instance, if you have a special offer on your site that expires on a particular date, you might want to use the unavailable_after tag to let Google know when to stop indexing it. Or perhaps you write articles that are free for a particular amount of time, but then get moved to a paid-subscription area of your site. Unavailable_after is the tag for you! Pretty neat stuff!
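Dan didn't spell out the exact syntax on the night, so treat the following as a sketch under assumptions: it assumes the value goes in a googlebot meta tag and that the date is written as day-month-year plus a time and time zone. The helper name `unavailable_after_meta` is hypothetical, and you should check Google's own documentation for the accepted date format before relying on it.

```python
from datetime import datetime, timezone

# Month abbreviations are spelled out so the output does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_meta(expires):
    """Build an unavailable_after meta tag from a timezone-aware datetime.

    The tag layout and date format here are assumptions, not confirmed syntax.
    """
    utc = expires.astimezone(timezone.utc)
    stamp = (f"{utc.day:02d}-{MONTHS[utc.month - 1]}-{utc.year} "
             f"{utc.hour:02d}:{utc.minute:02d}:{utc.second:02d} GMT")
    return f'<meta name="googlebot" content="unavailable_after: {stamp}">'


# A special offer that ends on 25 August 2007 at 15:00 UTC.
offer_ends = datetime(2007, 8, 25, 15, 0, 0, tzinfo=timezone.utc)
print(unavailable_after_meta(offer_ends))
```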
Webmaster Central Tools
Dan couldn't say enough good things about their Webmaster Central tools. I have to say that seems to be very common with all the Google reps I've heard speak at various conferences. The great thing is that they're not kidding! If you haven't tried the webmaster tools yet, you really should because they provide you with a ton of information about your site such as backward links, the keyword phrases with which people have found each page of your site, and much, much more!
Sitemaps Explored
One of the main tools in Webmaster Central is the ability to provide Google with an XML sitemap. Dan told us that a Google sitemap can be used to provide them with URLs that they would otherwise not be able to find because they weren't linked to from anywhere else. He used the term "walled garden" to describe a set of pages that are linked only to each other but not linked from anywhere else. He said that you could simply submit one of the URLs via your sitemap, and then they'd crawl the rest. He also talked about how sitemaps were good for getting pages indexed that could be reached only via webforms. He did admit later that even though those pages would be likely to be indexed via the sitemap, at this time they would still most likely be considered low quality since they wouldn't have any PageRank. Google is working on a way to change this in the future, however.
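If you've never built one, a sitemap is just a short XML file listing URLs. The sketch below writes a minimal one with Python's standard library, using the public sitemaps.org namespace; the `build_sitemap` helper and the example.com "walled garden" URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# One entry point into a hypothetical walled garden; the crawler can
# follow the links on that page to reach the rest.
sitemap_xml = build_sitemap(["http://example.com/garden/entry.html"])
print(sitemap_xml)
```

Listing a single entry URL matches Dan's point that submitting one page of a walled garden is enough for the rest to be found by following links.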
Flash and AJAX
Lastly, Dan mentioned that Google still isn't doing a great job of indexing content that is contained within Flash and/or AJAX. He said that you should definitely limit your use of these technologies for content that you want indexed. He provided a bit of information regarding Scalable Inman Flash Replacement (sIFR), and explained that when used in the manner for which it was intended, it's a perfectly acceptable solution for Google. (You can read more about sIFR here.) Dan said that Google does hope to do a better job of indexing the information contained in Flash at some point in the future.
The Q&A
Many of the points mentioned above were also covered in greater detail during Dan's extensive Q&A session. However, there were many additional enlightening tidbits that got covered. For instance, Sherwood Stranieri from Catalyst Online asked about Google's new Universal Search, specifically about when particular videos (that were not served up from any Google properties) would show up in the main search results. Dan explained that in Universal Search, the videos that show up are the same ones that show up first when using Google's video search function.
The Dreaded Supplemental Results
Of course, someone just *had* to ask about supplemental results and what causes pages to be banished there. (This is one of the most common questions that I hear at all SEO/SEM conferences.) Dan provided us with some insights as to what the supplemental results were and how you could get your URLs out of them. He explained that basically the supplemental index is where they put pages that have low PageRank (the real kind) or ones that don't change very often. These pages generally don't show up in the search results unless there are not enough relevant pages in the main results to show. He had some good news to report: Google is starting to crawl the supplemental index more often, and soon the distinction between the main index and the supplemental index will be blurring. For now, to get your URLs back into the main results, he suggested more incoming links (of course!).
There was a whole lot more discussed, but I think this is enough to digest for now! All in all, my SEMNE co-founder Pauline and I were extremely pleased with how the night unfolded. We had a great turnout, met a ton of new contacts, caught up with a bunch of old friends, and received some great information straight from Google!
http://www.isedb.com/db/articles/1687/