Hi, my name is Maile Ohye, and I work at Google as a Developer Programs tech lead.
I'm so glad to be speaking to you today because, speaking for myself and on behalf of all my
colleagues at Google, we understand how important it is to have a strong news ecosystem. So I
hope you find something useful in this presentation. Today we're going
to talk about three main topics. First the ranking factors of Google News search. Next
we're going to cover some of the frequently asked questions that we hear from publishers
or from SEOs. And last we're going to talk more about the best practices when
you publish articles. So let's take a first look at how your articles appear in
a Google search result. There are several ways. First is obviously on google.com, where
people might see a news onebox. And this upper screenshot here shows you a news
result for a search like Obama medals, where now the user is shown some news articles.
That's one way your articles can appear in Google News. This second
screenshot is from a user going directly to news.google.com, and here's where
they see a similar cluster of articles, but instead of the google.com homepage they're
seeing it on the News home page. So you might be asking yourself, "How did these articles
appear?" Now the way we gather these articles is by first crawling them, next grouping
them, and then ranking all of the information. And we'll cover each of these steps
more in depth. Let's start with crawling. In the crawling stage, much like web search,
we have Googlebot, which goes out to your news sites to look for new articles.
And there are two ways that we retrieve these articles. One is through our discovery
crawl, where Google sees new URLs and then crawls those articles, but in addition to
that discovery crawl you can also create News Sitemaps. News Sitemaps are a way for
you to list exactly what your new URLs are, so we can use them in addition
to our discovery crawl to find your new information. And of course, we respect the Robots Exclusion
Protocol, so you can create a robots.txt file or use HTTP headers to let us know specifically
what documents you want crawled and what documents you want excluded from Google search results.
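As a rough illustration of the robots.txt point above, here is a minimal Python sketch that checks which URLs a set of robots.txt rules would allow a crawler to fetch. It uses the standard library's urllib.robotparser. The rules, the news.example.com URLs and the is_crawlable helper are all made up for the example, and this only shows how the Robots Exclusion Protocol reads, not how Googlebot itself is implemented. The HTTP header route (for example an X-Robots-Tag response header) is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site (not from the talk).
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /internal/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A published article is allowed, a draft is excluded.
allowed = is_crawlable(
    ROBOTS_TXT, "Googlebot", "https://news.example.com/business/article-12345.html"
)
blocked = is_crawlable(
    ROBOTS_TXT, "Googlebot", "https://news.example.com/drafts/unfinished.html"
)
print(allowed, blocked)
```

Running a check like this against your own rules before you publish them is one way to make sure you are not accidentally excluding the articles you want crawled.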
Last, once we've crawled and made sure that we've only crawled what we are
allowed to crawl, we bring those articles back to Google. And that's the end
of the crawling phase. So next we get into that grouping phase, and
here's where we have this classification idea. In classification, what we're
doing is actually looking at each individual article's contents. So you can see
on this article, "The millions Kozlowski didn't steal", we actually take
out individual words like "business", "Tyco", "money" and
"CFO" and understand that this article pertains to the Business section. And that's
how we populate those different sections in Google News like Business, Health and Entertainment.
Another thing we're doing is populating our editions, whether it's UK, US or
India. And we can take that from the text as well. Here we've taken words like
New York and Manhattan, and that's led us to believe that this article pertains to
the United States. So this is that grouping stage where we understand what an article
is about and also what sections and editions it pertains to. So now that we've covered
crawling and grouping, we now have ranking. And ranking is going to come in two phases.
First of course is story ranking. Story ranking is much like what you see on the Google News
page where there's a group of stories, whether it might be Obama and the medal ceremony,
or it might be the death of Michael Jackson, or it might be rising oil prices. Story ranking
is deciding which of these stories should be placed highest, which second, which third.
That type of idea. These are clusters of stories, and we rank these story clusters according
to aggregate editorial interest. So let's take a deeper look at what that means. In
the upper diagram you can see that a smaller story has a small effect on publishing activity.
Let's say in North Carolina a man was giving out free cars to those who really
needed them. That's a great human interest story. It might be covered in a local paper
and also be picked up by a few wires. But this is still a relatively small story, not
showing as much aggregate editorial interest as, say, a larger story like the death of Michael
Jackson, which is not only published in a local newspaper, but in foreign and national
papers, covered by many wires, and also includes op-ed articles and follow-up articles. You
can see that due to all the editorial interest in this story we will likely rank it higher
than the human interest story about a man giving out free cars in North Carolina. So that's
story ranking. We're actually ranking those clusters. The next part about ranking
is the individual article ranking. Article ranking helps us take a story cluster,
say the death of Michael Jackson, and determine out of those 200 articles which
one should be ranked first for our users, which should be ranked second and so on. There
are many signals that go into article ranking, but I'm just going to cover four of
the major ones for you here. First is fresh and new. It's important to us that
an article contain recent substantial information about a topic. And it needs to be objective
news to lead this cluster of stories. So press releases, satire, op-eds aren't eligible
to lead clusters. Another factor is duplication and novelty detection. And that's where
we try to determine an original source of content from those that are duplicating the
information. So something that we use there is this idea of citation rank. For an article,
if a news story was broken by the Los Angeles Times and then later another
article, say from Washington, cited the Los Angeles Times as being the source of the information,
then we can start to see the citation rank taking place for this story: this article
from the Los Angeles Times might have higher ranking now because other people are citing
it as being an original story. Another factor is local and personal relevancy.
And this applies to individual sections, as well as editions of your publication. So
what we want to do is actually give more weight to local sources that are likely more relevant
to the news item. So if we take that idea of a man giving out free cars in North Carolina,
it's likely that we would take a paper like the Charlotte Observer, and know that
it could be a higher authority for that story, and therefore that article might be ranked
higher in this cluster. The last signal I wanted to cover in article ranking is the
idea of trusted sources. For us, trusted sources don't have to do with some arbitrary
decision that we make; it's actually data driven. So according to our data over
time, did users start to look at your articles and then click on them? Let's say that
there were five articles being listed and a significant number of users chose the third
article and went to that source. Then we might start to determine that this source is actually
very trusted for this certain type of information, and over time we start to build out which publications
are trusted sources. But it's not for their entire publication; it's done on a section
and category basis. So something like the Sporting News could be very trusted for sports
information but maybe not so much for business. And likewise something like the Wall Street
Journal might be very trusted in the United States for business information but maybe
not in India. So again, these trusted sources have to do with section and edition. So it's
a very specific thing that we're looking for, based on aggregate user behavior. So those
are just four of the signals that we use in Google News Search article ranking.
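To make the "section and edition" idea concrete, here is a toy Python sketch that aggregates clicks per source, section and edition, and then picks the source users chose most often within one segment. This is only an illustration of the idea as described in the talk, not Google's actual formula. The function names, the click-through ratio as the score, the source names and all the numbers are invented for the example.

```python
from collections import defaultdict


def click_through_by_segment(click_log):
    """Aggregate (source, section, edition, shown, clicked) rows into click-through rates."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for source, section, edition, n_shown, n_clicked in click_log:
        key = (source, section, edition)
        shown[key] += n_shown
        clicked[key] += n_clicked
    # Skip segments that were never shown to avoid dividing by zero.
    return {key: clicked[key] / shown[key] for key in shown if shown[key] > 0}


def most_trusted(scores, section, edition):
    """Return the source with the highest click-through rate for one section and edition."""
    candidates = {
        source: rate
        for (source, sec, ed), rate in scores.items()
        if sec == section and ed == edition
    }
    return max(candidates, key=candidates.get) if candidates else None


# Made-up log: how often each hypothetical source was shown and clicked.
CLICK_LOG = [
    ("sports-daily.example", "Sports", "US", 100, 40),
    ("sports-daily.example", "Business", "US", 100, 5),
    ("finance-journal.example", "Business", "US", 200, 90),
    ("finance-journal.example", "Business", "India", 200, 10),
]

scores = click_through_by_segment(CLICK_LOG)
print(most_trusted(scores, "Business", "US"))
```

The point of keying on all three fields is the one made above: the same source can score well for one section or edition and poorly for another.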
Next let's go into some of your frequently asked questions. You might be asking "What
are the benefits of submitting a News Sitemap?" Well, we think that Sitemaps are beneficial
to us and to you as a publisher as well. First of all they provide you greater control over
which of your articles appear in Google News. And that's because, as I mentioned
earlier, they help complement our discovery crawl and tell us exactly which articles are
new and which articles we should crawl. Second, News Sitemaps are great because they help
you give us meta-information about your articles. So rather than rely on our extractor you can
give us the publication date. And rather than rely on just our extractor to determine the
categories for your article you can give us good hints by using the keywords field. So
all in all, we think News Sitemaps provide a huge benefit to publishers.
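For publishers who want to see what that looks like in practice, below is a minimal Python sketch that builds a News Sitemap with the publication date and keywords fields mentioned above. The element names and namespaces follow the News Sitemap schema as I understand it, so check them against the current News Publisher Help Center before relying on them. The build_news_sitemap function, the article data and the news.example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles):
    """Return News Sitemap XML (as a string) for a list of article dicts."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = article["publication"]
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = article["language"]
        # Publication date supplied by the publisher rather than extracted from the page.
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
        # Keywords act as category hints, as described in the talk.
        ET.SubElement(news, f"{{{NEWS_NS}}}keywords").text = ", ".join(article["keywords"])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical article used only to show the shape of the output.
sitemap_xml = build_news_sitemap([
    {
        "loc": "https://news.example.com/business/article-12345.html",
        "publication": "Example Times",
        "language": "en",
        "published": "2009-08-18T09:00:00Z",
        "title": "Example headline",
        "keywords": ["business", "money"],
    }
])
print(sitemap_xml)
```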
Another frequently asked question is "Can Googlebot visit our URLs more than once?"
And the answer is yes, we can definitely re-crawl URLs to check for updates. But just taking
a step back: initially Google can actually find your new content within a matter of minutes
of when you published it. We find your new content through our discovery crawl or
through News Sitemaps, and after that initial discovery we will definitely go back and re-check
for new article content. The time at which we re-crawl
varies, but it's pretty safe to say that we'll probably go back and check for new content
within 12 hours. So we'll find it within a matter of minutes and we'll re-crawl
for new content within 12 hours. You might also be asking "How do I optimize
my multimedia content?" Well that's a great question. So we're going to
take a look at two types of content. First, let's talk about videos. With videos
you can create a YouTube channel and submit that to us. We're looking to include
other types of video hosts, but right now with YouTube we have a pretty good idea of
the user experience, that the video will load, etc., so YouTube is a trusted video hosting
platform for us. And if you do use YouTube, remember that including textual descriptions
and transcripts is also helpful because that helps us associate a specific video with the
subject matter. Now let's talk about images. With images
we have five tips that will help your images get included in Google News Search. First
you want to use a large size image with a good aspect ratio. Second you want descriptive
captions and alt text. Third you want to keep your good image near the title. And that again
helps us associate an image with the subject matter. Fourth, you want your good image to
be inline and not a clickable version. So again you want your good image near the title
and inline. And last, we prefer JPEG. So if you use things like PNG images, that's
not as good for Google News as a JPEG. So I would definitely stick to JPEG if you would
like your images included in Google News.
So the last frequently asked question is of course "What about PageRank?" PageRank
is a lesser factor in Google News than it is in web search. And that makes sense, right,
because the linking structure for an article that was only published minutes ago isn't
going to be the same as one that was published months or years ago. So we have to use PageRank
delicately in Google News. Instead of relying on signals like PageRank, we actually rely more on signals
like the ones we talked about earlier, which are things like timeliness. Is it fresh and new? Or does
it have local or personal relevancy? Those types of things. So now that we've
covered how Google crawls, groups and ranks articles, and we've answered some of your frequently
asked questions, let's just get into some best practices.
First, it's important that you create permanent, unique URLs with at least three digits.
The reason for this is that traditionally, news publishers have used an article ID parameter,
then an equals sign and a number, in their URL strings. And that has helped us to determine that it's
an article and not just a static HTML page. But if your news publishing system doesn't
include digits, at least three for Google News, then you can actually submit a News
Sitemap. So that's the workaround. If you don't have three digits in your
URLs, you can create a News Sitemap and let us know which specific URLs belong in News.
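A quick way to audit your own URLs against this tip is sketched below in Python. It assumes "at least three digits" means a run of three or more consecutive digits somewhere in the path or query string, which is my reading of the tip and not a stated rule. The helper names and the example URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: the digits must be consecutive, e.g. "12345" in "?articleid=12345".
THREE_DIGITS = re.compile(r"\d{3,}")


def has_article_number(url: str) -> bool:
    """Return True if the URL path or query contains a run of three or more digits."""
    parts = urlparse(url)
    return bool(THREE_DIGITS.search(parts.path) or THREE_DIGITS.search(parts.query))


def needs_news_sitemap(urls):
    """Return the URLs without such a number, which would need the News Sitemap workaround."""
    return [url for url in urls if not has_article_number(url)]


# Hypothetical article URLs.
ARTICLE_URLS = [
    "https://news.example.com/story?articleid=12345",
    "https://news.example.com/2009/business/tyco-verdict-4821.html",
    "https://news.example.com/business/tyco-verdict.html",
]

missing = needs_news_sitemap(ARTICLE_URLS)
print(missing)
```

Any URL that shows up in the result is a candidate for listing in a News Sitemap, per the workaround described above.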
The second best practice is to not break up the article body. So in your news article
it should have sequential paragraphs that can all be included in Google News. You don't
want to break that up with user comments or links to related posts, or even with
things like links to additional pages. That's not as good for Google News.
We'll take all of the article that's on that first page. So again, look to not break up
the article body. A third best practice is to put dates between
the title and the body and that will help our date extractor to have the correct publication
date. Fourth, titles matter. And this is to have
a good HTML title as well as an article title. So you want your title to be extremely indicative
of the story at hand. Fifth, it's best for Google News if you separate
your original article content from your press releases. And you can do this in a directory
structure. And this helps us determine what is specifically a news article versus what
might be satire or opinion or a press release. And the last tip of course is to create unique
and informative content. And that's always going to help you do well in the rankings.
So the more unique content that you create, and the more users that enjoy it, the more
users we'll send there. This is kind of the converse of just publishing other
people's content or just having duplicate information. So again, the better the information
that you put out for all of us to read, the more users you'll attract to your site.
If you have additional questions, please feel free to visit our News Publisher Help Center
and thanks so much for watching.