Google Patent on Structured Data Focuses upon JSON-LD

Ernest Hemingway Structure Data

A search engine that answers questions by crawling and indexing facts found within structured data on a site works differently from one that looks at the words used in a query and tries to return documents containing those same words, hoping that such a matching of strings might surface an actual answer to the informational need that inspired the query in the first place. The structured data approach is shown in this flowchart from a 2017 Google patent:

Flow Chart Showing Structured Data in a Search

In Schema, Structured Data, and Scattered Databases such as the World Wide Web, I talked about the DIPRE algorithm in a patent from Sergey Brin, which I also described in the post Google’s First Semantic Search Invention was Patented in 1999. That patent and algorithm described how the web might be crawled to collect pattern and relation information about specific facts; in that case, facts about books. In the Google patent on structured data, we see how Google might look for factual information set out in semi-structured data such as JSON-LD, to be able to answer queries about facts, such as “What is a book by Ernest Hemingway, published in 1948-1952?”

This newer patent tells us that it might solve that book search in this manner:

In particular, for each encoded data item associated with a given identified schema, the system searches the locations in the encoded data item identified by the schema as storing values for the specified keys to identify encoded data items that store values for the specified keys that satisfy the requirements specified in the query. For example, if the query is for semi-structured data items that have a value “Ernest Hemingway” for an “author” key and that have values in a range of “1948-1952” for a “year published” key, the system can identify encoded data items that store a value corresponding to “Ernest Hemingway” in the location identified in the schema associated with the encoded data item as storing the value for the “author” key and that store a value in the range from “1948-1952” in the location identified in the schema associated with the encoded data item as storing the value for the “year published” key. Thus, the system can identify encoded data items that satisfy the query efficiently, i.e., without searching encoded data items that do not include values for each key specified in the received query and without searching locations in the encoded data items that are not identified as storing values for the specified keys.
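The search described in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the schema names (`book_v1`, `book_v2`), the tuple encoding, and the `search` function are all hypothetical. The point it shows is that a schema lacking any queried key is skipped entirely, and only the positions the schema names are inspected.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schemas: each maps a key to the position where its value is stored.
SCHEMAS: Dict[str, Dict[str, int]] = {
    "book_v1": {"author": 0, "title": 1, "year published": 2},
    "book_v2": {"title": 0, "author": 1},  # has no "year published" key
}

# Hypothetical encoded data items, grouped by the schema they are associated with.
ITEMS: Dict[str, List[Tuple[Any, ...]]] = {
    "book_v1": [
        ("Ernest Hemingway", "The Old Man and the Sea", 1952),
        ("Ernest Hemingway", "A Farewell to Arms", 1929),
        ("John Steinbeck", "East of Eden", 1952),
    ],
    "book_v2": [("The Sun Also Rises", "Ernest Hemingway")],
}


def search(requirements: Dict[str, Callable[[Any], bool]]) -> List[Dict[str, Any]]:
    """Return decoded items whose values satisfy every requirement in the query."""
    results = []
    for schema_id, locations in SCHEMAS.items():
        # Skip schemas that do not identify a location for every queried key.
        if not all(key in locations for key in requirements):
            continue
        for item in ITEMS[schema_id]:
            # Look only at the positions the schema identifies for the queried keys.
            if all(test(item[locations[key]]) for key, test in requirements.items()):
                results.append({key: item[pos] for key, pos in locations.items()})
    return results


query = {
    "author": lambda value: value == "Ernest Hemingway",
    "year published": lambda value: 1948 <= value <= 1952,
}
matches = search(query)
print(matches)
```

Here the `book_v2` items are never examined, because that schema stores no value for the “year published” key.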

It was interesting seeing Google come out with a patent about searching semi-structured data which focused upon the use of JSON-LD. We see them providing an example of JSON on one of the Google Developers pages: Introduction to Structured Data.

As it tells us on that page:

This documentation describes which fields are required, recommended, or optional for structured data with special meaning to Google Search. Most Search structured data uses schema.org vocabulary, but you should rely on the documentation on developers.google.com as definitive for Google Search behavior, rather than the schema.org documentation. Attributes or objects not described here are not required by Google Search, even if marked as required by schema.org.

The page then points us to the Structured Data Testing Tool, to be used as you prepare pages for use with Structured Data. It also tells us that for checking on Structured Data after it has been set up, the Structured Data Report in Google Search Console can be helpful, and is what I usually look at when doing site audits.

The Schema.org website has had a lot of JSON-LD examples added to it, and it was interesting to see this patent focus upon it. As they tell us about it in the patent, it seems that they like it:

Semi-structured data is self-describing data that does not conform to a static, predefined format. For example, one semi-structured data format is JavaScript Object Notation (JSON). A JSON data item generally includes one or more JSON objects, i.e., one or more unordered sets of key/value pairs. Another example semi-structured data format is Extensible Markup Language (XML). An XML data item generally includes one or more XML elements that define values for one or more keys.

I’ve used the analogy of how XML sitemaps are machine-readable compared to HTML sitemaps, and that is how JSON-LD shows off facts in a machine-readable way on a site, as opposed to content that is in HTML format. As the patent tells us, that is its purpose:

In general, this specification describes techniques for extracting facts from collections of documents.

The patent discusses schemas that might be on a site, and key/value pairs that could be searched, and details about such a search of semi-structured data on a site:

The aspect further includes receiving a query for semi-structured data items, wherein the query specifies requirements for values for one or more keys; identifying schemas from the plurality of schemas that identify locations for values corresponding to each of the one or more keys; for each identified schema, searching the encoded data items associated with the schema to identify encoded data items that satisfy the query; and providing data identifying values from the encoded data items that satisfy the query in response to the query. Searching the encoded data items associated with the schema includes: searching, for each encoded data item associated with the schema, the locations in the encoded data item identified by the schema as storing values for the specified keys to identify whether the encoded data item stores values for the specified keys that satisfy the requirements specified in the query.

The patent providing details of the use of JSON-LD to provide a machine readable set of facts on a site can be found here:

Storing semi-structured data
Inventors: Martin Probst
Assignee: Google Inc.
US Patent: 9,754,048
Granted: September 5, 2017
Filed: October 6, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing semi-structured data. One of the methods includes maintaining a plurality of schemas; receiving a first semi-structured data item; determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas; and in response to determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas: generating a new schema, encoding the first semi-structured data item in the first data format to generate the first new encoded data item in accordance with the new schema, storing the first new encoded data item in the data item repository, and associating the first new encoded data item with the new schema.
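A minimal sketch of the storage flow in that abstract might look like the following. It rests on assumptions the abstract does not state: data items are flat key/value objects, an item "matches" a schema when it has exactly the same set of keys, and the encoded format is a plain tuple. The `Repository` class and its method names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class Repository:
    """Toy data item repository that keeps one schema per distinct set of keys."""

    def __init__(self) -> None:
        self.schemas: Dict[int, Tuple[str, ...]] = {}        # schema id -> ordered keys
        self.items: Dict[int, List[Tuple[Any, ...]]] = {}    # schema id -> encoded items

    def _find_schema(self, item: Dict[str, Any]) -> Optional[int]:
        # Assumption: matching means "same key set"; the patent may define it differently.
        keys = tuple(sorted(item))
        for schema_id, schema_keys in self.schemas.items():
            if schema_keys == keys:
                return schema_id
        return None

    def store(self, item: Dict[str, Any]) -> int:
        schema_id = self._find_schema(item)
        if schema_id is None:
            # No existing schema matches, so generate a new one.
            schema_id = len(self.schemas)
            self.schemas[schema_id] = tuple(sorted(item))
            self.items[schema_id] = []
        # Encode the item as values only, in the order the schema defines.
        encoded = tuple(item[key] for key in self.schemas[schema_id])
        self.items[schema_id].append(encoded)
        return schema_id


repo = Repository()
first = repo.store({"author": "Ernest Hemingway", "title": "The Old Man and the Sea", "year published": 1952})
second = repo.store({"title": "East of Eden", "author": "John Steinbeck", "year published": 1952})
third = repo.store({"name": "Oceanside Pier", "city": "Oceanside"})
print(first, second, third)
```

The two book items share one schema, so their keys are stored once, while the third item has different keys and triggers the creation of a new schema.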

Take Aways

By using Structured Data such as in Schema Vocabulary in JSON-LD formatting, you make sure that you provide precise facts in key/value pairs that provide an alternative to the HTML-based content on the pages of a site. Make sure that you follow the Structured Data General Guidelines from Google when you add it to a site. That page tells us that pages that don’t follow the guidelines may not rank as highly, or may become ineligible for rich results appearing for them in Google SERPs.
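As an illustration of what such key/value facts look like, here is a small Python sketch that builds a JSON-LD block for a book using the Schema.org vocabulary. The specific book and the choice of properties (`name`, `author`, `datePublished`) are only an example; which properties Google requires or recommends for a given result type should be checked against Google's own documentation.

```python
import json

# Facts about a book expressed as key/value pairs in Schema.org vocabulary.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "The Old Man and the Sea",
    "author": {"@type": "Person", "name": "Ernest Hemingway"},
    "datePublished": "1952",
}

# JSON-LD is usually embedded in a page inside a script element of this type.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(book, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The resulting script element can sit alongside the visible HTML content of the page and state the same facts in a machine-readable form.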

And if you are optimizing a site for Google, it also helps to optimize the same site for Bing, and it is good to see that Bing seems to like JSON-LD too. It has taken a while for Bing to do that (see Aaron Bradley’s post, An Open Letter to Bing Regarding JSON-LD). It appears that Bing has listened a little, adding some capacity to check on JSON-LD after it is deployed: Bing announces Bing AMP viewer & JSON-LD support in Bing Webmaster Tools. The Bing Markup Validator does not yet help with JSON-LD, but Bing Webmaster Tools now helps with debugging JSON-LD. I like using this Structured Data Linter myself.


Copyright © 2018 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post Google Patent on Structured Data Focuses upon JSON-LD appeared first on SEO by the Sea ⚓.


Source: http://feedproxy.google.com/~r/seobythesea/Tesr/~3/5KO-IxVqTK0/


Schema, Structured Data, and Scattered Databases such as the World Wide Web

I spoke at SMX Advanced this week on Schema markup and Structured Data, as part of an introduction to its use at Google.

I had the chance to visit Seattle, and tour some of it. I took some photos, but would like to go back sometime and take a few more, and see more of the City.

One of the places that I did want to see was Pike Place Market. It was a couple of blocks away from the Hotel I stayed at (the Marriott Waterfront).

It is a combination fish and produce market, and is home to one of the earliest Starbucks.

Pike Place Market Entrance

I could see living near the market and shopping there regularly. It has a comfortable feel to it.

Pike Place Farmers Market

This is a view of the Farmers Market from the side. I wish I had the chance to come back later in the day, and see what it was like other than in the morning.

Victor Steinbrueck Park

This was a nice little park next to Pike Place Market, which looked like a place to take your dog for a walk while in the area, and had a great view of Elliott Bay (the central part of Puget Sound).

A view of Puget Sound

This is a view of the waterfront from closer to the conference center.

Mount Rainier

You can see Mount Rainier from the top of the Conference Center.

My presentation for SMX Advanced 2018:

https://www.slideshare.net/billslawski/slideshelfframe

Schema, Structured Data & Scattered Databases Such as the World Wide Web. My role in this session is to introduce Schema and Structured Data and how Google is using them on the Web.

Google is possibly best known for the PageRank Algorithm invented by founder Lawrence Page, whom it is named after. What looks like the second patent filed by someone at Google was the DIPRE (Dual Iterative Pattern Relation Expansion) patent, invented and filed by Sergey Brin. He didn’t name it after himself (Brinrank) like Page did with PageRank.

The provisional patent filed for this invention was the whitepaper, “Extracting Patterns and Relations from Scattered Databases such as the World Wide Web.” The process behind it is set out in the paper, and it starts with a list of 5 books: their titles, authors, publishers, and years published. Unlike PageRank, it doesn’t involve crawling webpages and indexing links from page to page along with anchor text. Instead, it involves collecting facts from page to page, and when it finds pages that contain properties and attributes of these five books, it is supposed to collect similar facts about other books on the same site. Once it has completed that, it is supposed to move on to other sites, look for those same 5 books, and collect more books. The idea is to eventually know where all the books are on the Web, along with facts about those books that could be used to answer questions about them.
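The bootstrapping idea described above can be sketched roughly in Python. This is a much-simplified illustration and not Brin's actual algorithm: it works on (title, author) pairs only, treats each page as lines of text, assumes the title always appears before the author, and leaves out the URL handling and pattern-specificity checks from the paper. The sample pages and all function names are hypothetical.

```python
import re
from os.path import commonprefix


def occurrences(page, seeds):
    """Return (prefix, middle, suffix) for each line mentioning a known title, then its author."""
    found = []
    for line in page.splitlines():
        for title, author in seeds:
            t = line.find(title)
            a = line.find(author, t + len(title)) if t != -1 else -1
            if a != -1:
                found.append((line[:t], line[t + len(title):a], line[a + len(author):]))
    return found


def induce_pattern(occs):
    """Generalize several occurrences into one pattern, or None if they do not agree."""
    if len(occs) < 2 or len({middle for _, middle, _ in occs}) != 1:
        return None
    # Longest common suffix of the prefixes, and longest common prefix of the suffixes.
    prefix = commonprefix([p[::-1] for p, _, _ in occs])[::-1]
    suffix = commonprefix([s for _, _, s in occs])
    if not prefix or not suffix:
        return None
    return prefix, occs[0][1], suffix


def extract(page, pattern):
    """Apply a pattern to a page and return every (title, author) pair it matches."""
    prefix, middle, suffix = pattern
    regex = re.escape(prefix) + "(.+?)" + re.escape(middle) + "(.+?)" + re.escape(suffix)
    return set(re.findall(regex, page))


def dipre(pages, seeds, rounds=2):
    """Alternate between learning patterns from known books and finding new books."""
    known = set(seeds)
    for _ in range(rounds):
        for page in pages:
            pattern = induce_pattern(occurrences(page, known))
            if pattern:
                known |= extract(page, pattern)
    return known


site_one = "\n".join([
    "<li><b>The Old Man and the Sea</b> by Ernest Hemingway (1952)</li>",
    "<li><b>The Grapes of Wrath</b> by John Steinbeck (1939)</li>",
    "<li><b>Invisible Man</b> by Ralph Ellison (1952)</li>",
    "<li><b>Beloved</b> by Toni Morrison (1987)</li>",
])
site_two = "\n".join([
    '<p>"Invisible Man", written by Ralph Ellison.</p>',
    '<p>"Beloved", written by Toni Morrison.</p>',
    '<p>"Dune", written by Frank Herbert.</p>',
])
seeds = {
    ("The Old Man and the Sea", "Ernest Hemingway"),
    ("The Grapes of Wrath", "John Steinbeck"),
}
known = dipre([site_one, site_two], seeds)
print(sorted(known))
```

The two seed books teach a pattern on the first site, which yields two more books; those in turn teach a different pattern on the second site, which yields a book the seeds never mentioned.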

This is where we see Google being concerned about structured data on the web, and how helpful knowing about it could be.

When I first started out doing in-house SEO, it was for a Delaware incorporation business, and geography was an important part of the queries that my pages were found for. I had started looking at patents, and ones such as this one on “Generating Structured Data” caught my attention. It focused on collecting data about local entities, or local businesses, and properties related to those. It was built by the team led by Andrew Hogue, who was in charge of the Annotation Framework at Google; that team was responsible for “The Fact Repository”, an early version of Google’s Knowledge Graph.

If you’ve heard of NAP consistency, and of mentions being important to local search, it is because Local search was focusing on collecting structured data that could be used to answer questions about businesses. Patents about location prominence followed, which told us that a link counted as a mention, and a patent on local authority, which determined which Website was the authoritative one for a business. But, it seemed to start with collecting structured data about businesses at places.

The DIPRE Algorithm focused upon crawling the web to find facts, and Google Maps built that into an approach that could be used to rank places and answer questions about them.

If you haven’t had a chance to use Google’s experimental table search, it is worth trying out. It can answer questions using data-based tables from across the web, such as “what is the longest wooden pier in California”, which is the one in Oceanside, a town next to the one I live in. It is from the WebTables project at Google.

Database fields are sometimes referred to as schema, and table headers, which tell us what kind of data is in a table column, may also be referred to as “schema”. A data-based web table could be considered a small structured database, and Google’s WebTables project found that there was a lot of information that could be found in web tables on the Web.

Try out the first link above (the WebTables Project Slide) when you get the chance, and do some searches on Google’s table search. The second paper is one that described the WebTables project when it first started out, and the one that follows it describes some of the things that Google researchers learned from the Project. We’ve seen Structured Snippets like the one above grabbing facts to include in a snippet (in this case from a data table on the Wikipedia page about the Oceanside Pier.)

When a data table column contains the same data that a column in another table contains, and the first doesn’t have a table header label, Google might learn a label from the second table (this is considered a way to learn semantics or meaning from tables). These are truly scattered databases across the World Wide Web, but through the use of crawlers, that information can be collected and become useful, as the DIPRE Algorithm described.
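One way to picture learning a header label from another table is to compare column values and borrow the label of the best-overlapping column. The Python sketch below is an assumption-laden illustration, not the WebTables method: it uses Jaccard overlap of lowercased cell values with an arbitrary threshold, and the function name and sample tables are hypothetical.

```python
def infer_header(unlabeled_column, labeled_tables, min_overlap=0.5):
    """Borrow the header of the labeled column whose values overlap most with ours."""
    values = {value.lower() for value in unlabeled_column}
    if not values:
        return None
    best_label, best_score = None, 0.0
    for table in labeled_tables:
        for label, column in table.items():
            other = {value.lower() for value in column}
            # Jaccard overlap: shared values divided by all distinct values.
            score = len(values & other) / len(values | other)
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= min_overlap else None


# A column with no header, and a second table whose columns are labeled.
unlabeled = ["Oceanside Pier", "Santa Monica Pier", "Stearns Wharf"]
labeled_tables = [
    {
        "Pier": ["Oceanside Pier", "Santa Monica Pier", "Stearns Wharf", "Santa Cruz Wharf"],
        "City": ["Oceanside", "Santa Monica", "Santa Barbara", "Santa Cruz"],
    }
]
label = infer_header(unlabeled, labeled_tables)
print(label)
```

A column that overlaps with nothing above the threshold stays unlabeled, which avoids borrowing a header on the strength of one coincidental value.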

In 2005, the Official Google Blog published this short story, which told us about Google sometimes answering direct questions in response to queries at the top of Web results. I don’t remember when these first started appearing, but I do remember Definition results about a year earlier, where you could type “Define:” and a word, or ask “What is” before a word, and Google would show a definition. There was a patent that described how they were finding definitions from glossary pages, and how to ideally set up those glossaries so that your definitions might be the ones that end up as responses.

In 2012, Google introduced the Knowledge Graph, which told us that they would be focusing upon learning about specific people, places and things, and answering questions about those instead of just continuing to match keywords in queries to keywords in documents. They told us that this was a move to things instead of strings. Like the books in Brin’s DIPRE or Local Entities in Google Maps.

We could start using the Web as a scattered database, with questions and answers from places such as Wikipedia tables helping to answer queries such as “What is the capital of Poland?”

And Knowledge bases such as Wikipedia, Freebase, IMDB and Yahoo Finance could be the sources of facts about properties and attributes about things such as movies and actors and businesses where Google could find answers to queries without having to find results that had the same keywords in the document as the query.

In 2011, The Schema.org site was launched as a joint project from Google, Yahoo, Bing, and Yandex, that provided machine-readable text that could be added to web pages. This text is provided in a manner that is machine readable only, much like XML sitemaps are intended to be machine-readable, to provide an alternative channel of information to search engines about the entities pages are about, and the properties and attributes on those pages.

While Schema.org was introduced in 2011, it was built to be extendable, and to let subject matter experts add new schema, like this extension from GS1 (the inventors of barcodes in brick and mortar stores). If you haven’t tried out this demo from them, it is worth getting your hands on to see what is possible.

In 2014, Google published their Biperpedia paper, which tells us about how they might create ontologies from query streams (sessions about specific topics) by finding terms to extract data from the Web about. At one point in time, search engines would do focused crawls of the web starting at sources such as DMOZ, so that the index of the Web they were constructing contained pages about a wide range of categories. By using query stream information, they are crowdsourcing the building of resources to build ontologies about. This paper tells us that Biperpedia enabled them to build ontologies that were larger than what they had developed through Freebase, which may be partially why Freebase was replaced by Wikidata.

The Google+ group I’ve linked to above on the Schema Resources Page has members who work on Schema from Google, such as Dan Brickley, who is the head of schema for Google. Learning about extensions is a good idea, especially if you might consider participating in building new ones, and the community group has a mailing list, which lets you see and participate in discussions about the growth of Schema.



The post Schema, Structured Data, and Scattered Databases such as the World Wide Web appeared first on SEO by the Sea ⚓.


Source: http://feedproxy.google.com/~r/seobythesea/Tesr/~3/YnqcGY7LnuI/

Google to Offer Combined Content (Paid and Organic) Search Results

Combined Content Search Results

Google Introduces Combined Content Results

When Google patents talk about paid search, they refer to those paid results as “content” rather than as advertisements.

A recent patent from Google (Combining Content with Search Results) tells us about how Google might identify when organic search results are about specific entities, such as brands. It may also recognize when paid results are about the same brands, or about products from those brands.

In the event that a set of search results contains high ranking organic results from a specific brand, and a paid search result from that same brand, the process described in the patent might allow for the creation of a combined content result of the organic result with the paid result.

Merging Local and Organic Results in the Past

When I saw this new patent, it brought back memories of when Google found a way to merge organic search results with local search results. The day after I wrote about that, in the following post, I received a call from a co-worker who asked me if I had any idea why a top ranking organic result for a client might have disappeared from Google’s search results.

I asked her what the query term was, and who the client was. I performed the search, and noticed that our client was ranking highly for that query term in a local result, but their organic result had disappeared. I pointed her to the blog post I wrote the day before, about Google possibly merging local and organic results, with the organic result disappearing, and the local result getting boosted in rankings. It seemed like that is what happened to our client, and I sent her a link to my post, which described that.

How Google May Diversify Search Results by Merging Local and Web Search Results

Google did merge that client’s organic listing with their local listing, but it appeared that was something that they ended up not doing too often. I didn’t see them do that too many more times.

I am wondering, will Google start merging together paid search results with organic search results? If they would do that for local and organic results, which rank things in different ways, it is possible that they might with organic and paid. The patent describes how.

The newly granted patent does tell us about how paid search works in Search results at Google:

Content slots can be allocated to content sponsors as part of a reservation system, or in an auction. For example, content sponsors can provide bids specifying amounts that the sponsors are respectively willing to pay for presentation of their content. In turn, an auction can be run, and the slots can be allocated to sponsors according, among other things, to their bids and/or the relevance of the sponsored content to content presented on a page hosting the slot or a request that is received for the sponsored content. The content can be provided to a user device such as a personal computer (PC), a smartphone, a laptop computer, a tablet computer, or some other user device.

Combining Paid and Organic Results

Here is the process behind this new patent involving merging paid results (content) and organic results:

  1. A search query is received.
  2. Search results responsive to the query are returned, including one associated with a brand.
  3. Content items (paid search results) based at least in part on the query, are returned for delivery along with the search results responsive to the query.
  4. This approach includes looking to see if eligible content items are associated with a same brand as the brand associated in the organic search results.
  5. If there is a paid result and an organic result that are associated with each other, it may combine the organic search result and the eligible content item into a combined content item, and provide the combined content item as a search result responsive to the request.

When Google decides whether the eligible content item is associated with the same brand as an organic result, it is a matter of determining that one content item is sponsored by an owner of the brand.

A combined result (of the paid and the organic results covering the same brand) includes what the patent refers to as “a visual universal resource locator (VisURL).”

That combined item would include:

  • A title
  • Text from the paid result
  • A link to a landing page from the paid result into the combined content item
  • The combined item may also include other information associated with the brand, such as:

  • A map to retail locations associated with brand retail presence.
  • Retail location information associated with the brand.

In addition to the brand owner, the organic result that could be combined might be from a retailer associated with the brand.

It can involve designating content from the sponsored item that is included in the combined content item as sponsored content (so it may show that content from the paid result as being an ad.)

It may also include “monetizing interactions with material that is included from the at least one eligible content item that is included in the combined content item based on user interactions with the material.” Additional items shown could include an image or logo associated with the brand, or one or more products associated with the brand, or combine additional links relevant to the result.
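The combining process described above can be sketched in Python as follows. This is only a rough illustration of the steps as summarized in this post, not how Google implements it: the class names, the `top_n` cutoff, and the simple string comparison of brand names are all assumptions (the patent describes an entity/brand determination engine for that last part).

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class OrganicResult:
    title: str
    url: str
    brand: Optional[str] = None  # brand the result is associated with, if any


@dataclass
class ContentItem:  # a paid result, in the patent's wording
    text: str
    landing_page: str
    sponsor_brand: str


@dataclass
class CombinedItem:
    title: str            # title taken from the organic result
    sponsored_text: str   # text from the paid result, designated as sponsored
    landing_page: str     # link to the landing page from the paid result
    brand: str


def combine(
    organic: List[OrganicResult], paid: List[ContentItem], top_n: int = 3
) -> List[Union[OrganicResult, CombinedItem]]:
    """Merge the first top-ranked brand result with a paid item from the same brand."""
    for position, result in enumerate(organic[:top_n]):
        if result.brand is None:
            continue
        for item in paid:
            # Assumption: brand association is reduced to an exact name match.
            if item.sponsor_brand == result.brand:
                combined = CombinedItem(
                    title=result.title,
                    sponsored_text=item.text,
                    landing_page=item.landing_page,
                    brand=result.brand,
                )
                return organic[:position] + [combined] + organic[position + 1:]
    return list(organic)  # nothing to combine, results are unchanged


organic = [
    OrganicResult("Acme Shoes Official Site", "https://acme.example/", brand="Acme"),
    OrganicResult("Shoe buying guide", "https://guide.example/"),
]
paid = [ContentItem("20% off Acme running shoes", "https://acme.example/sale", "Acme")]
results = combine(organic, paid)
print(results[0])
```

In this sketch the combined item takes the place of the organic result it was built from, and the remaining organic results keep their order.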

Additional Brand Content in Search Results

The patent behind this approach of combining paid and organic results was this one, granted in April:

Combining content with a search result
Inventors: Conrad Wai, Christopher Souvey, Lewis Denizen, Gaurav Garg, Awaneesh Verma, Emily Kay Moxley, Jeremy Silber, Daniel Amaral de Medeiros Rocha and Alexander Fischer
Assignee: Google LLC
US Patent: 9,947,026
Granted: April 17, 2018
Filed: May 12, 2016

Abstract

Methods, systems, and apparatus include computer programs encoded on a computer-readable storage medium, including a method for providing content. A search query is received. Search results responsive to the query are identified, including identifying a first search result in a top set of search results that is associated with a brand. Based at least in part on the query, one or more eligible content items are identified for delivery along with the search results responsive to the query. A determination is made as to when at least one of the eligible content items is associated with a same brand as the brand associated with the first search result. The first search result and one of the determined at least one eligible content items are combined into a combined content item and providing the combined content item as a search result responsive to the request.

The patent does include details on things such as an “entity/brand determination engine,” which can be used to compare paid results with organic results, to see if they cover the same brand. This is one of the changes that indexing things instead of strings is bringing us.

The patent does have many other details, and until Google announces that they are introducing this, I suspect we won’t hear more details from them about it. Then again, they didn’t announce officially that they were merging organic and local results when they started doing that. Don’t be surprised if this becomes available at Google.



The post Google to Offer Combined Content (Paid and Organic) Search Results appeared first on SEO by the Sea ⚓.


Source: http://feedproxy.google.com/~r/seobythesea/Tesr/~3/CqhevbwaEXI/

10 Tips to Get More YouTube Subscribers

Original source: 10 Tips to Get More YouTube Subscribers via DailySEOblog.

Did you know that the very first video on YouTube was uploaded on April 23, 2005? It was an 18 second video titled “Me at the Zoo” and had never been publicly shown before, until now. At present, YouTube subscribers account for almost one-third of the Internet users, numbering in the billions. Driven by the…

Source: https://dailyseoblog.com/get-youtube-subscribers/#utm_source=rss&utm_medium=rss

10 Top Affiliate Networks to Make Money

Original source: 10 Top Affiliate Networks to Make Money via DailySEOblog.

I am amazed to see how Affiliate Marketing has gained traction in recent times. It is a part of the marketing strategy of almost all leading brands in the world. In fact, US affiliate marketing spend is expected to increase at a CAGR of 10.1% between 2015 and 2020, to become an estimated $6.8 billion…

Source: https://dailyseoblog.com/10-top-affiliate-networks-to-make-money/#utm_source=rss&utm_medium=rss

10 Tips to Improve AdSense Income

Original source: 10 Tips to Improve AdSense Income via DailySEOblog.

Google launched its AdSense program in March 2003. It was initially named content targeting advertising. AdSense is a simple Pay Per Click (PPC ads) system that displays ads related to your site’s content. Bloggers or website owners get paid each time someone clicks on the AdSense ads on their site. Due to its simplicity and…

Source: https://dailyseoblog.com/tips-to-improve-adsense-income/#utm_source=rss&utm_medium=rss

10 Tips to Increase Your Instagram Followers

Original source: 10 Tips to Increase Your Instagram Followers via DailySEOblog.

Instagram has gained such huge popularity in recent times that it recorded a whopping 800 million monthly active users in September 2017, up from 600 million in December 2016, according to data published by Statista. Teens and young millennials actively share photos and videos on this highly visual social networking platform. And, the aspiration of…

Source: https://dailyseoblog.com/increase-instagram-followers/#utm_source=rss&utm_medium=rss