HOW GOOGLEBOT INDEXES A SITE BUILT WITH A JAVASCRIPT FRAMEWORK

Statistics show that webmasters build sites with well-known JavaScript frameworks such as React, Vue, and Angular, creating complex interfaces, effects, and animations. Sites built as SPAs (single-page applications) are becoming more and more popular on the modern web. Nevertheless, the SEO problem of JavaScript sites was still unsolved in 2018. As of 2019, only Googlebot handles SPA pages, and even Google's own developers warn of problems and recommend looking for an alternative. Bingbot and Yandexbot handle SPAs unpredictably.

React Usage Statistics in January, 2019

As early as 2014, Google representatives claimed that the search engine processed JavaScript on sites, but they recommended being cautious and deliberate with JS technologies. Let's briefly review how Google indexes a web page to pinpoint the problem.

Scanning has three stages. First, the search engine evaluates whether the page can be indexed, so it is better to build a website in a way that lets the crawler scan the content, taking the site structure into account. Next come rendering and analysis of the received information: the robot must receive all the necessary content to complete the operation. Finally, there is the crawl budget: the time the search engine allocates to analyzing the site. That time is strictly limited, and only content processed within it makes it into the index.

The problem with processing a JavaScript site arises at the analysis stage. It often happens that, while interacting with page elements, a script removes an element rather than adds one. The result is a situation where, after the scripts finish, Googlebot is left with half-empty HTML. The second issue is the cost of scanning: avoid complex logic and heavy computation, or Googlebot may not have time to execute the scripts and some of the content will be lost.
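To make the "half-empty HTML" problem concrete, here is a minimal sketch (the shell markup and render function are illustrative, not from any real crawler): a crawler that does not execute JavaScript, or runs out of budget before the scripts finish, indexes only the empty shell the server sent.

```javascript
// Hypothetical SPA shell: the server response contains no content,
// everything is injected by the client-side bundle at runtime.
const shellHtml = '<div id="root"></div>';

// Simulates the client-side scripts filling the shell with content.
function runClientScripts(html) {
  return html.replace(
    '<div id="root"></div>',
    '<div id="root"><h1>Catalog</h1><p>Product list</p></div>'
  );
}

// What the bot ends up indexing depends on whether it executed JS.
function indexedContent(html, executesJs) {
  return executesJs ? runClientScripts(html) : html;
}

console.log(indexedContent(shellHtml, false)); // only the empty shell
```

The same model also covers the removal case: if a script deletes nodes during interaction, the rendered output can contain even less than the shell did.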

The scheme of site page scanning by Googlebot

Moreover, do not forget that other search engines do not handle JavaScript. Bing and Yahoo hold a significant market share in the United States; Yandex is popular in the CIS. So if you are building a commercial site and want a modern SPA, plan how the indexing problem will be solved at the time of product development. Relying only on Google's technical progress will lead to a ranking downgrade in the results of other search engines.

Solving SPA indexing problems with free software

Let's consider solutions to the problem that use free software. The client doesn't need to pay for expensive licenses or develop a solution from scratch. Our task is to solve the SPA problem without extra spending on programmers and software: all you need to do is connect and configure a ready-made solution.

The first suitable solution is Rendora, a new FOSS (free and open-source software) project written in Go. Rendora is a dynamic renderer that works with headless Chrome. In effect, Rendora produces ready-made server-side-rendered HTML for the search engine, while users work with the standard SPA site. The client gets what amounts to an isomorphic render without having to build a new solution; you just need to connect and configure Rendora.

Rendora functions as a reverse HTTP proxy in front of the main backend server. The backend can be built on any technology; for our projects we use x4.cms written in PHP, or Django, or Node.js.

Functional block diagram of Rendora

The Rendora server inspects incoming requests: if the page is requested by a bot, the program instructs headless Chrome to fetch and render the page, then returns the SSR version to the client. In this way Googlebot gets static HTML. If the page is requested by a regular user, Rendora passes the response to the client in its original form, without changing anything, like a standard proxy server.
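The routing decision described above can be sketched as follows (the user-agent pattern and function names are illustrative assumptions, not Rendora's actual configuration or API):

```javascript
// Hypothetical sketch of a dynamic renderer's routing decision:
// search-engine bots get a rendered snapshot, everyone else is
// proxied to the SPA unchanged.
const BOT_UA = /googlebot|bingbot|yandex|duckduckbot/i;

function isSearchBot(userAgent) {
  return BOT_UA.test(userAgent || '');
}

// 'ssr'   -> ask headless Chrome for a rendered snapshot
// 'proxy' -> pass the SPA response through unchanged
function route(userAgent) {
  return isSearchBot(userAgent) ? 'ssr' : 'proxy';
}

console.log(route('Mozilla/5.0 (compatible; Googlebot/2.1)')); // 'ssr'
console.log(route('Mozilla/5.0 (Windows NT 10.0) Chrome/71.0')); // 'proxy'
```

Real renderers typically match on a configurable list of user agents and paths rather than a single hard-coded pattern.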

Among the solution's visible advantages: Rendora is written in Go, which makes it significantly faster than similar Node.js solutions. Caching is implemented to store SSR pages and serve them instantly afterwards. You can also configure it to skip content that does not matter for indexing, such as stylesheets and images, which speeds up headless Chrome's work.
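The two optimizations can be sketched like this (the cache shape, TTL, and skip list are assumptions for illustration, not Rendora's implementation):

```javascript
// Asset requests that never need to go through headless Chrome.
const SKIP = /\.(css|png|jpe?g|gif|svg|woff2?)$/i;

const cache = new Map();
const TTL_MS = 5 * 60 * 1000; // assumed snapshot lifetime

function shouldRender(path) {
  return !SKIP.test(path);
}

// Render once per URL, then serve the stored snapshot until it expires.
function getSnapshot(url, renderFn, now = Date.now()) {
  const hit = cache.get(url);
  if (hit && now - hit.at < TTL_MS) return hit.html;
  const html = renderFn(url);
  cache.set(url, { html, at: now });
  return html;
}
```

With this in place, repeated bot requests for the same page cost one headless render instead of many, and static assets bypass the renderer entirely.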

You can read more about configuring and running Rendora on the project's official GitHub page.

In the next article, we will look at alternative solutions for rendering and indexing SPA applications.

Ready to start?