**Update:** I've built a new version of this tool, including a Chrome extension. [Read about it here](/blog/gsc-index-coverage-extractor/).

In my previous article I talked about how to [get indexing information in bulk from Google Search Console](https://jlhernando.com/blog/url-inspector-automator-node/) using the URL Inspection Tool and Node.js. That tool is great for gathering information about specific URLs on your site. However, Google also gives site owners a more holistic view of their sites' indexing status through the Index Coverage Report.

![Index Coverage Report Search Console](/img/index-coverage-search-console-top-edit.jpg 'Index Coverage Report Search Console')

You can check [Google’s own documentation](https://support.google.com/webmasters/answer/7440203?hl=en) and [video tutorial](https://www.youtube.com/watch?v=L0UqvdHJaXE) for more detail on the data this report provides, but at a high level the key data points are:

1. **The number of pages that Google has indexed.**
2. **The number of pages that Google has found but not indexed (either because of an error or because they were purposefully excluded).**
3. **How big your site is from Google’s point of view (Errors + Valid with warning + Valid + Excluded).**

At the time of writing there are four main categories: **Errors**, **Valid with warning**, **Valid** and **Excluded**, subdivided into 29 subcategories. Each subcategory provides an additional level of classification to help site owners and SEOs understand why a URL belongs in its main category. Not all subcategories will be visible, only the ones that apply to your site.

![Details table Index Coverage Report Search Console](/img/subcategories-coverage.JPG 'Details table Index Coverage Report Search Console')

Unfortunately, the export option on the Index Coverage Report view (pictured above) only gives you the top-level numbers per report. If you want to see and export the URLs inside each report, you have to click into every report and export them one by one.

![Export individual coverage report detail](/img/errors-report-edit.jpg 'Export individual coverage report detail')

Extracting the data this way is manual and time-consuming, so I decided to automate it with Node.js and add a few more features.

## Installing and running the script

Make sure that you have [Node.js](https://nodejs.org/en/) installed on your machine. At the time of writing this post I’m using version 14.16.0. The script uses syntax that is only supported from version 14 onwards, so double-check that you are running version 14 or later.

```bash
# Check Node version
node -v
```
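`node -v` prints a string such as `v14.16.0`. If you would rather have a script fail fast on an older runtime, a guard along these lines could do it. This is a minimal sketch that is not part of the repository; it assumes only the version-14 minimum mentioned above and reads Node's built-in `process.versions.node`.

```javascript
// Minimal sketch, not part of index-coverage-extractor: stop early when the
// running Node.js version is older than the assumed minimum (14).
const MIN_NODE_MAJOR = 14;

// Extract the major version from "14.16.0" or from `node -v` output ("v14.16.0").
function nodeMajor(version) {
  return Number.parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

function isSupportedNode(version, minMajor = MIN_NODE_MAJOR) {
  return nodeMajor(version) >= minMajor;
}

// process.versions.node holds the running version without the leading "v".
const currentVersion = process.versions.node;
if (!isSupportedNode(currentVersion)) {
  console.error(`Node.js ${currentVersion} found; version ${MIN_NODE_MAJOR} or later is required.`);
  process.exit(1);
}
console.log(`Node.js ${currentVersion} meets the minimum version.`);
```

Because `nodeMajor` strips a leading `v`, it accepts both `process.versions.node` and the output of `node -v`.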

Download the script using Git or GitHub’s CLI, or simply [download the code from GitHub directly](https://github.com/jlhernando/index-coverage-extractor).

```bash
# Git
git clone https://github.com/jlhernando/index-coverage-extractor.git

# GitHub’s CLI
gh repo clone https://github.com/jlhernando/index-coverage-extractor
```

Then install the modules the script needs by typing _npm install_ in your terminal:

```bash
npm install
```

To extract the coverage data from your website/property, update the credentials.js file with your Search Console credentials.

![Update credentials.js with your Search Console user & password](/img/credentials.jpg 'update credentials.js with your Search Console user & password')

After that, run the script by typing _npm start_ in your terminal.

```bash
npm start
```

The script logs its progress to the console so you can see what is happening.

![Index Coverage Extractor in action](/img/index-coverage-headless.jpg 'Index Coverage Extractor in action')

Like the [URL Inspection Automator](https://github.com/jlhernando/url-inspector-automator-js), the script uses [Playwright](https://playwright.dev/) and runs in headless mode. If you want to see the browser automation in action, change the launch option to _headless: false_ in the index.js file and save it before running the script.

![change headless mode to see the browser automation in action](/img/browser-headless-false.gif 'change headless mode to see index coverage automation in action')
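If you switch between the two modes often, one option is to read the setting from an environment variable instead of editing index.js each time. The sketch below is hypothetical: the `HEADLESS` variable and the `buildLaunchOptions` helper are illustrations, not part of the tool; only the `headless` launch option itself comes from Playwright.

```javascript
// Hypothetical sketch: derive Playwright launch options from a HEADLESS
// environment variable instead of editing index.js by hand.
function buildLaunchOptions(env = process.env) {
  // Headless by default; only HEADLESS=false (any letter case) shows the browser.
  const raw = env.HEADLESS === undefined ? 'true' : String(env.HEADLESS);
  return { headless: raw.toLowerCase() !== 'false' };
}

// In a script that already depends on Playwright, e.g. with Chromium:
//   const { chromium } = require('playwright');
//   const browser = await chromium.launch(buildLaunchOptions());
console.log(buildLaunchOptions());
```

With a script wired up this way, `HEADLESS=false npm start` would open a visible browser while a plain `npm start` would stay headless.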

## The output

The script will create a "coverage.csv" file and a "summary.csv" file.

The "coverage.csv" file contains every URL extracted from each individual coverage report.

![Coverage report detail csv](/img/coverage-csv.jpg 'index coverage report export detail csv')

The "summary.csv" file contains, for each report, the number of URLs extracted, the total number of URLs GSC reports in the user interface (equal to or higher than the extracted number) and an "extraction ratio", which is the number of URLs extracted divided by the total number of URLs reported by GSC.

![Coverage report summary csv](/img/coverage-summary.jpg 'index coverage report export summary csv')
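To make the extraction ratio concrete, here is a minimal sketch of how summary rows like these could be built and written out as CSV. It is not the tool's actual code: the column names (`report`, `extracted`, `total`, `ratio`), the two-decimal rounding and the input numbers are illustrative assumptions.

```javascript
// Illustrative sketch of building summary rows; column names and rounding are assumptions.

// Quote a field if it contains a comma, double quote or newline (RFC 4180 style).
function csvField(value) {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Extraction ratio = URLs extracted / total URLs reported by GSC for that report.
function extractionRatio(extracted, total) {
  return total === 0 ? 1 : extracted / total; // an empty report has nothing missing
}

function buildSummaryCsv(reports) {
  const header = ['report', 'extracted', 'total', 'ratio'];
  const rows = reports.map(({ report, extracted, total }) => [
    report,
    extracted,
    total,
    extractionRatio(extracted, total).toFixed(2),
  ]);
  return [header, ...rows].map((row) => row.map(csvField).join(',')).join('\n');
}

// Illustrative numbers only, not real Search Console data.
const csv = buildSummaryCsv([
  { report: 'Crawled - currently not indexed', extracted: 1000, total: 2500 },
  { report: 'Submitted and indexed', extracted: 320, total: 320 },
]);
console.log(csv);
// report,extracted,total,ratio
// Crawled - currently not indexed,1000,2500,0.40
// Submitted and indexed,320,320,1.00
```

Report names are quoted only when needed, so a subcategory name containing a comma would not shift the columns.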

I believe this extra data point is useful because GSC has an export limit of 1,000 rows per report, so the "extraction ratio" gives you an accurate idea of how many URLs you have been able to extract from each report and how many are still missing.
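Because of that 1,000-row cap, the reports worth a closer look are the ones where fewer URLs were extracted than GSC reports. A small hypothetical helper like the one below could list them, largest gap first; `findIncompleteReports`, its field names and the sample numbers are illustrative, not taken from the tool.

```javascript
// Hypothetical helper: list reports where fewer URLs were extracted than GSC reports.
const EXPORT_LIMIT = 1000; // per-report export limit in GSC

function findIncompleteReports(reports, exportLimit = EXPORT_LIMIT) {
  return reports
    .filter(({ extracted, total }) => extracted < total)
    .map(({ report, extracted, total }) => ({
      report,
      missing: total - extracted,
      // true when the extraction reached the export limit, the likely cause of the gap
      hitExportLimit: extracted >= exportLimit,
    }))
    .sort((a, b) => b.missing - a.missing); // largest gaps first
}

// Illustrative numbers only.
const incomplete = findIncompleteReports([
  { report: 'Crawled - currently not indexed', extracted: 1000, total: 2500 },
  { report: 'Submitted and indexed', extracted: 320, total: 320 },
  { report: 'Duplicate without user-selected canonical', extracted: 40, total: 90 },
]);
console.table(incomplete);
```

Reports that did not hit the limit but still show a gap are the ones to investigate, since the row cap does not explain them.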

## Future updates

There are a few features that I think would be really nice to have in future releases. For example, the script could extract the "update" date and run as a Google Cloud Function that stores the data in BigQuery only when that date differs from the previously stored one.
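The core of that idea is a date comparison before each write. A minimal sketch of that guard, assuming the date has already been normalised to an ISO `YYYY-MM-DD` string; `shouldStore` and `saveToBigQuery` are hypothetical names, and scraping the date and writing to BigQuery are left out.

```javascript
// Hypothetical sketch of the "store only when the update date changed" guard.
// Assumes dates are normalised to ISO "YYYY-MM-DD" strings before comparison.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function shouldStore(latestUpdate, lastStoredUpdate) {
  if (!ISO_DATE.test(latestUpdate)) {
    throw new Error(`Unexpected date format: ${latestUpdate}`);
  }
  // Store when nothing has been saved yet or the report date differs.
  return lastStoredUpdate == null || latestUpdate !== lastStoredUpdate;
}

// Example (saveToBigQuery is a hypothetical helper, not a real API):
//   if (shouldStore(scrapedDate, previousDate)) await saveToBigQuery(rows);
console.log(shouldStore('2021-03-15', '2021-03-15')); // false
console.log(shouldStore('2021-03-16', '2021-03-15')); // true
```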

I know other SEOs are already doing this kind of extraction with Google Sheets, which is very cool, so I might give that a try. In the meantime, I hope you find this script useful, and if you have any thoughts, [hit me up on Twitter](https://twitter.com/jlhernando).

