Getting Unreachable hosts error when trying to scrape data #576

@beeena

I'm trying to scrape data using the following command.

docker run -it --env-file=./config/development/dev.env -e "CONFIG=$(cat ./config/config.json | jq -r tostring)" algolia/docsearch-scraper

I have verified that the API key and App ID are correct, but the run still fails with an "Unreachable hosts" error.

Error

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 119, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 45, in run_config
    config.query_rules
  File "/root/src/algolia_helper.py", line 21, in __init__
    self.index_name_tmp
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/search_client.py", line 127, in copy_rules
    return self.copy_index(src_index_name, dst_index_name, request_options)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/search_client.py", line 94, in copy_index
    request_options,
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 35, in write
    return self.request(verb, hosts, path, data, request_options, timeout)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 72, in request
    return self.retry(hosts, request, relative_url)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 94, in retry
    raise AlgoliaUnreachableHostException("Unreachable hosts")
algoliasearch.exceptions.AlgoliaUnreachableHostException: Unreachable hosts
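
One thing that may be worth ruling out: `docker run --env-file` passes values verbatim, so quotes or stray whitespace in the env file become part of the value, and the Algolia client builds its hostnames from the application ID. The sketch below is only a diagnostic aid, not part of the scraper. It assumes the env file uses the variable names `APPLICATION_ID` and `API_KEY`, that both values are plain alphanumeric strings, and that the hosts follow Algolia's documented `<app-id>.algolia.net` pattern. The helper names are hypothetical.

```python
import re


def parse_env_lines(lines):
    """Parse KEY=VALUE lines without stripping quotes (as --env-file does).

    Simplification: lines without '=' are skipped here.
    """
    env = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value
    return env


def find_problems(env, required=("APPLICATION_ID", "API_KEY")):
    """Return human-readable problems for the assumed required variables."""
    problems = []
    for name in required:
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is missing")
        elif not value:
            problems.append(f"{name} is empty")
        elif not re.fullmatch(r"[A-Za-z0-9]+", value):
            # Assumption: valid values contain only letters and digits.
            problems.append(
                f"{name} contains unexpected characters (quotes or spaces?): {value!r}"
            )
    return problems


def candidate_hosts(app_id):
    """Hostnames derived from the application ID (assumed host pattern)."""
    return [
        f"{app_id}.algolia.net",
        f"{app_id}-dsn.algolia.net",
        f"{app_id}-1.algolianet.com",
        f"{app_id}-2.algolianet.com",
        f"{app_id}-3.algolianet.com",
    ]


if __name__ == "__main__":
    # Path taken from the command above.
    with open("./config/development/dev.env") as fh:
        env = parse_env_lines(fh)
    for problem in find_problems(env):
        print("problem:", problem)
    for host in candidate_hosts(env.get("APPLICATION_ID", "")):
        print("host to test:", host)
```

If this reports a quoted or padded value, the derived hostnames will not resolve, which would be consistent with the exception above. If the values look clean, the printed hosts can be checked from inside the container to see whether the problem is DNS or network access.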

config.json


{
    "index_name": "dev_RESORTIFI_HELP",
    "start_urls": [
      "https://help.resortifi.com/"
    ],
    "sitemap_urls": [
      "https://help.resortifi.com/sitemap.xml"
    ],
    "sitemap_alternate_links": true,
    "stop_urls": [
      "/tests"
    ],
    "selectors": {
      "lvl0": {
        "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
        "type": "xpath",
        "global": true,
        "default_value": "Documentation"
      },
      "lvl1": "header h1",
      "lvl2": "article h2",
      "lvl3": "article h3",
      "lvl4": "article h4",
      "lvl5": "article h5, article td:first-child",
      "lvl6": "article h6",
      "text": "article p, article li, article td:last-child"
    },
    "strip_chars": " .,;:#",
    "custom_settings": {
      "separatorsToIndex": "_",
      "attributesForFaceting": [
        "language",
        "version",
        "type",
        "docusaurus_tag"
      ],
      "attributesToRetrieve": [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type"
      ]
    },
    "conversation_id": [
      "833762294"
    ],
    "nb_hits": 46250
  }
