Hey, I am trying to run the scraper against my local setup. When the config points at the live URL, the scraper indexes my documentation, but when I change it to a local URL nothing gets scraped. I also exposed the local port through an ngrok URL and it still doesn't scrape or index anything, while the production (live) URL works fine (apart from sometimes returning the same result more than once in the search hits).
Against the local/ngrok URL it only indexes the plain HTML content plus some script and CSS, but not the content of my documentation, which is built by JavaScript.
Here is my config file:
```json
{
  "index_name": "payment-page",
  "js_render": true,
  "js_wait": 10,
  "use_anchors": false,
  "user_agent": "Custom Bot",
  "start_urls": [
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/overview/pre-requisites",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/session",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/order-status-api",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/getting-sdk",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/initiating-sdk",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/processing-sdk",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/handle-payment-response",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/base-sdk-integration/life-cycle-events",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/resources/error-codes",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/resources/transaction-status",
    "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/resources/sample-payloads"
  ],
  "sitemap_alternate_links": false,
  "selectors": {
    "lvl0": "h1, .screen2 h2, .heading-text",
    "lvl1": "h3, .label",
    "lvl2": ".key-header, .step-card-header-text, .th-row",
    "text": ".screen2 p:not(:empty), .hero-welcome, .screen2 li, .main-screen, .only-steps p:not(:empty), td"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ],
    "synonyms": [
      [
        "js",
        "javascript"
      ],
      [
        "es6",
        "ECMAScript6",
        "ECMAScript2015"
      ]
    ]
  }
}
```
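The selectors above only match content that exists after the JavaScript has run, which is why js_render and js_wait are set. A quick way to confirm that the ngrok URL really serves only a JS shell is to fetch the raw HTML and look for one of the configured "text" selectors. This is a minimal sketch, assuming requests and BeautifulSoup are installed; it reuses the first start URL from the config:

```python
# Sketch: check whether the documentation text already exists in the raw HTML
# (what a crawl without JS rendering would see) for one of the start_urls.
import requests
from bs4 import BeautifulSoup

url = "https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/overview/pre-requisites"
html = requests.get(url, headers={"User-Agent": "Custom Bot"}, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# ".screen2 p" is one of the "text" selectors from the config above.
paragraphs = [p.get_text(strip=True) for p in soup.select(".screen2 p")
              if p.get_text(strip=True)]
print("non-empty .screen2 p elements in raw HTML:", len(paragraphs))
# A count of 0 means the content only appears after JavaScript runs, so the
# scraper has to render the page before these selectors can match anything.
```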
I used the following command to run the scraper:

```sh
docker run -it --env-file=/my/clone/scraper/located/path/.env -e "CONFIG=$(cat config.json | jq -r tostring)" d2ebdc22bee2a9f6513e68457d9a3825850f325449a225bc6cde1a1f7339e1e4
```
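Since the whole file is inlined into the CONFIG variable via `jq -r tostring`, one thing worth ruling out is a config that fails to parse before it ever reaches the container. A minimal sketch, assuming the file is named config.json in the current directory:

```python
# Sketch: confirm config.json parses as JSON and that the start_urls point
# at the ngrok host before it is handed to the container via CONFIG.
import json

with open("config.json") as f:
    config = json.load(f)  # raises json.JSONDecodeError if the file is malformed

print("index_name:", config["index_name"])
print("js_render:", config["js_render"], "js_wait:", config["js_wait"])
for url in config["start_urls"]:
    print("start_url:", url)
```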
My changes to browser_handler.py (I had to make these changes to render the JS content of my documentation; I was hitting the same issue with the live URL before this):
```python
import re
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from ..custom_downloader_middleware import CustomDownloaderMiddleware
from ..js_executor import JsExecutor


class BrowserHandler:
    @staticmethod
    def conf_need_browser(config_original_content, js_render):
        # A browser is needed when the start_urls contain named regex groups
        # or when js_render is enabled
        group_regex = re.compile(r'\(\?P<(.+?)>.+?\)')
        results = re.findall(group_regex, config_original_content)
        return len(results) > 0 or js_render

    @staticmethod
    def init(config_original_content, js_render, user_agent):
        driver = None
        if BrowserHandler.conf_need_browser(config_original_content,
                                            js_render):
            chrome_options = Options()
            chrome_options.add_argument('--no-sandbox')
            chrome_options.add_argument('--headless')
            chrome_options.add_argument('user-agent={0}'.format(user_agent))
            chrome_options.add_argument('--disable-dev-shm-usage')
            # CHROMEDRIVER_PATH = os.environ.get('CHROMEDRIVER_PATH',
            #                                    "/usr/bin/chromedriver")
            # if not os.path.isfile(CHROMEDRIVER_PATH):
            #     raise Exception(
            #         "Env CHROMEDRIVER_PATH='{}' is not a path to a file".format(
            #             CHROMEDRIVER_PATH))
            driver = webdriver.Remote(
                command_executor='http://host.docker.internal:4444',
                options=chrome_options)
            CustomDownloaderMiddleware.driver = driver
            JsExecutor.driver = driver
        return driver

    @staticmethod
    def destroy(driver):
        # Quit the browser if one was started
        if driver is not None:
            driver.quit()
            driver = None
        return driver
```
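To separate a rendering problem from a selector problem, the same remote Chrome endpoint the handler connects to can be exercised directly: load one start URL, wait roughly the js_wait value, and check whether the "text" selector matches anything. A minimal sketch, assuming a Selenium 4 standalone Chrome is listening on port 4444 (use http://localhost:4444 when running this on the host, http://host.docker.internal:4444 from inside a container):

```python
# Sketch: render one start_url through the remote Chrome and check whether
# the JS-built content is present in the DOM after waiting.
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("user-agent=Custom Bot")

driver = webdriver.Remote(command_executor="http://localhost:4444",
                          options=options)
try:
    driver.get("https://a65e-103-159-11-202.in.ngrok.io/payment-page/android/overview/pre-requisites")
    time.sleep(10)  # roughly the js_wait value from the config
    matches = driver.find_elements(By.CSS_SELECTOR, ".screen2 p")
    texts = [e.text for e in matches if e.text.strip()]
    print("rendered .screen2 p elements with text:", len(texts))
    # 0 here points at the page not rendering (e.g. the Selenium container
    # cannot reach the ngrok/local URL); a non-zero count points at the
    # selectors or at how the records are built instead.
finally:
    driver.quit()
```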
