parse_wikipedia_tables.py - running into AssertionError #5

@tobiasfunke1

Hi, here is another bug I stumbled across.

When running parse_wikipedia_tables.py I get the following output:

$ ./parse_wikipedia_tables.py -j -p
[+] parsed 249 ISO-3166 country codes
> duplicate entry for MCC 362: Caribbean Netherlands // Curaçao
> duplicate entry for MCC 647: French Indian Ocean Territories // French Indian Ocean Territories
> duplicate entry for MCC 234: Guernsey // Isle of Man
> duplicate entry for MCC 234: Guernsey // Jersey
> duplicate entry for MCC 340: Guadeloupe // Martinique
> duplicate entry for MCC 505: Australia // Norfolk Island
> duplicate entry for MCC 310: Guam // Northern Mariana Islands
> duplicate entry for MCC 425: Israel // State of Palestine
> duplicate entry for MCC 340: Guadeloupe // Saint Barthélemy
> duplicate entry for MCC 340: Guadeloupe // Collectivity of Saint Martin
> duplicate entry for MCC 362: Caribbean Netherlands // Sint Maarten
> duplicate entry for MCC 234: Guernsey // United Kingdom
> duplicate entry for MCC 310: Guam // United States of America
> duplicate entry for MCC 311: Guam // United States of America
[+] parsed 252 MCC entries (238 unique MCC) for 238 ISO-3166 country codes
parse_table_mnc_all: unable to download and / or parse Wikipedia HTML tables ; exception: Exception('unable to find headline title for MNC country name')

I fixed that in PR #4, but now the script fails with a different error:

...
[+] parsed 5 MNC entries for MCC 716
[+] parsed 4 MNC entries for MCC 746
[+] parsed 6 MNC entries for MCC 748
[+] parsed 6 MNC entries for MCC 734
[+] 3325 MCC MNC entries for 3277 unique MCC MNC
parse_table_msisdn_pref: unable to download and / or parse Wikipedia HTML tables ; exception: AssertionError()

This time it seems to fail while parsing https://en.wikipedia.org/wiki/List_of_country_calling_codes: the page no longer has a Summary section.
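A bare AssertionError() carries no message, so it is hard to tell which lookup failed. Below is a rough sketch of how a section lookup could report the missing heading and list the headings the page actually has. It is an illustration under assumptions, not the script's code: it uses only the standard library instead of lxml, and `HeadingCollector` and `require_heading` are hypothetical names that do not come from parse_wikipedia_tables.py.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h2>/<h3> heading in an HTML page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buf = None  # list of text chunks while inside a heading, else None

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._buf is not None:
            self.headings.append("".join(self._buf).strip())
            self._buf = None


def require_heading(html, wanted):
    """Return `wanted` if the page has that heading, else raise a descriptive error."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    if wanted not in parser.headings:
        # Name the missing heading and show what was found instead,
        # rather than failing with an empty AssertionError().
        raise LookupError(
            "heading %r not found; page has: %r" % (wanted, parser.headings)
        )
    return wanted
```

Calling `require_heading(page_html, "Summary")` on the current page would then raise an error that names the missing section and lists the headings found, which should make a renamed or removed section easier to spot.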

System info

OS: Ubuntu 24.04.2 LTS
Python: 3.12.3
lxml: 5.4.0
pdftotext: 24.02.0
