Hi, here is another bug I stumbled across.
When running parse_wikipedia_tables.py I get the following output:
$ ./parse_wikipedia_tables.py -j -p
[+] parsed 249 ISO-3166 country codes
> duplicate entry for MCC 362: Caribbean Netherlands // Curaçao
> duplicate entry for MCC 647: French Indian Ocean Territories // French Indian Ocean Territories
> duplicate entry for MCC 234: Guernsey // Isle of Man
> duplicate entry for MCC 234: Guernsey // Jersey
> duplicate entry for MCC 340: Guadeloupe // Martinique
> duplicate entry for MCC 505: Australia // Norfolk Island
> duplicate entry for MCC 310: Guam // Northern Mariana Islands
> duplicate entry for MCC 425: Israel // State of Palestine
> duplicate entry for MCC 340: Guadeloupe // Saint Barthélemy
> duplicate entry for MCC 340: Guadeloupe // Collectivity of Saint Martin
> duplicate entry for MCC 362: Caribbean Netherlands // Sint Maarten
> duplicate entry for MCC 234: Guernsey // United Kingdom
> duplicate entry for MCC 310: Guam // United States of America
> duplicate entry for MCC 311: Guam // United States of America
[+] parsed 252 MCC entries (238 unique MCC) for 238 ISO-3166 country codes
parse_table_mnc_all: unable to download and / or parse Wikipedia HTML tables ; exception: Exception('unable to find headline title for MNC country name')

I fixed that in a PR (#4), but now the error looks like this:
...
[+] parsed 5 MNC entries for MCC 716
[+] parsed 4 MNC entries for MCC 746
[+] parsed 6 MNC entries for MCC 748
[+] parsed 6 MNC entries for MCC 734
[+] 3325 MCC MNC entries for 3277 unique MCC MNC
parse_table_msisdn_pref: unable to download and / or parse Wikipedia HTML tables ; exception: AssertionError()

This time it seems to fail while parsing https://en.wikipedia.org/wiki/List_of_country_calling_codes, which no longer has a Summary section.
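A bare `AssertionError()` gives no hint about which check failed. As a rough sketch only (this is not the script's actual code; `HeadingCollector` and `require_heading` are hypothetical names, and the standard-library `html.parser` stands in for lxml just to keep the example self-contained), the section lookup could report which headings it actually found:

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h2>/<h3> heading in an HTML page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buf = None  # text chunks of the heading being read, else None

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._buf is not None:
            self.headings.append("".join(self._buf).strip())
            self._buf = None


def require_heading(html, wanted):
    """Return the index of heading `wanted`, or raise a descriptive error."""
    collector = HeadingCollector()
    collector.feed(html)
    if wanted not in collector.headings:
        # Unlike a bare assert, this names the missing section and lists
        # the headings that were found, which makes page changes obvious.
        raise ValueError(
            f"section {wanted!r} not found; headings present: {collector.headings}"
        )
    return collector.headings.index(wanted)
```

With a check like this, a renamed or removed section would show up in the error message itself instead of as an empty `AssertionError()`.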
System info
OS: Ubuntu 24.04.2 LTS
Python: 3.12.3
lxml: 5.4.0
pdftotext: 24.02.0