This repository was archived by the owner on Sep 14, 2018. It is now read-only.

xml.etree.ElementTree can't read some XMLs #1300

@NValerij

Description


Hello.
There are three issues when parsing this file (test.xml).
Code to reproduce:

import xml.etree.ElementTree as ET
ET.parse('test.xml')
  • The BOM is not recognized: xmllib.Error: Syntax error at line 1: illegal data at start of file.
    OK, I can work around it with ET.parse(codecs.open(r'D:\NLC\LexicalSpanAnnotator\TestData\test.xml', 'r', encoding = 'utf-8'))
  • The character with code 8233 (U+2029 PARAGRAPH SEPARATOR) breaks parsing: xmllib.Error: Syntax error at line 3: illegal character in content.
    I can also work around this one (load the text and replace the character with its numeric character reference), but that is not a good idea in general.
  • The file has no trailing newline, and the resulting error message is very confusing: xmllib.Error: Syntax error at line 4: data not in content
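The three workarounds above can be combined into a single pre-processing step. Below is a minimal sketch, assuming the input is UTF-8 and that U+2029 appears only in text or attribute values: the hypothetical helper `normalize_xml` replaces it everywhere, including comments and CDATA sections, where a character reference is not expanded. It is written against CPython's `xml.etree.ElementTree` API; whether it avoids all three errors in the xmllib-based parser from this report is untested.

```python
import xml.etree.ElementTree as ET


def normalize_xml(data: bytes) -> bytes:
    """Hypothetical helper: apply the three workarounds before parsing."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    # Replace U+2029 (PARAGRAPH SEPARATOR, decimal 8233) with a numeric
    # character reference; a conforming parser expands it back.
    text = text.replace("\u2029", "&#8233;")
    # Make sure the document ends with a newline (whitespace after the
    # root element is allowed by XML).
    if not text.endswith("\n"):
        text += "\n"
    return text.encode("utf-8")


def parse_file(path: str) -> ET.ElementTree:
    """Hypothetical replacement for ET.parse(path) using normalize_xml."""
    with open(path, "rb") as f:
        data = normalize_xml(f.read())
    return ET.ElementTree(ET.fromstring(data))
```

Usage would be `tree = parse_file('test.xml')` in place of `ET.parse('test.xml')`. Reading raw bytes and decoding explicitly keeps BOM handling independent of how `codecs.open` treats it on a given implementation.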

I've checked this file with the ElementTree parser from Python 3.4 (sorry, I don't have 2.7 installed) and with the MSXML parser. Both parsed it without problems.
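Since the original test.xml is not attached here, the comparison can be approximated with a synthetic document that has the same three properties: a UTF-8 BOM, a U+2029 character in text content, and no trailing newline. This is a minimal sketch assuming CPython 3; the sample content and the `parse_sample` helper are made up for illustration and only mimic the features described in the report.

```python
import io
import xml.etree.ElementTree as ET

# Synthetic stand-in for test.xml (the real file is not available):
# UTF-8 BOM + XML declaration, U+2029 inside text, no final newline.
SAMPLE = (
    "\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
    "<doc>\n"
    "  <p>first\u2029second</p>\n"
    "</doc>"
).encode("utf-8")


def parse_sample(data: bytes = SAMPLE) -> ET.Element:
    # ET.parse accepts a file-like object, mirroring ET.parse('test.xml').
    return ET.parse(io.BytesIO(data)).getroot()


if __name__ == "__main__":
    root = parse_sample()
    print(repr(root.find("p").text))
```

Running the same `SAMPLE` bytes through the implementation under test (for example, written to disk and passed to `ET.parse`) should show whether it reproduces the three errors listed above.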

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
