SAXParser: Add support for LexicalHandler.startEntity() and .endEntity() #209

@philippn

Hi there,

I've been looking at migrating some of our code from Apache Xerces to Woodstox. So far it works great (thanks for the great work!), but there is one issue that I stumbled upon.

Suppose you have an entity reference somewhere in your XML (please bear with me for the silly example):

    <meta>
        <prodname>eyePhone&copyright; 2.0</prodname>
        <company>MomCorp</company>
        <prodtype>mobile phone</prodtype>
        <source>http://theinfosphere.org/EyePhone</source>
    </meta>

The referenced entity is correctly supplied in the DTD.

Now when I parse this file with Xerces and let the receiving handler print out the events it receives, I end up with this:

    [main] INFO LoggingXMLFilter - Start element: meta meta
    [main] INFO LoggingXMLFilter - Start element: prodname prodname
    [main] INFO LoggingXMLFilter - Characters: eyePhone
    [main] INFO LoggingXMLFilter - Start entity: copyright
    [main] INFO LoggingXMLFilter - Characters: ©
    [main] INFO LoggingXMLFilter - End entity: copyright
    [main] INFO LoggingXMLFilter - Characters: 2.0
    [main] INFO LoggingXMLFilter - End element: prodname prodname

In contrast, Woodstox by default produces this:

    [main] INFO LoggingXMLFilter - Start element: meta meta
    [main] INFO LoggingXMLFilter - Start element: prodname prodname
    [main] INFO LoggingXMLFilter - Characters: eyePhone© 2.0
    [main] INFO LoggingXMLFilter - End element: prodname prodname

I have already figured out that when I call doReplaceEntityRefs(false) on the factory, the result changes to:

    [main] INFO LoggingXMLFilter - Start element: meta meta
    [main] INFO LoggingXMLFilter - Start element: prodname prodname
    [main] INFO LoggingXMLFilter - Characters: eyePhone
    [main] INFO LoggingXMLFilter - Skipped entity: copyright
    [main] INFO LoggingXMLFilter - Characters: 2.0
    [main] INFO LoggingXMLFilter - End element: prodname prodname

But given that I have supplied a LexicalHandler, I would expect the same call sequence as with Xerces: a startEntity() call, the expanded characters, and then an endEntity() call.
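
For reference, here is a minimal sketch of a handler that records the events in question. It is not the original sample code: the names EntityEventDemo and RecordingHandler and the inline DTD (which declares copyright as &#169;) are assumptions made for illustration. It uses only the JDK's javax.xml.parsers API, so it runs against whichever SAX implementation SAXParserFactory.newInstance() selects. With the JDK's bundled Xerces-derived parser this should give a Start entity / End entity pair around the expanded text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class EntityEventDemo {

    // Assumed test document: the DTD below is illustrative, not the original one.
    static final String XML =
        "<!DOCTYPE meta [<!ENTITY copyright \"&#169;\">]>"
        + "<meta><prodname>eyePhone&copyright; 2.0</prodname></meta>";

    // DefaultHandler2 implements both ContentHandler and LexicalHandler.
    static final class RecordingHandler extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("Start element: " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("End element: " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            events.add("Characters: " + new String(ch, start, length));
        }

        @Override
        public void skippedEntity(String name) {
            events.add("Skipped entity: " + name);
        }

        // LexicalHandler callbacks: these are the calls this issue is about.
        @Override
        public void startEntity(String name) {
            events.add("Start entity: " + name);
        }

        @Override
        public void endEntity(String name) {
            events.add("End entity: " + name);
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        RecordingHandler handler = new RecordingHandler();
        reader.setContentHandler(handler);
        // Standard SAX2 property for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);

        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        parse(XML).forEach(System.out::println);
    }
}
```

Running the same handler against a Woodstox-backed SAXParserFactory would be one way to compare the two event sequences side by side.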

I have looked into this further, and my solution for it would be something along these lines: master...philippn:woodstox:feature/start-end-entity

If you find this sensible, I would be happy to provide a pull request. I can also provide the silly sample code; just let me know and I will upload it as well :-)

Thanks in advance and kind regards,
Philipp
