Strange lines in eng.tagged corpus #20

@AMR-KELEG

I am currently using the texts/eng.tagged file for testing the new weighting algorithms.
While using the file, I noticed that some lines contain nothing but a single double quotation mark (")!
(Example: https://github.com/apertium/apertium-eng/blob/master/texts/eng.tagged#L823)

^the/the<det><def><sp>$
"
^golden/golden<adj>$
^axe/axe<n><sg>$
"
^competition/competition<n><sg>$

Should these lines be fixed?
I would rather not handle them in my script if they are a bug in the tagged corpus, and I believe fixing them is a simple find-and-replace that any text editor can do.
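
To show how the affected lines could be located before deciding on a fix, here is a minimal sketch in Python. It assumes that every well-formed line has the shape `^surface/analysis$`, as in the excerpt above. The names `TOKEN_LINE` and `find_malformed_lines` are hypothetical and not part of any Apertium tooling. The sketch only reports the lines; it does not guess what the correct replacement should be.

```python
import re

# Assumption: a well-formed line looks like ^surface/analysis$,
# e.g. ^axe/axe<n><sg>$ (taken from the excerpt in this issue).
TOKEN_LINE = re.compile(r"\^.+/.+\$")


def find_malformed_lines(lines):
    """Return (line_number, text) pairs for non-empty lines
    that do not match the assumed ^surface/analysis$ shape."""
    malformed = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        # Skip blank lines; flag everything else that is not a token line.
        if text.strip() and not TOKEN_LINE.fullmatch(text):
            malformed.append((number, text))
    return malformed


# The excerpt from this issue, used as sample input.
sample = [
    "^the/the<det><def><sp>$",
    '"',
    "^golden/golden<adj>$",
    "^axe/axe<n><sg>$",
    '"',
    "^competition/competition<n><sg>$",
]

for number, text in find_malformed_lines(sample):
    print(f"line {number}: {text!r}")
```

On the sample, this flags lines 2 and 5, the two bare quotation marks. To check the real corpus, the same function could be called on the lines of an opened `texts/eng.tagged` file.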
