
@ejdweck ejdweck commented Nov 4, 2018

I was using the sentiment library and noticed that when I ran analysis on headlines containing single-quoted words, those words were not tokenized properly.

For example, for the news headline from cnn.com that reads:

Abrams: Trump is 'wrong,' I am qualified to be Georgia's governor

Here, 'wrong' should be tokenized as wrong, without the surrounding single quotes.

In its current state, the library strips double quotes during tokenization but not single quotes. My guess is that this is to preserve apostrophes: if you add a ' to the .replace regex, all single quotes would be removed, including the apostrophe in words like Georgia's.

Here is some code to reproduce the error:

const Sentiment = require('sentiment');
const sentiment = new Sentiment();

// The same headline with no quotes, single quotes, and double quotes around "wrong".
const noQuotes = "Abrams: Trump is wrong, I am qualified to be Georgia's governor";
const singleQuotes = "Abrams: Trump is 'wrong', I am qualified to be Georgia's governor";
const doubleQuotes = "Abrams: Trump is \"wrong,\" I am qualified to be Georgia's governor";

const noQuotesResult = sentiment.analyze(noQuotes);
const doubleQuotesResult = sentiment.analyze(doubleQuotes);
const singleQuotesResult = sentiment.analyze(singleQuotes);

// Logged in this order: no quotes, double quotes, single quotes.
console.log(noQuotesResult);
console.log(doubleQuotesResult);
console.log(singleQuotesResult);

Output:
{ score: -2,
  comparative: -0.18181818181818182,
  tokens:
   [ 'abrams',
     'trump',
     'is',
     'wrong',
     'i',
     'am',
     'qualified',
     'to',
     'be',
     'georgia\'s',
     'governor' ],
  words: [ 'wrong' ],
  positive: [],
  negative: [ 'wrong' ] }
{ score: -2,
  comparative: -0.18181818181818182,
  tokens:
   [ 'abrams',
     'trump',
     'is',
     'wrong',
     'i',
     'am',
     'qualified',
     'to',
     'be',
     'georgia\'s',
     'governor' ],
  words: [ 'wrong' ],
  positive: [],
  negative: [ 'wrong' ] }
{ score: 0,
  comparative: 0,
  tokens:
   [ 'abrams',
     'trump',
     'is',
     '\'wrong\'',
     'i',
     'am',
     'qualified',
     'to',
     'be',
     'georgia\'s',
     'governor' ],
  words: [],
  positive: [],
  negative: [] }
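
To illustrate the difference between removing every single quote and removing only the ones that wrap a word, here is a minimal sketch. It is not the sentiment library's actual tokenizer and not necessarily the change made in this pull request; `tokenizeSketch` is a hypothetical helper, and the ASCII-only character class is an assumption made to keep the example short.

```javascript
// Hypothetical helper for illustration only; not part of the sentiment library.
function tokenizeSketch(input) {
  return input
    .toLowerCase()
    // Assumption: keep only ASCII letters, digits, apostrophes, and whitespace.
    .replace(/[^a-z0-9'\s]/g, ' ')
    .split(/\s+/)
    // Strip single quotes only at the start or end of a token,
    // so an apostrophe inside a word (georgia's) is left alone.
    .map(function (token) { return token.replace(/^'+|'+$/g, ''); })
    // Drop tokens that became empty (for example a lone quote left after "wrong,").
    .filter(function (token) { return token.length > 0; });
}

console.log(tokenizeSketch("Abrams: Trump is 'wrong,' I am qualified to be Georgia's governor"));
```

With this approach the quoted word becomes the token wrong while georgia's keeps its apostrophe. One trade-off: a trailing possessive apostrophe, as in writers', would also be stripped.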

…strophes + 3 unit tests to verify code works as expected.