
Conversation

@sanketverma1704

Removed the NLPHandler class and added sentiment analysis functionality to the MLHandler class.

To set up a Gramex service for performing sentiment analysis, use the following configuration:

url:
  sentiment-analysis:
    pattern: /$YAMLURL/
    handler: MLHandler
    kwargs:
      backend: transformers
      task: sentiment-analysis
      xsrf_cookies: false

Getting predictions

GET sentiments of short pieces of text as follows:

curl -X GET --data-urlencode "text=This movie is so bad, it's good." http://localhost:9988/

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  }
]
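
The same request can be made programmatically. Below is a minimal sketch using the Python requests library; it assumes the service configured above is running on port 9988 and that the handler accepts the text as a query-string argument.

import requests

# Query the sentiment-analysis endpoint configured above.
resp = requests.get(
    "http://localhost:9988/",
    params={"text": "This movie is so bad, it's good."},
)
print(resp.json())  # e.g. [{"label": "POSITIVE", "score": 0.9997...}]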

Files containing text to be classified can also be POSTed to the endpoint, with _action=predict. Any file supported by gramex.cache.open will work. (Download a sample here.)

curl -X POST -F "file=@sentiment.json" http://localhost:9988/?_action=predict

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  },
  {
    "label": "NEGATIVE",
    "score": 0.9974692463874817
  },
  // etc.
]
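
The same upload can be done from Python. The sketch below is illustrative only: the "text" field name inside the file is an assumption (mirroring the text/label fields used for scoring below), while the "file" form field matches the curl examples.

import json
import requests

# Build a small file of texts to classify. The "text" field name is assumed.
rows = [
    {"text": "This movie is so bad, it's good."},
    {"text": "An utterly forgettable film."},
]
with open("sentiment.json", "w") as f:
    json.dump(rows, f)

# POST the file with _action=predict, as in the curl example above.
with open("sentiment.json", "rb") as f:
    resp = requests.post(
        "http://localhost:9988/",
        params={"_action": "predict"},
        files={"file": f},
    )
print(resp.json())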

Measuring model performance

Files containing the text and label fields can be POSTed to the endpoint
with _action=score to get the ROC AUC score of the model against the dataset. (Download a sample dataset here.)

curl -X POST -F "file=@sentiment_score.json" http://localhost:9988/?_action=score

The output will be something like:

{
  "roc_auc": 0.9929
}
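
As a rough illustration of the expected file structure, each record carries a text field and a label field. The label values below are assumed for illustration and need not match the sample dataset.

# Illustrative records for a scoring (or training) dataset: one "text" and
# one "label" field per row.
rows = [
    {"text": "This movie is so bad, it's good.", "label": "POSITIVE"},
    {"text": "An utterly forgettable film.", "label": "NEGATIVE"},
]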

Training the model

The model can be trained on a dataset by setting _action=train, and POSTing the file.

curl -X POST -F "file=@sentiment_score.json" http://localhost:9988/?_action=train

The output will show the score of the trained model on the dataset:

{
  "roc_auc": 0.8
}

Multiple training options for the transformer are supported, including the number of epochs, batch size and weight decay. These can all be specified in the POST request as follows:

# Train for three epochs instead of the default 1
curl -X POST -F "file=@sentiment_score.json" 'http://localhost:9988/?_action=train&num_train_epochs=3'

The output is the score of the trained model on the dataset after 3 epochs:

{
  "roc_auc": 0.98
}
# Change the batch size to 32 instead of the default 16
curl -X POST -F "file=@sentiment_score.json" \
    'http://localhost:9988/?_action=train&per_device_train_batch_size=32&num_train_epochs=3'

The output is the score of the trained model on the dataset after 3 epochs and a batch size of 32:

{
  "roc_auc": 0.99
}

@sanand0
Contributor

sanand0 commented Jun 21, 2021

Cool! @jaidevd could you please review? Do let me know when to merge

sanand0 requested a review from jaidevd June 21, 2021 15:04
@jaidevd
Contributor

jaidevd commented Jun 26, 2021

@MSanKeys963 The target branch has to be gramener/gramex's master branch, not the jd-transformers branch.

@jaidevd
Contributor

jaidevd commented Jun 26, 2021

@MSanKeys963 other than these two changes, LGTM

jaidevd changed the base branch from jd-transformers to master June 30, 2021 09:54
@jaidevd
Contributor

jaidevd commented Jul 7, 2021

@MSanKeys963 this is still showing merge conflicts. Please take a look.

@sanketverma1704
Author

@jaidevd I've fixed all the issues mentioned above. Please let me know if there's anything else.

@jaidevd
Contributor

jaidevd commented Jul 19, 2021

Thanks, @MSanKeys963

@sanand0 This is ready for merge.

@sanketverma1704
Author

@sanand0 I've fixed all the issues. Please check.

@sanand0
Contributor

sanand0 commented Jul 30, 2021

@MSanKeys963

  • Can you get this to work, please? sentiment.zip
  • Gramex should still run if PyTorch & Huggingface are not installed

For example, this is how we optionally import ElasticSearch:

def gramexlog(conf):
    try:
        from elasticsearch import Elasticsearch, helpers
    except ImportError:
        app_log.error('gramexlog: elasticsearch missing. pip install elasticsearch')
        return
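
A guarded import for the transformers backend could follow the same pattern. The sketch below only illustrates the idea; the function name and log message are made up, not the actual MLHandler code.

def load_transformers():
    # Optional-import pattern applied to the transformers backend (sketch only).
    try:
        from transformers import pipeline
    except ImportError:
        app_log.error('MLHandler: transformers missing. pip install transformers torch')
        return None
    return pipeline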
