#OCRJava
Optical Character Recognition Service. This is the project for Innovation Day practice, and also an important asset of the Bluemix and Cognitive CoEs.
#Prerequisite
- Register your Bluemix account
- Create a `Text to Speech` service
- Install Bluemix and CF CLI
- Install Xcode (macOS only)
- Install Eclipse Java EE IDE for Web Developers as your IDE (Download)
- Set up Tomcat or `Websphere Application Server Liberty Profile` in Eclipse for debugging purposes. Drag and drop this link into Eclipse if you are installing `Websphere Application Server Liberty Profile`.
#Installation guide
Windows
- Install `tesseract`; download the Windows installer here (tesseract-ocr-setup-3.02.02.exe)
macOS
- Install HomeBrew
  `/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"`
- Install `tesseract`
  `brew install tesseract`
- Grant access to the folder by changing its owner, or grant 766, in case you don't have access
  `sudo chown -R $USER /usr/local`
- Install Xcode, because `tesseract` needs to be compiled, as it only provides the source code
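After installing, it can help to confirm that the `tesseract` executable is actually reachable before running the project. The following is a minimal sketch, not part of the project: the class name `TesseractLocator` and the method `findOnPath` are hypothetical, and it assumes the installer placed the binary in a directory listed in the `PATH` environment variable.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OCRJava: scans PATH entries for an executable file.
final class TesseractLocator {

    /** Returns the first regular file named {@code name} or {@code name.exe} found in the given PATH value. */
    static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String candidate : new String[] {name, name + ".exe"}) {
                try {
                    Path p = Paths.get(dir, candidate);
                    if (Files.isRegularFile(p)) {
                        return Optional.of(p);
                    }
                } catch (InvalidPathException ignored) {
                    // Skip PATH entries that are not valid paths on this platform.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> found = findOnPath("tesseract", System.getenv("PATH"));
        System.out.println(found.map(p -> "tesseract found at " + p)
                .orElse("tesseract not found on PATH"));
    }
}
```

If the sketch reports that `tesseract` is not found, the installation directory probably needs to be added to `PATH`, or the installation step should be repeated.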
Any platform
- Run the git command or download the source code here
  `git clone [email protected]:CognitiveBuild/OCRJava.git`
- When running the code in `Websphere Application Server Liberty Profile` as a web project, you need to download `jai_imageio-1.1.jar` and put it into the `/Library/Java/JavaVirtualMachines/jdk{version}.jdk/Contents/Home/jre/lib/ext` folder. OSGi may be the cause of the problem.
- Add `Text-to-Speech` credentials in the code file `/OCRJava/src/com/ibm/waston/WastonSpeechHelper.java`; obtain the credentials from your Bluemix account

      private static final String TEXT_TO_SPEECH_USERNAME = "your_username";
      private static final String TEXT_TO_SPEECH_PASSWORD = "your_password";

- Right click on the Chatbot project, choose `Run As` > `Run on Server` to open the OCR sample application
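For the credentials step above, one possible alternative to editing the constants in the source file is to read them from the environment, so that real credentials are not committed to the repository. This is only a sketch under assumptions: the class `SpeechCredentials` is hypothetical and is not how `WastonSpeechHelper.java` is written, and the environment variable names simply mirror the constant names from the README.

```java
import java.util.Map;

// Hypothetical holder for the Text to Speech credentials; not part of OCRJava.
final class SpeechCredentials {
    // Placeholder values from the README; they are not real credentials.
    static final String PLACEHOLDER_USERNAME = "your_username";
    static final String PLACEHOLDER_PASSWORD = "your_password";

    final String username;
    final String password;

    SpeechCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Reads the credentials from the given environment map, falling back to the placeholders. */
    static SpeechCredentials fromEnvironment(Map<String, String> env) {
        // Assumed variable names, chosen to match the constants in the README.
        String user = env.getOrDefault("TEXT_TO_SPEECH_USERNAME", PLACEHOLDER_USERNAME);
        String pass = env.getOrDefault("TEXT_TO_SPEECH_PASSWORD", PLACEHOLDER_PASSWORD);
        return new SpeechCredentials(user, pass);
    }

    /** True only when neither value is still a placeholder. */
    boolean isConfigured() {
        return !PLACEHOLDER_USERNAME.equals(username)
                && !PLACEHOLDER_PASSWORD.equals(password);
    }

    public static void main(String[] args) {
        SpeechCredentials credentials = fromEnvironment(System.getenv());
        System.out.println(credentials.isConfigured()
                ? "Text to Speech credentials are set"
                : "Text to Speech credentials are still placeholders");
    }
}
```

Checking `isConfigured()` at startup gives an early, readable message instead of an authentication failure at the first Text to Speech call.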
#How to use
- Click on the `Choose a file` button, then click on the `Recognize` button in Firefox or Google Chrome
#Dependencies
- Tesseract for Java
- Apache Commons IO
- FastJSON
- jai-imageio
- JNA
- Apache HTTP Client (Upload)
- Watson Java SDK
#Issues
- `jai_imageio-1.1.jar` can't be loaded from the project dependencies if the Java runtime is the Liberty Profile
- Some Chinese characters cannot be recognized well due to a font issue, so `tesseract` needs to be trained; please check the reference below (Chinese version)
  http://www.cnblogs.com/mjorcen/p/3800739.html
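To diagnose the first issue, a quick check of which image readers the running JVM has registered may be useful. The sketch below uses only the standard `javax.imageio` API; the class name `ImageReaderCheck` is hypothetical, and it assumes that the project relies on jai-imageio for extra ImageIO plugins such as a TIFF reader, which the README does not state explicitly. Note that newer JDKs (9 and later) include a TIFF reader of their own, so a positive result there does not prove that the jar was loaded.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic, not part of OCRJava: reports whether ImageIO has a reader for a format.
final class ImageReaderCheck {

    /** True if at least one ImageIO reader is registered for the given format name. */
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        for (String format : new String[] {"png", "jpeg", "tiff"}) {
            System.out.println(format + " reader available: " + hasReaderFor(format));
        }
    }
}
```

Running this inside the same runtime as the web application (for example from a servlet or a debug session) shows what that runtime sees, which can differ from what a standalone JVM reports.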
#License
Copyright 2016 GCG GBS CTO Office under the Apache 2.0 license.