Using Workflow to Perform OCR

Microsoft’s Cognitive Services (née Project Oxford) is an interesting collection of APIs that use machine learning to extract useful information from the data you send them. One useful example is the Computer Vision API. It offers OCR functionality that detects any readable text within an image and returns it in a form that is easy to turn into plain text. Thanks to Workflow’s new API support, we can harness the power of machine learning to perform some very quick, and very accurate, text recognition.

This workflow prompts you to either select a photo or take one with the camera, then sends it to the Computer Vision API in an OCR request. The detected text is returned and displayed as plain text that can be easily shared or copied to the clipboard.
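If you're curious what "displayed as plain text" involves, here is a rough Python sketch of the flattening step, written outside Workflow. It assumes the OCR response is JSON nested as regions, then lines, then words, which is how I understand the v1.0 OCR endpoint to respond; the function name `ocr_result_to_text` and the sample data are made up for illustration and are not real API output.

```python
def ocr_result_to_text(result: dict) -> str:
    """Flatten an OCR result (assumed shape: regions -> lines -> words) into plain text."""
    paragraphs = []
    for region in result.get("regions", []):
        lines = []
        for line in region.get("lines", []):
            # Each word is assumed to carry its recognized string under "text".
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
        paragraphs.append("\n".join(lines))
    # Separate regions with a blank line so they read like paragraphs.
    return "\n\n".join(paragraphs)


# Hand-written sample in the assumed response shape (not real API output).
sample = {
    "language": "en",
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Second"}, {"text": "line"}]},
            ]
        },
        {"lines": [{"words": [{"text": "Other"}, {"text": "region"}]}]},
    ],
}

print(ocr_result_to_text(sample))
```

Joining words with spaces and lines with newlines is a simplification; it ignores the bounding boxes the API also reports, so the original layout is only roughly preserved.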

OCR of a Notes screenshot

OCR of a photo taken of a page in a book

Cool, right? Well, it gets even better. The Computer Vision API can OCR images containing text in different languages, so Workflow can also translate the detected text into a language of our choice. The workflow includes a Translate Text action so you can try this for yourself.

After the OCR process, text can be translated within Workflow

This is really useful, especially when traveling. For instance, if you’re on vacation and are wondering what a street sign means, just run this workflow and take a photo, and you’ll get a translated version of the text.

As with other API services, an API key is needed to authenticate your requests. Just register for the service and copy and paste the subscription key for Computer Vision into the workflow. There are also some size limits (both file size and image resolution) you need to consider, so I recommend reading through the relevant documentation.
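For anyone who wants to see what the authenticated request looks like outside Workflow, here is a minimal Python sketch. Several things in it are assumptions to check against the documentation: the endpoint URL (it depends on your region and API version), the `Ocp-Apim-Subscription-Key` header name, the `language` and `detectOrientation` query parameters, and the 4 MB default for `max_bytes`. The function name `build_ocr_request` is just a label I picked for illustration.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; the real one depends on your region and API version.
OCR_URL = "https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr"


def build_ocr_request(
    image_bytes: bytes,
    subscription_key: str,
    max_bytes: int = 4 * 1024 * 1024,  # assumed file-size limit; confirm in the docs
    url: str = OCR_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the image and the API key."""
    if len(image_bytes) > max_bytes:
        raise ValueError(
            f"image is {len(image_bytes)} bytes, over the {max_bytes}-byte limit"
        )
    # "unk" asks the service to detect the language itself (assumed parameter value).
    query = urllib.parse.urlencode({"language": "unk", "detectOrientation": "true"})
    return urllib.request.Request(
        f"{url}?{query}",
        data=image_bytes,
        headers={
            # The subscription key authenticates the request.
            "Ocp-Apim-Subscription-Key": subscription_key,
            # The image is sent as raw bytes in the request body.
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

To actually send it, pass the returned request to `urllib.request.urlopen` and parse the response body with `json.load`. The sketch only checks file size; the resolution limit would need an image library to read the pixel dimensions.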