To stay competitive, businesses increasingly rely on data-driven approaches, and many hit the same roadblock: historically, their data was printed, and it is still trapped on paper.
Traditional transcription is expensive, time-consuming, and error-prone, so many organizations are turning to machine learning solutions to keep costs down and improve accuracy as they accelerate their digital transformations.
We’d like to help.
The Sparkfish team has been hard at work the past few weeks, and we’re excited to finally tell you about Augraphy, a library we use internally to generate training data for our machine learning OCR projects.
Plenty of other excellent image augmentation libraries exist, but most are focused on general image transformations, like adding a blur effect or compression artifacts.
Augraphy specializes in generating visually realistic images of text documents that show the kinds of defects commonly encountered in business. To name a few:
- the document wasn’t correctly aligned with the scanner bed, and uneven dark borders appeared on the copy,
- before copying, the document was folded, and a crease appeared in the output,
- the printer was running low on ink, and parts of the text are lighter than others,
- the printer laid down too much ink, and text on the reverse side is visible through the page,
- the print shop was dusty, and little flecks are visible in the ink.
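To give a rough feel for what a few of these defects amount to at the pixel level, here is a minimal Python sketch using only the standard library. It is not Augraphy's implementation or API: the function names (`low_ink_lines`, `bleed_through`, `dust_flecks`) and their parameters are hypothetical, and a page is assumed to be a plain list of rows of grayscale values from 0 (black ink) to 255 (blank paper).

```python
import random

WHITE = 255  # grayscale: 0 is black ink, 255 is blank paper


def low_ink_lines(page, fade=0.5, every=3):
    """Lighten every `every`-th row, as if the printer were running low on ink."""
    out = [row[:] for row in page]  # copy so the source page is left untouched
    for y in range(0, len(out), every):
        out[y] = [int(p + (WHITE - p) * fade) for p in out[y]]
    return out


def bleed_through(front, back, strength=0.25):
    """Darken the front page wherever the mirrored reverse side has ink."""
    out = []
    for front_row, back_row in zip(front, back):
        mirrored = back_row[::-1]  # the reverse side is seen flipped left to right
        out.append([max(0, int(f - (WHITE - b) * strength))
                    for f, b in zip(front_row, mirrored)])
    return out


def dust_flecks(page, count, rng):
    """Set up to `count` randomly chosen pixels to white, like dust in the ink."""
    out = [row[:] for row in page]
    height, width = len(out), len(out[0])
    for _ in range(count):
        out[rng.randrange(height)][rng.randrange(width)] = WHITE
    return out


# A tiny 6x8 "page": white paper with two rows of solid black "text".
page = [[WHITE] * 8 for _ in range(6)]
page[2] = [0] * 8
page[3] = [0] * 8

rng = random.Random(0)  # fixed seed so the flecks are reproducible
degraded = dust_flecks(low_ink_lines(page), count=5, rng=rng)
```

Each function returns a new page, so effects can be chained in any order. A real augmentation library works on full-resolution images and uses far more convincing models of paper and ink than this toy version.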
Here’s a visual example of the power of Augraphy: running the default Augraphy pipeline over the source image produces new images that look like the source printed on different paper stock, by different machines, with common printing problems.
First, the source image, a sample invoice letter from Apple Pages.
Here we have a print onto something like receipt or triplicate paper, with areas of low ink, lines that should be filled in, and fuzzy, lower-resolution text.
Augraphy can also “print” on entirely different surfaces, like this hemp-like texture.
If you want to leverage the power of Augraphy in your own business, you can point your engineers to the project on our GitHub page.
Need help? Experts on the Sparkfish team are ready to help you transform your business.