Have you ever ended up on one of those awful websites where the text is saved as an image, preventing you from selecting or copying it? Or maybe you want to share the text from a comic, screenshot, or meme without painstakingly transcribing it? Now, with the Project Naptha browser extension, you can use your mouse to select the rasterized text from any image that you find on the web, and then paste that text elsewhere. You can also select the text and translate it into another language, or attempt to remove the text entirely (much like Photoshop’s Content-Aware Fill function).
Project Naptha, created by developer Kevin “antimatter15” Kwok, is essentially a complete text detection and OCR (optical character recognition) suite crammed into a JavaScript browser extension. It isn’t perfect, and Kwok admits that it probably lags behind current state-of-the-art tools by “a few years,” but it is definitely good enough and, more importantly, we can’t really complain given that there’s nothing else like it on the market.
Project Naptha, despite its apparent simplicity, is a very complex piece of software. Before any OCR can be carried out, it has to work out whether, and where, an image contains blocks of text. This is quite hard, as the text could be layered on top of any number of backgrounds. To do this, Naptha uses Microsoft Research’s Stroke Width Transform (SWT), a fast and relatively simple algorithm that relies on the fact that the characters in a font usually have a fairly uniform stroke width, which makes them easy to pick out from the rest of the image. Naptha only begins this text detection phase when it detects that your mouse pointer is moving towards an image; it would be very expensive to perform SWT (and later OCR) on every image on the page, after all. Web workers (background threads that run in parallel with the page) are used to perform the text detection without affecting browser performance.
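To make the stroke-width idea concrete, here is a minimal Python sketch. It is not the real SWT, which works on edge gradients and traces rays between opposing edges of a stroke, and it is not Naptha’s code. It is a simplified stand-in that measures horizontal runs of ink in a black-and-white bitmap and asks how many of them share the same width. The function names, the sample bitmaps, and the 0.6 threshold are all assumptions made for illustration.

```python
from collections import Counter


def horizontal_stroke_widths(bitmap):
    """Return the length of every horizontal run of 1s (ink) in a 0/1 bitmap."""
    widths = []
    for row in bitmap:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                widths.append(run)
                run = 0
        if run:  # a run that touches the right edge of the row
            widths.append(run)
    return widths


def stroke_uniformity(bitmap):
    """Fraction of runs that share the single most common width (0.0 to 1.0)."""
    widths = horizontal_stroke_widths(bitmap)
    if not widths:
        return 0.0
    most_common_count = Counter(widths).most_common(1)[0][1]
    return most_common_count / len(widths)


def looks_like_text(bitmap, threshold=0.6):
    """Hypothetical heuristic: mostly uniform stroke widths suggest a glyph."""
    return stroke_uniformity(bitmap) >= threshold


# A crude letter "H" drawn with strokes two pixels wide.
LETTER_H = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]

# A solid wedge whose width is different on every row.
BLOB = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
]

print(looks_like_text(LETTER_H))
print(looks_like_text(BLOB))
```

Here the “H” passes because eight of its nine runs are two pixels wide, while the wedge fails because no two of its runs match. A real detector would also have to cope with slanted strokes, noise, and busy backgrounds, which is where the full SWT comes in.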
If you then select some text and hit “Copy Text” (or Ctrl-C), the selection is parceled up and sent off to a server running the open-source Ocrad OCR engine. Ocrad tries to recognize the characters in the rasterized text, which can take a few seconds, and then sends back the digitized text so that you can use Paste/Ctrl-V. Ocrad isn’t the best OCR engine out there, but if you right-click you can select the much more advanced, Google-backed Tesseract engine from the Language menu.
Naptha doesn’t stop there, though. It can go one step beyond OCR and translate the text into another language, either for copying and pasting or as an in-place translation of the text in an existing image (see image above). To do this in-place translation, Naptha uses “inpainting” (think Photoshop’s Content-Aware Fill) to remove the original text, and then attempts to match the font for the translated text. Alternatively, you can skip the translation and just use Naptha’s inpainting to remove text from images.
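The article does not say which inpainting algorithm Naptha uses, so the following Python sketch only illustrates the general idea with the simplest approach I know of: a naive diffusion fill, where each pixel marked as text is repeatedly replaced by the average of its four neighbours until the surrounding background has bled in. The function name `inpaint`, the grayscale list-of-lists image format, and the iteration count are assumptions for illustration.

```python
def inpaint(image, mask, iterations=50):
    """Fill masked pixels of a grayscale image by naive diffusion.

    image: list of rows of numbers (pixel brightness).
    mask:  same shape; True marks a pixel to erase (for example, text).
    Returns a new image; unmasked pixels keep their original values.
    """
    height, width = len(image), len(image[0])
    known = [image[y][x] for y in range(height) for x in range(width)
             if not mask[y][x]]
    if not known:
        raise ValueError("mask covers the whole image; nothing to diffuse from")

    # Start every masked pixel at the average brightness of the known pixels.
    fill = sum(known) / len(known)
    result = [[fill if mask[y][x] else float(image[y][x])
               for x in range(width)] for y in range(height)]

    for _ in range(iterations):
        updated = [row[:] for row in result]
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue
                # Average the up/down/left/right neighbours inside the image.
                neighbours = [result[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x),
                                             (y, x - 1), (y, x + 1))
                              if 0 <= ny < height and 0 <= nx < width]
                updated[y][x] = sum(neighbours) / len(neighbours)
        result = updated
    return result


# A light background (200) with one dark "text" pixel (0) in the middle.
picture = [
    [200, 200, 200],
    [200, 0, 200],
    [200, 200, 200],
]
text_mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]

cleaned = inpaint(picture, text_mask)
print(cleaned[1][1])
```

On a flat background like this the dark pixel simply takes on the background value. On textured or photographic backgrounds a diffusion fill leaves a blurry patch, which is why tools in the Content-Aware Fill mold use more elaborate, patch-based methods.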
Moving forward, it isn’t entirely clear what Kwok’s intentions for Project Naptha are. The extension requires remote computing power that Kwok has to pay for: Tesseract itself is open source, but running it takes server time, and Google Translate charges per use. Naptha’s functionality is unusual and useful enough that I’m sure there are plenty of people out there who would pay a small amount of money for it, though.