Extract text from image or pdf

How can you effectively extract text from a PDF or an image? This is commonly called OCR (optical character recognition). I found 2 extremely powerful tools based on the open source engine Tesseract (Official website).

I am using Windows, and both can be used on this OS. One converts a scanned PDF into a searchable PDF (whose text can also be copied). The other takes a screenshot of an area of your screen, converts it to text and stores it in your clipboard.

OCRmyPDF
- you need to use Ubuntu on Windows (more info here)
- update your apt package index: sudo apt-get update
- install it: sudo apt install ocrmypdf
- check the documentation for the commands
  - here is a simple example for a French PDF: ocrmypdf -l fra "input.pdf" "output.pdf"
- to install new languages (for Ubuntu)
  - check which ones exist: apt-cache search tesseract-ocr
  - install what you need: sudo apt-get install tesseract-ocr-fra
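
If you have more than one scanned PDF, the single command above can be wrapped in a small loop. This is only a minimal sketch: the function name ocr_folder and the folder names are hypothetical, and it assumes ocrmypdf and the matching tesseract-ocr language package are already installed as described in the steps above.

```shell
# Hypothetical helper: run ocrmypdf on every PDF in a folder.
#   $1 = input folder with scanned PDFs
#   $2 = output folder for the searchable PDFs
#   $3 = Tesseract language code (defaults to fra, as in the example above)
ocr_folder() {
    local in_dir="$1"
    local out_dir="$2"
    local lang="${3:-fra}"
    local pdf

    mkdir -p "$out_dir"
    for pdf in "$in_dir"/*.pdf; do
        # If the folder holds no PDF, the pattern stays unexpanded: skip it.
        [ -e "$pdf" ] || continue
        ocrmypdf -l "$lang" "$pdf" "$out_dir/$(basename "$pdf")"
    done
}
```

After pasting the function into your Ubuntu shell, something like ocr_folder scans searchable fra would write one searchable copy per input file, keeping the original file names.
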

NormCap
- easy to install, just use the exe

Give it a try :)

Dorian Gravier
