Welcome to pdfminer.six’s documentation!

Pdfminer.six is a Python package for extracting information from PDF documents.

  • Parse all objects from a PDF document into Python objects.
  • Analyze and group text in a human-readable way.
  • Extract text, images (JPG, JBIG2 and Bitmaps), table-of-contents, tagged contents and more.
  • Support for (almost all) features from the PDF-1.7 specification.
  • Support for Chinese, Japanese and Korean (CJK) languages as well as vertical writing.
  • Support for various font types (Type1, TrueType, Type3, and CID).
  • Support for RC4 and AES encryption.

Installation instructions

Before using pdfminer.six, install it with pip. It requires Python 3.4 or newer.

$ pip install pdfminer.six

Common use-cases
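
One of the most common use-cases is pulling the plain text out of a single PDF file. The following is a minimal sketch rather than an official recipe. It assumes a pdfminer.six release that provides pdfminer.high_level.extract_text and a Python version that ships the typing module. The helper name extract_text_if_present and the file name example.pdf are hypothetical and only serve the illustration.

```python
from pathlib import Path
from typing import Optional


def extract_text_if_present(pdf_path: str) -> Optional[str]:
    """Return the text of the PDF at pdf_path, or None if no such file exists.

    Hypothetical helper for illustration; it is not part of pdfminer.six.
    """
    path = Path(pdf_path)
    if not path.is_file():
        # Nothing to parse, so report that instead of raising an error.
        return None

    # Imported lazily so this module still loads where pdfminer.six
    # has not been installed yet.
    from pdfminer.high_level import extract_text

    # extract_text accepts a file path and returns the document text as a str.
    return extract_text(str(path))


if __name__ == "__main__":
    # "example.pdf" is an assumed file name; replace it with a real document.
    text = extract_text_if_present("example.pdf")
    print(text if text is not None else "example.pdf not found")
```

Checking for the file first keeps the sketch safe to run in an empty directory. In real code you may prefer to let the error from a missing file propagate.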


Contributions to pdfminer.six are welcome! Before you start, take a look at the contribution guide.