Welcome to pdfminer.six’s documentation!


We fathom PDF.

Pdfminer.six is a Python package for extracting information from PDF documents.

Check out the source on GitHub.


This documentation is organized into four sections (according to the Diátaxis documentation framework). The Tutorials section helps you set up and use pdfminer.six for the first time. Read this section if this is your first time working with pdfminer.six. The How-to guides offer specific recipes for solving common problems. Take a look at the Topics if you want more background information on how pdfminer.six works internally. The API Reference provides detailed API documentation for all the common classes and functions in pdfminer.six.


Features

  • Parse all objects from a PDF document into Python objects.
  • Analyze and group text in a human-readable way.
  • Extract text, images (JPG, JBIG2 and Bitmaps), table-of-contents, tagged contents and more.
  • Support for (almost all) features from the PDF-1.7 specification.
  • Support for Chinese, Japanese and Korean (CJK) languages as well as vertical writing.
  • Support for various font types (Type1, TrueType, Type3, and CID).
  • Support for RC4 and AES encryption.
  • Support for AcroForm interactive form extraction.
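
As a rough illustration of the text extraction listed above, the sketch below calls `extract_text` from `pdfminer.high_level`. The helper name `pdf_to_text` and the file name `example.pdf` are placeholders chosen for this example and are not part of the library.

```python
def pdf_to_text(path):
    """Return the text of the PDF at `path` as a single string.

    `pdf_to_text` is a hypothetical helper name used only in this sketch.
    """
    # Imported inside the function so that defining the helper does not
    # itself require pdfminer.six to be installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)


# Example call (assumes a file named example.pdf exists):
# print(pdf_to_text("example.pdf"))
```

Calling the helper requires pdfminer.six to be installed and the path to point to a readable PDF file.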

Installation instructions

Before using pdfminer.six, install it with Python 3.6 or newer.

$ pip install pdfminer.six

Optionally, install the extra dependencies needed to extract JPG images.

$ pip install 'pdfminer.six[image]'
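
One way to confirm that the installation worked is to check from Python whether the `pdfminer` module can be found. The snippet below is a minimal sketch that uses only the standard library; the function name `is_importable` is made up for this example.

```python
import importlib.util


def is_importable(module_name="pdfminer"):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if is_importable():
    print("pdfminer.six is available")
else:
    print("pdfminer.six is not installed; run: pip install pdfminer.six")
```

This only checks that the top-level module can be located; it does not verify the installed version or the optional image dependencies.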


We welcome contributions to pdfminer.six! Before you start, take a look at the contribution guide.