doc2txt
Extract text from EPUB, PDF, and DOCX files.
Quick Start
pip install doc2txt
- try with colab
- also see usage.ipynb
Tooling
- ebooklib for epub
- pypdf for pdf
- python-docx for docx
- BeautifulSoup4 for extracting text from HTML
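
As a rough illustration of how the libraries above fit together, here is a minimal sketch that calls them directly. It is not doc2txt's own code or API: the function names (`pdf_to_text`, `docx_to_text`, `epub_to_text`, `extract_text`) and the `EXTRACTORS` table are hypothetical, chosen only for this example. It assumes the four packages are installed.

```python
from pathlib import Path


def pdf_to_text(path):
    # pypdf: read each page and join the extracted text.
    from pypdf import PdfReader
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def docx_to_text(path):
    # python-docx is imported as `docx`; join the paragraph texts.
    from docx import Document
    return "\n".join(p.text for p in Document(path).paragraphs)


def epub_to_text(path):
    # ebooklib yields the book's XHTML documents;
    # BeautifulSoup4 strips the markup and keeps the text.
    import ebooklib
    from ebooklib import epub
    from bs4 import BeautifulSoup
    book = epub.read_epub(path)
    chunks = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        chunks.append(soup.get_text())
    return "\n".join(chunks)


# Hypothetical dispatch table: file extension -> extractor function.
EXTRACTORS = {".pdf": pdf_to_text, ".docx": docx_to_text, ".epub": epub_to_text}


def extract_text(path):
    # Pick an extractor from the (case-insensitive) file extension.
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return EXTRACTORS[suffix](str(path))
```

The third-party imports sit inside each function so that only the library needed for a given file type has to be installed. Calling `extract_text("book.epub")` would route through ebooklib and BeautifulSoup4, while a `.pdf` path would only touch pypdf.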
Questions?
Open a GitHub issue or ping me on Twitter.