"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "src/ocrmypdf/_sync.py" between
OCRmyPDF-9.5.0.tar.gz and OCRmyPDF-9.6.0.tar.gz

About: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

_sync.py  (OCRmyPDF-9.5.0):_sync.py  (OCRmyPDF-9.6.0)
skipping to change at line 227 skipping to change at line 227
print("Logging problem", file=sys.stderr) print("Logging problem", file=sys.stderr)
traceback.print_exc(file=sys.stderr) traceback.print_exc(file=sys.stderr)
def exec_concurrent(context): def exec_concurrent(context):
"""Execute the pipeline concurrently""" """Execute the pipeline concurrently"""
# Run exec_page_sync on every page context # Run exec_page_sync on every page context
max_workers = min(len(context.pdfinfo), context.options.jobs) max_workers = min(len(context.pdfinfo), context.options.jobs)
if max_workers > 1: if max_workers > 1:
context.log.info("Start processing %d pages concurrent", max_workers) context.log.info("Start processing %d pages concurrently", max_workers)
# Tesseract 4.x can be multithreaded, and we also run multiple workers. We want # Tesseract 4.x can be multithreaded, and we also run multiple workers. We want
# to manage how many threads it uses to avoid creating total threads than cores. # to manage how many threads it uses to avoid creating total threads than cores.
# Performance testing shows we're better off # Performance testing shows we're better off
# parallelizing ocrmypdf and forcing Tesseract to be single threaded, which we # parallelizing ocrmypdf and forcing Tesseract to be single threaded, which we
# get by setting the envvar OMP_THREAD_LIMIT to 1. But if the page count of the # get by setting the envvar OMP_THREAD_LIMIT to 1. But if the page count of the
# input file is small, then we allow Tesseract to use threads, subject to the # input file is small, then we allow Tesseract to use threads, subject to the
# constraint: (ocrmypdf workers) * (tesseract threads) <= max_workers. # constraint: (ocrmypdf workers) * (tesseract threads) <= max_workers.
# As of Tesseract 4.1, 3 threads is the most effective on a 4 core/8 thread system. # As of Tesseract 4.1, 3 threads is the most effective on a 4 core/8 thread system.
tess_threads = min(3, context.options.jobs // max_workers) tess_threads = min(3, context.options.jobs // max_workers)
 End of changes. 1 change block. 
1 line changed or deleted, 1 line changed or added
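
The thread-budget comment in exec_concurrent can be illustrated with a small standalone sketch. It reuses the two expressions visible in the diff (max_workers and tess_threads); the function names plan_threads and tesseract_env, their arguments, and the idea of passing the limit through an environment dict are hypothetical and are not taken from OCRmyPDF. Only the OMP_THREAD_LIMIT variable name comes from the source comment.

```python
from typing import Dict, Tuple


def plan_threads(page_count: int, jobs: int) -> Tuple[int, int]:
    """Return (max_workers, tess_threads) for a given page count and job limit.

    Hypothetical helper: it mirrors the two expressions shown in the diff,
    with page_count standing in for len(context.pdfinfo) and jobs for
    context.options.jobs.
    """
    if page_count < 1 or jobs < 1:
        # Guard added for this sketch only; avoids dividing by zero below.
        raise ValueError("page_count and jobs must both be at least 1")

    # Never start more workers than there are pages or allowed jobs.
    max_workers = min(page_count, jobs)

    # Give leftover capacity to Tesseract, capped at 3 threads per worker.
    tess_threads = min(3, jobs // max_workers)
    return max_workers, tess_threads


def tesseract_env(tess_threads: int) -> Dict[str, str]:
    """Build the environment entry that caps Tesseract's OpenMP threads.

    Assumption: the limit is handed to the subprocess as an environment
    variable; how OCRmyPDF actually applies it is not shown in this diff.
    """
    return {"OMP_THREAD_LIMIT": str(tess_threads)}
```

Under these assumptions, 8 jobs on a 100-page file gives 8 workers with 1 Tesseract thread each, while 8 jobs on a 2-page file gives 2 workers with 3 threads each. In every case the product of workers and threads stays at or below the job limit, because max_workers * (jobs // max_workers) cannot exceed jobs.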
