Tables in PDFs
Mauricio Micoski asked 1 week ago

I am testing pdfalchemist.exe to extract text from a PDF file that contains tables. The conversion to HTML just writes the table as an image. The conversion to XML does include the text from the table, but:

  • it misses the line breaks;
  • it loses the column alignment when there are empty columns.

Is there a way to address these two issues?

Datalogics Support Staff replied 4 days ago

What version of PDF Alchemist are you using and on what platform?
I assume the problems are specific to one PDF file. Can you describe the type of table that has this problem (size, complexity)? Does it span pages?

Datalogics Support Staff replied 4 days ago

Also, are you using any of the OCR options? Can you show us the XML output for a line or two of the table?