To extract the images from a PDF file, I propose the following solution, which
uses the PDFBox API.
The PDFBox-0.7.3.jar should be added to the classpath before executing the code.
The method extractImagesFromPDF() has two arguments:
pdfFileName - The complete path of the PDF file from which the images need to be extracted. Ex: C:/test_image.pdf
imagePath - The complete path of the directory where the extracted images need to be stored as .jpeg files, including the trailing slash, because the file name is appended to it directly. Ex: C:/test/
If the PDF contains several images, all of them will be extracted and stored as .jpeg files in the given output location. The images will be stored as image1.jpeg, image2.jpeg, etc. The naming can be customized as per the project requirements.
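As an illustration of how the naming could be customized, here is a minimal sketch of a helper that builds the output file name. The class name ImageFileNames and the method buildImageFileName() are hypothetical and are not part of PDFBox or of the attached program. The helper also adds a trailing slash when the caller leaves it out.

```java
public class ImageFileNames {

    /**
     * Builds an output file name such as "C:/test/image1.jpeg".
     * Hypothetical helper for illustration; not part of PDFBox.
     *
     * @param imagePath output directory, with or without a trailing separator
     * @param prefix    file name prefix, e.g. "image"
     * @param count     running image number
     * @param extension file extension without the dot, e.g. "jpeg"
     */
    public static String buildImageFileName(String imagePath, String prefix,
                                            int count, String extension) {
        String dir = imagePath;
        // Append a separator only if the caller did not supply one.
        if (!dir.isEmpty() && !dir.endsWith("/") && !dir.endsWith("\\")) {
            dir = dir + "/";
        }
        return dir + prefix + count + "." + extension;
    }

    public static void main(String[] args) {
        // Both calls yield the same name, with or without the trailing slash.
        System.out.println(buildImageFileName("C:/test/", "image", 1, "jpeg"));
        System.out.println(buildImageFileName("C:/test", "image", 1, "jpeg"));
    }
}
```

In extractImagesFromPDF(), the line that concatenates imagePath, "image" and count could call such a helper instead, with a project-specific prefix.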
The attached ImageExtractionFromPDF.zip file contains a sample Java program. It can be executed from the command prompt using the following command:
java ImageExtraction <complete path of the PDF file> <path where the extracted images should be stored>
Ex:
java ImageExtraction C:/sample.pdf C:/test/
Note:
Before executing the Java program, PDFBox-0.7.3.jar should be added to the classpath.
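The command above expects exactly two arguments. A minimal sketch of how they could be checked before the extraction starts is given below. The class ImageExtractionArgs and its validate() method are hypothetical; the main() of the attached ImageExtraction program may handle its arguments differently.

```java
import java.io.File;

public class ImageExtractionArgs {

    /**
     * Checks the two command-line arguments.
     * Hypothetical helper for illustration only.
     *
     * @return null if the arguments look usable, otherwise an error message
     */
    public static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: java ImageExtraction <pdfFile complete path> <output path>";
        }
        // First argument: the PDF file must exist.
        if (!new File(args[0]).isFile()) {
            return "PDF file not found: " + args[0];
        }
        // Second argument: the output directory must exist.
        if (!new File(args[1]).isDirectory()) {
            return "Output directory not found: " + args[1];
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            return;
        }
        // At this point extractImagesFromPDF(args[0], args[1]) could be called.
        System.out.println("Arguments look valid.");
    }
}
```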
Limitations:
Images in standard image formats (.jpeg, .jpg, .tiff, etc.) can be extracted from the PDF. Images in special or proprietary formats cannot be extracted.
Code:
// PDFBox 0.7.3 package names (org.pdfbox.*); later Apache releases use org.apache.pdfbox.*
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDPage;
import org.pdfbox.pdmodel.PDResources;
import org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

/**
 * This method extracts the images from a pdf file. It goes through all the
 * pages of the pdf file and stores every picture it finds as a jpeg file.
 *
 * @param pdfFileName The pdf file name
 * @param imagePath   The image path
 */
public static void extractImagesFromPDF(String pdfFileName, String imagePath) throws Exception
{
    int count = 1; // running number used in the output file names
    String pdfFile = pdfFileName;
    PDDocument document = null;
    try
    {
        document = PDDocument.load(pdfFile);
        List pages = document.getDocumentCatalog().getAllPages();
        Iterator iter = pages.iterator();
        // Visit every page and look up the images in its resources
        while( iter.hasNext() )
        {
            PDPage page = (PDPage)iter.next();
            PDResources resources = page.getResources();
            Map images = resources.getImages();
            if( images != null )
            {
                Iterator imageIter = images.keySet().iterator();
                while( imageIter.hasNext() )
                {
                    String key = (String)imageIter.next();
                    PDXObjectImage image = (PDXObjectImage)images.get( key );
                    String imageFileName = imagePath + "image" + count + ".jpeg";
                    System.out.println("Writing image:" + imageFileName );
                    image.write2file( imageFileName );
                    count++;
                }
            }
        }
    }
    finally
    {
        // Always close the document, even if the extraction fails
        if( document != null )
        {
            document.close();
        }
    }
}