When saving scans, the optimum choice of output format and compression method can do much to reduce the file size, which is important for a scan of a document of 100 pages or more, for example, but optimising the ratio between the file sizes of blank and marked images is an unusual requirement.

Different file formats support different colour modes and compression methods, so the choice is partly narrowed down by the fact that the images are black and white (although I suppose in principle they could be converted to grayscale or colour if there were an advantage in doing so).

Are you sure that your images are currently 1-bit colour depth, even if the scans are of black and white material? I don't think the JPG format supports black and white images. I've done a quick test using two programs and both saved a 1-bit test image as a JPG with an 8-bit colour depth. I don't in any case think there would be much scope for optimising the ratio within the JPG format, although I might be wrong.

The TIFF format, which generally produces very large file sizes for colour and grayscale images even with optimum compression, can equally produce very small file sizes for black and white images when optimum compression methods are used. More relevantly, TIFF supports many alternative compression methods, of which XnConvert and NConvert support at least six.

If you think it would be helpful you might post one or more sample PDF files for examination.

The NConvert part of the script might possibly be developed in and exported from XnConvert. Which operating system and version are you using? If reading file sizes programmatically isn't possible on a Windows computer (I imagine it likely is), it might well be possible in Linux.
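On the point about reading file sizes programmatically, here is a minimal sketch in Python, which as far as I know behaves the same on Windows and Linux. The function name split_by_size, the "*.tif" pattern and the threshold value are my own assumptions for illustration; a real threshold would have to be found by looking at the sizes of a few known blank and known marked crops.

```python
from pathlib import Path


def split_by_size(folder, threshold_bytes, pattern="*.tif"):
    """Return (blank, marked) lists of paths, judged only by file size.

    threshold_bytes and pattern are illustrative assumptions, not values
    taken from XnConvert or NConvert.
    """
    blank, marked = [], []
    for path in sorted(Path(folder).glob(pattern)):
        size = path.stat().st_size  # file size in bytes
        # Files below the threshold are treated as blank crops.
        if size < threshold_bytes:
            blank.append(path)
        else:
            marked.append(path)
    return blank, marked
```

It could be called as split_by_size("crops", 2000) after the cropped images have been saved, and the two lists could then be printed or used to move the files into separate folders.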
XnConvert and NConvert also have a Canvas resize function which is in effect a second crop function: it accepts a crop size specified as a percentage of the image dimensions and so could be used to make a similar crop in images with different pixel dimensions, provided the aspect ratio is the same. However, the crop position is specified in a different way that allows less control.

Note that 'auto-crop' normally refers to the automatic determination of the crop area, for instance to crop white space from around a rectangular area of text by automatically determining the position of the text on the page.

How accurately does the crop area need to be positioned? The Crop function, if it can be used, allows precise positioning of the crop area through the use of pixel dimensions; the Canvas resize function, as stated above, may not allow such precise control.

Can the blank and marked areas be separated manually, if necessary? It seems likely that blank or near-blank areas and reasonably marked areas would produce file sizes that could be readily distinguished by eye in a file listing, without much consideration of the file format. If the difference in file size is to be maximised, the colour mode of the images would be a consideration, as would the optimum type of file compression to be used.

Automating the separation of blank and marked files would clearly require a way of reading the file sizes in software and is outside my immediate experience.

A script to automate the overall process, if that is found to be practical, would call NConvert to open the PDF file, perform the crop operation, and save the resulting image in a suitable format, and would also likely need to call another utility to determine the resulting file sizes to separate blank and marked areas.
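To make the difference between a pixel-based crop and a percentage-based one concrete, here is a small Python sketch that turns a crop given in percentages into pixel values for an image of any size. This is my own arithmetic, not a description of how the Canvas resize function works internally, and the function name and parameters are hypothetical.

```python
def percent_crop_to_pixels(width, height, left_pct, top_pct, width_pct, height_pct):
    """Convert a crop box given in percent of the image into pixels.

    Returns (x, y, w, h) in pixels. All names here are illustrative.
    """
    x = round(width * left_pct / 100)
    y = round(height * top_pct / 100)
    w = round(width * width_pct / 100)
    h = round(height * height_pct / 100)
    # Clamp the box so that it never extends beyond the image edges.
    w = min(w, width - x)
    h = min(h, height - y)
    return x, y, w, h


# The same percentage crop applied to two page sizes with the same aspect ratio.
print(percent_crop_to_pixels(2480, 3508, 50, 0, 50, 10))  # (1240, 0, 1240, 351)
print(percent_crop_to_pixels(1240, 1754, 50, 0, 50, 10))  # (620, 0, 620, 175)
```

The pixel values differ between the two images although the percentages are identical, which is why a single set of pixel settings is unlikely to suit a batch of images with different pixel dimensions.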
The following may help to define the process you wish to implement:

Do the PDF files to be processed contain bitmap images?
PDF is a very versatile format and a file can contain bitmap images (as from a camera or scanner), vector graphics (a form of drawing or illustration) and of course text that can be rescaled for optimum viewing, as well as non-visual content.
Edit: In the probably unlikely event that the images held in the files are not bitmaps, they will be converted to bitmaps when they are rasterised at the current dpi setting as the files are opened; the pixel dimensions of the resulting bitmaps may be significant for later processing.

Does each PDF file contain a single image?
A PDF file can of course contain multiple images or pages.

Are the images to be processed colour, grayscale or black and white?
That may affect the optimum way to handle the cropped images.

Are the pixel dimensions of each image the same? Are their dimensions identical or near-identical?
That requires that the images are all the same shape or aspect ratio, for example 2:1 landscape orientation, so if they aren't, the pixel dimensions of the images clearly aren't the same. The Crop function in XnConvert and NConvert only accepts crop positions and crop sizes specified in pixels; if the images have different pixel dimensions it is unlikely to be possible to use the same settings to crop a batch of images using the Crop function.
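The questions about pixel dimensions and aspect ratio can be checked quickly once the width and height of each image are known. The sketch below is in Python and assumes the sizes have already been collected into a list of (width, height) pairs by some other means; the function name is hypothetical, and it tests for exactly equal ratios only, so near-identical sizes would need a tolerance added.

```python
from fractions import Fraction


def compare_dimensions(sizes):
    """Report whether all images share pixel dimensions and aspect ratio.

    sizes is a list of (width, height) tuples in pixels.
    Returns (same_pixels, same_aspect) as two booleans.
    """
    same_pixels = len(set(sizes)) == 1
    # Fraction reduces each ratio, so 2000x1000 and 1000x500 both become 2/1.
    same_aspect = len({Fraction(w, h) for w, h in sizes}) == 1
    return same_pixels, same_aspect
```

If the first value is True, one set of pixel settings in the Crop function should suit the whole batch. If only the second is True, a percentage-based crop is the more likely candidate.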