File Processing - What Happens When I Upload a File?
Below is a list of all the processing steps applied to your documents when they are uploaded.
- Virus scan - Each document is scanned for viruses and quarantined if one is found.
- Archive analysis - ZIP files are analysed to detect password-protected content. This step applies to ZIP files only.
- Timestamp extraction - Any timestamps relevant to the document are extracted and stored.
- Page count - The page count of the document is calculated.
- Checksum calculation - An MD5 checksum is calculated for each document.
- Conversion to PDF - Word documents, spreadsheets, slideshows, text documents and emails are converted to PDF.
- Rasterisation - If required, images of each page of the document are generated.
- Native text extraction - Text is extracted from the document's native format for indexing, where possible.
- OCR - If required, Optical Character Recognition is applied to extract text from scanned or image-based documents.
- Indexing - The extracted text and any metadata are added to a full-text index so that documents can be searched.
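The archive-analysis step in the list above can be illustrated with Python's standard `zipfile` module. In the ZIP format, bit 0 of each entry's general-purpose bit flag marks the entry as encrypted, and `zipfile` exposes that flag as `ZipInfo.flag_bits`. This is a minimal sketch, not the service's actual implementation; `encrypted_members` and `is_password_protected` are hypothetical helper names.

```python
import zipfile

# Bit 0 of a ZIP entry's general-purpose bit flag marks the entry as encrypted.
ENCRYPTED_FLAG = 0x1


def encrypted_members(path):
    """Return the names of entries in the ZIP archive at `path` that are flagged as encrypted."""
    with zipfile.ZipFile(path) as archive:
        # The entry list comes from the archive's central directory,
        # so nothing has to be decompressed or decrypted to check the flag.
        return [info.filename for info in archive.infolist() if info.flag_bits & ENCRYPTED_FLAG]


def is_password_protected(path):
    """Return True if at least one entry in the archive is encrypted."""
    return bool(encrypted_members(path))
```

Because only the flag is inspected, no password is needed to run the check. `zipfile.ZipFile` raises `zipfile.BadZipFile` if the upload is not a valid ZIP archive, so a real pipeline would need to handle that case separately.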
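The checksum step in the list above can be sketched with Python's standard `hashlib` module. This assumes the uploaded file is available on local disk; `md5_checksum` is a hypothetical helper name rather than part of the upload service. Reading the file in fixed-size chunks keeps memory use constant, even for very large documents.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Feed the file to the hash in chunks so large uploads are never fully loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is a 32-character hexadecimal string, and byte-identical files always produce the same value. The chunk size affects only memory use, not the digest.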
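The indexing step in the list above can be thought of as building an inverted index, which maps each word to the documents that contain it. The toy `FullTextIndex` class below is a hypothetical illustration, not the index the service actually uses. It assumes simple lower-case word tokenisation and AND-style queries. A production full-text index typically also adds ranking, stemming and persistent storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


class FullTextIndex:
    """Toy inverted index: maps each term to the set of document IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.metadata = {}

    def add(self, doc_id, text, metadata=None):
        """Index the document's extracted text plus any metadata values."""
        metadata = metadata or {}
        self.metadata[doc_id] = metadata
        # Metadata values are tokenised like body text so they are searchable too.
        searchable = [text] + [str(value) for value in metadata.values()]
        for chunk in searchable:
            for term in tokenize(chunk):
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return the IDs of documents that contain every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after `index.add("doc-1", "Invoice for March", {"author": "Smith"})`, both `index.search("march invoice")` and `index.search("smith")` return `{"doc-1"}`.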