Our AI based intelligent document processing API is used to process documents and extract meaningful data that you can use to create your own datasets. It’s used by us also to process both financial and annual reports that companies are submitting to registration authorities.

You can read about the technology we are using and how it works here.

Folders are used to group documents togheter that can be optionally linked to a company. Our API contains public folders that are accessible by all users and private folders that belongs to one or more users.

All folders that you create is private and cannot be reached by “everyone”. But you can still share folders inside your team by giving them access to the account that has ownership of the folder.

Folders

Use folders to group documents togheter. Each folder belongs to an account (owner) and could contain one or many files.

Supported formats

Our IDP can process the following formats:

  • PDF
  • TIFF including multi-page TIFF
  • PNG

We recommend minimum resolution of 300 dpi.

Preprocessing

Our API will take care of preprocessing the documents and their images while maintaining the source resolution. PDF files containing images will be extracted maintaining their original format and compression. If the source file contain 95% or more grayscale pixels the destination file will always be created in grayscale.

Summary of processing events:

  • Extract images
  • Remove metadata
  • Rotate if needed
  • Deskew pages
  • Remove punch holes
  • Remove border lines
  • Adjust for page margins
  • Remove page artifacts (arises from scanners and smudges)
  • Enhance text

Documents

You can list, delete, upload and download documents. Each document can be associated with a company. When you upload a document you can pick four different tranformations.

transformToEntity

This will transform each page in the file to a set of understandable structured entities in JSON format. Think like an Excel sheet. You will get each data in columns and rows. It deals with the complexity around borderless and bordered tables.

transformToSearchablePDF

This will transform the source file to a searchable visually pleasing PDF adhering to PDF version 1.4 and linearized.

transformToOCR

An hOCR result will be provided than can be used to feed a search database that contains the location of text boxes and the contents of the text box. Also a full string representing the content of the page is included.

transformToMetadata

An json file will be provided containing the original metadata of the file. It will for example contain format and resolution of images in PDF files.

Searchable PDF

When you upload a PDF or when a PDF is created from images all metadata will be removed when a new searchable PDF is created. The new PDF will be linearized which means they are optimized to be viewed on mobil and desktop apps by enabling the viewer to incrementally download the pages.

How to upload a file

To upload a file you use a multipart form post as the following simple cURL shows:

cURL
curl  \
--request POST \
--header "Content-Type: multipart/form-data" \
--header "x-api-key:your_api_key" \
--form DocumentTransformType=" \
      transformToEntity, \
      transformToSearchablePDF, \
      transformToOCR" \
--form DocumentType=annualReport \
--form MimeType="application/pdf" \
--form Description="Account Report 2022-03" \
--form ExternalId1=123456 \
--form CompanyId=2692467 \
--form file="@2692467_202104_202203.pdf" \
https://api.tic.io/folders/1/documents

You will receive the stored document id and the status of the transformation. Currentley you need to manually check back for the result. A webhook feature is planned to get the status posted back.

Response
{
  "documentId": 1068,
  "documentFolderId": 1,
  "originsFromDocumentId": null,
  "companyId": 2692467,
  "externalId1": "123456",
  "externalId2": null,
  "description": "Account Report 2022-03",
  "documentType": "annualReport",
  "fileName": "2692467_202104_202203.pdf",
  "fileMimeType": "application/pdf",
  "documentTransformType": "transformToSearchablePDF, transformToOCR, transformToEntity",
  "documentTransformStatusType": "requested",
  "documentFileSystemType": "orginal",
  "isDeleted": null,
  "lastUpdatedAtUtc": "2024-07-15T10:29:52"
}