In the previous article we introduced Robotic Process Automation (RPA) and illustrated the first activities to perform when robotising a business process.
Let’s now focus on the components that allow the robot to read textual information from PDF documents, which is then processed by algorithms. These algorithms manipulate strings using the programming languages the chosen platform makes available, and take the structure of the selected PDF document into account.
Automating text reading and writing
By structure we mean the page and line where the strings to acquire are located, their position within the line, and their format, which may be fixed, as for dates and fiscal codes, or variable. For example, to identify a first and last name that appear on the same line before the fiscal code, the algorithm first detects the fiscal code, which has a fixed format, and then treats all the strings before it as part of the first and last name. It does so with operators that split a string into an array of strings, concatenate strings and extract sub-strings.
In addition, you may need to capture non-textual information, such as images, from a PDF document. Operators that locate an image are available for this, but pay attention to the screen resolution: if it changes, the robot may fail to locate the image correctly.
There are also components that read and write information in Excel workbooks. They rely on data structures that must be configured to match the rows and columns of the tables exactly. You can also read and write information in Microsoft Word documents and send e-mail messages through Outlook or an SMTP server, configuring the e-mail server you want to use.
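The name-before-fiscal-code logic can be sketched in Python as follows. This is only an illustration, not the operator set of any specific RPA platform: the function name `extract_name` is hypothetical, and the pattern assumes a 16-character Italian-style fiscal code (6 letters, 2 digits, 1 letter, 2 digits, 1 letter, 3 digits, 1 letter).

```python
import re
from typing import Optional

# Assumption: 16-character Italian-style fiscal code
# (6 letters, 2 digits, 1 letter, 2 digits, 1 letter, 3 digits, 1 letter).
FISCAL_CODE = re.compile(r"\b[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]\b")


def extract_name(line: str) -> Optional[tuple[str, str]]:
    """Return (full_name, fiscal_code) for a line, or None if no code is found."""
    match = FISCAL_CODE.search(line)
    if match is None:
        return None
    # Everything before the fixed-format code is treated as the name.
    tokens = line[:match.start()].split()  # string -> array of strings
    full_name = " ".join(tokens)           # concatenate with single spaces
    return full_name, match.group()


print(extract_name("Mario   Rossi  RSSMRA80A01H501U  Rome"))
```

Anchoring on the fixed-format field first means the variable-length name needs no pattern of its own: it is simply whatever precedes the match.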
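As a rough sketch of what "a data structure that matches rows and columns" can look like, the Python example below maps a table to typed records and rejects a sheet whose header does not match. The table is a plain list of lists standing in for whatever the platform's Excel component returns, and the `InvoiceRow` columns are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class InvoiceRow:
    # Hypothetical columns; field order must match the sheet's column order.
    customer: str
    invoice_number: str
    amount: float


def rows_to_records(table: list[list[str]]) -> list[InvoiceRow]:
    """Convert a header row plus data rows into typed records."""
    header, *data = table
    expected = [f.name for f in fields(InvoiceRow)]
    # Fail early if the sheet layout differs from the configured structure.
    if [h.strip().lower() for h in header] != expected:
        raise ValueError(f"unexpected columns: {header!r}, expected {expected!r}")
    return [InvoiceRow(c, n, float(a)) for c, n, a in data]


sheet = [
    ["customer", "invoice_number", "amount"],
    ["Acme", "INV-001", "120.50"],
    ["Globex", "INV-002", "75"],
]
records = rows_to_records(sheet)
print(records[0])
```

Checking the header before reading any data makes a changed column order fail loudly instead of silently putting values in the wrong fields.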
Extracting textual information from images
A further case is the extraction of textual information from images. To do so, companies may use components that integrate with the major OCR engines on the market. The format of the extracted text varies with the input images, so it is processed by algorithms built on the operators needed for string extraction.
To integrate the robot with other customised software products, the platforms available on the market provide components that exchange messages over protocols such as REST and SOAP. These messages must be processed, using the correct data structures to hold the information they contain that the robot needs in the process being robotised. For example, to extract the fields the robot needs, the available operators must parse the messages returned by a REST service. You may also run into connectivity issues, which need to be fixed with the support of the system administrators who manage the network infrastructure and can configure the correct firewall rules.
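Because OCR output tends to contain irregular spacing and line breaks, string extraction usually starts with a normalisation step. The Python sketch below assumes the text has already been produced by an OCR engine and looks for a date in day/month/year form; the function name `find_date` and the date layout are assumptions made for illustration.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: dates appear as dd/mm/yyyy, possibly with stray spaces
# around the slashes introduced by the OCR engine.
DATE_PATTERN = re.compile(r"\b(\d{2})\s*/\s*(\d{2})\s*/\s*(\d{4})\b")


def find_date(ocr_text: str) -> Optional[date]:
    """Return the first valid dd/mm/yyyy date in the OCR text, or None."""
    # Collapse line breaks and repeated spaces into single spaces.
    text = " ".join(ocr_text.split())
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    try:
        return datetime.strptime(f"{day}/{month}/{year}", "%d/%m/%Y").date()
    except ValueError:
        # Looks like a date but is not a real calendar day.
        return None


print(find_date("Invoice  date:\n 05 / 03 /2021  Total"))
```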
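To illustrate the message-processing step, the Python sketch below picks the fields a robot needs out of a response body. It assumes the REST service returns a flat JSON object (a SOAP service would return XML instead); the function name `extract_fields` and the field names are hypothetical, and no network call is made.

```python
import json
from typing import Any


def extract_fields(body: str, wanted: list[str]) -> dict[str, Any]:
    """Parse a JSON response body and keep only the fields the robot needs."""
    payload = json.loads(body)
    missing = [name for name in wanted if name not in payload]
    if missing:
        # Surface contract changes instead of continuing with partial data.
        raise KeyError(f"missing fields: {missing}")
    return {name: payload[name] for name in wanted}


# Hypothetical response body from an order service.
response_body = '{"order_id": 42, "status": "shipped", "internal_flag": true}'
print(extract_fields(response_body, ["order_id", "status"]))
```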
RPA expert: a sought-after profile
Although the platforms available on the market provide tools that make robots easy to implement, specific skills are still needed: analysing the process to robotise, designing an efficient architecture for the robot to operate in, coding and testing.
Because process robotisation improves the efficiency and quality of business processes, RPA is an ever-growing trend. As a result, RPA experts who can cover the analysis of the processes to be automated, architecture design, implementation, testing and monitoring are highly sought after.