Smart OCR: Why we built our own reliable, cost-effective OCR solution

How we built Smart OCR to unlock data trapped in scanned documents and set a new standard for automation

Why we built our own reliable, cost-effective OCR solution at Agilytic

Agilytic helps companies turn raw data into insights they can act on. Over the years, we’ve noticed an ongoing challenge across industries: unlocking the data trapped in paper documents, scanned PDFs, and image files. A new client might bring us receipts, invoices, or HR records that exist only as scanned pages, making automation difficult.

We decided to build our own internal solution for extracting structured information from these documents. Our goal was simple: process nearly any file reliably, cost-effectively, and with minimal human oversight. This post explores why we developed an internal optical character recognition (OCR) tool (Smart OCR), how it works, and why we believe it represents a new standard in document processing.

Bridging the gap between paper and data

Organizations often handle hundreds or even thousands of documents every day. These documents range from invoices and employee forms to receipts and contracts. Manually entering information from each file is time-consuming and prone to errors. Worse, many existing third-party solutions lack certain features—like custom templating, multilingual support, or consistent reliability at scale.

At Agilytic, we wanted to put users in control. We envisioned a tool that would quickly read text from any file (think PDF, PNG, or a scanned image) and then structure the content according to a customized layout or “template.” By doing that, clients can easily export the processed data into the systems they already use for finance, HR, or analytics.

How Smart OCR works

Smart OCR starts by extracting text from a document. We rely on open-source libraries like PaddleOCR, which has shown strong performance on numeric data and multilingual text. Our solution then enriches the raw text with structural cues (for example, detecting tables or headings). Finally, we use a large language model (LLM) to fill a predefined template. We can tailor that template to a client’s exact needs, or users can create their own.

One of our team members described it best in an internal demo: “It doesn’t require any specific skills to operate—it’s in the cloud, secure, and reliable using the Azure version of OpenAI.” This simple approach means the user can drag and drop files, pick a template (or generate a new one), and let the system handle the rest.
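
The three stages described above (OCR, structuring, template filling) can be pictured as a small pipeline of interchangeable parts. The sketch below is illustrative only: `Pipeline`, `fake_ocr`, `fake_structure`, and `fake_fill` are hypothetical names, and the stand-in stages replace PaddleOCR and the LLM so that the example runs without any external dependency.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stage signatures; none of these names come from Smart OCR itself.
OcrEngine = Callable[[bytes], List[str]]                     # file bytes -> text lines
Structurer = Callable[[List[str]], str]                      # text lines -> HTML-like markup
TemplateFiller = Callable[[str, List[str]], Dict[str, str]]  # markup + field names -> values


@dataclass
class Pipeline:
    ocr: OcrEngine
    structure: Structurer
    fill: TemplateFiller

    def run(self, document: bytes, fields: List[str]) -> Dict[str, str]:
        lines = self.ocr(document)
        markup = self.structure(lines)
        values = self.fill(markup, fields)
        # Fields the filler could not find stay blank instead of being guessed.
        return {name: values.get(name, "") for name in fields}


# Stand-in stages (assumptions for illustration, not real OCR or LLM calls).
def fake_ocr(document: bytes) -> List[str]:
    return document.decode("utf-8").splitlines()


def fake_structure(lines: List[str]) -> str:
    return "\n".join(f"<p>{line}</p>" for line in lines)


def fake_fill(markup: str, fields: List[str]) -> Dict[str, str]:
    found: Dict[str, str] = {}
    for line in markup.splitlines():
        text = re.sub(r"<[^>]+>", "", line)  # drop the HTML-like tags
        label, sep, value = text.partition(":")
        for name in fields:
            if sep and label.strip().lower() == name.lower():
                found[name] = value.strip()
    return found


pipeline = Pipeline(fake_ocr, fake_structure, fake_fill)
result = pipeline.run(
    b"Client name: ACME\nInvoice total: 120.50",
    ["client name", "invoice total", "due date"],
)
print(result)
```

Keeping each stage behind a plain callable is one way to let the OCR engine or the language model be swapped without touching the rest of the flow.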

An inside look at the technology

  1. OCR extraction

    We start with PaddleOCR to recognize text within images. In our experience, PaddleOCR stands out when dealing with handwritten elements or complex layouts, and it is more robust with numeric data than older OCR engines.

  2. Structuring the content

    After extracting text, we enrich it with structural markers: headings, tables, and paragraphs. We accomplish this through a “structuring model” that converts the recognized text into a lightweight HTML-like representation. HTML is the markup language websites use to organize and structure their content, so it is a natural fit for recording where a heading, table, or paragraph begins and ends. If the original document had a table, we want to keep that table intact in the structured result.

  3. Prompting a large language model

    The HTML-like content, paired with a user-defined template, is then passed to a large language model (LLM). The LLM uses only the extracted text to fill in the requested fields (for instance, “client name” or “invoice total”). It ignores anything irrelevant. If the requested information is not in the document, the tool simply leaves that field blank.

  4. Customizing output

    Our web-based front end makes it easy to change or create new templates. As a result, each user can define which fields matter most. For instance, a manufacturing client might capture part numbers and quantities, while a human resources team could focus on ID numbers and salaries.
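
Steps 2 and 3 can be made concrete with a short sketch. It assumes the template is a flat list of field names and that the model replies with JSON; both are assumptions for illustration, as are the function names `table_to_markup`, `build_prompt`, and `parse_reply` and the prompt wording. No model is called here: `reply` is a hand-written stand-in for an LLM answer.

```python
import json
from html import escape
from typing import Dict, List


def table_to_markup(rows: List[List[str]]) -> str:
    """Render recognized table cells as a minimal HTML-like table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def build_prompt(markup: str, fields: List[str]) -> str:
    """Assemble an extraction prompt (illustrative wording only)."""
    template = json.dumps({name: "" for name in fields}, indent=2)
    return (
        "Fill in the JSON template using only the document below. "
        "Leave a field as an empty string if the document does not contain it.\n\n"
        f"Template:\n{template}\n\nDocument:\n{markup}"
    )


def parse_reply(reply: str, fields: List[str]) -> Dict[str, str]:
    """Keep only the requested fields; missing or malformed values stay blank."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        name: data[name] if isinstance(data.get(name), str) else ""
        for name in fields
    }


fields = ["client name", "invoice total"]
markup = table_to_markup([["Item", "Amount"], ["Widget & bolt", "12.00"]])
prompt = build_prompt(markup, fields)

# Hand-written stand-in for a model reply: one field found, one extra key.
reply = '{"invoice total": "12.00", "notes": "not requested"}'
extracted = parse_reply(reply, fields)
print(extracted)
```

Filtering the reply against the template on the way out is one simple way to guarantee that irrelevant keys are dropped and that absent fields come back blank, whatever the model returns.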

Why we built it in-house

Plenty of ready-made solutions exist for document processing, so why did Agilytic decide to build its own? We found that third-party tools often fell short in one or more of these areas:

  • Reliability: Some tools used older OCR engines that struggled with numbers or complex layouts.

  • Cost-effectiveness: Certain solutions relied on large, multimodal LLMs, driving up costs. Clients might pay several cents to process a single page, which adds up fast.

  • Multilingual support: In Belgium and throughout Europe, documents can appear in French, Dutch, English, or other languages. Many standard tools handle only a small subset of languages well.

  • Data control: We wanted to keep data secure and avoid sending sensitive content to external services with uncertain privacy measures.

By creating our own platform, we could tune each piece to meet the reliability and cost requirements we hold ourselves to. We use more efficient large language models for text structuring, which keeps cost to a fraction of a cent per document in many cases. We also ensure everything runs securely in the Microsoft Azure environment, where privacy and data residency are clear and well defined.
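
To see how per-page pricing “adds up fast,” the arithmetic can be written out. The prices and volumes below are placeholders chosen only to fall within the ranges mentioned above (several cents per page versus a fraction of a cent per document); they are not measured or quoted rates, and `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal


def monthly_cost(
    documents_per_day: int,
    pages_per_document: int,
    price_per_page: Decimal,
    working_days: int = 22,
) -> Decimal:
    """Total monthly spend for a given volume and per-page price."""
    pages = documents_per_day * pages_per_document * working_days
    return pages * price_per_page


# Placeholder inputs (assumptions, not real prices): 1,000 two-page documents a day.
multimodal = monthly_cost(1000, 2, Decimal("0.03"))   # "several cents" per page
text_only = monthly_cost(1000, 2, Decimal("0.001"))   # a fraction of a cent per page

print(f"multimodal: {multimodal:.2f}  text-only: {text_only:.2f}")
print(f"ratio: {multimodal / text_only}")
```

The point of the sketch is the shape of the calculation: the monthly total scales linearly with page volume, so the per-page price difference carries straight through to the bill.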

Key benefits for businesses

We designed Smart OCR to let organizations do more with less. Here are a few of the advantages our clients have reported:

  • High reliability. By combining strong OCR libraries with LLM-based text structuring, we consistently deliver reliable outputs. If the requested information is missing, our tool will not try to invent it.

  • Cost-effectiveness. We use relatively small language models optimized for this specific task. Sending every page to a large multimodal model such as GPT-4 Vision can be ten to a hundred times more expensive.

  • Ease of use. Users simply upload a file and select a template. No coding or specialized software is required. A debugging tab is available for advanced users who want to see how the process unfolds.

  • Customizability. Our solution supports new templates on the fly. If you need to capture details unique to a specific industry or form type, you can create a new template in minutes.

Beyond OCR: the value of in-house tools

Building Smart OCR internally is part of a broader approach at Agilytic. We believe that certain technologies—especially those fundamental to secure, accurate data processing—should live in-house. That ensures we can innovate at our own pace and adapt swiftly to client requests.

More importantly, our teams can collaborate closely with clients to refine these solutions. We are not restricted by rigid vendor roadmaps or licensing models. If a project calls for advanced text analytics, we can fold those capabilities into the pipeline without waiting for external providers to prioritize them.

A glimpse at the real-world impact

During early trials, we used Smart OCR on multiple scanned invoices. Traditional OCR tools struggled with missing fonts, skewed pages, or faint text. Our approach accurately identified item names, amounts, and taxes in seconds. One client with a heavy paper workflow saw a notable drop in manual data entry—and fewer human errors.

We also tested the system on payslips from different countries. By switching to the right template, we extracted relevant fields regardless of the document’s layout or language. Because of the tool’s flexibility, our client saved both time and money, all while gaining the confidence that the extracted data was accurate.

Future outlook

We are always refining Smart OCR. Future updates may include faster handling of documents with dozens of pages, advanced layout analysis, or deeper integration with our analytics pipelines. We also aim to keep the cost near zero for each processed document, empowering organizations to scale up document automation without worrying about ballooning fees.

Although we began developing this tool more than a year ago, we designed it to be evergreen. Its modular design and cloud-based deployment allow for continuous improvements, no matter how the underlying OCR or language modeling technologies evolve.

Let’s talk about possibilities

Does Smart OCR sound like it could help your organization streamline document processing? We would love to learn more about your workflow and show you how our solution can be tailored to your needs.

Ready to reach your goals with data?

If you want to reach your goals through the smarter use of data and A.I., you're in the right place.

© 2025 Agilytic
