Overview

A store is the mechanism the Kodexa platform uses to hold information.

There are three types of store that you can create:

  • Model stores hold the implementation, training and metadata for a model that can be executed on a Model Runtime
  • Document stores hold files (and their associated document representations)
  • Data stores hold the extracted data objects and attributes that have been identified in documents held in a document store

At a high level, stores are designed to hold native files together with the associated “Document” representations of their unstructured data. Through the definition of a Data Structure we then add labels to these documents, and the platform is able to extract the labeled data into a structured form.

Document Stores

A document store is one that is responsible for holding files that will be parsed, labeled and used as a source for structured data.

The term document refers to the fact that when you upload a file (like a PDF) we create a container that holds both the original file (we call it the native) and one or more Kodexa Documents that hold the semi-structured representation of the native file.

These containers of native files and documents are called Document Families. They are brought together because all the documents are representations of the same original file. We support holding multiple documents because models (or humans) can label documents independently.
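The shape of a Document Family can be sketched in a few lines of Python. This is an illustration only, not the Kodexa SDK: the class names `DocumentFamilySketch` and `KodexaDocumentSketch`, their fields, and the file name are all hypothetical, chosen to mirror the description above (one native file, one or more independently labeled representations).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KodexaDocumentSketch:
    """Hypothetical stand-in for one semi-structured representation."""
    labeled_by: str  # assumption: who produced the labels, e.g. "model" or "human"
    labels: List[str] = field(default_factory=list)


@dataclass
class DocumentFamilySketch:
    """Hypothetical container: one native file plus its representations."""
    native_path: str
    documents: List[KodexaDocumentSketch] = field(default_factory=list)

    def add_document(self, document: KodexaDocumentSketch) -> KodexaDocumentSketch:
        # Each representation is kept separately so that a model and a human
        # can label the same native file without overwriting each other.
        self.documents.append(document)
        return document


family = DocumentFamilySketch(native_path="invoice.pdf")
family.add_document(KodexaDocumentSketch("model", ["invoice_number"]))
family.add_document(KodexaDocumentSketch("human", ["invoice_number", "total"]))
```

Keeping the representations in a list on the family, rather than replacing a single document, is what allows several labelings of the same native file to coexist.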

Data Stores

A data store is designed to hold structured data that has been extracted from a set of labeled documents that are held in a document store.

A data store is linked to a Data Structure (internally called a Taxonomy). The Data Structure formalizes the structure of the data into groups and individual data attributes. The actual data points and their related groups are then created in the data store, with lineage back to the document store holding the document representation.
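As a rough sketch of that relationship, the Python below models a Data Structure group, a data object created from labeled values, and the lineage reference back to its source document. Every name here (`DataGroupSketch`, `DataObjectSketch`, `extract`, the identifier `doc-001`) is hypothetical and not part of the Kodexa API; the filtering rule is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataGroupSketch:
    """Hypothetical group from a Data Structure (Taxonomy)."""
    name: str
    attributes: List[str]


@dataclass
class DataObjectSketch:
    """Hypothetical data object held in a data store."""
    group: str
    values: Dict[str, str]
    source_document_id: str  # lineage back to the document representation


def extract(group: DataGroupSketch,
            labeled_values: Dict[str, str],
            source_document_id: str) -> DataObjectSketch:
    # Assumption: only labels that match an attribute of the group are kept.
    values = {k: v for k, v in labeled_values.items() if k in group.attributes}
    return DataObjectSketch(group.name, values, source_document_id)


invoice_group = DataGroupSketch("invoice", ["invoice_number", "total"])
data_object = extract(
    invoice_group,
    {"invoice_number": "INV-1", "total": "42.00", "footer": "Page 1"},
    source_document_id="doc-001",
)
```

The `source_document_id` field is the part that matters: each structured record carries a pointer to the document it came from, which is what the text above means by lineage.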

Model Stores

A model store is a very different type of store. While it does hold native files (like a document store), it is designed to hold the implementation and trained representations of a machine-learning model.

How do store types relate to each other?

At its heart, Kodexa is designed to let you create a pipeline of processing steps that takes a document from a native file to a structured data set. This involves two of the store types: the document store and the data store. Below is a high-level view of how these stores relate to each other.

:include-image: store-overview.png {title: "High-Level Flow", fit: true}