Creating and managing models is one of the most important capabilities of the Kodexa Platform. Models are the core of the platform: they extract data from documents, and you write them in Python using the Kodexa SDK.

In its simplest form, a model is a small Python script that receives a Document and returns a Document. The model can be as simple as:

def infer(document):
    return document
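
The function above returns the document unchanged. As a minimal sketch of the same contract doing a little work, the example below uses a hypothetical `StubDocument` class in place of the real Kodexa `Document`. The class, its `text` and `labels` fields, and the "invoice" rule are all assumptions made for illustration and are not part of the Kodexa SDK.

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Hypothetical stand-in for a Kodexa Document (not the SDK class)."""
    text: str
    labels: list = field(default_factory=list)


def infer(document):
    # Same contract as the minimal model: receive a document, return a document.
    # The labelling rule below is purely illustrative.
    if "invoice" in document.text.lower():
        document.labels.append("invoice")
    return document
```

Calling `infer(StubDocument("Invoice #1"))` returns the same object with `"invoice"` added to its `labels` list. A real model would read from and write to the SDK's `Document` instead.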

You would put this code in a module, for example:

model/
    __init__.py
    model.py
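
Before deploying, it can help to confirm locally that this layout is importable and exposes `infer`. The sketch below builds the layout in a temporary directory and imports `model.model` with the standard library. The helper names `write_layout`, `load_infer`, and `demo` are hypothetical, and this is only a local sanity check: the source does not describe how the Kodexa runtime itself loads the module.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_layout(project_dir):
    """Create model/__init__.py and model/model.py under project_dir."""
    package = Path(project_dir) / "model"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "model.py").write_text("def infer(document):\n    return document\n")


def load_infer(project_dir):
    """Import model/model.py from project_dir and return its infer function."""
    sys.path.insert(0, str(project_dir))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("model.model")
    finally:
        sys.path.remove(str(project_dir))
    return module.infer


def demo():
    """Build the layout in a temp directory and check infer round-trips its input."""
    with tempfile.TemporaryDirectory() as tmp:
        write_layout(tmp)
        infer = load_infer(tmp)
        marker = object()
        return infer(marker) is marker
```

In a real project you would skip `write_layout` and point `load_infer` at the directory that contains your `model/` package.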

To deploy the model, we also need to create a model.yaml file. This file describes the model and provides the metadata used to deploy it to the Kodexa Platform.

# A very simple first model that isn't trainable

slug: my-model
version: 1.0.0
orgSlug: kodexa
type: store
storeType: MODEL
name: My Model
metadata:
  atomic: true
  state: TRAINED
  modelRuntimeRef: kodexa/base-model-runtime
  type: model
  provider: Amazon Web Services
  providerUrl: https://aws.amazon.com/textract/
  contents:
    - model/*
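
The `contents` entry lists which files are packaged with the model. As a rough local check, the sketch below expands such patterns with Python's `pathlib`, assuming they behave like ordinary shell globs relative to the directory holding model.yaml. The source does not confirm that assumption, and `resolve_contents` and `demo` are hypothetical helpers rather than part of the Kodexa SDK.

```python
import tempfile
from pathlib import Path


def resolve_contents(project_dir, patterns):
    """Return the sorted relative paths of files matched by the glob patterns."""
    root = Path(project_dir)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)


def demo():
    """Expand 'model/*' against a temporary copy of the example layout."""
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "model"
        package.mkdir()
        (package / "__init__.py").write_text("")
        (package / "model.py").write_text("def infer(document):\n    return document\n")
        (Path(tmp) / "notes.txt").write_text("not packaged")
        return resolve_contents(tmp, ["model/*"])
```

Under that assumption, `model/*` picks up the two files inside `model/` and leaves files outside it, such as `notes.txt`, out of the package.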