Creating and managing models is one of the most important capabilities of the Kodexa Platform. Models are the core of the platform and are used to extract data from documents. They are written in Python using the Kodexa SDK.
In its simplest form, a model is a small Python script that receives a Document and returns a Document. The model can be as simple as:
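A minimal sketch of such a model follows. It assumes the entry point is a function named `infer` (the name is an assumption here, not confirmed by this page) and that `Document` is imported from the `kodexa` package. The import is guarded so it is only used for type hints, which keeps the sketch runnable even where the SDK is not installed.

```python
from __future__ import annotations

import logging
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type hints; assumes the Kodexa SDK exposes Document here.
    from kodexa import Document

logger = logging.getLogger(__name__)


def infer(document: Document) -> Document:
    """Receive a Document and return it unchanged.

    The function name `infer` is an assumption made for illustration.
    """
    logger.info("Model received a document")
    return document
```

This model does no extraction at all; it only demonstrates the contract of taking a Document in and handing a Document back.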
You would put this code in a module, i.e.
To deploy the model, we also need to create a model.yaml file. This file describes the model and provides the metadata used to deploy it to the Kodexa Platform.
# A very simple first model that isn't trainable
slug: my-model
version: 1.0.0
orgSlug: kodexa
type: store
storeType: MODEL
name: My Model
metadata:
  atomic: true
  state: TRAINED
  modelRuntimeRef: kodexa/base-model-runtime
  type: model
  provider: Amazon Web Services
  providerUrl: https://aws.amazon.com/textract/
  contents:
    - model/*
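
The `contents` entry above lists a glob pattern, `model/*`. As a rough local check of which files such a pattern picks up, the sketch below uses Python's standard `pathlib` globbing. The helper name `preview_contents` is hypothetical, and the assumption that the platform resolves patterns the same way as `pathlib` is not confirmed by this page.

```python
from pathlib import Path


def preview_contents(root: str, patterns: list[str]) -> list[str]:
    """Return files under `root` matched by the given glob patterns.

    Hypothetical helper: it mirrors pathlib's glob rules, which may differ
    from how the platform itself resolves `contents` entries.
    """
    base = Path(root)
    matched = set()
    for pattern in patterns:
        for path in base.glob(pattern):
            # Skip directories; only files are reported.
            if path.is_file():
                matched.add(path.relative_to(base).as_posix())
    return sorted(matched)


if __name__ == "__main__":
    # List what "model/*" matches relative to the current directory.
    for name in preview_contents(".", ["model/*"]):
        print(name)
```

Under `pathlib` rules, `model/*` matches only the direct children of the `model` directory, so files in nested subdirectories would need a pattern such as `model/**/*`.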