File input and output

Besides data tables, processes can handle different types of files as input and output.

The same is true for endpoints: they can be configured to accept and return more than just JSON-formatted data.

Default behavior

To better understand advanced input and output formats, let's briefly recap the default behavior.

Usually, you send a JSON-formatted request body and receive a JSON-formatted response, e.g., when calling an endpoint with curl or when testing it.

curl http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score \
    -X POST \
    -H 'Content-Type: application/json' \
    --data '{"data": [{}]}'

The underlying process knows how to handle the JSON input because the Content-Type header explicitly states that the body is in that format. Omitting the Content-Type header has the same effect, since JSON is the default input format. When JSON is the input format, the request body payload is automatically parsed and converted to what the process's input requires: an in-memory table representation leveraging the HDF5 data format. The process is then executed as usual, just as when you run it directly in AI Studio. The key difference is that the input comes from the global process input port (from the outside), not from a Retrieve operator or similar.

The same applies in reverse to global output ports. When connected, the process's result (only the first one) is returned, and the endpoint automatically converts the in-memory table representation to JSON. This JSON is what you receive when you call the endpoint.
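By default, a request body is a table: a JSON object whose "data" key holds a list of row objects, with one key per attribute (the shape used in the curl call above and in the JSON result of the CSV example further down). The following is a minimal sketch of a helper that wraps rows in that envelope; the function name build_table_payload is hypothetical and not part of any product API.

```shell
#!/bin/sh
# Hypothetical helper: join JSON row objects with commas and wrap them in the
# {"data": [...]} envelope used for the default JSON table format.
# The body runs in a subshell, so changing IFS does not leak to the caller.
build_table_payload() (
  IFS=','
  printf '{"data": [%s]}' "$*"
)

# Example usage (URL from the example above; adjust to your deployment).
# The Content-Type header could be omitted, because JSON is the default input format.
# curl http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score \
#     -X POST \
#     -H 'Content-Type: application/json' \
#     --data "$(build_table_payload '{"att1": 1.5}' '{"att1": 2.0}')"
```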

Getting started with arbitrary input and output formats

For arbitrary input formats to work, the process being executed needs to receive the proper input. In addition, the underlying Scoring Agent or Web API Agent needs to convert any output format a process returns to a representation adhering to web standards.

Arbitrary input and output formats use the fil (file) input and output format, which means that non-JSON data is passed into the process as a binary file object.

The process itself is responsible for handling the received file object, so you need to design your process to parse it correctly. For example, if you provide a CSV file as input, use the Read CSV operator and connect it to the process's input port as the first operator.

To control the input and output formats of endpoints, use the HTTP headers Content-Type (for input) and Accept (for output). The behavior adheres to the following rules:

- By default, application/json is used for both input and output, meaning that any POST request requires a body payload in that format (a data table).
- Inputs: When the Content-Type is set to anything other than JSON, e.g., text/csv, the endpoint treats the body as an arbitrary file input.
  - The input port inside the process then receives a fil object, which you need to parse appropriately.
  - The fil object is annotated with the Content-Type header value, so you can use it within the process to determine how to parse the file.
- Outputs: When the process's result is a file object (a fil port connected to the output port), the endpoint auto-detects the file type and sets the Content-Type header of the response accordingly.
  - When the HTTP header Accept is set to a specific value but the auto-detected type differs, a warning is displayed and the auto-detected type is used.
  - If auto-detection does not return the content type you need, e.g., because your client expects a specific content type in the response, you can fine-tune this behavior: use the Annotate operator on the resulting file object and set a custom annotation with the key Content-Type to a valid MIME type. When such an annotation is set, the endpoint returns its value as the HTTP Content-Type header.
  - In the latter case, when you have explicitly annotated the file object with a Content-Type, the (optional) Accept header provided in the request should match the annotated type; otherwise, a warning is printed.
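When sending a file, the client must choose a Content-Type other than JSON, and the process receives that value as an annotation on the fil object. The following is a minimal sketch, assuming you want to derive the header from the file extension; the function name and the mapping table are illustrative assumptions, with application/octet-stream (the value used in the curl examples below) as the fallback.

```shell
#!/bin/sh
# Hypothetical helper: guess a request Content-Type from the file extension.
# Any non-JSON value makes the endpoint treat the body as a file input.
content_type_for() {
  case "$1" in
    *.csv)        echo 'text/csv' ;;
    *.xlsx)       echo 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' ;;
    *.jpg|*.jpeg) echo 'image/jpeg' ;;
    *.png)        echo 'image/png' ;;
    *.json)       echo 'application/json' ;;  # parsed as a data table, not as a file
    *)            echo 'application/octet-stream' ;;
  esac
}

# Example usage (URL from the CSV example below; $DOMAIN is assumed to be set):
# file=/path/to/input.csv
# curl "https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out" \
#     -X POST \
#     -H "Content-Type: $(content_type_for "$file")" \
#     --data-binary @"$file"
```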

Here's an example of how to apply a custom annotation to a file object:

img/annotate.png

The endpoint's Content-Type response header is then set to the value of the annotation with key Content-Type. Here it's text/csv.
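To check which Content-Type the endpoint actually returned, e.g., to confirm that an annotation took effect, you can have curl write the response headers to a file with -D (--dump-header) and read the value from there. The following is a minimal sketch; the helper name response_content_type is hypothetical, and the endpoint path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print the Content-Type value from a header file written
# by `curl -D <file>`. Header names are case-insensitive, HTTP header lines end
# in CRLF, and after redirects the last matching line belongs to the final response.
response_content_type() {
  grep -i '^content-type:' "$1" | tail -n 1 | cut -d':' -f2- | tr -d '\r' | sed 's/^ *//'
}

# Example usage ($DOMAIN and the endpoint path are placeholders):
# curl "https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out" \
#     -X POST \
#     -H 'Content-Type: text/csv' \
#     --data-binary @/path/to/input.csv \
#     -D headers.txt \
#     --output result.bin
# response_content_type headers.txt   # e.g. text/csv if annotated as shown above
```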

The following examples show how to use arbitrary data formats as input and receive a specified output format for endpoints. All of them use the curl command-line utility to demonstrate the API calls, are deployed as Web API endpoints (the URL includes the Web API Group), and assume that no authentication is needed.

Example: reading CSV and returning as Excel (binary file)

Here's an example of a process that takes a CSV file as input and returns an Excel file as output.

img/csv_input.png

The Read CSV operator is connected directly to the process's inp (input) port. When the endpoint is called, data flows from the input port (the file in the request body) to the connected operator. However, designing the process this way is not enough on its own: once the process is deployed, the endpoint must also be called with the proper Content-Type header, i.e., one other than JSON.

As the image shows, the process also needs a Write Excel operator in addition to the Read CSV operator, and its output must be connected to the process's result port.

When a CSV file is provided as file input in the request body, the endpoint passes the file object to the process. Read CSV parses the file object and forwards the resulting data to Write Excel, whose file output determines the process's result format.

The following example shows how to call the endpoint with a CSV file as input and receive an Excel file as output.

curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
    -X POST \
    -H 'Content-Type: application/octet-stream' \
    --data-binary @/path/to/input.csv \
    --output /path/to/output.xlsx
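If the call fails, the file written by --output may contain the server's error response instead of a workbook, because curl saves whatever body it receives. Since .xlsx files are ZIP archives, which start with the bytes "PK\003\004", a quick signature check is a cheap sanity test. The following is a minimal sketch; the helper name is hypothetical, and the check only confirms a ZIP container, not a valid workbook.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file starts with the ZIP signature
# 50 4B 03 04 ("PK\003\004"), which every .xlsx workbook (a ZIP container) has.
looks_like_xlsx() {
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "504b0304" ]
}

# Example usage after the curl call above:
# looks_like_xlsx /path/to/output.xlsx || echo 'Response is not an Excel file' >&2
```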

Example: reading CSV and returning JSON

If you change the example above so that the Read CSV operator is connected directly to the process's result port, JSON is returned instead.

Remember: the process's result determines the output format!

When calling the endpoint, you can omit the location of the output file (--output), because the result is returned as JSON-formatted data.

curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
    -X POST \
    -H 'Content-Type: application/octet-stream' \
    --data-binary @/path/to/input.csv

This call yields the following result (abbreviated):

{
  "data": [
    {
      "att1": -3.9360351555111546,
      "att2": 5.356000658667869,
      "att3": -5.388412850201785,
      "att4": 7.3295106089687385,
      "att5": -1.6675003630680458,
      "att6": "Todd Miller",
      "att7": "Aachen",
      "label": "cluster29"
    }
  ]
}
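Because the process's result decides whether the endpoint answers with JSON or with a file, a script that calls several endpoints might choose how to save each response based on the returned Content-Type (obtained, for example, via curl's -D option). The following is a minimal sketch; the helper name and the mapping are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: choose a file extension for a saved response from its
# Content-Type value. Parameters such as "; charset=utf-8" are stripped first.
extension_for() {
  case "${1%%;*}" in
    application/json) echo 'json' ;;
    text/csv)         echo 'csv' ;;
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) echo 'xlsx' ;;
    image/jpeg)       echo 'jpg' ;;
    image/png)        echo 'png' ;;
    *)                echo 'bin' ;;
  esac
}

# Example usage:
# extension_for 'application/json; charset=utf-8'   # prints json
```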

Example: return an image

To return an image, the process's result needs to be a file object. The following example shows such a process. Because it needs no input, it can be called with the HTTP GET method.

The process randomly fetches a file and returns it.

If your client can display images (a terminal cannot), you see the image directly. With curl, you can save it to a file instead:

curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/no-in-img-out \
  -H 'Accept: image/jpeg' \
  --output /path/to/myimage.jpg

The result is a JPEG image. The Accept header specifies the desired output format, and since the auto-detected type is the same, no warning is displayed.
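To confirm that the saved file really is a JPEG image (and not, say, an error response), you can check its first bytes: JPEG files start with FF D8 FF. The following is a minimal sketch; the helper name looks_like_jpeg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file starts with the JPEG
# signature FF D8 FF (od prints the bytes as lowercase hex).
looks_like_jpeg() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "ffd8ff" ]
}

# Example usage after the GET request above:
# looks_like_jpeg /path/to/myimage.jpg && echo 'Received a JPEG image'
```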

Limits

A 2 GB limit applies to all binary input data per request. To pass file input to the process, the data is temporarily written to disk, and it is cleaned up after execution. If you expect many concurrent requests to your endpoints, make sure to assign enough disk space to the Scoring Agent or Web API Agent for this.
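To avoid sending a request that would exceed this limit, a client can check the file size first. The following is a minimal sketch, assuming the limit means 2 GiB (2^31 = 2147483648 bytes); if your setup counts 2 GB differently, pass the byte count as the second argument. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file is no larger than the upload limit.
# ASSUMPTION: "2GB" is read as 2 GiB (2147483648 bytes); override via $2.
within_upload_limit() {
  [ -f "$1" ] || return 1                  # missing file: fail
  size=$(wc -c < "$1" | tr -d ' ')         # byte count; strip padding some wc versions add
  limit=${2:-2147483648}
  [ "$size" -le "$limit" ]
}

# Example usage before calling an endpoint:
# within_upload_limit /path/to/input.csv || echo 'File exceeds the upload limit' >&2
```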
