File input and output
Besides data tables, processes can handle different types of files as input and output.
The same is true for endpoints. They can be configured to have more than just JSON formatted data as input or output.
Default behavior
To better understand advanced input and output formats, let's briefly recap the default behavior.
Usually, you provide a JSON formatted request body and receive a JSON formatted output, e.g., via curl or when testing an endpoint.
curl http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score \
-X POST \
-H 'Content-Type: application/json' \
--data '{"data": [{}]}'
The underlying process knows how to handle the JSON input, because the Content-Type header explicitly states that the body is in that format. Omitting the Content-Type header yields the same result, as JSON is the default input format. When JSON is the input format, the request body payload is automatically parsed and converted to what the process's input requires (an in-memory table representation leveraging the HDF5 data format). With this, the process is executed as usual, similar to when you execute it in AI Studio directly. The key difference is that the input comes from the global process input port (from the outside, not via a Retrieve operator or the like).
The same applies in reverse to global output ports. When connected, the process's result (first one only) is returned, and the endpoint automatically converts the in-memory table representation to JSON. This is what you receive when you call the endpoint.
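For clients that are scripted rather than driven by curl, the default JSON call could be prepared roughly as follows. This is a minimal Python sketch using only the standard library. The helper name build_json_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_json_request(url, rows):
    """Build a POST request carrying a data table as JSON (the default format)."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # States explicitly that the body is JSON.
        headers={"Content-Type": "application/json"},
    )


request = build_json_request(
    "http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score",
    [{}],
)
# Sending it would look like: urllib.request.urlopen(request).read()
```

The body mirrors the --data payload of the curl call: a "data" key holding a list of rows.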
Getting started with arbitrary input and output formats
For arbitrary input formats to work, the process being executed needs to receive the proper input. In addition, the underlying Scoring Agent or Web API Agent needs to convert any output format a process returns to a representation adhering to web standards.
Arbitrary input and output formats leverage the fil (file) input and output format, which means that non-JSON data is passed into the process as a binary file object.
The process itself is responsible for handling the received file object! This means that you need to design your process to parse that file object correctly. Example: if you provide a CSV file as input, you need to use the Read CSV operator and connect it to the input port as the first operator.
To control input and output formats of endpoints, use the HTTP headers Content-Type (for input) and Accept (for output). The behavior adheres to the following rules:
- By default, application/json is used for input and output, meaning that any POST request requires a body payload in that format (a data table).
- Inputs: When the Content-Type is set to something other than JSON, e.g., text/csv, the endpoint assumes that it's an arbitrary file input.
  - This means that the input port inside the process then provides a fil object and you need to apply proper parsing.
  - The fil object is annotated with the Content-Type header value, so you can use it to determine how to parse the file within the process.
- Outputs: When the process's result is a file object (fil port connected to output port), the endpoint auto-detects what kind of file it is and sets the Content-Type header of the response accordingly.
  - When setting the HTTP header Accept to a specific value, but the auto-detected type differs, a warning is displayed and the auto-detected type is used.
  - This behavior can be further fine-tuned to match your needs if auto-detection is not returning your desired content type, e.g., if your client expects a specific content type from the response. You can use the Annotate operator on the resulting file object and set a custom annotation with key Content-Type to a valid MIME type. When such an annotation is set, the endpoint returns this as the HTTP Content-Type header value.
  - In the latter case, when you explicitly annotated the file object with a Content-Type, the Accept header (optional) provided in the request must match the annotated type, otherwise a 400 Bad Request is returned.
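The rules above can be summarized as a small decision model. The following Python sketch is only an illustration of the described behavior, not the endpoint's actual implementation. The function names are hypothetical, and several details are assumptions: header parameters such as a charset are ignored, types are compared as exact strings (wildcard handling such as */* is not described in the text and not modeled), and a successful response is represented by status 200.

```python
def input_kind(content_type=None):
    """Return 'table' for JSON (or no header), 'file' for anything else."""
    if content_type is None:
        return "table"  # JSON is the default input format
    # Assumption: parameters such as "; charset=utf-8" do not change the kind.
    media_type = content_type.split(";")[0].strip().lower()
    return "table" if media_type == "application/json" else "file"


def resolve_output_type(detected, annotated=None, accept=None):
    """Model the response for a file result.

    Returns a tuple (status, content_type, warning).
    """
    if annotated is not None:
        # An explicit Content-Type annotation replaces auto-detection,
        # but a provided Accept header must match it.
        if accept is not None and accept.lower() != annotated.lower():
            return 400, None, None
        return 200, annotated, None
    if accept is not None and accept.lower() != detected.lower():
        # No annotation and a mismatch: warn, then use the detected type.
        warning = f"Accept '{accept}' differs from detected '{detected}'"
        return 200, detected, warning
    return 200, detected, None
```

For example, resolve_output_type("image/jpeg", accept="image/png") keeps image/jpeg and carries a warning, while the same mismatch against an annotated type yields status 400.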
Here's an example of how to apply a custom annotation to a file object:
The endpoint's Content-Type response header is then set to the value of the annotation with key Content-Type. Here it's text/csv.
The following examples show how to use arbitrary data formats as input and receive a specified output format for endpoints. All of them use the curl command-line utility to demonstrate the API calls, are deployed as Web API Endpoints (the URL includes the Web API Group), and assume that no authentication is needed.
Example: reading CSV and returning as Excel (binary file)
Here's an example of a process that takes a CSV file as input and returns an Excel file as output.
The Read CSV operator is directly connected to the inp port of the process. When the endpoint is called, data flows from the input port (the request's file body) to the connected operator. However, designing a process this way is not enough for it to work. Once the process is deployed, you must also call the endpoint with the proper Content-Type header.
In addition to connecting the Read CSV operator, the following image shows that a Write Excel operator also needs to be added and that its output needs to be connected to the process's result port.
When a CSV is provided as file input in the request body, the endpoint passes the file object to the process. Read CSV parses the file object and forwards the data to Write Excel, which determines the process's result format (file).
The following example shows how to call the endpoint with a CSV file as input and receive an Excel file as output.
curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
-X POST \
-H 'Content-Type: application/octet-stream' \
--data-binary @/path/to/input.csv \
--output /path/to/output.xlsx
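The same call can be prepared from a script. The following Python sketch uses only the standard library. The helper names build_file_request and save_response are hypothetical, and the request is only built here, not sent, since sending requires a deployed endpoint.

```python
import urllib.request
from pathlib import Path


def build_file_request(url, input_path, content_type="application/octet-stream"):
    """Build a POST request whose body is the raw bytes of a file."""
    payload = Path(input_path).read_bytes()
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        # Any non-JSON Content-Type makes the endpoint treat the body as a file.
        headers={"Content-Type": content_type},
    )


def save_response(body, output_path):
    """Write the binary response body (e.g., the Excel file) to disk."""
    target = Path(output_path)
    target.write_bytes(body)
    return target


# Usage against a deployed endpoint (not executed here):
# request = build_file_request(endpoint_url, "/path/to/input.csv")
# with urllib.request.urlopen(request) as response:
#     save_response(response.read(), "/path/to/output.xlsx")
```

Reading the file as bytes corresponds to curl's --data-binary option, which sends the file unmodified.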
Example: reading CSV and returning JSON
When you change the example above so that the Read CSV operator is directly connected to the process's result port, JSON is returned.
Remember: the process's result determines the output format!
When calling the endpoint, the location of the output file can be omitted. The result is returned as JSON formatted data.
curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
-X POST \
-H 'Content-Type: application/octet-stream' \
--data-binary @/path/to/input.csv
This call yields the following result (abbreviated):
{
  "data": [
    {
      "att1": -3.9360351555111546,
      "att2": 5.356000658667869,
      "att3": -5.388412850201785,
      "att4": 7.3295106089687385,
      "att5": -1.6675003630680458,
      "att6": "Todd Miller",
      "att7": "Aachen",
      "label": "cluster29"
    }
  ]
}
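On the client side, such a response can be turned back into rows with any JSON parser. A minimal Python sketch, using a shortened copy of the result above as a stand-in for the real response body:

```python
import json

# Shortened stand-in for the response body returned by the endpoint.
response_body = (
    '{"data": [{"att1": -3.9360351555111546, "att6": "Todd Miller", '
    '"att7": "Aachen", "label": "cluster29"}]}'
)

rows = json.loads(response_body)["data"]  # list of dicts, one per table row
columns = list(rows[0].keys())            # column names of the data table
labels = [row["label"] for row in rows]   # values of a single column
```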
Example: return an image
To return an image, the process's result needs to be a file object. The following example shows a process which does that. It also needs no input, so it can be called with the GET HTTP method.
The process randomly fetches a file and returns it.
If your client (not a terminal) supports visualizing images, you directly see the image when you call the following:
curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/no-in-img-out \
-H 'Accept: image/jpeg' \
--output /path/to/myimage.jpg
The result is a JPEG image. The Accept header specifies the desired output format, and auto-detection returned the same type.
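A scripted client can build the same GET request and run a quick sanity check on the returned bytes. The following Python sketch uses hypothetical helper names. The check relies on the fact that JPEG files start with the bytes FF D8 FF; it does not validate the rest of the image.

```python
import urllib.request

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def build_image_request(url, accept="image/jpeg"):
    """Build a GET request (no body) that asks for a specific output type."""
    return urllib.request.Request(url, method="GET", headers={"Accept": accept})


def looks_like_jpeg(body):
    """Cheap client-side check that the response body starts like a JPEG."""
    return body.startswith(JPEG_MAGIC)


# Usage against a deployed endpoint (not executed here):
# with urllib.request.urlopen(build_image_request(endpoint_url)) as response:
#     body = response.read()
#     assert looks_like_jpeg(body)
```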
Limits
A 2GB limit applies to all binary input data per request. File input data is temporarily swapped to disk to pass it to the process. It's cleaned up after execution. If you expect a lot of concurrent requests to your endpoints, make sure to assign enough disk space to the Scoring Agent / Web API Agent for the swapping.
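A client can check the size of a file before uploading it, and a rough worst-case estimate helps when sizing the agent's disk. The Python sketch below rests on assumptions: "2GB" is read as 2 GiB, the limit is treated as inclusive, and the estimate assumes that every concurrent request swaps its full input at the same time. The function names are hypothetical.

```python
import os

# Assumption: "2GB" is read as 2 GiB; the text does not say which unit is meant.
MAX_BINARY_INPUT_BYTES = 2 * 1024**3


def fits_upload_limit(path, limit=MAX_BINARY_INPUT_BYTES):
    """Return True if the file is small enough to be sent in one request."""
    return os.path.getsize(path) <= limit


def peak_swap_bytes(concurrent_requests, largest_input_bytes):
    """Worst-case disk space if all concurrent requests swap their input at once."""
    return concurrent_requests * largest_input_bytes
```

For example, 50 concurrent requests with inputs of up to 100 MiB each would need about 5 GiB of free disk space in the worst case.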