1 Introduction

The Protocols and Structures for Inference (PSI) project aims to develop an architecture for presenting machine learning algorithms, their inputs (data) and outputs (predictors) as resource-oriented RESTful web services1 in order to make machine learning technology accessible to a broader range of people than just machine learning researchers.

Currently, many machine learning implementations (e.g., in toolkits such as Weka, Orange, Elefant, Shogun, and SciKit.Learn) are tied to specific choices of programming language, and data sets to particular formats (e.g., CSV, svmlight, ARFF). This limits their accessibility, since new users may have to learn a new programming language to run a learner or write a parser for a new data format, and their interoperability, requiring data format converters and multiple language platforms. To address these limitations, the aim of the PSI service architecture is to present the main inferential entities – relations, attributes, learners, and predictors – as web resources that are accessible via a common interface. By enforcing a consistent interface for the entities involved in learning, interoperability is improved and irrelevant implementation details can be hidden to promote accessibility.

The purpose of this document is to specify how learning algorithms, data, and predictors can be presented as RESTful web resources. Learning algorithms, data, predictors, and other resources conforming to this specification will be collectively referred to as PSI services. Although examples are used in this specification, this document is not intended as an introduction to or tutorial for programming against PSI services. Tutorials will be made available separately. Also, this document explicitly does not specify how to implement any particular learning algorithms or data sets as PSI services. Reference libraries and example code for implementing PSI services may be released separately as part of the PSI project.

There are two intended audiences for this specification: providers of PSI services and consumers of PSI services. Providers may include machine learning researchers who wish to present their data and learning algorithms as PSI services for others to use. Consumers may include developers who have little background in machine learning but wish to build predictive models for data they may have. We assume both audiences have some background in the use of RESTful web service APIs. In particular, we assume some familiarity with the HyperText Transfer Protocol (HTTP),2 and the JavaScript Object Notation (JSON).3

This document is organised as follows. The remainder of this introduction describes what machine learning problems are, the design decisions that were made to model these problems, and a brief overview of values and schema. Section 2 details the resulting architecture in terms of the main PSI resource types: relations, attributes, learners, and predictors. Two key features of the architecture are explained in detail: the use and composition of attributes – functions for extracting values from data – to enable configurable structured data representation; and a simple but flexible schema language for describing and prescribing the structures used to represent data. The purpose of these two features is to make it easy for human or machine consumers of PSI services to “glue” services together, that is, to readily determine what input representation a PSI service requires and construct a conforming representation from the output of other PSI services. Section 3 presents an example of how a hypothetical implementation of the PSI specification might be used to solve a simple machine learning problem. Section 4 outlines planned revisions to the specification. Appendix A gives details of the PSI schema language and Appendix B defines a number of useful schema that are provided with any PSI implementation.

1.1 Modelling Machine Learning Problems

There is a diverse range of problems within machine learning that can be cast in a data-learner-predictor framework. These include classification, regression, density estimation, ranking, dimensionality reduction, collaborative filtering, structured prediction, and more. As the name of their grouping suggests, each of these problems can be solved by the application of a learning algorithm to a dataset of instances to produce a predictor that can make inferences about additional, previously unseen instances.

The following three scenarios are representative of three common types of machine learning problems – classification, ranking, and probability estimation – and differ in several important ways.

Iris Classification

R.A. Fisher’s famous problem of categorising irises4 into one of three species (setosa, virginica and versicolor) based on measurements of their petal and sepal dimensions is an example of a simple classification problem. The canonical data set for this problem consists of 150 instances represented by feature vectors with four real-valued measurements and a species label.

Real Estate Ranking

Properties in a housing market are described using three features: their suburb name, the number of bedrooms, and their land area. Examples of a buyer’s preferences for properties are available as pairs of houses described by their features, the first being preferred to the second. A learner for this problem is required to infer a scoring function for houses (including new, unseen ones) based on their features after being shown example preferences. The scores are used to construct a total order over houses, with higher scores indicating more preferred houses.

Document Class Probability Estimation

Documents from a corpus are described using a “bag of words” representation: a map from words appearing in the document to their term frequency–inverse document frequency (TF-IDF) weight. Given a collection of documents, each labelled as “spam” (i.e., junk), “ham” (ordinary), or “bacon” (important), a learner is required to estimate the probability over these three categories for new documents.

These three problems all exhibit different prediction types – categories, ranks, and probabilities – and different input structures – numeric feature vectors, pairs of mixed feature vectors, and bag of words. One aim of the PSI project is to provide a framework that is flexible enough to model all of these scenarios in a simple, consistent manner.

1.2 Design Decisions

This section outlines the reasoning behind the key design choices that were made during the development of this specification. At a high level, the main tension was between the desire to develop an architecture that is flexible enough to allow many different types of machine learning problems to be modelled (including the examples above), and simple enough so as not to dissuade people from implementing against it.

1.2.1 Choice of Architectural Style

Due to its conceptual elegance, sound principles, widespread adoption, and ease of interoperability with other web services, the PSI service specification is designed using a Resource-Oriented Architectural style (ROA). In this RESTful style, every interaction between a client and a server can be described as a service Responding to a Request with a Representation of a Resource. By adopting this style, several constraints must be adhered to, which influence the decisions below. These include the identification of resources using URIs (Uniform Resource Identifiers), the use of a small number of request modes (GET, POST, PUT, DELETE), the inspection and manipulation of resources through their representations, and the management of client-server state through linked resources.

Details about the Resource-Oriented Architectural style can be found in the book RESTful Web Services5 by Richardson and Ruby.

1.2.2 Choice of Resources

In the data-learner-predictor style of machine learning problems that this specification aims to formalise, solutions to learning problems are typically described in terms of the relations, learners, and predictors involved. These are therefore natural choices to present as services. However, as described below, the communication between these services relies heavily on attributes and schema which are composed easily when presented as services. Thus, relations, learners, predictors, attributes, and schema are resources within the PSI framework.

Details of these resources can be found in Section 2 below.

1.2.3 Choice of Instance Representation

In resource-oriented architectures, resources are never accessed directly. Instead, representations of the resources are used to form requests and responses to and from these resources. The “common currency” shared by PSI resources is the instance: relations are collections of instances of the same “shape”; learners take relations as input to create predictors; predictors take instances as input to make predictions. In order for these resources to interact they must have a shared representation for instances. In the PSI framework this shared representation is JSON – the JavaScript Object Notation.6 This is a widely used, well-supported, lightweight, easy-to-parse data format that is rich enough to model many common machine learning data types such as dense and sparse feature vectors and matrices, probability distributions, sets, graphs, and text. For this reason, all instances within the PSI service architecture are represented as JSON values.

Details of the JSON specification and syntax can be found at http://json.org.

1.2.4 Use of Attributes and Schema

The variety of data structures used by machine learning algorithms and the flexibility of JSON for modelling them mean there is no single choice of JSON structure that can be used for every learning problem. For example, a support vector machine algorithm may require instances to be represented as feature vectors of real numbers; an algorithm for classifying documents might require that documents are represented as “bags of words”; and a ranking algorithm may require pairs of mixed-type feature vectors. This variety leads to two important design constraints:

  1. It must be possible to easily construct novel structures for representing instances for many different types of learner;
  2. Learners must be able to describe the structure of the instance representation they require and attributes must be able to describe the structure of the values they emit.

The first constraint is met in the PSI framework through the use of attributes. These can be thought of as functions that map instances to values. Like JSON values, attributes can be composed through array and object structures to create new attributes that return more complex values. For example, an attribute that returns an integer and another that returns a string may be composed into an attribute that returns arrays containing one integer and one string. This flexibility is the reason that all instance representations in the PSI service architecture are described using attributes.

Details of attributes and their composition rules can be found in Section 2.4.

Describing the range of values an attribute can produce or a learner can take as input can be done through the use of a schema language. Each schema within a schema language describes a range of values through a set of constraints on the structure those values can have and the range of values within that structure. A value that meets the constraints of a schema is said to validate against that schema. For example, a schema might constrain its values to arrays of integers between 0 and 10 in which case the value [2,3,7] would validate against it while [0,-2,11] would not. The main desiderata for a PSI schema language were that it be:

  1. Expressive enough to describe natural JSON representations of existing machine learning data structures;
  2. Machine interpretable to allow for automated validation;
  3. Simple to read and describe.
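
The validation example given above (arrays of integers between 0 and 10) can be sketched in a few lines of Python. This is only an illustration of what "validating against a schema" means; the function name validates and its parameters are hypothetical and are not part of the PSI specification.

```python
def validates(value, lo=0, hi=10):
    """Return True if value is a list of integers, each within [lo, hi]."""
    if not isinstance(value, list):
        return False
    for item in value:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            return False
        if not lo <= item <= hi:
            return False
    return True


print(validates([2, 3, 7]))    # True
print(validates([0, -2, 11]))  # False: -2 and 11 fall outside 0..10
```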

The JSON schema language7 is an existing schema language for JSON values which uses JSON objects to describe a variety of constraints on JSON values. It meets the first two requirements above but is arguably too verbose to meet the third requirement. The PSI schema language proposed here mitigates this verbosity through a number of “shorthands” for common JSON schema constraints. This means, like JSON schema, the new schema language can be represented using JSON values, which minimises parsing. However, since the shorthand can be readily translated into a subset of JSON schema, its semantics are grounded by those of JSON schema, which assists validation. Thus, the PSI framework uses a custom schema language to describe the values that may be passed to or returned from PSI resources.

An overview of the PSI schema language can be found in Section 1.3. Its details, including how it is compiled to JSON schema, can be found in Appendix A. The JSON schema specification is available at http://tools.ietf.org/html/draft-zyp-json-schema-03.

1.2.5 Composition via References

In order to promote modularity, consistency, and re-use of schema, this specification makes heavy use of, and extends, the reference mechanism of JSON schema. In JSON schema, references allow parts of schema to be defined elsewhere and referred to by URI. The references are resolved when needed via an HTTP GET request.

This specification makes two extensions to JSON schema references: the specification of some standard schema that are associated with short names instead of URIs and resolved by the PSI framework; and a simple mechanism for passing arguments to URIs during reference resolution.

Schema within the PSI framework can refer to other schema via parameterised names and URIs to enable modularity and reuse.

An overview of schema references and their resolution is given in the next section. The details can be found in Appendix A.

1.3 Values and Schema

Values within the PSI framework are used to represent instances and schema when these are required in a request to, or response from, a PSI service.

Values fall into two categories: atomic and structured. Atomic values are integers, (real) numbers, strings, or booleans. Structured values are arrays or objects. An array is an ordered sequence of values (atomic or structured). An object is an unordered collection of properties: key-value pairs where the key is always a string and the value can be structured or atomic.

In this specification, values are indicated using a monospaced font. Examples of values as defined by the JSON standard include: 11 (integer), -36.6 (number), "setosa" (string), true (boolean), ["Ryde", 3, 703.2] (an array), and { "suburb": "Ryde", "bedrooms": 3, "area": 703.2 } (an object).

A complete specification of the JSON syntax used here is available at http://json.org.

A schema is a value that describes the structure of other values. Specifically, each schema is an object whose properties describe constraints which other values must satisfy in order to validate against that schema.

A full description of the PSI schema language is given in Appendix A.

An example of a schema that highlights several of these properties is given below:

{   "/version=":    2,
    "/id":          "$integer",
    "/name": {
        "?first":   "$string",
        "/last":    "$string"
    },
    "/addresses": { 
        "$array": { 
            "allItems": "$http://example.org/schema/address",
            "minItems": 1
        }
    }
}

Each property in this schema defines a constraint determined by the property’s key and value. A property key of the form /KEY indicates that a value that validates against the schema must be an object and must contain a field KEY. A key of the form ?KEY indicates an optional property. If a value S is associated with a key /KEY or ?KEY it indicates that valid values for the schema must have values associated with KEY that are valid for the schema S. A property with /KEY= and value VALUE indicates that valid values must have a property with key KEY with associated value VALUE.

The prefix $ on a string indicates a schema reference. The references "$integer" and "$string" are local schema that are standard within the PSI framework to describe integer and string values, respectively. The "$array" reference denotes a local schema template. It takes in arguments in the form of a JSON object which controls the schema it returns when resolved. References that are URIs are global schema that must be resolved via HTTP requests. References and their resolution are described in Section A.2 below. Local schema and schema templates are described in Appendix B.

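The schema above therefore describes object values with four mandatory fields:

  1. version, whose value must be 2;
  2. id, whose value must be an integer;
  3. name, whose value must be an object with an optional string property first and a mandatory string property last;
  4. addresses, whose value must be an array of at least one item, each valid for the address schema at the given URI.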

A value that would match this schema (given a reasonable definition of the address schema at the URI) is:

{   "version":   2,
    "id":        231,
    "name":      { "first": "Amy", "last": "Jones" },
    "addresses": [ { "number": 14, "street": "Bird St.", "suburb": "Epping" } ] 
}
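
The key prefixes described above (/KEY, ?KEY and /KEY=) can be illustrated with a small checker. The following is a minimal sketch, not the reference implementation: it handles only the local references $integer and $string, does not resolve URI references or schema templates such as $array, and the names LOCAL_SCHEMA and check are hypothetical.

```python
# Checkers for the two local schema references handled by this sketch.
LOCAL_SCHEMA = {
    "$integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "$string": lambda v: isinstance(v, str),
}


def check(schema, value):
    """Validate value against a small, illustrative subset of the shorthand."""
    if isinstance(schema, str):
        # Local references only; URI references are not resolved here.
        return LOCAL_SCHEMA[schema](value)
    if not isinstance(value, dict):
        return False
    for key, sub in schema.items():
        if key.startswith("/") and key.endswith("="):
            name = key[1:-1]  # "/KEY=": property must equal the given value
            if name not in value or value[name] != sub:
                return False
        elif key.startswith("/"):
            name = key[1:]    # "/KEY": mandatory property
            if name not in value or not check(sub, value[name]):
                return False
        elif key.startswith("?"):
            name = key[1:]    # "?KEY": optional property
            if name in value and not check(sub, value[name]):
                return False
        else:
            raise ValueError(f"unsupported schema key: {key!r}")
    return True


schema = {
    "/version=": 2,
    "/id": "$integer",
    "/name": {"?first": "$string", "/last": "$string"},
}

print(check(schema, {"version": 2, "id": 231, "name": {"last": "Jones"}}))  # True
print(check(schema, {"version": 1, "id": 231, "name": {"last": "Jones"}}))  # False
print(check(schema, {"version": 2, "id": 231, "name": {"first": "Amy"}}))   # False
```

The optional first name may be omitted, whereas a wrong version or a missing last name causes validation to fail.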

2 The PSI Service Architecture

The main components of the PSI framework are the relation, learner, and predictor resources. As shown in Figure 1, these resources are central to the three primary activities within the framework: training, prediction, and updating. In all of these activities attributes are required to construct instance representations as JSON values in order to conform to the task, input, and update schema published by learner and predictor resources.

Training a learner involves sending it a request with all the information the learner requires to construct a predictor. The totality of this information is called a task and typically consists of parameters to configure the learner and one or more representations of instances from a relation. The instance representations are expressed through the use of attributes so as to match the structure the learner requires. The learner expresses its parameter and resource requirements through a task schema. Since Figure 1 shows only the flow of data in the PSI architecture, the task schema is not explicitly represented in the figure.

To make a prediction an instance representation is required to match the structure required by the predictor resource. Once again, this is expressed through the use of schema and attributes to construct suitable representations. If a suitable value is given to a predictor as input the predictor will return a predicted value.

Some predictor resources can be updated after they are initially created. Those that can be updated will provide an update schema to express how updating values must be represented before they can be used to update the predictor.

Figure 1: PSI Service Architecture. Resources are indicated via a resource tag. Representations are shaded boxes. Schema for resources are marked using dashed boxes.


The remainder of this section describes all of the resources within the PSI framework, the ways in which they can be called, and their request and response structures.

2.1 Conventions

In the tables below only relative URIs are given. It is assumed that resources have base URIs of the form http://example.org/{RelURI} where RelURI denotes a relative URI. The use of {curly brackets} to denote variable terms within a URI template is used throughout as per the Draft URI Template standard.8

The data sent via PUT and POST requests to an attribute or attribute array resource must be represented in JSON. If a method is described as GET then the arguments passed in via the GET query are assumed to be appropriately URI encoded.9 Type names refer to those types as represented in JSON. In addition to the standard JSON types, the following pseudo-types, whose names appear in italics, are used to represent additional semantics:

The string-valued properties requestType and responseType are common to request and response messages, respectively. Their values may be used to disambiguate requests sent to the same URI using different methods, and also to validate the structure of the rest of the message. When a parameterised message can be sent via GET, the requestType does not need to be included.

2.2 Schema

Schema are resources within the PSI framework since their representation may be requested during schema resolution as described in Section A.2. The response to a GET request to a schema resource is the JSON value representing that PSI schema.

| Purpose | Method | URI | Arguments | Returns |
|---|---|---|---|---|
| Retrieve schema S | GET | /{S} | Schema-dependent or template=true | The PSI schema for S |

Some schema are parameterised in the sense that they can take in arguments and return different variants of the same kind of schema. A description of the parameters a schema takes (if any) can be obtained by setting the template argument in the GET request to true. Schema requested in this way will return a JSON value called a schema template that describes the parameterised values within a schema using strings of the form %ARG, where ARG denotes an argument name.

For example, a schema for numeric values might have a URI at http://example.org/schema/number. When called with no arguments it may return the schema { "type": "number" }. When a GET request is made to http://example.org/schema/number?template=true, the following schema template is returned:

{ "type": "number", "minimum": "%min", "maximum": "%max" }

indicating that the schema resource has two arguments min and max that control the range of values for this schema.

When the same schema resource is called using arguments via a GET to http://example.org/schema/number?min=10 the schema resource returns { "type": "number", "minimum": 10 }, which constrains valid values to be 10 or larger.
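
The relationship between a schema template and the schema returned for a given set of arguments can be sketched as a simple substitution. This is an assumption drawn from the two examples above rather than a rule stated by the specification: the hypothetical function fill_template replaces each "%ARG" placeholder with the supplied argument and drops properties whose argument was not supplied. It handles flat templates only.

```python
def fill_template(template, args):
    """Substitute "%ARG" placeholders; drop properties whose argument is absent."""
    result = {}
    for key, val in template.items():
        if isinstance(val, str) and val.startswith("%"):
            name = val[1:]
            if name in args:
                result[key] = args[name]
        else:
            result[key] = val
    return result


template = {"type": "number", "minimum": "%min", "maximum": "%max"}

print(fill_template(template, {}))           # {'type': 'number'}
print(fill_template(template, {"min": 10}))  # {'type': 'number', 'minimum': 10}
```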

These parameterised schema are used extensively in the pre-defined schema described in Appendix B.

2.3 Relations and Instances

In the PSI framework, the term instance is used to describe a single record of some phenomenon of interest. For example, in the Iris problem, an instance corresponds to a particular flower; in the house ranking problem an instance is a pair of houses ordered by preference; in the document problem an instance is a document. Instances are only ever accessed through their representations and a single instance may support several different representations. For example, a flower might be represented by an array of numbers such as [ 5.1, 3.5, 1.4, 0.2 ] or an object like

{ "sepal": { "length": 5.1, "width": 3.5 }, 
  "petal": { "length": 1.4, "width": 0.2 } }

Abstractly, a relation is a collection of instances which share a common “shape” – that is, they can be represented using the same attributes. Attributes are discussed in more detail below but can be thought of as functions from instances to values. In the Iris example a simple attribute may report the sepal length of an instance while another, more complicated attribute may return an array with all the measurements. Formally then, a relation is a collection of instances such that, if one instance is in an attribute’s domain, so are all other instances in that relation.

2.3.1 Relations as Resources

As a resource, the main function of a relation is descriptive. It must specify how many instances it contains and may provide a URI for a default attribute that can be used to represent instances. Each instance within a relation is referred to via its unique index. For example, the 7th instance of a relation with URI /R is referenced using the URI /R/7. Relations report the number n of instances they contain, and guarantee that indices in the range 1–n refer to instances.

Both the instance count and default attribute are retrieved as part of a GET request to the relation resource. The GET request takes no arguments.

| Purpose | Method | URI | Arguments | Returns |
|---|---|---|---|---|
| Describe relation R | GET | /{R} | None | Description response for R |

The response a relation resource returns to a GET request contains an identifying response type, the URI of the relation that the GET request was sent to, the size of the relation (i.e., the number of instances), and (optionally) the URI of the relation’s default attribute. The default attribute is used to provide some standard way to represent the instances in the relation.

Description Response:

| Property | Type | Required | Description |
|---|---|---|---|
| responseType | string | Y | “relation#description” |
| uri | URI | Y | URI of the relation |
| description | string | N | A human-readable description of the relation |
| size | integer | Y | The number of instances in the relation |
| attribute | URI | N | The default attribute for this relation |
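
Because a relation guarantees that the indices 1 to n refer to instances, the instance URIs can be derived from a description response alone. The sketch below assumes a hypothetical relation at http://example.org/iris with only three instances (for brevity); the function name instance_uris is illustrative.

```python
def instance_uris(description):
    """List the instance URIs implied by a relation description response."""
    base = description["uri"].rstrip("/")
    # Indices run from 1 to size inclusive.
    return [f"{base}/{i}" for i in range(1, description["size"] + 1)]


# Hypothetical description response for a three-instance relation.
description = {
    "responseType": "relation#description",
    "uri": "http://example.org/iris",
    "size": 3,
}

for uri in instance_uris(description):
    print(uri)
```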

2.4 Attributes

Abstractly, an attribute is a function that, when applied to an instance, returns a value. Attributes that return atomic values will be referred to as atomic-valued attributes and those that return structured values will be referred to as structure-valued attributes. Each attribute has an associated schema. Every value returned by an attribute is valid for that attribute’s schema.

Practically, attributes act as the interface between instances and their representation for learning and prediction. Different learning services require different representations of relations in order to extract the information they need to construct predictors. The aim of attributes is to provide a simple, flexible way to build representations of instances that are appropriate for input to a variety of learning and prediction services. Much of this flexibility comes from the ability to compose attributes.

2.4.1 Attribute Composition

As with values, arrays and objects can be used to compose attributes to create structured attributes. Like structured values, a structured attribute can be created via the recursive use of array and object structures that contain other attributes.

Formally, given n attributes A1, …, An, their array composition is the attribute that, when given an instance I, returns the array value [V1, …, Vn] where Vi is the value obtained by applying attribute Ai to instance I. The object composition of the attributes A1, …, An with keywords K1, …, Kn is the attribute that, when given instance I, returns the object value {K1: V1, …, Kn: Vn} where the Vi are as before. The recursive application of these two types of composition can be used to construct more complex structured attributes.

The schema associated with the composition of two or more attributes is the corresponding composition of those attributes’ schema. If A is the array composition of attributes A1, …, An with schema S1, …, Sn then the schema for A will be the array composition of the schema S1, …, Sn. If O is the object composition of the same attributes with keys K1, …, Kn then the schema for O will be the object composition of S1, …, Sn with the same keys. The details of schema composition are given in Section A.1.5.

For example, suppose A1 and A2 are attributes that return the sepal length and sepal width of an iris and both have the schema $number. Furthermore, suppose that the values those attributes return on a particular iris I are 5.1 and 3.5, respectively. The array composition of those attributes would have schema [ $number, $number ] and return the array value [ 5.1, 3.5 ] when applied to I. The object composition of A1 and A2 with keys length and width would have schema { "/length": $number, "/width": $number } and return the object value { "length": 5.1, "width": 3.5 } when applied to I.
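
Treating attributes as plain functions from instances to values, array and object composition can be sketched as follows. The dictionary used as an instance and the names array_compose and object_compose are hypothetical stand-ins for illustration; in PSI, attributes are web resources rather than local functions.

```python
def array_compose(*attrs):
    """Array composition: apply each attribute and collect the values in order."""
    return lambda instance: [a(instance) for a in attrs]


def object_compose(**keyed_attrs):
    """Object composition: apply each attribute and store its value under its key."""
    return lambda instance: {k: a(instance) for k, a in keyed_attrs.items()}


# Hypothetical in-memory stand-in for the iris instance I.
iris = {"sepal_length": 5.1, "sepal_width": 3.5}


def a1(inst):
    return inst["sepal_length"]


def a2(inst):
    return inst["sepal_width"]


print(array_compose(a1, a2)(iris))                 # [5.1, 3.5]
print(object_compose(length=a1, width=a2)(iris))   # {'length': 5.1, 'width': 3.5}

# Compositions are themselves attributes, so they can be nested.
nested = object_compose(sepal=object_compose(length=a1, width=a2))
print(nested(iris))                                # {'sepal': {'length': 5.1, 'width': 3.5}}
```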

2.4.2 Attributes as Resources

Like every resource, an attribute is identified by its URI. There are three types of requests that can be made of an attribute resource via REST calls to its URI: describe, apply, and create. The format of these requests and their responses are described in the subsections below.

2.4.2.1 Describe

When queried with a GET request with no arguments, an attribute resource with URI /{A} returns a description of itself.

| Purpose | Method | URI | Arguments | Returns |
|---|---|---|---|---|
| Describe attribute A | GET | /{A} | None | Description response for A |

The description response contains the URI of the attribute, a short textual description of the attribute and its schema. If the attribute is composed from other attributes, URIs for these sub-attributes are provided in an array. The optional relation property holds a URI for a relation that the attribute can be applied to.

Description Response: 


| Property | Type | Required | Description |
|---|---|---|---|
| responseType | string | Y | “attribute#description” |
| uri | URI | Y | URI of the attribute |
| description | string | N | A short, human-readable description of the attribute |
| schema | schema | Y | Schema describing the output of the attribute |
| relation | URI | N | The default relation for the attribute |
| subattributes | array | N | URIs for sub-attributes if this attribute is structured |

2.4.2.2 Apply

The main function of an attribute is to convert instances and relations into values that can be sent to a learner or predictor resource. For this reason, an attribute resource can accept the URI of an instance via a GET or POST request and return a JSON value representing it. An attribute resource can also accept a relation URI via GET or POST, in which case the attribute is applied to every instance in the relation to create an array of instance representations.

| Purpose | Method | URI | Arguments | Returns |
|---|---|---|---|---|
| Get value of instance I or relation R using A | GET/POST | /{A} | An apply request | Value response for attribute A and instance I or relation R |

An apply request via POST to an attribute resource has the form described below. The symbol Y* in the “Required” column means exactly one of the instance or relation fields is required. If the request is sent via a GET then the requestType field is not required.

Apply Request:

| Property | Type | Required | Description |
|---|---|---|---|
| requestType | string | Y | “attribute#apply” |
| instance | URI | Y* | URI of the instance to apply the attribute to |
| relation | URI | Y* | URI of the relation to apply the attribute to |

The value response to such a request contains the JSON representation of the instance or relation in the value field along with the instance or relation that was passed as input via the apply request. The Y* symbol in the table denotes that exactly one of the instance or relation fields will appear in the response. In the case of a relation, the value is an array. Each item in that array is the representation of an instance from the relation. The URI of the attribute that the request was sent to is also included in the response.

Value Response:


| Property | Type | Required | Description |
|---|---|---|---|
| responseType | string | Y | “attribute#value” |
| value | any | Y | Value produced by applying the attribute |
| attribute | URI | Y | URI of the attribute that was applied |
| instance | URI | Y* | URI of the instance the attribute was applied to |
| relation | URI | Y* | URI of the relation the attribute was applied to |
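
An apply request can be assembled either as a JSON body for POST or as a URI-encoded query for GET. The following sketch only builds the request and sends nothing over the network; the function names and the example URIs are hypothetical. It enforces the "exactly one of instance or relation" rule and omits requestType from the GET form, as permitted above.

```python
from urllib.parse import urlencode


def apply_request(instance=None, relation=None):
    """Build the JSON body of an apply request for POST."""
    if (instance is None) == (relation is None):
        raise ValueError("exactly one of instance or relation is required")
    body = {"requestType": "attribute#apply"}
    if instance is not None:
        body["instance"] = instance
    else:
        body["relation"] = relation
    return body


def apply_get_url(attribute_uri, instance=None, relation=None):
    """Build the equivalent GET URL; requestType is not needed for GET."""
    params = apply_request(instance, relation)
    del params["requestType"]
    return attribute_uri + "?" + urlencode(params)


print(apply_request(relation="http://example.org/iris"))
print(apply_get_url("http://example.org/attr/sepal",
                    instance="http://example.org/iris/7"))
```

The second call prints the attribute URI followed by a query string in which the instance URI has been percent-encoded.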

2.4.2.3 Create

Since attributes can be composed to form new attributes, it is necessary that there be a way of assigning a new URI to these new attributes. This is the purpose of the create request via the REST PUT method. The URI the PUT request is sent to is the desired URI for the new attribute. If the desired URI is already associated with a PSI resource a different URI is chosen for the new attribute and reported in the response to the create request.

| Purpose | Method | URI | Arguments | Returns |
|---|---|---|---|---|
| Create a new attribute A | PUT | /{A} | A create request | Created response for A |

A create request contains a JSON value defining a new attribute to create. This value is either an object (for a new object-composed attribute) or an array (for a new array-composed attribute). Every value in such an object or array is either another value representing an object- or array-composed attribute or a string containing the URI of an existing attribute resource. For example, to create an array composition of the attributes with URIs http://A1 and http://A2, the JSON value to send in the attribute field of the create request would be [ "http://A1", "http://A2" ].

The create request may also contain a textual description of what the new attribute is for. This is presented in the description field in description responses for the new attribute.

Create Request:

Property Type Required Description
requestType string Y “attribute#create”
attribute attribute Y Definition of the attribute
description string N Human-readable description of the new attribute

The response to a create request contains the URI of the newly created attribute. The schema and other information about the newly created attribute can be obtained via a describe request to the returned URI.

Created Response:


Property Type Required Description
responseType string Y “attribute#created”
uri URI Y URI of the new attribute

If the create request was malformed the HTTP client error 400 (Bad Request) is returned. If the server was not able to process the create request for some other reason the HTTP server error 500 (Internal Server Error) is returned.
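
As a rough illustration of the recursive structure of an attribute definition, the following Python sketch checks a candidate definition and wraps it in a create request. The helper names is_attribute_definition and make_attribute_create_request are assumptions made for this example; URIs are only checked to be strings and are not dereferenced.

```python
import json


def is_attribute_definition(value):
    """Check the recursive shape of an attribute definition.

    A definition is a URI string, or an array/object whose values are
    themselves definitions. Hypothetical helper, not part of PSI.
    """
    if isinstance(value, str):
        return True  # URI of an existing attribute (not dereferenced here)
    if isinstance(value, list):
        return all(is_attribute_definition(item) for item in value)
    if isinstance(value, dict):
        return all(is_attribute_definition(item) for item in value.values())
    return False


def make_attribute_create_request(definition, description=None):
    """Build the body of an attribute create request as a Python dict."""
    # The top-level value must be an object or an array composition.
    if not isinstance(definition, (list, dict)):
        raise ValueError("definition must be an object or an array")
    if not is_attribute_definition(definition):
        raise ValueError("definition contains a value that is not a URI or composition")
    request = {"requestType": "attribute#create", "attribute": definition}
    if description is not None:
        request["description"] = description
    return request


# JSON body for the array composition of http://A1 and http://A2.
body = json.dumps(make_attribute_create_request(["http://A1", "http://A2"]))
```

The resulting body would then be sent via PUT to the desired URI using whatever HTTP client the consumer prefers.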

2.5 Learners

A learner is a process for generating predictors from relations. Each learner provides a schema for the input it requires to construct a predictor. A value that is valid for a learner’s schema is called a task and is described below. A task typically consists of learning parameters as well as a relation and attributes to represent instances from the relation in a way that is suitable for the learner. A learner uses the information provided in a task to construct and return a predictor.

2.5.1 Tasks and Task Schema

Algorithms that are presented as PSI services have different restrictions on the type of information they can process. For example, an implementation of naive Bayes or a decision tree learner may be able to process feature vectors that contain categorical values whereas a support vector machine learner can only accept feature vectors containing real numbers. Similarly, an algorithm for solving a regression problem would require labels for instances to be real numbers while a classifier would need categories for labels. Some learners may additionally require values for parameters that control the learning process. For example, a k-nearest neighbour algorithm requires the number of examined neighbours to be given while a support vector machine may require kernel and regularisation parameters to be set.

The PSI schema language provides a flexible way of specifying the kinds of information a learner requires and the way it is structured. As mentioned above, there are essentially two kinds of values required by a learner: parameters and resources. These two types of value are specified and handled differently.

Task parameters can be specified anywhere within a task schema using the PSI schema language. The validation of parameters against those parts of a task schema is handled like any other value validation (see Appendix A). Task resources must be specified within a resources property within a task schema. Each value within the resources property of a task value must be a URI denoting a PSI resource. These resource values are dereferenced before being validated against the resource parts of a task schema. Specifically, each URI within the resources property of a task value is replaced by the response obtained from a GET request to that URI. The returned value is then validated against the corresponding part of the task schema.

As an example, a support vector machine learning algorithm presented as a PSI service for solving classification problems may return a task schema that specifies that the learner requires: an optional regularisation parameter (with a default value of 1), a relation, a source attribute for representing instances in the relation as numeric feature vectors, and a target attribute for representing the class labels for instances as -1 or 1. Since the relation, target attribute, and source attribute are all PSI resources, these must be specified within the resources property of the task schema. Such a task schema might look like the following:

{ "?lambda":   { "$number": { "min": 0, "default": 1 } },
  "/resources": {
      "/relation": "$relation", 
      "/source":   { "$arrayAttribute": { "allItems": "$number" } },
      "/target":   { "$fixedAttribute": { "values": [-1, 1] } } 
  }
}

Here the schema references for $number, $relation, $arrayAttribute, and $fixedAttribute are for pre-defined schema described in Appendix B. A valid task value for this task schema would be the following:

{ "lambda": 2.3,
  "resources": {
      "relation": "http://example.org/data/iris",
      "source":   "http://example.org/attribute/iris/features",
      "target":   "http://example.org/attribute/iris/species"
  }
}

The above example assumes that the values returned by GET requests to the resource URIs are valid for the corresponding schema in the resources section of the task schema. Further examples of task schema can be found in Section 3 below.
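
The dereferencing step for task resources can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name dereference_resources is hypothetical, and fetch stands in for whatever performs the GET request and decodes the JSON response, so that the sketch can run without a network.

```python
import copy


def dereference_resources(task, fetch):
    """Return a copy of `task` with each resource URI replaced by fetch(uri).

    `fetch` is an assumed callable that performs a GET request to the URI
    and returns the decoded JSON response. Parameters outside the
    "resources" property are left untouched.
    """
    resolved = copy.deepcopy(task)
    resources = resolved.get("resources", {})
    for name, uri in list(resources.items()):
        if not isinstance(uri, str):
            raise ValueError("resource %r must be given as a URI string" % name)
        resources[name] = fetch(uri)
    return resolved


# Example with an in-memory stand-in for the web (illustrative data only).
fake_web = {
    "http://example.org/data/iris": {"responseType": "relation#description"},
}
task = {"lambda": 2.3,
        "resources": {"relation": "http://example.org/data/iris"}}
resolved = dereference_resources(task, fake_web.__getitem__)
```

The dereferenced copy, rather than the original task value, is what would then be validated against the resources part of the task schema.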

2.5.2 Learners as Resources

The overall purpose of a learner resource is to take in a task value and return the URI of a trained predictor resource for that task. A describe request can be made via GET to a learner resource to determine the structure of the task the learner requires. Once a suitable task is assembled, a process request can be sent via POST to the learner to construct a predictor resource and return its URI.

2.5.2.1 Describe

A description of a learner resource is obtained by sending a description request via GET to the learner’s URI. No arguments are required for this request.

Purpose Method URI Arguments Returns
Describe learner L GET /{L} None A description response for L

The response to a description request contains the URI of the learner being described, some text describing the learner, and a schema for what is required in the task field of a process request to this learner.

Description Response:

Property Type Required Description
responseType string Y “learner#description”
uri URI Y URI of the learner being described
description string Y Short, human-readable description of learner
taskSchema schema Y Schema specifying format of tasks this learner can process

2.5.2.2 Process

A process request sent via POST to a learner resource is used to start learner training on the task specified in the process request.

Purpose Method URI Arguments Returns
Apply learner L to a task T POST /{L} A learner process request describing T A status response or a processing response

The process request must contain a JSON value representation of a task which must be valid for the schema returned by the learner in the taskSchema field of its description response.

Process Request:

Property Type Required Description
requestType string Y “learner#process”
task object Y Definition of the learning task to be processed; must conform to the schema defined in the taskSchema property of a learner’s description

If training a predictor completes quickly, the learner resource may respond immediately to a process request with a status response. This contains information about the status of the training (e.g., text describing whether it succeeded or failed). The status field may contain any textual information that a user may find informative. If training has completed, the complete field will have the value true and the predictor field will contain the URI for a trained predictor resource.

Status Response:

Property Type Required Description
responseType string Y “learner#status”
uri URI Y The URI of the learner
complete boolean Y true only if the learning process has completed
status string Y Text describing the status of the learning process
predictor URI N The URI of the trained predictor; only available if complete is true.

Since training a predictor can take hours or even days, a learner resource may also respond to a process request with a processing response containing a URI for a temporary processing resource.

Processing Response:

Property Type Required Description
responseType string Y “learner#processing”
uri URI Y The URI of a processing resource

If the URI for a processing resource is polled with a GET request, it returns a status response for the training, as described above. Once training is complete, the status response returned by the processing resource contains the URI for the trained predictor.
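
A client has to handle both kinds of reply to a process request. The following Python sketch shows one way this might be done. The function name train, the callables post and get (which stand in for an HTTP client returning decoded JSON), and the polling interval and limit are all assumptions of this example; the specification does not prescribe a polling strategy.

```python
import time


def train(learner_uri, task, post, get, poll_interval=1.0, max_polls=100):
    """Send a process request and return the URI of the trained predictor.

    `post(uri, body)` and `get(uri)` are assumed callables that perform
    the HTTP request and return the decoded JSON response.
    """
    response = post(learner_uri,
                    {"requestType": "learner#process", "task": task})

    # A processing response only carries the URI of a temporary
    # processing resource, which is polled until training completes.
    if response.get("responseType") == "learner#processing":
        processing_uri = response["uri"]
        for _ in range(max_polls):
            response = get(processing_uri)
            if response.get("complete"):
                break
            time.sleep(poll_interval)

    if response.get("responseType") != "learner#status":
        raise RuntimeError("unexpected response: %r" % (response,))
    if not response.get("complete"):
        raise TimeoutError("training did not complete: " + response["status"])
    if "predictor" not in response:
        # Assumption: a completed run without a predictor is a failure.
        raise RuntimeError("no predictor returned: " + response["status"])
    return response["predictor"]
```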

2.6 Predictors

A predictor can be thought of as an attribute that is constructed by a learner. The key difference between a predictor and an attribute is that a predictor computes values in two stages. First, a conformal attribute maps an instance into an intermediate value, then a summariser reduces the intermediate value into the predicted value. These are described in more detail below.

2.6.1 Description

The details of a predictor can be obtained via an argumentless GET request.

Purpose Method URI Arguments Returns
Describe predictor P GET /{P} None A description response for P

The description response has some overlap with the description response for an attribute. It is guaranteed to contain the URI of the predictor resource that the request was sent to, the conformal attribute for the predictor, and a schema describing the structure of the predicted values the predictor will output. Optionally, the description can also contain a textual description of the predictor, an object describing the provenance of the predictor (e.g., the learner that created it, when it was created, etc.), and an update schema. The update schema is described in more detail below.

Description Response:

Property Type Required Description
responseType string Y “predictor#description”
uri URI Y URI of the predictor being described
description string N Short, human-readable description of predictor
provenance object N Structure describing how this predictor was created
attribute attribute Y The conformal attribute for the predictor
schema schema Y The schema for the output (i.e., predictions) of the predictor
updateSchema object N Schema defining the structure of update objects used in update requests

2.6.2 Instance Prediction

Predictors are like attributes in that they map instances to values. The request for making predictions in this way is identical to the apply request described for attributes in Section 2.4.2.2 above.

Purpose Method URI Arguments Returns
Get value of instance I or relation R using predictor P GET/POST /{P} An apply request Prediction for predictor P and instance I or relation R

The response to the above request is of the same type as the value response described in the Attributes section above.

2.6.3 Rebinding

The reason predictors are described in terms of a conformal attribute and a summariser is to allow for the application of predictors to instances other than those from the relation they were trained on. To apply a predictor to a new relation, it is necessary to replace the conformal attribute with a new one. The new attribute must be compatible with the original attribute in the sense that every value the new attribute returns must validate against the schema for the original attribute.

For instance, if another dataset describing iris flowers becomes available, but instances are stored as objects with named properties rather than arrays of measurements, a new attribute may be used to map this object structure to the array of numbers expected by the predictor.

New predictors are created using a create request sent via a PUT to the desired URI for the new predictor.

Purpose Method URI Arguments Returns
Create a new predictor P by rebinding P’ with attribute A PUT /{P} A create request A predictor#created response for P

The attribute that is used to replace a predictor’s conformal attribute is called the binding attribute and is specified as part of a create request along with the URI of the predictor that is to have its conformal attribute replaced by the binding attribute.

Create Request:

Property Type Required Description
requestType string Y “predictor#create”
predictor URI Y URI of the predictor P’ to rebind
binding attribute Y The attribute to use to rebind the predictor

The response to a create request contains the URI of the newly created predictor. If the desired URI specified in the create request was not available, a URI that was available is returned in the created response.

Created Response:

Property Type Required Description
responseType string Y “predictor#created”
uri URI Y URI of the new predictor

A description request sent to the newly created predictor will return a description response with the attribute property equal to the binding attribute in the create request and the same value in the schema property as the schema for the predictor in the create request.
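
The two-stage view of a predictor, and what rebinding changes, can be modelled in a few lines of Python. This is a conceptual sketch only: the Predictor class, the toy summariser (a fixed threshold on petal length rather than a trained model), and the from_object attribute are all hypothetical and say nothing about how a PSI service is implemented.

```python
class Predictor:
    """A predictor modelled as a conformal attribute plus a summariser."""

    def __init__(self, attribute, summariser):
        self.attribute = attribute    # instance -> intermediate value
        self.summariser = summariser  # intermediate value -> prediction

    def apply(self, instance):
        return self.summariser(self.attribute(instance))

    def rebind(self, binding):
        # Rebinding swaps the conformal attribute and keeps the summariser.
        return Predictor(binding, self.summariser)


def toy_summariser(vector):
    # Stand-in for a trained model: thresholds the third item (petal length).
    return "setosa" if vector[2] < 2.5 else "not setosa"


def from_object(instance):
    # Binding attribute: maps named properties to the expected array.
    return [instance["sepal"]["length"], instance["sepal"]["width"],
            instance["petal"]["length"], instance["petal"]["width"]]


# The original predictor expects arrays of four measurements.
original = Predictor(lambda instance: instance, toy_summariser)
# The rebound predictor accepts objects with named properties instead.
rebound = original.rebind(from_object)
```

In this model the binding attribute is compatible because every value it returns is an array of four numbers, which is what the original conformal attribute produced.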

2.6.4 Raw Prediction

As well as rebinding, predictors can also be given “raw” values to predict from. Raw values are values that may not necessarily be a representation of an instance in a relation. Prediction from raw values can be thought of as accessing the summarising part of a predictor directly. In the PSI framework, this is done by a predict request sent via a POST to the predictor’s URI.

Purpose Method URI Arguments Returns
Predict using P on instance value I POST /{P} A predict request for I A prediction response for P and I

A predict request must contain a value that is valid for the schema of the predictor’s conformal attribute.

Predict Request:

Property Type Required Description
requestType string Y “predictor#predict”
value any Y A value conforming to predictor’s conformal attribute

A predictor resource returns a prediction response to a predict request that contains a value valid for the schema given in the predictor’s description response.

Prediction Response:

Property Type Required Description
responseType string Y “predictor#prediction”
prediction any Y A value containing the predictor output

2.6.5 Updating Predictors

Some predictors support being modified in response to additional training examples. Such predictors are called updatable predictors and expose an additional interface through which these extra training examples can be given.

Any new training example that is to be used to update a predictor is called an update value. Update values must be valid for the schema returned in the updateSchema property of the description response for that predictor. An update value is sent as part of an update request via POST to the predictor’s URI.

Purpose Method URI Arguments Returns
Update predictor P with the value V POST /{P} An update request with value V An updated response

Update Request:

Property Type Required Description
requestType string Y “predictor#update”
update object Y Object containing a value that is valid for this predictor’s updateSchema

An updated response to an update request contains the URI for the updated predictor. This may be the same as the predictor URI that the update request was sent to, or it may be a new URI. The latter case may be useful if the implementation of the PSI framework wants predictor resources to be immutable yet still allow for incremental training.

Updated Response:

Property Type Required Description
responseType string Y “predictor#updated”
uri URI Y URI for the updated predictor
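
Because the updated response may name a different URI, a client should continue with the returned URI rather than the one it sent the request to. A minimal Python sketch follows; the function name update_predictor and the post callable (standing in for an HTTP client that returns decoded JSON) are assumptions of this example.

```python
def update_predictor(predictor_uri, update_value, post):
    """Send an update request and return the URI of the updated predictor.

    `post(uri, body)` is an assumed callable that performs the POST and
    returns the decoded JSON response.
    """
    response = post(predictor_uri,
                    {"requestType": "predictor#update", "update": update_value})
    if response.get("responseType") != "predictor#updated":
        raise RuntimeError("unexpected response type: %r"
                           % response.get("responseType"))
    # May equal predictor_uri, or be a new URI if predictors are immutable.
    return response["uri"]
```

A caller would typically rebind its own reference, e.g. predictor_uri = update_predictor(predictor_uri, update_value, post), so that later requests go to the updated predictor.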

3 Example Usage

This section presents how a hypothetical set of PSI services might be interacted with. This includes: examining a relation consisting of iris instances using its default attribute; examining a learner to determine its task schema; creating a new attribute for the relation in order to construct an appropriate task for the learner; examining the resultant predictor; predicting with the predictor; updating the predictor; binding the predictor with a new attribute so as to make it conform to a different relation.

The example calls to PSI resources are presented using the following schema, where each part is enclosed in angle brackets < >:

<HTTP method> <Example resource's URI>
<JSON encoded request object, if applicable>
----
<HTTP response code and label>
<JSON encoded response object, if applicable>

3.1 Examining a relation using its default attribute

A version of the Iris data set is made available as a PSI relation at the URI http://example.org/data/iris. A description of the relation is obtained using a GET request to that URI.

GET http://example.org/data/iris
----
200 OK
{
    "responseType":     "relation#description",
    "uri":              "http://example.org/data/iris",
    "description":      "The iris data set, courtesy of Sir R. A. Fisher",
    "size":             150,
    "attribute":        "http://example.org/attribute/iris"
}

This description response shows that the relation has 150 instances and they can be represented using the default attribute at http://example.org/attribute/iris. A GET request to the attribute URI reveals its description, schema, default relation, and sub-attributes:

GET http://example.org/attribute/iris
----
200 OK
{
    "responseType": "attribute#description",
    "uri":          "http://example.org/attribute/iris",
    "description":  "A structured attribute for presenting iris dimensions",
    "relation":     "http://example.org/data/iris",
    "schema":       {
        "/sepal": { "/length": "$number", "/width": "$number" },
        "/petal": { "/length": "$number", "/width": "$number" },
        "/species": { "enum": [ "setosa", "versicolor", "virginica" ] } },
    "subattributes": [
        "http://example.org/attribute/iris/sepal",
        "http://example.org/attribute/iris/petal",
        "http://example.org/attribute/iris/species"
    ]
}

From this description it is clear that this attribute returns values that are objects with sepal and petal properties, and that the values for those properties are objects with length and width properties that must contain numeric values. It also shows that this attribute returns values with a species property that can take on the string values “setosa”, “versicolor”, or “virginica”.

The description also shows that this structured attribute has three sub-attributes, accessible at http://example.org/attribute/iris/sepal, http://example.org/attribute/iris/petal, and http://example.org/attribute/iris/species. GET requests to these URIs can be used to obtain more information. For example, the petal sub-attribute responds with:

GET http://example.org/attribute/iris/petal
----
200 OK
{
    "responseType": "attribute#description",
    "uri":          "http://example.org/attribute/iris/petal",
    "relation":     "http://example.org/data/iris",
    "schema":       { "/length": "$number", "/width": "$number" },
    "subattributes": [
        "http://example.org/attribute/iris/petal/length",
        "http://example.org/attribute/iris/petal/width"
    ]
}

The first instance in the relation has URI http://example.org/data/iris/1. Applying the default attribute for the relation to this instance via a GET request looks like this (note that the instance URI is URL encoded using the standard method for URI query strings):

GET http://example.org/attribute/iris?instance=http%3A%2F%2Fexample.org%2Fdata%2Firis%2F1
----
200 OK
{
    "responseType": "attribute#value",
    "value":        {
        "sepal": { "length": 5.1, "width": 3.5 },
        "petal": { "length": 1.4, "width": 0.2 },
        "species": "setosa"
    },
    "attribute":    "http://example.org/attribute/iris",
    "instance":     "http://example.org/data/iris/1"
}
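
The encoding of the instance URI can be reproduced with a standard library routine. The Python sketch below builds the request URL used above; the helper name apply_url is an assumption, and the query parameter name instance is taken from the example request.

```python
from urllib.parse import urlencode


def apply_url(attribute_uri, instance_uri):
    """Build the URL for a GET apply request to an attribute."""
    # urlencode percent-encodes ':' and '/' in the query value.
    return attribute_uri + "?" + urlencode({"instance": instance_uri})


url = apply_url("http://example.org/attribute/iris",
                "http://example.org/data/iris/1")
# url == "http://example.org/attribute/iris?instance=http%3A%2F%2Fexample.org%2Fdata%2Firis%2F1"
```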

Applying the species attribute to the entire Iris relation yields an array of string values (the ellipsis denotes 145 elided values in the array):

GET http://example.org/attribute/iris/species?relation=http%3A%2F%2Fexample.org%2Fdata%2Firis
----
200 OK
{
    "responseType": "attribute#value",
    "value":        [ 
        "setosa", "setosa", "setosa", ..., "virginica", "virginica"
    ],
    "attribute":    "http://example.org/attribute/iris/species",
    "relation":     "http://example.org/data/iris"
}

3.2 Examining a learner

A simple k-nearest neighbour learning algorithm is made available as a PSI resource with URI http://example.org/learner/knn. A GET request to this URI gives the following response:

GET http://example.org/learner/knn
----
200 OK
{
    "responseType": "learner#description",
    "uri":          "http://example.org/learner/knn",
    "description":  "A k-nearest neighbour algorithm that takes feature vectors as input",
    "taskSchema":   {
        "?k": { "$integer": { "default": 1 },
                "description": "The number of nearest neighbours to examine" },
        "/resources": {
            "/relation": "$relation",
            "/target":   { "$nominalAttribute": { "allItems": "$string" } },
            "/source":   { "$arrayAttribute": { "allItems": "$numericValue" } }
        }
    }
}

The above description shows (with reference to the pre-defined schema in Appendix B) that the learner requires a relation URI, a target attribute that returns string values to be interpreted as nominal values, and a source attribute that returns arrays of numbers (i.e., feature vectors). The learner can also take an optional parameter k which controls the number of nearest neighbours to consider. By default, this is set to 1.

3.3 Constructing a task

To build a task for the knn learner that will learn to classify irises, the information provided by the default attribute for the iris relation needs to be re-organised to present the iris dimensions as an array of numbers. This can be done by creating a new attribute using a create request sent via a PUT to a new URI:

PUT http://example.org/attribute/iris/custom
{
    "requestType":  "attribute#create",
    "description":  "A feature vector representation of iris dimensions",
    "attribute": [
        "http://example.org/attribute/iris/sepal/length",
        "http://example.org/attribute/iris/sepal/width",
        "http://example.org/attribute/iris/petal/length",
        "http://example.org/attribute/iris/petal/width"
    ]
}
----
200 OK
{ 
    "responseType": "attribute#created",
    "uri":          "http://example.org/attribute/iris/custom2"
}

Note that the originally requested URI for the PUT request is not the same as the one in the response. This could be because the requested URI was not available so the service found a similar one that was available and used it instead.

A GET request to the new URI reveals the schema of the newly created attribute:

GET http://example.org/attribute/iris/custom2
----
200 OK
{
    "responseType": "attribute#description",
    "uri":          "http://example.org/attribute/iris/custom2",
    "description":  "A feature vector representation of iris dimensions",
    "relation":     "http://example.org/data/iris",
    "schema":       [ "$number", "$number", "$number", "$number" ],
    "subattributes": [
        "http://example.org/attribute/iris/custom2/1",
        "http://example.org/attribute/iris/custom2/2",
        "http://example.org/attribute/iris/custom2/3",
        "http://example.org/attribute/iris/custom2/4"
    ]
}

The above response shows that the schema for the newly created attribute is an array of numbers. The default relation was inherited from the original attributes. Also, new sub-attributes are shown that access each of the items in the array values returned by the new attribute.
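
To make the effect of an array composition concrete, the following Python sketch evaluates a composed attribute definition against the value of the first iris instance. It is a toy model under stated assumptions: apply_composition and local_leaf are hypothetical names, and leaf attributes are resolved by treating the URI suffix as a path into a local value rather than by calling a PSI service.

```python
BASE = "http://example.org/attribute/iris/"


def local_leaf(uri, instance_value):
    """Toy leaf evaluator: treat the URI suffix as a path into the value."""
    if not uri.startswith(BASE):
        raise ValueError("unknown attribute: " + uri)
    value = instance_value
    for key in uri[len(BASE):].split("/"):
        value = value[key]
    return value


def apply_composition(definition, apply_leaf, instance_value):
    """Evaluate an array- or object-composed attribute definition."""
    if isinstance(definition, str):
        return apply_leaf(definition, instance_value)
    if isinstance(definition, list):
        return [apply_composition(d, apply_leaf, instance_value)
                for d in definition]
    if isinstance(definition, dict):
        return {k: apply_composition(d, apply_leaf, instance_value)
                for k, d in definition.items()}
    raise TypeError("not an attribute definition: %r" % (definition,))


# Value of the default attribute for http://example.org/data/iris/1.
first_iris = {"sepal": {"length": 5.1, "width": 3.5},
              "petal": {"length": 1.4, "width": 0.2},
              "species": "setosa"}
custom = [BASE + "sepal/length", BASE + "sepal/width",
          BASE + "petal/length", BASE + "petal/width"]
vector = apply_composition(custom, local_leaf, first_iris)
# vector == [5.1, 3.5, 1.4, 0.2]
```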

3.4 Training a predictor

Since the newly created attribute returns arrays of numbers, it is valid for the source part of the task schema in the knn learner’s description shown above. The following process request can be sent to the knn learner via a POST:

POST http://example.org/learner/knn
{
    "requestType":  "learner#process",
    "task": {
        "k":            3,
        "resources": {
            "relation":     "http://example.org/data/iris",
            "source":       "http://example.org/attribute/iris/custom2",
            "target":       "http://example.org/attribute/iris/species"
        }
    }
}
----
200 OK
{
    "responseType": "learner#status",
    "uri":          "http://example.org/learner/knn",
    "complete":     true,
    "status":       "Training successful; processed 150 instances",
    "predictor":    "http://example.org/predictor/20111130210543"
}

K-nearest neighbour algorithms are “lazy” algorithms in the sense that training consists of merely memorising the instances. This means the training time for this task is very short and so the learner is able to return a status response with a URI for the predictor immediately, rather than return a temporary processing resource for polling.

3.5 Examining a predictor

A GET request to the URI of the new predictor results in a description response:

GET http://example.org/predictor/20111130210543
----
200 OK
{
    "responseType": "predictor#description",
    "uri":          "http://example.org/predictor/20111130210543",
    "description":  "kNN trained predictor",
    "provenance":   {
        "learner":  "http://example.org/learner/knn",
        "task":     {
            "k":            3,
            "resources": {
                "relation":     "http://example.org/data/iris",
                "source":       "http://example.org/attribute/iris/custom2",
                "target":       "http://example.org/attribute/iris/species"
            }
        },
        "created":  "2011-11-30 21:05:43"
    },
    "attribute":    "http://example.org/attribute/iris/custom2",
    "schema":       { "enum": [ "setosa", "versicolor", "virginica" ] },
    "updateSchema": { 
        "/target":  { "enum": [ "setosa", "versicolor", "virginica" ] },
        "/source":  {
            "$array": { "items": [ "$number", "$number", "$number", "$number" ] }
        }
    }
}

This response gives a lot of detail about the predictor and how to use it. The optional provenance property contains an object that gives the learner URI and task that was used to create this predictor. The attribute property gives the predictor’s conformal attribute. In this case it is the custom attribute built in the examples above. The schema shows that this predictor’s output values are one of three possible string values: “setosa”, “versicolor”, or “virginica”. Finally, the presence of the updateSchema property indicates this predictor can be updated with additional training instances. The representation of updating instances must be valid for the schema provided in the updateSchema property.

3.6 Using a predictor

To use a predictor, a test instance must be presented as a value that is valid for the schema of the predictor’s conformal attribute. In this case the conformal attribute is the same as the custom attribute with URI http://example.org/attribute/iris/custom2 that was described above. The schema for this attribute is [ "$number", "$number", "$number", "$number" ] so the value [ 6.1, 2.1, 4.1, 1.7 ] would be an appropriate one to request a prediction for in the following way:

POST http://example.org/predictor/20111130210543
{
    "requestType":  "predictor#predict",
    "value":        [ 6.1, 2.1, 4.1, 1.7 ]
}
----
200 OK
{
    "responseType": "predictor#prediction",
    "uri":          "http://example.org/predictor/20111130210543",
    "instance":     [ 6.1, 2.1, 4.1, 1.7 ],
    "prediction":   "versicolor"
}

In this case the predictor returned a value of "versicolor" for the given iris dimensions, along with the predictor’s URI and the instance that the predictor was applied to.

3.7 Updating a predictor

If a new iris was found, manually classified, and the appropriate measurements made, the predictor created above could be updated to take into account this instance. To do so the instance and its classification would need to be represented as a value that is valid for the predictor’s update schema.

In this case, the schema to conform to for updates is:

{   "/target":  { "enum": [ "setosa", "versicolor", "virginica" ] },
    "/source":  {
        "$array": { "items": [ "$number", "$number", "$number", "$number" ] }
    }
}

If the new iris was determined to be a “virginica” with sepal length of 6.4, a sepal width of 3.1, a petal length of 6.5, and a petal width of 2.1, this information would be presented to the predictor in the following update request:

POST http://example.org/predictor/20111130210543
{
    "requestType":  "predictor#update",
    "update": {
        "target":   "virginica",
        "source":   [ 6.4, 3.1, 6.5, 2.1 ]
    }
}
----
200 OK
{
    "responseType": "predictor#updated",
    "uri":  "http://example.org/predictor/20111130210543"
}

The response returns the URI for the updated predictor which, in this case, is the same as the original predictor’s URI. A description request to this URI shows how the predictor has been updated:

GET http://example.org/predictor/20111130210543
----
200 OK
{
    "responseType": "predictor#description",
    "uri":          "http://example.org/predictor/20111130210543",
    "description":  "kNN trained predictor",
    "provenance":   {
        "learner":  "http://example.org/learner/knn",
        "task":     {
            "k":            3,
            "resources": {
                "relation":     "http://example.org/data/iris",
                "source":       "http://example.org/attribute/iris/custom2",
                "target":       "http://example.org/attribute/iris/species"
            }
        },
        "created":  "2011-11-30 21:05:43",
        "updated":  "2011-11-30 21:12:34"
    },
    "attribute":    "http://example.org/attribute/iris/custom2",
    "schema":       { "enum": [ "setosa", "versicolor", "virginica" ] },
    "updateSchema": { 
        "/target":  { "enum": [ "setosa", "versicolor", "virginica" ] },
        "/source":  {
            "$array": { "items": [ "$number", "$number", "$number", "$number" ] }
        }
    }
}

In this case, the only thing that has changed is the addition of the updated property to the provenance part of the response. This is not a requirement of the PSI specification and other predictors may choose to represent changes due to updates in a different manner.

3.8 Adapting a predictor to a new relation

One important use of trained predictors is as attributes for new relations.

Suppose an online flower retailer wanted to automatically classify irises it had in its stock database using the predictor constructed above. It could do so by providing an attribute that converts instances in the database into values that conform to the predictor’s input schema and binding that attribute to the predictor.

The online flower retailer’s database has sepal and petal measurements for its irises but these are accessed through the following attribute:

GET http://flowers.com/data/irises/dimensions
----
200 OK
{
    "responseType": "attribute#description",
    "uri":          "http://flowers.com/data/irises/dimensions",
    "description":  "Petal and sepal dimensions as [ length, width ]",
    "schema":   {
        "/sepal": [ "$number", "$number" ],
        "/petal": [ "$number", "$number" ]
    },
    "subattributes": [
        "http://flowers.com/data/irises/dimensions/sepal",
        "http://flowers.com/data/irises/dimensions/petal"
    ]
}

By inspecting the above description, it is apparent that the sub-attributes can be organised into an array of four number-valued attributes which conforms to the schema of the predictor’s conformal attribute. The following create request rebinds the predictor, constructing a new predictor that predicts the species of irises in the retailer’s database:

PUT http://flowers.com/attributes/species
{
    "requestType":  "predictor#create",
    "predictor":    "http://example.org/predictor/20111130210543",
    "binding":      [
        "http://flowers.com/data/irises/dimensions/sepal/1",
        "http://flowers.com/data/irises/dimensions/sepal/2",
        "http://flowers.com/data/irises/dimensions/petal/1",
        "http://flowers.com/data/irises/dimensions/petal/2"
    ]
}
----
200 OK
{
    "responseType": "predictor#created",
    "uri":          "http://flowers.com/attributes/species"
}

4 Future Revisions

The following will be the subject of future revisions of the PSI specification:


5 Appendix A: The PSI Schema Language

This appendix provides technical definitions of the PSI schema language and details of how it is compiled down to the JSON schema language.

5.1 Schema

A schema is a JSON value that describes the structure of other JSON values. Specifically, each schema is an object composed of zero or more properties with predefined keys. Each of a schema’s properties defines a constraint on the values that are valid for that schema. Specifically, we will say a value V is valid for a schema S if V satisfies all the constraints represented by properties in S.

An important feature of schema is that, since schema are also JSON values, their structure can be described and validated by a “meta-schema” that is handled in exactly the same way as a normal schema.

The following constraints are a subset of those defined by JSON Schema:10

5.1.1 Type and Range Constraints

The following constraint restricts the type of JSON value that is considered valid. The variable T denotes either a string describing a JSON type (e.g., integer, number, string, boolean, array, object), or is an array of such strings, denoting a disjunction of types.

"type": T
The value T in this type constraint can be one of the following six strings: "integer", "number", "string", "boolean", "array", or "object"; or T can be an array containing a subset of those strings. If T is a string, the constraint is satisfied only by values V of the corresponding type (e.g., if T is "integer" then V must be an integer). If T is an array, the constraint is satisfied only by values whose type is one of the types in that array (e.g., if T is ["integer", "string"] then V must be an integer or a string).

The next two constraints can be used to limit the allowable range of a numeric value (i.e., a JSON number or integer). The variable N denotes a value that is a number or an integer.

"minimum": N

This minimum value constraint is satisfied by any numeric value greater than or equal to N.

"maximum": N

This maximum value constraint is satisfied by any numeric value less than or equal to N.

The next constraint is only satisfied by a finite number of specific values. The variable A denotes the array of values that can be matched.

"enum": A
This enumeration constraint is satisfied by any value V that appears in the array A.
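
The type, range, and enumeration constraints can be illustrated with a small checker over decoded JSON values. This is a minimal Python sketch rather than a reference implementation: the function names are hypothetical, and it assumes that the range constraints are ignored for non-numeric values, which this specification does not state.

```python
def json_type_matches(value, type_name):
    """Return True if a decoded JSON value has the named JSON type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so it is handled first.
        return type_name == "boolean"
    if type_name == "integer":
        return isinstance(value, int)
    if type_name == "number":
        return isinstance(value, (int, float))
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "object":
        return isinstance(value, dict)
    return False


def check_basic(value, schema):
    """Check only the "type", "minimum", "maximum" and "enum" constraints."""
    if "type" in schema:
        types = schema["type"]
        if isinstance(types, str):
            types = [types]  # a single string is a one-element disjunction
        if not any(json_type_matches(value, t) for t in types):
            return False
    # Assumption: range constraints only apply to numeric values.
    is_numeric = json_type_matches(value, "number")
    if "minimum" in schema and is_numeric and value < schema["minimum"]:
        return False
    if "maximum" in schema and is_numeric and value > schema["maximum"]:
        return False
    # Python equality is used here; note that it treats True and 1 as equal.
    if "enum" in schema and value not in schema["enum"]:
        return False
    return True
```

For instance, check_basic(5, {"type": "integer", "minimum": 0, "maximum": 10}) holds, whereas the value 11 fails the same schema.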

5.1.2 Array Constraints

The following constraints restrict the size and contents of a JSON array. The variable A denotes an array of schema; S denotes a schema; and N denotes an integer value.

"minItems": N

This minimum items constraint is satisfied by any array value with at least N items.

"maxItems": N

This maximum items constraint is satisfied by any array value with at most N items.

"allItems": S

This all items constraint is satisfied by any array value whose items are all valid for the schema S.

"items": A

This items constraint is satisfied by any array value V with the same number of items as the array A and each item in V is valid for the schema in A at the corresponding position.
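
The four array constraints can be sketched in Python as follows. The names check_array and validate_type are hypothetical; the item validator is passed in by the caller, and the sketch assumes that array constraints are not satisfied by non-array values, which the definitions above leave open.

```python
def check_array(value, schema, validate):
    """Check minItems, maxItems, allItems and items on a decoded JSON value.

    validate(item, item_schema) -> bool is supplied by the caller.
    """
    if not isinstance(value, list):
        return False  # assumption: only arrays can satisfy array constraints
    if "minItems" in schema and len(value) < schema["minItems"]:
        return False
    if "maxItems" in schema and len(value) > schema["maxItems"]:
        return False
    if "allItems" in schema:
        # Every item must be valid for the single schema S.
        if not all(validate(v, schema["allItems"]) for v in value):
            return False
    if "items" in schema:
        # Positional matching: same length, item i valid for schema i.
        item_schemas = schema["items"]
        if len(value) != len(item_schemas):
            return False
        if not all(validate(v, s) for v, s in zip(value, item_schemas)):
            return False
    return True


def validate_type(item, item_schema):
    """Toy item validator: understands only "number" and "string" types."""
    kinds = {"number": (int, float), "string": (str,)}
    if isinstance(item, bool):
        return False
    return isinstance(item, kinds[item_schema["type"]])
```

With these definitions, the pair [5.1, 3.5] is valid for { "minItems": 2, "maxItems": 2, "allItems": { "type": "number" } }.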

5.1.3 Object Constraints

In the following constraint definitions, the variable K denotes a key, which can be any JSON string except those with the prefix / or ? or the suffix = or *; the variable S denotes an arbitrary schema; and C denotes any JSON value.

"/K": S

This mandatory property constraint is satisfied by any value V that is an object containing a key K with associated value A such that A is valid for the schema S.

"?K": S

This optional property constraint is satisfied by any value V that is an object and does not contain the key K. It is also satisfied by any value V that is an object with key K, provided its associated value A is valid for the schema S.

"/K=": C

This mandatory property value constraint is satisfied by any value V that is an object containing the key K with associated value equal to C.

"?K=": C

This optional property value constraint is satisfied by any value V that is an object and does not contain the key K. It is also satisfied by any value V that is an object with key K, provided its associated value is equal to C.

"/*": S

This additional properties constraint will only validate a value V that is an object such that each property in V that is not matched by another constraint has a value that is valid for the schema S.
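
The object constraints can be read as a small key syntax: a leading / or ? marks a property as mandatory or optional, and a trailing = switches from schema validation to an equality test. The following Python sketch illustrates this reading. The names check_object and toy_validate are hypothetical, and the sketch assumes that a property is "matched by another constraint" when its key is named by some other property constraint in the same schema.

```python
def check_object(value, schema, validate):
    """Check the "/K", "?K", "/K=", "?K=" and "/*" constraints of a schema."""
    if not isinstance(value, dict):
        return False
    matched = set()    # keys named by a property constraint
    additional = None  # schema given by "/*", if present
    for key, arg in schema.items():
        if key == "/*":
            additional = arg
            continue
        if key[:1] not in ("/", "?"):
            continue  # not an object constraint
        mandatory = key.startswith("/")
        name = key[1:]
        by_value = name.endswith("=")
        if by_value:
            name = name[:-1]
        matched.add(name)
        if name not in value:
            if mandatory:
                return False
            continue
        if by_value:
            if value[name] != arg:
                return False
        elif not validate(value[name], arg):
            return False
    if additional is not None:
        for name, item in value.items():
            if name not in matched and not validate(item, additional):
                return False
    return True


def toy_validate(item, ref):
    """Toy validator for the demo: understands only "$string" and "$integer"."""
    if ref == "$string":
        return isinstance(item, str)
    if ref == "$integer":
        return isinstance(item, int) and not isinstance(item, bool)
    return False


relation_schema = {
    "/responseType=": "relation#description",
    "/uri": "$string",
    "?description": "$string",
    "?size": "$integer",
}
response = {"responseType": "relation#description", "uri": "http://flowers.com/data/irises"}
print(check_object(response, relation_schema, toy_validate))
```

The schema used here is a cut-down version of the $relation schema from Appendix B; the relation URI in the response is illustrative.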

5.1.4 Locally Defined Schema

To avoid writing out a long schema more than once, the PSI schema language allows names to be temporarily associated with schema during the resolution and compilation processes described below. A locally defined schema is indicated by a property key that begins with the # character.

"#K": S
This property associates a name K with the schema S. References of the form "$K" elsewhere in the same object as the one containing the property are resolved to the schema S.
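
As an illustration, local definitions could be expanded by a simple substitution pass. The Python sketch below is one possible reading and not the normative resolution procedure: the function name is hypothetical, it assumes that "elsewhere in the same object" includes values nested inside that object, and it handles only non-parameterised references.

```python
def expand_local_definitions(schema):
    """Return a copy of an object schema with "#K" properties removed and
    every string value "$K" replaced by the locally defined schema for K."""
    local = {key[1:]: value for key, value in schema.items() if key.startswith("#")}

    def substitute(node):
        if isinstance(node, str) and node.startswith("$") and node[1:] in local:
            return local[node[1:]]
        if isinstance(node, list):
            return [substitute(item) for item in node]
        if isinstance(node, dict):
            return {key: substitute(value) for key, value in node.items()}
        return node  # other references (e.g. "$string") are left untouched

    return {
        key: substitute(value)
        for key, value in schema.items()
        if not key.startswith("#")
    }


schema = {
    "#dim": {"type": "number", "minimum": 0},
    "/length": "$dim",
    "/width": "$dim",
    "?label": "$string",
}
expanded = expand_local_definitions(schema)
```

Here both "/length" and "/width" end up with the schema named dim, while the reference "$string" is left for later resolution.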

5.1.5 Schema Composition

A collection of schema can be structurally composed through arrays and objects to define new schema. These composition operators are required to define the schema of structured attributes, described in Section 2.4 above.

If S1, …, Sn denote n schema then their array composition is the schema A = { "type": "array", "items": [ S1, …, Sn ] }. The new schema A describes array values of size n where each item is valid for the corresponding schema in the "items" field of A.

If K1, …, Kn are strings representing keys then the object composition of the schema S1, …, Sn with those strings is the schema O = { "/K1": S1, …, "/Kn": Sn }. The new schema O describes object values that must contain the keys K1, …, Kn, and the values corresponding to these keys must be valid for S1, …, Sn respectively.

For example, if S = { "/age": "$integer" } and S’ = "$boolean" are schema then their array composition is the schema { "type": "array", "items": [ { "/age": "$integer" }, "$boolean" ] } which validates values such as the array [ { "age": 12 }, true ].

Similarly, the object composition of S and S’ using the keys K = "stats" and K’ = "alive" results in the following schema { "/stats": { "/age": "$integer" }, "/alive": "$boolean" } which validates values such as the object { "stats": { "age": 321 }, "alive": false }.
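
The two composition operators translate directly into code. The following Python sketch reproduces the two examples above; the function names array_composition and object_composition are hypothetical.

```python
def array_composition(*schemas):
    """Compose schema S1..Sn into a schema for arrays of exactly n items."""
    return {"type": "array", "items": list(schemas)}


def object_composition(keys, schemas):
    """Compose schema S1..Sn under the mandatory keys K1..Kn."""
    if len(keys) != len(schemas):
        raise ValueError("one key is required per schema")
    return {"/" + key: schema for key, schema in zip(keys, schemas)}


s1 = {"/age": "$integer"}
s2 = "$boolean"
print(array_composition(s1, s2))
print(object_composition(["stats", "alive"], [s1, s2]))
```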

5.2 Schema References and Resolution

The $ in the above example denotes a reference and indicates that part of the schema being described is found elsewhere. A reference is always denoted by a string value beginning with $, and every string in a schema beginning with $ is treated as a reference. The interpretation and resolution of a reference within a PSI schema is determined by its address and its parameters.

The string after the $ is the reference’s address. If a reference’s address is a URI (i.e., begins with http: or some other URI scheme) then the address is said to be global, otherwise it is local. If a reference appears as a key in an object then it is a parameterised reference and the value associated with that key is the reference’s parameters. Reference parameters are always bundled as a JSON object value. References that appear elsewhere in a schema are non-parameterised references.

Reference resolution is performed as part of the compilation process described in the next section. The compilation process manages a resolution context which is a map between local addresses and global addresses or schema. This context is always pre-populated with the names given in Appendix B. Once a local address has been resolved to a schema its resolution is complete. If a local address is resolved to a global address it is subsequently resolved like a global address. Global addresses are always resolved via GET requests to the address’s URI.

Reference parameters are converted to an HTTP query string as follows. Each property of the form "key": value in the object holding the reference parameters is converted into a query fragment key=evalue where evalue is a URL-encoded version of value. The query fragments are joined with & characters to form the final query string used in the GET request.
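
The conversion can be sketched in Python with the standard urllib.parse.quote function. The function name reference_query is hypothetical, and the sketch assumes that non-string parameter values are serialised as JSON text before URL encoding, a detail this specification does not fix.

```python
import json
from urllib.parse import quote


def reference_query(params):
    """Convert a reference's parameter object into an HTTP query string."""
    fragments = []
    for key, value in params.items():
        # Assumption: non-string values are written as JSON text first.
        text = value if isinstance(value, str) else json.dumps(value)
        fragments.append(f"{key}={quote(text, safe='')}")
    return "&".join(fragments)


# Parameters of a parameterised reference such as { "$integer": { "min": 0, "max": 10 } }
print(reference_query({"min": 0, "max": 10}))
```

For a reference resolved to http://example.org/schema/integer, the GET request would then be issued against that URI with the query string appended after a ? character.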

5.3 Compilation to JSON Schema

The PSI schema language can be “compiled” to JSON schema in the sense that each of the above constraints can be converted into one or more JSON schema properties that express the same constraints.

The following procedure describes how a PSI schema S can be translated into its corresponding JSON schema S’. The resolution context C is assumed to be initialised with all of the pre-defined schema given in Appendix B.

Compile(S,C):

5.4 Schema Validation

PSI schema validation is performed by first compiling the schema into JSON schema and then using JSON schema validation semantics and implementations to carry out the validation. See the JSON Schema specification for details.11

6 Appendix B: Predefined Schema

This appendix contains definitions of predefined schema that a PSI service must provide. Some schema are templates that accept, via GET, values for those parts that are variable. If a schema property’s value is not provided, that property is absent from the schema object returned in a GET request. As described in Section 2.2 on schema resources, template arguments are preceded by the percent symbol (%).

All pre-defined schema resolve to a global address with relative URI /schema/{name} where {name} denotes the name of the pre-defined schema. For example, $integer resolves to /schema/integer. For a PSI service with base URI at http://example.org this relative URI would resolve to http://example.org/schema/integer.
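
The mapping from a pre-defined schema reference to its global address can be sketched as follows. The function name resolve_predefined is hypothetical; the sketch treats any address with a URI scheme as global and resolves every other address against /schema/{name}, so it does not account for locally defined names.

```python
from urllib.parse import urljoin, urlparse


def resolve_predefined(reference, base_uri):
    """Map a reference such as "$integer" to the URI of its schema resource."""
    if not reference.startswith("$"):
        raise ValueError("references must begin with '$'")
    address = reference[1:]
    if urlparse(address).scheme:
        return address  # already a global address
    return urljoin(base_uri, "/schema/" + address)


print(resolve_predefined("$integer", "http://example.org"))
```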

Each pre-defined schema is listed below with a brief description of its purpose and the template that is returned via a GET request.

6.1 Value Schema

$integer

The integer schema only validates integer values. Its template can take arguments min to specify an inclusive minimum value, max to specify an inclusive maximum value, and default to specify a default value.

GET /schema/integer?template=true
----
200 OK
{ 
    "type": "integer",
    "minimum": "%min",
    "maximum": "%max",
    "default": "%default"
}
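
The effect of supplying template arguments can be illustrated on the integer template above. This Python sketch is an assumption about how a service might instantiate a template, not a description of any PSI implementation: the function name fill_template is hypothetical and only top-level properties are handled. Properties whose argument is not supplied are dropped, as described at the start of this appendix.

```python
INTEGER_TEMPLATE = {
    "type": "integer",
    "minimum": "%min",
    "maximum": "%max",
    "default": "%default",
}


def fill_template(template, args):
    """Instantiate a schema template, omitting properties with no argument."""
    result = {}
    for key, value in template.items():
        if isinstance(value, str) and value.startswith("%"):
            name = value[1:]
            if name in args:
                result[key] = args[name]
            # otherwise the property is absent from the returned schema
        else:
            result[key] = value
    return result


print(fill_template(INTEGER_TEMPLATE, {"min": 0, "max": 150}))
```

In this example the "default" property is omitted because no default argument was given.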
$number

The number schema only validates number values. Its template can take arguments min to specify an inclusive minimum value, max to specify an inclusive maximum value, and default to specify a default value.

GET /schema/number?template=true
----
200 OK
{ 
    "type": "number",
    "minimum": "%min",
    "maximum": "%max",
    "default": "%default"
}
$boolean

The boolean schema only validates boolean values. Its template can take the argument default to specify a default value.

GET /schema/boolean?template=true
----
200 OK
{ 
    "type": "boolean",
    "default": "%default"
}
$string

The string schema only validates string values. Its template can take the argument default to specify a default value.

GET /schema/string?template=true
----
200 OK
{ 
    "type": "string",
    "default": "%default"
}
$object

The object schema only validates object values. Its template can take the argument default to specify a default value.

GET /schema/object?template=true
----
200 OK
{ 
    "type": "object",
    "default": "%default"
}
$array

The array schema only validates array values. Its template can take the argument items, which is either a single schema that all items in the array must match or an array of schema, in which case matching values are described as in Appendix A. The size argument specifies the number of items a matching array value must contain.

GET /schema/array?template=true
----
200 OK
{   
    "type":         "array",
    "items":        %items,
    "minItems":     %size,
    "maxItems":     %size
}
$atomicValue

This schema only validates atomic values, that is, values that are integer, number, boolean, or string.

GET http://example.org/schema/atomicValue
----
200 OK
{
    "type": [ "integer", "number", "boolean", "string" ]
}

6.2 Resource Schema

$relation

This schema defines the structure of a relation description response.

GET /schema/relation
----
200 OK
{
    "/responseType=": "relation#description",
    "/uri":           "$string",
    "?description":   "$string",
    "?size":          "$integer",
    "?schema":        "$object"
}
$attribute

A schema that is only valid for attribute description responses.

GET /schema/attribute
----
200 OK
{
    "/responseType=":   "attribute#description",
    "/uri":             "$string",
    "/schema":          "$object",
    "?description":     "$string",
    "?provenance":      { "type": [ "string", "object" ] },
    "?relation":        "$string",
    "?subattributes":   { "$array": { "allItems": "$string" } }
}
$arrayAttribute

A schema that only validates attributes with schema that define array values with items that match the schema passed in through the allItems argument.

GET /schema/arrayAttribute?template=true
----
200 OK
{
    "/responseType=":   "attribute#description",
    "/uri":             "$string",
    "/schema": {
        "/type=": "array",
        "/items": { "$array": { "items": %allItems } }
    },
    "?description":     "$string",
    "?provenance":      { "type": [ "string", "object" ] },
    "?relation":        "$string",
    "?subattributes":   { "$array": { "allItems": "$string" } }
}
$fixedAttribute

A schema that only validates attributes with schema that only validate values from a fixed set of alternatives. This set of alternatives is provided via the values argument.

GET /schema/fixedAttribute?template=true
----
200 OK
{
    "/responseType=":   "attribute#description",
    "/uri":             "$string",
    "/schema": {
        "/items": { 
            "/enum=":   "%values"
        }
    },
    "?description":     "$string",
    "?provenance":      { "type": [ "string", "object" ] },
    "?relation":        "$string",
    "?subattributes":   { "$array": { "allItems": "$string" } }
}
$nominalAttribute

A schema that only validates attributes with schema that define values that come from a fixed set of alternatives which are not specified in advance. The allItems argument can be used to specify what type of nominal values the attribute can return.

GET /schema/nominalAttribute?template=true
----
200 OK
{
    "/responseType=":   "attribute#description",
    "/uri":             "$string",
    "/schema": {
        "/items": {
            "/enum": { "$array": { "allItems": "%allItems" } }
        }
    },
    "?description":     "$string",
    "?provenance":      { "type": [ "string", "object" ] },
    "?relation":        "$string",
    "?subattributes":   { "$array": { "allItems": "$string" } }
}
$atomicAttribute

A schema that only validates attributes with schema that only validate atomic values.

GET /schema/atomicAttribute
----
200 OK
{
    "/responseType=":   "attribute#description",
    "/uri":             "$string",
    "/schema": {
        "/type": { "enum": [ "integer", "number", "boolean", "string" ] }
    },
    "?description":     "$string",
    "?provenance":      { "type": [ "string", "object" ] },
    "?relation":        "$string",
    "?subattributes":   { "$array": { "allItems": "$string" } }
}

7 References


  1. Richardson, Leonard; Sam Ruby (May 2007). RESTful Web Services. O’Reilly. ISBN 0-596-52926-0.

  2. HTTP - Hypertext Transfer Protocol http://www.w3.org/Protocols/.

  3. Introducing JSON. http://www.json.org/.

  4. Iris flower data set. Wikipedia. http://en.wikipedia.org/wiki/Iris_flower_data_set

  5. Richardson, Leonard; Sam Ruby (May 2007). RESTful Web Services. O’Reilly. ISBN 0-596-52926-0.

  6. Introducing JSON. http://www.json.org/.

  7. A JSON Media Type for Describing the Structure and Meaning of JSON Documents. IETF Draft 03, 2010. http://tools.ietf.org/html/draft-zyp-json-schema-03

  8. URI Template. IETF Draft 04. http://tools.ietf.org/html/draft-gregorio-uritemplate-04

  9. Uniform Resource Identifier (URI): Generic Syntax http://tools.ietf.org/html/rfc3986.

  10. A JSON Media Type for Describing the Structure and Meaning of JSON Documents. IETF Draft 03, 2010. http://tools.ietf.org/html/draft-zyp-json-schema-03

  11. A JSON Media Type for Describing the Structure and Meaning of JSON Documents. IETF Draft 03, 2010. http://tools.ietf.org/html/draft-zyp-json-schema-03