Generating a product label
The primer
In this tutorial we will write a small processor which uses a simple PDS4 "type template" to generate data product labels.
Our processor has been tasked with adding some missing label attributes[1] to a raw image product generated by a telemetry processor. Using Passthrough (PT), we can define a blueprint for how our modified output product should look - its type template - and use this to instantiate a "partial label" for the processor to populate.
From here on we will assume that you have some familiarity with PDS4 labels, XML and Python, but the finer points will be explained as we go along.
The input
We have been provided with two sample input products which include `img:filter_number`: the number of the active filter as reported by our instrument. Our processor will add the corresponding filter name, ID, bandwidth and centre wavelength attributes.
As we are only interested in a very small part of the input product's label here (and PDS4 labels tend to get very long!), we will cut to the chase and hide the uninteresting bits behind `<!-- ... -->` comments. Please use your imagination to fill in the blanks.
The sample input products differ only in the filter number they report; their structures are:
```xml
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <!-- ... -->
  <Observation_Area>
    <!-- ... -->
    <img:Imaging>
      <!-- ... -->
      <img:Optical_Filter>
        <img:filter_number>2</img:filter_number>
      </img:Optical_Filter>
      <!-- ... -->
    </img:Imaging>
    <!-- ... -->
  </Observation_Area>
  <!-- ... -->
</Product_Observational>
```
```xml
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <!-- ... -->
  <Observation_Area>
    <!-- ... -->
    <img:Imaging>
      <!-- ... -->
      <img:Optical_Filter>
        <img:filter_number>0</img:filter_number>
      </img:Optical_Filter>
      <!-- ... -->
    </img:Imaging>
    <!-- ... -->
  </Observation_Area>
  <!-- ... -->
</Product_Observational>
```
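Before turning to Passthrough, it may help to see how the value of interest can be read out of such a label. The following is a minimal sketch using only Python's standard library (`xml.etree.ElementTree`), not Passthrough; the trimmed `SAMPLE_LABEL` string and the `read_filter_number` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the sample labels
NSMAP = {
    "pds": "http://pds.nasa.gov/pds4/pds/v1",
    "img": "http://pds.nasa.gov/pds4/img/v1",
}

# Hypothetical, trimmed stand-in for the first sample input (comments omitted)
SAMPLE_LABEL = """\
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <Observation_Area>
    <img:Imaging>
      <img:Optical_Filter>
        <img:filter_number>2</img:filter_number>
      </img:Optical_Filter>
    </img:Imaging>
  </Observation_Area>
</Product_Observational>
"""


def read_filter_number(label_xml: str) -> int:
    """Return the integer value of img:filter_number from a label string."""
    root = ET.fromstring(label_xml)
    # ".//" searches all descendants of the root element
    node = root.find(".//img:Optical_Filter/img:filter_number", NSMAP)
    if node is None or node.text is None:
        raise ValueError("label has no img:filter_number")
    return int(node.text.strip())


print(read_filter_number(SAMPLE_LABEL))  # 2
```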
The template
Passthrough language (PTL for short) type templates look a lot like regular PDS4 labels, but with some extra XML markup thrown in. This markup takes the form of PT "properties": XML attributes such as `pt:fetch` that tell Passthrough how a template should be processed. The values of these XML attributes are (with the exception of `pt:sources`) XPath expressions, which are most prominently used to imbue templates with conditional logic.
To achieve our goals we have come up with the following template:
```xml
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:img="http://pds.nasa.gov/pds4/img/v1"
                       xmlns:pt="https://github.com/ExoMars-PanCam/passthrough">
  <!-- ... -->
  <Observation_Area>
    <!-- ... -->
    <img:Imaging>
      <!-- ... -->
      <img:Optical_Filter pt:sources="input">
        <img:filter_number pt:fetch="true()"/>
        <img:filter_name/>
        <img:filter_id pt:required="//img:Optical_Filter/img:filter_number != '0'"/>
        <img:bandwidth unit="nm" pt:required="//img:Optical_Filter/img:filter_number != '0'"/>
        <img:center_filter_wavelength unit="nm" pt:required="//img:Optical_Filter/img:filter_number != '0'"/>
      </img:Optical_Filter>
      <!-- ... -->
    </img:Imaging>
    <!-- ... -->
  </Observation_Area>
  <!-- ... -->
</Product_Observational>
```
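Since PT properties are ordinary XML attributes in the `pt` namespace, any XML library can inspect them. As a small sketch (standard library only, with a trimmed copy of the template above and a hypothetical `pt_properties` helper), this lists which elements carry which properties:

```python
import xml.etree.ElementTree as ET

PT_NS = "https://github.com/ExoMars-PanCam/passthrough"

# Trimmed copy of the template (comments and two required attributes omitted)
TEMPLATE = """\
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:img="http://pds.nasa.gov/pds4/img/v1"
                       xmlns:pt="https://github.com/ExoMars-PanCam/passthrough">
  <Observation_Area>
    <img:Imaging>
      <img:Optical_Filter pt:sources="input">
        <img:filter_number pt:fetch="true()"/>
        <img:filter_name/>
        <img:filter_id pt:required="//img:Optical_Filter/img:filter_number != '0'"/>
      </img:Optical_Filter>
    </img:Imaging>
  </Observation_Area>
</Product_Observational>
"""


def local_name(qname: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return qname.rsplit("}", 1)[-1]


def pt_properties(template_xml: str) -> list:
    """Return (element, property, value) for every pt:* XML attribute."""
    found = []
    for elem in ET.fromstring(template_xml).iter():
        for attr, value in elem.attrib.items():
            if attr.startswith("{" + PT_NS + "}"):
                found.append((local_name(elem.tag), local_name(attr), value))
    return found


for element, prop, value in pt_properties(TEMPLATE):
    print(f"{element}: pt:{prop} = {value!r}")
```

Note that `img:filter_name` does not appear in the listing: it carries no properties at all.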
Attribute inheritance
A key trait of PTL is the ability to pass through attribute values from an input product to the output product. Above, we are declaring the `fetch` property on `img:filter_number`, indicating that we want PT to retrieve its value from a source label. The source label in question is declared by the parent class's `sources` property to be the one given the nickname "input", a moniker that our processor will associate with the input product during processing.
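Conceptually, a fetch copies a value from a named source label into the matching place in the partial label. The sketch below is a toy model of that idea only, assuming the element can be located by the same relative path in both labels; it is not how Passthrough implements `pt:fetch`, and `fetch`, `SOURCE` and `PARTIAL` are hypothetical.

```python
import xml.etree.ElementTree as ET

NSMAP = {"img": "http://pds.nasa.gov/pds4/img/v1"}

# Hypothetical, heavily trimmed source ("input") and partial label fragments
SOURCE = """<img:Optical_Filter xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <img:filter_number>2</img:filter_number>
</img:Optical_Filter>"""

PARTIAL = """<img:Optical_Filter xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <img:filter_number/>
  <img:filter_name/>
</img:Optical_Filter>"""


def fetch(partial: ET.Element, source: ET.Element, path: str) -> None:
    """Copy the text of the element at `path` in `source` into `partial`.

    Toy model of a fetch; not Passthrough's implementation.
    """
    target = partial.find(path, NSMAP)
    origin = source.find(path, NSMAP)
    if target is None or origin is None:
        raise LookupError(f"{path!r} not found in both labels")
    target.text = origin.text


partial = ET.fromstring(PARTIAL)
sources = {"input": ET.fromstring(SOURCE)}  # nickname -> parsed source label
fetch(partial, sources["input"], "img:filter_number")
print(partial.find("img:filter_number", NSMAP).text)  # 2
```

The `sources` dictionary mirrors the nickname-to-product mapping the processor later passes to `Template`, but here it holds parsed elements rather than paths.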
Optional attributes
Next, we have added the `img:filter_name` attribute; the absence of any markup here indicates that this attribute's value is expected to be populated by our processor.
The last three attributes should also be filled in by our processor, but there's a catch: knowledge of our imager's operating modes tells us that it might report the filter number as 0 if a human has configured it incorrectly. In this case, our processor should just set the filter name to "UNKNOWN" and leave the remaining attributes out.
We make this condition known in the template by declaring the `required` property on each of them. Its value is an XPath expression that checks whether the `img:filter_number` of the "input" source product differs from 0. Only if it does are the `img:filter_id`, `img:bandwidth` and `img:center_filter_wavelength` attributes required; if the filter number is 0, they can be omitted from the output product[2].
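The expression relies on XPath 1.0's rule for comparing a node-set with a string: the comparison is true if any selected node's string value satisfies it. One consequence is that if `img:filter_number` were missing altogether, `!= '0'` would evaluate to false, so the three attributes would not be required either. Passthrough evaluates the expression itself; the following standard-library sketch (with hypothetical `is_required` and `make_label` helpers) only models what the expression computes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PDS_NS = "http://pds.nasa.gov/pds4/pds/v1"
IMG_NS = "http://pds.nasa.gov/pds4/img/v1"
NSMAP = {"img": IMG_NS}


def is_required(label: ET.Element) -> bool:
    """Python model of the XPath 1.0 test
    //img:Optical_Filter/img:filter_number != '0'

    A node-set compared with a string is true if ANY selected node's string
    value satisfies the comparison, so an empty node-set gives False.
    """
    nodes = label.findall(".//img:Optical_Filter/img:filter_number", NSMAP)
    # "".join(itertext()) approximates XPath's string-value of an element
    return any("".join(node.itertext()) != "0" for node in nodes)


def make_label(filter_number: Optional[str]) -> ET.Element:
    """Build a trimmed, hypothetical label; None omits img:filter_number."""
    root = ET.Element(f"{{{PDS_NS}}}Product_Observational")
    optical_filter = ET.SubElement(root, f"{{{IMG_NS}}}Optical_Filter")
    if filter_number is not None:
        ET.SubElement(optical_filter, f"{{{IMG_NS}}}filter_number").text = filter_number
    return root


for value in ["2", "0", None]:
    print(value, is_required(make_label(value)))
```

Running it prints `True` only for the label reporting filter 2.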
The processor
Over in Python-land, our processor will make use of Passthrough's `Template` handler class to pre-process the template into a partial label that can be populated.
```python
from passthrough import Template


def process(input_path, output_name):
    # Associate the "input" moniker used by pt:sources with the input product
    sources = {"input": input_path}
    partial = Template("./template.xml", sources, keep_template_comments=True)

    # Determine the current product's filter number
    # (a relative ".//" path is accepted by find() whether the label is an
    # element or an element tree; a leading "//" is not always)
    filter_number = partial.label.find(
        ".//img:Optical_Filter/img:filter_number", partial.nsmap
    )
    filter_number = int(filter_number.text)

    # Define the attribute values to populate for the range of filter numbers we expect
    filter_attributes = {
        "filter_name": [
            "UNKNOWN",
            "Broadband Red",
            "Broadband Green",
            "Broadband Blue",
        ],
        "filter_id": [None, "C01", "C02", "C03"],
        "bandwidth": [None, "100", "80", "120"],
        "center_filter_wavelength": [None, "640", "540", "440"],
    }

    # Populate our attributes (but only if we actually have values for them)
    for attr_name, values in filter_attributes.items():
        value = values[filter_number]
        if value is None:
            continue
        attr = partial.label.find(
            f".//img:Optical_Filter/img:{attr_name}", partial.nsmap
        )
        attr.text = value

    # Write the completed label to disk
    partial.export("./", output_name)


if __name__ == "__main__":
    queue = {
        "./sample_input_1.xml": "result_1.xml",
        "./sample_input_2.xml": "result_2.xml",
    }
    for input_path, output_name in queue.items():
        process(input_path, output_name)
```
After importing the `Template` class, we make sure to associate the "input" moniker for `pt:sources` with the sample input product (specifically its path), before instantiating the `partial` label.
With the `partial` label created, we can read out its `filter_number` using an XPath-style `find` query. This allows us to select the correct values for the attributes we want to populate.
After defining the range of values for our attributes (which in a more realistic scenario might have involved loading a dedicated calibration product), we use a loop to populate them, skipping any attribute for which we don't have a sensible value.
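The parallel lists in `process()` work, but an unexpected filter number surfaces as a bare `IndexError`. One alternative layout, sketched below under the assumption that the values stay hard-coded rather than coming from a calibration product, groups the values per filter and rejects unexpected numbers explicitly. `FilterInfo`, `FILTERS` and `attributes_for` are hypothetical names.

```python
from typing import NamedTuple, Optional


class FilterInfo(NamedTuple):
    # Field names match the img:Optical_Filter attributes our processor fills in
    filter_name: str
    filter_id: Optional[str] = None
    bandwidth: Optional[str] = None  # nm
    center_filter_wavelength: Optional[str] = None  # nm


# The values from process(), regrouped per filter number
FILTERS = {
    0: FilterInfo("UNKNOWN"),
    1: FilterInfo("Broadband Red", "C01", "100", "640"),
    2: FilterInfo("Broadband Green", "C02", "80", "540"),
    3: FilterInfo("Broadband Blue", "C03", "120", "440"),
}


def attributes_for(filter_number: int) -> dict:
    """Attributes to populate for a filter number, leaving out unknown values."""
    try:
        info = FILTERS[filter_number]
    except KeyError:
        raise ValueError(f"unexpected filter number: {filter_number}") from None
    return {name: value for name, value in info._asdict().items() if value is not None}


print(attributes_for(0))  # {'filter_name': 'UNKNOWN'}
```

Inside `process()`, the population loop could then iterate over `attributes_for(filter_number).items()` and drop its `None` check.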
Product generation flow
From the layout of the processor we can surmise that the product generation flow with Passthrough follows three broad steps:
- gather the input product(s) and pre-process the template into a partial label object
- populate the remaining label attributes
- export the completed label, allowing PT to prune any unpopulated optional attributes and run its consistency checks.
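Passthrough performs the pruning and consistency checks itself during export. Purely to illustrate the idea, here is a hypothetical toy version of that last step in plain `xml.etree.ElementTree`, assuming each `pt:required` condition has already been evaluated and the tags it made optional have been collected into a set. The `finalise` and `is_empty` helpers are invented for this sketch and are not part of Passthrough's API.

```python
import xml.etree.ElementTree as ET

IMG = "http://pds.nasa.gov/pds4/img/v1"


def is_empty(elem: ET.Element) -> bool:
    """A leaf element with no (non-whitespace) text counts as unpopulated."""
    return len(elem) == 0 and not (elem.text or "").strip()


def finalise(root: ET.Element, optional: set) -> None:
    """Toy export step: prune unpopulated optional elements, then check
    that nothing else was left unpopulated.

    `optional` holds the qualified tags whose required condition was false.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in optional and is_empty(child):
                parent.remove(child)
    missing = [elem.tag for elem in root.iter() if is_empty(elem)]
    if missing:
        raise ValueError(f"unpopulated required elements: {missing}")


# A filter-0 style partial: two populated elements, two left empty
optical_filter = ET.Element(f"{{{IMG}}}Optical_Filter")
for tag, text in [("filter_number", "0"), ("filter_name", "UNKNOWN"),
                  ("filter_id", None), ("bandwidth", None)]:
    ET.SubElement(optical_filter, f"{{{IMG}}}{tag}").text = text

finalise(optical_filter, optional={f"{{{IMG}}}filter_id", f"{{{IMG}}}bandwidth"})
print([child.tag.split("}")[1] for child in optical_filter])
# ['filter_number', 'filter_name']
```

Had `img:filter_name` been left empty, the check would have raised instead, since it was never marked optional.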
The results
When we run our processor, we are presented with two resultant product labels, each corresponding to one of the sample input product labels:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1" xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <!-- ... -->
  <Observation_Area>
    <!-- ... -->
    <img:Imaging>
      <!-- ... -->
      <img:Optical_Filter>
        <img:filter_number>2</img:filter_number>
        <img:filter_name>Broadband Green</img:filter_name>
        <img:filter_id>C02</img:filter_id>
        <img:bandwidth unit="nm">80</img:bandwidth>
        <img:center_filter_wavelength unit="nm">540</img:center_filter_wavelength>
      </img:Optical_Filter>
      <!-- ... -->
    </img:Imaging>
    <!-- ... -->
  </Observation_Area>
  <!-- ... -->
</Product_Observational>
```
```xml
<?xml version='1.0' encoding='UTF-8'?>
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1" xmlns:img="http://pds.nasa.gov/pds4/img/v1">
  <!-- ... -->
  <Observation_Area>
    <!-- ... -->
    <img:Imaging>
      <!-- ... -->
      <img:Optical_Filter>
        <img:filter_number>0</img:filter_number>
        <img:filter_name>UNKNOWN</img:filter_name>
      </img:Optical_Filter>
      <!-- ... -->
    </img:Imaging>
    <!-- ... -->
  </Observation_Area>
  <!-- ... -->
</Product_Observational>
```
As intended, `result_2.xml`, which was created from the 0-filter `sample_input_2.xml`, omits the ID, bandwidth and centre wavelength attributes, while `result_1.xml` contains all of them. Success!
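Rather than eyeballing the labels, the same expectation can be checked programmatically. Below is a minimal standard-library sketch; `check_filter` and `CONDITIONAL` are hypothetical names, and the check covers only the `img:Optical_Filter` attributes discussed in this tutorial.

```python
import xml.etree.ElementTree as ET

NSMAP = {"img": "http://pds.nasa.gov/pds4/img/v1"}
# Attributes expected only when the filter number is not 0
CONDITIONAL = ["filter_id", "bandwidth", "center_filter_wavelength"]


def check_filter(root: ET.Element) -> list:
    """Return the conditional attributes present in img:Optical_Filter,
    raising ValueError if they do not match the reported filter number."""
    optical_filter = root.find(".//img:Optical_Filter", NSMAP)
    if optical_filter is None:
        raise ValueError("label has no img:Optical_Filter")
    number = optical_filter.findtext("img:filter_number", namespaces=NSMAP)
    names = [child.tag.split("}", 1)[-1] for child in optical_filter]
    present = [name for name in CONDITIONAL if name in names]
    expected = [] if number == "0" else CONDITIONAL
    if present != expected:
        raise ValueError(f"filter {number}: expected {expected}, found {present}")
    return present
```

For the labels above, `check_filter(ET.parse("result_1.xml").getroot())` would return all three names, and the same call on `result_2.xml` an empty list.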
The conclusion
The scenario we have been working through in this tutorial of course only scratches the surface of what PT and PTL can do. Type templates can grapple with multiple simultaneous input products, automatically fill in attributes using XPath extension functions, and instantiate and manage blank payload data structures from the template's `File_Area_*` classes. But the usage pattern of working with Passthrough's `Template` class that we have established remains largely the same.
In the next section we will look at where you can go from here to learn more about the individual components of Passthrough.
[1] In this documentation, the terms attribute and class always refer to PDS4 attribute and PDS4 class, respectively; in other words, to XML elements. When there is a need to refer to XML attributes, this is spelled out.
[2] By default, all elements in a template are assumed to be required, i.e. present in the output product, unless explicitly declared otherwise. This follows the principle that a type template should directly reflect the structure of the product type it defines. The goal is to avoid surprises for users, and allow templates to act as formal definitions of product types and further as the interfaces between processors of a project's product pipeline.