How to add a new data-service
DAS has a pluggable architecture, so adding a new CMS data-service should be relatively easy. Here we discuss two different ways to add a new service to DAS.
Plug and play interface
This work is in progress.
A new data-service can register with DAS by providing a file describing its interface and available APIs. This configuration includes the data-service URL, the data format provided, an optional expiration timestamp for the data, the API name, the necessary parameters and an optional mapping onto DAS keys.
A new DAS interface will allow this information to be added via a simple configuration file. The data-service configuration files should be written in the [YAML] data format.
An example configuration follows [1]:
# SiteDB API mapping to DAS
system : sitedb
format : JSON
# API record
---
# URI description
urn : CMSNametoAdmins
url : "https://a.b.com/sitedb/api"
params : {'name':''}
expire : 3600 # optional; DAS uses an internal default value if omitted
# DAS keys mapping defines mapping between query names, e.g. run,
# and its actual representation in DAS record, e.g. run.number
daskeys : [
    {'key':'site', 'map':'site.name', 'pattern':''},
    {'key':'admin', 'map':'email', 'pattern':''}
]
# DAS search keys to API input parameter mapping
das2api : [
    {'das_key':'site', 'api_param':'se', 'pattern':""}
]
---
# next API
---
# APIs notation mapping maps data-service output into
# DAS syntax, e.g
# {'site_name':'abc'} ==> {'site':{'name':'abc'}}
notation : [
    {'notation':'site_name', 'map': 'site.name', 'api': ''}
]
The syntax consists of key:value pairs, where a value can be a string, a list or a dictionary. The hash sign (#) starts a comment, and three dashes (---) act as the record separator. Each map file starts with the definition of the system and the data format provided by the data-service.
# comment
system: my_system_name
format: XML
Those definitions are applied to every API defined later in the map file. Each API section follows a record separator and should define: urn, url, expire, params and daskeys.
# API section
---
urn: api_alias
url: "http://a.b.com/method"
expire: 3600 # in seconds
params: {} # dictionary of data-service input parameters
daskeys: [{}, {}] # list of dictionaries for DAS key maps
- the urn is the API name or an alias for it (any name, possibly different from the API name itself); it is used solely inside DAS
- the url defines the data-service URL
- the params are data-service input parameters
- the daskeys is a list of maps between data-service input parameters and the DAS internal key representation. For instance, when we say site we might mean the site CMS name or the site SE/CE name. So the DAS key will be site, while the DAS internal key representation may be site.name or site.sename. Each entry in the daskeys list is therefore a dictionary of the form {'key':value, 'map':value, 'pattern':''}, where pattern is a regular expression that can be used to distinguish between arguments with different structures.
- the (optional) das2api map defines the mapping between a DAS internal key and a data-service input parameter. For instance, the site.name DAS key can be mapped onto the name data-service input parameter.
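The role of pattern and das2api can be illustrated with a minimal Python sketch. It is not the DAS implementation: the helper names resolve_das_key and api_params, the two regular expressions, and the treatment of an empty pattern as "matches anything" are all assumptions made for illustration, and the das2api entries follow the site.name example given in the list above.

```python
import re

# Hypothetical map entries, modelled on the daskeys and das2api fields above.
DASKEYS = [
    {'key': 'site', 'map': 'site.name', 'pattern': r'^T[0-3]_'},
    {'key': 'site', 'map': 'site.sename', 'pattern': r'^([a-z0-9-]+\.)+[a-z]+$'},
]
DAS2API = [
    {'das_key': 'site.name', 'api_param': 'name', 'pattern': ''},
    {'das_key': 'site.sename', 'api_param': 'se', 'pattern': ''},
]


def resolve_das_key(key, value, daskeys):
    """Return the DAS internal key whose pattern matches value, or None."""
    for entry in daskeys:
        if entry['key'] != key:
            continue
        pattern = entry.get('pattern', '')
        # Assumption: an empty pattern is treated as "matches anything".
        if not pattern or re.match(pattern, value):
            return entry['map']
    return None


def api_params(key, value, daskeys, das2api):
    """Translate a DAS key/value pair into data-service input parameters."""
    das_key = resolve_das_key(key, value, daskeys)
    for entry in das2api:
        if entry['das_key'] == das_key:
            return {entry['api_param']: value}
    return {}
```

With these sample maps, the same DAS key site resolves to site.name for a CMS-style name and to site.sename for a host-like name, and each is then sent to a different input parameter.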
Each further API record follows another record separator, e.g.
---
# API record 1
urn: api_alias1
url: "http://a.b.com/method1"
expire: 3600 # in seconds
params: {} # dictionary of data-service input parameters
daskeys: [{}, {}] # list of dictionaries for DAS key maps
---
# API record 2
urn: api_alias2
url: "http://a.b.com/method2"
expire: 1800 # in seconds
params: {} # dictionary of data-service input parameters
daskeys: [{}, {}] # list of dictionaries for DAS key maps
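Once such records have been parsed into dictionaries, a quick sanity check can catch incomplete entries early. The sketch below is illustrative only: normalize_record and DEFAULT_EXPIRE are hypothetical names, the default value is a placeholder rather than the real DAS default, and expire is treated as optional on the basis of the comment in the first example.

```python
# Fields every API record is expected to define; expire is handled separately.
REQUIRED_API_FIELDS = ('urn', 'url', 'params', 'daskeys')
DEFAULT_EXPIRE = 3600  # hypothetical placeholder, not the actual DAS default


def normalize_record(record):
    """Return a copy of record with expire filled in, or raise ValueError."""
    missing = [name for name in REQUIRED_API_FIELDS if name not in record]
    if missing:
        raise ValueError('API record is missing: %s' % ', '.join(missing))
    normalized = dict(record)  # leave the caller's dictionary untouched
    normalized.setdefault('expire', DEFAULT_EXPIRE)
    return normalized
```

Returning a copy keeps the parsed map unchanged, so the same records can be checked again or reported as they were written.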
At the end of the DAS map there is an optional notation mapping, which defines how data-service output is mapped back into the DAS internal key representation (including conversion from flat to hierarchical structures if necessary).
---
# APIs notation mapping maps data-service output into
# DAS syntax, e.g
# {'site_name':'abc'} ==> {'site':{'name':'abc'}}
notation : [
    {'notation':'site_name', 'map': 'site.name', 'api': ''}
]
For instance, if your data-service returns runNumber and DAS uses run_number, you define this mapping in the notation section.
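The flat-to-hierarchical conversion described above can be sketched in a few lines of Python. This is an illustration, not DAS code: apply_notation is a hypothetical helper, and it ignores the api field, which assumes that every notation entry applies to all APIs.

```python
def apply_notation(row, notations):
    """Rename flat data-service keys and nest them by dotted DAS key."""
    renames = {entry['notation']: entry['map'] for entry in notations}
    result = {}
    for key, value in row.items():
        # Keys without a notation entry are kept under their original name.
        parts = renames.get(key, key).split('.')
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate levels
        node[parts[-1]] = value
    return result
```

Applied to {'site_name': 'abc'} with the notation entry from the example, this produces {'site': {'name': 'abc'}}.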
To summarize, the YAML map file provides:
- the system name
- the underlying data format used by this service for its meta-data
- the list of API records, where each record contains the following:
  - the urn name, which DAS will use as the API name
  - the url of the data-service
  - the expiration timestamp (how long its data can live in DAS)
  - the input parameters, provided as a dictionary
  - the list of daskeys, where each entry contains its name (key), its mapping within a DAS record (map), and an appropriate pattern
  - the list of API to DAS notations (if any); different APIs can yield data in different notations, for instance siteName and site_name, and this mapping accommodates those syntactic differences
- the notation mapping between data-service provider output and DAS
Footnotes
[1] This example demonstrates the flexibility of the YAML data format and shows different representation styles.
Add new service via API
You can manually add a new service by extending DAS.services.abstract_service.DASAbstractService and overriding the api method.
To do so, create a new class that inherits from DAS.services.abstract_service.DASAbstractService.
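from DAS.services.abstract_service import DASAbstractService
# map_validator is a DAS helper and must also be imported from the DAS utilities

class MyDataService(DASAbstractService):
    """
    Helper class to provide access to MyData service
    """
    def __init__(self, config):
        DASAbstractService.__init__(self, 'mydata', config)
        # load the API map registered for this service and validate it
        self.map = self.dasmapping.servicemap(self.name)
        map_validator(self.map)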
Optionally, the class can override the api(self, query) method of DAS.services.abstract_service.DASAbstractService.
Here is an example of such an implementation:
import time  # module-level import used by api() below

def api(self, query):
    """My API implementation"""
    api = list(self.map.keys())[0]  # get API from internal map
    url = self.map[api]['url']
    expire = self.map[api]['expire']
    args = dict(self.map[api]['params'])  # get args from internal map
    time0 = time.time()
    dasrows = function(url, args)  # get data and convert to DAS records
    ctime = time.time() - time0
    self.write_to_cache(query, expire, url, api, args, dasrows, ctime)
The hypothetical function call should contact the data-service, then fetch, parse and yield the data. Please note that we encourage the use of Python generators [Gen] in function implementations.
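As a minimal sketch of what such a generator-based function might look like, the example below assumes Python 3, a data-service that answers a GET request with a JSON array, and records that can be passed on unchanged. The http_fetch helper and the extra fetch argument are hypothetical; they are not part of DAS and exist only so the function can be exercised without a network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(url, args):
    """Return the response body of a GET request with args as the query string."""
    with urlopen(url + '?' + urlencode(args)) as response:
        return response.read().decode('utf-8')


def function(url, args, fetch=http_fetch):
    """Yield records one at a time instead of building a full list."""
    payload = fetch(url, args)
    for item in json.loads(payload):
        yield item
```

Because function is a generator, nothing is fetched until the caller starts iterating, and rows are handed over one by one rather than as a single in-memory list.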