Documentation metadata
Responsibilities | |
---|---|
Product Owner | ? |
Developers | |
Version number (software) | 1.0 |
Based on the concept discussed in early 2018, the Binary Service is meant to handle all kinds of binaries provided during a data-ingest process as it occurs in the DDB. Binaries are submitted by passing the service a URL from which the binary can be retrieved. After retrieval, the service stores the binaries and provides them to the user on demand. A special focus lies on the handling of images: these are not only retrieved and provided on demand but also scaled to predefined resolutions, which can be accessed using the IIIF image URL syntax.
The Binary Service is made up of two components. The first is the ingest component, which is responsible for retrieving the binaries from the provided URL, scaling images and saving the data to the database. The second is the delivery component, which is targeted at the user and delivers the data stored in the database.
The Binary Service stores its data in a Cassandra Database. Cassandra is a cluster-based distributed database capable of safely storing large amounts of data.
The ingest component is a standalone Java application utilizing FIZPro and Apache Spark to process data on the cluster.
The delivery component is a simple Java servlet.
The ingest component is a standalone Java application meant to run on a cluster node.
It offers an API endpoint to submit a URL together with a context, which is saved along with the URL. The endpoint is non-blocking and immediately returns a reference that can later be used to access the binary and all of its derivatives. Once submitted, the URL is saved in the database together with the context and the generated reference. It is also enqueued for processing. As soon as there are enough resources on the cluster, the URL will be processed, meaning the following:
Errors during this process are handled in different ways. In particular, if the server from which the binary is supposed to be downloaded responds with an error, the URL may be re-enqueued so that the download is tried again later.
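The re-enqueue behaviour can be illustrated with a minimal sketch. It is not the service's actual code: the class `RetrySketch`, the attempt limit and the choice of which HTTP status codes count as transient are all assumptions made for illustration, since the documentation does not state them.

```java
import java.util.Queue;

/** Hypothetical re-enqueue policy; limit and status classification are assumptions. */
final class RetrySketch {
    static final int MAX_ATTEMPTS = 3; // assumption: the real limit is not documented

    /** A download job: the URL plus the number of attempts made so far. */
    static final class Job {
        final String url;
        final int attempts;

        Job(String url, int attempts) {
            this.url = url;
            this.attempts = attempts;
        }
    }

    /** Assumption: 5xx responses and 429 are treated as transient server errors. */
    static boolean isTransient(int httpStatus) {
        return httpStatus == 429 || (httpStatus >= 500 && httpStatus <= 599);
    }

    /**
     * Handles the outcome of one download attempt. Returns true if the job was
     * re-enqueued for a later retry, false if no further attempt is scheduled.
     */
    static boolean handleResult(Job job, int httpStatus, Queue<Job> queue) {
        int attempts = job.attempts + 1;
        if (isTransient(httpStatus) && attempts < MAX_ATTEMPTS) {
            queue.add(new Job(job.url, attempts)); // back of the queue: try again later
            return true;
        }
        return false;
    }
}
```

Appending the job to the back of the queue means other URLs are processed first, which gives a temporarily unavailable server some time to recover.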
A further API endpoint provides access to the status of the binaries assigned to a given context. The status information is minimal, consisting of only the number of queued, finished and failed binaries.
A third endpoint allows deleting all references assigned to a given context.
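The semantics of the three endpoints can be summarised in a small in-memory model. This is only a sketch under stated assumptions: the class `IngestSketch` and its method names are hypothetical, a random UUID stands in for the undocumented reference format, and a boolean flag replaces the real download and scaling step. The actual service stores its data in Cassandra and processes the queue on the cluster.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

/** Hypothetical in-memory model of the ingest API semantics; not the real service code. */
final class IngestSketch {
    enum State { QUEUED, FINISHED, FAILED }

    /** One submitted URL with its context and current state. */
    static final class Entry {
        final String url;
        final String context;
        State state = State.QUEUED;

        Entry(String url, String context) {
            this.url = url;
            this.context = context;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>(); // reference -> entry
    private final Queue<String> queue = new ArrayDeque<>();           // references awaiting processing

    /** Non-blocking submit: stores the URL, enqueues it and returns the reference at once. */
    String submit(String url, String context) {
        String reference = UUID.randomUUID().toString(); // assumption: real reference format unknown
        entries.put(reference, new Entry(url, context));
        queue.add(reference);
        return reference;
    }

    /** Processes the next queued reference; 'success' stands in for download and scaling. */
    void processNext(boolean success) {
        String reference = queue.poll();
        if (reference == null) {
            return; // nothing queued
        }
        Entry entry = entries.get(reference);
        if (entry != null) {
            entry.state = success ? State.FINISHED : State.FAILED;
        }
    }

    /** Status summary for a context: number of queued, finished and failed binaries. */
    Map<State, Integer> status(String context) {
        Map<State, Integer> counts = new EnumMap<>(State.class);
        for (State s : State.values()) {
            counts.put(s, 0);
        }
        for (Entry e : entries.values()) {
            if (e.context.equals(context)) {
                counts.merge(e.state, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Deletes all references assigned to a context; returns how many were removed. */
    int deleteContext(String context) {
        int before = entries.size();
        entries.values().removeIf(e -> e.context.equals(context));
        queue.removeIf(reference -> !entries.containsKey(reference)); // drop orphaned queue items
        return before - entries.size();
    }
}
```

The point of the model is that `submit` never waits for the download: the caller gets the reference immediately and polls `status` for the context to see when processing has finished.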
NOTE: Initially both components were in one project, so the package structure still needs to be reworked. It is therefore not documented here at the moment.
Sources: https://dev.fiz-karlsruhe.de/stash/projects/DDB/repos/ddb-administration-binaries-service
Sonar: https://dev.fiz-karlsruhe.de/sonar/dashboard?id=de.fiz-karlsruhe.binaries-service
Bamboo: https://dev.fiz-karlsruhe.de/bamboo/browse/DDB-DABS
The delivery component is a standard Java Servlet using the JAX-RS implementation Jersey to provide access to the stored binaries and their derivatives.
de.fiz.binariesservice.server:
Contains the application main classes and the two classes responsible for serving content for /binary and /image.
de.fiz.binariesservice.models:
Contains JAXB classes for XML/JSON mappings.
de.fiz.binariesservice.utils:
Utility classes (should be moved to other packages with less generic names)
Sources: https://dev.fiz-karlsruhe.de/stash/projects/DDB/repos/ddb-administration-binaries-server
Sonar: https://dev.fiz-karlsruhe.de/sonar/dashboard?id=de.fiz-karlsruhe.binaries-server
Bamboo: https://dev.fiz-karlsruhe.de/bamboo/browse/DDB-DDBBS2
The table structure is documented on a separate page:
Cassandra Tabellenstruktur des Binaries Services (Cassandra table structure of the Binaries Service)
Method | URL | Description | Notes |
---|---|---|---|
GET | /binaries/{context} | Provides a basic status summary of the binaries associated with the given context | |
POST | /binaries | Enqueues the URL provided in the JSON request body for processing. Returns a reference which can be used to access the binary via the delivery component. | Example request body: { "url": "http://example.com/file.pdf" } |
DELETE | /binaries/reference/{reference} | Deletes the given reference | |
DELETE | /binaries/context/{context} | Deletes all references associated with this context | |
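As a rough illustration of how a client might address the ingest endpoints in the table above, the following sketch builds the requests with the JDK's `java.net.http` API (Java 11 or later). The class `IngestRequests`, the base URL passed in by the caller and the assumption that a context is URL-safe are all hypothetical. The request body contains only the `url` field from the example above; how the context is transmitted is not shown in the table and is therefore left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical request builders for the ingest endpoints; nothing is sent here. */
final class IngestRequests {

    /** JSON body in the shape of the example request body from the table. */
    static String submitBody(String binaryUrl) {
        String escaped = binaryUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{ \"url\": \"" + escaped + "\" }";
    }

    /** POST {baseUrl}/binaries: enqueues a URL for processing. */
    static HttpRequest submit(String baseUrl, String binaryUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/binaries"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(submitBody(binaryUrl)))
                .build();
    }

    /** GET {baseUrl}/binaries/{context}: status summary for a context. */
    static HttpRequest status(String baseUrl, String context) {
        // Assumption: the context contains only URL-safe characters.
        return HttpRequest.newBuilder(URI.create(baseUrl + "/binaries/" + context))
                .GET()
                .build();
    }

    /** DELETE {baseUrl}/binaries/context/{context}: removes all references of a context. */
    static HttpRequest deleteContext(String baseUrl, String context) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/binaries/context/" + context))
                .DELETE()
                .build();
    }
}
```

A request built this way could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the format of the returned reference is not specified here.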
Method | URL | Description | Notes |
---|---|---|---|
HEAD | /binary/{reference} | Provides access to the metadata headers of the corresponding GET request | |
GET | /binary/{reference} | Allows downloading the original binary, supports streaming using HTTP range requests | Downloading image binaries over this endpoint is not supported. |
GET | /image/{reference}/info.json | IIIF image info request. | Deprecated, will redirect to /image/2/{reference}/info.json |
GET | /image/2/{reference}/info.json | IIIF image info request compliant with IIIF version 2.1. | |
GET | /image/{reference}/{region}/{size}/{rotation}/default.jpg | IIIF image request. | Deprecated, will redirect to /image/2/{reference}/... |
GET | /image/2/{reference}/{region}/{size}/{rotation}/default.jpg | IIIF image request compliant with IIIF version 2.1. | Currently only a subset is supported. Supported values are: |
Local Binary Provider
There are currently two deployments of the ingest component: one on the development cluster and one on the production cluster.
A snapshot of the current development state of the delivery component is automatically deployed to the development network by Bamboo after each commit. It can be reached at https://dev-ddb.fiz-karlsruhe.de/binaries-service/.
In the production network there is a redundant deployment accessible via https://iiif.deutsche-digitale-bibliothek.de
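To show how the delivery endpoints fit together with a deployment URL, here is a minimal sketch that assembles IIIF URLs and a ranged download request. The class `DeliveryUrls` is hypothetical, and it assumes that the endpoint paths hang directly off the base URL passed in. The values `full`, `full` and `0` used in the usage note are standard IIIF Image API 2.1 parameters; whether they belong to the subset this service supports is not confirmed by this page.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helpers for addressing the delivery component; nothing is sent here. */
final class DeliveryUrls {

    /** Removes a single trailing slash so that paths can be appended safely. */
    private static String normalize(String base) {
        return base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    }

    /** {base}/image/2/{reference}/info.json */
    static String infoUrl(String base, String reference) {
        return normalize(base) + "/image/2/" + reference + "/info.json";
    }

    /** {base}/image/2/{reference}/{region}/{size}/{rotation}/default.jpg */
    static String imageUrl(String base, String reference,
                           String region, String size, String rotation) {
        return normalize(base) + "/image/2/" + reference + "/"
                + region + "/" + size + "/" + rotation + "/default.jpg";
    }

    /** GET {base}/binary/{reference} restricted to the byte range [from, to]. */
    static HttpRequest rangeRequest(String base, String reference, long from, long to) {
        return HttpRequest.newBuilder(URI.create(normalize(base) + "/binary/" + reference))
                .header("Range", "bytes=" + from + "-" + to) // standard HTTP range header
                .GET()
                .build();
    }
}
```

For example, `DeliveryUrls.imageUrl("https://iiif.deutsche-digitale-bibliothek.de", ref, "full", "full", "0")` yields a URL of the form `https://iiif.deutsche-digitale-bibliothek.de/image/2/{reference}/full/full/0/default.jpg`, where `ref` is a reference returned by the ingest component. The ranged request only applies to non-image binaries, since downloading image binaries via `/binary/{reference}` is not supported.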