Documentation metadata

Responsibilities
Product Owner
Developers
Version number (software)

 

Contents

Existing documentation (please incorporate it when writing new parts of the Binary Service documentation): https://wiki.deutsche-digitale-bibliothek.de/x/iYth

 

WORK IN PROGRESS

 

Introduction

Based on the concept discussed in early 2018, the Binary Service is meant to handle all kinds of binaries provided during a data ingest process as it occurs in the DDB. Binaries are submitted to the service by passing it a URL from which the binary can be retrieved. After retrieval, the Binary Service stores the binaries and provides them to users on demand. Images receive special treatment: besides being retrieved and provided on demand, they are also scaled to four predefined resolutions, which can be accessed using the IIIF Image API URL syntax.
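For illustration, a IIIF Image API request addresses an image with a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The Java sketch below assembles such a URL and percent-encodes the identifier, as the IIIF specification requires. The base URL, the reference ABC123/img 1 and the size value !800,800 are hypothetical; the actual host, path prefix and the four predefined resolutions offered by the Binary Service are not documented here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IiifUrlExample {

    // Builds a IIIF Image API request URL of the form
    // {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    static String iiifUrl(String baseUrl, String identifier, String region,
                          String size, int rotation, String quality, String format) {
        // The identifier must be percent-encoded (a "/" becomes %2F).
        // URLEncoder encodes spaces as '+', which is wrong inside a URL path,
        // so '+' is replaced by %20 afterwards.
        String encodedId = URLEncoder.encode(identifier, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return String.format("%s/%s/%s/%s/%d/%s.%s",
                baseUrl, encodedId, region, size, rotation, quality, format);
    }

    public static void main(String[] args) {
        // Hypothetical base URL and binary reference, for illustration only.
        String url = iiifUrl("https://binaries.example.org/iiif", "ABC123/img 1",
                "full", "!800,800", 0, "default", "jpg");
        System.out.println(url);
    }
}
```

The size value !w,h asks the server for the largest image that fits inside a w x h box while keeping the aspect ratio.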

Components

The Binary Service consists of two components. The ingest component retrieves the binaries from the submitted URLs, scales images and saves the data to the database. The delivery component is user-facing and delivers the data stored in the database.
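The image-scaling step of the ingest component can be pictured as computing one set of target dimensions per predefined resolution while preserving the aspect ratio. The following Java sketch covers only that arithmetic, not the pixel resampling itself. The limits in MAX_EDGES are hypothetical placeholders, since the four actual resolutions are not documented here, and the rule that smaller images are never upscaled is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class DerivativeSizes {

    // Hypothetical longest-edge limits in pixels; the four resolutions
    // actually used by the Binary Service are not specified here.
    static final int[] MAX_EDGES = {100, 400, 800, 1600};

    // Scales (width, height) so that the longer edge equals maxEdge while
    // keeping the aspect ratio. Images that already fit are returned unchanged.
    static int[] fit(int width, int height, int maxEdge) {
        int longer = Math.max(width, height);
        if (longer <= maxEdge) {
            return new int[] {width, height};
        }
        double factor = (double) maxEdge / longer;
        int w = Math.max(1, (int) Math.round(width * factor));
        int h = Math.max(1, (int) Math.round(height * factor));
        return new int[] {w, h};
    }

    // Computes the target dimensions for every predefined resolution.
    static List<int[]> derivatives(int width, int height) {
        List<int[]> result = new ArrayList<>();
        for (int maxEdge : MAX_EDGES) {
            result.add(fit(width, height, maxEdge));
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] d : derivatives(4000, 3000)) {
            System.out.println(d[0] + "x" + d[1]);
        }
    }
}
```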

Technology

The Binary Service stores its data in a Cassandra database. Cassandra is a distributed, cluster-based database designed to store large amounts of data reliably.

The ingest component is a standalone Java application utilizing Spark to process data on the cluster. 

The delivery component is a simple Java servlet. 
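To make the delivery path more concrete, the sketch below shows the kind of lookup such a servlet might perform: split a IIIF-style path (as a servlet would obtain it from HttpServletRequest.getPathInfo()) into its segments and fetch the matching stored derivative. It deliberately avoids the servlet API so that it runs stand-alone. The in-memory map stands in for Cassandra, and both the key layout "<reference>|<size>" and the restriction to region full and rotation 0 are hypothetical simplifications.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DeliverySketch {

    // In-memory stand-in for the Cassandra tables: "<reference>|<size>" -> bytes.
    // This key layout is hypothetical; the real table structure is not documented here.
    private final Map<String, byte[]> store = new HashMap<>();

    void put(String reference, String size, byte[] data) {
        store.put(reference + "|" + size, data);
    }

    // Resolves "/{reference}/{region}/{size}/{rotation}/{quality}.{format}".
    // Only region "full" and rotation "0" are accepted; quality and format are ignored.
    Optional<byte[]> deliver(String pathInfo) {
        if (pathInfo == null || !pathInfo.startsWith("/")) {
            return Optional.empty();
        }
        String[] parts = pathInfo.substring(1).split("/");
        if (parts.length != 5 || !parts[1].equals("full") || !parts[3].equals("0")) {
            return Optional.empty();
        }
        String reference = parts[0];
        String size = parts[2];
        return Optional.ofNullable(store.get(reference + "|" + size));
    }

    public static void main(String[] args) {
        DeliverySketch delivery = new DeliverySketch();
        delivery.put("ABC123", "!800,800", "jpeg bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(delivery.deliver("/ABC123/full/!800,800/0/default.jpg").isPresent());
        System.out.println(delivery.deliver("/ABC123/full/!50,50/0/default.jpg").isPresent());
    }
}
```

In an actual servlet, doGet would write the returned bytes to the response and answer with HTTP 404 when the Optional is empty.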

Implementation Details

Ingest component

Sources: https://dev.fiz-karlsruhe.de/stash/projects/DDB/repos/ddb-administration-binaries-service

The ingest component is a standalone Java application meant to run on a cluster node. It offers an API endpoint for submitting a URL together with a context that is saved along with the URL. The endpoint is non-blocking and immediately returns a reference that can later be used to access the binary and all of its derivatives. Once submitted, the URL is saved in the database together with the context and the generated reference, and it is enqueued for processing. As soon as enough resources are available on the cluster, the URL is processed as follows:
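The non-blocking submit behaviour described above can be sketched as follows: the call stores the URL, its context and a freshly generated reference, enqueues the submission and returns the reference without waiting for any download. The use of a random UUID as reference, the Submission class and the in-memory map and queue (standing in for the Cassandra table and the Spark-based processing) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class IngestSubmitSketch {

    // Hypothetical submission record: generated reference, source URL and context.
    static final class Submission {
        final String reference;
        final String url;
        final String context;

        Submission(String reference, String url, String context) {
            this.reference = reference;
            this.url = url;
            this.context = context;
        }
    }

    // In-memory stand-ins for the Cassandra table and the processing queue.
    final Map<String, Submission> table = new ConcurrentHashMap<>();
    final BlockingQueue<Submission> queue = new LinkedBlockingQueue<>();

    // Saves the submission, enqueues it and returns the reference at once;
    // downloading and scaling happen later, when a worker takes it from the queue.
    String submit(String url, String context) {
        String reference = UUID.randomUUID().toString();
        Submission s = new Submission(reference, url, context);
        table.put(reference, s);
        queue.add(s);
        return reference;
    }

    public static void main(String[] args) throws InterruptedException {
        IngestSubmitSketch ingest = new IngestSubmitSketch();
        String ref = ingest.submit("https://example.org/image.tif", "provider=demo");
        System.out.println("reference: " + ref);
        // A worker (in the real service, the cluster-side processing) consumes the queue.
        Submission next = ingest.queue.take();
        System.out.println("processing " + next.url);
    }
}
```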

  • Has this URL been processed before? If not, download the file.
  • If it has, check the Last-Modified and ETag headers returned by the source server to determine whether the binary has changed.
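The header check in the second step maps naturally onto an HTTP conditional GET: the ETag and Last-Modified values stored from the previous download are sent back as If-None-Match and If-Modified-Since, and a 304 Not Modified response means the stored copy is still current. The sketch below uses the JDK's java.net.http client (Java 11+) to build such a request. Whether the Binary Service actually issues conditional requests, or compares the headers in some other way, is not stated here, so treat this as one possible approach; the method names are hypothetical. Note that per the HTTP specification a server receiving both headers evaluates If-None-Match and ignores If-Modified-Since.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConditionalFetchSketch {

    // Builds a conditional GET from the ETag and Last-Modified values saved
    // during the previous download (either may be null if the server sent none).
    static HttpRequest conditionalGet(String url, String etag, String lastModified) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (etag != null) {
            builder.header("If-None-Match", etag);
        }
        if (lastModified != null) {
            builder.header("If-Modified-Since", lastModified);
        }
        return builder.build();
    }

    // 200 means the binary changed and must be stored and scaled again;
    // 304 means the stored copy is current. Other codes (errors) need
    // separate handling and are treated as "do not reprocess" here.
    static boolean needsReprocessing(int statusCode) {
        return statusCode == 200;
    }

    public static void main(String[] args) {
        HttpRequest request = conditionalGet("https://example.org/image.tif",
                "\"abc123\"", "Wed, 21 Oct 2015 07:28:00 GMT");
        System.out.println(request.headers().map());
        // Sending it would look like:
        // HttpResponse<byte[]> response = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofByteArray());
        // if (needsReprocessing(response.statusCode())) { ... }
    }
}
```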

Java Package description 

 

Delivery component

 

Cassandra table structure

 

API Documentation

 

Side project: Local Binary Provider

 

Deployments