Store and Share data

From SD4H wiki

Intro

SD4H provides a large Object Store. Objects are accessible through a web API served by the RADOS Gateway (radosgw) service, which supports both the S3 and Swift API standards.

In an Object Store, an object is the equivalent of a file on a POSIX file system. The Object Store gives users a lot of flexibility, but simple tasks like sharing and transferring data involve something of a learning curve. We propose a procedure here to make that curve as gentle as possible. Once set up, this approach is both more secure and more flexible than sharing data on a shared HPC platform or on a VM owned by your group.

Configuring S3 access

  1. You first need to have the OpenStack client installed and configured.
  2. Then, use the client to generate an EC2/S3 ID and secret.

With the client installed and the RC file downloaded in step 1, you can create the S3 ID and secret:

$ source  $HOME/id/myproject-openrc.sh
Please enter your OpenStack Password for project po-test as user poq: 
# Use the same password you use to log in to the Juno web page (https://juno.calculquebec.ca/).
# You can now create the credentials:
$ openstack ec2 credentials create
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field      | Value                                                                                                                                                                       |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| access     | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                                                                                                                                            |
| links      | {'self': 'https://juno.calculquebec.ca:5000/v3/users/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/credentials/OS-EC2/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'} |
| project_id | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx                                                                                                                                            |
| secret     | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx                                                                                                                                            |
| trust_id   | None                                                                                                                                                                        |
| user_id    | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx                                                                                                            |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The important values here are access and secret, which are the S3 [aws_]access_key_id and [aws_]secret_access_key, respectively (some clients use the aws_ prefix, others do not). AWS stands for Amazon Web Services, the company that created the S3 API.
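Many S3 clients can read these two values from an AWS-style credentials file instead of asking for them each time. The Bash function below is a minimal sketch of storing them that way, not an official SD4H procedure; the function name write_s3_credentials, the default path ~/.aws/credentials and the default profile name sd4h are illustrative assumptions, so check which file and profile your client actually reads.

```shell
# Sketch only: write_s3_credentials, the default path and the "sd4h"
# profile name are hypothetical; adapt them to your S3 client.
# Usage: write_s3_credentials ACCESS SECRET [FILE] [PROFILE]
write_s3_credentials() {
  local access="$1" secret="$2"
  local file="${3:-$HOME/.aws/credentials}" profile="${4:-sd4h}"

  # Make sure the parent directory exists.
  mkdir -p "$(dirname "$file")"

  # The secret must stay private: with umask 077 a newly created file
  # gets mode 600. The umask change is confined to this subshell.
  (
    umask 077
    # Append an INI section named after the profile.
    printf '[%s]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
      "$profile" "$access" "$secret" >> "$file"
  )
}
```

For example, after sourcing the function, run write_s3_credentials "<access>" "<secret>" with the two values from the table above. The function appends blindly, so delete any older section with the same profile name first, and note that passing the secret as an argument may leave it in your shell history. The OpenStack client's generic output options (-f value -c access -c secret) should print only those two fields, but running openstack ec2 credentials create again creates an additional credential pair rather than retrieving the existing one.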


Manage your S3 buckets with Globus

See the Globus documentation

Use an S3 client to manage your buckets

Several clients can be used to access the Ceph S3 API. We recommend rclone, which is fast and flexible.
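As a minimal sketch, assuming rclone is already installed, a remote for the object store can be described by an rclone configuration section of type s3 with provider Ceph. The function write_rclone_remote, the remote name and the endpoint are hypothetical placeholders: this page does not give the SD4H S3 endpoint URL, so use the one SD4H provides. The default path ~/.config/rclone/rclone.conf is rclone's usual location on Linux; rclone config file prints the path your installation actually uses.

```shell
# Sketch only: write_rclone_remote and the remote name are hypothetical,
# and ENDPOINT must be the S3 URL provided by SD4H.
# Usage: write_rclone_remote NAME ACCESS SECRET ENDPOINT [CONFIG_FILE]
write_rclone_remote() {
  local name="$1" access="$2" secret="$3" endpoint="$4"
  local file="${5:-$HOME/.config/rclone/rclone.conf}"

  # Make sure the configuration directory exists.
  mkdir -p "$(dirname "$file")"

  # rclone keeps these keys in plain text, so a newly created file
  # gets mode 600; the umask change is confined to this subshell.
  (
    umask 077
    # Append an s3 remote that targets a Ceph radosgw endpoint.
    printf '[%s]\ntype = s3\nprovider = Ceph\naccess_key_id = %s\nsecret_access_key = %s\nendpoint = %s\n' \
      "$name" "$access" "$secret" "$endpoint" >> "$file"
  )
}
```

For example, write_rclone_remote sd4h "<access>" "<secret>" "<endpoint-url>" followed by rclone lsd sd4h: should list your buckets; rclone mkdir sd4h:my-bucket creates a bucket and rclone copy ./results sd4h:my-bucket/results uploads a local directory (my-bucket and ./results are placeholders). The interactive rclone config command is an alternative to writing the section by hand.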