Share Object Store Data

Revision as of 17:25, 7 October 2025 by Poq (talk | contribs)

We recommend using Globus to share data stored in our Object Store, but you can also use bucket policies to share data with groups that are also tenants on our platform.


Share data with Bucket Policies

Create the right policy

Note that we only document the use of the S3 API, but these settings will also be transferred to the Swift API.

We will show how you can create a bucket policy to share selected objects from a bucket with other tenants of the platform.

First, ask the other group for their project number on Juno. It is a 33 digit hexadecimal number located here. We will denote that number as <Remote-Project>. Here is a policy.json example file that shares the content of bucket-to-share with all members of <Remote-Project>. Note that you cannot share the bucket with a specific member of that project; you share it with the project as a whole.


{
    "Version": "2012-10-17",
    "Id": "S3PolicyId1",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam::<Remote-Project>:user/<Remote-Project>"]},
            "Action": [
              "s3:ListBucket",
              "s3:GetObject"
            ],
            "Resource": [
                       "arn:aws:s3:::bucket-to-share",
                       "arn:aws:s3:::bucket-to-share/*"
            ]
        }
    ]
}

Note that you need to grant access to bucket-to-share itself, so the remote users can read information about the bucket, and to its objects using the glob notation *. This also means that you can grant access to a specific list of objects, like this:

{
    "Version": "2012-10-17",
    "Id": "S3PolicyId1",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam::<Remote-Project>:user/<Remote-Project>"]},
            "Action": [
              "s3:ListBucket",
              "s3:GetObject"
            ],
            "Resource": [
                       "arn:aws:s3:::bucket-to-share",
                       "arn:aws:s3:::bucket-to-share/prefix-*",
                       "arn:aws:s3:::bucket-to-share/specific-file.txt"
            ]
        }
    ]
}
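Since the policy document is plain JSON, you can also generate it programmatically instead of editing policy.json by hand. A minimal sketch (the helper name is ours, and <Remote-Project> remains a placeholder for the real project id):

```python
import json

def make_share_policy(bucket: str, remote_project: str, resources=None) -> dict:
    """Build a read-only sharing policy for `bucket`, granting access to
    every member of `remote_project`. `resources` defaults to all objects
    in the bucket; pass a list of object ARNs to restrict the grant."""
    if resources is None:
        resources = [f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Id": "S3PolicyId1",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{remote_project}:user/{remote_project}"]
                },
                "Action": ["s3:ListBucket", "s3:GetObject"],
                # Access to the bucket itself plus the selected objects
                "Resource": [f"arn:aws:s3:::{bucket}"] + resources,
            }
        ],
    }

# Write the document that put-bucket-policy expects
with open("policy.json", "w") as f:
    json.dump(make_share_policy("bucket-to-share", "<Remote-Project>"), f, indent=4)
```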

Apply the Policy to the Bucket

You first need to create an S3 key and secret pair. Then use the client of your choosing to interact with the S3 API. We like the official AWS CLI, since it is flexible and well documented.

aws s3api put-bucket-policy --bucket bucket-to-share --policy file://policy.json
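If you prefer to stay in Python, the same operation is available on the boto3 S3 client as put_bucket_policy. A minimal sketch (the helper name is ours, and client creation is only outlined, with placeholder endpoint and credentials):

```python
import json

def apply_bucket_policy(s3_client, bucket: str, policy: dict) -> None:
    # put_bucket_policy expects the policy document as a JSON string,
    # not a Python dict
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Usage with a boto3 client (endpoint and credentials are placeholders):
#   s3 = boto3.client("s3", endpoint_url="<object-store-endpoint>",
#                     aws_access_key_id="<access-key>",
#                     aws_secret_access_key="<secret-key>")
#   apply_bucket_policy(s3, "bucket-to-share", policy)
```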


How can the other group read the data now?

That part is a bit more complicated. Every tenant in our Ceph cluster lives in its own namespace, which means that, unlike the AWS object store, our namespace configuration is not global. It uses multi-tenancy... to an extent. When you access the object store of your own tenant, it makes no difference whether the namespace is global or not. When accessing data from another tenant, the owning tenant's id is prepended to the bucket name, like this: <Remote-Project>:<bucket-name>. This means that you need to share your project id with the group you want to share the data with. Let's say that your project id is <my-project>; your collaborator will be able to use the rclone client out of the box like this:


rclone ls rclone-config-name:<my-project>:bucket-to-share/
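For clients that do accept the colon, the qualified name is simply the owning project's id and the bucket name joined by a colon. As a trivial sketch (the function name is ours):

```python
def qualified_bucket_name(owner_project_id: str, bucket: str) -> str:
    # Ceph RGW multi-tenancy: a bucket in another tenant is addressed
    # as <tenant>:<bucket>
    return f"{owner_project_id}:{bucket}"

# e.g. qualified_bucket_name("<my-project>", "bucket-to-share")
```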


However, some clients will not accept the : in the bucket name. If you are using boto3, the official AWS Python SDK, you will need to add the : character to the VALID_BUCKET regular expression of accepted characters before using it:


import re
import boto3
import botocore.handlers

# Allow the ':' character in bucket names (<tenant>:<bucket>)
botocore.handlers.VALID_BUCKET = re.compile(r'^[:a-zA-Z0-9.\-_]{1,255}$')

# Create the client as usual (endpoint and credentials are placeholders)
s3_client = boto3.client(
    "s3",
    endpoint_url="<object-store-endpoint>",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Specify the bucket name including the tenant
bucket_name = "<Remote-Project>:bucket-to-share"

# Example: list objects in the bucket
response = s3_client.list_objects_v2(Bucket=bucket_name)


Finally, the aws client is also built on top of the botocore library, and changing the VALID_BUCKET value in the botocore/handlers.py installed with your aws client would also let you use that client. While it is a working hack, we would not recommend that last solution as a sustainable way of using that tool.