
S3 Boto List Keys Sometimes Returns Directory Key

I've noticed a difference between the returns from boto's API depending on the bucket location. I have the following code: con = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_K

Solution 1:

Thanks to Steffen, who suggested looking at how the keys are created. With further investigation I think I've got a handle on what's happening here. My original supposition that it was linked to the bucket region was a red herring. It appears to be due to what the management console does when you manipulate keys.

If you create a directory in the management console it creates a 0 byte key. This will be returned when you perform a list.

If you use boto to create/upload a file, it doesn't create the folder key. Interestingly, if you delete the file from within the folder (from the AWS console), a key is created for the folder that used to contain it. If you then upload the key again using boto, you have exactly the same-looking structure in the UI, but in fact you have a spurious additional key for the directory. This is what was happening to me: as I was testing our application I was clearing out keys and then finding different results.

It's worth knowing this happens. There is no indicator in the UI to show whether a folder is a created one (one that will be returned as a key) or an interpreted one (based on a key's name).
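Based on the behaviour described above, a minimal sketch of how you might flag these console-created placeholders (the check itself is an assumption drawn from the observation that they are zero-byte keys with a trailing slash):

```python
def is_console_folder(name, size):
    """Return True for the zero-byte trailing-'/' placeholder keys
    the AWS Management Console creates when you make a "folder"."""
    return size == 0 and name.endswith('/')

# With boto you would apply it to a listing, e.g.:
#   for key in bucket.list():
#       if is_console_folder(key.name, key.size):
#           continue  # skip the spurious directory key
print(is_console_folder('media/images/', 0))           # True
print(is_console_folder('media/images/photo.jpg', 42)) # False
```

Note that a genuinely empty file (size 0 but no trailing slash) is not flagged, so this is safer than filtering on size alone.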

Solution 2:

I don't have a definite answer for your question, but can throw in some partial ones at least:

Background

Directory/Folder simulation

Amazon S3 doesn't actually have a native concept of folders/directories; rather, it is a flat storage architecture comprised of buckets and objects/keys only. The directory-style presentation seen in most tools for S3 (including the AWS Management Console itself) is based solely on convention, i.e. simulating a hierarchy for objects with identical prefixes. See my answer to How to specify an object expiration prefix that doesn't match the directory? for more details on this architecture, including quotes/references from the AWS documentation.
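To illustrate that convention, here is a small sketch (with hypothetical key names) of the prefix/delimiter grouping that S3 performs server-side when a client asks for a delimited listing, which is what produces the "folder" view:

```python
def common_prefixes(names, prefix='', delimiter='/'):
    """Group flat key names into simulated "folders": everything up to
    the next delimiter after the given prefix, as S3's delimited
    listings do."""
    out = set()
    for name in names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if delimiter in rest:
                out.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(out)

# S3 stores only these flat strings; there are no directory objects.
names = ['media/images/a.jpg', 'media/images/b.jpg', 'media/docs/readme.txt']
print(common_prefixes(names, 'media/'))  # ['media/docs/', 'media/images/']
```

In boto the equivalent server-side operation is a delimited listing, e.g. bucket.list(prefix='media/', delimiter='/').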

API differences per region

I noticed there is a different policy for naming buckets in Ireland, do different locals have their own version of the api's?

That's apparently the case indeed for Amazon S3 specifically, which is one of their oldest offerings, see e.g. Bucket Restrictions and Limitations:

In all regions except for the US Standard region, You must use the following guidelines when naming a bucket. [...] [emphasis mine]

These specifics for the US Standard region are seen in other places of the S3 documentation as well, and US Standard is an unusual construct itself compared to the otherwise clearly geographically constrained Regions:

US Standard — Uses Amazon S3 servers in the United States

This is the default Region. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps. To use this region, select US Standard as the region when creating a bucket in the console. The US Standard Region provides eventual consistency for all requests. [emphasis mine]

This implicit CDN behavior is unique to this default Region of S3 (i.e. US Standard) and, as far as I know, not seen on any other AWS service.

Likely Cause

I have a faint memory of S3 actually placing a zero byte object/key into a bucket for the simulated directory/folder in more recent regions (i.e. all but US Standard), whereas the legacy solution for the US Standard region might be different, for example simply based on the established naming convention for directory separation by / and omitting a dedicated object/key for this altogether.

Solution

If the analysis is correct, there is nothing you can do but maintain separate code paths for both cases, I'm afraid.
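Rather than branching per region, one region-agnostic sketch is to filter the listing so that anything resembling a simulated directory is dropped regardless of whether the bucket's region emits placeholder keys (the filter criteria are assumptions based on the observations above):

```python
from collections import namedtuple

def real_objects(keys):
    """Yield only keys that look like actual files, skipping the
    zero-byte trailing-'/' placeholder keys some listings include."""
    for key in keys:
        if key.name.endswith('/') and key.size == 0:
            continue
        yield key

# Quick demo with stand-in key objects (boto keys expose .name and .size):
Key = namedtuple('Key', 'name size')
listing = [Key('media/images/', 0), Key('media/images/a.jpg', 42), Key('empty.log', 0)]
print([k.name for k in real_objects(listing)])  # ['media/images/a.jpg', 'empty.log']
```

Because it requires both the trailing slash and zero size, this keeps legitimately empty files while dropping the placeholders, so the same code path works in either kind of region.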

Good luck!

Solution 3:

I've had the same problem. As a work around you can filter out all the keys with a trailing '/' to eliminate the 'directory' entries.

def files(keys):
    return (key for key in keys if not key.name.endswith('/'))

s3 = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = s3.get_bucket(S3_BUCKET_NAME)
keys = bucket.list(path)  # path is the key prefix to list under
for key in files(keys):
    print(key)

Solution 4:

I'm using the fact that a "folder" has no "." in its path, while a file does. E.g. media/images will not be deleted; media/images/sample.jpg will be deleted.

e.g. clean bucket files

def delete_all_bucket_files(self, bucket_name):
    bucket = self.get_bucket(bucket_name)
    if bucket:
        for key in bucket.list():
            # delete only the files, not the folders
            if '.' in key.name:
                print('deleting: ' + key.name)
                key.delete()

Solution 5:

you could use the size parameter to exclude the prefix:

for key in keys:
    if key.size > 0:
        print(key)
