
Tuesday, June 24, 2014

Quick Primer on Python Shelve

Python's shelve module is a convenient way to store Python data objects on disk for later use. A shelf behaves much like a Python dictionary and shares most of its syntax. Each shelved object is associated with a key (which must be a string), so it can be retrieved from disk quickly without loading everything else in the shelf.

A shelved object must be something that can be pickled with the pickle module, which essentially makes a shelf an easy way to organize and store pickled objects. "Pickling" is the process of converting a Python object into a byte stream; the inverse, "unpickling", restores the byte stream to the original Python object. Pickling is also known as serialization, marshalling, or flattening. Python objects that can be pickled include integers, lists, tuples, sets, dictionaries, and class instances.

In addition to saving items to disk, shelves give quick access to individual entries of a large data collection, and they store those entries in a binary format when a pickle protocol greater than 0 is specified. The simple Python script below is meant as a short tutorial on shelves and their syntax. The example focuses on saving dictionaries to a shelf, but it extends easily to other picklable objects.
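Before the shelf script itself, here is a minimal sketch of the pickling round trip described above, using only the standard pickle module. The `example` dictionary and its values are made up for illustration; the point is only that the object is turned into bytes and then restored, with either the ASCII protocol (0) or a binary protocol (2).

```python
import pickle

# Hypothetical example object mixing several picklable types
example = {"a": 1, "b": [2, 3], "c": (4, 5), "d": {6, 7}}

# Pickle: convert the object to a byte stream
ascii_bytes = pickle.dumps(example, protocol=0)   # original ASCII protocol
binary_bytes = pickle.dumps(example, protocol=2)  # binary protocol

# Unpickle: restore the byte stream to an equivalent Python object
restored = pickle.loads(binary_bytes)

print(restored == example)  # True
```

A shelf does essentially this for every value assigned to it, storing the resulting bytes under the key you provide.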

import shelve
# Example dictionaries
dict1={"a":1,"b":2,"c":3,"d":4,"e":5}
dict2={"f":6,"g":7,"h":8,"i":9,"j":10}
# Create example shelf
out_shelf=shelve.open("shelf_filename.db", flag="c", protocol=2)
# flags:
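# c=open for reading and writing, creating the shelf if it doesn't exist (default); existing data is kept
# n=always create a new, empty shelf, replacing any existing one
# r=open an existing shelf read-only
# w=open an existing shelf for reading and writing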
# protocol:
# 0=original ASCII protocol
# 1=old binary format
# 2=new, more efficient format
# Add dictionaries to shelf
out_shelf["dict1"]=dict1
out_shelf["dict2"]=dict2
# Close shelf
out_shelf.close()
# Read in shelf
in_shelf=shelve.open("shelf_filename.db", flag="r")
# Count number of shelf keys
len(in_shelf)
# Lookup shelf keys
in_shelf.keys()
# Access dictionary directly from shelf
in_shelf["dict1"]["a"] # 1
in_shelf["dict2"]["j"] # 10
# Copy saved dictionaries to memory
dict1=in_shelf["dict1"]
dict2=in_shelf["dict2"]
# Close the read-only shelf
in_shelf.close()
# Alternatively, dictionary keys can be saved as shelf keys
dict_shelf=shelve.open("shelf2_filename.db", flag="c", protocol=2)
dict_shelf["dict1"]=dict1
dict_shelf.update(dict1)
del dict_shelf["dict1"] # Remove dict1 key
list(dict_shelf.keys()) # ['b', 'd', 'a', 'c', 'e'] (order depends on the dbm backend)
dict_shelf["a"] # 1
dict_shelf["b"] # 2
dict_shelf["c"] # 3
dict_shelf["d"] # 4
dict_shelf["e"] # 5
dict_shelf.close()
# Similarly, shelves can be created in the same way keys are added to dictionaries
dict2_shelf=shelve.open("shelf3_filename.db", flag="c", protocol=2)
count=1
for i in ["a","b","c","d","e"]:
    dict2_shelf[i]=count
    count+=1
dict(dict2_shelf) # {'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4} (key order may vary)
dict2_shelf.close()
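The flags matter most when the same shelf is reused from one run of a script to the next. The sketch below is one possible way to append to a list stored in a shelf without overwriting what earlier runs saved. It assumes Python 3.4 or later (where a shelf can be used in a `with` block); the helper names `append_record` and `read_records`, the key `"samples"`, and the temporary file path are all hypothetical and chosen only for illustration.

```python
import os
import shelve
import tempfile

def append_record(path, key, item):
    """Append item to the list stored under key (hypothetical helper)."""
    # flag="c" creates the shelf only if it is missing, so data from
    # earlier runs is kept rather than overwritten.
    with shelve.open(path, flag="c", protocol=2) as shelf:
        records = shelf.get(key, [])  # unpickled copy of the stored list
        records.append(item)
        shelf[key] = records          # reassign so the change is saved to disk

def read_records(path, key):
    """Return the list stored under key, or an empty list (hypothetical helper)."""
    with shelve.open(path, flag="r") as shelf:
        return shelf.get(key, [])

# Use a throwaway directory so the demo does not touch real files
demo_path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

append_record(demo_path, "samples", "sample_A")  # stands in for a first run
append_record(demo_path, "samples", "sample_B")  # stands in for a second run
stored = read_records(demo_path, "samples")
print(stored)  # ['sample_A', 'sample_B']

# flag="n" always starts from an empty shelf, discarding earlier data
with shelve.open(demo_path, flag="n") as shelf:
    keys_after_reset = list(shelf.keys())
print(keys_after_reset)  # []
```

The list is read, changed, and assigned back to its key because modifying an object retrieved from a shelf in place does not by itself update the copy on disk (unless the shelf is opened with `writeback=True`).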

1 comment:

  1. I'm new to Python and have been struggling with shelves and how to use them. When I found this script it sounded like what I needed, in particular for not overwriting the existing data from one run to the next. The script is above my current knowledge, which I am trying to improve, but for now could you detail what the script does and how? I would like to use it with input() and append, etc. in my own script.
