TITLE OF PAPER: The Complete File System: File System Virtualization Using Python
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: Christopher Gillett
REPRESENTING: Compete, Inc.
CONFERENCE: PyCon 2005
DATE: Thursday, March 24, 2005
LOCATION: GWU Cafritz Conference Center, Grand Ballroom

--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}

Compete Inc.
    How to drive advert, marketing, sales
    Analyzing lots of data
    Need to manage lots of files

CFS
    Compete File System
    Application level (runs in user space)
    Uses MySQL with Python
    A few seconds per transaction
        But jobs run for hours, so it doesn't matter
        Still it may matter for real-time/fast applications

File System Virtualization
    Mapping multiple and potentially disparate file systems into a monolithic view such that applications, scripts, etc. need not know the physical location of the files that are being manipulated.
    Common directory structure for all participating file systems
    CFS uses Scheduler to select "next" device
    Database handles logical file name mapping to physical filenames

End User perspective
    Has to integrate cleanly with Unix scripts
        We solved this using cfsopen command
        cfsopen maps logical filenames to physical filenames for use in scripts
    Must have a programmable API
        Compete Data Access Layer manages many different data sources
        Wrap CFS functionality behind CDAL
        Then users can use CDAL as they always do.
        Module that maps virtual filenames to real filenames -- all file open/closing done with this module
    Users must be able to catalog and delete files as needed
        cfsls and cfsrm commands

Compete File System architecture
    Scheduler
        Which real filesystem to use next?
    Database manager
        Maps virtual filenames to real filenames
        Two of these, since replication wasn't available at the time
    Straightforward implementation
        Just map filenames
        Allow access to CFS file as "normally" as possible for end users
    Stateless model of sorts:
        No daemons to worry about
        Better portability for CFS
        Simple code

Scheduling and Load Balancing
    Version 1
        Multiple file systems on individual NFS servers
        Assume all file systems are about the same size
        Used a round robin approach
        Worked ok but large files & small files made problems
    Version 1.1
        Best fit scheduler - selects based on size of files
        Problem with many small/temp files that grow, since the files are not distributed very well
    Future
        Predictive Scheduler
            Score-boarding file system sizing
            Predicts sizes of new files based on average size in directory
            Allows applications to pre-allocate space as a hint to CFS
            Open file reaper tracks implicit file closes and updates stats

CFS Internal State
    Stateless processes
    State information stored in database
    Losing the CFS Database due to ___

Role of Python
    Easy to think about algorithms, functionality
    Consideration given to other languages for deployment
        - C, C++ but rejected for the "usual" reasons
    Whining and ranting
        Python VM footprint and speed are good, but a compiled lang could be better
        I/O in Python is a problem (dealing w/ files that are hundreds of GBs)

Results of Building and Deploying CFS
    Happy management
    Zero Worries
    Effective storage resource management
    Hundreds of thousands of files across multiple devices
    Multiple terabytes of data under coherent structure

--------------------------------------------------------------------------
REFERENCES:
{as documents / sites are referenced add them below}

--------------------------------------------------------------------------
QUOTES:

--------------------------------------------------------------------------
CONTRIBUTORS:
{add your name, e-mail address and URL below}
Linden Wright
Abhay Saxena
--------------------------------------------------------------------------
E-MAIL BOUNCEBACK:
{add your e-mail address separated by commas}

--------------------------------------------------------------------------
NOTES ON / KEY TO THIS TEMPLATE:
A headline (like a field in a database) will be CAPITALISED
    This differentiates from the text that follows
A variable that you can change will be surrounded by _underscores_
    Spaces in variables are also replaced with under_scores
    This allows people to select the whole variable with a simple double-click
A tool-tip is lower case and surrounded by {curly brackets / parentheses}
    These supply helpful contextual information.

--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...
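
--------------------------------------------------------------------------
ILLUSTRATIVE SKETCH (not from the talk):
The notes under "Scheduling and Load Balancing" describe a round robin scheduler (Version 1) and a best fit scheduler (Version 1.1), with a database mapping logical filenames to physical ones. The talk did not show code, so the following is only a rough sketch of how those pieces might fit together. Every name here (Device, RoundRobinScheduler, BestFitScheduler, Catalog, the /nfs/... mounts) is hypothetical, the in-memory dict merely stands in for the MySQL database, and "best fit" is assumed to mean "the device with the least free space that still holds the file", which the notes do not confirm.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Device:
    """A hypothetical participating file system (e.g. one NFS mount)."""
    mount: str
    free_bytes: int


class RoundRobinScheduler:
    """Version 1 style: hand out devices in a fixed rotation, ignoring size."""

    def __init__(self, devices):
        self._rotation = cycle(devices)

    def next_device(self, size_hint=0):
        return next(self._rotation)


class BestFitScheduler:
    """Version 1.1 style (assumed): tightest device that still has room."""

    def __init__(self, devices):
        self._devices = list(devices)

    def next_device(self, size_hint=0):
        candidates = [d for d in self._devices if d.free_bytes >= size_hint]
        if not candidates:
            raise OSError("no device has room for %d bytes" % size_hint)
        return min(candidates, key=lambda d: d.free_bytes)


class Catalog:
    """Stand-in for the database mapping logical names to physical paths."""

    def __init__(self, scheduler):
        self._scheduler = scheduler
        self._paths = {}

    def resolve(self, logical_name, size_hint=0):
        # Existing files keep their location; new files get a scheduled device.
        if logical_name not in self._paths:
            device = self._scheduler.next_device(size_hint)
            device.free_bytes -= size_hint
            self._paths[logical_name] = (
                device.mount + "/" + logical_name.lstrip("/")
            )
        return self._paths[logical_name]
```

With two devices, the round robin catalog alternates mounts regardless of file size, while the best fit catalog keeps sending small files to the fuller device. That is one plausible reading of the "files are not distributed very well" problem noted for Version 1.1.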