Fast Networking with Python.notes

Wednesday, March 30, 2005

TITLE OF PAPER: Fast Networking with Python
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: Itamar Shtull-Trauring
REPRESENTING: _name_of_the_company_they_represent_

CONFERENCE: PyCon 2005
DATE: March 25, 2005
LOCATION: Grand Ballroom, GWU
--------------------------------------------------------------------------

REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}

What do we mean by "fast"?
    I/O bound
    CPU bound

I/O Bound
    Not enough bandwidth (no point in optimizing your code)
    High latency (same comment about optimizing code)
    Solution: improve protocol

CPU Bound
    Can't keep up with the network
    Networking code is slow, or
    Business/application logic is slow
    Example: Batch processing
        If the processing of the data you're sending is slow, then it doesn't
        matter how fast you can send it
        
Our Goal
  Network code fast enough
    if you can keep up with the network, that's good enough; you don't need to be faster than the data arriving over the wire - you might as well be doing something else
  I/O bound

Lazy Optimization
  Upgrade to Python 2.4
  Buy new hardware
    (sometimes cheaper than new code)

Algorithms
  Python can be faster than C
    if the algorithm is designed better in the Python version than in the C version, the Python version is faster
    
Example: Scheduling
  while True:
    for t in getScheduledEvents():
      t.run()
    sleep(getTimeUntilNextEvent())
  Want to fetch only the events that are due now, because you don't care about the rest yet.
  
Example: Scheduling
  glib: iterate over all timers O(n)  [looks at every event, thousands of times a second]
  Twisted: heapq of timers, only get newest O(k)  [because the event list is sorted by time you can stop iterating over the events when you hit something that is scheduled for later time]
  Twisted is faster for large number of timers
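
  The heapq approach can be sketched in a few lines (a hypothetical minimal scheduler, not Twisted's actual implementation; the names Scheduler, call_later and run_due are made up here):

  ```python
  import heapq
  import itertools
  import time

  class Scheduler:
      """Minimal heapq-based timer scheduler, sketching the Twisted approach."""
      def __init__(self):
          self._heap = []                # (fire_time, seq, callback), ordered by fire_time
          self._seq = itertools.count()  # tie-breaker so callbacks are never compared

      def call_later(self, delay, callback):
          heapq.heappush(self._heap, (time.time() + delay, next(self._seq), callback))

      def run_due(self):
          # Pop only the timers that are due, stopping at the first future one:
          # O(k log n) for k due timers instead of scanning all n.
          now = time.time()
          while self._heap and self._heap[0][0] <= now:
              _, _, callback = heapq.heappop(self._heap)
              callback()

      def time_until_next(self):
          return max(0.0, self._heap[0][0] - time.time()) if self._heap else None
  ```

  The main loop from the slide then becomes run_due() followed by sleep(time_until_next()).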
  

Data Copying
  Python strings are immutable
  Adding two strings copies data
    (When you concatenate strings in Python, you're copying them and allocating more memory; doing this the wrong way can be extremely expensive.)

Bad String Concatenation
    s = ""
    for i in chunksOfData:
        s += i
    (Actually in 2.4 this special case is optimized, but not if 's' were an attribute of an object)
    (If you do 1000 appends, it ends up doing on the order of half a million character copies, and it's really, really slow.)


Do this Instead
   l = []
   for i in chunksOfData:
       l.append(i)
   s = "".join(l)

  (For n chunks, it's really inexpensive.)
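
  The claim is easy to check with timeit (a quick, hypothetical micro-benchmark; as noted above, CPython 2.4+ special-cases += on a local string variable, so the gap there may be modest, but the join() version is safe everywhere):

  ```python
  import timeit

  chunks = ["x" * 70] * 1000

  def concat():
      s = ""
      for c in chunks:
          s += c            # each += may reallocate and copy the whole string
      return s

  def join():
      l = []
      for c in chunks:
          l.append(c)
      return "".join(l)     # one allocation, one pass, at the end

  assert concat() == join()
  t_concat = timeit.timeit(concat, number=50)
  t_join = timeit.timeit(join, number=50)
  print("+=: %.4fs  join: %.4fs" % (t_concat, t_join))
  ```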

Comparison graph:

(orange line for += rises roughly linearly with the number of chunks, up to about 0.08s; green line for join() stays flat on the floor. Message: use lists)

Lower-Level Optimization
  Reduce allocation
  Reduce memory copying

  (Networking code shouldn't be copying the strings - you are just pushing bytes - you should pass the memory chunk around and process it)

Buffer() is your friend (Kinda)
    buffer() is like char*
    Non-copying reference to substring
    (will look at but will not copy)
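
    buffer() is gone in Python 3, where the analogous non-copying reference is memoryview; a sketch of the idea:

    ```python
    data = b"GET /index.html HTTP/1.1\r\n"
    view = memoryview(data)

    path = view[4:15]                      # a non-copying reference into data
    assert bytes(path) == b"/index.html"   # bytes are copied only on demand
    assert len(path) == 11
    ```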

Buffer() vs string copy
    (scatter plot: near linear increasing, vs flatline)
    (jump around 250 bytes or so is probably due to pymalloc's small-object threshold)

Writing Without Buffer()
  def doWrite(self):
    amountWritten = write(self.data)
    if amountWritten:
      self.data = self.data[amountWritten:]

    (you have a large chunk to write that won't all be written at once; you make a copy of the remaining data and write that out next time -- copying over and over)

Writing With Buffer()
  def doWrite(self):
    slice = buffer(self.data, self.offset, len(self.data) - self.offset)
    amountWritten = write(slice)
    self.offset += amountWritten
    if self.offset == len(self.data):
      self.data = ""; self.offset = 0

    (can take slice of buffer, only parts we haven't written yet: writing without having to do any copying; then update offset. Real code probably won't be done just like this -- you might have a MB buffer which sits around during the write process, sucking up memory for no good reason.)
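
    A runnable version of the slide's pattern (hypothetical BufferedWriter; memoryview stands in for Python 2's buffer(), and write is any callable returning the number of bytes it accepted, e.g. a non-blocking socket send):

    ```python
    class BufferedWriter:
        def __init__(self, write):
            self.write = write
            self.data = b""
            self.offset = 0

        def do_write(self):
            # Non-copying slice of the part we haven't written yet
            slice_ = memoryview(self.data)[self.offset:]
            self.offset += self.write(slice_)
            if self.offset == len(self.data):
                self.data = b""       # done: drop the buffer so it can be freed
                self.offset = 0

    sent = []
    def short_write(view):            # pretend the OS accepts at most 4 bytes
        n = min(4, len(view))
        sent.append(bytes(view[:n]))
        return n

    w = BufferedWriter(short_write)
    w.data = b"hello world!"
    while w.data:
        w.do_write()
    assert b"".join(sent) == b"hello world!"
    ```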


Improving Python
  Control over memory allocation
  Better buffer type
  Changes to existing built-ins

  (want better control over memory allocation
  allocate a chunk of buffer, and only copy if you really need it)


Buffer API
  C API for getting at char*
  Implemented by strings, buffer(), mmap
  Supported by many APIs, e.g. file.write()
  Needs to be supported more
    >>> s = "10lala"; b = buffer(s, 0 ,2)
    >>> str(b)
    '10'
    >>> int(b)
    ValueError: invalid literal for int() ...
    (it's complaining about the original string, not the substring the buffer points to;
    Python needs much better support for the buffer API)

array.array
  Can be used as mutable array of bytes
  Very expensive to str()
  Can't do non-copy slices
  Missing .split(), .find(), etc.
  (all kinds of annoying limitations)
  (Buffer doesn't have find() or split() either)
  (expensive to take Array and make it a string)
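
  A small illustration of these trade-offs (shown with Python 3's array.array, where the copying conversion is .tobytes(); in 2.x it was .tostring()):

  ```python
  from array import array

  buf = array("B", b"hello world")         # mutable array of unsigned bytes
  buf[0] = ord("H")                        # in-place mutation, no new object allocated
  assert buf.tobytes() == b"Hello world"   # but converting back copies everything

  # No .find() or .split(): searching requires converting to bytes first
  assert buf.tobytes().find(b"world") == 6
  ```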
  

Example: Non-Copying Split()
  str.split() copies into new strings
  Modified C code that returns buffer()
  Non-copying version is 8% faster
  No improvement for small strings
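
  The talk modified CPython's C code; the same idea can be sketched at the Python level with memoryview (split_views is a made-up name, and a pure-Python version like this won't show the 8% win - it just illustrates the technique):

  ```python
  def split_views(data, sep=b" "):
      # Return non-copying memoryview slices of `data` instead of new strings.
      view = memoryview(data)
      parts, start = [], 0
      while True:
          idx = data.find(sep, start)
          if idx < 0:
              parts.append(view[start:])
              return parts
          parts.append(view[start:idx])
          start = idx + len(sep)

  words = split_views(b"fast networking with python")
  assert [bytes(w) for w in words] == [b"fast", b"networking", b"with", b"python"]
  ```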


Sockets and Files
  file.read(), socket.recv() allocate on read
  file.readinto() - unsupported (undocumented, may not be in the next release)
  socket.recvinto() - nonexistent
  (there are other design issues with mutable buffers: you might want to use one as a dict key without copying it, but mutable objects can't be dict keys; and if you mutate a buffer that was used as a key, it's unclear whether you're modifying the key, the original data, or a new copy that has lost the connection)
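
  These gaps have since been filled: modern Python has file.readinto() and socket.recv_into(), which fill a preallocated buffer instead of allocating a new string per read. A sketch using an in-memory stream:

  ```python
  import io

  buf = bytearray(8)                     # preallocated, reusable buffer
  stream = io.BytesIO(b"0123456789abcdef")

  n = stream.readinto(buf)               # fills buf in place, no new string allocated
  assert (n, bytes(buf)) == (8, b"01234567")

  n = stream.readinto(buf)               # reuse the same memory for the next chunk
  assert bytes(buf[:n]) == b"89abcdef"
  ```

  socket.recv_into(buf) works the same way on a connected socket.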

Other Options
  C/Pyrex/C++ extensions
  Fusion: C++ protocols for Twisted (http://itamarst.org/software/)
  Flumotion technique (Twisted + glib C code)
  (they handle most of their client requests in Twisted, and use C for the rest - the data-intensive operations)

If these problems are interesting to you, we're hiring. :-)

Q. There was some talk about the buffer() being deprecated
A. I'm not the one to ask, but it may have been the buffer type that was deprecated

Q. At about what size does the benefit start to drop off?
A. I didn't do any systematic timings; the 8% was from splitting a 500-byte string into 70-byte chunks

Q. Numarray - is it useful for this
A. Python would not be quite fast enough to read/write to the C++ numarray library - Python would be fast enough



--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}

http://itamarst.org


--------------------------------------------------------------------------
QUOTES:

"... you could be doing something else, like video games"

--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}

Chris Shenton <chris@shenton.org>
Erik Rose <corp@grinchcentral.com>

--------------------------------------------------------------------------
E-MAIL BOUNCEBACK: {add your e-mail address separated by commas }



--------------------------------------------------------------------------
NOTES ON / KEY TO THIS TEMPLATE:
A headline (like a field in a database) will be CAPITALISED
    This differentiates from the text that follows
A variable that you can change will be surrounded by _underscores_
    Spaces in variables are also replaced with under_scores
    This allows people to select the whole variable with a simple double-click
A tool-tip is lower case and surrounded by {curly brackets / parentheses}
    These supply helpful contextual information.

--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...