We covered the basics of protocols in an earlier post. The first tutorial focused on data transmissions and message handling. In this tutorial, we will focus on two other critical ideas:
- Connection Management
- Memory Management
For connection management, a protocol's lifetime has at least three distinct phases: pre-connection, connection, and post-connection.
The pre-connection phase begins once the protocol class has been instantiated and lasts until it has network connectivity. For most service protocols, such as C2C and PTCL, you don’t have to worry about much. But for application layer protocols that may provide an API to other classes, you need to ensure that each API call either returns a deferred OR is stored internally until the connection is made. Suppose, for example, that you had an FTP protocol. It might have a method called “login” for logging into the FTP server. Once the protocol class is instantiated, calling code could attempt to log in using that method. But if the network connection doesn’t yet exist, the call will fail unless the protocol handles this case.
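Here is a minimal sketch of the "internally store the API call" option. It does not use the real Playground or Twisted classes: `BufferingFtpProtocol`, `FakeTransport`, and the wire format written by `login` are all hypothetical names made up for illustration.

```python
class FakeTransport:
    """Hypothetical stand-in transport that just records what is written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class BufferingFtpProtocol:
    """Hypothetical FTP-like protocol that holds API calls until connected."""

    def __init__(self):
        self.transport = None
        self._pending = []  # login calls made during pre-connection

    def login(self, user, password):
        if self.transport is None:
            # No network yet: remember the call instead of failing.
            self._pending.append((user, password))
            return
        self.transport.write("USER %s\r\nPASS %s\r\n" % (user, password))

    def makeConnection(self, transport):
        self.transport = transport
        self.connectionMade()

    def connectionMade(self):
        # Replay everything requested before the connection existed.
        pending, self._pending = self._pending, []
        for user, password in pending:
            self.login(user, password)


protocol = BufferingFtpProtocol()
protocol.login("alice", "secret")    # pre-connection: queued, nothing sent
queued_before = len(protocol._pending)

transport = FakeTransport()
protocol.makeConnection(transport)   # connection: the queued login is replayed
```

The other option, returning a deferred, would hand the caller an object that fires once the login has actually been sent; the queue above is simply the smaller of the two to show.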
Once network connectivity is established, the protocol’s “makeConnection(transport)” method is called. You should almost never override this method and, if you do, make sure to call the superclass’s version. This method sets the protocol’s transport and then calls “connectionMade()”. At that point, your protocol knows that it can begin to send (and receive) data. Although I did not say so earlier in the class, I would now stress that when you override the “connectionMade” callback, you should also call the superclass’s method.
Finally, post-connection. This is the phase after the connection has been lost. It’s important to treat this distinctly from “pre-connection.” For example, it doesn’t work to simply have a boolean flag that indicates “connected” or “not connected.” Consider the FTP login procedure we discussed in the earlier paragraph. If someone calls “login” during pre-connection, you should probably just hold that call until the connection is made and then carry it out. But if someone calls “login” after the connection has been lost, you may want to raise an exception.
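To illustrate why a boolean is not enough, here is a small sketch that tracks all three phases explicitly. The names `ConnState` and `StatefulFtpProtocol` are hypothetical, and the choice of `RuntimeError` is just one reasonable assumption about what "raise an exception" might look like.

```python
from enum import Enum


class ConnState(Enum):
    PRE_CONNECTION = 1
    CONNECTED = 2
    POST_CONNECTION = 3


class StatefulFtpProtocol:
    """Hypothetical protocol that treats each phase differently."""

    def __init__(self):
        self.state = ConnState.PRE_CONNECTION
        self.held = []   # logins requested before the connection exists
        self.sent = []   # stand-in for data written to the transport

    def connectionMade(self):
        self.state = ConnState.CONNECTED
        held, self.held = self.held, []
        for user in held:
            self.login(user)

    def connectionLost(self, reason=None):
        self.state = ConnState.POST_CONNECTION

    def login(self, user):
        if self.state is ConnState.PRE_CONNECTION:
            self.held.append(user)            # hold until connected
        elif self.state is ConnState.CONNECTED:
            self.sent.append("USER " + user)  # send right away
        else:
            raise RuntimeError("login called after the connection was lost")


p = StatefulFtpProtocol()
p.login("alice")                     # held
p.connectionMade()                   # now sent
p.connectionLost("Normal Shutdown")
try:
    p.login("bob")                   # too late: the connection is gone
    late_login_error = None
except RuntimeError as exc:
    late_login_error = str(exc)
```

With a single "connected" flag, the first and last `login` calls would look identical to the protocol, even though one should be held and the other rejected.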
Connections can generally be lost in two ways: either because someone calls transport.loseConnection() or because some network activity is causing the connection to shut down. Either way, the “connectionLost” method should be called. As a practical matter, please make sure that your PTCL protocol is calling “connectionLost” on the upper layer in either scenario. The connectionLost method takes an optional “reason” parameter. In Playground, reason is just a string or None. If you’re closing the upper protocol in response to it calling “loseConnection”, or if you’re closing the upper protocol because you received a FIN, you could give a useful string such as “Normal Shutdown” for the reason. On the other hand, if you’re shutting down because of a timeout, you could use the string “No ACK in %d seconds” as the shutdown message.
By way of additional clarification, calling .transport.loseConnection() is the higher protocol’s way of signaling the network that it has finished. On the other hand, when the network signals a protocol by calling connectionLost(), it is signaling that the network disconnect (requested or otherwise) is complete.
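As a rough sketch of the two directions of signaling, the lower layer below funnels every shutdown path into a single call to the upper layer's connectionLost. `LowerLayer`, `UpperProtocol`, `finReceived`, and `ackTimedOut` are hypothetical names, not the real PTCL interface; only the reason strings come from the discussion above.

```python
class UpperProtocol:
    """Hypothetical application-layer protocol that records why it closed."""

    def __init__(self):
        self.lost_reasons = []

    def connectionLost(self, reason=None):
        self.lost_reasons.append(reason)


class LowerLayer:
    """Hypothetical stand-in for a PTCL-like layer acting as a transport."""

    def __init__(self, upper, ack_timeout=5):
        self.upper = upper
        self.ack_timeout = ack_timeout
        self.closed = False

    def _close(self, reason):
        if self.closed:
            return  # never report the same disconnect twice
        self.closed = True
        self.upper.connectionLost(reason)

    def loseConnection(self):
        # The upper layer says it is finished.
        self._close("Normal Shutdown")

    def finReceived(self):
        # The remote side asked to shut down.
        self._close("Normal Shutdown")

    def ackTimedOut(self):
        # The network failed underneath us.
        self._close("No ACK in %d seconds" % self.ack_timeout)


upper_a = UpperProtocol()
layer_a = LowerLayer(upper_a)
layer_a.loseConnection()
layer_a.finReceived()        # ignored: already closed

upper_b = UpperProtocol()
layer_b = LowerLayer(upper_b, ack_timeout=3)
layer_b.ackTimedOut()
```

A real implementation would finish its own shutdown handshake before calling connectionLost; the sketch skips that step to keep the control flow visible.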
In class the other day, we discussed whether you should call “connectionLost” if no connection has been made (e.g., a timeout during handshake). The idealized answer is “no”: you would instead implement a connection-failed callback on the factory class. However, for simplicity, I haven’t had the class doing much with factory classes, and splitting that functionality out is actually somewhat complicated. So I suggest using “connectionLost” to signal connection failures. As a corollary, application protocols need to monitor connectionLost callbacks for these kinds of alerts.
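One way an application protocol might tell a failed connection apart from an ordinary disconnect is to remember whether connectionMade ever ran. This is only a sketch; `MonitoringProtocol` and its `events` list are hypothetical.

```python
class MonitoringProtocol:
    """Hypothetical application protocol that watches connectionLost."""

    def __init__(self):
        self.was_connected = False
        self.events = []

    def connectionMade(self):
        self.was_connected = True
        self.events.append("connected")

    def connectionLost(self, reason=None):
        if self.was_connected:
            self.events.append("disconnected: %s" % reason)
        else:
            # connectionMade never ran, so the connection attempt failed.
            self.events.append("connect failed: %s" % reason)


failed = MonitoringProtocol()
failed.connectionLost("No ACK in 5 seconds")   # handshake never completed

normal = MonitoringProtocol()
normal.connectionMade()
normal.connectionLost("Normal Shutdown")
```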
For reasons that I’m about to explain in memory management, please call the super method for connectionLost. Your code should look like this:
    def connectionLost(self, reason=None):
        SimpleMessageHandlingProtocol.connectionLost(self, reason)
        # do my own cleanup here
It is very easy to get memory leaks using Twisted because of circular references. For those unfamiliar with Python garbage collection, any object with no remaining references is cleaned up. If you create an object without assigning it somewhere, it will get reclaimed. If a variable goes out of scope, and it was the only reference to an object, that object will be reclaimed.
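Where things break down is when there are circular references. Suppose object A points to object B and object B points to object A. Now if A and B go out of scope, reference counting alone can never clean them up: because they still reference each other, neither count drops to zero. And since they’ve gone out of scope, the user doesn’t have access to them any more. CPython does have a separate cycle collector that can eventually find such pairs, but it only runs periodically and, in versions before Python 3.4, it could not reclaim cycles containing objects with a __del__ method, so it is safer not to depend on it.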
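The difference is easy to observe with weak references. The sketch below assumes CPython (other interpreters reclaim memory on a different schedule) and uses a hypothetical `Node` class. The cycle collector is switched off during the demo so that only reference counting is at work until `gc.collect()` is called explicitly.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at one other object."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the automatic cycle collector out of the demo

# Case 1: no cycle. Dropping the last reference frees the object at once.
lone = Node()
lone_ref = weakref.ref(lone)
del lone
freed_without_cycle = lone_ref() is None

# Case 2: A points to B and B points to A.
a = Node()
b = Node()
a.other = b
b.other = a
a_ref = weakref.ref(a)
del a, b
# Both names are gone, but each object still holds the other alive.
alive_after_del = a_ref() is not None

# Only an explicit pass of the cycle collector reclaims the pair.
gc.collect()
freed_after_collect = a_ref() is None

gc.enable()
```

Breaking the cycle by hand (for example, setting `a.other = None` before dropping the names) would have let reference counting free both objects immediately, with no collector pass needed.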
The way Twisted factory and protocol classes work introduces this problem. When the Protocol class is instantiated, it maintains a pointer to the factory. And the factory often has a pointer to the Protocol (otherwise, the Protocol might go out of scope and get reclaimed).
To solve this problem, make sure that when the connectionLost method is called, you clean up any circular references.
Now, the base class connectionLost already disconnects the circular reference to the factory and, for this reason, you should always call the base class’s connectionLost method when overriding it with your own. It will also set the transport to None (in case the transport has a circular reference back to the Protocol). If you’ve made your own circular references, you should clean those up as well (e.g., your own local reference to the Factory).
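Putting the pieces together, here is a hedged sketch of that cleanup. `BaseProtocolSketch` is a hypothetical stand-in that only imitates the base-class behavior described above (clearing the factory and transport); `MyProtocol`, `TrackingFactory`, and the `my_factory` attribute are likewise made up for illustration. The final weak-reference check assumes CPython's reference counting.

```python
import weakref


class BaseProtocolSketch:
    """Hypothetical stand-in for the base class behavior described above."""

    def __init__(self, factory):
        self.factory = factory
        self.transport = None

    def connectionLost(self, reason=None):
        # Break the references the base class owns.
        self.factory = None
        self.transport = None


class MyProtocol(BaseProtocolSketch):
    def __init__(self, factory):
        BaseProtocolSketch.__init__(self, factory)
        self.my_factory = factory  # our own extra reference to the factory

    def connectionLost(self, reason=None):
        # Let the base class clean up first...
        BaseProtocolSketch.connectionLost(self, reason)
        # ...then undo the circular reference we created ourselves.
        if self.my_factory is not None:
            self.my_factory.protocols.remove(self)
            self.my_factory = None


class TrackingFactory:
    """Hypothetical factory that keeps a pointer to each protocol it builds."""

    def __init__(self):
        self.protocols = []

    def buildProtocol(self):
        protocol = MyProtocol(self)
        self.protocols.append(protocol)
        return protocol


factory = TrackingFactory()
p = factory.buildProtocol()
p.transport = object()           # pretend a connection was made
p_ref = weakref.ref(p)

p.connectionLost("Normal Shutdown")
refs_cleared = (p.factory is None and p.transport is None
                and p.my_factory is None)

del p                            # last reference gone: no cycle remains
reclaimed = p_ref() is None
```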