Net2: Introduction

Part 1:
Introduction

Why I Wrote It

The Net2 library (Net2 because it is a second layer above SDL_net) started life as a test case for the SDL (Simple DirectMedia Layer) thread and network IO libraries. You can test code in many different ways. Unit testing is used to prove that a piece of code does what the specification says it is supposed to do. And, I did quite a bit of unit testing of those libraries to make sure I understood them and to make sure they were worth working with. But, the real test of a library is how easy it is to use in your programs.

When I studied and tested the SDL thread library I found a clean, simple to use library that gives you all the primitives that you'll find in any book on threaded programming. The SDL thread library is as easy to use and understand as any thread library I've ever used. It is also reliable, portable, and does a good job reporting errors. My personal opinion is that if you need more than what you find in this library, maybe your design is too complicated.

The SDL networking library, SDL_net, has all the same qualities that the thread library has, which is another reason I wrote the Net2 library. You see, I have never really liked any networking library I've ever used. They just don't seem to fit with the way I write code, and they don't really work well with event driven systems. You wind up writing code that first processes input events, then goes off to the network code and checks for connections and pending input. Then you process the network input. I would like everything to be driven by the event loop. I'm biased, I know, by my background in discrete event simulation where everything is an event. Once you get used to thinking in terms of events it hurts to have to treat some part of the problem a different way. It causes real confusion that leads to hard to find bugs.

Design Goals

When I started the project I sat down and asked myself what I would really like in a high level network API.
Especially one that was designed to work in an event based system like SDL. I wanted a library that hides all the details of making and receiving network connections. I wanted a library that made it easy to set up and tear down the data structures used to represent a connection. And, most of all I wanted a system that just works. So, I wrote down a list of pseudocode statements that described the kind of operations I wanted the library to support. Some of that pseudocode looks like:

    Accept TCP/IP connections on port 4321
    Accept UDP/IP messages on port 5432
    Send message to "www.gameprogrammer.com"
    Connect to host 192.168.1.1

When a connection comes in I want to get an event that tells me about the connection. I want the connection to be identified by a nice unique integer number. When a connection goes away, whether I kill it, the remote host kills it, or it dies from an error, I want to get an event with that same integer ID in it telling me what happened. Whenever there is input available on that connection I want to get an event with that same integer ID in it. And, I want to recycle the connection numbers so that the library uses as few of them as possible.

Why do I want all those conditions to apply? Because I'm a lazy guy and I don't want to work any harder than I have to. By having the connection identified by an integer that gets reused I can be sure that I can use simple data structures to store per connection data. Using simple data structures helps me find and avoid memory leaks by making it likely I'll use the same slot in a vector or the same key in a hash to identify a new connection. If I reuse keys I'm more likely to notice that I didn't clean up after myself when the last connection with the same key was closed. The simpler the code, the more reliable it is.

Oh yeah, and when I send data out on a connection I want to use that same integer to select which connection gets the data.
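To make the ID recycling idea concrete, here is a minimal sketch of an application-side table indexed by a small, reusable integer. It is not code from Net2: MAX_CONNS, Conn, conn_open() and conn_close() are hypothetical names made up for this illustration, and the rule of always handing out the lowest free ID is an assumption about one reasonable way to keep the numbers small.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONNS 8   /* hypothetical limit, chosen only for this sketch */

/* Hypothetical per connection record; this is not Net2's socket structure. */
typedef struct {
    int   in_use;
    void *user_data;   /* whatever the application keeps per connection */
} Conn;

static Conn conns[MAX_CONNS];

/* Hand out the lowest free ID, or -1 if the table is full. */
static int conn_open(void *user_data)
{
    for (int id = 0; id < MAX_CONNS; id++) {
        if (!conns[id].in_use) {
            /* Consistency check: a stale pointer here would mean the
               previous owner of this ID was never cleaned up. */
            assert(conns[id].user_data == NULL);
            conns[id].in_use = 1;
            conns[id].user_data = user_data;
            return id;
        }
    }
    return -1;
}

/* Release an ID and return its data so the caller can free it.
   Returns NULL if the ID is out of range or not in use. */
static void *conn_close(int id)
{
    if (id < 0 || id >= MAX_CONNS || !conns[id].in_use) {
        return NULL;
    }
    void *data = conns[id].user_data;
    conns[id].in_use = 0;
    conns[id].user_data = NULL;
    return data;
}
```

Because a closed ID is the first one handed out again, the same slot gets exercised over and over, which is what makes leftover per connection data easy to spot.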
I wrote a lot more of a specification than that, but you get the picture of what I wanted to do.

Why Use Threads?

I knew before I started designing the library that it was going to use at least one thread. In just about every server ever written there is a chunk of code that is essentially a while loop that blocks on the equivalent of a select() call, waiting for something to happen on the network. When something happens, be it a connection attempt or input arriving, the waiting routine returns with a set of flags that tell you why it returned. The next thing in the loop is a block of code that analyzes what happened and takes some kind of action. Repeat until done.

It makes sense to put all that code in a separate thread that communicates with the main body of the program through events and shared data structures. If you put the code in the main body of the program you have to poll for network activity. You can't just wait for network activity because while you are waiting you aren't processing other input and you aren't animating the action on the screen. You might waste time polling too often, or you might lose input and connections because you don't poll often enough. (This is especially true with UDP messages.)

By putting that loop in a thread the code can actually wait for network events to occur. The code only wakes up when there is something to do, so it doesn't waste any time. And, while you can still lose input and connections if the system is overloaded, it isn't as likely because the OS will wake up your networking thread as quickly as it can. (Or at least that's what the OS is supposed to do...) Not to mention that if you have a machine with multiple CPUs, like a lot of servers, you can actually be running both threads at the same time and make use of both of those CPUs.

By the time I had written several pages of pseudocode I had a pretty good idea what I wanted and a pretty good idea how I was going to implement it.
But, I didn't have enough knowledge of the underlying system to be sure of the low level coding details. That means I had to do a lot of testing of SDL_net to make sure I understood how it worked. And then I had to develop code to fill in when it didn't work the way I needed it to.

I started with the most basic part of the system, the event handling code. You can find that story in “Fast Event Processing in SDL.” After I had tested SDL I had to test the SDL network library to see if it was thread safe. You can't write a threaded library without knowing if the libraries you are using are thread safe. I couldn't tell from the documentation or the code if SDL_net was thread safe or not. I could see that there was no code in the library to ensure that it was thread safe, which is a hint that it isn't, but having a hint is not the same as knowing that it is or is not thread safe. A couple of days of testing later I had proved that the SDL network library is not thread safe. A few functions are thread safe but the library in general is not. So, I had to develop workarounds for that problem too.

Coding the Threaded Beast

I had no choice but to build thread safe wrappers for all the SDL_net functions that I use in Net2. If you look in net2.c, the main source file of the Net2 library, you'll find a lot of code that looks like:

static __inline__ SDLNet_SocketSet snAllocSocketSet(int size)
{
  SDLNet_SocketSet ss = NULL;

  lockSDLNet();                       /* serialize access to SDL_net */
  ss = SDLNet_AllocSocketSet(size);   /* the wrapped call */
  unlockSDLNet();

  return ss;
}

These are thread safe wrappers for SDL_net functions. Each of these wrappers has the same name as the SDL_net function it wraps with the “SDLNet_” prefix replaced by “sn.” Each wrapper uses a local variable to capture the return value of the wrapped function. Each wrapper locks a mutex immediately before calling the wrapped function and unlocks it immediately after.

It might have been a little more efficient to use the SDL_net calls directly and lock and unlock the mutex only when I absolutely had to. But, it would have resulted in much more complex code. The kind of code where any change is likely to introduce hard to find deadlock bugs caused by misplaced locks and unlocks. It is always better to write code so it is easy to read and modify. If it turns out to be too slow you can fix it after you have had a chance to profile the code and find out where the problems really are.

There are some other things to note about the wrappers. You might notice that the code that actually locks and unlocks mutexes is “wrapped” in a function. I do that because those functions allow me to have the error checking and reporting code in one place for each mutex, and because those functions are great places to put debugging and trace code for run time tracing as well as great places to put break points when you're using a debugger. Also, the call to lock the mutex is always just after the declarations and the call to unlock it is directly before the return statement. Organizing the code this way ensures that the mutexes are always locked and unlocked in the proper order and it makes it easy to visually inspect the code for correctness.

I wouldn't organize the code this way if I were using C++ or Java. In C++ you can use constructor destructor pairs to ensure correct locking and unlocking of mutexes. In Java you just declare a function to be “synchronized” to get the same effect.
But, I am using plain old C so I use all the tricks I know to make sure it works correctly and to make sure that I can find problems quickly when it doesn't.

Note that I did not wrap the entire SDL_net library, just the functions I use in Net2. Also note that I use the same wrapper structure for my own routines that need to lock and unlock mutexes. For each NET2 API that locks a mutex there is a static raw_NET2 function that does the work while the NET2 API manages the mutexes. Like this:

static __inline__ int raw_NET2_UDPAcceptOn(int port, int size)
{
  int s = -1;
  ...
  return s;
}

int NET2_UDPAcceptOn(int port, int size)
{
  int val = -1;

  lockData();
  val = raw_NET2_UDPAcceptOn(port, size);
  unlockData();

  return val;
}

What About Output?

With all this talk about threaded input loops you might wonder how the network output code works. The output code is not threaded in any way. It must lock some of the shared data used by the input thread to make sure that it is in a consistent state and stays consistent while the output is being sent. But, aside from that, the output code is not affected by the multithreaded design of the library. That means that when you call a function to send a message, the message is sent before the function returns. Or at least it has been handed to the OS for processing. To the limit that a program can control such things, the message has been sent.

Error Reporting

Like most of SDL, the Net2 functions return -1 if they encounter an error and a non-negative value (a value greater than or equal to zero) if everything went OK. But, Net2 can encounter errors that don't happen during a function call. These errors occur asynchronously in another thread. To report these errors Net2 sends error events with as much information as possible about the error. In all cases the error event contains a pointer to a string that describes the error, and either the ID of the socket that had the error or -1 (minus one) if the socket ID is not available or not applicable. Programmers need to pay attention to error events because they are usually indications that the Net2 library is overloaded or is suffering from corrupted memory.

Misfeatures and Other Stuff You Need to Know

There are a few features of the code that I'm not happy with, things that feel like kludges, that I either couldn't think of a better way to handle or that I was just too lazy to fix.

One big problem is with event types. SDL only provides one event type for applications to use for their own purposes, SDL_UserEvent. That structure provides one int field for subtypes and two void * fields for data. The data fields are fine because you can cast a pointer to any type of data and store it in a void * field.
And an int gives you plenty of different subtypes. I'd hate to imagine an application that needed more subtypes than that. The problem is that SDL doesn't have a mechanism for allocating the subtypes of an SDL_UserEvent. That creates a problem for a library writer who wants to use SDL_UserEvent to communicate. What happens if the application also uses SDL_UserEvent? An application might want to report timer events or to talk to another thread using events. The event subtype codes used by the library and the application are almost guaranteed to conflict with each other, and it is up to the programmer to make sure that they don't.

That leaves me with the choice of building a mechanism to allocate SDL_UserEvent subtypes or of just warning you about the problem and letting you edit the net2.h file yourself. I decided to trust the programmer. Look for the following declaration and edit it as you see fit.

/* NET2 event types */
enum
{
  NET2_ERROREVENT,
  NET2_TCPACCEPTEVENT,
  NET2_TCPRECEIVEEVENT,
  NET2_TCPCLOSEEVENT,
  NET2_UDPRECEIVEEVENT,
};

There are a couple of global flags used to keep track of state that you would think shouldn't need to be known outside of a single function. One flag, dataLocked, is just a consistency check. It is used to make sure that the input thread actually locks the shared data when it should. This flag and the code that uses it could be removed. I left it in just to help catch problems that may occur while modifying the code in the future.

Another global flag is waitForRead. It is set when the input thread needs to put data into a full buffer. This flag is used to ensure that the input thread waits until the main thread has gotten around to reading out of the buffer. The flag is set in the IO thread and cleared by the Net2 function that reads from the buffer. This approach to the problem seems a little heavy handed. I could have just blocked the one socket instead of blocking the whole input thread. But, this is a simple approach and it makes sure that the main application thread gets a chance to do some work at a time when the network thread may be getting overloaded. Basically this is a problem that needs careful analysis to decide the right way to handle it, but doesn't seem like enough of a problem to spend time on.

There are also a few configuration parameters that you may need to change. These parameters are in net2.c near the top and look like:

//----------------------------------------
//
// Configuration parameters
//

#define maxSockets (1024)
#define tcpQueLen (1024)
#define udpQueLen (tcpQueLen / (sizeof(UDPpacket *)))

The first one, maxSockets, sets the limit on how many sockets can be open at one time. You might want to increase this value for a server and decrease it for a client. The socket table is a statically allocated vector of pointers so the size of the table is maxSockets times the size of a pointer on your system. The value of tcpQueLen controls the number of bytes of input buffer allocated per active TCP socket.
The UDP message queue stores pointers to SDL_net UDPpacket structures and udpQueLen controls the number of UDP messages that can be in the input queue of an active UDP socket. The NET2_Socket structure contains a union of the TCP and UDP input buffers, and since the UDP queue contains pointers it makes sense to define the UDP queue length in terms of the TCP queue length. With the default tcpQueLen of 1024, that works out to 256 queued messages on a system with 4 byte pointers and 128 on a system with 8 byte pointers. But, you can change the sizes to tune performance for your application.

Part 1: Introduction, Part 2: Examples, Part 3: Documentation, Part 4: Sets

Copyright © 2002 Robert C. Pendleton. All rights reserved.