[Tutor] socket communications and threading

Discussion:

richard kappler

2015-10-27 14:52:52 UTC

I'm having difficulty wrapping my head around sockets and threading, not so
much from a 10,000-foot/network perspective, but from a low-level
perspective.

In our production environment we have three machines that generate data and
forward it over tcp to a computer that stores the data, parses it and sends
it on to another computer over tcp for analytics.

In our test environment we have simulated this by building three VMs. VM1
has a python script that sends raw data over tcp to VM2 which parses the
data and sends it over tcp to VM3 upon which we are developing our
analytics apps.

Note that VM1 script reads from a .log file which is the actual output from
three real machines, differentiated by a unique deviceID, and each line or
'event' in the .log file is read and sent over tcp with a 0.5 second delay
between each read.

VM2 has a python script that receives through a socket the raw data over
tcp, parses it and sends it on through a socket over tcp to VM3.

My confusion arises for a couple of reasons.

1. The data from the three different machines each gets its own thread in
production, so that would have to happen on 'VM2' as the 'VM1' are actually
just microcontrollers out in production. From a socket and threading
perspective, which would be considered the client and which the server,
VM1 (the sender) or VM2 (the receiver)?

2. The process has been mediocre at best thus far. When I developed the
two python scripts (tunnelSim to send over socket and parser to rx and tx
over socket) I started by just reading and writing to files so I could
concentrate on the parsing bit. Once that was done and worked very well I
added in sockets for data flow and commented out the file read and write
bits, and everything seemed to work fine: VM1 sent a number of 'lines', VM2
received the same number of 'lines', parsed them and seemed to send them
on. Some days analytics (VM3) got them all, some days it did not. Not sure
where to look, and any thoughts on troubleshooting this would be helpful,
but the main point of the entire email is question 1, threading.

regards, Richard
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

Alan Gauld

2015-10-27 17:44:36 UTC

Post by richard kappler
In our test environment we have simulated this by building three VMs. VM1
has a python script that sends raw data over tcp to VM2 which parses the
data and sends it over tcp to VM3 upon which we are developing our
analytics apps.
...
1. The data from the three different machines each gets its own thread in
production, so that would have to happen on 'VM2' as the 'VM1' are actually
just microcontrollers out in production. From a socket and threading
perspective, which would be considered the client and which the server,
VM1 (the sender) or VM2 (the receiver)?

Client and server are about roles. The question is therefore
which machine is requesting a service and which is providing
it? Sounds like for the first transaction VM1 is asking VM2 to
store the data, so VM1 is client, VM2 is server.

However, for the analytics part, VM2 is asking for analysis and
VM3 doing the work so VM2 is client in that transaction and VM3
the server.
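
As a rough illustration of those roles, here is a minimal sketch in
Python 3 using only the standard socket module. The server is the side
that binds, listens and accepts; the client is the side that opens the
connection. The loopback address, the OS-chosen port, the sample event
text and the function names are all assumptions made for the demo, not
details of Richard's setup. The thread in demo() is only there so both
roles can run in one script.

```python
import socket
import threading


def run_server(server_sock, received):
    """Server role: wait for a client, then read until it closes."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty bytes: the client closed the connection
                break
            chunks.append(data)
    received.append(b"".join(chunks))


def send_event(host, port, line):
    """Client role: open the connection and send one event."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line)


def demo():
    # Port 0 asks the OS for any free port; a real server would use a
    # fixed, agreed port number.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    host, port = server_sock.getsockname()

    received = []
    server_thread = threading.Thread(
        target=run_server, args=(server_sock, received)
    )
    server_thread.start()
    send_event(host, port, b"deviceID=1 raw event\n")
    server_thread.join()
    server_sock.close()
    return received[0]
```

Calling demo() returns the bytes the server side read. In the setup
described above, VM2 would contain both halves: the accepting code
facing VM1 and the connecting code facing VM3.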

Post by richard kappler
2. The process has been mediocre at best thus far. When I developed the
two python scripts (tunnelSim to send over socket and parser to rx and tx
over socket) I started by just reading and writing to files so I could
concentrate on the parsing bit. Once that was done and worked very well I
added in sockets for data flow and commented out the file read and write
bits, and everything seemed to work fine: VM1 sent a number of 'lines', VM2
received the same number of 'lines', parsed them and seemed to send them
on. Some days analytics (VM3) got them all, some days it did not. Not sure
where to look, and any thoughts on troubleshooting this would be helpful,
but the main point of the entire email is question 1, threading.

Where is the threading question in #1? I only saw a question about
client/server - which has nothing at all to do with threading?

Slightly confused.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

richard kappler

2015-10-27 17:51:57 UTC

Sorry, thought it was clear. Each of the three different data generating
machines (in the test env, the python script that sends the data with 3
different device names) goes over a different thread so the developers tell
me. In production, those three machines are microcontrollers, not full
blown computers. VM2 is a computer, so that must be where the threads
'occur' (again, I don't understand this well) but in the examples I read,
it was about which was server and which was client, hence the connection
between client/server and threading. With your explanation I am off to
re-read the tutorials and examples, thank you.

regards, Richard

--
All internal models of the world are approximate. ~ Sebastian Thrun

Alan Gauld

2015-10-27 23:58:14 UTC

Post by richard kappler
Sorry, thought it was clear. Each of the three different data generating
machines (in the test env, the python script that sends the data with 3
different device names) goes over a different thread so the developers
tell me.

OK, you need to get the concepts clear.
Threads are about parallel processing. They occur on a processor and
have nothing to do with communications. They are one of several
mechanisms that enable multiple tasks to run in parallel on a single
computer. Each thread is a sub-process of the parent process. It's
common, but not necessary, for the threads to all be clones of each
other, but they could equally be multiple different types of process.

Sockets are simply communications channels. They connect two processes,
usually over a network but not necessarily. They have no direct
relationship to threads.

It is, however, very common to have different threads (different
instances of a process) handle each communication channel, but it's by
no means necessary.
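
To make the "one thread per channel" pattern concrete, here is a small
sketch using the standard library's socketserver.ThreadingTCPServer,
which starts a new thread for every accepted connection. The handler
name, the "devN event" lines and the "ok" acknowledgement are invented
for the example; nothing in the thread says the real system
acknowledges events.

```python
import socket
import socketserver
import threading


class EventHandler(socketserver.StreamRequestHandler):
    """handle() runs in its own thread for each connected client."""

    def handle(self):
        for raw in self.rfile:  # one line per event, until the client closes
            line = raw.decode("utf-8").rstrip("\n")
            with self.server.lock:  # the list is shared, so guard it
                self.server.events.append(
                    (threading.current_thread().name, line)
                )
            self.wfile.write(b"ok\n")  # hypothetical acknowledgement


def demo():
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EventHandler)
    server.events = []
    server.lock = threading.Lock()
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Three simulated devices, each with its own connection.
    socks = [socket.create_connection(server.server_address) for _ in range(3)]
    for n, sock in enumerate(socks, start=1):
        sock.sendall(f"dev{n} event\n".encode("utf-8"))
    for sock in socks:
        sock.recv(16)  # wait for the acknowledgement before closing
        sock.close()

    server.shutdown()
    server.server_close()
    return server.events
```

Each entry returned by demo() pairs a thread name with the line that
thread received, so three connections show up as three different
threads. The same server could equally be written with a single thread
and a select loop; the threads are a convenience, not a requirement of
sockets.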

Post by richard kappler
in the examples I read, it was about which was server and which was
client,

As I said, client/server is about the nature of the communications
between two processes (which are often on two different computers but
don't have to be). The nature of the client and server processes is
completely open: it could involve communicating over sockets, but it
might not. It could involve multiple threads, it might not. The key
thing that defines the relationship is that clients send requests to
the server, which sends responses back. The server never sends requests
to the client (if it did it would be peer to peer, not client/server).
The server may send unsolicited events (notifications) to its clients
(either broadcast to all or targeted at one) but it does not expect a
response.

All of these concepts are separate and not dependent on each other. You
could have multiple client threads as well as multiple server threads,
or no threads at all. It could all run on a single computer, with or
without sockets. In particular, it's common today to use higher level
comms mechanisms such as http with JSON/XML-RPC/SOAP. Sockets are used
under the covers but the programmer doesn't need to know.
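
For comparison, here is what a higher level mechanism can look like,
sketched with the XML-RPC modules in the Python 3 standard library. The
store_event function and its return value are hypothetical, made up to
give the client something to call. No socket calls appear in the code,
although sockets are still doing the work underneath.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def store_event(device_id, line):
    """Hypothetical service: pretend to store an event, report its size."""
    return {"device": device_id, "stored_bytes": len(line)}


def demo():
    # Port 0 lets the OS pick a free port for the demo.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(store_event)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        # The client calls the remote function as if it were local.
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.store_event("dev1", "raw event")
    finally:
        server.shutdown()
        server.server_close()
```

The request/response shape is the same as before: the client asks, the
server answers, and the library handles connecting, framing and
encoding.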

So to summarize what I think is going on in your case, you have:

VM1
A single-threaded client process running on a microcontroller, sending
requests to a remote server (VM2).
Question:
Is there really only one client controller, or is it actually 3, one
per logged machine? How do the logged machines communicate
with the controller(s)?

VM2
A multi-threaded server process running on a conventional computer
(a PC? Mac? which OS?). It receives messages from the client(s?) and
allocates these messages to one of three threads (sub-processes)
depending on message source. The server then acts as a client process
and sends requests to another microcontroller acting as a server (VM3).
Questions:
Are the messages to VM3 sent from the individual threads?
Do they all go down a single socket connection?
Or is there a top level VM2 process that forwards the messages to VM3?
Or is there in fact a 4th VM2 thread/process relaying the messages to VM3
(perhaps triggered from a database? or as a polling process)?

VM3
A single-threaded server process that receives messages and analyses
their content.
I don't know what it does with the result...

In each case the messaging is done over raw sockets.

Is that a fair summary?
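
One possible arrangement for the VM2 role, purely as a sketch: one
handler thread per inbound connection, a queue, and a single forwarder
thread that owns the one connection to VM3. This corresponds to the
"4th thread relaying the messages" option above and is an assumption,
not a description of the real system. The parse() body, the class and
function names, and the in-process stand-ins for VM1 and VM3 are all
invented for the demo.

```python
import queue
import socket
import socketserver
import threading


def parse(line):
    """Placeholder for the real parsing step (hypothetical)."""
    return line.strip().upper()


class InboundHandler(socketserver.StreamRequestHandler):
    """One thread per sending device: read lines, parse, queue them."""

    def handle(self):
        for raw in self.rfile:
            self.server.outbound.put(parse(raw.decode("utf-8")))


def forwarder(outbound, vm3_addr):
    """Single thread that owns the only connection to the analytics side."""
    with socket.create_connection(vm3_addr) as sock:
        while True:
            item = outbound.get()
            if item is None:  # sentinel: nothing more to forward
                break
            sock.sendall((item + "\n").encode("utf-8"))


def demo(expected=3):
    # Stand-in for VM3: accept one connection, collect `expected` lines.
    sink = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sink.bind(("127.0.0.1", 0))
    sink.listen(1)
    collected = []

    def run_sink():
        conn, _addr = sink.accept()
        with conn, conn.makefile("r", encoding="utf-8") as stream:
            while len(collected) < expected:
                collected.append(stream.readline().rstrip("\n"))

    sink_thread = threading.Thread(target=run_sink)
    sink_thread.start()

    # Stand-in for VM2: threaded receiver plus one forwarding thread.
    relay = socketserver.ThreadingTCPServer(("127.0.0.1", 0), InboundHandler)
    relay.outbound = queue.Queue()
    threading.Thread(target=relay.serve_forever, daemon=True).start()
    fwd = threading.Thread(
        target=forwarder, args=(relay.outbound, sink.getsockname())
    )
    fwd.start()

    # Stand-in for VM1: three devices, one line each.
    for n in range(1, expected + 1):
        with socket.create_connection(relay.server_address) as dev:
            dev.sendall(f"dev{n} event\n".encode("utf-8"))

    sink_thread.join()        # every line has reached the VM3 stand-in
    relay.shutdown()
    relay.server_close()
    relay.outbound.put(None)  # tell the forwarder to stop
    fwd.join()
    sink.close()
    return sorted(collected)
```

The result is sorted because the order in which the three handler
threads queue their lines is not guaranteed. Funnelling everything
through one queue and one outbound socket means only one thread ever
writes to VM3, which avoids interleaved writes from several threads.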

PS.
You might find it helpful to look at my tutorial sections on
inter-process comms and network programming. It covers client-server on
a single machine, then using sockets to extend it to two machines (but
uses an alternative, simpler form of multi-processing to threads). It's
in Python 2 but should convert easily to Python 3.

http://www.alan-g.me.uk/tutor/tutipc.htm

But converting the model to threads would not be difficult (and indeed was
planned for the, as yet unwritten, topic on parallel processing!)
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

Alan Gauld

2015-10-28 00:16:24 UTC

Post by Alan Gauld
that enable multiple tasks to run in parallel on a single computer. Each
thread is a sub-process of the parent process.

I should add that this is a bit of a simplification because threading
varies in implementation depending on OS and language. Threads are
conceptual subprocesses but may in fact be a part of the parent
process from the OS perspective. In general threads are more efficient
(less memory, faster to start/stop) than true sub-processes but it
all depends on the implementation. Python threading in particular
is not especially efficient.

Threading is notoriously tricky to get right although the simpler
you keep the design (ideally, stateless processing, with no
shared data) the easier it is.
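
A small sketch of that "stateless, no shared data" style in Python 3:
the worker threads communicate only through queue.Queue objects, and
the parsing function depends on nothing but its input. The
"device, payload" line format and the function names are assumptions
for the example.

```python
import queue
import threading


def parse(line):
    """Stateless: the result depends only on the input line."""
    device, _sep, payload = line.partition(",")
    return {"device": device, "payload": payload.strip()}


def worker(inbox, outbox):
    """Take lines from inbox until the None sentinel, put results on outbox."""
    while True:
        line = inbox.get()
        if line is None:
            break
        outbox.put(parse(line))


def run(lines, n_workers=3):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for line in lines:
        inbox.put(line)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Results arrive in completion order, not necessarily input order.
    return [outbox.get() for _ in lines]
```

Because the workers never touch a common variable, there is nothing to
lock; the queues do all the synchronisation.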
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos