Discussion:
[Tutor] How should my code handle db connections? Should my db manager module use OOP?
boB Stepp
2015-08-27 00:11:42 UTC
Permalink
My ongoing project will be centered around an SQLite db. Since almost
all data needed by the program will be stored in this db, my thought
is that I should create a connection to this db shortly after program
startup and keep this connection open until program closure. I am
assuming that opening and closing a db connection has enough overhead
that I should only do this once. But I do not *know* that this is
true. Is it? If not, then the alternative would make more sense,
i.e., open and close the db as needed.

In the first iteration of my project, my intent is to create and
populate the db with tables external to the program. The program will
only add entries to tables, query the db, etc. That is, the structure
of the db will be pre-set outside of the program, and the program will
only deal with data interactions with the db. My intent is to make
the overall design of the program OO, but I am wondering how to handle
the db manager module. Should I go OO here as well? With each
pertinent method handling a very specific means of interacting with
the db? Or go a procedural route with functions similar to the
aforementioned methods? It is not clear to me that OOP provides a
real benefit here, but, then again, I am learning how to OOP during
this project as well, so I don't have enough knowledge yet to
realistically answer this question.

TIA!
--
boB
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Steven D'Aprano
2015-08-27 02:02:09 UTC
Permalink
Post by boB Stepp
My ongoing project will be centered around an SQLite db. Since almost
all data needed by the program will be stored in this db, my thought
is that I should create a connection to this db shortly after program
startup and keep this connection open until program closure.
If you do this, you will (I believe) hit at least three problems:

- Now only one program can access the DB at a time. Until the first
program closes, nobody else can open it.

- Your database itself is vulnerable to corruption. SQLite is an easy to
use database, but it doesn't entirely meet the ACID requirements of a
real DB.

- If your database lives on a NTFS partition, which is very common for
Linux/Unix users, then if your program dies, the database will very
likely be broken.

I don't have enough experience with SQLite directly to be absolutely
sure of these things, but Firefox uses SQLite for a bunch of things that
(in my opinion) don't need to be in a database, and it suffers from
these issues, especially on Linux when using NTFS. For example, if
Firefox dies, when you restart you may lose all your bookmarks, history,
and most bizarrely of all, the back button stops working.
--
Steve
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Zachary Ware
2015-08-27 03:22:25 UTC
Permalink
Post by Steven D'Aprano
- If your database lives on a NTFS partition, which is very common for
Linux/Unix users
<snip>
Post by Steven D'Aprano
these issues, especially on Linux when using NTFS.
Surely you mean NFS, as in Network FileSystem, rather than NTFS as in New
Technology FileSystem? :)

--
Zach
(On a phone)
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Steven D'Aprano
2015-08-27 17:08:11 UTC
Permalink
Post by Zachary Ware
Post by Steven D'Aprano
- If your database lives on a NTFS partition, which is very common for
Linux/Unix users
<snip>
Post by Steven D'Aprano
these issues, especially on Linux when using NTFS.
Surely you mean NFS, as in Network FileSystem, rather than NTFS as in New
Technology FileSystem? :)
Indeed I do, thank you for the correction, and apologies for the
confusion.
--
Steve
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Martin A. Brown
2015-08-27 04:42:17 UTC
Permalink
Hi there,
Post by boB Stepp
My ongoing project will be centered around an SQLite db.
Not a bad way to start. There are many possible ways to access SQL
DBs. I'll talk about one of my favorites, since I'm a big fan of
sqlalchemy [0], which provides a broad useful toolkit for dealing
with SQL DBs and an abstraction layer.

To start, often the question is why any such abstraction tool, given
the additional complexity of a module, a.k.a. another layer of code?

Briefly, my main two reasons:

A) abstraction of data model from SQL implementation for the
Python program (allows switching from SQLite another DBAPI,
e.g. postgres, later with a minimum effort)
B) somebody has already implemented the tricky bits, such as ORMs
(see below), failover, connection pooling (see below) and
other DB-specific features
Post by boB Stepp
Since almost all data needed by the program will be stored in this
db, my thought is that I should create a connection to this db
shortly after program startup and keep this connection open until
program closure.
That is one possible approach. But, consider using a "connection
pooling" technique that somebody else has already implemented and
tested. This saves your time for working on the logic of your
program.

There are many different pooling strategies, which include things
like "Use only one connection at a time." or "Connect on demand." or
"Hold a bunch of connections open and let me use one when I need
one, and I'll release it when I'm done." and even "When the
connection fails, retry quietly in the background until a successful
connection can be re-established."
Post by boB Stepp
I am assuming that opening and closing a db connection has enough
overhead that I should only do this once. But I do not *know*
that this is true. Is it? If not, then the alternative would
make more sense, i.e., open and close the db as needed.
Measure, measure, measure. Profile it before coming to such a
conclusion. You may be correct, but, it behooves you to measure.
(My take on an old computing adage: Premature optimization can lead
you down unnecessarily painful or time consuming paths.)

N.B. Only you (or your development cohort) can anticipate the load
on the DB, the growth of records (i.e. data set size), the growth of
the complexity of the project, or the user count. So, even if the
measurements tell you one thing, be sure to consider the longer-term
plan for the data and application.

Also, see Steven D'Aprano's comments about concurrency and other
ACIDic concerns.
Post by boB Stepp
In the first iteration of my project, my intent is to create and
populate the db with tables external to the program. The program
will only add entries to tables, query the db, etc. That is, the
structure of the db will be pre-set outside of the program, and
the program will only deal with data interactions with the db.
If the structure of the DB is determined outside the program,
this sounds like a great reason to use an Object Relational
Modeler (ORM). An ORM which supports reflection (sqlalchemy
does) can create Pythonic objects for you.
Post by boB Stepp
My intent is to make the overall design of the program OO, but I
am wondering how to handle the db manager module. Should I go OO
here as well? With each pertinent method handling a very specific
means of interacting with the db? Or go a procedural route with
functions similar to the aforementioned methods? It is not clear
to me that OOP provides a real benefit here, but, then again, I am
learning how to OOP during this project as well, so I don't have
enough knowledge yet to realistically answer this question.
I'm not sure I can weigh in intelligently here (OOP v. procedural),
but I'd guess that you could get that Object-Oriented feel by taking
advantage of an ORM, rather than writing one yourself. Getting used
to the idea of an ORM can be tricky, but if you can get reflection
working [1], I think you will be surprised at how quickly your
application logic (at the business layer) comes together and you can
(mostly) stop worrying about things like connection logic and SQL
statements executing from your Python program [2].

There probably are a few people on this list who have used
sqlalchemy and are competent to answer it, but if you have questions
specifically about sqlalchemy, you might find better answers on
their mailing list [3].

Now, back to the beginnings...a SQLite DB is a fine place to start
if you have only one thread/user/program accessing the data at any
time. Don't host it on a network(ed) file system if you have the
choice. If your application grows so much in usage or volume that
it needs a new and different DB, consider it all a success and
migrate accordingly.

Best of luck,

-Martin

[0] http://www.sqlalchemy.org/
[1] http://docs.sqlalchemy.org/en/rel_1_0/core/reflection.html
[2] Here, naturally, I'm assuming that you know your way around
SQL, since you are asserting that the DB already exists, is
maintained and designed outside of the Python program.
[3] https://groups.google.com/forum/#!forum/sqlalchemy
--
Martin A. Brown
http://linux-ip.net/
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Peter Otten
2015-08-27 07:53:25 UTC
Permalink
Post by boB Stepp
My ongoing project will be centered around an SQLite db. Since almost
all data needed by the program will be stored in this db, my thought
is that I should create a connection to this db shortly after program
startup and keep this connection open until program closure. I am
assuming that opening and closing a db connection has enough overhead
that I should only do this once. But I do not *know* that this is
true. Is it? If not, then the alternative would make more sense,
i.e., open and close the db as needed.
In the first iteration of my project, my intent is to create and
populate the db with tables external to the program. The program will
only add entries to tables, query the db, etc. That is, the structure
of the db will be pre-set outside of the program, and the program will
only deal with data interactions with the db. My intent is to make
the overall design of the program OO, but I am wondering how to handle
the db manager module. Should I go OO here as well? With each
pertinent method handling a very specific means of interacting with
the db? Or go a procedural route with functions similar to the
aforementioned methods? It is not clear to me that OOP provides a
real benefit here, but, then again, I am learning how to OOP during
this project as well, so I don't have enough knowledge yet to
realistically answer this question.
Don't overthink your project. However thorough your preparations you will
get about 50 percent of your decisions wrong.

Use version control to keep track of your code and don't be afraid to throw
parts away that don't work out.

Implement a backup scheme for the database lest you annoy your wife by
making her reenter data, and there you go. The database will be small for a
long time, so it will be practical to make a copy every day.

Regarding database access:

(1) A single connection:

_db = None

@contextmanager
def open_db():
global _db
if _db is None:
_db = sqlite3.connect(...)
is_owner = True
else:
is_owner = False
try:
yield _db
finally:
if is_owner:
_db.close()

(2) Multiple connections:

@contextmanager
def open_db():
db = sqlite3.connect(...)
try:
yield db
finally:
db.close()

You can use both the same way you deal with an open file:

with open_db() as db:
...

I don't say you should use the above code, I just want to demonstrate that
you can happily defer the answer to your connections question.

Regarding OO design in general: keep your classes small. You can't go wrong
with fewer, smaller and more general methods ;)

_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-08-27 08:14:36 UTC
Permalink
Post by boB Stepp
My ongoing project will be centered around an SQLite db. Since almost
all data needed by the program will be stored in this db, my thought
is that I should create a connection to this db shortly after program
startup and keep this connection open until program closure.
That's the usual approach with Sqlite. Remember it is just a single
file so as soon as you open it it is locked so other users can't
access it. But that's not going to be a problem for your app, at
least in the early days.

Of course keeping any file open for extended periods carries a
risk of corruption so you may want to implement an auto store/copy
regime so that there is always a recent backup. But if the app is
only being used for a few minutes at a time then it might be
overkill.
Post by boB Stepp
assuming that opening and closing a db connection has enough overhead
that I should only do this once. But I do not *know* that this is
true. Is it?
Its a good habit to get into. In fact Sqlite doesn't take too much work
to open because its just a file but once you get into server databases
its a much bigger overhead. So I'd just treat Sqlite as another database
in that regard.
Post by boB Stepp
In the first iteration of my project, my intent is to create and
populate the db with tables external to the program. The program will
only add entries to tables, query the db, etc. That is, the structure
of the db will be pre-set outside of the program, and the program will
only deal with data interactions with the db.
Yes, that makes sense.
Post by boB Stepp
the overall design of the program OO, but I am wondering how to handle
the db manager module. Should I go OO here as well?
I'm not clear what exactly you see the db manager module doing?
Is this the admin module? The data loader that sits outside the app?
Or a module within the app used by the objects?

For admin (assuming a CLI) I'd personally stick with procedural.

For data loader I'd stick with procedural and pure SQL.

For the main app I'd build a very thin procedural API over
the individual SQL queries and then let each model class
handle its own data access via that API. The API is then all
you need to change if that database changes.
OR go with a third party ORM (more to learn for little gain IMHO).


hth
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Loading...