A thorough guide to SQLite database operations in PythonTweet
After I wrote the initial teaser article "SQLite - Working with large data sets in Python effectively" about how awesome SQLite databases are via sqlite3 in Python, I wanted to delve a little bit more into the SQLite syntax and provide you with some more hands-on examples.
• Connecting to an SQLite database
• Creating a new SQLite database
- Overview of SQLite data types
- A quick word on PRIMARY KEYS:
• Adding new columns
• Inserting and updating rows
• Creating unique indexes
• Querying the database - Selecting rows
• Security and injection attacks
• Date and time operations
• Printing a database summary
Connecting to an SQLite database
The sqlite3 that we will be using throughout this tutorial is part of the Python Standard Library and is a nice and easy interface to SQLite databases: There are no server processes involved, no configurations required, and no other obstacles we have to worry about.
In general, the only thing that needs to be done before we can perform any operation on a SQLite database via Python's
sqlite3 module, is to open a connection to an SQLite database file:
where the database file (
sqlite_file) can reside anywhere on our disk, e.g.,
Conveniently, a new database file (
.sqlite file) will be created automatically the first time we try to connect to a database. However, we have to be aware that it won't have a table, yet. In the following section, we will take a look at some example code of how to create a new SQLite database files with tables for storing some data.
To round up this section about connecting to a SQLite database file, there are two more operations that are worth mentioning.
If we are finished with our operations on the database file, we have to close the connection via the
And if we performed any operation on the database other than sending queries, we need to commit those changes via the
.commit() method before we close the connection:
Creating a new SQLite database
Let us have a look at some example code to create a new SQLite database file with two tables: One with and one without a PRIMARY KEY column (don't worry, there is more information about PRIMARY KEYs further down in this section).
Download the script: create_new_db.py
Tip: A handy tool to visualize and access SQLite databases is the free FireFox SQLite Manager add-on. Throughout this article, I will use this tool to provide screenshots of the database structures that we created below the corresponding code sections.
Using the code above, we created a new
.sqlite database file with 2 tables. Each table consists of currently one column only, which is of type INTEGER.
Here is a quick overview of all data types that are supported by SQLite 3:
- INTEGER: A signed integer up to 8 bytes depending on the magnitude of the value.
- REAL: An 8-byte floating point value.
- TEXT: A text string, typically UTF-8 encoded (depending on the database encoding).
- BLOB: A blob of data (binary large object) for storing binary data.
- NULL: A NULL value, represents missing data or an empty cell.
Looking at the table above, You might have noticed that SQLite 3 has no designated Boolean data type. However, this should not be an issue, since we could simply re-purpose the INTEGER type to represent Boolean values (0 = false, 1 = true).
A quick word on PRIMARY KEYS:
In our example code above, we set our 1 column in the second table to PRIMARY KEY. The advantage of a PRIMARY KEY index is a significant performance gain if we use the PRIMARY KEY column as query for accessing rows in the table. Every table can only have max. 1 PRIMARY KEY (single or multiple column(s)), and the values in this column MUST be unique! But more on column indexing in the a later section.
Adding new columns
If we want to add a new column to an existing SQLite database table, we can either leave the cells for each row empty (NULL value), or we can set a default value for each cell, which is pretty convenient for certain applications.
Let's have a look at some code:
Download the script: add_new_column.py
We just added 2 more columns (
my_table_2 of our SQLite database next to the PRIMARY KEY column
The difference between the two new columns is that we initialized
my_3rd_column with a default value (here:'Hello World'), which will be inserted for every existing cell under this column and for every new row that we are going to add to the table if we don't insert or update it with a different value.
Inserting and updating rows
Inserting and updating rows into an existing SQLite database table - next to sending queries - is probably the most common database operation. The Structured Query Language has a convenient
UPSERT function, which is basically just a merge between UPDATE and INSERT: It inserts new rows into a database table with a value for the PRIMARY KEY column if it does not exist yet, or updates a row for an existing PRIMARY KEY value.
Unfortunately, this convenient syntax is not supported by the more compact SQLite database implementation that we are using here. However, there are some workarounds. But let us first have a look at the example code:
Download the script: update_or_insert_records.py
INSERT and B)
INSERT OR IGNORE have in common that they append new rows to the database if a given PRIMARY KEY does not exist in the database table, yet. However, if we'd try to append a PRIMARY KEY value that is not unique, a simple
INSERT would raise an
sqlite3.IntegrityError exception, which can be either captured via a try-except statement (case A) or circumvented by the SQLite call
INSERT OR IGNORE (case B).
This can be pretty useful if we want to construct an
UPSERT equivalent in SQLite. E.g., if we want to add a dataset to an existing database table that contains a mix between existing and new IDs for our PRIMARY KEY column.
Creating unique indexes
Just like hashtable-datastructures, indexes function as direct pointers to our data in a table for a particular column (i.e., the indexed column). For example, the PRIMARY KEY column would have such an index by default. The downside of indexes is that every row value in the column must be unique. However, it is recommended and pretty useful to index certain columns if possible, since it rewards us with a significant performance gain for the data retrieval.
The example code below shows how to add such an unique index to an existing column in an SQLite database table. And if we should decide to insert non-unique values into a indexed column later, there is also a convenient way to drop the index, which is also shown in the code below.
Download the script: create_unique_index.py
Querying the database - Selecting rows
After we learned about how to create and modify SQLite databases, it's about time for some data retrieval. The code below illustrates how we can retrieve row entries for all or some columns if they match certain criteria.
Download the script: selecting_entries.py
if we use the
.fetchall() method, we return a list of tuples from the database query, where each tuple represents one row entry. The print output for the 5 different cases shown in the code above would look like this (note that we only have a table with 1 row here):
Security and injection attacks
So far, we have been using Python's string formatting method to insert parameters like table and column names into the
c.execute() functions. This is fine if we just want to use the database for ourselves. However, this leaves our database vulnerable to injection attacks. For example, if our database would be part of a web application, it would allow hackers to directly communicate with the database in order to bypass login and password verification and steal data.
In order to prevent this, it is recommended to use
? place holders in the SQLite commands instead of the
% formatting expression or the
.format() method, which we have been using in this tutorial.
For example, instead of using
in the Querying the database - Selecting rows section above, we would want to use the
? placeholder for the queried column value and include the variable(s) (here:
123456), which we want to insert, as tuple at the end of the
However, the problem with this approach is that it would only work for values, not for column or table names. So what are we supposed to do with the rest of the string if we want to protect ourselves from injection attacks? The easy solution would be to refrain from using variables in SQLite queries whenever possible, and if it cannot be avoided, we would want to use a function that strips all non-alphanumerical characters from the stored content of the variable, e.g.,
Date and time operations
SQLite inherited the convenient date and time operations from SQL, which are one of my favorite features of the Structured Query Language: It does not only allow us to insert dates and times in various different formats, but we can also perform simple
- arithmetic, for example to look up entries that have been added xxx days ago.
Download the script: date_time_ops.py
Some of the really convenient functions that return the current time and date are:
The screenshot below shows the print outputs of the code that we used to query for entries that lie between a specified date interval using
and entries that are older than 1 day via
Note that we don't have to provide the complete time stamps here, the same syntax applies to simple dates or simple times only, too.
Update Mar 16, 2014:
If'd we are interested to calulate the hours between two
DATETIME()timestamps, we can could use the handy
STRFTIME()function like this
which would calculate the difference in hours between two dates in this particular example above (here:
48) in this case.
And to calculate the difference in hours between the current
DATETIMEand a given
DATETIMEstring, we could use the following SQLite syntax:
Retrieving column names
In the previous two sections we have seen how we query SQLite databases for data contents. Now let us have a look at how we retrieve its metadata (here: column names):
Download the script: get_columnnames.py
Since we haven't created a PRIMARY KEY column for
my_table_3, SQLite automatically provides an indexed
rowid column with unique ascending integer values, which will be ignored in our case.
PRAGMA TABLE_INFO() function on our table, we return a list of tuples, where each tuple contains the following information about every column in the table:
(id, name, type, notnull, default_value, primary_key).
So, in order to get the names of every column in our table, we only have to grab the 2nd value in each tuple of the returned list, which can be done by
names = [tup for tup in c.fetchall()]after the
PRAGMA TABLE_INFO()call. If we would print the contents of the variable
namesnow, the output would look like this:
Printing a database summary
I hope we covered most of the basics about SQLite database operations in the previous sections, and by now we should be well equipped to get some serious work done using SQLite in Python.
Let me conclude this tutorial with an obligatory "last but not least" and a convenient script to print a nice overview of SQLite database tables:
Download the script: print_db_info.py
I really hope this tutorial was helpful to you to get started with SQLite database operations via Python. I have been using the
sqlite3 module a lot recently, and it has found its way into most of my programs for larger data analyses.
Currently, I am working on a novel drug screening software that requires me to store 3D structures and other functional data for ~13 million chemical compounds, and SQLite has been an invaluable part of my program to quickly store, query, analyze, and share my data.
Another smaller project that uses
sqlite3 in Python would be smilite, a module to retrieve and compare SMILE strings of chemical compounds from the free ZINC online database. If you are interested, you can check it out at: https://github.com/rasbt/smilite.
If you have any suggestions or questions, please don't hesitate to write me an email or leave a comment in the comment section below! I am looking forward to your opinions and ideas, and I hope I can improve and extend this tutorial in future.