ICS 33 Fall 2024
Project 2: Learning to Fly
Due date and time: Friday, November 1, 11:59pm
Introduction
In lecture, we've explored the idea that it won't always be the case that our Python programs will operate only on objects stored in memory. We'll sometimes need data to be persistent, which is to say that we'll need it to remain available even if our program ends and we start it again later. Other times, we'll operate on an amount of data so large that it won't fit in our available memory — or, at the very least, we'll need a small enough percentage of the data at any given time that the cost of allocating so much memory to it would outweigh the benefit. Satisfying either of these requirements necessitates storing the data somewhere that outlives our program, which suggests that we could store it in one or more files instead; we know already that files live on, even after the program that created them has finished running, so they provide a great place to hold data for safe keeping.
As you've seen in prior coursework, though, this introduces a new set of problems to be solved, since file storage operates on a different set of principles than objects in memory in a Python program do. We need to figure out a way to take all of the objects we want to store and "flatten" them into one stream of text or bytes to be stored in a file, then to be able to turn that same information back into the original objects again. And, even if we can accomplish that, if the amount of data will be large, we'll need to figure out a way to solve that problem piecemeal, so we don't have to read an entire file every time we need one piece of information from it, or rewrite the entire file every time we need to change one piece of information in it. These are difficult problems indeed, especially for non-experts.
Fortunately, these are such common problems that there are common solutions to them, with databases and database management systems (DBMSs) acting as the giants whose shoulders we can stand on. Provided that we can describe the shape of our data to a DBMS, it can efficiently automate the underlying file manipulation that occurs when we need to read a small amount of data from it or update something within it, so that we can issue a command that conceptually boils down to "Tell me the name of the user with the most followers" or "Associate a new mobile phone number with this patient's account" and get the necessary effect without needing to know the details of how files will be accessed or changed. As we saw in lecture, SQLite is a relational DBMS, which means that if we can describe our data in terms of tables and relationships between rows in those tables, and if we can issue it the necessary SQL statements, SQLite can take care of managing all of the file handling for us, even if the file is much larger than the amount of memory we have available. Furthermore, if we can describe constraints on that data, it can enforce them for us. Among its benefits is its availability in Python's standard library; if we have Python installed, we have SQLite, too, as well as a way to connect a Python program to a SQLite database. So, that makes it a great choice for our initial exploration of relational databases and SQL.
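To make the idea of constraint enforcement concrete, here is a minimal sketch using Python's built-in sqlite3 module. The patient table and its columns are hypothetical (they are not part of the provided database), and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# A throwaway in-memory database; the table and columns are hypothetical.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE patient ('
    '    patient_id INTEGER PRIMARY KEY,'
    '    phone TEXT NOT NULL UNIQUE)')

connection.execute("INSERT INTO patient (phone) VALUES ('555-0100')")

try:
    # The same phone number a second time violates the UNIQUE constraint,
    # so SQLite refuses the statement and sqlite3 raises an exception.
    connection.execute("INSERT INTO patient (phone) VALUES ('555-0100')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = connection.execute('SELECT COUNT(*) FROM patient').fetchone()[0]
connection.close()
```

The point is that the database, not our Python code, notices the problem; our program only has to decide what to do when the exception arrives.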
There is a tradeoff at work, though, in the sense that we now have a new problem to solve: When our Python program needs to fetch data from the database or make changes to it, it'll need to construct SQL statements and send them to SQLite, then interpret the result that SQLite sends back. In other words, since SQLite doesn't "speak Python," we'll have to do something to bridge the gap between the way our Python program manipulates data and the way SQLite does.
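A minimal sketch of what bridging that gap can look like, using the sqlite3 module from Python's standard library. The person table, its columns, and its rows are made up for illustration, and an in-memory database stands in for a real file.

```python
import sqlite3

# An in-memory database keeps this sketch self-contained.
# The table, columns, and rows below are hypothetical.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE person ('
    '    person_id INTEGER PRIMARY KEY,'
    '    name TEXT NOT NULL,'
    '    followers INTEGER NOT NULL)')
connection.executemany(
    'INSERT INTO person (name, followers) VALUES (?, ?)',
    [('Alex', 120), ('Boo', 4500), ('Casey', 37)])

# Python to SQL: the statement is text, and each ? placeholder is
# filled in from a Python value passed alongside it.
cursor = connection.execute(
    'SELECT name, followers FROM person '
    'WHERE followers >= ? ORDER BY followers DESC',
    (100,))

# SQL to Python: each row of the result arrives as a tuple.
rows = cursor.fetchall()
most_followed_name, most_followed_count = rows[0]

connection.close()
```

Passing values through placeholders, rather than pasting them into the SQL text ourselves, lets sqlite3 handle the conversion between Python values and SQL values.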
That tradeoff, ultimately, is the central focus of this project, in which you'll be implementing a program with a graphical user interface that allows a user to search and update some information in a SQLite database. We've provided a substantial starting point — notably, the entire graphical user interface is already implemented — so that you can stay focused on the important parts of the problem. But, of course, there's a tradeoff there, too: When you work on a project where a substantial amount of code is already in place, you'll need to understand enough about it that you can take advantage of what's there, while fitting your new work into it without having to rewrite all of it.
(A situation like this is a realistic analogue for joining an open source project or starting a new job; it's rarely the case that you'll be working in an area where nothing already exists, so setting one's fears and preferences aside and being able to become productive within an existing code base is an essential skill, though it can be quite daunting, even for relatively experienced people. If this project is your first experience with that, great! That's why we're doing it. If you stay on a path that leads to real-world software engineering, it won't be your last experience like this. Start early and give yourself some time to digest what's there, and you'll be in business.)
Getting started
Near the top of this project write-up, you'll find a link to a Git repository that provides the starting point for this project. Using the instructions you followed in Project 0 (in the section titled Starting a new project), create a new PyCharm project using that Git repository; you'll do your work within that PyCharm project.
Additionally, you'll need one more thing, which you won't find in that repository: a SQLite database containing information about airports from around the world, which your program will be querying and updating. Download the file linked below and store it somewhere, but make a note of where you put it, because you'll need to be able to find it later. (It's fine to store it in your PyCharm project directory, but you can put it anywhere else you'd like. However, regardless of where you put it, do not commit it into your Git repository, as it's not a part of your program and potentially changes every time you run your program.)
· airport.db
Understanding the provided database
This project will ask you to write a program that is primarily tasked with querying and updating a SQLite database that contains information about airports from around the world, mainly from the perspective of pilots flying into and out of them. As is often the case when you first start working in an area that's new to you — unless you're a trained pilot or a flight simulator enthusiast, it's likely that you know little or nothing about airports, runways, radio frequencies, and so on — your first order of business is familiarizing yourself with the problem domain. You don't need to become an immediate expert, but you definitely need to achieve at least a passing familiarity with the important concepts and the common terminology used to describe them. When you'll be using a database as part of your work, you'll also want to acquaint yourself with its schema (i.e., the tables, their columns, and the relationships between tables), which can be a great way to figure out what concepts are important and which terminology is common; understanding your data takes you a long way toward understanding a problem domain. If there are terms that are unfamiliar, you might even want to do some side research, so you understand a little bit of the context in which your work fits. (This process of gradual understanding has been necessary in every professional job I've ever started, since each one has been in an area of business very different from the previous ones. One of the great things about software skills is the number of areas in which they're applicable, but this means it's a lot likelier that switching jobs also means dramatically switching contexts.)
So, before you dive into writing any code, it's not a bad idea to take a look around the provided database, a task that PyCharm can help with, since it includes built-in tools for communicating with a SQL-based database like ours.
Connecting to the database in PyCharm
First, recall where you stored the airport.db file you downloaded previously, because now you're going to need it. Once you've figured that out, you're ready to connect to it within PyCharm. There are a few steps to follow to do that, which are complicated mainly because there are so many different ways that PyCharm is able to connect to databases, even though what we want to do is pretty simple.
· Open the PyCharm project you created previously, if it's not open already.
· Along the right-hand side of the PyCharm window, you'll see an icon labeled Database. Click that.
· That should reveal an area titled Database, which is mostly blank, but which has a few buttons along the top of it. Click the button labeled with a + (plus sign), which will drop down a menu, from which you should select Data Source from Path.
· A dialog titled Select Database Location will pop up, asking you to find the file that contains the database. Find your airport.db file, select it, and click OK.
· A dialog titled New Data Source will pop up.
o The box labeled Path: should already be populated with the path to your database file, so you can leave that as-is.
o In the dropdown labeled Driver:, make sure SQLite is selected, if it isn't already.
o Click OK.
· A dialog titled Data Sources and Drivers will pop up, in which we'll need to configure how PyCharm will connect to the database.
o Near the top-left corner, where there's a choice between Data Sources and Drivers, select Drivers, revealing a long list of DBMSs that PyCharm can connect to. Select SQLite in that list.
o In the right-hand area of the window, click General, then click the + (plus sign) underneath Driver Files. That will reveal a menu, in which you should select Provided Driver, then Xerial SQLiteJDBC, then 3.43.0. (If the only choices are slightly different from this version number, that's fine. Choose the most recent one that's at least 3.43.0 and you should be in business.)
o Next, select Data Sources near the top-left corner (instead of Drivers).
o Finally, click OK near the bottom-right corner of the dialog.
· In the Database area of the PyCharm window, you should now see airport.db listed. Expanding it should reveal a schema named main. Expanding main should reveal a list of tables. (You might need to right-click or Ctrl-click main and select Refresh to reveal this.) Expanding that should reveal the names of the tables in the database.
Now that your PyCharm project is connected to our database, we can execute SQL statements against it. (The next time you close PyCharm and re-open the same project, this connection should still be available. If it ever disappears, the steps above should allow you to get it back again.)
Querying our database in the database console
Once you've connected to your database in PyCharm, another area within the PyCharm window titled console should have opened. If not, in the Database area, click the icon labeled Jump to Query Console, then select Open Query Console from the menu that pops up.
In that console area, you can type a SQL statement and execute it against the database. Type the SQL statement below into the console area, then click the Execute button near the top-left corner of the console area (or press Ctrl+Enter).
SELECT *
FROM airport
WHERE airport_ident = 'KSNA';
The Services area along the bottom edge of the PyCharm window will be displayed, if it wasn't already, in which you should see the result of your query: all of the columns from one row of a table named airport, describing information about Orange County Airport (which is a few miles from UCI).
That's all there is to it.
Exploring our database
From here, your best bet is to explore our database a bit. If you want to see its entire schema, the provided schema.sql file (which you'll find in your PyCharm project) shows all of the database's tables, including their names, the names and types of their columns, along with any other constraints (primary keys, foreign keys, and so on).
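If you'd rather poke at a schema from Python than from PyCharm, one possible approach is sketched below. It relies on two SQLite features: the sqlite_master table, which lists the objects in a database, and the PRAGMA table_info statement, which describes one table's columns. The function name describe_schema and the small continent table built for the demonstration are assumptions made for this sketch, not part of the provided code.

```python
import sqlite3


def describe_schema(connection):
    """Returns a dict mapping each table's name to a list of
    (column name, declared type) tuples."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' ORDER BY name").fetchall()

    schema = {}

    for (table_name,) in tables:
        # Table names can't be passed as ? parameters, so the name is
        # placed into the statement directly. That is acceptable here
        # only because the name came from the database itself.
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = connection.execute(
            f'PRAGMA table_info("{table_name}")').fetchall()
        schema[table_name] = [(column[1], column[2]) for column in columns]

    return schema


# A hypothetical one-table database, standing in for airport.db.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE continent ('
    '    continent_id INTEGER PRIMARY KEY,'
    '    continent_code TEXT NOT NULL,'
    '    name TEXT NOT NULL)')

schema = describe_schema(connection)
connection.close()
```

To try it against the real database, you would pass the path to your airport.db file to sqlite3.connect instead of ':memory:'. Be careful with that path, because sqlite3.connect creates a new, empty database file when the path names a file that doesn't exist.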
The data in our database came from community-sourced data provided by OurAirports. They provide the data as a collection of files in the comma-separated values (CSV) format, which I've converted into a SQLite database for our uses. Their data dictionary describes the meanings of the data they provide, which tracks pretty closely with the SQLite database that you've been provided, with each of their files having turned into one table in our database, and most of their columns appearing in our database with names that are the same (or, at least, pretty similar).
Let curiosity be your guide for a while. Don't aim to memorize everything you see; just aim for familiarity, as you would anytime you're exploring new territory.
The program
Your program is required to be a tkinter-based graphical user interface that allows a user to interact with the information stored in the airport.db database. It doesn't only provide the ability to visualize the information already in the database; it also provides a means to update it, which means that the database is effectively both an input and an output of your program.
This arrangement, with a program and a database existing alongside each other, is not unusual in practice; the reason we have databases is often specifically so we can have persistent data that's updated gradually over time, with one or more programs used to update it along the way. In such cases, the program and the database are symbiotic; we can't run the program without the database, we can't (as easily) interact with the database without the program, and changes to the design of one will have a commensurate impact on the other. (In your case, the design of the database is a given, so you won't have to worry about changes in the database's design affecting the design or implementation of the program, but that's a realistic concern in real-world programs that evolve over long periods of time; schemas change, and programs have to change to accommodate those changes accordingly.)
Not all students in ICS 33 will necessarily have a prior background in writing graphical user interfaces using tkinter (it's covered in varying levels of depth in ICS 32 and ICS H32, depending on both instructor and quarter), so we've provided one in its entirety, though it's largely non-functional as provided, because the "engine" behind it (the code that interacts with the underlying database) is entirely missing and will need to be provided by you. We'll take a look at the details of how to do that a little later in the project write-up.
In terms of functionality, your program will have to meet the following basic requirements.
· Search for continents in the database, given either a continent code, a name, or both, displaying all of the continents that exactly match the given characteristics.
· Add a new continent to the database, by specifying the various data points that describe it (except its primary key).
· Update an existing continent in the database, changing any of the various data points that describe it (except its primary key).
· Search for a country in the database, given either its country code, its name, or both, displaying all of the countries that exactly match the given characteristics.
· Add a new country to the database, by specifying the various data points that describe it (except its primary key).
· Update an existing country in the database, changing any of the various data points that describe it (except its primary key).
· Search for a region (a part of a country) in the database, given either its region code, its local code, its name, or some combination of them, displaying all of the regions that exactly match the given characteristics.
· Add a new region to the database, by specifying the various data points that describe it (except its primary key).
· Update an existing region in the database, changing any of the various data points that describe it (except its primary key).
Notably, not all of the tables in the provided database are represented in the user interface, nor are they required to be, but the three tables that are represented should be sufficient to gain the necessary experience with how you can approach problems like this.
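One way to think about a search that accepts "a code, a name, or both" is to build the WHERE clause from whichever criteria were actually given. The sketch below shows that idea only; it is not the required design. The function name search_continents, the column names, and the sample rows are assumptions made for illustration, so check schema.sql for the real ones.

```python
import sqlite3


def search_continents(connection, continent_code=None, name=None):
    """Returns the rows that exactly match every criterion that was given.
    With no criteria at all, this sketch returns every row; whether that
    is the right behavior is a design decision it does not settle."""
    conditions = []
    parameters = []

    if continent_code is not None:
        conditions.append('continent_code = ?')
        parameters.append(continent_code)

    if name is not None:
        conditions.append('name = ?')
        parameters.append(name)

    sql = 'SELECT continent_id, continent_code, name FROM continent'

    if conditions:
        # Only fixed column names are joined into the SQL text; the
        # user's values travel separately, as parameters.
        sql += ' WHERE ' + ' AND '.join(conditions)

    sql += ' ORDER BY continent_id'

    return connection.execute(sql, parameters).fetchall()


# Hypothetical data in an in-memory database, for demonstration only.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE continent ('
    '    continent_id INTEGER PRIMARY KEY,'
    '    continent_code TEXT NOT NULL,'
    '    name TEXT NOT NULL)')
connection.executemany(
    'INSERT INTO continent (continent_code, name) VALUES (?, ?)',
    [('NA', 'North America'), ('EU', 'Europe')])

by_code = search_continents(connection, continent_code='EU')
by_both = search_continents(connection, continent_code='EU', name='Europe')
mismatch = search_continents(connection, continent_code='EU', name='Asia')
connection.close()
```

Because the comparisons use =, the match is exact: 'EU' finds Europe, but 'E' would find nothing.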
Persistence
It's worth noting that persistence is not just a concept we're discussing generally; it's a requirement. Changes made to the underlying data must be persistent, which is to say that if you make a change via the user interface, quit the program, start the program again, and open the same database, the changes made in the previous run of the program will still be present in the database. (That's one of the main reasons why we use databases, after all.)
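When using Python's sqlite3 module with its default settings, changes made by statements like INSERT and UPDATE only become permanent once they are committed. The sketch below demonstrates that with a throwaway database file in a temporary directory; the note table is hypothetical, and two separate connections stand in for two separate runs of a program.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'example.db')

    # "First run" of the program.
    first = sqlite3.connect(path)
    first.execute(
        'CREATE TABLE note (note_id INTEGER PRIMARY KEY, body TEXT NOT NULL)')
    first.execute("INSERT INTO note (body) VALUES ('saved')")
    first.commit()      # This change is now stored in the file.

    first.execute("INSERT INTO note (body) VALUES ('lost')")
    first.close()       # Closed without a commit, so this change is discarded.

    # "Second run" of the program, opening the same file.
    second = sqlite3.connect(path)
    bodies = [
        body
        for (body,) in second.execute('SELECT body FROM note ORDER BY note_id')
    ]
    second.close()
```

Only the committed row survives into the second connection, which is the behavior to keep in mind when deciding where commits belong in your own code.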
Interlude: Organizing Python modules using packages
Because so much code is provided in this project, one of the things you'll need to do early on, before you start working on implementing the project, is to acquaint yourself with what's already there. Immediately, you'll see that the provided code is organized differently than you might have seen in your past work; it's made up of what are called Python packages. Since Python packages are likely to be new territory for you, let's stop briefly and talk about what problem they solve and the necessary details about their mechanics.
When we want to split up a collection of functions and classes, so that similar ones are grouped together and dissimilar ones are kept separate, we use modules to do that. Each module is written in a separate file (with a name ending in .py), and we use Python's import statement to allow the code in one module to make use of things written in other modules. We use naming conventions like leading underscores to indicate that some things are protected and shouldn't be used outside of their own module. All of that allows us to avoid writing large programs in a single file, which has all kinds of advantages.
But what do we do when there are so many modules that they become unwieldy to manage? The provided code spans fifteen or so modules, and it's not uncommon for one module to need to import things from several of them. That's a lot of complexity to track in our heads at once (more so for you, because you're new to this project and you weren't the original author of the provided code!), so it would be nice if there were a way of organizing all of these modules somehow, allowing us to achieve two goals.
· Grouping modules into directories, so that strongly-related modules are grouped together and less strongly-related modules are kept separate. We want the same thing here that we wanted when we were grouping functions and classes into modules: high cohesion and low coupling. But now we want it from the perspective of modules instead of individual functions or classes; the modules in one directory are strongly related to each other, while the modules in different directories solve different kinds of problems. This way, when we're looking for the part of our program that's related to one broad kind of problem (a user interface, a network protocol, or whatever), we'll have an idea where to look.
· Providing a way for all of the modules in a directory to appear outwardly like they were a single module, so that we could import an entire directory's worth of modules at once, instead of importing each one independently. This way, when we're working on code in one part of a program, we won't have to remember every detail of which functions are defined in which modules in some other part of our program; we'll just need to know that "When I want something to do with the user interface, I want something in the p2app/views directory."
Python packages are meant to achieve those two goals. A Python package is a directory underneath the root directory of our project, in which there are multiple .py files. The relative path to that directory indicates the package's name. In the provided code for this project, for example, you'll find three packages named p2app.engine, p2app.events, and p2app.views. But there's a little more to the story than just the organization of files into a directory structure; the import statement gives us some tools to use packages more effectively, too.
When we want to import a single module from within a package in our project, we do so by specifying a sort of "path" to it. For example, if we wanted to import the module main.py from the p2app/views directory, we could do so this way.
import p2app.views.main
After having done that, we can access what's in that module in the usual way. For example, if that module contained a function named foo, we could call it like this.
p2app.views.main.foo()
When we want to import an entire package all at once, we do so by specifying the name of the directory. For example, if we wanted to import the entire p2app.views package, we could write this.
import p2app.views
But what did we get? The p2app.views package contains several different modules, each containing functions and classes; so, which ones did we just import? The answer lies in one more detail: A directory that we want to be able to treat as a package like this needs to contain a file named __init__.py, whose job is to specify what the contents of the package are. Whatever it imports becomes what's imported when we say import p2app.views. For example, the contents of p2app/views/__init__.py in the provided code is a single line.
from .main import MainView    # The ".main" notation means "the module named main in
                              # the directory we're currently in."
Consequently, when we say import p2app.views, what we get is exactly one definition: the class MainView from p2app/views/main.py. Nothing else in any of the package's modules is made available this way, mainly because everything else in that package is meant to be used only within that package. (That doesn't stop us from importing those other modules individually and using what's in them, since Python, as usual, enforces few rules about access restrictions, but, for most uses, we'd simply import a package we need and be done with it.)
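If you'd like to see this mechanism in isolation, the sketch below builds a tiny package in a temporary directory and imports it. The package name demoapp and the function _helper are hypothetical, chosen so that the sketch can't be confused with (or collide with) the provided p2app code; only the shape of the __init__.py file mirrors what was described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Lay out demoapp/views/main.py with an __init__.py beside it.
    package_dir = Path(root) / 'demoapp' / 'views'
    package_dir.mkdir(parents=True)

    (package_dir.parent / '__init__.py').write_text('')
    (package_dir / 'main.py').write_text(
        'class MainView:\n'
        '    pass\n'
        '\n'
        'def _helper():\n'
        '    return "internal"\n')
    (package_dir / '__init__.py').write_text('from .main import MainView\n')

    # Make the temporary directory importable for the moment.
    sys.path.insert(0, root)
    importlib.invalidate_caches()

    try:
        views = importlib.import_module('demoapp.views')
        main = importlib.import_module('demoapp.views.main')

        # The package exposes the one name its __init__.py imported...
        exports_main_view = hasattr(views, 'MainView')
        same_class = views.MainView is main.MainView

        # ...but not the other names defined in its modules.
        exports_helper = hasattr(views, '_helper')
    finally:
        sys.path.remove(root)
```

The class reached through the package is the very same object as the one defined in the module, so importing the package is a shortcut to it, not a copy of it.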
There's a little more to the story of packages in Python than this, but Python's own detailed documentation about its package feature tells it at least as well as I could; you can find that documentation at the link below.
· Python's documentation for the package feature
An overview of the program's design
Now that we've discussed how Python packages allow a large program to be organized, we can proceed with building an understanding of the design of the provided program. The program is organized into four major areas.
· A package named p2app.views, which defines a tkinter-based graphical user interface giving a user the ability to view and edit the information in the airport.db database. All of the necessary functionality is already in place, so you will not need to make any modifications to this package.
· A package named p2app.engine, in which you'll write the part of the program necessary to communicate with the airport.db database, so that the provided graphical user interface can obtain and modify the information in that database. Almost all of the necessary code is missing from this package, so this is where you'll be working.
· A package named p2app.events, which provides the tools necessary to allow the p2app.views package to communicate with the p2app.engine package. When the user interface needs the engine to perform some operation, it uses events to do it; when the engine needs to communicate its results back to the user interface, it uses events to do it. All of the necessary functionality is already in place, so you will not need to make any modifications to this package.
· A "main" module named project2.py that initializes the necessary parts of the program, then launches the user interface. You will not need to make any modifications to this module.
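The general shape of event-based communication between a user interface and an engine can be sketched as follows. Everything here is hypothetical: the event classes, the Engine class, and its process_event method are invented for this illustration and should not be taken as the names or signatures used in the provided p2app packages, which you'll want to read for yourself.

```python
from dataclasses import dataclass


# Hypothetical events: small objects that describe a request or a result.
@dataclass(frozen=True)
class StartSearchEvent:
    name: str


@dataclass(frozen=True)
class ResultEvent:
    name: str


@dataclass(frozen=True)
class ErrorEvent:
    message: str


class Engine:
    """A stand-in engine that searches a list instead of a database."""

    def __init__(self, known_names):
        self._known_names = list(known_names)

    def process_event(self, event):
        """Generates zero or more events in response to the given one."""
        if isinstance(event, StartSearchEvent):
            for name in self._known_names:
                if name == event.name:
                    yield ResultEvent(name)
        else:
            yield ErrorEvent(f'unsupported event: {type(event).__name__}')


# The "user interface" side: send an event, then consume the responses.
engine = Engine(['Europe', 'Asia'])
found = list(engine.process_event(StartSearchEvent('Asia')))
missing = list(engine.process_event(StartSearchEvent('Atlantis')))
confused = list(engine.process_event('not an event'))
```

The appeal of this arrangement is that neither side needs to know how the other is implemented; each only needs to agree on what the events mean.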