CS 474 GP2: Schema Design
James Madison University, Spring 2017 Semester

Electronic submission due Feb 16th at 11:59 PM.
Each group creates a schema directory on GitHub.


The main goals of this assignment are: (1) create the initial design of your project database schema, and (2) import VLDS and other data into your group database. In addition to the VLDS data sets, you must include additional data from doe.virginia.gov, research.schev.edu, data.virginia.gov, and/or other agencies.

Please note this project will take some time. DO NOT WAIT to start until a couple days before the deadline. Divide and conquer!


  1. Determine what data will be necessary for your application. You should brainstorm as a group what relations and attributes you will need before moving on to Step 2.

  2. Write CREATE TABLE statements for group database. Pick suitable data types for each attribute; see the PostgreSQL docs for a list of available options. Determine which attributes will be NOT NULL. For each table, write a brief one-sentence description in a COMMENT statement.

  3. Develop SQL queries to COPY the data from vdoe and CSV files. Use the count(*) and LIMIT techniques discussed in class to debug your queries. Then paste them into a shell script to perform the actual import on the database server (see example below).

  4. Write an SQL script to count the number of rows in each of your tables. After importing the data in the previous step, check to make sure the counts are what you expect them to be. (If a table has too many or two few rows, then there was a problem with the import.)

  5. Create a database diagram (e.g., using Dia) that summarizes your tables and any constraints. Write the number of rows in the "comment" property of each table, and be sure to check the "comment visible" option. Your diagram should fit on a single page.

  6. Finally, write a README.md file that documents the steps needed to re-create your database from scratch. It will be useful both during the semester and after the course ends.

Beginning with this assignment, we will use GitHub to manage and submit your group project files. When grading, I will export your repository as of the due date.

Remember to commit and push your changes as you work on the project. Each member of your team must make at least one commit, and all commits should have a brief comment. When I browse your submission history, it should show evidence of incremantal work (i.e., not just one big commit at the end by a single team member).

Getting Started

The following files are provided as a template for your solution. One team member should save them into your schema directory, run the command git add on each file, then git commit -m "added GP2 template files", and finally git push. The rest of the team should be able to see/edit the files after running git pull. (NOTE: You don't need to add the college.csv file unless you plan to use it in your database.)

Step File Contents
1 college.csv Example optional NCES data set
2 create.sql DROP/CREATE table statements
3 copy.sh Script to COPY and \copy data
4 stats.sql Add keys and analyze the tables
5 schema.dia Database diagram with row counts
6 README.md Steps for creating your database

Do not store large files in your repository. Instead, provide a link to where you downloaded them in your readme.txt file. It's okay to store smaller (less than 1MB) data sets such as Excel spreadsheets, PDF files, and CSV data edited by hand.

For the VLDS data sets, it's okay to select a subset of rows (e.g., by filtering out certain school years and/or types of schools). However your final application must include all key attributes (sch_year, div_num, sch_num, race, gender, disabil, lep, disadva).

Client Meeting

Each team has been assigned a mentor from one of the following agencies. They have taken an interest in the work you are doing and graciously offered their time to provide feedback.

Your mentors each received the following instructions in an email:

You will be hearing from your teams soon. Each team will coordinate a 30-minute Google hangout with their client (you). The target timeline for these virtual meetings is between the 13th and the 17th. Google Hangouts has proven the most effective method for this. Details can be coordinated by team and mentor. During these virtual meetings, teams will pitch their app. Your role will be that of a user/client and you can provide honest feedback on their app. Consider the creativity, usefulness, user experience, originality and use of the data when providing feedback. The goal of this interaction is for you to provide them feedback to help them improve their apps prior to the final judging (and to be more useful to you and your stakeholders). After this virtual meeting, your team may request to email you for follow-on questions.

Take notes during the meeting! Later in the semester you will submit a report with the following sections:

  1. Briefly describe your client, where they work, what they do, their connection to Apps4VA, etc.
  2. Summarize your discussion, what questions your mentor asked, and the feedback you received.
  3. What insights you learned from the discussion, and what changes you will make to your project.