Creating and populating a database.

Once I had installed PostGIS it was relative simple to create a spatial database instance.  For convenience the first thing I did was create a database user with the same name as my Linux username.  This means that the default Postgres authentication system will just work transparently:

sudo -u postgres /usr/bin/createuser -s -e -P kms

If you run this command as is you’ll be prompted to enter a password for the newly created user, this is not strictly necessary for what I’ll be doing today, but it will be needed as I install and configure tools to use the database.  You should also note that this command will grant this user superuser privileges across the entire Postgres installation, you probably shouldn’t do this on a production system.

The next four commands create and then configure the database:

createdb -e opendata
createlang -e plpgsql opendata
psql -d opendata -f /usr/share/pgsql/contrib/postgis-1.5/postgis.sql
psql -d opendata -f /usr/share/pgsql/contrib/postgis-1.5/spatial_ref_sys.sql

The first command creates the database, which I’ve chosen to call “opendata”.  The next command configures the new database to support the PL/pgSQL programming language which is used by the functions that make up PostGIS.  The third command loads the PostGIS functions.  Finally the fourth command creates an new table in the database called spatial_ref_sys which holds information about different spatial reference systems and allows us to use database functions to transform between these systems.  Note that I’ve given the first two commands the -e flag, this forces the commands to echo the SQL statements to the console so you can see what’s happening.

To confirm that the database has been created I connected using the psql command and listed the tables:

$ psql -d opendata
psql (8.4.6)
Type "help" for help.

opendata=# \d
 List of relations
 Schema |       Name        | Type  | Owner
--------+-------------------+-------+-------
 public | geography_columns | view  | kms
 public | geometry_columns  | table | kms
 public | spatial_ref_sys   | table | kms
(3 rows)

opendata=#

The OS OpenData Strategi shapefile dataset that I previously downloaded arrives as a Zip file.  Unzipping this gives me a top level folder called Strategi Shape.  Under this are three directories: data, doc, and gazetteer.  I’m going to ignore the gazetteer data for the time being, and the doc directory contains README files and licencing information.

The data directory contains two sub-directories: GB_NORTH and GB_SOUTH.  These directories contain the actual shapefiles.  The shapefiles contain three different types of geometric data: LINES, POLYGONS, and POINTS, the first step is to create a table for each data type:

cd /path/to/Strategi\ Shape/data/GB_NORTH/
shp2pgsql -p -I -s 27700 admin_polyline strategi_line | psql -d opendata
shp2pgsql -p -I -s 27700 admin_font_point strategi_point | psql -d opendata
shp2pgsql -p -I -s 27700 foreshor_region strategi_region | psql -d opendata

Replace “/path/to/” with the path to your unzipped data.  The -p flag to shp2pgsql triggers prepare mode, in this mode shp2pgsql only creates the tables and does not populate them.  The -I flags creates an index on the geometry column in the table.  You could do this as a separate step, but it’s convenient to do it here.  Finally the -s 27700 instructs PostGIS about the datum our data is using, in this case OSGB36.  I’m planning a further article to go into the whole datum thing in more detail.

Notice that the output of shp2pgsql is piped into the psql command.  This is because the output of shp2pgsql is a sequence of SQL commands that create and/or populate the database.  If you chop off the pipe and the psql command you’ll see the SQL echoed to your screen.

At this point I can connect to the database again using psql and examine the tables that have been created:

opendata=# \d
 List of relations
 Schema |          Name           |   Type   | Owner
--------+-------------------------+----------+-------
 public | geography_columns       | view     | kms
 public | geometry_columns        | table    | kms
 public | spatial_ref_sys         | table    | kms
 public | strategi_line           | table    | kms
 public | strategi_line_gid_seq   | sequence | kms
 public | strategi_point          | table    | kms
 public | strategi_point_gid_seq  | sequence | kms
 public | strategi_region         | table    | kms
 public | strategi_region_gid_seq | sequence | kms
(9 rows)

This is where I ran into a problem.  While trying to populate the point table I kept getting constraint errors.  After battering at the problem for a couple of hours I turned to the GIS Stack Exchange website for help.  This is an excellent resource for those wanting to ask questions (or provide answers) on issues about GIS.  After a little bit if investigation and discussion user amercader came up with the answer.  There was a mismatch between the geometry that shp2pgsql was using to create the table (MULTIPOINT) and the geometry of the data that it was trying to load (POINT).

The solution was to drop the table from the database, re-run the shp2pgsql command to create the database but direct the output to a file rather than passing straight to psql:

$ shp2pgsql -p -I -s 27700 admin_font_point strategi_point > create_point.sql
Shapefile type: MultiPoint
Postgis type: MULTIPOINT[2]

I then edited the resulting SQL to change the geometry in the third last line to be POINT rather than MULTIPOINT:

SELECT AddGeometryColumn('','strategi_point','the_geom','27700','POINT',2);

I then loaded this SQL into the database using psql:

psql -d opendata -f create_point.sql

With the tables set up I could then write a short shell script to load each of the shapefiles into the correct table:

#!/bin/sh
cd /path/to/Strategi\ Shape/data/GB_NORTH/
for F in `ls *.shp | grep polyline`; do
    B=`basename ${F} .shp`
    shp2pgsql -a -s 27700 ${B} strategi_line | psql -d opendata
done
for F in `ls *.shp | grep point`; do
    B=`basename ${F} .shp`
    shp2pgsql -a -s 27700 ${B} strategi_point | psql -d opendata
done
for F in `ls *.shp | grep region`; do
    B=`basename ${F} .shp`
    shp2pgsql -a -s 27700 ${B} strategi_region | psql -d opendata
done
cd ../GB_SOUTH/
for F in `ls *.shp | grep polyline`; do
    B=`basename ${F} .shp`
    shp2pgsql -a -s 27700 ${B} strategi_line | psql -d opendata
done
for F in `ls *.shp | grep point`; do
    B=`basename ${F} .shp`
    shp2pgsql -a -s 27700 ${B} strategi_point | psql -d opendata
done
for F in `ls *.shp | grep region`; do
    B=`basename ${F} .shp`
    shp2pgsql -a -s 27700 ${B} strategi_region | psql -d opendata
done

This could take some time to complete depending on how fast your machine and disks are, but once it’s done that’s the database set up and the data loaded.  The -a flag tells shp2pgsql to append the data rather than the default which is to drop the existing table and then load the new data.  To check that the data has been loaded I counted the rows in each table:

opendata=# SELECT COUNT(*) FROM strategi_line;
 count  
--------
 318204
(1 row)

opendata=# SELECT COUNT(*) FROM strategi_point;
 count  
--------
 372071
(1 row)

opendata=# SELECT COUNT(*) FROM strategi_region;
 count 
-------
 42943
(1 row)

Finally to optimize the query plan I ran the VACUUM ANALYZE command in psql.

The next step is to actually use the data, hopefully I’ll get to that later in the week.

About these ads
This entry was posted in Mapping, Tools and tagged , , . Bookmark the permalink.

2 Responses to Creating and populating a database.

  1. Pingback: Visualising data with Quantum GIS. | Using OpenData

  2. Pingback: Loading Landform Panorama Data into Postgres/PostGIS. | Using OpenData

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s