Friday, May 23, 2008
Uploading shapefiles to PostGIS
I will be able to upload the shapefiles of the datasets I created as soon as Scott can fix my linux account. Seems like when I was gone for a few weeks without logging in my account reset or something. So I am basically setting here with a bunch a data that I can't do anything with. As soon as I can get this data uploaded I can start using the API and bang out some code or something.
Thursday, May 22, 2008
PostGIS info
I am currently trying to implement PostGIS databases for use in our project. So far I have installed the software needed to run the open source database, and I am currently reading through the documentation trying to get an idea of how it works.
Some interesting things that I would like to have are:
GiST Indexes
GiST stands for "Generalized Search Tree" and is a generic form of indexing. In addition
to GIS indexing, GiST is used to speed up searches on all kinds of irregular
data structures (integer arrays, spectral data, etc) which are not amenable to normal
B-Tree indexing.
Once a GIS data table exceeds a few thousand rows, you will want to build an index
to speed up spatial searches of the data (unless all your searches are based on
attributes, in which case you’ll want to build a normal index on the attribute fields).
The syntax for building a GiST index on a "geometry" column is as follows:
CREATE INDEX [indexname] ON [tablename] USING GIST ( [geometryfield] );
Building a spatial index is a computationally intensive exercise: on tables of around
1 million rows, on a 300MHz Solaris machine, we have found building a GiST index
takes about 1 hour. After building an index, it is important to force PostgreSQL to
collect table statistics, which are used to optimize query plans:
VACUUM ANALYZE [table_name] [column_name];
-- This is only needed for PostgreSQL 7.4 installations and below
SELECT UPDATE_GEOMETRY_STATS([table_name], [column_name]);
GiST indexes have two advantages over R-Tree indexes in PostgreSQL. Firstly, GiST
indexes are "null safe", meaning they can index columns which include null values.
Secondly, GiST indexes support the concept of "lossiness" which is important when
dealing with GIS objects larger than the PostgreSQL 8K page size. Lossiness allows
PostgreSQL to store only the "important" part of an object in an index -- in the case
of GIS objects, just the bounding box. GIS objects larger than 8K will cause R-Tree
indexes to fail in the process of being built.
Using Indexes
Ordinarily, indexes invisibly speed up data access: once the index is built, the query
planner transparently decides when to use index information to speed up a query
23
Some interesting things that I would like to have are:
GiST Indexes
GiST stands for "Generalized Search Tree" and is a generic form of indexing. In addition
to GIS indexing, GiST is used to speed up searches on all kinds of irregular
data structures (integer arrays, spectral data, etc) which are not amenable to normal
B-Tree indexing.
Once a GIS data table exceeds a few thousand rows, you will want to build an index
to speed up spatial searches of the data (unless all your searches are based on
attributes, in which case you’ll want to build a normal index on the attribute fields).
The syntax for building a GiST index on a "geometry" column is as follows:
CREATE INDEX [indexname] ON [tablename] USING GIST ( [geometryfield] );
Building a spatial index is a computationally intensive exercise: on tables of around
1 million rows, on a 300MHz Solaris machine, we have found building a GiST index
takes about 1 hour. After building an index, it is important to force PostgreSQL to
collect table statistics, which are used to optimize query plans:
VACUUM ANALYZE [table_name] [column_name];
-- This is only needed for PostgreSQL 7.4 installations and below
SELECT UPDATE_GEOMETRY_STATS([table_name], [column_name]);
GiST indexes have two advantages over R-Tree indexes in PostgreSQL. Firstly, GiST
indexes are "null safe", meaning they can index columns which include null values.
Secondly, GiST indexes support the concept of "lossiness" which is important when
dealing with GIS objects larger than the PostgreSQL 8K page size. Lossiness allows
PostgreSQL to store only the "important" part of an object in an index -- in the case
of GIS objects, just the bounding box. GIS objects larger than 8K will cause R-Tree
indexes to fail in the process of being built.
Using Indexes
Ordinarily, indexes invisibly speed up data access: once the index is built, the query
planner transparently decides when to use index information to speed up a query
23
Subscribe to:
Posts (Atom)