Postgis 2.1 ( on debian 8) make geocoding US addresses ( about 40 Million ) quite easy, and it is totally free! Thanks for the PostGIS and PostGRESQL open source communities.
To import all the US tiger data from ftp://ftp2.census.gov/geo/tiger/, you need at least 110 G disk partition for postgresql, I would recommend 200G to be safe. And another 40G disk space to download tiger data ( postgis2.1 will generate a script to use /gisdata to store all the downloaded data ).
Now step by step:
(1) postgis and postgresql
apt-get install postgresql-9.4-postgis-2.1 postgresql-9.4-postgis-2.1-scripts postgresql-9.4-postgis-scripts
(2) createdb comrite_geocoder
psql -d comrite_geocoder -c “CREATE EXTENSION postgis;”
psql -d comrite_geocoder -c “CREATE EXTENSION postgis_topology;”
psql -d comrite_geocoder -c “CREATE EXTENSION fuzzystrmatch;”
psql -d comrite_geocoder -c “CREATE EXTENSION postgis_tiger_geocoder;”
now you should able to test if the PostGIS installation is correct or not .
SELECT ‘foss4g2013′ As event, n.*
FROM normalize_address(’30 South 7th Street, Minneapolis, MN 55402’) As n;
(3) now import nation data:
psql comrite_geocoder -c “SELECT loader_generate_nation_script(‘sh’); ” -A -o nation.sh
you may need to adjust nation.sh something like and change as following if you run as postgres user:
HOST=””
PGDATABASE=comrite_geocoder and
PGBIN=/usr/bin,
PGPASSWORD=””
(4) the next step is to load all US states data:
psql comrite_geocoder -c “SELECT loader_generate_script(ARRAY[‘AL’, ‘AK’,’AZ’,’AR’], ‘sh’);” -A -o A.sh
the above A.sh will load AL, AK, AZ, AR data, again you have to adjust those settings.
you can write a script to install all US states. I have written a script to adjust to my needs. it take about 24 hours to import all those data.
(5) testing
Now you should be able to test geocode as:
SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,
(addy).address As stno, (addy).streetname As street,
(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip
FROM geocode(‘300 WATER STREET , MONTGOMERY, ALABAMA 36104’) As g;
It should give the result like this:
rating | lon | lat | stno | street | styp | city | st | zip
——–+——————+——————+——+——–+——+————+—-+——-
0 | -86.313876308318 | 32.3804685457777 | 300 | Water | St | Montgomery | AL | 36104
(1 row)
Notes: to make the import painless, I would recommend you download tiger data first on your local network.