I’ve been using the maxmind geo city lite data for a couple years now and wanted to share what I’ve learned.
Setting up the data
Firstly, there is a great article here about constructing your table and importing the data Geo IP location system in MySQL . My import for windows and my table was slightly different –
LOAD DATA LOCAL INFILE 'C:\\GeoLiteCity-Blocks.csv' INTO TABLE geo_city_ip_block_new CHARACTER SET utf8 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' IGNORE 2 LINES (start_ip, end_ip, location_id);
LOAD DATA LOCAL INFILE 'C:\\GeoLiteCity-Location.csv' INTO TABLE geo_city_ip_location_new CHARACTER SET utf8 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' IGNORE 2 LINES (location_id, country, region, city, zip, latitude, longitude, metro_code);
Be sure to load this data via the command line b/c it’s WAY faster… like 4 seconds as compared to ten minutes or more through phpmyadmin or some other client software. As of this writing, the blocks data is 2.256 million records.
Querying the data
My query initially looked just like the query Mr. Dobrzanski says not to do (using a between clause) and it took 0.2 seconds to complete. My query actually ran faster without any indexes on the columns, with the exception of having one on the location_id. After changing to his query using the less than, sorting, and limit, and also adding a primary key on the start and end ip address in the block table, it completed in 0.0004 seconds!
SELECT * FROM web_data.geo_city_ip_block b INNER JOIN web_data.geo_city_ip_location l USING(location_id) WHERE b.start_ip <= INET_ATON('your ip') ORDER BY b.start_ip DESC LIMIT 1
More about Maxmind’s data
Maxmind updates their data frequently; they publish a new data set every month. For each month out of date your data is, it loses 1.5% accuracy. I think the last time I updated my data was like two years ago :). The location data, which is currently 403,000 records, contains mostly historical locations and is no longer of any use if you’re searching for locations by latitude and longitude as with the below query
SELECT *, ( 3959 * acos( cos( radians(35.1495343) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians(-90.0489801) ) + sin( radians(35.1495343) ) * sin( radians( latitude ) ) ) ) AS distance FROM web_data.geo_city_ip_location HAVING distance < 50 ORDER BY distance LIMIT 20
Take zip code 30301 for example. The locations data says it’s in both Atlanta GA and Jackson TN. This is not possible. The most up to date and authoritative data source for current zip codes is USPS’s zip code lookup tool. To clean out the old data that is of no use, run the below sql. Having less than half the data to look through will also make the above query complete over twice as fast.
DELETE FROM geo_city_ip_location WHERE NOT EXISTS ( SELECT * FROM geo_city_ip_block WHERE location_id = geo_city_ip_location.location_id )
After purging the historical data you’ll see the location for zip code 30301 Jackson TN is gone and the location table now contains only 118,000 records.