Geoparsing Historic Texts: Gutenkarte and the GeoParser
Schuyler Erle
MetaCarta
Christopher Schmidt
MetaCarta
Gutenkarte
- Gutenkarte puts books on a map
- Helps readers understand georeferences in historical books
- First book: The History of the Peloponnesian War
Map View
- Shows all places in book
- Click for information about all references
- Uses OpenLayers, PostGIS, MapServer
Place View
- Reference Count
- Reference List
- Average Confidence
- Wikipedia Link
Browse View
- Read the book with built-in references
- Hover over a place to update the map area shown
Technology Behind Gutenkarte
- Python
- PostGIS
- MapServer
- GeoParser
PostGIS in Gutenkarte
- Locations in PostGIS
- Attributes of locations affect size
- PostGIS SQL in DATA Statement
location from (
    SELECT oid, etext_id, id as place_id,
           setsrid(location, 4326) as location, name,
           round(8 + ln(frequency * confidence)) as size
    FROM place
    WHERE etext_id = %etext%
    ORDER BY frequency * confidence ASC)
AS the_geom USING UNIQUE oid USING SRID=4326
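The size expression in the DATA statement can be tried outside the database. The following is a minimal Python sketch of the same formula, not the code Gutenkarte runs; the function name and the sample rows are made up for illustration, and rounding of exact halves may differ from PostgreSQL.

```python
import math

def label_size(frequency: int, confidence: float) -> int:
    """Mirror round(8 + ln(frequency * confidence)) from the DATA statement."""
    weight = frequency * confidence
    if weight <= 0:
        raise ValueError("frequency * confidence must be positive")
    return round(8 + math.log(weight))

# (name, frequency, confidence) rows are hypothetical sample values.
places = [("Athens", 400, 0.9), ("Pylos", 12, 0.5), ("Sparta", 150, 0.8)]

# Ascending weight, as in the ORDER BY clause of the query.
for name, freq, conf in sorted(places, key=lambda p: p[1] * p[2]):
    print(name, label_size(freq, conf))
```

Because of the logarithm, a place referenced hundreds of times gets a label only a few units larger than one referenced a dozen times.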
MapServer
- Labels are tiled layer above base WMS
- Base WMS VMap0 served by MetaCarta
GeoParser
- MetaCarta Technology
- Extracts georeferences from unstructured text
- Natural Language Processing, Gazetteer
GeoParser in Gutenkarte
- Limit bounding box to given area
- Store a reference for each hit
- Number of references determines 'importance'
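One way to picture the bookkeeping described above: each hit is stored as a reference, and per-place counts and average confidence are derived from the stored references. This is a sketch under assumed data shapes; the `hits` list and the `summarize` helper are hypothetical and not taken from Gutenkarte.

```python
from collections import defaultdict

# Hypothetical hits: (place name, confidence) for each reference found in the text.
hits = [("Athens", 0.9), ("Sparta", 0.7), ("Athens", 0.8), ("Athens", 0.7)]

def summarize(hits):
    """Return {place: (reference_count, average_confidence)}."""
    confidences = defaultdict(list)
    for place, confidence in hits:
        confidences[place].append(confidence)
    return {
        place: (len(values), sum(values) / len(values))
        for place, values in confidences.items()
    }

summary = summarize(hits)

# Most-referenced places first.
for place, (count, avg) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{place}: {count} references, average confidence {avg:.2f}")
```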
GeoParser in your own Apps
- Available as public API from labs.metacarta.com
- Output as XML, Javascript, Map Image
- Can be added as callback via <script>
GeoParser Usage -- Javascript
'Locations': [
  {
    'Name': 'Lausanne',
    'Type': 'PPLA',
    'Population': 118015,
    'Hierarchy': 'Switzerland/Vaud/Lausanne',
    'ViewBox': [6.166667, 46.033333, 7.166667, 47.033333],
    'Centroid': [6.666667, 46.533333],
    'References': [{
      'Georef': 'Lausanne',
      'Relevance': 1.000000,
      'Confidence': 0.182457,
      'ExtractStart': 0,
      'ExtractEnd': 40,
      'Extract': 'I took the train from Geneva to Lausanne',
      'Subranges': [
        {'Start': 32, 'End': 40}
      ]
    }]
  }]
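The Subranges entries appear to mark where the place name sits inside the extract. A short Python sketch of slicing it out follows; it assumes Start and End are character offsets into the document and that ExtractStart is the offset at which Extract begins, which the sample is consistent with but does not prove. The `matched_text` helper is hypothetical.

```python
# One reference, copied from the sample output above.
reference = {
    'Georef': 'Lausanne',
    'ExtractStart': 0,
    'ExtractEnd': 40,
    'Extract': 'I took the train from Geneva to Lausanne',
    'Subranges': [{'Start': 32, 'End': 40}],
}

def matched_text(ref):
    """Slice each subrange out of the extract.

    Assumption: offsets are relative to the document, so ExtractStart
    is subtracted to index into the extract itself.
    """
    base = ref['ExtractStart']
    extract = ref['Extract']
    return [extract[s['Start'] - base:s['End'] - base] for s in ref['Subranges']]

print(matched_text(reference))
```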
GeoParser Usage -- loc query
- Gazetteer Lookup
- Query parameters: loc=Geneva, output=png
GeoParser Usage -- loc, js
'Name': 'Geneve',
'Type': 'PPLA',
'Population': 181492,
'Hierarchy': 'Switzerland/Geneva/Geneve',
'ViewBox': [5.566667, 45.600000, 6.766667, 46.800000],
'Centroid': [6.166667, 46.200000]
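The ViewBox and Centroid fields lend themselves to simple containment checks, for example when restricting results to an area of interest. The sketch below assumes ViewBox is ordered [min lon, min lat, max lon, max lat], which matches the sample values; the `contains` helper is hypothetical.

```python
def contains(view_box, point):
    """True if point (lon, lat) lies inside view_box.

    Assumption: view_box is [min lon, min lat, max lon, max lat].
    """
    min_lon, min_lat, max_lon, max_lat = view_box
    lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Values copied from the Geneve and Lausanne samples in these slides.
geneve_box = [5.566667, 45.600000, 6.766667, 46.800000]
geneve_centroid = [6.166667, 46.200000]
lausanne_centroid = [6.666667, 46.533333]

print(contains(geneve_box, geneve_centroid))
print(contains(geneve_box, lausanne_centroid))
```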
GeoParser Usage -- Javascript Callback
- GeoParser callback arg: handler
- Wraps data in handler({}) call
- Add script to DOM
- Direct from JS apps
- No proxy, no cross domain issues
GeoParser Usage -- Callback Example
var s = document.createElement('script');
var h = document.getElementsByTagName('head')[0];
s.src = "http://labs.metacarta.com/GeoParser/" +
    "?output=js&loc=Geneva&handler=callback";
h.appendChild(s);
- callback will be called, with the returned data as a single argument.
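The request URL used in the callback example can also be assembled programmatically. Here is a small Python sketch using the three parameters that appear in these slides (output, loc, handler); the `geoparser_url` helper is hypothetical, and no other parameters are assumed to exist.

```python
from urllib.parse import urlencode

BASE_URL = "http://labs.metacarta.com/GeoParser/"

def geoparser_url(loc, output="js", handler=None):
    """Build a query URL; handler is only added when a callback name is given."""
    params = {"output": output, "loc": loc}
    if handler is not None:
        params["handler"] = handler
    # urlencode escapes spaces and other reserved characters in the values.
    return BASE_URL + "?" + urlencode(params)

print(geoparser_url("Geneva", handler="callback"))
```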
GeoParser Usage -- Python Example
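import sys, urllib

# Python 2: look up the place name given on the command line.
data = eval(
    urllib.urlopen(
        "http://labs.metacarta.com/GeoParser/" +
        "?output=js&loc=%s" % urllib.quote(sys.argv[1])
    ).read()
)

# Print name, centroid, and hierarchy for the first three matches.
for i in data['Locations'][:3]:
    print i['Name'], \
        "%s,%s" % (i['Centroid'][0], i['Centroid'][1]), \
        i['Hierarchy']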
Output for "Chicago":
Chicago -87.68,41.84 United States/Illinois/Cook/Chicago
Chicago -91.616667,14.083333 Guatemala/Suchitepéquez/Chicago
Chicago -99.24639,46.84222 United States/North Dakota/Stutsman/Chicago
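Passing a network response to eval executes whatever expression the server returns. A more cautious variant, sketched here in Python 3, parses the text with ast.literal_eval, which accepts only literals. The sample string is assembled from the gazetteer output shown in these slides, and it assumes the full response is a single {...} literal; the `parse_locations` helper is hypothetical.

```python
import ast

# Sample response text (assumption: the real response is one dict literal).
sample = """{'Locations': [{
  'Name': 'Geneve',
  'Type': 'PPLA',
  'Population': 181492,
  'Hierarchy': 'Switzerland/Geneva/Geneve',
  'ViewBox': [5.566667, 45.600000, 6.766667, 46.800000],
  'Centroid': [6.166667, 46.200000]
}]}"""

def parse_locations(text, limit=3):
    """Return (name, "lon,lat", hierarchy) tuples for the first few matches."""
    data = ast.literal_eval(text)  # raises ValueError on anything but literals
    return [
        (loc['Name'], "%s,%s" % tuple(loc['Centroid']), loc['Hierarchy'])
        for loc in data['Locations'][:limit]
    ]

for row in parse_locations(sample):
    print(*row)
```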
For more...
- Check out http://labs.metacarta.com
- Send email to [email protected] with questions
- Current usage limit is 100 hits/day per IP
- Please email if you need more!