Your Favourite Map Projection

November 17, 2011, 10:32 pm

≫ Next: Let's do it again at FOSS4G-NA!

...and what it says about you.

XKCD goes geospatial! (Happens to everything eventually...)

↧

Let's do it again at FOSS4G-NA!

January 27, 2012, 8:06 pm

≫ Next: Barnes Analysis for Surface interpolation

≪ Previous: Your Favourite Map Projection

FOSS4G 2011 in Denver last September was such a good time, we're going to do it all over again at FOSS4G North America 2012!

It will be interesting to see who and how many attend. Some conference budgets might have been blown last year - but note that this event is cleverly timed to fall into a new fiscal! And being in D.C. there should be lots of suits and spooks in attendance.

I'm definitely looking forward to building on the momentum from the last conference, especially given my new gig with OpenGeo (ok, I'm definitely not booth-babe material - but being a booth-geek is probably more fun).

↧

Barnes Analysis for Surface interpolation

February 21, 2012, 9:50 pm

≫ Next: A cool application of JTS Voronoi diagrams

≪ Previous: Let's do it again at FOSS4G-NA!

Recently I've been working on generating surfaces from irregular sets of data observations, for thematic rendering purposes. The data I'm working with is meteorological measurements (e..g max and min temperature, average rainfall, etc.). Here's a sample:

Maximum Daily Temperatures for November 30, 2010

There are many, many different ways of interpolating surfaces - Wikipedia has a long list. It turns out that a classic approach for interpolation of meteorological data is Barnes Interpolation (also called Barnes Analysis). Barnes Interpolation is a surface estimating technique which uses an initial pass to provide a smoothed estimate of the surface, and optionally further refinement passes to improve the surface to more closely match the observation values. The initial pass produces an averaged (smoothed) surface, using a summation of exponential (Gaussian) decay functions around each observation point. Subsequent refinement passes compute an error surface using the delta between the previous estimated surface and the observations. The error surface is added to the previous estimate surface to refine the estimate (by reducing the delta to the observations).

I'm more of a vector guy than a raster guy, but I figured that Google and Wikipedia would make short work of this - after all, how hard can raster be? Unfortunately, the Wikipedia article is not very helpful - the description is unclear, and it doesn't even define all the symbols used in the presented equations. I found some better references here and here, but they still contained some confusing discrepancies in the equations. Barnes' original paper is online as well, but as is often the case with primary sources, it is more concerned with the physical properties of the analysis, and doesn't have a very crisp description of the actual algorithm.

Triangulating between the various sources, I eventually arrived at the following algorithm definition (provided herewith without warranty!):

Barnes Interpolation Algorithm

For the first pass, the estimated value E_g at each grid point g_x,y is:

E_g = ∑( w_i * o_i) / ∑( w_i )

where:

o_iis the value of the i'th observation point
w_i is the weight (decay function) value for the i'th observation point, defined as

w_i = exp(- d_i² / L²C )

where:

d_i is the distance from the grid point to the i'th observation point
L is the length scale, which is determined by the observation spacing and the natural scale of the phenomena being measured.
C is the convergence factor, which controls how much refinement takes place during each refinement step. In the first pass the convergence factor is 1. For subsequent passes a value in the range 0.2 - 0.3 is effective.

During refinement passes the estimate at each grid point is re-computed using:

E'_g = E_g + ∑( w_i * (o_i - E_i) ) / ∑( w_i )

where:

E_i is the estimated value at the grid cell containing the i'th observation point

The target development platform is Java (of course). For prototyping purposes I used JEQL to read sample datasets, run the algorithm with various parameter choices, and produce plots to visualize the results. Here's the surface plot for the dataset above:

This looks pretty reasonable, and provides a nice visualization of the large-scale data trend. (Surprise! It's hotter down south...) The ultimate goal is much more exciting. The algorithm will be provided as a Rendering Transformationin GeoServer, allowing surfaces to be generated dynamically from any supported datastore. This will open the way to providing all kinds of further surface visualizations, including other interpolation techniques - as well as everyone's favourite, Heat Maps (aka Multivariate Kernel Density Estimation).

Now, the pure mathematical description of the algorithm is sufficient for a proof-of-concept implementation producing single images. But to make it usable in a dynamic way in a high-demand environment, there are more details which need to be worked out to improve performance and functionality. For instance:

How large an extent of data points is necessary to produce a stable surface for a given area? This is important for dynamic presentation, since zooming, panning and tiling should not change the appearance of the surface. Of course, all data points could always be used, but more selectivity improves performance.
How far should the surface be extrapolated into areas of few or no data observations? Apart from the validity implications, limiting the extent of the computed surface provides a boost in performance.
A further performance increase may be gained by computing the surface on a coarse grid, then up-sampling in an appropriate way to the display resolution.
For data which is global in scope, how does the algorithm need to be changed to accommodate computation on the spheroid? It seems likely that it's necessary to use a geodetic distance calculation, and there may be other impacts as well.

↧

A cool application of JTS Voronoi diagrams

February 22, 2012, 9:25 pm

≫ Next: Battle of the Spatial Scripting Languages!

≪ Previous: Barnes Analysis for Surface interpolation

I just noticed this series of posts by Stephen Mather on generating medial axes of polygons by creating Voronoi diagrams on the densified linework.

If I read correctly, he's doing this via Python GeoScript, which uses JTS for the heavy lifting.

As he says, this is a venerable technique ( hack) for creating medial axis approximations. Regardless of its computational soundness (and hey, whatever works), it's a great example of using a series of complex geometric operations to compute a useful result.

I'll have to try this in JEQL....

(Stephen seems to have moved on to implementing the Scale Axis Transform in PGSQL - very impressive!)

↧

Battle of the Spatial Scripting Languages!

March 6, 2012, 9:54 am

≫ Next: Java's vision for the future

≪ Previous: A cool application of JTS Voronoi diagrams

The gloves are down in this exchange on gis.stackexchange, over LOC counts for Fiona, JEQL, and GeoScript-JS.

Fiona takes one on the chin in the first round! But Sean recovers quickly, and makes a hit by pointing out that Fiona allows defining new functions right in the scripting language. This would be feasible in JEQL as well, by providing hooks to into JVM-based scripting languages. And it's also very easy to develop extensions as simple Java functions or classes.

Now, this is really just a schoolyard scuffle over a very simple use case for spatial scripting. What the fans really want to see is a face-off over a more complex task...

↧

Java's vision for the future

March 8, 2012, 12:41 pm

≫ Next: Reminiscing about the PC Pre-Cambrian explosion

≪ Previous: Battle of the Spatial Scripting Languages!

Just noticed this slideshow from QCon 2012 on the vision for Java's evolution over the next few versions.

Some highlights of JDK 8:

closures via lambda expressions
filter-map-reduce framework
modules

and further out:

JNI goes away!
support for big in-memory data (arrays) and data structure optimizations

A key driver of these enhancements is the transition to multi-core and the end of the free performance lunch. This is a great graph that puts it in perspective - those flat lines are the lunch bill arriving.

QCon always has excellent presentations, so I'm sure there's some other thought-provoking slideware on that site as well.

↧

Reminiscing about the PC Pre-Cambrian explosion

March 9, 2012, 9:44 am

≫ Next: The TI-58/59 programmable calculators

≪ Previous: Java's vision for the future

With the mobile meteorite looming ever larger over the landscape of Planet PC, it seems like a good time to recall some memories of the last major transition in the computer world. I call this the Pre-Cambrian explosion of Personal Computers. It took place from the mid-70's to the early 80's, when dozens of entrepreneurial companies sprang up all hoping to hit the magic combination of hardware and software and sell a zillion units.

In a way, it was exactly the inverse of what is happening now. Back then, a myriad of bizarre life forms sprang up to fill the niche exposed by the advent of the microprocessor, only to be driven into extinction by the arrival of a behemoth with superior reproductive capability.

Whereas today, the new ecological zone was initially colonized by a single fierce predator, who may yet out-compete all other arrivals trying to crawl out onto the shore of the new territory.

On to a series of occasional posts on the Pre-Cambrian PC era...

↧

The TI-58/59 programmable calculators

March 9, 2012, 9:50 am

≫ Next: Battle of the SSLs - Round 2

≪ Previous: Reminiscing about the PC Pre-Cambrian explosion

Mobile computing is nothing new! My first programmable device was a TI-58 calculator:

I spend many happy hours punching in the code for Lunar Lander (and less time actually playing it). Wish I still had my coding sheets for the various apps I dreamed up. (For some reason, the convention then was to write the key codes vertically, rather than horizontally.)

TI had their own app store, selling Solid State Software^TM on plug-in ROM modules. Of course I got the Leisure one, which had a few lame games on it. Then as now, though, I had a lot more fun coding up new apps than playing games...

The calculator cognoscenti will notice the image is actually the TI-59, with the built-in magnetic card reader for offline permanent storage. Had one of those too, once I got tired of eternally punching in programs. The mag card reader was pretty finicky, but it sure was cool filling up that little vinyl case with all that software that never got used...

A couple of years later HP released the iPhone of the calculator era - the HP-41C, which had an alphanumeric display! HP was just as opinionated as Apple, in their own way - if you wanted one of their gorgeous units, you had to grok RPN notation. Kinda reminds me of my current effort of learning git. I never did get one of those units, because by that time it was a lot more rewarding to punch code into a real microcomputer....

↧

Battle of the SSLs - Round 2

March 9, 2012, 12:22 pm

≫ Next: FOSS4G-NA 2012 review

≪ Previous: The TI-58/59 programmable calculators

In a comment to Battle of the Spatial Scripting Languages, Sean suggested that a slightly more in-depth test of language expressiveness might be this task, involving a sequence of CRS transformation and geometry cleaning. Here's the JEQL equivalent:

ShapefileReader t file: "test_uk.shp";

trans = select * except GEOMETRY,

               CRS.transform(GEOMETRY, "epsg:4326", "epsg:27700") geom from t;

clean = select * except geom, isValid ? 1 : 0 isValid, cleanGeom
    with { 
        isValid = Geom.isValid(geom);
        cleanGeom = isValid ? geom : geom; //Geom.buffer(geom, 0);
    } from trans;

ShapefileWriter clean file: "test_uk_valid.shp";

The code is spread out a bit for clarity. It could have been done in single SELECT statement, apart from the isValid flag added for reporting purposes. Note the use of the JEQL extension with clause, which improves code clarity, and also allows a more imperative style of coding which can sometimes be preferable.

Fiona gets points for being able to define any function directly in the language, but there's a price to be paid in code complexity.

Here's the result (seen in the new JEQL Workbench Geometry Viewer):

↧

FOSS4G-NA 2012 review

April 18, 2012, 3:04 pm

≫ Next: Grandpa's Googler

≪ Previous: Battle of the SSLs - Round 2

Last week I was at FOSS4G-NA in Washington DC. It was my first time in DC, and my first FOSS4G as a member of the OpenGeo team. Both were very pleasant experiences!

One of the keynotes was by Josh Berkus of PostgresQL talking about "firehose databases". FOSS4G is always a firehose conference, and this one was no exception. I missed many more talks than I wanted to take in, for various reasons - the 3 simultaneous tracks, my own commitments to talks and OpenGeo work, and often because it's just such a good chance to talk to great people from all around the FOSS4G world.

Here are some of the my conference highlights:

another inspiring keynote from Paul Ramsey, on the origins and future of open source
the Ignite Spatial talks, which were fast, fascinating and funny
a panel discussion which hit on the need for a replacement for the shapefile (PLEASE!)
hearing Josh Berkus talk about the impressive new features coming in PostgresQL 9.2 (JSON, index-only scans, SP_GIST, to name a few)
Andrew Turner's fastpacedpresentationonGeoIQ
meeting Kevin Web and hearing more about OpenTripPlanner
discussing skeletonization, long-distance hiking, and sewage treatment with Stephen Mather
talking to Josh Marcus about GeoTrellis, an engine for scalable geoprocessing
talks on using NoSQL DBs such as MongoDB and HBase for storing and querying spatial data
a discussion with Javier de la Torre of CartoDB about the need for a spatially-sensitive sampling technique (a la Oracle's SAMPLE clause) in PostGIS. We ended up discussing this with Josh Berkus (since he was sitting at the next table!) and apparently there is some work starting on this in PostgreSQL. Hopefully this will extend to spatial datasets as well.
and of course, meeting lots of people using JTS!

Here's a few other reviews from around the web:

Directions Mag takeaways from the ever-present Adena Schutzberg
smather on a Real American Hero
Bill Dollins also had the firehose feeling
Josh Berkus liked his sip from the spatial firehose

↧

Grandpa's Googler

April 22, 2012, 6:23 pm

≫ Next: Why we were stronger in the 80's

≪ Previous: FOSS4G-NA 2012 review

A peerless pastiche by SWWTMTOTH... Try it, it works! Go ahead, make it your homepage!

Hands up all those who remember modem whistles. Takes me back... For maximum verisimilitude, every other time you hit this site it should give you a busy signal.

Little known fact - DSL and cable modems actually make the same sounds, they're just pitched too high to hear 8^

↧

Why we were stronger in the 80's

May 17, 2012, 9:54 am

≫ Next: A scientific basis for Open Source Software

≪ Previous: Grandpa's Googler

I love this picture. Nowadays this functionality weighs only a few ounces, and we carry it in a pocket.

↧

A scientific basis for Open Source Software

May 17, 2012, 2:58 pm

≫ Next: The Roshambo-bots are coming!

≪ Previous: Why we were stronger in the 80's

Stefan Steineger of the OpenJUMP project pointed out this great paper in Nature on The case for open compute programs. The paper raises the argument for open source software to a higher plane, that of being a necessary component of scientific proof. It points out that the increasing use of computational science as a basis for scientific discovery implies that open source must become a standard requirement for documentation. Apparently some journals such as Science already require source code to be supplied along with submissions of articles. Amongst other advantages, access to source code is an essential element of peer review.

An interesting example they mention is the infamous HadCRUT and CRUTEM3 meteorological datasets. One of the (few) salient criticisms levelled at this information during Climategate was the inability to reproduce the results by re-running the software. (Mind you, the software was probably a pile of crufty old Fortran programs mashed up by Perl scripts, so maybe it's just as well).

I'm looking forward to seeing JTS get cited in academic papers (actually, it already has been). Maybe I even have a finite Erdos number!

It's maybe too much to ask that mere scientists be coding hipsters, but I noticed that SourceForge is presented as the leading example of collaborative software development. Someone should introduce them to GitHub - which truly walks the talk. Researchers in bioinformatics should be especially appreciative of the sweeping effect of recombinant software development it enables.

↧

The Roshambo-bots are coming!

June 27, 2012, 9:07 am

≫ Next: Global Earthquakes since 1898

≪ Previous: A scientific basis for Open Source Software

The Japanese have figured out a way to let me avoid taking the garbage out on rainy winter evenings - a Rock-Paper-Scissors robot that always wins!

It actually cheats, though - it reads your mind to determine what move you're going to throw.

Now, enough playing with the easy stuff - get back to work making a Go program! Or at least Rock-Paper-Scissors-Lizard-Spock...

↧

Global Earthquakes since 1898

July 3, 2012, 1:44 pm

≫ Next: JTS helps ESRI with big data problem

≪ Previous: The Roshambo-bots are coming!

This post shows off a very cool image of global earthquakes since 1898.

No link to the source dataset, though. That's unfortunate, because it would be really cool to see this as an animation, and as a KML file (with timestamps, to allow temporal scrolling). Hmmm... sounds like a job for JEQL.

Update: the post does give a link to ANSS, which has a service making the data available in a variety of formats, including KML. It doesn't have a timestamp for temporal scrolling, however - and perhaps a different way of styling might be interesting. A heatmap, perhaps?

↧

JTS helps ESRI with big data problem

August 2, 2012, 10:42 am

≫ Next: Word frequency using JEQL

≪ Previous: Global Earthquakes since 1898

Can't help but feeling a little smug about this post on using JTS in a Hadoop process for generating heatmaps for demographic data.

He only seems to be using JTS for generating point buffers, which is hardly a challenging use case. But if having a fast point buffer algorithm is a key metric I'm not going to complain.

As usual the link for JTS is pointing to the old Vivid site - the current one is this.

↧

Word frequency using JEQL

August 18, 2012, 8:07 pm

≫ Next: Battle of the SSLs - Round 3

≪ Previous: JTS helps ESRI with big data problem

Ryan Tomayko has a post on how Ruby recapitulates AWK (or to be more biologically accurate, how it carries vestigial traits which reveal its evolutionary lineage from AWK down through Perl).

He gives an example of how curl, AWK, and sort can be chained together to compute word counts for Swift's A Modest Proposal:

curl -s http://www.gutenberg.org/files/1080/1080.txt |
ruby -ne '
  BEGIN { $words = Hash.new(0) }

  $_.split(/[^a-zA-Z]+/).each { |word| $words[word.downcase] += 1 }

  END {
    $words.each { |word, i| printf "%3d %s\n", i, word }
  }
' |
sort -rn

Back in the day I was an enthusiastic user of AWK. I was happy to discover that JEQL can be handily used for similar kinds of text processing, when equipped with suitable string handling and RegEx functions. Here's the word count functionality in JEQL (using a source for the text that is more bot-friendly than Project Gutenberg):

TextReader t file: 
  "http://www.victorianweb.org/previctorian/swift/modest.html";

t = select String.toLowerCase(splitValue) word from t 
      split by RegEx.splitByMatch(line, "[a-zA-Z]+" );

Print select word, count(*) cnt from t 
        group by word order by cnt desc;

AWK had a bit of a rep for being somewhat write-only. To my SQL-attuned eyes the JEQL version is more understandable.

↧

Battle of the SSLs - Round 3

August 18, 2012, 8:49 pm

≫ Next: We're all in deep map now

≪ Previous: Word frequency using JEQL

After the warmup of rounds 1& 2 of the Battle of the Spatial Scripting Languages, it's time to pick up the pace! The last two rounds were relatively simple tasks which any ETL system worth its keep should be able to tackle. So here's a more interesting problem:

Given a KML file of volcano locations, and a Shapefile of country boundaries, determine the countries with the greatest density of volcanoes.

It's slightly contrived, but contains a nice variety of spatial and data manipulation tasks, including reading spatial data formats, spatial joins, and grouping and sorting.

Here's the JEQL script:

KMLReader tvol

  file: "http://www.volcano.si.edu/ge/GVPWorldVolcanoes-List.kmz";

ShapefileReader tworld file: "geocommons-world.shp";
Mem tworld;

tc = select NAME, first(AREA) area, count(*) num 
    from tvol join tworld

      on GeomPrep.contains(tworld.GEOMETRY, tvol.geometry) 
    group by NAME;

td = select NAME, num, area, Val.toDouble(num)/area density 
    from tc where area > 0 
    order by density desc 
    limit 20;

HtmlWriter td file: "vol_density.html";

A few points to notice:

JEQL can read KML/KMZ files from the Web
JEQL's way of chaining select statements together is more readable and maintainable than the regular SQL nested syntax
The first() aggregate function in JEQL allows selecting a value from a non-aggregated column. This is often very awkward and inefficient to do in regular SQL.
The GeomPrep function set uses the JTS PreparedGeometry API to optimize repeated geometry predicate evaluation

And here's the places where you might not want to invest in hillside property:

Dominica              4    75 0.0533333333333333
Grenada               1    34 0.0294117647058824
Tonga                 2    72 0.0277777777777778
St. Vincent and the   1    39 0.0256410256410256
St. Lucia             1    61 0.0163934426229508
Martinique            1   106 0.0094339622641509
Comoros               2   223 0.0089686098654709
El Salvador          17 2072 0.0082046332046332
Western Samoa         2   283 0.0070671378091873
Guadeloupe            1   169 0.0059171597633136
Reunion               1   250 0.004
Vanuatu               4 1219 0.0032813781788351
Cape Verde            1   403 0.0024813895781638
Iceland              24 10025 0.0023940149625935
Solomon Islands       6 2799 0.0021436227224009
Guatemala            21 10843 0.0019367333763719
Costa Rica            9 5106 0.0017626321974148
Japan                64 36450 0.0017558299039781
Armenia               4 2820 0.0014184397163121
Nicaragua            16 12140 0.0013179571663921

↧

We're all in deep map now

September 10, 2012, 1:51 pm

≫ Next: The complexity of simple

≪ Previous: Battle of the SSLs - Round 3

A few thoughts on the Atlantic article on GooMaps...

I love the term "the deep map". I'm not doing GIS any more - I'm doing deep mapping!
Yay! Geospatial conflation gets a mention in the MSM!
The article name-checks Borge's map. The original story pointed out the paradox of a 1:1 scale map - but this is only paradoxical in 2-space. In the infinite-dimension virtual world it's possible to have maps that are more detailed than the physical world (in that they can represent relationships across time and other more abstract dimensions)
That is some great synergy between StreetView data acquisition and Google's work on self-driving cars
The author is convinced that Google has an unassailable lead on building the virtual map of the world. But without knowing what's going on at Apple (and perhaps Microsoft) I wouldn't be so sure about that. (And even Google isn't as transparent as this realtime view of OSM edits.)

As always, James Fee has a pithy comment and a great picture. The good news is you get a ping pong table in the office. The bad news is that 2000 more map edits just came in, so get back to work!

↧

The complexity of simple

September 10, 2012, 4:35 pm

≫ Next: Halton sequences, at last

≪ Previous: We're all in deep map now

Recently Sean Gillies and others have been looking at the concept of geometric simplicity as implemented in JTS/GEOS, specifically in the case of polygons. The TL;DR is that JTS has a bug because it reports the following polygon as simple:

POLYGON ((39.14513303 23.75100816, 97.05372209 40.54550088, 105.26527014 48.13302213, 100.91752685 58.43336815, 71.56081448 83.55114801, 60.71189168 86.25316099, 62.00469808 75.1478176, 83.16310007 42.82071673, 92.82305862 37.19175582, 95.99401129 26.47051246, 106.22054482 15.51975192, 39.14513303 23.75100816))

This polygon is invalid, because the shell ring self-intersects. This means the ring is non-simple - but it's not so clear-cut for the polygon itself.

In fact, reporting the polygon as simple isn't a bug - it was a deliberate design decision. The original rationale was:

the OGC Simple Features Specification in section 2.1.10 defining polygon geometries states that "Polygons are simple geometries". (This is consistent with the requirement that valid polygons contain no self-intersections).
JTS usually follows the principle that computed results are only well-defined when the input geometries are valid. This is because it can be expensive to check for validity in order to handle invalid inputs gracefully. Validity testing is factored out into a separate method (isValid) in order to allow the client to decide when to incur the code of executing it.

For these reasons the initial JTS implementation of isSimple returned true for all polygonal inputs, thus avoiding the cost of checking for self-intersections.

However, perhaps it would be more useful to actually carry out some testing of simplicity. This raises the question of what exactly are the semantics of isSimple for polygons? Looking to the SFS for guidance, it has the following somewhat imprecise general definition of simplicity:

the Geometry has no anomalous geometric points, such as self intersection or self tangency. The description of each instantiable geometric class will include the specific conditions that cause an instance of that class to be classified as not simple.

For polygons the spec simply states:

Polygons are simple geometries

(In contrast, for linear geometry the specification has a rigorous mathematical definition of simplicity.)

Apart from the "always simple" semantics, there are at least two other options:

Polygons are simple if their component rings are simple (i.e. have no self-intersections)
Polygons are simple if they have no topological anomalies. This is equivalent to checking whether the polygon is valid.

Because option #2 simply replicates the semantics of isValid, it seems more useful to adopt option #1 to provide more possibilities for analyzing polygon topology (but I'm definitely open to suggestions about why #2 would be better!)

This change to the semantics of isSimple has been implemented in JTS and will appear in the 1.13 release. The code will also be extended to handle GeometryCollections, with the semantics that they are simple if all their components are simple.

↧