Filling a plane surface with one or more geometric shapes so that there are no gaps or overlaps is called tessellation. The word is often used to describe a pattern created from the repetition of tiles like the famous work of Dutch graphic artist M. C. Escher (1898 - 1972). The mathematical definition is a bit broader. A "non-periodic" tessellation does not involve patterns or repeated tiles -- just the perfect coverage of a surface.
This concept is important in GIS because the geographic surface is continuous. We certainly don't want our data to imply a catastrophic subduction or bottomless crevasses! Unfortunately, tessellation is normally lost when complex geometries are simplified. The causes and possible solutions were the subject of my previous two posts:
"Why Simplification Get Complicated" explored several options for reducing the resolution of lines and polygons. "On Topo the World" discussed the special problem of maintaining a valid topology when simplifying contiguous polygons. Two methods for solving the topology dilemma using PostGIS were offered: building formal topological tables or emulating that behavior by disassembling, simplifying and reassembling the polygons in memory.
There is another way. SpherAware has developed an approach that offers the convenience of a custom function and the speed of a stored topology. It also splits simplification and topological restoration into separate steps, which means the acceptable resolution and an appropriate generalization method can be determined first. "Fixing" the topology is then done in-place on the simplified data. Working only with the reduced number of vertices and leaving the polygons intact help optimize performance.
This is a nice solution for generalizing high-resolution data into a visually accurate representation at larger scales. It's ideal for PuzzleMap. In the following illustrations, polygons representing the 50 US States and the District of Columbia were reduced by 99% from a total of 853,517 vertices to 8,570 vertices using the VW algorithm. On a 4 GHz processor with 8 GB of RAM, this took 16 milliseconds. The topological corrections took 4.1 seconds. That's not bad considering the all-in-one "disassemble, simplify and re-assemble" query took over an hour to complete!
[Figure: The Mississippi Valley at full resolution]
[Figure: A close-up of the area shown to the left]
The core logic involves finding all sets of adjoining polygons first. This is done by checking whether slightly enlarged polygons intersect. Wherever there is an intersection of 3 such enlargements, there is a point where their borders should connect. The centroid of that area is therefore added to each of the overlapping polygons to eliminate any possible gap where they come together. This point becomes the topological node.
[Figure: The simplified polygons are clearly misaligned]
[Figure: Effective nodes are found in the buffer intersections...]
[Figure: The "fix" uses these nodes when redrawing the polygons]
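The node-finding idea can be illustrated with a deliberately simplified sketch. It is not the SpherAware implementation: it assumes axis-aligned rectangles in place of real polygons and a square-cornered enlargement in place of a true buffer (which PostGIS would provide through ST_Buffer). The names `enlarge`, `intersection` and `find_nodes` are hypothetical.

```python
from itertools import combinations

# Hypothetical stand-in geometry: an axis-aligned rectangle stored as
# (xmin, ymin, xmax, ymax). Real polygons need a proper buffer operation.

def enlarge(rect, d):
    """Grow a rectangle outward by d on every side (square corners)."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - d, ymin - d, xmax + d, ymax + d)

def intersection(rects):
    """Return the area common to all rectangles, or None if there is none."""
    xmin = max(r[0] for r in rects)
    ymin = max(r[1] for r in rects)
    xmax = min(r[2] for r in rects)
    ymax = min(r[3] for r in rects)
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)

def find_nodes(shapes, d):
    """Map each trio of shapes whose enlargements overlap to a node.

    The node is the centroid of the area shared by the three enlargements.
    """
    grown = {name: enlarge(rect, d) for name, rect in shapes.items()}
    nodes = {}
    for trio in combinations(sorted(grown), 3):
        common = intersection([grown[name] for name in trio])
        if common is not None:
            xmin, ymin, xmax, ymax = common
            nodes[trio] = ((xmin + xmax) / 2, (ymin + ymax) / 2)
    return nodes

# Three shapes that should meet near (10, 10) but were left with small gaps.
shapes = {
    "A": (0.0, 0.0, 9.9, 20.0),
    "B": (10.1, 10.05, 20.0, 20.0),
    "C": (10.1, 0.0, 20.0, 9.95),
}
nodes = find_nodes(shapes, d=0.2)
for trio, (x, y) in nodes.items():
    print(trio, round(x, 6), round(y, 6))
```

The enlargement distance has to be large enough to bridge the gaps left by simplification. If it is too small, the three enlargements never overlap and no node is found.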
The rest of the process simply chooses one of the two simplified outlines to serve as the boundary between each pair of adjacent polygons. It doesn't matter which, since both should be valid generalizations -- they're just different. The reasons for this were covered in the first post in this series.
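As a rough sketch of that choice, the snippet below makes one polygon adopt its neighbor's version of a shared border. It is an illustration only, under several assumptions: rings are plain lists of (x, y) tuples without a repeated closing vertex, both rings wind in the same direction, and the two end nodes already appear in both rings. The function names `section` and `adopt_border` are hypothetical.

```python
def section(ring, start, end):
    """Vertices from start to end (inclusive), walking forward around the ring."""
    i = ring.index(start)
    j = ring.index(end)
    if j < i:
        j += len(ring)  # wrap around the end of the list
    return [ring[k % len(ring)] for k in range(i, j + 1)]

def adopt_border(ring_a, ring_b, n1, n2):
    """Rebuild ring_b so that its border between n2 and n1 mirrors ring_a's.

    ring_a runs n1 -> n2 along the shared border; ring_b, wound the same
    way, runs n2 -> n1 along it. ring_b keeps its other vertices untouched.
    """
    shared = section(ring_a, n1, n2)   # ring_a's version of the border
    rest_b = section(ring_b, n1, n2)   # ring_b away from the border
    # Append ring_a's interior border vertices in reverse order (n2 -> n1).
    return rest_b + shared[::-1][1:-1]

n1, n2 = (10.0, 0.0), (10.0, 10.0)
ring_a = [(0.0, 0.0), n1, (10.2, 5.0), n2, (0.0, 10.0)]    # left polygon
ring_b = [n1, (20.0, 0.0), (20.0, 10.0), n2, (9.7, 3.0)]   # right polygon
new_b = adopt_border(ring_a, ring_b, n1, n2)
print(new_b)
```

After the call, both rings trace exactly the same vertices between the two nodes, so no gap or overlap remains along that border.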
Database operations are normally transactional, meaning changes are only committed if they succeed. If they fail, nothing is affected. Subtransactions are autonomous transactions inside another transaction. This technique is used here to simplify and speed up the process. Rather than manipulate almost a million vertices in memory, transposing them to and from entirely different geometry objects, an iterative approach is used -- each polygon is modified with respect to its neighbors and committed to disk before proceeding. Although everything seems to happen at once, a series of changes is being saved, each immediately after it is made. Database management systems (DBMSs) are well-optimized for disk I/O, and the amount involved is trivial compared to the massive memory paging imposed on the OS when everything is done en masse in memory.
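The commit-as-you-go pattern can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. This shows only the general idea of saving each polygon before moving on, not the subtransaction mechanism of the actual solution. The table `states`, the column `outline`, the text values used in place of geometries and the function names are all hypothetical.

```python
import sqlite3

def fix_each_polygon(conn, fix):
    """Apply fix() to one polygon at a time, committing after each success."""
    ids = [row[0] for row in conn.execute("SELECT id FROM states ORDER BY id")]
    done = []
    for poly_id in ids:
        (outline,) = conn.execute(
            "SELECT outline FROM states WHERE id = ?", (poly_id,)
        ).fetchone()
        try:
            conn.execute(
                "UPDATE states SET outline = ? WHERE id = ?",
                (fix(outline), poly_id),
            )
            conn.commit()    # this polygon is saved before the next one starts
            done.append(poly_id)
        except ValueError:
            conn.rollback()  # discard only the failed polygon's change
            break
    return done

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (id INTEGER PRIMARY KEY, outline TEXT)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "bad"), (4, "d")],
)
conn.commit()

def demo_fix(outline):
    """Placeholder for the real repair; fails on one row to show the effect."""
    if outline == "bad":
        raise ValueError("cannot repair this outline")
    return outline.upper()

done = fix_each_polygon(conn, demo_fix)
rows = conn.execute("SELECT id, outline FROM states ORDER BY id").fetchall()
print(done, rows)
```

Because every change is committed separately, a failure partway through leaves the earlier polygons modified. That is the trade-off of this pattern: memory use stays low, but the run as a whole cannot be undone with a single rollback.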
There is a practical limit to this approach. Over-simplification can lose details that are topologically important. Although the SpherAware technique attempts to recover them, the spatial logic inevitably fails at some point (no pun intended). Close visual inspection of the result is therefore always advised. Fortunately, manual adjustments become easier to manage as more vertices are removed.
You may download the solution here and try it for yourself. This is a preliminary implementation and several enhancements might improve its effectiveness. In particular:
- The re-discovered nodes are added back to the polygons using ST_Snap(). This function aligns one geometry to another within a distance tolerance. Snapping a polygon to a point seems to work somewhat inconsistently for this purpose. It would be better to actually replace the closest vertex with the node.
- The current logic only finds interior nodes that are surrounded by 3 or more polygons. A reliable method is needed to also find the nodes where 2 polygons touch at the very edge of the tessellated surface.
- The logic could be rewritten to maintain a temp table instead of using subtransactions. The performance impact is unknown, but this would likely be preferable since the entire transaction could then be rolled back on failure.
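The first enhancement, replacing the closest vertex with the node rather than snapping, might look something like the sketch below. It is an untested idea rather than part of the downloadable solution. It assumes a ring stored as a list of (x, y) tuples without a repeated closing vertex, and the function name `replace_nearest_vertex` is hypothetical.

```python
import math

def replace_nearest_vertex(ring, node):
    """Return a copy of ring with the vertex closest to node replaced by node."""
    nearest = min(range(len(ring)), key=lambda i: math.dist(ring[i], node))
    return ring[:nearest] + [node] + ring[nearest + 1:]

# A simplified outline whose corner at (9.9, 9.95) just misses the node.
ring = [(0.0, 0.0), (9.9, 0.0), (9.9, 9.95), (0.0, 10.0)]
node = (10.0, 10.0)
fixed = replace_nearest_vertex(ring, node)
print(fixed)
```

A production version would probably also need a maximum distance, so that a node is never forced onto a polygon that has no vertex anywhere near it.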
Re-tessellating a simplified set of polygons may be an uncommon GIS task. It has, however, been an extremely useful step forward in SpherAware's PuzzleMap project. Perhaps the concept will prove useful to others. If nothing else, it is a good example of the unfinished state of the GIS art. There will always be new challenges, new ways of meeting them and plenty of room for innovation. That is one reason why GIS is so endlessly fascinating.