Spatial selection of features #97
Replies: 3 comments 9 replies
-
Code Sample 1: creating a GiST index on the long island state plane (ft) coordinates for the tax lot table BEGIN;
CREATE INDEX tax_lot_li_ft_gix
ON tax_lot USING GIST (li_ft);
SELECT tablename, indexname, indexdef FROM pg_indexes WHERE tablename='tax_lot';
COMMIT; Code Sample 2: Comparing a distance-based sorting to bbl-based sorting. Also comparing keyset bbl pagination to offset pagination. SELECT *
FROM tax_lot
WHERE ST_Intersects(
ST_Buffer(
ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
-73.99044046229767,
40.73423087068164
],
"type": "Point"
}'),
2263),
600),
tax_lot.li_ft)
ORDER BY tax_lot.bbl
OFFSET 100
LIMIT 100;
SELECT *
FROM tax_lot
WHERE ST_Intersects(
ST_Buffer(
ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
-73.99044046229767,
40.73423087068164
],
"type": "Point"
}'),
2263),
600),
tax_lot.li_ft)
AND tax_lot.bbl > '4000150010'
ORDER BY tax_lot.bbl
LIMIT 100;
-- Point
SELECT *
FROM tax_lot
WHERE ST_Intersects(
ST_Buffer(
ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
-73.99044046229767,
40.73423087068164
],
"type": "Point"
}'),
2263),
600),
tax_lot.li_ft)
ORDER BY tax_lot.li_ft <->
ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
-73.99044046229767,
40.73423087068164
],
"type": "Point"
}'),
2263)
OFFSET 0
LIMIT 100; Code Sample 3: Distance ordering for linestrings -- LineString
SELECT *
FROM tax_lot
WHERE ST_Intersects(
ST_Buffer(
ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
[
-74.00765772002035,
40.710173183817375
],
[
-74.0093324193894,
40.70866753628991
],
[
-74.01015029582547,
40.70772279889087
],
[
-74.01116290474634,
40.70645328690529
]
],
"type": "LineString"
}'),
2263),
600),
tax_lot.li_ft)
ORDER BY tax_lot.li_ft <-> ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
[
-74.00765772002035,
40.710173183817375
],
[
-74.0093324193894,
40.70866753628991
],
[
-74.01015029582547,
40.70772279889087
],
[
-74.01116290474634,
40.70645328690529
]
],
"type": "LineString"
}'),
2263)
OFFSET 100
LIMIT 100; Code Sample 4: Distance ordering for center of max inscribed circle within a polygon -- Maximum inscribed circle, Polygon
SELECT *
FROM tax_lot
WHERE ST_Intersects(
ST_Buffer(
ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
[
[
-73.91716324329917,
40.69506235583748
],
[
-73.91186651971307,
40.6946194213875
],
[
-73.91252861016173,
40.69798564935206
],
[
-73.91268439614913,
40.695239528792456
],
[
-73.91716324329917,
40.69506235583748
]
]
],
"type": "Polygon"
}'),
2263),
600),
tax_lot.li_ft)
ORDER BY tax_lot.li_ft <-> (ST_MaximumInscribedCircle(ST_Transform(
ST_GeomFROMGeoJSON('{
"coordinates": [
[
[
-73.91716324329917,
40.69506235583748
],
[
-73.91186651971307,
40.6946194213875
],
[
-73.91252861016173,
40.69798564935206
],
[
-73.91268439614913,
40.695239528792456
],
[
-73.91716324329917,
40.69506235583748
]
]
],
"type": "Polygon"
}'),
2263)
)
).center
OFFSET 900
LIMIT 100; |
Beta Was this translation helpful? Give feedback.
-
Offset seems like the logical pagination method to start with to me. I wasn't familiar with keyset, so I appreciate the research. Cursor is another one I had in mind but, as that article points out, it's usually more difficult to implement. Perhaps we go with Offset and keep Cursor in mind if we run into performance problems? I think we can play around with other ways to send geometries as query params but the way you outlined here sounds reasonable to me. One note about the response schema - including |
Beta Was this translation helpful? Give feedback.
-
@TangoYankee I wanted to pick back up the conversation about how we encode geometries in query params. I think the approach you outlined above with
The approach I'm suggesting is to include geometries as WKT-encoded strings. This approach in inspired by how CQL handles passing geometries to "predicates" in their query syntax. With this approach, instead of having the
I know there are obvious downsides to encoding the whole geometry as what is ultimately "just a string" to OpenAPI but I think this is an interesting opportunity to take advantage of a well known (pun intended) standard for encoding geometries. We would, of course, still validate that the string is valid WKT on the server side and throw 400s as necessary. Another consideration is what this would look like for Thoughts? I was thinking I could put together sister issues to the existing spatial filter ones for this approach. |
Beta Was this translation helpful? Give feedback.
-
Description
Zap Search, Capital Planning, Population Factfinder and Tax Lot Selector each have a spatial filter. Tax lot Selector and Population Factfinder provide polygon filters. Capital Planning and ZAP search provide point with radius filters. With these precedents, we should adopt functionality within the API to provide a general-use spatial filter that could be applied to different endpoints. Something that works for tax lots and census tracts should also apply to buildings and zoning applications.
We are interested in applying a spatial filter to an endpoint that would otherwise return all entries within a domain. We can use tax lot endpoints to illustrate this approach. Imagine we created a
tax-lots
endpoint that returned all tax lots. We could then pass query parameters to this endpoint to filter the results. In both scenarios, the infrastructure may become unresponsive from the large volume of requests. In worst case scenarios, the api would attempt to return all ~8.6e5 tax lots. Consequently, we will need to implement pagination, before or simultaneously with, a spatial filter.The sections below a general approach to implementing pagination and spatial filtering. Considerations include characteristics of PostGIS, Javascript, GeoJSON, and OpenAPI specifications. Offset/limit is the best pagination technique for our situation. For spatial filtering, we can pass a point, linestring, or polygon feature with an optional buffer. The filter will return only the entries that spatially intersect the resulting geometry.
Design
Endpoints
We create
tax-lots
andzoning-district
endpoints. When no spatial filters are provided to the endpoint, entries will be ordered alphabetically based on the primary key and limited to 20 results at a time. The users can request a higher limit, up to 100 results at a time. The user can also specify an offset.Users may also supply a spatial filter through the query parameters. The filter finds all entries that spatially intersect a supplied geometry and an optional buffer. The spatial filter consists of coordinates, a geometry parameter specifying whether it is a point, linestring, or polygon, and a buffer around the geometry. When a spatial filter is applied, the entries will be ordered based on the "nearness" to the supplied geometry. For points and linestrings, nearness is determined by the distance from the requested geometry to the intersecting feature. For polygons, nearness is determined by the distance from the center of the maximum inscribed circle of the requested geometry to the intersecting feature.
The response includes two top-level keys. The first key holds the array of results: ie) "tax-lots: Array<TaxLots" or "zoningDistricts: Array<ZoningDistricts". The second key contains pagination data for the offset, limit, total results in the response, and the method used to order the results.
Query parameters
The query parameters include data for pagination and spatial filtering. The pagination parameters are always optional because the API will always provide its own defaults. However, the spatial filtering parameters depend on each other to make a logical request.
Note: Values sent through the url are always strings. This includes query parameters. Descriptions of query parameter data schemas should be interpreted as "a string in the shape of...[data structure]". Schemas that obtain their values through query parameters must be able to parse the string representation into literal data structure as described. Ie) Numbers are passed through the query paremeters as "number-like" strings and then parsed by the application into literal numbers.
Update to Note we are adding a query param parser pipe. This will allow for the documentation of query parameters as their desired type, rather than a string formatted in the shape of the desired type
The pagination query parameters:
The geospatial query parameters
lats
parameterlons
parameterResponse body
The response body contains two keys. The first key is the results. Its exact shape is determined by the endpoint. The second key is the
page
data and follows the structure below:Open API Specifications
The documentation strategy for the query parameters. The required combination of query parameters is a little complicated. The general strategy is to create one page schema and three geometry schemas. Because the page parameters can always be used, the geometry schemas include the page schema. Each endpoint can then enforce that 'oneOf' the page or geometry type schemas is followed.
EDIT These schemas are invalid. The supported approach is to list each query parameter separately and as optional. The relationship between the parameters is then documented in the text description for the end point. This approach is reflected in the implementation tickets.
Page Schema
Geometry filter schemas
Point
LineString
Polygon
Notes
Buffer unit is feet. It hurts to use feet because the coordinates are in degrees. However, we are translating the lat/lon into long island state plane feet. The actual spatial analysis will use feet and users will likely be thinking in feet. Might as well request it in feet.
lat and lon lengths are never more than 5, to keep urls resonably lengthed while still allowing the creation of rectangles/4-sided polygons. (valid polygons are linear rings- their endpoints match their starting points. Consequently, 3 and 4-sided polygons actually contain 4 and 5 points, respectively. The last point being the 'closing' point. It is tempting to only accept the 'unique' points and then automatically reuse the first point as the last and closing point. However, we should not assume the user understood this conversion would occur. It is better to follow the geojson convention and expect the last point to be a 'closing' point.)
Geometry Validations
We will want to perform two primary validations for geometries. First, we should verify the lats and lons parameters are of equal length. If they are different lengths. This is cheap to perform in the API and quickly identifies obviously invalid geometries. This saves a trip to the database to check the spatial validity of the geometry. In these cases, the API should throw an error with a message that explains the users passed with coordinates of mismatched lengths.
For linestrings and polygons we will also want to use the database to check for "OGC validity", to prevent the user from inadvertently sending an invalid geometry and receiving dubious results. In these case, the API should throw an error with a message that explains the user passed an invalid geometry.
In both cases, the thrown errors should be caught by the BadRequest Filter.
Appendix for design considerations
Pagination approach
Pagination limits the number of rows that are returned in each response. five types of pagination are described at nordic apis. The two most relevant types seem to be offset and keyset. They point out that offset scales poorly because the database is finding and discarding all of the results before the offset. I found this to be the case. With the tax lot dataset, finding tax lots after 8.4e5 records took multiple seconds to return. In contrast, querying for the results after a specified bbl (as described in keyset design), consistently took about 0.110 seconds.
However, we should also consider the context in which we want to use pagination. We will likely want to set the limit on the order of magnitude of 1e2. Iterating through every record from 1 to 8.5e5 is impractical regardless of whether we use keyset or offset. The real power comes from applying filters and then iterating through this subset of results. The comments below highlight that we are expecting results on the order of magnitude of 1e3. Within this context, offset pagination performs marginally faster than keyset.
Open API spec
Query parameter serialization
Query parameter descriptions
The query parameters for tax lots and zoning districts will be 'one of PageQuery, PointFilterQuery, LineStringFilterQuery, PoygonFilterQuery' (As decribed below, the spatial filter queries will themselves extend the PageQuery)
Geometry quality
OGC defines whether a geometry is simple and valid. PostGIS integrates validity checking. For reasons outlined below, there is value in checking the validity of linestrings and polygons. When users pass invalid linestrings or polygons, we should return a 400 code.
Simplicity
Simplicity excludes self-intersection and self-tangency. We should not worry about checking simplicity. Single points are simple by definition. As long as a polygon is valid, it is also simple. Finally, users may want to interact with linestrings that intersect themselves.
Validity
Validity "allow polygonal geometry to model planar areas unambiguously". There really isn't an opportunity for points to be invalid. However, linestrings need to be non-zero length and polygons have several requirements to be valid. Their validity should be verified before running spatial queries on them, otherwise the ambiguous geometry may return misleading results.
Geometry precision
TL;DR from the next two sections: coordinates are generally limited to 14 decimal places by memory capabilities.
We want to be mindful of the number of characters we are placing in the url as part of the get request. When they are too long, it may become unwieldy. Coordinates with several decimal values may quickly increase the character count, especially with 4-sided shapes requiring arrays with 5 numbers. We want to balance shorter URLs with sufficient spatial precision. I looked at what precision existing tools and also experimented with different precisions.
The PostGIS ST_AsGeoJSON documentation defaults to a precision of 9 but states that 6 decimal places is sufficient for 'display' purposes. Creating polygons at geojson.io, I found they use 14 decimal places. This is a large disparity in precedent. I was curious how precise each decimal place is.
Wgs84/epsg:4326 coordinates model the Earth as an ellipsoid. The arc length between two points on an ellipsoid depends on where you are on the surface. So, I used geojson.io to draw square and lines, looking at their areas and lengths. I found that in NYC, 6 decimal places differentiates distance of about a decimeter (roughly 4 inches). This is likely sufficient for most purposes. We can consider this the 'minimum' length that we should be able to support.
Url Length
We were initially concerned that url length would be a limiting factor. However, StackOverflow indicates 2000 characters is the general upper limit for system infrastructure (apache servers, cdns, SEOs). Modern browsers have even higher limits. When we count the characters for each section of the url, we find that urls with 14 decimal place coordinates still only take up about 325 characters. This isn't to say that we should allow more coordinates and decimal places until the url hits an infrastructure limit. However, url length shouldn't be a primary concern.
Url segments:
https://zoning.planningdigital.com/api
): 38 characters/tax-lots/{bbl}/zoning-districts/classes?
): 41 charactersoffset=850000&
: 14 characterslimit=100&
: 10 charactersgeometry=linestring&
: 20 characterslons=-73.97914931646805, -73.97914931646805,-73.97740014971507,-73.97740014971507,-73.97914931646805
: 100 characterslats=-40.75910653895198,-40.75785953808591,-40.75785953808591,-40.75910653895198,-40.75910653895198&
: 100 charactersUnderlying precision capabilities
Javascript numbers and PostGIS Geometry Types both use double precision floating point format. Coordinates like
-73.97914931646805
are at this upper limit.Technically, queries are formatted as strings and could send a number-like string with more decimal places. However, the parseFloat method implicitly rounds these numbers (as does zod, likely for the same reason). I did consider checking whether the parsed value was rounded. This is possible by converting it back to a string and then comparing to the original. However, this is such an extreme edge case that I don't think it's worth enforcing.
So by definition, numbers will be limited in both the API and the database to 64 bits. This will generally translate to 14 decimal places.
Beta Was this translation helpful? Give feedback.
All reactions