Better workflow for very large national line dataset · Issue #339 · felt/tippecanoe · GitHub

Open
palmerj opened this issue Apr 24, 2025 · 16 comments

Comments

@palmerj
palmerj commented Apr 24, 2025

What I'm trying to achieve

Render a 50-million-line cadastral survey observation dataset in a single vector-tile layer so that it looks right at every scale, preserving both the density and the local shape of the data. I tried most of the Tippecanoe options but had little luck, so instead I pre-calculated the zoom levels at which each feature should render: dense areas of short lines are blended into point clusters to reduce tile size, while longer lines are merged in so the data still looks good in sparser areas. Beyond z13, I just merged in the actual data. Here's my current process:

ogr2ogr -f GeoJSONSeq -lco RS=NO -t_srs EPSG:4326 -dialect SQLite \
  -sql "SELECT
    id,
    geometry,
    13 AS maxzoom,
    /* 3-pixel visibility rule:
       resolution_z = (156,543 m / 2^z)   ←  1 px at zoom z
       3-px cut-off = 3 × resolution_z    */
    CASE
          WHEN L >= 469629 THEN 0
          WHEN L >= 234815 THEN 1
          WHEN L >= 117408 THEN 2
          WHEN L >=  58704 THEN 3
          WHEN L >=  29352 THEN 4
          WHEN L >=  14676 THEN 5
          WHEN L >=   7338 THEN 6
          WHEN L >=   3669 THEN 7
          WHEN L >=   1835 THEN 8
          WHEN L >=    918 THEN 9
          WHEN L >=    459 THEN 10
          WHEN L >=    229 THEN 11
          WHEN L >=    115 THEN 12
          WHEN L >=     57 THEN 13
    END AS minzoom
FROM (
    SELECT id,
           geometry,
           ST_Length(ST_Transform(geometry, 3857)) AS L
    FROM   nz_survey_observations_cadastral
) AS t
WHERE
    -- discard lines that cannot reach 3 px even at z13 (L is computed in the subquery)
    L >= 57" \
  /vsistdout/ nz-survey-observations-cadastral.fgb \
| jq -c '(.tippecanoe = {minzoom: .properties.minzoom, maxzoom: .properties.maxzoom}) 
         | .properties |= del(.minzoom, .maxzoom)' \
> long_lines.geojsonl
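
The CASE thresholds above can be reproduced from the stated 3-pixel rule. A minimal Python sketch, assuming 256-pixel tiles and the equatorial Web Mercator resolution at z0 (2π × 6378137 / 256 ≈ 156,543.03 m/px). Because the lengths are measured after ST_Transform to EPSG:3857, they are in the same projected metres as this resolution at any latitude. The function names are illustrative only, not part of GDAL or Tippecanoe.

```python
import math

# Web Mercator resolution at zoom 0 for 256-px tiles: 2*pi*6378137 / 256 metres per pixel.
Z0_RESOLUTION_M = 2 * math.pi * 6378137 / 256

# Thresholds copied from the CASE expression above (list index = zoom).
SQL_THRESHOLDS = [469629, 234815, 117408, 58704, 29352, 14676, 7338,
                  3669, 1835, 918, 459, 229, 115, 57]


def min_visible_length(zoom: int, pixels: int = 3) -> float:
    """Projected length (EPSG:3857 metres) that spans `pixels` pixels at `zoom`."""
    return pixels * Z0_RESOLUTION_M / 2 ** zoom


def line_minzoom(length_m: float, pixels: int = 3) -> int:
    """Closed form of the CASE: first zoom at which the line reaches `pixels` px."""
    return max(0, math.ceil(math.log2(pixels * Z0_RESOLUTION_M / length_m)))


if __name__ == "__main__":
    for z, sql_value in enumerate(SQL_THRESHOLDS):
        print(f"z{z:<2} exact {min_visible_length(z):>10.1f} m   SQL {sql_value:>7} m")
    print(line_minzoom(300_000), line_minzoom(60), line_minzoom(50))
```

The hand-rounded SQL values (mostly rounded up) stay within 1 m of the exact figures, so the closed form only disagrees with the CASE for lines within a metre of a threshold. A result of 14 corresponds to the lines dropped by the `L >= 57` filter.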

tippecanoe -o observation_lines_z0_z13.mbtiles -l survey_obs -Z0 -z13 -y id \
    --hilbert --read-parallel --force \
    --no-tile-size-limit --no-feature-limit \
    long_lines.geojsonl

ogr2ogr -f GeoJSONSeq /vsistdout/ nz-survey-observations-cadastral.fgb \
  -t_srs EPSG:4326 -dialect SQLite -sql "
    SELECT
        ST_Centroid(geometry) AS geometry,
        /* 3-pixel length→minzoom logic (same thresholds used for lines) */
        CASE
          WHEN L >= 469629 THEN 0
          WHEN L >= 234815 THEN 1
          WHEN L >= 117408 THEN 2
          WHEN L >=  58704 THEN 3
          WHEN L >=  29352 THEN 4
          WHEN L >=  14676 THEN 5
          WHEN L >=   7338 THEN 6
          WHEN L >=   3669 THEN 7
          WHEN L >=   1835 THEN 8
          WHEN L >=    918 THEN 9
          WHEN L >=    459 THEN 10
          WHEN L >=    229 THEN 11
          WHEN L >=    115 THEN 12
          WHEN L >=     57 THEN 13
          ELSE 14
        END AS line_minz
    FROM (
        SELECT geometry,
               ST_Length(ST_Transform(geometry, 3857)) AS L
        FROM   nz_survey_observations_cadastral
    ) AS t
  " | jq -c '
  if .properties.line_minz == 0 # line visible from z0 → no point
     then empty
     else
       .tippecanoe = {
         "minzoom": 0,
         "maxzoom": (.properties.line_minz - 1) # vanish 1 zoom before line
       } |
       del(.properties.line_minz)
  end
' > observation_centroids.ndjson
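
Taken together, the two ogr2ogr/jq steps and the separate z14-z15 run split each feature's zoom range across three tilesets. The Python sketch below restates that hand-off for a single feature given its projected length. The function names and the dictionary keys (`lines`, `points`, `full_detail`) are hypothetical labels for the three .mbtiles files; the thresholds are copied from the CASE expressions.

```python
# Thresholds copied from the CASE expressions (list index = zoom), EPSG:3857 metres.
THRESHOLDS = [469629, 234815, 117408, 58704, 29352, 14676, 7338,
              3669, 1835, 918, 459, 229, 115, 57]


def line_minz(length_m: float) -> int:
    """Mirror of the CASE expression, including its ELSE 14 branch."""
    for zoom, threshold in enumerate(THRESHOLDS):
        if length_m >= threshold:
            return zoom
    return 14


def zoom_ranges(length_m: float) -> dict:
    """Map each (hypothetically named) tileset to the (minzoom, maxzoom) it covers."""
    minz = line_minz(length_m)
    ranges = {}
    if minz <= 13:                     # WHERE L >= 57: line goes into the z0-z13 line tileset
        ranges["lines"] = (minz, 13)
    if minz > 0:                       # jq: no point when the line is visible from z0
        ranges["points"] = (0, minz - 1)  # point vanishes one zoom before the line appears
    ranges["full_detail"] = (14, 15)   # every feature is in the z14-z15 run
    return ranges


if __name__ == "__main__":
    for length in (500_000, 1_000, 60, 20):
        print(length, zoom_ranges(length))
```

The line and point ranges never overlap at the same zoom, and a line shorter than 57 m is represented only by its (possibly clustered) centroid up to z13, then by its full geometry from z14.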


tippecanoe -o observation_points_z0_z13.mbtiles \
  -l survey_obs -Z0 -z13 -r1 \
  --cluster-densest-as-needed \
  --read-parallel --force \
  observation_centroids.ndjson

tippecanoe -o survey_obs_z14_z15.mbtiles -l survey_obs -y id -D18 -Z14 -z15 -B15 --no-tile-size-limit \
      --hilbert --read-parallel --force \
      nz-survey-observations-cadastral.fgb

tile-join --force -o final_survey_obs.mbtiles \
    --no-tile-size-limit \
    observation_points_z0_z13.mbtiles \
    observation_lines_z0_z13.mbtiles \
    survey_obs_z14_z15.mbtiles

My process works OK if you style the points at a small enough size (say 1 px or less). I'm wondering if I'm being silly here and whether there is a smarter way to do this with just Tippecanoe commands, or whether the tools could easily be improved to do what I want. Ideally I would also get rid of the points and just have lines, but I fear this is not easy while keeping tile sizes down. This process takes about 2 hours on my M2 MacBook Pro with 12 cores (noting that most of it is single-core bound, e.g. the OGR commands).

Here is the dataset

A couple of full renders of the dataset at large scales.


See here for what the tileset looks like in action.

Lastly, here are the tile counts and sizes grouped by size bucket (sizes in bytes), which seem to be OK:

┌─────────────┬────────────┬───────────────┬────────────┐
│ size_bucket │ tile_count │ avg_tile_size │ total_size │
├─────────────┼────────────┼───────────────┼────────────┤
│ 0–1 KB      │     156860 │ 327.0         │   51298613 │
│ 1–10 KB     │      92347 │ 3555.0        │  328268737 │
│ 10–100 KB   │      26101 │ 28275.0       │  738006298 │
│ 100–250 KB  │       2845 │ 157488.0      │  448052238 │
│ 250 KB–1 MB │       1102 │ 443661.0      │  488914052 │
│ >= 1 MB     │         17 │ 1205337.0     │   20490737 │
└─────────────┴────────────┴───────────────┴────────────┘
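
To put the table in perspective, a short Python sketch totals the tileset and measures how much of it sits in the heaviest buckets. It uses only the numbers in the table and assumes the sizes are in bytes.

```python
# (bucket, tile_count, total_size_bytes) copied from the table above.
BUCKETS = [
    ("0-1 KB", 156_860, 51_298_613),
    ("1-10 KB", 92_347, 328_268_737),
    ("10-100 KB", 26_101, 738_006_298),
    ("100-250 KB", 2_845, 448_052_238),
    ("250 KB-1 MB", 1_102, 488_914_052),
    (">= 1 MB", 17, 20_490_737),
]

total_tiles = sum(count for _, count, _ in BUCKETS)
total_bytes = sum(size for _, _, size in BUCKETS)

# Tiles of 250 KB or more: the ones most likely to be slow to download and render.
heavy = [b for b in BUCKETS if b[0] in ("250 KB-1 MB", ">= 1 MB")]
heavy_tiles = sum(count for _, count, _ in heavy)
heavy_bytes = sum(size for _, _, size in heavy)

print(f"{total_tiles:,} tiles, {total_bytes / 1e9:.2f} GB in total")
print(f"tiles >= 250 KB: {heavy_tiles / total_tiles:.1%} of tiles, "
      f"{heavy_bytes / total_bytes:.1%} of bytes")
```

About 0.4% of the tiles carry roughly a quarter of the bytes, so reducing the largest tiles matters far more than the long tail of small ones.
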
@e-n-f
Collaborator
e-n-f commented Apr 24, 2025

Thanks! I'll take a look and see if I can figure out how to make this work well. In my first attempt I am also running into an infinite loop while trying to thin out the features.

@palmerj
Author
palmerj commented Apr 24, 2025

Much appreciated @e-n-f!!

@e-n-f
Collaborator
e-n-f commented Apr 25, 2025

I think #340 should fix the infinite loop problem, at least, but I am still struggling to get a decent tileset for this data that doesn't have dropouts at any zoom level.

gzcat ../nz-survey-observations-cadastral.fgb.ldgeojson.gz | ./tippecanoe --no-feature-limit --extend-zooms-if-still-dropping --coalesce-densest-as-needed -P -zg -D9 -M1000000 -y id -f -o out.mbtiles

This makes it to z8 but then fails to get tile 8/252/156 under a million bytes.
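
To see where tile 8/252/156 is, its bounds can be computed with the standard slippy-map tile formulas. This Python sketch assumes the z/x/y in the message follows the usual XYZ scheme (y counted from the north); `tile_bounds` is an illustrative helper, not a Tippecanoe function.

```python
import math


def tile_bounds(z: int, x: int, y: int) -> tuple:
    """(west, south, east, north) in degrees for an XYZ (slippy-map) tile."""
    n = 2 ** z

    def lon(xx: int) -> float:
        return xx / n * 360.0 - 180.0

    def lat(yy: int) -> float:
        # Inverse Web Mercator: y is counted from the north edge of the map.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yy / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


if __name__ == "__main__":
    west, south, east, north = tile_bounds(8, 252, 156)
    print(f"lon {west:.3f} to {east:.3f}, lat {south:.3f} to {north:.3f}")
```

Under that assumption the tile spans roughly 174.4°E to 175.8°E and 36.6°S to 37.7°S, an area that takes in Auckland.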

@palmerj
Author
palmerj commented Apr 25, 2025

Wow, thanks so much. I got it to render a tileset with a larger 2 MB tile-size limit:

./tippecanoe \
  --no-feature-limit \
  --extend-zooms-if-still-dropping \
  --coalesce-densest-as-needed \
  --maximum-tile-bytes=2000000 \
  -P -zg -D9 -y id -f \
  -o out.mbtiles \
  ../nz-survey-observations-cadastral.fgb

But it would be nice to keep it under 1 MB. Do you think that's possible without losing too much detail beyond z8?

Also, I noticed that some features never show up at any zoom level, for example this network of observations out at sea.

Lastly, what level of resolution does -D9 provide? I never fully understood that logic. I'm actually wondering whether it's possible to get a precision of 5 cm for my coordinates. Is the only way to render tiles to zoom level 18?

Many thanks once again!!

@e-n-f
Collaborator
e-n-f commented Apr 25, 2025

A detail of 9 means that each tile's coordinates are in the range 0 through 2^9, which is 512.

The overall precision of a tileset is 360 / (2 ^ (detail + maxzoom)) degrees. If I have the conversion right that 1 cm at the equator is 0.000000089895 degrees, that means that detail 12 at z18 is 3.73 cm (360 / (2 ^ (12 + 18)) / 0.000000089895). If your renderer can support it, you could increase the detail at maxzoom instead of the zoom level. Mapbox GL can't make use of detail higher than 13 (8192x8192 tiles).
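
The precision formula can be checked numerically. This Python sketch uses the degree-per-centimetre conversion quoted above. The optional cos(latitude) factor is an added assumption (a degree of longitude covers less ground away from the equator), and 41°S is used only as an illustrative New Zealand latitude; the helper names are hypothetical.

```python
import math

# Conversion quoted above: 1 cm at the equator is about 0.000000089895 degrees.
DEG_PER_CM_AT_EQUATOR = 0.000000089895


def precision_degrees(detail: int, maxzoom: int) -> float:
    """Size of one tile-grid unit at maxzoom: 360 / 2^(detail + maxzoom) degrees."""
    return 360 / 2 ** (detail + maxzoom)


def precision_cm(detail: int, maxzoom: int, lat_deg: float = 0.0) -> float:
    """Approximate ground size of one grid unit in cm.

    At lat_deg=0 this is exactly the formula above. The cos(latitude) factor is
    an added assumption: a degree of longitude covers less ground away from the equator.
    """
    at_equator = precision_degrees(detail, maxzoom) / DEG_PER_CM_AT_EQUATOR
    return at_equator * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    for detail, maxzoom in [(12, 18), (13, 17), (12, 17)]:
        print(f"-d{detail} -z{maxzoom}: {precision_cm(detail, maxzoom):.2f} cm at the equator, "
              f"{precision_cm(detail, maxzoom, -41.0):.2f} cm at 41 S")
```

Since only the sum detail + maxzoom enters the formula, -d13 -z17 and -d12 -z18 divide the world into the same 2^30 units and both come out at about 3.73 cm at the equator, while -d12 -z17 is twice as coarse.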

Thanks for pointing out the missing features. I'll take a look and see if I can figure out what has happened to them.

I wish I knew how to keep the overall tile size under 1 MB, but it doesn't seem possible without losing some of the geometry.

@e-n-f
Collaborator
e-n-f commented Apr 25, 2025

Can you identify any of the ids of the missing features?

@palmerj
Author
palmerj commented Apr 25, 2025

> Can you identify any of the ids of the missing features?

OK, big apologies. I got mixed up between dataset revisions. Those features were in fact not missing in the dataset I provided and used for the vector tile generation comparison.

@palmerj
Author
palmerj commented Apr 25, 2025

> A detail of 9 means that each tile's coordinates are in the range 0 through 2^9, which is 512.
>
> The overall precision of a tileset is 360 / (2 ^ (detail + maxzoom)) degrees. If I have the conversion right that 1 cm at the equator is 0.000000089895 degrees, that means that detail 12 at z18 is 3.73 cm (360 / (2 ^ (12 + 18)) / 0.000000089895). If your renderer can support it, you could increase the detail at maxzoom instead of the zoom level. Mapbox GL can't make use of detail higher than 13 (8192x8192 tiles).

Many thanks for the explanation; that makes more sense. So the best I could hope for, to get 5 cm resolution that is supported in Mapbox/MapLibre without generating another zoom level, is -z17 -d13? Or is there little benefit in doing that, and I should just produce lower-resolution tiles and generate further zoom levels?

> I wish I knew how to keep the overall tile size under 1 MB, but it doesn't seem possible without losing some of the geometry.

OK, thanks once again. I was able to generate a tile cache with a maximum tile size of 1,500,000 bytes, which is still pretty good given the size and density of the dataset. Next I'm going to revisit this dataset: mapbox/tippecanoe#311

@palmerj
Author
palmerj commented Apr 28, 2025

Trying to tile another large, dense dataset (this time polygons), I get the following result:

https://pmtiles.io/#url=https%3A%2F%2Flinz-test-data.s3.ap-southeast-2.amazonaws.com%2Fpmtiles%2Fnz-primary-parcels.pmtiles&map=3.1/-42.08/172.64

I used the same PR branch you provided:

./tippecanoe \
  --no-feature-limit \
  --extend-zooms-if-still-dropping \
  --coalesce-densest-as-needed \
  --no-simplification-of-shared-nodes \
  --no-tiny-polygon-reduction \
  --maximum-tile-bytes=2000000 \
  -P -z17 -D11 -d12 -y id -f \
  -o nz-primary-parcels.mbtiles \
  nz-primary-parcels.fgb

The result is pretty good, but it would be nice to reduce the tile sizes while retaining urban density and shape. Do you have any suggestions for the command above? Increasing the maximum tile size to 3 MB or more helped, but downloading and rendering took too long for my liking. Or is this about as good as I will get?

I also noticed that increasing the low-zoom detail above 12 (e.g., -D13) made the result much worse than -D11, while -D7 through -D10 produced similar results. This didn't make sense to me. Can you explain?

Here are example screenshots at z7:

  • out_z17_no-simp-D11_d12_3MB (Best result)
  • out_z17_no-simp-D11_d12_2MB (Good result)
  • out_z16_no-simp-D13_d13.mbtiles (higher low-zoom detail lost the extra density in urban areas)

The dataset is here.

Many thanks

@palmerj
Author
palmerj commented May 2, 2025

@e-n-f Any thoughts on generating the parcels tileset? No worries if you don't have time.

Can we get this code merged? It's looking pretty good for me.

Greatly appreciated :-)

@e-n-f
Collaborator
e-n-f commented May 6, 2025

Sorry for the delayed response. I am working on some other parcel-tiling issues now and appreciate having your dataset to try too.

@palmerj
Author
palmerj commented May 7, 2025

No problem, I can wait. Great that you are making use of the dataset.

@e-n-f
Collaborator
e-n-f commented May 7, 2025

The reason a higher detail like -D13 makes things worse is twofold:

  1. It increases the range of coordinate values within the tile, so each X,Y pair takes 13 bits to represent instead of 12
  2. It increases the minimum physical size of a feature, so there are more features that it has to try to include in the tile. A feature that is not at least 1 unit by 1 unit in extent can't be seen so it can be excluded from the tile. When you increase the detail, some features that were previously excluded because they were smaller than the tile grid are no longer smaller than the tile grid and have to be included to maintain visual continuity.
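
Point 2 can be illustrated with a deliberately simplified model: treat a feature as droppable when its largest bounding-box side is under one tile-grid unit, and count how many parcels survive at z7 as the low-zoom detail goes up. The extents below are invented for illustration, the helper names are hypothetical, and Tippecanoe's real handling of tiny features (for example tiny-polygon reduction) is more involved than this.

```python
# Circumference of the Web Mercator world at the equator, in projected metres.
WORLD_M = 40075016.686


def tile_unit_m(zoom: int, detail: int) -> float:
    """Projected size of one tile-grid unit at this zoom and detail."""
    return WORLD_M / 2 ** (zoom + detail)


def features_kept(extents_m: list, zoom: int, detail: int) -> int:
    """Simplified rule: keep a feature if its largest side spans >= 1 grid unit."""
    unit = tile_unit_m(zoom, detail)
    return sum(1 for extent in extents_m if extent >= unit)


# Hypothetical largest bounding-box sides of parcels, in projected metres.
EXTENTS_M = [15, 30, 45, 80, 120, 200, 400, 1500]

if __name__ == "__main__":
    for detail in (11, 12, 13):
        print(f"z7 -D{detail}: unit {tile_unit_m(7, detail):.1f} m, "
              f"{features_kept(EXTENTS_M, 7, detail)} of {len(EXTENTS_M)} features must be kept")
```

Each extra level of detail halves the unit size (about 153 m, 76 m and 38 m here), so progressively smaller parcels cross the one-unit line and have to be carried in the tile.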

I don't think you have many options left for decreasing the size of the tiles. You are already excluding most of the attributes and reducing the tile detail. You could reduce the tile buffer with -b if you don't need 5 units of buffer; 1 is probably enough. Additional --simplification probably won't change anything since most of your features are already simple in shape.
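
A rough way to see why -b 1 helps, and why only a little: if features are spread evenly (an assumption), the amount of data clipped into a tile grows with the buffered area. This Python sketch uses Tippecanoe's documented -b unit ("screen pixels", 1/256 of the tile width); the helper names are illustrative.

```python
TILE_PX = 256  # -b is measured in "screen pixels", 1/256 of the tile width


def buffered_area_factor(buffer_px: int) -> float:
    """Area of the tile plus a buffer on all four sides, relative to the bare tile."""
    side = TILE_PX + 2 * buffer_px
    return (side / TILE_PX) ** 2


def buffer_in_tile_units(buffer_px: int, detail: int) -> float:
    """Width of the buffer expressed in tile-grid units at a given detail."""
    return buffer_px * 2 ** detail / TILE_PX


if __name__ == "__main__":
    for b in (5, 1):
        print(f"-b{b}: {buffered_area_factor(b) - 1:.1%} extra area, "
              f"{buffer_in_tile_units(b, 11):.0f} units at -D11")
```

Going from -b 5 to -b 1 cuts the extra clipped area from about 8% to about 1.6%, so under this model a saving of roughly 6% per tile is the most to expect.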

@e-n-f
Copy link
Collaborator
e-n-f commented May 9, 2025

These changes are merged as https://github.com/felt/tippecanoe/releases/tag/2.78.0 now.

@palmerj
Author
palmerj commented May 11, 2025

Thank you for merging the code!

Also thank you for explanation of the impacts of the increased resolution.

Adding -b 1 makes a small difference.

Interestingly, the Felt.com service, which of course keeps all of the attributes, produces a much denser-looking result. Is Felt using a much higher --maximum-tile-bytes size, or different parameters from the ones I'm using?

@palmerj
Author
palmerj commented May 17, 2025

@e-n-f able to comment?
