3 group surveys to fix floating point issue #4
Conversation
Walkthrough
Refactors LAS conversion to group surveys by H3 cells, adds public functions (prepare_cells, CSV_2_LAS, group_by_h3_cell), enhances CRS and H3 handling with per-survey reprojection and PDAL cropping, updates README and requirements, and converts .gitignore to directory-level entries.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as Main (__main__)
    participant Prep as prepare_cells()
    participant CSV as CSV_2_LAS()
    participant Group as group_by_h3_cell()
    CLI->>Prep: load metadata + level1_h3_cells
    note right of Prep #d9f0ff: generate level-2 H3 cells\ncompute cell_polygons\nmap surveys → cells
    Prep-->>CLI: cell_to_surveys, cell_polygons
    CLI->>CSV: pass metadata
    note right of CSV #f7fbf0: per-survey CSV → LAS\nper-row CRS handling + fallback\nchunked writes → merge
    CSV-->>CLI: per-survey merged LAS files
    CLI->>Group: pass cell_to_surveys, cell_polygons
    rect rgb(255,249,230)
    note right of Group #fff9e6: iterate level-2 H3 cells\ncrop merged survey LAS with PDAL\nwrite per-cell LAS outputs (./h3_cells)
    end
    Group-->>CLI: per-cell LAS outputs / logs
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (5 passed)
Actionable comments posted: 6
🧹 Nitpick comments (12)
requirements.txt (1)
5-8: PDAL installability and version pinning
`pdal` often requires system/conda packages; a plain `pip install pdal` may fail on CI or Windows. Also, none of the new deps are pinned, which can introduce breaking changes.
- Document/automate installation (e.g., conda-forge: pdal, python-pdal, gdal, proj, geos).
- Consider pinning: pyproj, shapely, h3, pdal to versions you’ve validated locally/CI.
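A possible conda-forge setup along those lines, as a sketch only: the environment name is made up, the version pin is a placeholder to replace with values you have validated, and package names (e.g. `python-pdal`, `h3-py`) should be checked against what conda-forge currently publishes.

```shell
# Hypothetical environment setup via conda-forge (names/versions to be validated)
conda create -n lasconverter -c conda-forge \
    python=3.11 pdal python-pdal gdal proj geos \
    pyproj shapely h3-py laspy
conda activate lasconverter
```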
README.MD (2)
61-68: Call out PDAL/GDAL/PROJ install path
Mention that PDAL (and GDAL/PROJ) may require conda/OS packages; provide a conda-forge snippet. This prevents install blockers.
3-9: Clarify CRS assumptions
You state conversion "using EPSG:4978." Please add:
- What CRS `metadata.geom` is in (currently assumed 4258 in code).
- What CRS the input CSV coordinates are in (`metadata.epsg`).
- That cropping is done in a consistent CRS (see code fixes below).
lasConverter.py (8)
226-229: Don't rely on symlinks/globs; pass explicit file list
Symlinks fail on Windows without admin privileges and add I/O churn. PDAL accepts `filenames` (array), so build `input_files` directly and drop the temp dir and glob.
Apply:
```diff
-    # PDAL filename glob pattern
-    input_pattern = str(temp_folder / "*.las")
+    # Use explicit filenames list for PDAL
+    input_pattern = None  # not used; we pass `filenames` above
```
Also remove the temp folder creation and symlink block above.
255-259: Catch a narrower exception and prefer non-streaming execution
Use `p.execute()` unless you've validated the pipeline is streamable; catch `RuntimeError` (or PDAL's specific exception) instead of a blind `Exception`.
Apply:
```diff
-        try:
-            p.execute_streaming(chunk_size=500000)
+        try:
+            p.execute()
             print(f"✅ Written H3 cell {c2}.")
-        except Exception as e:
+        except RuntimeError as e:
             print(f"⚠️ Pipeline failed for cell {c2}: {e}")
-        finally:
-            shutil.rmtree(temp_folder)
+        # temp folder removal no longer needed if symlinks are removed
```
82-99: Document/verify EPSG fallback and dtype
Defaulting missing `epsg` to 25832 may be wrong for surveys outside UTM32. Consider:
- Using the survey centroid to pick the ETRS89/UTM zone, or
- Defaulting to 4326 and explicitly reprojecting via PDAL later.
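The centroid-based option could look like this minimal sketch. The helper name is hypothetical, `lat` is unused because the sketch assumes the northern hemisphere, and the `25800 + zone` arithmetic is only valid for the European ETRS89/UTM codes EPSG:25828-25838.

```python
def utm_epsg_from_centroid(lon: float, lat: float) -> int:
    """Pick an ETRS89 / UTM EPSG code from a survey centroid (hypothetical helper).

    UTM zones are 6 degrees wide, numbered 1..60 starting at 180°W.
    ETRS89 / UTM zone NN maps to EPSG:258NN for European zones 28-38.
    """
    zone = int((lon + 180) // 6) + 1
    if not 28 <= zone <= 38:
        raise ValueError(f"Zone {zone} is outside the range covered by EPSG:258xx")
    return 25800 + zone

# A centroid in southern Norway (~9.1°E) falls in zone 32 -> EPSG:25832
print(utm_epsg_from_centroid(9.1, 60.0))
```

This keeps the fallback data-driven instead of hardcoding 25832, at the cost of assuming ETRS89-based UTM is appropriate for all surveys.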
142-161: Consider setting a consistent CRS and scale/offset in LAS headers
Optionally set `header.add_crs(...)` with EPSG:4978 and consistent `scales` (e.g., 0.001) across all chunks per survey. If you customize offsets, ensure they are identical across chunks or rescale during merge.
204-218: Use constants for paths instead of hardcoding
Replace `Path("converted_surveys")` with `CONVERTED_PATH` for consistency.
Apply:
```diff
-    src = Path("converted_surveys") / f"{survey_name}.las"
+    src = CONVERTED_PATH / f"{survey_name}.las"
```
22-33: Type-hint consistency
`prepare_cells` returns `DefaultDict[str, List[str]]` (FIDs appended as strings), while `group_by_h3_cell` expects `List[int]`. Align the types (prefer `List[str]`) to avoid runtime `.item()` issues.
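For illustration, a minimal sketch of the aligned annotations. The variable name mirrors the PR; the helper function and its simplified signature are hypothetical, standing in for `group_by_h3_cell`'s real parameters.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Hypothetical aligned types: FIDs are kept as str on both sides,
# so no .item()/int coercion is needed when consuming the mapping.
cell_to_surveys: DefaultDict[str, List[str]] = defaultdict(list)
cell_to_surveys["cell_a"].append("17")

def fids_for_cell(mapping: DefaultDict[str, List[str]], cell: str) -> List[str]:
    # Consumer declares the same element type the producer appends
    return mapping[cell]

print(fids_for_cell(cell_to_surveys, "cell_a"))  # ['17']
```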
52-67: Performance: O(cells×surveys) intersection check
This nested loop will be slow for large extents. Consider using an `STRtree` over survey geoms and querying with each cell polygon's envelope before a precise `intersects`.
47-47: Nit: typo
"Transfrom" → "Transform".
.gitignore (1)
1-5: Confirm intentional exclusion of input/metadata
Ignoring `metadata` and `surveys` prevents tracking sample inputs. If examples are needed, add a `metadata.sample/` pattern and reference it in the README.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .gitignore (1 hunks)
- README.MD (1 hunks)
- lasConverter.py (1 hunks)
- requirements.txt (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
README.MD
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.1)
lasConverter.py
257-257: Do not catch blind exception: Exception
(BLE001)
Actionable comments posted: 1
♻️ Duplicate comments (2)
lasConverter.py (2)
31-33: Unify CRS to 4326 for overlap tests; stop transforming H3/survey polygons to 4978
Cropping/overlap in ECEF (4978) with 2D polygons is geometrically wrong and mixes datums. Keep both H3 polygons and survey geoms in 4326 for overlap; only point clouds should be 4978. This also simplifies PDAL cropping later.
Apply:
```diff
-        cell_polygons: Mapping of H3 cell IDs to their shapely Polygons (EPSG:4978).
+        cell_polygons: Mapping of H3 cell IDs to their shapely Polygons (EPSG:4326).
@@
-    # Transformer for h3 cell polygons
-    transformer_4326_to_4978 = Transformer.from_crs("EPSG:4326", "EPSG:4978", always_xy=True)
+    # H3 polygons remain in EPSG:4326 for overlap/cropping.
@@
-        poly = Polygon([(lng, lat) for lat, lng in coords])
-        cell_polygons[c2] = transform(transformer_4326_to_4978.transform, poly)
+        poly = Polygon([(lng, lat) for lat, lng in coords])  # EPSG:4326
+        cell_polygons[c2] = poly
@@
-    # Transform survey polygons from EPSG:4258 to EPSG:4978
+    # Transform survey polygons from EPSG:4258 to EPSG:4326
     metadata["geom"] = metadata["geom"].apply(wkt.loads)
-    transform_4258_to_4978 = Transformer.from_crs("EPSG:4258", "EPSG:4978", always_xy=True)
-    metadata["geom_4978"] = metadata["geom"].apply(lambda g: transform(transform_4258_to_4978.transform, g))
+    transform_4258_to_4326 = Transformer.from_crs("EPSG:4258", "EPSG:4326", always_xy=True)
+    metadata["geom_4326"] = metadata["geom"].apply(lambda g: transform(transform_4258_to_4326.transform, g))
@@
-        geom = survey["geom_4978"]
+        geom = survey["geom_4326"]
```
Also applies to: 35-50, 56-66
234-251: Crop in 4326; add reprojection stages; use filenames; drop extra_dims in PDAL
Perform the crop in 4326, then reproject back to 4978. Use explicit filenames to avoid glob quirks and remove unnecessary extra_dims.
```diff
-    # PDAL JSON pipeline
+    # PDAL JSON pipeline (crop in EPSG:4326, then reproject back)
     pdal_pipeline = {
         "pipeline": [
-            {
-                "type": "readers.las",
-                "filename": input_pattern,
-                "extra_dims": "Accepted=uint8,TVU=float32,THU=float32"
-            },
-            {
-                "type": "filters.crop",
-                "polygon": cell_poly.wkt
-            },
-            {
-                "type": "writers.las",
-                "filename": str(H3_CELLS_PATH / f"{c2}.las"),
-                "extra_dims": "Accepted=uint8,TVU=float32,THU=float32"
-            }
+            {"type": "readers.las", "filenames": input_files},
+            {"type": "filters.reprojection", "in_srs": "EPSG:4978", "out_srs": "EPSG:4326"},
+            {"type": "filters.crop", "polygon": cell_poly.wkt},  # cell_poly must be EPSG:4326
+            {"type": "filters.reprojection", "in_srs": "EPSG:4326", "out_srs": "EPSG:4978"},
+            {"type": "writers.las", "filename": str(H3_CELLS_PATH / f"{c2}.las")}
         ]
     }
```
🧹 Nitpick comments (6)
lasConverter.py (5)
192-195: Docstring: reflect 4326 polygons when cropping in 4326
Update the docstring to avoid confusion with the corrected CRS flow.
```diff
-        cell_polygons: Mapping of H3 cell IDs to their shapely Polygons (EPSG:4978).
+        cell_polygons: Mapping of H3 cell IDs to their shapely Polygons (EPSG:4326).
```
229-231: Remove unused glob after switching to filenames
`input_pattern` becomes dead code when using `filenames`. Clean it up.
```diff
-    # PDAL filename glob pattern
-    input_pattern = str(temp_folder / "*.las")
+    # Using explicit filenames list for PDAL; no glob needed
```
215-221: Symlink fallback for Windows/non-privileged environments
Symlink creation often fails on Windows without Developer Mode/admin. Fall back to a copy.
```diff
-        dst.symlink_to(src.resolve())
+        try:
+            dst.symlink_to(src.resolve())
+        except OSError:
+            # Fallback on platforms without symlink privileges
+            shutil.copy2(src, dst)
```
143-151: Embed CRS in LAS output (optional but helpful for downstream tools)
Add EPSG:4978 to the LAS headers so consumers don't rely on external config. Note that `LasHeader.add_crs` takes a pyproj `CRS` object.
```diff
-    header = laspy.LasHeader(point_format=6, version="1.4")
+    header = laspy.LasHeader(point_format=6, version="1.4")
+    # Embed CRS in header for clarity (LAS 1.4)
+    # Requires: import pyproj (at top)
+    header.add_crs(pyproj.CRS.from_epsg(4978))
```
Add the import:
```diff
 import laspy
+import pyproj
```
140-141: Verify point_source_id fits 16-bit; optionally preserve full FID
LAS `point_source_id` is uint16. If your survey index/FIDs can exceed 65535, remap or store the full value in an extra dimension (e.g., SourceFID).
Optional addition:
```diff
-    # Set source_id
-    ids = np.full(len(df), source_id, dtype=np.uint16)
+    # Set source_id (LAS spec: uint16); ensure source_id <= 65535
+    ids = np.full(len(df), source_id % 65536, dtype=np.uint16)
@@
-    header.add_extra_dim(laspy.ExtraBytesParams(name="Accepted", type=np.uint8))
+    header.add_extra_dim(laspy.ExtraBytesParams(name="Accepted", type=np.uint8))
+    header.add_extra_dim(laspy.ExtraBytesParams(name="SourceFID", type=np.uint32))
@@
-    las["Accepted"] = accepted
+    las["Accepted"] = accepted
+    las["SourceFID"] = np.full(len(df), int(row["fid"]), dtype=np.uint32)
```
Also applies to: 145-159
README.MD (1)
3-9: Clarify CRS flow: crop in 4326, reproject to 4978 for LAS
Align the docs with the implementation fix: overlap/cropping in EPSG:4326; CSV→LAS in 4978.
```diff
-`lasConverter.py` converts survey CSV files into LAS files using EPSG:4978 and groups/crops them by H3 cells.
+`lasConverter.py` converts survey CSV files into LAS files (EPSG:4978). H3 overlap and cropping are performed in EPSG:4326, then results are reprojected back to 4978.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- README.MD (1 hunks)
- lasConverter.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
lasConverter.py
260-260: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (2)
lasConverter.py (1)
167-171: Nice defensive guard on empty chunk set
Prevents IndexError during merge when a survey yields no chunks. LGTM.
README.MD (1)
16-28: Good: fenced code block language set
The code block now specifies `text`, satisfying MD040.
Actionable comments posted: 3
♻️ Duplicate comments (1)
lasConverter.py (1)
47-50: Hardcoded EPSG:4258 assumption for survey geometries is risky.
The code assumes all survey `geom` WKT strings are in EPSG:4258 without validation. If geometries are in a different CRS, spatial intersections will silently produce incorrect results.
Consider adding a `geom_epsg` column to metadata or at minimum document this assumption prominently:
```diff
+    # ASSUMES: All geometries in metadata["geom"] are in EPSG:4258
+    # TODO: Make source CRS configurable via metadata["geom_epsg"]
     metadata["geom"] = metadata["geom"].apply(wkt.loads)
     transform_4258_to_4978 = Transformer.from_crs("EPSG:4258", "EPSG:4978", always_xy=True)
     metadata["geom_4978"] = metadata["geom"].apply(lambda g: transform(transform_4258_to_4978.transform, g))
```
🧹 Nitpick comments (2)
lasConverter.py (2)
54-65: Nested loop may be slow for large datasets.
The O(cells × surveys) iteration checks every survey against every cell. For hundreds of cells and surveys, this could become slow.
Consider using an R-tree spatial index (via `shapely.strtree.STRtree`) to speed up intersection queries:
```python
from shapely.strtree import STRtree

# Build spatial index for cells
cell_ids = list(cell_polygons.keys())
cell_geoms = list(cell_polygons.values())
tree = STRtree(cell_geoms)

# Query for each survey
for _, survey in metadata.iterrows():
    geom = survey["geom_4978"]
    if geom.is_empty:
        continue
    # Query index for potentially intersecting cells
    for idx in tree.query(geom):
        c2 = cell_ids[idx]
        cell_poly = cell_geoms[idx]
        if cell_poly.intersects(geom) or (
            isinstance(geom, MultiPolygon)
            and any(cell_poly.intersects(sg) for sg in geom.geoms)
        ):
            cell_to_surveys[c2].append(survey["fid"])
```
83-83: Document the default EPSG:25832 choice.
EPSG:25832 (ETRS89 / UTM zone 32N) is silently used as the default when `epsg` is missing. This should be documented for clarity.
Add a comment explaining the choice:
```diff
-    # Set EPSG:25832 as default when crs is missing
+    # Default to EPSG:25832 (ETRS89 / UTM zone 32N, common for Norwegian surveys)
     metadata["epsg"] = metadata["epsg"].fillna(25832).astype(int)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lasConverter.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-26T12:12:49.402Z
Learnt from: adriahso
PR: TDT4290-group-4/lasConverter#4
File: lasConverter.py:238-250
Timestamp: 2025-10-26T12:12:49.402Z
Learning: In the lasConverter project, PDAL pipelines require explicit `extra_dims` declarations in readers.las and writers.las stages to preserve custom extra bytes (Accepted, TVU, THU) when cropping LAS files, even though theoretically LAS should carry extra bytes automatically. This was empirically tested and confirmed necessary.
Applied to files:
lasConverter.py
🔇 Additional comments (1)
lasConverter.py (1)
219-219: Symlink creation may fail on Windows.
`Path.symlink_to()` requires administrator privileges or Developer Mode on Windows, which could cause failures in production environments.
Consider using `shutil.copy2()` for cross-platform compatibility if symlinks are problematic:
```diff
-        dst.symlink_to(src.resolve())
+        shutil.copy2(src, dst)  # More portable than symlinks
```
Or add error handling:
```diff
-        dst.symlink_to(src.resolve())
+        try:
+            dst.symlink_to(src.resolve())
+        except (OSError, NotImplementedError):
+            # Fallback to copy on Windows or when symlinks unavailable
+            shutil.copy2(src, dst)
```
Verify whether the deployment environment supports symlinks.
Is there a reason why the metadata files can't be included in the repo?
Also saw some exception-handling suggestions from the rabbit. Not sure how necessary they are, but it could be nice to display error messages to help with debugging in future use of the converter.
LGTM
I didn't include the metadata files in the repo in case the survey locations themselves are confidential, so I want to clarify with the customer before including them. We already have quite a bit of exception handling, but I can include more later if time allows for it.
LGTM!
Issue number
Closes #3
Description
Updated the lasConverter to group and crop the survey to level 2 H3 cells. This is done in order to prevent the floating point precision issue which occurs when dealing with large areas.
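The precision issue being avoided can be illustrated with a standalone sketch (not code from this repo): single-precision floats carry roughly 7 significant digits, so at ECEF-scale magnitudes (millions of metres) millimetre detail is rounded away unless coordinates are expressed relative to a local origin, which is effectively what per-cell grouping provides. The coordinate value below is an arbitrary example.

```python
import numpy as np

# An ECEF-scale coordinate (metres): ~10 significant digits of detail
x = 3_200_000.123

# float32 keeps ~7 significant digits, so the sub-metre part is rounded away
x32 = float(np.float32(x))
print(f"float32 round-trip error: {abs(x32 - x):.6f} m")

# Shifting to a local origin first preserves the fine detail
offset = 3_200_000.0
local32 = float(np.float32(x - offset))
print(f"local-origin error: {abs((offset + local32) - x):.6f} m")
```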
Testing steps
`metadata.csv` and `level1_h3_cells.csv` files are uploaded to our Google Drive.
Summary by CodeRabbit