-
Notifications
You must be signed in to change notification settings - Fork 494
ORC-1920: [C++] Support Geometry
and Geography
types
#2269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Geometry
and Geography
types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @ffacs .
cc @wgtmac and @williamhyun
c++/include/orc/Geospatial.hh
Outdated
/// | ||
/// These values correspond to the 1, 2, ..., 7 component of the WKB integer | ||
/// geometry type (i.e., the value of geometry_type % 1000). | ||
enum class GeometryType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Java, we had an additional value, -1
, for UNKNOWN_TYPE_ID.
private static final int UNKNOWN_TYPE_ID = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ employs a different implementation that invalidates statistics when unknown types are encountered; therefore, UNKNOWN_TYPE_ID is unnecessary in this context.
} | ||
|
||
std::stringstream ss; | ||
ss << "<GeoStatistics>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. Why do we have this only in C++ implementation? If this is required, can we have this in Java first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is due to gap between Parquet Java and C++ implementations...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is due to gap between Parquet Java and C++ implementations..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few minor comments about the feature parity between Java and C++ implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @ffacs for this PR, waiting on the above comment before merging.
Gentle ping, @ffacs . |
Pong. Thank you for your review @dongjoon-hyun , I'd update this patch these days. |
Thank you, @ffacs . Also, please participate the 1.8.10 vote too when you have some time. |
} | ||
|
||
std::stringstream ss; | ||
ss << "<GeoStatistics>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is due to gap between Parquet Java and C++ implementations...
Could you fix the above comments, @ffacs ? |
Let me know when ready to review. |
@wgtmac @dongjoon-hyun I'm ready to review now, please take a look when you're free~ |
Thank you, @ffacs . |
#include "orc/Type.hh" | ||
#include "orc/Vector.hh" | ||
#include "orc/orc-config.hh" | ||
|
||
#include <sstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this removed due to the #include <ostream>
from Geospatial.hh
?
virtual ~GeospatialColumnStatistics(); | ||
|
||
/** | ||
* get bounding box |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Maybe, get
-> Get
for consistency with other code around this?
* This file contains code adapted from the Apache Arrow project. | ||
* | ||
* Original source: | ||
* https://github.com/apache/arrow/blob/main/cpp/src/parquet/geospatial/statistics.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide a tag based location instead of the main
branch because it changes always.
* This file contains code adapted from the Apache Arrow project. | ||
* | ||
* Original source: | ||
* https://github.com/apache/arrow/blob/main/cpp/src/parquet/geospatial/statistics.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide a tag based location instead of the main branch because it changes always.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
#include <ostream> | ||
#include <string> | ||
|
||
namespace orc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can use namespace orc::geospatial {
@@ -25,6 +25,19 @@ | |||
|
|||
namespace orc { | |||
|
|||
namespace geospatial { | |||
enum EdgeInterpolationAlgorithm { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum EdgeInterpolationAlgorithm { | |
enum class EdgeInterpolationAlgorithm { |
ANDOYER = 3, | ||
KARNEY = 4 | ||
}; | ||
using EIAlgo = EdgeInterpolationAlgorithm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
CEB7 The reason will be displayed to describe this comment to others. Learn more.
I don't think this acronym is clear. Can we use its full name instead?
virtual const std::string& getCRS() const = 0; | ||
virtual geospatial::EIAlgo getEIAlgo() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
virtual const std::string& getCRS() const = 0; | |
virtual geospatial::EIAlgo getEIAlgo() const = 0; | |
virtual const std::string& getCrs() const = 0; | |
virtual geospatial::EdgeInterpolationAlgorithm getAlgorithm() const = 0; |
It would be good to add comment to say these two functions are for geometry & geography types only.
const char* incomingMask) override { | ||
ColumnWriter::add(rowBatch, offset, numValues, incomingMask); | ||
|
||
const StringVectorBatch& strBatch = dynamic_cast<const StringVectorBatch&>(rowBatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we provide a clear exception message if it fails?
} | ||
} | ||
|
||
} // namespace orc::geospatial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a blank line
namespace orc::geospatial { | ||
|
||
template <typename T> | ||
inline std::enable_if_t<std::is_trivially_copyable_v<T>, T> SafeLoadAs(const uint8_t* unaligned) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fix the function names in this file to have lowercased initials?
void WKBGeometryBounder::mergeGeometryInternal(WKBBuffer* src, bool recordWkbType) { | ||
uint8_t endian = src->ReadUInt8(); | ||
bool swap = endian != 0x00; | ||
if (isLittleEndian()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can use if constexpr (std::endian::native == std::endian::little)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::endian requires C++20, we are using c++17 now.
} // namespace geospatial | ||
} // namespace orc | ||
|
||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a blank line
What changes were proposed in this pull request?
Support Geometry and Geography types for c++ side
Why are the changes needed?
Add support for Geometry and Geography types
How was this patch tested?
UT passed
Was this patch authored or co-authored using generative AI tooling?
No