-
Notifications
You must be signed in to change notification settings - Fork 69
Search CQ DIrectory for orgs within a radius from patient address #1271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and
privacy statement. We’ll occasionally send you account related emails.
Already on GitHub?
Sign in
to your account
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
packages/api/src/command/medical/cq-directory/get-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
): Promise<CQOrganizationCreateResponse[]> => { | ||
const oids = orgDataArray.map(data => data.oid); | ||
const existingEntries = await getCQDirectoryEntriesByOids(oids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is critical AFAIK, so we probably want to do some tweaks we don't usually do for regular DB/queries/lookups:
oids
can probably have 10s of 1,000s of records, rights? So we probably want to limit the amount of IDs we want to sent to the DB, this will be translated intoselect * from ... where id in (LIST)
, we don't want LIST to have 20,000 ids 😅- maybe chunk it to 100 entries per query and issue around 20 queries in parallel?
getCQDirectoryEntriesByOids()
should only return the exact data we need, in this case just the record'sid
;- (looping trough the entries here is expensive, we're talking about +1,000 items) we should probably consider having a map instead of an array (I would even consider the
listAllOrganizations
from the SDK to return a map, so one less loop through 10s of 1,000s of records);- but that only if its worth doing it - i.e., no point converting to map if we're only going to iterate over it once (might as well just use the array)
- regardless of using a map or an array, we can add a property to the original payload to indicate whether it should be updated or created (I think it will be cheaper to filter twice later than keep adding items to array one by one like lines 45 and 47);
- speaking of which, we probably want to avoid object deconstructing like line 45 - can't we just set the id of
orgData
? - re: reconstructing: also want to avoid that on lines 57 and 58 - could return separated properties
- actually, do we absolutely need to return the result? if not, this means create and update also don't need to return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, I took into account 2, 4, 5 and 6.
#1 I marked as a TODO, and for 3, we only iterate thru it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1 I marked as a TODO
As in, this PR's TODO, or TODO for later/another PR?
Also, I forgot about this:
7. Is it worth to run check for existing entries and update them selectively, or should we just update it all? If we always recreate, we save (1) loading data (which I believe is like +99% of the records, right?) and (2) comparing it in memory (which will not be cheap, very CPU intensive considering the amount of data).
8. Do we need the directory's ID? If not, removing the PK also removes the associated index, which will speed up a lot building it.
9. We should prob remove indexes before we recreate the directory and add them afterwards, this should speed up a lot the whole process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later TODOs. I will update the description and create new issues to link here.
For 7), I think it's worth only updating the entries that need an update. I think at some point, when we start querying the directory on a daily basis, we will probably want to know which organizations actually got updated, so we can re-query for new organization x patient associations, especially in the case of an address change.
For 8), the directory ID is the OID of the organization.
For 9), I'll look into it, but might need clarifications - will take it to Slack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
That's a good point, but I would make sure that's the case, because I assume the difference in execution time from always create vs. update to be significant. If we need to update, could look into
.upsert()
so we don't need to load and compare. -
Gotcha. And are we going to query by OID? Or always by lat/lon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- probably OID and point, rather than lat/lon. Maybe we should add another index for the point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add another index for the point?
If we're going to query the directory using that as the main filter, likely yes. Would need to look at the filter/conditions to be able to better recommend what to do here. We can add it when we PR this query.
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
...api/src/sequelize/migrations/2023-11-28_00_alter-cq-directory-change-coord-data-type copy.ts
Outdated
Show resolved
Hide resolved
const orgs = []; | ||
|
||
// TODO: MAKE SURE LAT LON WITHIN RANGE | ||
for (const coord of coordinates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's usually well under 5, could prob issue parallel requests to the DB - although I know the majority will be one, so NABD.
Sequelize.literal(`earth_box(ll_to_earth (${coord.lat}, ${coord.lon}), ${radiusInMeters}) @> ll_to_earth (lat, lon) | ||
AND earth_distance(ll_to_earth (${coord.lat}, ${coord.lon}), ll_to_earth (lat, lon)) < ${radiusInMeters}`), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be something we can do some improvement as well. I'm assuming this is doing a full-table scan, right?
We probably can't do much about it since the filter criteria is applying a function on the columns with dynamic values - still worth checking how earth_box
and earth_distance
work WRT indexes, but maybe we can store ll_to_earth(lat, lon)
as a precomputed column. It might not help much, but just something that crossed my mind as well.
IIRC, we don't have indexes on lat
, long
, but we don't do any query on those without applying a function to the values, right? If so, unless Postgres' geolocation functions can leverage it, not worth adding those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm that's a good point.. storing that instead of, or in addition to, the coordinates could save some time when querying for each patient's nearby facilities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking to see if it's worth the time (I'd recommend you do too), and TIL Posgres has a type of index that supports complex types, it's called GiST: https://blog.devgenius.io/generalized-search-tree-gist-indexes-in-postgresql-ba77e1bf26a1
Found about it here (good reading): https://dba.stackexchange.com/questions/314984/optimizing-geolocation-query-in-postgres-with-earth-box-and-earth-distance
Found this too: https://stackoverflow.com/questions/18059740/how-to-create-a-postgresql-index-that-includes-latitude-longitude-using-gist
I'd look into these to better understand how it works, best practices, and caveats, while building this type of solution.
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/parse-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
console.log("Parsed", orgs.length, "organizations for the CQ directory."); | ||
|
||
for (let i = 0; i <= orgs.length; i += BATCH_SIZE) { | ||
const batch = orgs.slice(i, i + BATCH_SIZE) as CQDirectoryEntryDataWithUpdateAndId[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also save cycles skipping this .slice()
and always passing the whole directory, indicating from which position to start working at and how many entries to be processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to implement that, but with how getCQDirectoryEntriesIdsAndLastUpdated() is built, I had to either use slice, or have some for loops.. i felt like using slice() in this case to break it up into batches isn't as bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have some for loops
Yeah, the simplest form of "for loop":
for (const i = initialPos; i < Math.min(initialPos + batchSize, arr.length); i++) {
...
}
10000
|
||
async function computeEarthPoint(orgData: CQDirectoryEntryData): Promise<string | undefined> { | ||
if (orgData.lat && orgData.lon) { | ||
const query = "SELECT ll_to_earth(:lat, :lon) as point"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use sequelize orm instead of raw sql querries. So something like this instead:
const pointQueryResult = await CQDirectoryEntryModel.findOne({ attributes: [[sequelize.fn('ll_to_earth', orgData.lat, orgData.lon), 'point']], raw: true,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider the pros and cons here. I'm 100% for using Sequelize's ORM goodies in the vast majority of cases, especially to have type safety. But, if we don't get type safety from it and it makes it harder to code, I'd go with pure SQL for very specific applications/use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in this particular case, couldn't we compute it in memory? This is likely some standard computation, so we probably don't need to ask the DB do calculate it for us.
This answer might leads us on the right path, I got to this source code for the ll_to_earth
function.
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/get-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
const entries = await CQDirectoryEntryModel.findAll({ | ||
where: { | ||
oid: oids, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add:
attributes: ['id', 'oid', 'lastUpdated'],
to the search to improve efficiency by only retrieving necessary fields. Also sorta similar to the suggestion Raf has been making to explicitly decompose patient: { id, cxId } in function headers
packages/api/src/command/medical/cq-directory/parse-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
const lat = parseFloat(position.latitude.value); | ||
const lon = parseFloat(position.longitude.value); | ||
if (lat < -90 || lat > 90) return []; | ||
if (lon < -180 || lon > 180) return []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseFloat can return NaN if non numeric characters in position lat long. Maybe check for NaNs:
if (isNaN(lat) || isNaN(lon)) return [];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh cool, didnt know that!
Ref: #000 - api@1.10.1-alpha.0 - @metriport/api-sdk@7.8.3-alpha.0 - carequality-cert-runner@1.0.1-alpha.0 - @metriport/carequality-sdk@0.1.6-alpha.0 - @metriport/commonwell-cert-runner@1.10.7-alpha.0 - @metriport/commonwell-jwt-maker@1.7.7-alpha.0 - @metriport/commonwell-sdk@4.9.7-alpha.0 - connect-widget@1.7.11-alpha.0 - @metriport/core@1.6.8-alpha.0 - @metriport/ihe-gateway-sdk@0.1.9-alpha.0 - infrastructure@1.5.9-alpha.0 - @metriport/react-native-sdk@1.5.7-alpha.0 - @metriport/shared@0.1.7-alpha.0 - tester-node@1.1.11-alpha.0 - utils@1.7.10-alpha.0
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Outdated
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts
Show resolved
Hide resolved
packages/api/src/command/medical/cq-directory/search-cq-directory.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LFG!!! 🚀 🚀 🚀
🎉 This PR is included in version 5.28.0-develop.3 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
🎉 This PR is included in version 5.29.0-develop.1 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
🎉 This PR is included in version 5.29.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
Ref: https://github.com/metriport/metriport-internal/issues/1302
Dependencies
Description
carequality-sdk
Testing
Local tests:
Tests on staging:
Follow up / Related issues:
Release Plan