Search CQ DIrectory for orgs within a radius from patient address #1271

RamilGaripov · 2023-12-01T00:30:57Z

Ref: https://github.com/metriport/metriport-internal/issues/1302

Dependencies

Upstream: https://github.com/metriport/metriport-internal/pull/1312

Description

Added AWS Location Service to Core and API stack
Added functionality to find CQ directory organizations within a radius around patients' addresses
Added internal endpoints to create mock cq organizations for testing - this is temporary
Removed update functionality for CQ directory entry rows
Removed listAllOrganizations from the carequality-sdk

Testing

Local tests:

Thoroughly tested locally: 70k rows of mocked organizations get created and inserted in 16-17 seconds.

Tests on staging:

100k rows of mocked organizations created and inserted in 2 min 20 seconds. The API was very slow after that for a couple of minutes. Note that the prod db instance is 4x bigger than that on staging, so it should handle it more easily.
Added some sleep cycles in between batches to ease the load on the db: 100k rows took 3 min 20 seconds, and the API seemed to work completely fine right after.

Follow up / Related issues:

Updating the Address model to include geog coordinates - 1326

Release Plan

Run some SQL queries in staging to clean up the previous migrations - context
Run some SQL queries in prod to clean up the previous migrations
Release NPM (alpha)
Release NPM (prod)
Merge upstream
Merge this

refs. metriport/metriport-internal#1302

packages/api/Dockerfile.dev

packages/api/src/command/medical/cq-directory/get-cq-directory-entry.ts

leite08 · 2023-12-01T01:44:39Z

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts

+): Promise<CQOrganizationCreateResponse[]> => {
+  const oids = orgDataArray.map(data => data.oid);
+  const existingEntries = await getCQDirectoryEntriesByOids(oids);


This function is critical AFAIK, so we probably want to do some tweaks we don't usually do for regular DB/queries/lookups:

oids can probably have 10s of 1,000s of records, rights? So we probably want to limit the amount of IDs we want to sent to the DB, this will be translated into select * from ... where id in (LIST), we don't want LIST to have 20,000 ids 😅

maybe chunk it to 100 entries per query and issue around 20 queries in parallel?

getCQDirectoryEntriesByOids() should only return the exact data we need, in this case just the record's id;

(looping trough the entries here is expensive, we're talking about +1,000 items) we should probably consider having a map instead of an array (I would even consider the listAllOrganizations from the SDK to return a map, so one less loop through 10s of 1,000s of records);

but that only if its worth doing it - i.e., no point converting to map if we're only going to iterate over it once (might as well just use the array)

regardless of using a map or an array, we can add a property to the original payload to indicate whether it should be updated or created (I think it will be cheaper to filter twice later than keep adding items to array one by one like lines 45 and 47);

speaking of which, we probably want to avoid object deconstructing like line 45 - can't we just set the id of orgData?

re: reconstructing: also want to avoid that on lines 57 and 58 - could return separated properties

actually, do we absolutely need to return the result? if not, this means create and update also don't need to return

So far, I took into account 2, 4, 5 and 6.

#1 I marked as a TODO, and for 3, we only iterate thru it once.

#1 I marked as a TODO

As in, this PR's TODO, or TODO for later/another PR?

Also, I forgot about this:
7. Is it worth to run check for existing entries and update them selectively, or should we just update it all? If we always recreate, we save (1) loading data (which I believe is like +99% of the records, right?) and (2) comparing it in memory (which will not be cheap, very CPU intensive considering the amount of data).
8. Do we need the directory's ID? If not, removing the PK also removes the associated index, which will speed up a lot building it.
9. We should prob remove indexes before we recreate the directory and add them afterwards, this should speed up a lot the whole process.

Later TODOs. I will update the description and create new issues to link here.

For 7), I think it's worth only updating the entries that need an update. I think at some point, when we start querying the directory on a daily basis, we will probably want to know which organizations actually got updated, so we can re-query for new organization x patient associations, especially in the case of an address change.

For 8), the directory ID is the OID of the organization.

For 9), I'll look into it, but might need clarifications - will take it to Slack

That's a good point, but I would make sure that's the case, because I assume the difference in execution time from always create vs. update to be significant. If we need to update, could look into .upsert() so we don't need to load and compare.

Gotcha. And are we going to query by OID? Or always by lat/lon?

probably OID and point, rather than lat/lon. Maybe we should add another index for the point?

Maybe we should add another index for the point?

If we're going to query the directory using that as the main filter, likely yes. Would need to look at the filter/conditions to be able to better recommend what to do here. We can add it when we PR this query.

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts

...api/src/sequelize/migrations/2023-11-28_00_alter-cq-directory-change-coord-data-type copy.ts

packages/core/src/external/aws/location.ts

packages/infra/lib/api-stack/api-service.ts

leite08 · 2023-12-01T02:12:29Z

packages/api/src/command/medical/cq-directory/search-cq-directory.ts

+  const orgs = [];
+
+  // TODO: MAKE SURE LAT LON WITHIN RANGE
+  for (const coord of coordinates) {


That's usually well under 5, could prob issue parallel requests to the DB - although I know the majority will be one, so NABD.

leite08 · 2023-12-01T15:51:17Z

packages/api/src/command/medical/cq-directory/search-cq-directory.ts

+        Sequelize.literal(`earth_box(ll_to_earth (${coord.lat}, ${coord.lon}), ${radiusInMeters}) @> ll_to_earth (lat, lon)
+        AND earth_distance(ll_to_earth (${coord.lat}, ${coord.lon}), ll_to_earth (lat, lon)) < ${radiusInMeters}`),


This might be something we can do some improvement as well. I'm assuming this is doing a full-table scan, right?

We probably can't do much about it since the filter criteria is applying a function on the columns with dynamic values - still worth checking how earth_box and earth_distance work WRT indexes, but maybe we can store ll_to_earth(lat, lon) as a precomputed column. It might not help much, but just something that crossed my mind as well.

IIRC, we don't have indexes on lat, long, but we don't do any query on those without applying a function to the values, right? If so, unless Postgres' geolocation functions can leverage it, not worth adding those.

hmm that's a good point.. storing that instead of, or in addition to, the coordinates could save some time when querying for each patient's nearby facilities

I was looking to see if it's worth the time (I'd recommend you do too), and TIL Posgres has a type of index that supports complex types, it's called GiST: https://blog.devgenius.io/generalized-search-tree-gist-indexes-in-postgresql-ba77e1bf26a1

Found about it here (good reading): https://dba.stackexchange.com/questions/314984/optimizing-geolocation-query-in-postgres-with-earth-box-and-earth-distance

Found this too: https://stackoverflow.com/questions/18059740/how-to-create-a-postgresql-index-that-includes-latitude-longitude-using-gist

I'd look into these to better understand how it works, best practices, and caveats, while building this type of solution.

packages/infra/lib/api-stack/api-service.ts

packages/api/Dockerfile.dev

packages/infra/lib/api-stack/api-service.ts

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts

packages/api/src/command/medical/cq-directory/parse-cq-directory-entry.ts

refs. metriport/metriport-internal#1302

leite08 · 2023-12-05T04:59:32Z

packages/api/src/command/medical/cq-directory/rebuild-cq-directory.ts

+    console.log("Parsed", orgs.length, "organizations for the CQ directory.");
+
+    for (let i = 0; i <= orgs.length; i += BATCH_SIZE) {
+      const batch = orgs.slice(i, i + BATCH_SIZE) as CQDirectoryEntryDataWithUpdateAndId[];


Could also save cycles skipping this .slice() and always passing the whole directory, indicating from which position to start working at and how many entries to be processed.

I tried to implement that, but with how getCQDirectoryEntriesIdsAndLastUpdated() is built, I had to either use slice, or have some for loops.. i felt like using slice() in this case to break it up into batches isn't as bad

have some for loops

Yeah, the simplest form of "for loop":

for (const i = initialPos; i < Math.min(initialPos + batchSize, arr.length); i++) { ... }

packages/api/src/models/db.ts

jonahkaye · 2023-12-05T05:12:00Z

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts

+10000
+async function computeEarthPoint(orgData: CQDirectoryEntryData): Promise<string | undefined> {
+  if (orgData.lat && orgData.lon) {
+    const query = "SELECT ll_to_earth(:lat, :lon) as point";


use sequelize orm instead of raw sql querries. So something like this instead:
const pointQueryResult = await CQDirectoryEntryModel.findOne({ attributes: [[sequelize.fn('ll_to_earth', orgData.lat, orgData.lon), 'point']], raw: true,

I'd consider the pros and cons here. I'm 100% for using Sequelize's ORM goodies in the vast majority of cases, especially to have type safety. But, if we don't get type safety from it and it makes it harder to code, I'd go with pure SQL for very specific applications/use cases.

Actually, in this particular case, couldn't we compute it in memory? This is likely some standard computation, so we probably don't need to ask the DB do calculate it for us.

This answer might leads us on the right path, I got to this source code for the ll_to_earth function.

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts

packages/api/src/command/medical/cq-directory/get-cq-directory-entry.ts

jonahkaye · 2023-12-05T05:31:43Z

packages/api/src/command/medical/cq-directory/get-cq-directory-entry.ts

+  const entries = await CQDirectoryEntryModel.findAll({
+    where: {
+      oid: oids,
+    },


add:
attributes: ['id', 'oid', 'lastUpdated'],
to the search to improve efficiency by only retrieving necessary fields. Also sorta similar to the suggestion Raf has been making to explicitly decompose patient: { id, cxId } in function headers

packages/api/src/command/medical/cq-directory/parse-cq-directory-entry.ts

jonahkaye · 2023-12-05T05:52:21Z

packages/api/src/command/medical/cq-directory/parse-cq-directory-entry.ts

+      const lat = parseFloat(position.latitude.value);
+      const lon = parseFloat(position.longitude.value);
+      if (lat < -90 || lat > 90) return [];
+      if (lon < -180 || lon > 180) return [];


parseFloat can return NaN if non numeric characters in position lat long. Maybe check for NaNs:
if (isNaN(lat) || isNaN(lon)) return [];

oh cool, didnt know that!

refs. metriport/metriport-internal#1302

Ref: #000 - api@1.10.1-alpha.0 - @metriport/api-sdk@7.8.3-alpha.0 - carequality-cert-runner@1.0.1-alpha.0 - @metriport/carequality-sdk@0.1.6-alpha.0 - @metriport/commonwell-cert-runner@1.10.7-alpha.0 - @metriport/commonwell-jwt-maker@1.7.7-alpha.0 - @metriport/commonwell-sdk@4.9.7-alpha.0 - connect-widget@1.7.11-alpha.0 - @metriport/core@1.6.8-alpha.0 - @metriport/ihe-gateway-sdk@0.1.9-alpha.0 - infrastructure@1.5.9-alpha.0 - @metriport/react-native-sdk@1.5.7-alpha.0 - @metriport/shared@0.1.7-alpha.0 - tester-node@1.1.11-alpha.0 - utils@1.7.10-alpha.0

refs. metriport/metriport-internal#1302

.eslintrc.json

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts

.eslintrc.json

packages/api/src/command/medical/cq-directory/search-cq-directory.ts

packages/api/src/external/aws/address.ts

packages/api/src/routes/medical/internal-cq.ts

packages/api/src/models/medical/cq-directory.ts

refs. metriport/metriport-internal#1302

leite08

LFG!!! 🚀 🚀 🚀

github-actions · 2023-12-08T20:04:36Z

🎉 This PR is included in version 5.28.0-develop.3 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

github-actions · 2023-12-09T00:08:34Z

🎉 This PR is included in version 5.29.0-develop.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

github-actions · 2023-12-10T22:42:00Z

🎉 This PR is included in version 5.29.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

RamilGaripov added 3 commits November 29, 2023 13:34

feat(api): added cq directory search based on radius

677f54f

refs. metriport/metriport-internal#1302

feat(mapi): added ability to create mock cq orgs to test the process

50fccc8

refs. metriport/metriport-internal#1302

feat(api): added fargate policy to allow use of location services

23f2683

refs. metriport/metriport-internal#1302

leite08 reviewed Dec 1, 2023

View reviewed changes

RamilGaripov commented Dec 1, 2023

View EDBE reviewed changes

RamilGaripov added 2 commits December 1, 2023 10:16

feat(mapi): introduced some optimization and fixed some comments

2c05fe6

refs. metriport/metriport-internal#1302

feat(mapi): pre-computed geog points + update based on last updated

126584b

refs. metriport/metriport-internal#1302

RamilGaripov had a problem deploying to staging December 4, 2023 20:31 — with GitHub Actions Error

RamilGaripov added 3 commits December 4, 2023 12:42

feat(mapi): merged develop

bf7a857

refs. metriport/metriport-internal#1302

feat(mapi): merged develop

08acf64

refs. metriport/metriport-internal#1302

feat(mapi): removed core dependencies from cq sdk

6591fba

refs. metriport/metriport-internal#1302

RamilGaripov temporarily deployed to staging December 4, 2023 21:09 — with GitHub Actions Inactive

feat(mapi): improved efficiency for org creation

ab6ef61

refs. metriport/metriport-internal#1302

RamilGaripov marked this pull request as ready for review December 5, 2023 02:58

RamilGaripov requested review from Goncharo and Orta21 as code owners December 5, 2023 02:58

leite08 reviewed Dec 5, 2023

View reviewed changes

packages/api/src/models/db.ts Outdated Show resolved Hide resolved

jonahkaye reviewed Dec 5, 2023

View reviewed changes

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts Outdated Show resolved Hide resolved

jonahkaye reviewed Dec 5, 2023

View reviewed changes

packages/api/src/command/medical/cq-directory/create-cq-directory-entry.ts Outdated Show resolved Hide resolved

jonahkaye reviewed Dec 5, 2023

View reviewed changes

packages/api/src/command/medical/cq-directory/get-cq-directory-entry.ts Outdated Show resolved Hide resolved

jonahkaye reviewed Dec 5, 2023

View reviewed changes

packages/api/src/command/medical/cq-directory/parse-cq-directory-entry.ts Outdated Show resolved Hide resolved

jonahkaye reviewed Dec 5, 2023

View reviewed changes

RamilGaripov temporarily deployed to staging December 7, 2023 00:30 — with GitHub Actions Inactive

RamilGaripov added 2 commits December 6, 2023 17:30

feat(mapi): added sleep cycles to decrease the db load

8f19459

refs. metriport/metriport-internal#1302

feat(mapi): merged develop

a225260

refs. metriport/metriport-internal#1302

RamilGaripov temporarily deployed to staging December 7, 2023 01:46 — with GitHub Actions Inactive

RamilGaripov added 5 commits December 6, 2023 19:33

feat(mapi): took out mocking functionality

0007b93

refs. metriport/metriport-internal#1302

Merge branch 'develop' into 1302-search-cq-directory

0eaa111

feat(mapi): merged develop

078191b

refs. metriport/metriport-internal#1302

feat(mapi): nit fix

f26b633

refs. metriport/metriport-internal#1302

leite08 reviewed Dec 7, 2023

View reviewed changes

RamilGaripov added 4 commits December 6, 2023 23:55

feat(mapi): fixed comments

1eec318

refs. metriport/metriport-internal#1302

Merge branch 'develop' into 1302-search-cq-directory

a7baccf

feat(mapi): merge develop

ecd512f

refs. metriport/metriport-internal#1302

feat(mapi): table lock for renaming tx

00230b7

refs. metriport/metriport-internal#1302

RamilGaripov requested review from leite08 and jonahkaye December 8, 2023 17:36

RamilGaripov added 2 commits December 8, 2023 09:40

feat(mapi): returning pr template to normal state

4048ccd

refs. metriport/metriport-internal#1302

feat(mapi): shorter lock

4bb31c2

refs. metriport/metriport-internal#1302

leite08 approved these changes Dec 8, 2023

View reviewed changes

RamilGaripov merged commit 22671b3 into develop Dec 8, 2023

RamilGaripov deleted the 1302-search-cq-directory branch December 8, 2023 19:04

github-actions bot added the released on @develop label Dec 8, 2023

RamilGaripov mentioned this pull request Dec 8, 2023

RELEASE 2023-12-08 #1306

Merged

7 tasks

github-actions bot added the released label Dec 10, 2023

		Sequelize.literal(`earth_box(ll_to_earth (${coord.lat}, ${coord.lon}), ${radiusInMeters}) @> ll_to_earth (lat, lon)
		AND earth_distance(ll_to_earth (${coord.lat}, ${coord.lon}), ll_to_earth (lat, lon)) < ${radiusInMeters}`),

Search CQ DIrectory for orgs within a radius from patient address #1271

Search CQ DIrectory for orgs within a radius from patient address #1271

Uh oh!

Conversation

Uh oh!

Dependencies

Description

Testing

Follow up / Related issues:

Release Plan

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!