Improvements to HTTP APIs by mhxion · Pull Request #137 · uhd-urz/elAPI · GitHub

Improvements to HTTP APIs #137


Merged · 14 commits merged on Aug 25, 2024


@mhxion (Member) commented Aug 21, 2024

This PR mainly brings improvements that shave 1-2 seconds (!) off heavy-duty plugins like bill-teams and experiments.

Shared clients

elAPI provides the following core APIs, which can be imported from elapi.api. They do the heavy lifting of all HTTP calls to eLabFTW, so we don't have to worry about configuring an HTTP client, and they make working with eLab API responses convenient.

  • GETRequest
  • POSTRequest
  • PATCHRequest
  • DELETERequest
  • AsyncGETRequest
  • AsyncPOSTRequest
  • AsyncPATCHRequest
  • AsyncDELETERequest

By default, each of them opens a connection and closes it immediately after a single HTTP request. This is useful when we only need a single request (e.g., the elapi get <endpoint name> command) and don't want to worry about forgetting to close the connection.

from elapi.api import GETRequest

session = GETRequest()  # Connection not open yet
print(session(endpoint_name="info").json())  # Connection opens, and closes as soon as the response is received
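The one-shot behavior described above can be sketched without elAPI itself. This is a minimal sketch, not elAPI's implementation: FakeClient and OneShotGET are hypothetical stand-ins, and a counter replaces real network connections.

```python
class FakeClient:
    """Hypothetical stand-in for an HTTP client; counts how many were opened."""

    opened = 0  # total number of clients (connections) ever opened

    def __init__(self):
        FakeClient.opened += 1
        self.is_closed = False

    def get(self, endpoint_name):
        if self.is_closed:
            raise RuntimeError("client is closed")
        return {"endpoint": endpoint_name}

    def close(self):
        self.is_closed = True


class OneShotGET:
    """Hypothetical request class: one fresh client per call, closed right after."""

    def __call__(self, endpoint_name):
        client = FakeClient()  # the connection only opens when a request is made
        try:
            return client.get(endpoint_name)
        finally:
            client.close()  # closed even if the request raises


session = OneShotGET()  # nothing opened yet
print(session(endpoint_name="info"))   # {'endpoint': 'info'}
print(session(endpoint_name="users"))  # {'endpoint': 'users'}
print(FakeClient.opened)               # 2: one connection per request
```

Every call pays for opening its own connection, which keeps the API hard to misuse but makes it costly inside a loop.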

This immediate closing of the connection quickly becomes limiting when we need to make repeated requests. To address that, a keep_session_open keyword argument had been introduced earlier.

from elapi.api import GETRequest, POSTRequest, PATCHRequest, DELETERequest

s1 = GETRequest(keep_session_open=True)  # Default is False
s2 = POSTRequest(keep_session_open=True) 
s3 = PATCHRequest(keep_session_open=True)
s4 = DELETERequest(keep_session_open=True)

print(s1(endpoint_name="info").json())  # A new connection opens
print(s2(endpoint_name="users", data={"firstname": "John", "lastname": "Doe", "email": "john_doe@itnerd.de", "team": 0}))
# Another new connection opens
# Calling s3 and s4 as well would open two more new connections

# All open connections need to be closed manually
s1.close()
s2.close()
s3.close()
s4.close()

Not only does keep_session_open open a separate connection for each of GET, POST, PATCH and DELETE, we also have to make sure to close each of them manually. This PR deprecates keep_session_open and introduces SimpleClient along with shared_client, the replacement for keep_session_open.

from elapi.api import SimpleClient, GETRequest, POSTRequest

client = SimpleClient(is_async_client=False)  # Has already been configured with host URL and API key information from elapi.yml
# Connection opens at "SimpleClient()" immediately
s1 = GETRequest(shared_client=client)  # Default shared_client is None; sharing the same client
s2 = POSTRequest(shared_client=client)  # Sharing the same client

print(s1(endpoint_name="info").json())  # Connection already opened in "client" definition
print(s2(endpoint_name="users", data={"firstname": "John", "lastname": "Doe", "email": "john_doe@itnerd.de", "team": 0}))

# Close all open connections
client.close()
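The key design point of shared_client is ownership: a request object should only close a client it created itself, and leave a shared client to whoever created it. A minimal sketch of that rule, using hypothetical SketchRequest/SketchPOST classes and a counting FakeClient rather than elAPI's or HTTPX's code:

```python
class FakeClient:
    """Hypothetical stand-in for an HTTP client; counts how many were opened."""

    opened = 0

    def __init__(self):
        FakeClient.opened += 1
        self.is_closed = False

    def request(self, method, endpoint_name):
        if self.is_closed:
            raise RuntimeError("client is closed")
        return f"{method} {endpoint_name}"

    def close(self):
        self.is_closed = True


class SketchRequest:
    """Hypothetical request class that may borrow a shared client."""

    method = "GET"

    def __init__(self, shared_client=None):
        # Only a client we create ourselves is ours to close
        self._owns_client = shared_client is None
        self.client = FakeClient() if shared_client is None else shared_client

    def __call__(self, endpoint_name):
        return self.client.request(self.method, endpoint_name)

    def close(self):
        if self._owns_client:
            self.client.close()


class SketchPOST(SketchRequest):
    method = "POST"


client = FakeClient()
s1 = SketchRequest(shared_client=client)
s2 = SketchPOST(shared_client=client)
print(s1("info"), s2("users"))  # GET info POST users
s1.close()                      # no-op: s1 does not own the shared client
print(client.is_closed)         # False
client.close()                  # the creator closes it, once
print(FakeClient.opened)        # 1
```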

This makes it easy to share the same client when needed. SimpleClient simply returns an httpx.Client or httpx.AsyncClient, depending on the value of the is_async_client keyword argument, so it can be used in any way an httpx.Client/httpx.AsyncClient can be used. For example, SimpleClient can be used as a context manager.

from elapi.api import SimpleClient, GETRequest, POSTRequest

with SimpleClient(is_async_client=False) as client:
    # Has already been configured with host URL and API key information from elapi.yml
    # Connection opens at "SimpleClient()" immediately
    s1 = GETRequest(shared_client=client)  # Sharing the same client
    s2 = POSTRequest(shared_client=client)  # Sharing the same client

    print(s1(endpoint_name="info").json())  # Connection already opened in "client" definition
    print(s2(endpoint_name="users", data={"firstname": "John", "lastname": "Doe", "email": "john_doe@itnerd.de", "team": 0}))

With a context manager, we don't need to worry about closing the connection, as it is closed automatically when the with block exits.
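Context-manager support comes down to __enter__ returning the client and __exit__ closing it, whether or not the block raised. httpx.Client already provides this; the sketch below only illustrates the protocol with a hypothetical SketchClient class.

```python
class SketchClient:
    """Hypothetical client showing the context-manager protocol."""

    def __init__(self):
        self.is_closed = False

    def close(self):
        self.is_closed = True

    def __enter__(self):
        return self  # bound to the name after `as`

    def __exit__(self, exc_type, exc, tb):
        self.close()  # runs on normal exit and on exceptions
        return False  # do not swallow exceptions


with SketchClient() as client:
    assert not client.is_closed
print(client.is_closed)  # True

try:
    with SketchClient() as failing:
        raise ValueError("request failed")
except ValueError:
    pass
print(failing.is_closed)  # True: closed despite the error
```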

This solution still has one big problem: all existing code that does not use the shared_client argument now needs to be updated. The next section introduces a solution to that.

GlobalSharedSession

In the following example, we sketch in pseudo-code a script that uses several of the aforementioned elAPI HTTP APIs to run an advanced automation task.

def check_user_permission():
    # Do something with GETRequest
    ...

def get_users_data():
    check_user_permission()  # Run permission validation
    # Do something with GETRequest
    ...

def modify_experiments_data():
    # Do something with POSTRequest, PATCHRequest
    ...

def delete_resources():
    # Do something with DELETERequest
    ...

def main():
    # Main function runs all the functions above
    check_user_permission()
    get_users_data()
    modify_experiments_data()
    delete_resources()
    # cleanup

Since each function calls one or more of GET, POST, PATCH and DELETE, this script is a perfect use case for shared_client. However, that would require updating the code of all 5 functions. This PR introduces GlobalSharedSession, which lets us avoid exactly that by using the power of OOP (although the user doesn't need to deal with any OOP). The main function can then make all requests over a single connection as follows:

from elapi.api import GlobalSharedSession

def check_user_permission():
    # Do something with GETRequest
    ...

def get_users_data():
    check_user_permission()  # Run permission validation
    # Do something with AsyncGETRequest
    ...

def modify_experiments_data():
    # Do something with POSTRequest, PATCHRequest
    ...

def delete_resources():
    # Do something with DELETERequest
    ...

def main():
    with GlobalSharedSession():  # <- This line only
        # Main function runs all the functions above
        check_user_permission()
        get_users_data()
        modify_experiments_data()
        delete_resources()
    # cleanup

This one-line change forces all elAPI HTTP APIs to use a single connection (internally, of course, it uses HTTPX connection pooling) and closes it automatically when the with block exits. GlobalSharedSession has been added to all elAPI CLI commands.
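The commit list below mentions a class-level GlobalSharedSession._instance that is reset on close, and request classes that skip opening their own client while it is set. A minimal sketch of that idea follows; SketchGlobalSession, SketchGET and FakeClient are hypothetical, and the real class additionally handles async clients and argument overrides.

```python
class FakeClient:
    """Hypothetical stand-in for an HTTP client; counts how many were opened."""

    opened = 0

    def __init__(self):
        FakeClient.opened += 1
        self.is_closed = False

    def get(self, endpoint_name):
        return f"GET {endpoint_name}"

    def close(self):
        self.is_closed = True


class SketchGlobalSession:
    _instance = None  # the currently active session, if any

    def __enter__(self):
        self.client = FakeClient()
        type(self)._instance = self
        return self

    def __exit__(self, exc_type, exc, tb):
        self.client.close()
        type(self)._instance = None  # later requests open their own clients again
        return False


class SketchGET:
    def __call__(self, endpoint_name):
        session = SketchGlobalSession._instance
        if session is not None:
            return session.client.get(endpoint_name)  # reuse the global client
        client = FakeClient()  # fallback: one-shot connection
        try:
            return client.get(endpoint_name)
        finally:
            client.close()


# Unchanged "existing code": it never mentions a shared client
def check_user_permission():
    return SketchGET()("info")


def get_users_data():
    check_user_permission()
    return SketchGET()("users")


with SketchGlobalSession():
    check_user_permission()
    get_users_data()
print(FakeClient.opened)  # 1: all three requests shared one client

get_users_data()
print(FakeClient.opened)  # 3: outside the session, each request opens its own
```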

GlobalSharedSession benchmark

We mentioned in the introduction that GlobalSharedSession trims a mere 1-2 seconds; that applies when we make a single or only a few eLab API calls. We use hyperfine to run a simple benchmark of elapi get users against the server dev-002 (which has more than 2500 users). Here, ~/.local/bin/elapi does not use GlobalSharedSession, while the elapi command does.

$ hyperfine --warmup 3 --runs 15 "~/.local/bin/elapi get users" "elapi get users"
Benchmark 1: ~/.local/bin/elapi get users
  Time (mean ± σ):     838.4 ms ± 333.6 ms    [User: 277.0 ms, System: 77.5 ms]
  Range (min … max):   689.1 ms … 1988.4 ms    15 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: elapi get users
  Time (mean ± σ):     715.7 ms ±  13.3 ms    [User: 274.6 ms, System: 77.5 ms]
  Range (min … max):   693.7 ms … 746.4 ms    15 runs

Summary
  elapi get users ran
    1.17 ± 0.47 times faster than ~/.local/bin/elapi get users

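This is not a significant improvement, and the final number (1.17x) fluctuates a lot: note the ± 0.47 uncertainty, which comes mostly from the 333.6 ms standard deviation of the run without GlobalSharedSession.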

GlobalSharedSession truly makes a big difference when we run HTTP APIs without shared_client, i.e., when we do not update existing code. The following benchmark reflects that: it compares the speed of 10 repeated (non-async) GET requests to the info endpoint in a loop, with and without GlobalSharedSession.

Without GlobalSharedSession (~/.local/bin/elapi awesome repeated-get-requests info):

for _ in range(10):
    validate = Validate(HostIdentityValidator(), PermissionValidator(group="sysadmin"))
    validate()
    r = GETRequest()
    print(f'Request {_}: {r(endpoint_name)}')

With GlobalSharedSession (elapi awesome repeated-get-requests info):

with GlobalSharedSession():
    for _ in range(10):
        validate = Validate(HostIdentityValidator(), PermissionValidator(group="sysadmin"))
        validate()
        r = GETRequest()
        print(f'Request {_}: {r(endpoint_name)}')
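Why sharing one connection pays off in a loop can be illustrated with a toy model (not a measurement): it assumes that opening a connection has a fixed cost, simulated here with time.sleep, while the request itself is free. FakeClient, SETUP_COST and both loop functions are hypothetical.

```python
import time

SETUP_COST = 0.01  # hypothetical seconds spent opening one connection


class FakeClient:
    """Stand-in client: opening it costs SETUP_COST, requests are free."""

    def __init__(self):
        time.sleep(SETUP_COST)  # simulated connection setup
        self.is_closed = False

    def get(self, endpoint_name):
        return f"GET {endpoint_name}"

    def close(self):
        self.is_closed = True


def one_shot_loop(n):
    for _ in range(n):
        client = FakeClient()  # new connection for every request
        try:
            client.get("info")
        finally:
            client.close()


def shared_loop(n):
    client = FakeClient()  # one connection for the whole loop
    try:
        for _ in range(n):
            client.get("info")
    finally:
        client.close()


def timed(fn, n):
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


t_one_shot = timed(one_shot_loop, 10)
t_shared = timed(shared_loop, 10)
print(f"one-shot: {t_one_shot:.3f} s, shared: {t_shared:.3f} s")
```

In this model the one-shot loop pays the setup cost n times and the shared loop pays it once; real savings depend on TLS, server latency and whatever else each iteration does.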

Running hyperfine:

$ hyperfine --warmup 3 --runs 10 "~/.local/bin/elapi awesome repeated-get-requests info" "elapi awesome repeated-get-requests info"
Benchmark 1: ~/.local/bin/elapi awesome repeated-get-requests info
  Time (mean ± σ):      6.320 s ±  2.856 s    [User: 0.737 s, System: 0.121 s]
  Range (min … max):    4.729 s … 13.939 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: elapi awesome repeated-get-requests info
  Time (mean ± σ):      3.161 s ±  0.154 s    [User: 0.349 s, System: 0.090 s]
  Range (min … max):    3.082 s …  3.586 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  elapi awesome repeated-get-requests info ran
    2.00 ± 0.91 times faster than ~/.local/bin/elapi awesome repeated-get-requests info

The for loop, which makes GETRequest() calls plus additional calls for permission validation, runs almost 2x faster with GlobalSharedSession (3.161 s vs. 6.320 s on average).
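hyperfine's "N ± σ times faster" line is the ratio of the two mean run times. Assuming the uncertainty is obtained by adding the two relative standard deviations in quadrature (an assumption about hyperfine's internals, but one that reproduces both reported values), both summaries can be recomputed from the per-benchmark numbers:

```python
import math


def relative_speed(mean_slow, sd_slow, mean_fast, sd_fast):
    """Speed-up of the fast command, with relative std devs added in quadrature."""
    ratio = mean_slow / mean_fast
    sd = ratio * math.sqrt((sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2)
    return ratio, sd


# elapi get users (ms): without vs. with GlobalSharedSession
r1, e1 = relative_speed(838.4, 333.6, 715.7, 13.3)
# elapi awesome repeated-get-requests info (s): without vs. with GlobalSharedSession
r2, e2 = relative_speed(6.320, 2.856, 3.161, 0.154)

print(f"{r1:.2f} ± {e1:.2f}")  # 1.17 ± 0.47
print(f"{r2:.2f} ± {e2:.2f}")  # 2.00 ± 0.91
```

In both cases the spread is dominated by the noisy run without GlobalSharedSession.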

GlobalSharedSession gotchas

  1. GlobalSharedSession overrides any and all arguments passed to elAPI HTTP APIs. This can be undesirable if the user is doing something specific with one of the API classes. GlobalSharedSession logs a warning whenever it has to override such arguments.
  2. By default, GlobalSharedSession opens exactly two connections: one sync connection and one async connection. If we only want to work with sync connections, we should use GlobalSharedSession(limited_to="sync"); GlobalSharedSession will then neither open an async connection nor override any of the async HTTP APIs. limited_to accepts "sync", "async" or "all" (default: "all").
  3. We had to add one more external dependency, nest-asyncio, for GlobalSharedSession to work properly.
  4. GlobalSharedSession can also be used as a normal class instead of a context manager, in which case the session must be closed manually. Using it as a context manager is recommended, though.
session = GlobalSharedSession()
# <Make some API calls>
session.close()
  5. GlobalSharedSession can be used multiple times. Each time, a new connection is opened after the previous one has been closed (if GSS is used as a context manager). This is intended behavior, not a gotcha.
with GlobalSharedSession():
    # <Make some API calls>
    ...

with GlobalSharedSession(limited_to="sync"):
    # <Make some more API calls>
    ...
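Gotchas 1, 2, 4 and 5 can be tied together in one sketch. The commit list mentions that the override warning derives from RuntimeWarning, and the sketch follows that, but SketchGlobalSession, OverrideWarning, client_for and FakeClient are hypothetical names, not elAPI's API.

```python
import warnings


class OverrideWarning(RuntimeWarning):
    """Hypothetical warning category for arguments a global session overrides."""


class FakeClient:
    """Hypothetical stand-in for a sync or async HTTP client."""

    def __init__(self, kind):
        self.kind = kind
        self.is_closed = False

    def close(self):
        self.is_closed = True


class SketchGlobalSession:
    """Hypothetical global session with a limited_to scope and override warnings."""

    _instance = None

    def __init__(self, limited_to="all"):
        if limited_to not in ("sync", "async", "all"):
            raise ValueError(f"limited_to must be 'sync', 'async' or 'all', not {limited_to!r}")
        # Only open the clients that fall within the requested scope
        self.sync_client = FakeClient("sync") if limited_to in ("sync", "all") else None
        self.async_client = FakeClient("async") if limited_to in ("async", "all") else None
        type(self)._instance = self

    def client_for(self, kind, passed_client=None):
        """Return the client a request of the given kind ("sync"/"async") should use."""
        own = self.sync_client if kind == "sync" else self.async_client
        if own is None:
            return passed_client  # out of scope: leave the request untouched
        if passed_client is not None and passed_client is not own:
            warnings.warn(f"Global session overrides the {kind} client passed to the request.",
                          OverrideWarning, stacklevel=2)
        return own

    def close(self):
        for client in (self.sync_client, self.async_client):
            if client is not None:
                client.close()
        type(self)._instance = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


# Manual use (gotcha 4): close in `finally` so the session is released even on errors
session = SketchGlobalSession(limited_to="sync")
try:
    mine = FakeClient("sync")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        used = session.client_for("sync", passed_client=mine)
    print(used is session.sync_client, len(caught))  # True 1
    print(session.client_for("async") is None)       # True: async requests are left alone
finally:
    session.close()

# Repeated use (gotcha 5): each `with` opens fresh clients after the previous ones closed
with SketchGlobalSession() as first:
    pass
with SketchGlobalSession() as second:
    print(first.sync_client.is_closed, second.sync_client is not first.sync_client)  # True True
print(SketchGlobalSession._instance is None)  # True
```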

mhxion added 14 commits August 21, 2024 02:51
GlobalSharedSession allows all HTTP APIs that are children of APIRequest to share the same HTTP session.
- Add support for shared_client (replace keep_session_open with this one) to APIRequest
- Add is_global_shared_session_user to APIRequest
- Return NotImplemented or None for close methods
FixedAsyncEndpoint and FixedAsyncEndpoint now truly share the same session.
- Return Optional[type(NotImplemented)] for close methods
GlobalSharedSession is now used throughout elAPI commands.
They are mainly for elapi/api methods for now.
- Rename missing_warning to preventive_missing_warning
- Move preventive_missing_warning to configurations/_overload_history.py as it makes more sense to have it there as part of the simple multi-layered design pattern hierarchy (I should start using an acronym for that).
- Use RuntimeWarning as superclass of PreventiveWarning
- Add imports
- Replace missing_warning with preventive_missing_warning
- Use update_kwargs_with_defaults to update kwargs of GlobalSharedSession and APIRequest
- Reset GlobalSharedSession._instance while closing GlobalSharedSession
- Make __enter__ and __exit__ instance methods from class methods
- Add suppress_override_warning
- Only show override warning while GlobalSharedSession is in use if certain conditions are met
- Replace property with cached_property
- Only update kwargs under APIRequest __init__, pass kwargs from inheriting classes as is without modification
- Add imports
- Do not call SimpleClient (i.e., open connection) when GlobalSharedSession._instance is not None
- Fix delete method of FixedAsyncEndpoint
- Return NotImplemented for close when GlobalSharedSession._instance is not None
It seems GlobalSharedSession breaks tenacity.retry for teams-info (get_teams function). elAPI just quits after the 2nd retry, and the 1st retry doesn't do anything besides immediately triggering the 2nd retry. Either it's an issue with tenacity, or how event loop is handled/closed in RecursiveInformation. This issue is not observed when no retry is triggered (i.e., no network error). For now, we just don't use GlobalSharedSession for teams-info/get_teams.
GlobalSharedSession (GSS) throws RuntimeError if the event loop closes abruptly before its close method can run. This fixes the issue in commit 0fa1fae.
The issue is fixed in commit 0fa1fae.
mhxion self-assigned this Aug 21, 2024
mhxion added this to the Complete bill-teams plugin milestone Aug 21, 2024
mhxion merged commit b1997f2 into dev Aug 25, 2024
mhxion (Member, Author) commented Sep 4, 2024

A few updates

Since the merge, a few commits related to this PR have been made to the dev branch.

  • elAPI now sends a specific user-agent string elAPI/<elAPI version> python-httpx/<HTTPX version> (4bde6cd)
  • Optional uvloop instead of asyncio for the bill-teams plugin
  • With improved internal asyncio logic, nest-asyncio is no longer needed and has been removed
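The first two updates can be sketched as follows. The user-agent format is taken from the note above; reading versions with importlib.metadata, the optional-import fallback to asyncio, and the names package_version, build_user_agent, run and fetch_headers are assumptions, not elAPI's code. uvloop.run is only available in recent uvloop releases (0.18 and later).

```python
import asyncio
from importlib.metadata import PackageNotFoundError, version


def package_version(name, fallback="unknown"):
    """Installed version of a distribution, or `fallback` if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return fallback


def build_user_agent(elapi_version):
    # Format from the update note: elAPI/<elAPI version> python-httpx/<HTTPX version>
    return f"elAPI/{elapi_version} python-httpx/{package_version('httpx')}"


def run(coro):
    """Run a coroutine on uvloop if it is installed, otherwise on plain asyncio."""
    try:
        import uvloop
    except ImportError:
        return asyncio.run(coro)
    return uvloop.run(coro)


async def fetch_headers():
    # Hypothetical placeholder for real async HTTP work
    await asyncio.sleep(0)
    return {"User-Agent": build_user_agent("0.0.0")}  # hypothetical elAPI version


print(run(fetch_headers()))
```

With HTTPX, such a header mapping would typically be passed via the headers argument of httpx.Client or httpx.AsyncClient.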

mhxion mentioned this pull request Sep 4, 2024
mhxion added the plugin-bill-teams label Sep 4, 2024
mhxion deleted the improve-http-client-2 branch September 4, 2024 15:27
Labels: improvement, new-feature, plugin-bill-teams (issues and PRs related to the internal bill-teams plugin)