Logging rework · Issue #6 · ULJ-Yale/qunex · GitHub

Logging rework #6

Open
demsarjure opened this issue Apr 9, 2025 · 0 comments
demsarjure commented Apr 9, 2025

Proposed steps:

  1. Logs are moved into the QuNex study's root folder (<qunex_study>/logs).
  2. Each QuNex command invocation creates a timestamped subfolder there (e.g. <qunex_study>/logs/hcp_fmri_volume_2025_02_19_09513322). That subfolder holds all of the invocation's logs (the current batchlogs, comlogs, and runlogs). Within it, we could put QuNex's own logs (the current runlog and batchlog) in the root and externally generated logs in a separate subfolder, say external. The alternative is to keep everything in the root, which could again become messy.
  3. For transparency and reproducibility, we could also copy small support files (e.g., batch file, list file, conc file ...) into the log folder; this way the exact file used in processing is always at hand.
  4. run_recipe would create one subfolder and then each command in the recipe would create its own subfolder within the run_recipe one.
  5. Some commands currently do not create logs (I believe these are the MATLAB commands). All commands should create logs, at least when executed within a QuNex study. Some commands have no study-related parameters and can be used anywhere on disk; we need to figure out what to do with those.
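
A minimal sketch of the folder layout proposed in steps 1, 2 and 4, written in Python since that is where the entry point lives. The helper name `create_log_folder` is hypothetical, and so is the timestamp format (date plus HHMMSS and microseconds); the example in step 2 does not pin the format down. Passing `parent` nests a command's folder inside an enclosing `run_recipe` folder.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_log_folder(study_root: Path, command: str,
                      parent: Optional[Path] = None,
                      now: Optional[datetime] = None) -> Path:
    """Create <study>/logs/<command>_<timestamp>/ with an external/ subfolder.

    Hypothetical helper. If `parent` is given (e.g. the folder of an
    enclosing run_recipe call), the new folder is nested inside it instead
    of being placed directly under <study>/logs.
    """
    now = now or datetime.now()
    # Assumed format: include microseconds so that rapid successive
    # invocations of the same command get distinct folder names.
    stamp = now.strftime("%Y_%m_%d_%H%M%S%f")
    base = parent if parent is not None else study_root / "logs"
    folder = base / f"{command}_{stamp}"
    # QuNex's own logs go in `folder`, externally generated logs in external/.
    # exist_ok=False: a name clash raises instead of silently mixing logs.
    (folder / "external").mkdir(parents=True, exist_ok=False)
    return folder
```

For a recipe, the entry point would call this once for `run_recipe` and then once per step with `parent` set to the recipe folder.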

Grega's comments:

1/ Logging structure

Logging so far has been "flat": logs from different commands were saved in the same folder and could be told apart only by their names and dates. I think organizing the logs into per-invocation folders can indeed make them easier to review. Instead of having the runlog in one place and the command logs in another, all logs from one invocation sit in the same folder. This makes it easier to review the logs that relate to a single command, but harder to get a general overview of the processing and analysis steps that were run. Speaking from a user perspective, when one or a few sessions fail, the current structure makes it easy to check across runs whether those sessions have since completed successfully. Overall, I still think the new structure makes for a better organization of logs.

One option to consider (and it could indeed be optional for users) is to place soft or hard links to the runlogs in a separate folder, which would preserve the cross-run overview.
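
A minimal sketch of the link option, assuming runlogs live inside per-invocation folders as proposed above. The function name `link_runlog` and the idea of a flat index folder (e.g. `<qunex_study>/logs/runlogs`) are hypothetical. Symbolic links use a relative target so they keep working if the study folder is moved; hard links must stay on the same filesystem.

```python
import os
from pathlib import Path


def link_runlog(runlog: Path, index_dir: Path, hard: bool = False) -> Path:
    """Expose a runlog from its per-invocation folder in a flat index folder.

    Hypothetical helper; returns the path of the created link.
    """
    index_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with the invocation folder name so identically named runlogs
    # from different invocations do not collide in the index folder.
    link = index_dir / f"{runlog.parent.name}_{runlog.name}"
    if hard:
        os.link(runlog, link)
    else:
        # Relative target: resolved from index_dir, survives moving the study.
        link.symlink_to(os.path.relpath(runlog, index_dir))
    return link
```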

2/ Logging spec files

Having a copy of the relevant spec files (batch file, mapping file, list file) could improve transparency and reproducibility. My worry is that deciding which files should be copied and which should not will be difficult and somewhat arbitrary. The recipe, batch file, and mapping file are probably easy choices. But what about commands that run on individual session files or conc files? A single process_conc run, for example, could involve hundreds of conc files.

I guess we should either make a clear list of files to be copied or require every command to report all relevant settings in its log. I think the latter already happens to a large extent: the exact calls are recorded, and some commands have an option to report the specific parameters used. My preference would be to report the relevant settings rather than copy all the files each time.
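
One possible middle ground, sketched under assumptions: every spec file is reported in the log by path and SHA-256 checksum (cheap even for hundreds of conc files), and only an explicit allowlist of kinds is actually copied. The allowlist, the size cap, the `spec/<kind>/` destination and the function name `record_spec_files` are all hypothetical choices for illustration.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path

# Hypothetical allowlist: only these kinds of spec files are copied.
COPY_KINDS = {"recipe", "batch", "mapping"}
MAX_COPY_BYTES = 1_000_000  # assumed cap so large files are never copied


def record_spec_files(spec_files: dict[str, list[Path]], log_dir: Path) -> list[str]:
    """Return one report line per spec file; copy allowlisted small files."""
    lines = []
    for kind, paths in spec_files.items():
        for path in paths:
            # The checksum lets a reader verify later that a file is unchanged.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            line = f"{kind}: {path} sha256={digest}"
            if kind in COPY_KINDS and path.stat().st_size <= MAX_COPY_BYTES:
                dest = log_dir / "spec" / kind / path.name
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, dest)  # copy2 also keeps the modification time
                line += f" copied_to={dest}"
            lines.append(line)
    return lines
```

Conc files in this sketch are only reported, never copied, which matches the preference for reporting settings over copying files.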

3/ Who does the logging

Indeed, MATLAB commands do not create log files; they only print information to stdout. I think this is OK. I don't think it would make sense to recreate a logging framework in MATLAB, and it is completely fine for MATLAB code to output its logging information to stdout; I would not change this behavior. As a user, when I write custom MATLAB analysis scripts that call functions such as fc_compute_gbc and run them from the MATLAB environment, this is all I need and want. What does make sense to add is capturing stdout when a MATLAB command is run through the QuNex entry point. In that case, I think the correct behavior would be to print the output to stdout and also create a corresponding log file in the QuNex log structure.

Basically, what I am arguing for is for the entry point (currently in python) to take care of logging, and for logging to only occur when commands are run through the standard QuNex entry point.
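
A minimal sketch of how the entry point could capture a MATLAB (or any external) command's output while still echoing it to the console, assuming the entry point launches the command as a subprocess. `run_and_tee` is a hypothetical name, and the actual command line QuNex uses to start MATLAB is not shown; any argument list works. When no log file is given, behavior is just plain console output.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def run_and_tee(cmd: list[str], log_file: Optional[Path] = None) -> int:
    """Run `cmd`, echo its output to stdout, and copy it to `log_file` if set."""
    sink = open(log_file, "w", encoding="utf-8") if log_file is not None else None
    try:
        # stderr is merged into stdout so the log keeps the original ordering.
        with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # console behavior stays unchanged
                if sink is not None:
                    sink.write(line)
        # Leaving the `with` block waits for the process, so returncode is set.
        return proc.returncode
    finally:
        if sink is not None:
            sink.close()
```

Because logging happens only in this wrapper, MATLAB functions called directly from a user's own scripts are unaffected.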

4/ What is logged

As a user, I also want to be able to specify when commands should be logged and when not. Most of the discussion about logging has focused on what I call "processing commands": commands used for routine, large-scale (pre)processing and analysis of data, for instance import_dicom or hcp_fmri_volume. For these commands, users would in the majority of cases want detailed logs. There are, however, also many more specialised "utility" commands, such as joining fidl files, creating event figures based on fidl files, thresholding and merging maps resulting from PALM analyses, and others. In a number of cases, the commands run might not even pertain to a specific study, e.g. creating an ROI file or a network mask to be used across different studies. In these cases, I neither need nor want logs.

Based on this, as a user, I would at least expect to be able to turn logging off. Better still, for this class of commands I would rather see logging be opt-in than opt-out.
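
The opt-out/opt-in split could be expressed as a per-category default that an explicit user setting overrides. A sketch under assumptions: the two categories come from the discussion above, but `CommandKind`, `DEFAULT_LOGGING` and `should_log` are hypothetical names, and how a command is assigned its category is left open.

```python
from enum import Enum
from typing import Optional


class CommandKind(Enum):
    PROCESSING = "processing"  # e.g. import_dicom, hcp_fmri_volume
    UTILITY = "utility"        # e.g. joining fidl files, creating ROI masks


# Defaults when the user says nothing: processing commands log (opt-out),
# utility commands do not (opt-in).
DEFAULT_LOGGING = {CommandKind.PROCESSING: True, CommandKind.UTILITY: False}


def should_log(kind: CommandKind, user_choice: Optional[bool] = None) -> bool:
    """Decide whether a command invocation should be logged."""
    if user_choice is not None:
        return user_choice  # an explicit user setting always wins
    return DEFAULT_LOGGING[kind]
```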

5/ Implementation

This goes beyond the initial discussion and is something I would rather first discuss with Jure, so that we can jointly prepare the initial specification; for now I will just post a quick overview. My proposal for implementing the changed logging functionality would be to create a python class that handles all the logging. The python functions currently build and return a report string when called. I would change this so that each function receives a log handler as a parameter. Instead of building a report string, the function then calls the log handler's methods to add new lines to the report, e.g. log.add(f"-> processing session {session_id}").
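
The calling-convention change can be shown side by side. In this sketch `process_sessions_old`, `process_sessions`, `LogHandler` and `ListLog` are hypothetical names; only `log.add(...)` and the message format come from the proposal. The function no longer decides where its report ends up; any object with an `add` method will do, including a trivial in-memory one for tests.

```python
from typing import Protocol


class LogHandler(Protocol):
    """Anything with an add(message) method can serve as a log handler."""
    def add(self, message: str) -> None: ...


def process_sessions_old(sessions: list[str]) -> str:
    # Current pattern: build up and return a report string.
    report = ""
    for session_id in sessions:
        report += f"-> processing session {session_id}\n"
    return report


def process_sessions(sessions: list[str], log: LogHandler) -> None:
    # Proposed pattern: report through the handler passed in by the caller.
    for session_id in sessions:
        log.add(f"-> processing session {session_id}")


class ListLog:
    """Minimal in-memory handler, e.g. for unit tests."""
    def __init__(self) -> None:
        self.lines: list[str] = []

    def add(self, message: str) -> None:
        self.lines.append(message)
```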

This way, all logging functionality can be developed within a single class. That makes it easy to resolve some of the issues we currently have, for instance those relating to race conditions, and also makes it easy to change logging behavior (e.g. where and how logs are saved) and to enhance it (e.g. each call of log.add could add a timestamp, or other useful metadata, to the logged information).
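
A minimal sketch of such a central class, assuming one handler object per log file. `QuNexLog` and its parameters are hypothetical. Every `add` call is timestamped, and a lock serializes writes so that threads sharing the handler cannot interleave partial lines. Note that `threading.Lock` only protects threads within one process; parallel processes writing to the same file would additionally need OS-level file locking or one file per process.

```python
import threading
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


class QuNexLog:
    """Hypothetical central log handler: timestamps lines, serializes writes."""

    def __init__(self, path: Path, echo: bool = True,
                 clock: Optional[Callable[[], datetime]] = None) -> None:
        self.path = path
        self.echo = echo
        # Injectable clock makes the output reproducible in tests.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._lock = threading.Lock()
        path.parent.mkdir(parents=True, exist_ok=True)

    def add(self, message: str) -> None:
        line = f"[{self._clock().isoformat(timespec='seconds')}] {message}\n"
        with self._lock:  # one complete line per write, no interleaving
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(line)
            if self.echo:
                print(line, end="")
```

Changing where or how logs are stored would then mean changing only this class, not every command.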

The same class would also be used to handle logs for external and non-python commands, such as the MATLAB code discussed above.

The change would be substantial in scope, but not really difficult to implement throughout the codebase.
