8000 Ability to extend supported extension - file parser mapping #3690 by fractal3000 · Pull Request #3683 · jmix-framework/jmix · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Ability to extend supported extension - file parser mapping #3690 #3683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 77 commits into
base: master
Choose a base branch
from

Conversation

fractal3000
Copy link
Contributor
@fractal3000 fractal3000 commented Sep 7, 2024

@fractal3000 fractal3000 marked this pull request as draft September 9, 2024 09:09
@fractal3000
Copy link
Contributor Author
fractal3000 commented Sep 9, 2024

I moved it to "draft" to add specific logging checks in tests.

@fractal3000 fractal3000 marked this pull request as ready for review September 9, 2024 13:57
testImplementation 'org.mockito:mockito-core'
testImplementation 'org.spockframework:spock-core'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spock, still, added twice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


package io.jmix.search.exception;

public abstract class ParserResolvingException extends Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


public class UnsupportedFileFormatException extends Exception {
public static final String MESSAGE = "Extension of the file %s is empty";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


public class UnsupportedFileExtensionException extends ParserResolvingException {

public static final String MESSAGE = "The file %s with '%s' extension is not supported. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

public class UnsupportedFileExtensionException extends ParserResolvingException {

public static final String MESSAGE = "The file %s with '%s' extension is not supported. " +
"Only following file extensions are supported %s.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we decided not to specify which extensions are allowed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it so because the problem with "fogotten extensions" had been solved.

this.parserSupplier = parserSupplier;
}

public String getSymbols() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

These symbols are called extension.

Copy link
Contributor Author
6D40

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did so to make this field name be different from the class name that countains a "extension" word.


@Component
@Component("search_FileProcessor")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't change public API. Bean name is public API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. But this package is marked as "internal"
image

public class FileProcessor {

private static final Logger log = LoggerFactory.getLogger(FileProcessor.class);

protected FileStorageLocator fileStorageLocator;
private final FileParserResolver fileParserResolver;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -82,46 +77,4 @@ public String extractFileContent(FileRef fileRef) throws FileParseException, Uns
}
return stringWriter.toString();
}

protected Parser getParser(FileRef fileRef) throws UnsupportedFileFormatException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert both methods and call fileParserResolver from them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. But this class is in the package is annotated with an "Internal" annotation.


import java.util.function.Supplier;

public enum SupportedFileExtensions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the point of this enum? It still cannot be extended if needed + you've created FileParserResolver bean to iterate over available constants which can be easily done in the enum itself (see any enum for entity attribute).

  2. Remember that the issue that you fix is a bug, not a feature. But all these changes to much for a bug fix.

Copy link
Contributor Author
@fractal3000 fractal3000 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The main goal of using enums is much more safe using them as string constansts in the "switch" expressions. If the new item is added to a "enum" and the item is not added to the correspondant "swith" constructions the compile time error will occured. Another point is to incapsulate the string constants and the parsers into the "enum" constants. This makes code more clean and reliable.

The FileParserResolver have been created to icapsulate getting a parser into a "enum" counstant. If we should add any new extention we will have to point the correspondant parser in the same time. And there are no an ability to do forget to do it. Such incupsulating makes string link between the string constant of the extension(such as "txt" or "doc") and the parser. If we do so a proggamer can't add any new extension and not add a parser for it. As a result of adding such "enum" item a programmer doesn't need to add any changes to algoritms. The case of using the new file format will be supported automatically.

  1. Except the small change in the io.jmix.search.utils.FileProcessor#getParser method signature there are no any other breaking changes in the public API. Additionaly the changing type of the exception of this method was approved by Ivan Gavrilov and this change is already commited to the release 2.3 and to the "master" branches. The other changes are not breaking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the io.jmix.search.utils.FileProcessor class is in the "internal" marked package. And with "The boy scout rule".

Copy link
Contributor Author
@fractal3000 fractal3000 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally I found at least two places in code with incorrect behaviour(excepting the main problem).

@fractal3000 fractal3000 linked an issue Sep 11, 2024 that may be closed by this pull request
@fractal3000 fractal3000 requested a review from glebfox September 18, 2024 21:00
"rtf" | RTFParser
"odt" | OpenDocumentParser
"ods" | OpenDocumentParser
"doc" | OfficeParser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no tests for UPPER CASE extensions which are not supported

Screenshot 2024-09-23 at 16 17 40

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

this.fileParserResolvers = fileParserResolvers;
}

public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc

Copy link
Contributor Author
@fractal3000 fractal3000 Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class is in the internal package. Is it neccessary to add JavaDoc for all its methods?


public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException {
if (fileParserResolvers.isEmpty()) {
throw new EmptyFileParserResolversList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Exception classes must be named in form of FooException
  2. You don't need a custom exception class until you explicitly handle it. Throw IllegalStateException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Fixed

* An exception that is thrown when a user added some file of the type that is not supported
* and there are no any known parser for.
*/
public class UnsupportedFileTypeException extends Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2.3 you have a different exception class name. You cannot rename public API, so either revert prev name or rename it in 2.3 until we published a next bug fix release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -16,13 +16,11 @@

package io.jmix.search.exception;

import org.apache.commons.io.FilenameUtils;
public class EmptyFileParserResolversList extends RuntimeException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Exception class must be named in form FooException
  2. You don't need a custom exception class until you explicitly handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception was deleted. Fixed.

@Component("search_OldOfficeDocumentsParserResolver")
@Order(100)
public class OldMSOfficeDocumentsParserResolver extends AbstractExtensionBasedFileParserResolver {
@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Java

Empty line after class or interface definition

P.S. fix all classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

import java.util.List;

/**
* Is a part of the extendable engine the gives an ability to implement custom file parser resolvers and to support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc is written for API users, not for developers who will implement it. For example:

Interface to be implemented by a custom file parser resolver

Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public interface FileParserResolver {

/**
* This method should return the description that describes the constraints or the constraint for the files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.

Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Parser getParser();

/**
* This method should implement the logic for checking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.

Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

"Only the following file parsing criteria are supported:\n -%s";

/**
* @param fileName - the name of the file which type is not supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hyphen is redundant.

Screenshot 2024-09-23 at 16 50 23

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public abstract class AbstractExtensionBasedFileParserResolver implements FileParserResolver {

/**
* Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "docx", "DOCX"].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to explicitly specify both lowercase and uppercase extensions now?
Lets require (and mention it in javadoc) lowercase only and convert the actual extension to lowercase within support method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed with Gleb such option but we desided not to implement it within this issue. It will be implemented in a separate issue. #3696

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc was corrected

*
* @return collection of supported extensions
*/
public abstract List<String> getSupportedExtensions();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set is better for contains operation than List.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public interface FileParserResolver {

/**
* This method should return the description that describes the constraints or the constraint for the files
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method name is getCriteriaDescription but javadoc tells about "constraints".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

*
* @return criteria description
*/
String getCriteriaDescription();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this public API method?
Isn't it better to delegate this logic to the final consumer? The only purpose of this is to generate message like 'The file extension should be one of the following: ...'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumer doesn't know anything about the file checking criteria that are implemented in the resolvers. The aim was to give to the user ability to get comprehensive information what is going wrong. If we remove this method we just could say that "A resolver(and parser) for the file couldn't be found".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message of the AbstractExtensionBasedFileParserResolver was corrected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method was left without changes as it was discussed.

* A search principle is based on the sequential applying FileParserResolver objects' checks for the given file.
*/
@Component("search_FileParserResolverManager")
public class FileParserResolverManager {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class consumes existing FileParserResolvers internally, but doesn't provide outside anything about them.
It provides the final Parsers.
So maybe this class should be named as FileParserProvider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, fixed

if (resolver.supports(fileRef)) {
return resolver.getParser();
}
messages.add(resolver.getCriteriaDescription());
Copy link
Collaborator
@Gavrilov-Ivan Gavrilov-Ivan Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to process completely unsupported file it will collect "criteria descriptions" from all existing resolvers.
As a result the final exception message will contain multiple "The file extension should be one of the following: ..." expressions.

That is the point in favor of the FileParserResolver#getCriteriaDescription method irrelevance.

This loop should collect extensions supported by unsuitable resolvers and provide them to exception if non of existing resolvers is suitable. And exception generates the final message based on provided extensions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed. Left without changes.

*
* @return an instance of a file parser
*/
Parser getParser();
FileParsingBundle getParsingBundle();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe FileParserKit?


public record FileParsingBundle(
@NotNull Parser parser,
@NotNull Function<StringWriter, BodyContentHandler> bodyContentHandlerGenerator,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ContentHandler. BodyContentHandler is a specific implementation.

@fractal3000 fractal3000 marked this pull request as draft September 27, 2024 16:01
@glebfox glebfox removed their request for review September 30, 2024 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to extend supported extension - file parser mapping
3 participants
0