-
Notifications
You must be signed in to change notification settings - Fork 134
Ability to extend supported extension - file parser mapping #3690 #3683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I moved it to "draft" to add specific logging checks in tests. |
jmix-search/search/search.gradle
Outdated
testImplementation 'org.mockito:mockito-core' | ||
testImplementation 'org.spockframework:spock-core' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spock, still, added twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
package io.jmix.search.exception; | ||
|
||
public abstract class ParserResolvingException extends Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
public class UnsupportedFileFormatException extends Exception { | ||
public static final String MESSAGE = "Extension of the file %s is empty"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
public class UnsupportedFileExtensionException extends ParserResolvingException { | ||
|
||
public static final String MESSAGE = "The file %s with '%s' extension is not supported. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
public class UnsupportedFileExtensionException extends ParserResolvingException { | ||
|
||
public static final String MESSAGE = "The file %s with '%s' extension is not supported. " + | ||
"Only following file extensions are supported %s."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we decided not to specify which extensions are allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it so because the problem with "fogotten extensions" had been solved.
this.parserSupplier = parserSupplier; | ||
} | ||
|
||
public String getSymbols() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did so to make this field name be different from the class name that countains a "extension" word.
|
||
@Component | ||
@Component("search_FileProcessor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't change public API. Bean name is public API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public class FileProcessor { | ||
|
||
private static final Logger log = LoggerFactory.getLogger(FileProcessor.class); | ||
|
||
protected FileStorageLocator fileStorageLocator; | ||
private final FileParserResolver fileParserResolver; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -82,46 +77,4 @@ public String extractFileContent(FileRef fileRef) throws FileParseException, Uns | |||
} | |||
return stringWriter.toString(); | |||
} | |||
|
|||
protected Parser getParser(FileRef fileRef) throws UnsupportedFileFormatException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert both methods and call fileParserResolver
from them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. But this class is in the package is annotated with an "Internal" annotation.
|
||
import java.util.function.Supplier; | ||
|
||
public enum SupportedFileExtensions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
What is the point of this enum? It still cannot be extended if needed + you've created
FileParserResolver
bean to iterate over available constants which can be easily done in the enum itself (see any enum for entity attribute). -
Remember that the issue that you fix is a bug, not a feature. But all these changes to much for a bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The main goal of using enums is much more safe using them as string constansts in the "switch" expressions. If the new item is added to a "enum" and the item is not added to the correspondant "swith" constructions the compile time error will occured. Another point is to incapsulate the string constants and the parsers into the "enum" constants. This makes code more clean and reliable.
The FileParserResolver have been created to icapsulate getting a parser into a "enum" counstant. If we should add any new extention we will have to point the correspondant parser in the same time. And there are no an ability to do forget to do it. Such incupsulating makes string link between the string constant of the extension(such as "txt" or "doc") and the parser. If we do so a proggamer can't add any new extension and not add a parser for it. As a result of adding such "enum" item a programmer doesn't need to add any changes to algoritms. The case of using the new file format will be supported automatically.
- Except the small change in the io.jmix.search.utils.FileProcessor#getParser method signature there are no any other breaking changes in the public API. Additionaly the changing type of the exception of this method was approved by Ivan Gavrilov and this change is already commited to the release 2.3 and to the "master" branches. The other changes are not breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the io.jmix.search.utils.FileProcessor class is in the "internal" marked package. And with "The boy scout rule".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally I found at least two places in code with incorrect behaviour(excepting the main problem).
"rtf" | RTFParser | ||
"odt" | OpenDocumentParser | ||
"ods" | OpenDocumentParser | ||
"doc" | OfficeParser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
this.fileParserResolvers = fileParserResolvers; | ||
} | ||
|
||
public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class is in the internal package. Is it neccessary to add JavaDoc for all its methods?
|
||
public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException { | ||
if (fileParserResolvers.isEmpty()) { | ||
throw new EmptyFileParserResolversList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Exception classes must be named in form of
FooException
- You don't need a custom exception class until you explicitly handle it. Throw
IllegalStateException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Fixed
* An exception that is thrown when a user added some file of the type that is not supported | ||
* and there are no any known parser for. | ||
*/ | ||
public class UnsupportedFileTypeException extends Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2.3 you have a different exception class name. You cannot rename public API, so either revert prev name or rename it in 2.3 until we published a next bug fix release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -16,13 +16,11 @@ | |||
|
|||
package io.jmix.search.exception; | |||
|
|||
import org.apache.commons.io.FilenameUtils; | |||
public class EmptyFileParserResolversList extends RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Exception class must be named in form
FooException
- You don't need a custom exception class until you explicitly handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception was deleted. Fixed.
@Component("search_OldOfficeDocumentsParserResolver") | ||
@Order(100) | ||
public class OldMSOfficeDocumentsParserResolver extends AbstractExtensionBasedFileParserResolver { | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Java
Empty line after class or interface definition
P.S. fix all classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
import java.util.List; | ||
|
||
/** | ||
* Is a part of the extendable engine the gives an ability to implement custom file parser resolvers and to support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc is written for API users, not for developers who will implement it. For example:
Interface to be implemented by a custom file parser resolver
Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
public interface FileParserResolver { | ||
|
||
/** | ||
* This method should return the description that describes the constraints or the constraint for the files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.
Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Parser getParser(); | ||
|
||
/** | ||
* This method should implement the logic for checking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.
Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
"Only the following file parsing criteria are supported:\n -%s"; | ||
|
||
/** | ||
* @param fileName - the name of the file which type is not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
public abstract class AbstractExtensionBasedFileParserResolver implements FileParserResolver { | ||
|
||
/** | ||
* Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "docx", "DOCX"]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to explicitly specify both lowercase and uppercase extensions now?
Lets require (and mention it in javadoc) lowercase only and convert the actual extension to lowercase within support
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed with Gleb such option but we desided not to implement it within this issue. It will be implemented in a separate issue. #3696
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc was corrected
* | ||
* @return collection of supported extensions | ||
*/ | ||
public abstract List<String> getSupportedExtensions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set
is better for contains
operation than List
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
public interface FileParserResolver { | ||
|
||
/** | ||
* This method should return the description that describes the constraints or the constraint for the files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method name is getCriteriaDescription
but javadoc tells about "constraints".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
* | ||
* @return criteria description | ||
*/ | ||
String getCriteriaDescription(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this public API method?
Isn't it better to delegate this logic to the final consumer? The only purpose of this is to generate message like 'The file extension should be one of the following: ...'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consumer doesn't know anything about the file checking criteria that are implemented in the resolvers. The aim was to give to the user ability to get comprehensive information what is going wrong. If we remove this method we just could say that "A resolver(and parser) for the file couldn't be found".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message of the AbstractExtensionBasedFileParserResolver was corrected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method was left without changes as it was discussed.
* A search principle is based on the sequential applying FileParserResolver objects' checks for the given file. | ||
*/ | ||
@Component("search_FileParserResolverManager") | ||
public class FileParserResolverManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class consumes existing FileParserResolvers internally, but doesn't provide outside anything about them.
It provides the final Parsers
.
So maybe this class should be named as FileParserProvider
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, fixed
if (resolver.supports(fileRef)) { | ||
return resolver.getParser(); | ||
} | ||
messages.add(resolver.getCriteriaDescription()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we try to process completely unsupported file it will collect "criteria descriptions" from all existing resolvers.
As a result the final exception message will contain multiple "The file extension should be one of the following: ..." expressions.
That is the point in favor of the FileParserResolver#getCriteriaDescription
method irrelevance.
This loop should collect extensions supported by unsuitable resolvers and provide them to exception if non of existing resolvers is suitable. And exception generates the final message based on provided extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed. Left without changes.
* | ||
* @return an instance of a file parser | ||
*/ | ||
Parser getParser(); | ||
FileParsingBundle getParsingBundle(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe FileParserKit?
|
||
public record FileParsingBundle( | ||
@NotNull Parser parser, | ||
@NotNull Function<StringWriter, BodyContentHandler> bodyContentHandlerGenerator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ContentHandler. BodyContentHandler is a specific implementation.
#3690