Is it possible to know the rows length without XLSX.read? · Issue #459 · SheetJS/sheetjs · GitHub

Is it possible to know the rows length without XLSX.read? #459


Closed
ronilitman opened this issue Aug 17, 2016 · 3 comments

Comments

@ronilitman
ronilitman commented Aug 17, 2016

Hi,

I need to know the number of rows without reading the file at all, because reading the file just to get the row count takes a lot of time. Is that possible?

Edit: I have found that it takes A LOT more time if my Excel file has macros in it.

Is it possible to pass a flag so that the macros are ignored?

Thanks

@reviewher
Contributor
reviewher commented Jan 6, 2017

@ronilitman As I understand it, the worksheet self-reports its range. XLSX stores the cell range in the <dimension> tag: https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L16

The range may not be correct. Excel will "do the right thing" by ignoring the dimension field, but that requires reading the whole sheet to get the correct range.
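To make the `<dimension>` idea concrete, here is a minimal sketch that pulls the self-reported range out of raw worksheet XML. It assumes the worksheet part has already been extracted from the XLSX archive as a string; `rowsFromDimension` is a hypothetical helper for illustration, not a SheetJS function, and the value it returns is only as trustworthy as whatever wrote the file.

```javascript
// Hypothetical helper: read the self-reported range from worksheet XML.
// The element looks like <dimension ref="A1:C10"/> and is optional.
function rowsFromDimension(sheetXml) {
  const m = /<dimension\b[^>]*\bref="([^"]+)"/.exec(sheetXml);
  if (!m) return null; // no <dimension> tag present

  // A ref is either a single cell ("A1") or a range ("A1:C10").
  const parts = m[1].split(":");
  const rowOf = (addr) => parseInt(addr.replace(/[^0-9]/g, ""), 10);
  const firstRow = rowOf(parts[0]);
  const lastRow = rowOf(parts[parts.length - 1]);

  return { ref: m[1], firstRow, lastRow, rowCount: lastRow - firstRow + 1 };
}

const sample =
  '<worksheet><dimension ref="A1:C10"/><sheetData></sheetData></worksheet>';
console.log(rowsFromDimension(sample));
```

Once a workbook has been parsed with `XLSX.read`, the same range is available as the worksheet's `!ref` property and can be decoded with `XLSX.utils.decode_range`, but that path requires the full read that the question is trying to avoid.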

Related issues #189 #82

@SheetJSDev is it theoretically possible to scan the entire sheet and get the addresses without having to generate a cell object for every cell?

@VN666
VN666 commented Mar 16, 2019

@ronilitman by the way, how can I get progress updates while reading the file?

@SheetJSDev
Contributor

The technical answer depends on file format:

Some formats like CSV don't report the range anywhere and have variable sized rows, so the only way to know the total number of records is to effectively parse the whole thing.

Other formats like DBF have readily computable record counts based on the size since the header tells you how large each row payload must be.

The interesting formats generally have a way of reporting ranges, but the values are only self-reported and are not validated against the actual data. A number of third party generators are known to hack around this. Third party hacks have made the data source unreliable, and resolving #1601 will involve changing the behavior anyway.
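The CSV and DBF cases above can be sketched as follows. Both functions are hypothetical helpers written for illustration and are not SheetJS APIs. The CSV counter assumes RFC 4180 style quoting and `\n` or `\r\n` line endings, and it counts blank lines as records. The DBF function relies on the standard header layout (record count at bytes 4-7, header size at bytes 8-9, record size at bytes 10-11, all little-endian).

```javascript
// CSV: no stored range, so every character has to be scanned.
function countCsvRecords(text) {
  let records = 0;
  let inQuotes = false;
  let pending = false; // true while the current record has content
  for (const ch of text) {
    if (ch === '"') {
      inQuotes = !inQuotes; // a doubled quote toggles twice, net no change
      pending = true;
    } else if (ch === "\n" && !inQuotes) {
      records++; // a newline outside quotes ends a record
      pending = false;
    } else if (ch !== "\r" || inQuotes) {
      pending = true;
    }
  }
  // Count a final record that has no trailing newline.
  return pending ? records + 1 : records;
}

// DBF: only the first 32 bytes of the file and its total size are needed.
function dbfRecordCount(header, fileSize) {
  const declared = header.readUInt32LE(4); // count stored in the header
  const headerSize = header.readUInt16LE(8);
  const recordSize = header.readUInt16LE(10);
  // Derive the count from the file size instead of trusting the header.
  // Math.floor absorbs the optional trailing 0x1A end-of-file byte.
  const computed =
    recordSize > 0 ? Math.floor((fileSize - headerSize) / recordSize) : 0;
  return { declared, computed };
}

console.log(countCsvRecords('a,b\n"x\ny",2\n'));

const hdr = Buffer.alloc(32);
hdr.writeUInt32LE(3, 4);   // 3 records declared
hdr.writeUInt16LE(65, 8);  // header occupies 65 bytes
hdr.writeUInt16LE(11, 10); // each record occupies 11 bytes
console.log(dbfRecordCount(hdr, 65 + 3 * 11 + 1));
```

The contrast is the point: the CSV function touches every character, while the DBF function needs only a fixed-size header read plus the file size, which the file system can report without reading the contents.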

So the complete and unfortunate answer is "no, it's not possible to correctly determine the number of rows without scanning the entire worksheet".

As @reviewher mentioned, it is possible to just avoid generating cells, but it's unclear if the payoff is worth it (especially if the file will have to be re-parsed to actually extract the data).
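As a rough illustration of scanning without generating cells, the sketch below walks raw worksheet XML and tracks the highest row index it sees. It is not how SheetJS works internally. `maxRowFromSheetXml` is a hypothetical helper, and it assumes the worksheet XML is already available as a string and that every `<row>` element carries an `r` attribute (the attribute is optional in the file format, so real files may need a fallback).

```javascript
// Hypothetical helper: find the highest row index in worksheet XML
// without building an object for each <c> (cell) element.
function maxRowFromSheetXml(sheetXml) {
  const rowTag = /<row\b[^>]*?\br="(\d+)"/g;
  let max = 0;
  let m;
  while ((m = rowTag.exec(sheetXml)) !== null) {
    const r = parseInt(m[1], 10);
    if (r > max) max = r;
  }
  return max; // 0 means no <row r="..."> element was found
}

// The <dimension> tag claims two rows, but the data actually has five.
const xml =
  '<worksheet><dimension ref="A1:B2"/><sheetData>' +
  '<row r="1"><c r="A1"><v>1</v></c></row>' +
  '<row r="5" spans="1:2"><c r="B5"><v>2</v></c></row>' +
  "</sheetData></worksheet>";
console.log(maxRowFromSheetXml(xml));
```

This still reads the whole worksheet, so it only saves the cost of building cell objects, and rows that exist purely for formatting are counted as well. If the data is needed afterwards, the file has to be parsed again, which is the tradeoff described above.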

@VN666 #632 is tracking "progress" related issues
