[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

DEV Community

Cover image for Hacking Excel Files in Power Automate
david wyatt
david wyatt Subscriber

Posted on

Hacking Excel Files in Power Automate

Ever since 2003 Microsoft had to open up their Office files for other software, and that meant moving to a universal code based on XML. After .xls we have .xlsx, and the x was for XML.

At its heart a .xlsx file is a zip file full of XML files. All you need to do is change the name to .zip and then you can see what's in the file.

.xlsx contents

And that made me think, all the Excel API's are doing is edit the XML in the files, something that in theory I could do with Power Automate.

So I went on a crazy journey to see if its possible, originally I wanted to edit a few cells, but then I had a possible use case come with value, cracking a password protected sheet/workbook. I originally built this functionality in VBA, so it gave me a head start.

Call out, there are probably better ways, like a Office Script, and this doesn't work on password protected files (you have to be able to open), but its more the journey then the goal, right?

There are a few things that we will cover

  1. XML, Excel and Passwords
  2. Updating the XML
  3. The Flow

1. XML, Excel and Passwords

As I said above, Excel files are actually a collection of XML files, so all we need to do is nosey around opening the files to figure out what they do. I found the interesting bits are in the xl folder:

xl folder

Here we have the workbook.xml file, and the folder worksheets, which you guessed it contains all of the worksheets as .xml files. If you open the workbook.xml file you see information like the type, default view, calc status, and most interesting for us the workbookProtection.

workbook.xml

As you can see Microsoft are well behaved and don't store the password in plane text, its SHA-512 encrypted. But the this is where Microsoft being well behave ends, as there is no default protection. What does this mean, well if you look at a unprotected workbook, there isn't a default workbookProtection, all they do is remove it. So you guessed it, for us to unlock the workbook we don't need to crack the password, we just need to delete the workbookProtection tag.

Worksheets are exactly the same, just a sheetProtection instead of workbookProtection tag.

worksheet.xml

So now you now how to crack a Excel file, you can do all of the above manually, save back to zip, rename back to .xlsx and all protection has gone.

2. Editing Excel XML

In this case we are going to edit the XML by removing a tag, but its relatively easy to swap that out for a edit or add. There are 3 different ways to edit which I went into more detail here, but the best way is to convert to a JSON.

Because Power Automate uses JSON as its API language, we need convert the XML to JSON, remove the tag with the built in functions, and then convert back to XML.

json flow

The first expression converts the XML to a JSON, but as the content is a string, not XML, we convert it to XML first.

json(xml(outputs('Get_file_content')?['body']))
Enter fullscreen mode Exit fullscreen mode

The second expression uses the removeProperty() function to remove the workbookProtection tag. Though as its a nested tag (inside the workbook tag) we have to update the workbook property with the new workbook tag with the removed workbookProtection

xml(
    setProperty(
        outputs('Convert_to_JSON')
    ,
        'workbook'
    ,
        removeProperty(
            outputs('Convert_to_JSON')?['workbook']
        ,
            'workbookProtection'
        )
    )
)
Enter fullscreen mode Exit fullscreen mode

Finally we convert it back to XML using the xml() function.

3. The Flow

The complex part here is zipping the file back up again, as there is no action for that. Luckily I already did a blog all about it here https://dev.to/wyattdave/how-to-update-contents-of-a-zip-file-with-power-automate-3b1b, where we leverage an undocumented SharePoint API (the one that creates a zip file when you download multiple files).

So the steps are:

  • Create a new file with the Excel files content but name it a zip
  • Use the Extract action
  • Update the Workbook file
  • Loop over the worksheets and update them
  • Zip the folder

The flow I'm going to create will have 2 child flows, the update file (aka remove protection) and the zip folder.

Remove Protection

The remove protection flow is relatively simple, we pass in the file path, the tag to remove (workbookProtection for workbook, and sheetProtection for worksheet), and the parent node it is in (workbook for you guessed it, and worksheet for....).

We grab the file content, use our expression to convert to JSON, remove/modify, convert to XML.

xml(
    setProperty(
        json(xml(outputs('Get_file_content')?['body']))
    ,
        triggerBody()?['text_1']
    ,
        removeProperty(
            json(xml(outputs('Get_file_content')?['body']))?[triggerBody()?['text_1']]
        ,
            triggerBody()?['text_2']
        )
    )
)
Enter fullscreen mode Exit fullscreen mode

The last thing we do is a try/catch condition, using runAfter failed we can figure out if there was any protection set (as it will fail if it wasn't there), in that case we can pass back the unedited XML.

remove protection flow

Zip Folder

Im not going into much detail here, as its a little complex and I have done a full blog on it. But in a nutshell we pass the paths of the folder to zip, file name, and a unique GUID (we create that in the parent flow and use it to make every unzip unique). We send a request to run the zip process on the server, then build the request to download the zip file through another request. Finally we create the file in SharePoint (with the .xlsx file type). The childflow then sends back to link to the file.

zip folder flow

Full Flow

And that's it, we can now unzip the Excel, edit the XML files inside it, and then re-zip it.

full flow


I know the use cases for this are probably limited, but I think its a cool skill to have, and it was fun to figure out 😎

As always full solution available to download and have a nosey here.

Top comments (3)

Collapse
 
keithbr22872675 profile image
Keith Brown • Edited

I've used a similar methodology in a word docx to inject/update Drop-down controls, since that was apparently an oversight for the "update a word template" action. If you map the drop downs to XML and then just update that XML file (if you can identify the bugger), it works! What a journey that was! Wish I had a blog like this as a reference. Love your material David!

Collapse
 
wyattdave profile image
david wyatt

Sounds like a journey, dealing with Word is such a pain, can't believe Microsoft put so little effort into it

Collapse
 
mark_nanneman_039fdb748ca profile image
Mark Nanneman

Very cool. Thanks David. My mind is already buzzing with things I could have done with this method in the past--and the future.