Category: Office 365

I have a Office 365 Business Premium account. I wanted to sync many large folders with OneDrive for Business. However, it was taking very long to upload. While I was searching for improvements, I saw many other people have observed that OneDrive is much slower than Dropbox, Box, AWS, Google Files etc. Apparently this is a ‘known issue’ and Microsoft is addressing it.

One of the things they have done is provide a standalone desktop application called ‘SharePoint Migration Tool (SPMT)’. While the name doesn’t specifically include OneDrive, the tool is intended for use with Sharepoint and OneDrive. However, there are a few pieces of information I learned to use it for OneDrive. That is the purpose of this post.

First, the user who will be using SPMT must have permissions for the OneDrive for Business account you want to upload files to.

The SPMT tool can be used in any of several migration ‘scenarios’, but the one I am focusing on here is the ‘File share to OneDrive and SharePoint’ scenario. I wanted to use SPMT as a potentially faster way to upload files from my desktop computer to OneDrive for Business.

After you download, install and start SPMT the start screen will ask you to sign into to your Office 365 account. This will be user that has permission on OneDrive.

After successfully signing in you will see 3 scenarios to choose from:

The two that I was interested in testing were:

File Share

CSV file for bulk migration

Note / spoiler alert
While SPMT provides much more flexibility in terms of identifying source and target, and capability to batch migration jobs, it was not dramatically faster than just using OneDrive desktop sync tool. It took me a couple of hours to upload only 1000 files total 3 GB in size.

The Microsoft documentation refers to migrations with multiple TB’s of files but my experience suggests that would be impossible if the files were coming from outside of Microsoft cloud. Therefore, it is clear that for large migrations, the SPMT is intended primarily for use within Microsoft cloud.

File Share
The File share option is pretty straightforward. Use SPMT to select the local folder to upload, then manually enter the OneDrive location url to upload them to. However, there are some things to pay attention to here:

When you select a local (aka source) folder only it’s contents will be uploaded, not the entire folder with selected local folder name.

The string you enter for the OneDrive (aka target) location url is not 100% intuitive or completely described in the instructions. See below.

OneDrive (aka target) location url
The exact specifications for this are not given in a way that is intuitively helpful. There are subtle differences in the url between SharePoint and OneDrive. The linked documentation does indicate these but I had to search for additional clarification which I pass on below.

Column B: Optional. For Sharepoint migration only. Not for file migration like we are talking about here.

Column C: Optional. Subfolder of source path.

Column D: Mandatory. OneDrive (aka target) location url. Note do not include anything other than what is above. Column D and E are used to specify folders.

Column D: Mandatory. Standard value “Documents”

Column F: Optional. Subfolders that will be created if they are not already present under your OneDrive root url. Enter as “Folder/subfolder/”. Note the use of fore slash and double quotes if any spaces.

The file has one row per ‘job’. For example, in File Share method above you select a single folder to migrate. That would be one row in CSV (or JSON) file.

Use SPMT to upload the CSV (or JSON) file and it processes all of the migration ‘jobs’.

However, it processes them in parallel which seems counterproductive. I would rather it process them in sequence, one row at a time.

This bulk migration method would be useful for processing many folders.

I did not try JSON method but would expect it would requir same information just in specified JSON foramt.

In both scenarios above when the migration is happening you will see screen similar to below.

Note that SPMT ‘scans’ the files to be uploaded first in any migration scenario to see if they meet specifications. There are strict file name prohibitions but it did rename ‘bad’ files which is nice.

Once it indicates files are being uploaded you should see corresponding folders and files in OneDrive online.

At any time during migration you can click the “View Status” link on the form to open Windows folder where SPMT keeps job status including fails etc in csv files. These are updated like log files so you can monitor status. This was helpful to see what failed and why.

I experienced a few quirky behaviour such as pausing or stopping a migration freezing SPMT app and had to close it using Task Manager. Also one very weird result of closing a job was that a local file was corrupted and I could not move or delete it with error “windows 10 cannot delete file the tag present in the reparse point buffer”. I had to run chkdsk to fix the file.

Overall SPMT does the job advertised. As mentioned it does not dramatically speed uploading files from local computer into OneDrive but it does offer more control and flexibility over OneDrive desktop sync for large uploads with many folders and files.

The Python scripts use Microsoft’s Azure Active Directory Library (ADAL) to authenticate off Azure Active Directory (Azure AD or ADD), and OneDrive API and Python Requests to use the authentication to upload the files to Sharepoint from my web host.

There are many things to consider when working with Microsoft’s APIs to work with its online services such as Office 365.

The first is how to authenticate. Microsoft is trying to move everyone to use Azure AD to do oAuth authentication. Microsoft services still have their own authentication methods but this exercise I used Azure AD.

The second is what API to use. Microsoft has recently released their Graph API that is ‘one endpoint to rule them all’. However Microsoft services still have their own API’s so while Graph API looks tempting for this exercise I used the OneDrive API.

Azure AD Authentication

The authentication will be done in two parts.

Create Azure AD application to do the authentication for the Microsoft service(s) you want to interact with.

Use ADAL to interact with Azure AD to do the oAuth flow.

Setup Azure AD – create application

Microsoft provides free use of Azure AD for light authentication needs. You can register and create account. Once you have your account you need to create a new application.

For my purposes I created an Azure AD native client application. Azure AD also has web application and web APIs but both require user to enter username and password in web browser. The native client application does technically also require user to enter these too but I hacked past this by using ADAL user authentication and hard coding username and password into the Python code. Since these are going onto my web host in protected directory to run as cron jobs they will be safe.

I am not going to go through the detail of creating an Azure AD application there are some good blog posts and Microsoft does good job of describing it. For example take a look at this site which has decent information about creating a new Azure AD application.

The Azure AD applications allow you to choose which Microsoft services it will be used to authenticate. Confusingly these are also called ‘applications’ too. They are represent Microsoft Services such as Office 365 Sharepoint Online, OneNote, Power BI, etc and is the place where you assign the permissions (also called ‘scopes’) that authentication will allows with that Microsoft service.

An Azure AD application might provide authentication for more than one Microsoft Service. But my native client application has only Windows Azure Active Directory permissions (which are there by default) and Office 365 Sharepoint Online permissions set to Read and write user files and Read and write items in all site collections.

After you have created your client application make sure to copy the client_id and resource_uri to use in code below. The client_id is automatically assigned and the resource_uri for a native client app can be any url and is just a unique identifier. I chose the Office 365 login url. The web applications need a real url because that is where the user will be prompted to enter credentials.

Azure Active Directory Library (ADAL)

Microsoft’s Azure Active Directory Library (ADAL) authentication libraries are created for developer’s to use with Azure AD. I used the ADAL Python SDK which was easily installed with pip install adal.

The oAuth authentication flow can seem very complex but you don’t have to worry about that if you use ADAL. ADAL uses your Azure AD application credentials (client_id, resource_uri in case of native client application) to retrieve a token response which is a text string in JSON format.

This JSON string includes the actual access token that is used to authenticate accessing Sharepoint and upload the files. You can use Python to retrieve the access token (it is a Dictionary). Then you simply put the access token into a header that will be used in the Put Request as the method of passing the access token to the OneDrive API.

ADAL also takes care of refreshing tokens which expire. In my case where the scripts are running on the server as cron jobs I want the token to refresh automatically. ADAL gets a refresh token that you can save to get a new access token when previous one expires. I actually write the refresh token to a text file on the server and refresh the access token each time code is run. I could only refresh it if the previous one expires.

OneDrive API

The OneDrive API has different configurations depending on whether you are using it to access a OneDrive Personal, OneDrive Business or Sharepoint Online account.

Be warned that the documentation for OneDrive API can be very dense and there are different ways of presenting required syntax to identify interactions. Of course the representations vary with different SDKs too. Also there are different versions of Microsofts file storage services over the years. So I recommend to focus on the newest OneDrive API and make sure you are looking at documentation relevant to newest version.

The Gotchas

ADAL Default Values

ADAL has default client_id and resource values that it uses for the username authentication. I changed these default values to match my Azure AD application.

This error means the url being used to create the token response was not same as the one that the file was being uploaded to.

EDIT August 21, 2016 Microsoft has updated the ADAL library so that you can specify the client id and the resource value because authentication against different services needs different client id and resource endpoint urls. That means the hack I used below is no longer required. For more details seehttps://github.com/AzureAD/azure-activedirectory-library-for-python

In ADAL’s __init__.py file look for the class _DefaultValues class at bottom of code and replace the default values:

I changed client_id to my application’s client_id

I changed resource from https://management.core.windows.net/ to https://tenant.sharepoint.com/

The acquire_token_with_username_password function sets these to None so they get set to default values. So this could be changed so they accept values from the code.

Sharepoint Site and Folder Paths

The OneDrive API dev documentation https://dev.onedrive.com/getting-started.htm demonstrates the different service urls:

I have Power BI Free account and uploaded a report to Power BI Service. The report has dataset that gets data from a Sharepoint file.

In Power BI Service I went to the dataset “Schedule Refresh”, selected “Connect Directly”, “Enter Credentials” as oAuth, then entered my Office 365 credentials. This setup the connection successfully to the Sharepoint file, and then I could switch the “Keep your data up to date” to “Yes”.

Then I accidentally clicked the “Try Pro for Free” button.

From that point on, every time I selected the Power BI Service dataset or the report, I got a pop up blocking message “To see this report upgrade to power bi pro”.

The only way to make it stop was to switch the “Keep your data up to date” to “No”.

The only Pro feature is hourly updates. The Free Power BI Service version only allows daily updates. I hadn’t selected hourly updates so that wasn’t the problem. Just some weird quirk.

The resolution was to delete the report and dataset that I just scheduled refresh for, and then upload the report again and then redo the schedule refresh as per above (without accidentally clicking on the Try Pro for free button) to make it work again.

Do you want to create a Power BI Report that gets a daily scheduled refresh of data from a Sharepoint csv file?

The first step is to create your Power BI report in Power BI Desktop using the Sharepoint csv file as data source.

In Power BI Desktop use Get Data – File – Sharepoint Folder to connect to your Sharepoint Folder.

The resulting dataset query (Power Query) will look something like mine below. You will replace “mydomain” with your Sharepoint account name or domain.

You will also replace “datafile.csv” with your csv file name. The Power BI connection is to a Sharepoint folder which might have more than one file like I did. If you have only one file in the folder the filter will be redundant but can’t hurt.

After you publish your report to your Power BI Online account you can select your newly uploaded dataset’s “Schedule Refresh” property where you can set up the refresh schedule.

First go to “Gateway connection”.

I selected “Connect Directly” which requires that you also enter Sharepoint credentials in the “Edit credentials” link which pops up a web page that prompts you to login into your Sharepoint account. This gives Power BI Service permission to access your Sharepoint account to refresh file.

If you have an enterprise gateway setup you could try “Enterprise Gateway” and enter the required credentials for that.

If you entered credentials correctly you should now be able to select the “Keep your data up to date” switch to “Yes”.

Then you can select which four 6-hour window you want refresh to run. Power BI Service free accounts can do daily refreshes. Pro accounts can have hourly updates.

You can try refreshing the dataset manually (On demand) or wait for the next scheduled refresh (Scheduled) to happen to see if the data does refresh. You can see refreshes are successful and when they ran by clicking the “refresh history ” link.

I want to create a Power BI Online report with a data source from a file on a remote web server that updates automatically so my Power BI report is always up to date.

Power BI Desktop and Online have lots of data connectors to third party ‘Online Services’ eg Salesforce, Mailchimp, Github, etc, as well as file and database connectors. But none of these help to get the file from my remote server directly.

There is no feature to connect to a file on a remote server. I could put my remote file data into MySQL or Postgres database and Power BI could connect to those but my remote server doesn’t allow external connections to hosted databases. So that is not an option for me.

A Power BI Online report can get data from a Sharepoint site file that will update automatically on schedule.

Since I have an Office 365 E3 account which has Sharepoint site I upload my remote file to the Sharepoint site and create a Power BI Online report linked to that Sharepoint file.

I would still have to figure out how to automate uploading my remote server file to my Office 365 E3 account Sharepoint site. But I am pretty sure I can do that with the OneDrive API but more on that in Part 2.

Here is a diagram outlining what I think my solution could be.

In the meantime to test using Office 365 Sharepoint file as data source for a Power BI Online report, I created a Power BI report in Desktop with file data source from my Office 365 Sharepoint site and Published it to my Power BI Online account.

After publishing the report to my Power BI Online account, I logged into Power BI Online, opened the newly published report and went to data source options and selected ‘Schedule Refresh’ which produced screen below.

I set ‘Keep your data up to date’ to ‘Yes’ and selected ‘Connect Directly’ which gave me error message telling me I had to update credentials.

Not surprisingly the report I published to Power BI Online didn’t ‘remember’ that I had already authorized the Power BI Desktop report to get file from my Office 365 Sharepoint site so I have to do it again in Power BI Online.

So I selected ‘Edit Credentials’ and then selected ‘oAuth’ as type of credentials which popped up Office 365 login screen where I entered my user and password clicked login and was returned back to Power BI online page.

The error message was gone so this must have created oAuth authentication to link the file data source from my Office 365 Sharepoint site into the data source in Power BI Online.

Then I set the automatic refresh to one of the daily 6-hour windows (below i have selected 12 pm to 6 pm) for refresh to run (hourly refresh is a Power BI Pro feature).

The report data source now refreshes from my Sharepoint file on daily automatic schedule so it looks like I am half way to my solution.

I will write another blog post detailing how I will automate moving my data file from my remote server to my Office 365 Sharepoint site. Pretty sure I will be using the OneDrive API https://dev.onedrive.com but there are other options too.

In the meantime talk a look at the previous blog post summarizing the OneDrive and Sharepoint API options.

One challenge I have encountered so far is that the OneDrive Python SDK is made for web apps and I want to setup server app (native client app). More to come.

OneDrive Business and Sharepoint authenticate against oAuth account created in Azure Active Directory and must be done with Office 365 account. Authentication url is: https://login.microsoftonline.com/common/oauth2/

You can create two types of applications that will have different methods and parameters:

Web application – web site based application that user can sign into. Require definition of an active redirect url and definition of client secret. Scopes or permissions are identified on the fly when authentication is made.

Sign-in with user account/password or send authentication url to authentication server to get authentication code.

Server sends back url with authentication code.

Retrieve authentication code from url.

Send another url comprised of code and other parameters back to server to get tokens.

Use tokens to list, view, upload, download files etc.

There are development SDKs available for popular languages.

I was only interested in thePython SDK . Some key notes about it include:

It is created specifically for web applications, not native client applications. The SDK authentication method relies on using a web browser to pass urls with parameters, codes and tokens back and forth. A native client application will not use web browser. A work around was to use head-less browser but that is a bit hacky.

It only has OneDrive Personal authentication urls. To use this with OneDrive Business or Sharepoint these urls are easily replaced with the OneDrive Business authentication urls in two files: auth_provider.py and the onedrivesdk_helper.py.

The change to the unified OneDrive API and oAuth authentication only happened in late 2015 so this is pretty new.

There weren’t many well developed or well documented OneDrive Python example code available.

Note it is still also possible to work with OneDrive Business, Sharepoint and OneDrive Personal without using oAuth authentication and this new OneDrive API by simply using urllib, request and templating along with hard coded Office 365 username and password in your code to authenticate.

Finally Microsoft Graph API can be used to interact with OneDrive Business, Sharepoint and OneDrive Personal once oAuth is setup.

Microsoft has a free Microsoft Data Management Gateway that you can install on your desktop or server to act as a broker to create an OData data source from a local data source. The OData source can be consumed by any application that can connect to OData url endpoint such as Power BI in Office 365 or in Microsoft’s new standalone online version of Power BI, or Excel etc.

For example setup Microsoft Data Management Gateway on your desktop which creates a local Service that is connects a local SQL Server database to a Data Management Gateway Cloud Service in Power BI.

Remote users can then connect to Office 365 or Power BI via internet and connect to the endpoint to use data using Excel Power Query add-on.