In some projects I've run into the requirement to convert Word files to PDFs. Think of Word documents that should be archived after a certain period of time. I'm sure you can also think of use cases where a customer would like to convert their Word or Excel files to PDF automatically. So it's always handy to have this possibility in custom code, without moving to a third-party API which might expose your document data.
Using the content of my post below, you are also able to convert other file types to PDF. We'll be using Microsoft Graph, so we can check the docs to see which file types are supported for conversion. Check it out right here.
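Under the hood, the whole conversion boils down to one Graph request: downloading a driveItem's content with a `format` query parameter. A minimal sketch of the raw call (the drive id, item id, and token are placeholders you'd fill in from your own tenant):

```http
GET https://graph.microsoft.com/v1.0/drives/{drive-id}/items/{item-id}/content?format=pdf
Authorization: Bearer {access-token}
```

Graph responds with a 302 redirect to a pre-authenticated download URL for the converted PDF; the .NET GraphServiceClient used in the snippets below follows that redirect for you and just hands you the stream.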
1. Of course we'll need Graph authentication to start querying Microsoft Graph. Check my previous post if you don't know how to do this.
2. Get the drive id of the document library in SharePoint where the file we want to convert is located.
```csharp
//Check my previous post if you don't know how to create a GraphClient, I'm using my own extension method here for app-only auth
GraphServiceClient client = GraphManager.CreateGraphServiceClient();

//Getting the drives for the current site (each document library is a drive)
ISiteDrivesCollectionPage drives = client.Sites[ctx.Site.Id.ToString()].Drives.Request().GetAsync().GetAwaiter().GetResult();

string absoluteUrlToDocumentLib = new Uri(ctx.Url).GetLeftPart(UriPartial.Authority) + ctx.Web.ServerRelativeUrl + "/" + "DocumentsToConvert";

//Getting the correct drive (the drive for the document library); for now only one static document library is supported per call
Drive driveToGetFilesFrom = drives.Where(x => absoluteUrlToDocumentLib == x.WebUrl).FirstOrDefault();
```
3. Use the drive id together with the id of the SharePoint item to get to the driveItem in Graph.
```csharp
Stream pdfBytes = null;
List<QueryOption> options = new List<QueryOption>
{
    new QueryOption("format", "pdf"),
};

var convertCounter = 0;
do
{
    //Notice the do/while around this call... in the past (a few weeks ago) I was experiencing this API failing sometimes with timeouts
    //(the same for the Flow connectors, which is why I'm pretty sure they use the same backend API).
    //This already seems to go a lot better now, but better safe than sorry I guess...
    try
    {
        //The call that converts the file in the document library to a PDF
        pdfBytes = client.Drives[driveToGetFilesFrom.Id.ToString()].List.Items[{SharePointItemID}].DriveItem.Content.Request(options).GetAsync().GetAwaiter().GetResult();
    }
    catch (Exception ex)
    {
        l.LogWarning("Graph convert to PDF failed: " + ex.ToString());
    }

    convertCounter++;
    Thread.Sleep(convertCounter * 1000);
} while (convertCounter < 10 && (pdfBytes == null || pdfBytes.Length == 0));
```
4. The last step can differ depending on your scenario. In mine, I used this inside an Azure Function which uploads the generated PDF back to SharePoint into another document library.
The full code can look like this. I'm using the PdfSharp NuGet package to handle the PDF streams a bit more easily.
```csharp
using Microsoft.Graph;
using Microsoft.SharePoint.Client;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

//Build a ClientContext for SharePoint, I'm using my own extension method here to do app-only auth via a certificate in Azure Key Vault
//But you can use any auth method, e.g. a service account, to make it a bit easier
using (ClientContext ctx = RERManager.GetElevatedContextRER({SharePointSiteUrl}))
{
    ctx.Load(ctx.Site);
    ctx.ExecuteQuery();

    //Check my previous post if you don't know how to create a GraphClient
    GraphServiceClient client = GraphManager.CreateGraphServiceClient();

    //Getting the drives for the current site (each document library is a drive)
    ISiteDrivesCollectionPage drives = client.Sites[ctx.Site.Id.ToString()].Drives.Request().GetAsync().GetAwaiter().GetResult();

    string absoluteUrlToOrderDocuments = new Uri(ctx.Url).GetLeftPart(UriPartial.Authority) + ctx.Web.ServerRelativeUrl + "/" + Lists.OrderDocuments;

    //Getting the correct drive (the drive for the document library); for now only one static document library is supported per call
    Drive driveToGetFilesFrom = drives.Where(x => absoluteUrlToOrderDocuments == x.WebUrl).FirstOrDefault();

    Stream pdfBytes = null;
    List<QueryOption> options = new List<QueryOption>
    {
        new QueryOption("format", "pdf"),
    };

    var convertCounter = 0;
    do
    {
        //Notice the do/while around this call... in the past (a few weeks ago) I was experiencing this API failing a lot with timeouts
        //(the same for the Flow connectors, which is why I'm pretty sure they use the same backend API).
        //This already seems to go a lot better now, but better safe than sorry I guess...
        try
        {
            //The call that converts the file in the document library to a PDF
            pdfBytes = client.Drives[driveToGetFilesFrom.Id.ToString()].List.Items[{SharePointItemID}].DriveItem.Content.Request(options).GetAsync().GetAwaiter().GetResult();
        }
        catch (Exception ex)
        {
            l.LogWarning("Graph convert to PDF failed: " + ex.ToString());
        }

        convertCounter++;
        Thread.Sleep(convertCounter * 1000);
    } while (convertCounter < 10 && (pdfBytes == null || pdfBytes.Length == 0));

    //Do whatever you want with the PDF stream (save to disk, send back to SharePoint, ...)
    //I'll upload it back to SharePoint in this example, using PdfSharp here to make it easier to work with the stream
    var pdfFile = PdfReader.Open(pdfBytes, PdfDocumentOpenMode.Import);

    //Thanks to the PnP sample on GitHub for this function :) https://github.com/SharePoint/PnP/tree/master/Samples/Core.LargeFileUpload
    var uploadedFile = UploadFileSlicePerSlice(ctx, "GeneratedPDFLibrary", Guid.NewGuid().ToString() + ".pdf", pdfFile);
}
```
In case you don't know the UploadFileSlicePerSlice function from the PnP samples yet, I'll paste my implementation of the function below so you can use it right away.
```csharp
public static Microsoft.SharePoint.Client.File UploadFileSlicePerSlice(ClientContext ctx, string libraryName, string fileSaveName, PdfDocument pdf, int fileChunkSizeInMB = 3)
{
    using (var fStream = new System.IO.MemoryStream())
    {
        pdf.Save(fStream);
        // Rewind the stream after saving, so the upload starts at the beginning instead of the end.
        fStream.Seek(0, System.IO.SeekOrigin.Begin);

        // Each sliced upload requires a unique ID.
        Guid uploadId = Guid.NewGuid();

        // Get the folder to upload into.
        List docs = ctx.Web.Lists.GetByTitle(libraryName);
        ctx.Load(docs, l => l.RootFolder);

        // Get the information about the folder that will hold the file.
        ctx.Load(docs.RootFolder, f => f.ServerRelativeUrl);
        ctx.ExecuteQuery();

        // File object.
        Microsoft.SharePoint.Client.File uploadFile;

        // Calculate block size in bytes.
        int blockSize = fileChunkSizeInMB * 1024 * 1024;

        // Get the size of the file.
        long fileSize = fStream.Length;

        if (fileSize <= blockSize)
        {
            // Use the regular approach.
            FileCreationInformation fileInfo = new FileCreationInformation();
            fileInfo.ContentStream = fStream;
            fileInfo.Url = fileSaveName;
            fileInfo.Overwrite = true;
            uploadFile = docs.RootFolder.Files.Add(fileInfo);
            ctx.Load(uploadFile);
            ctx.ExecuteQuery();

            // Return the file object for the uploaded file.
            return uploadFile;
        }
        else
        {
            // Use the large file upload approach.
            ClientResult<long> bytesUploaded = null;

            try
            {
                using (System.IO.BinaryReader br = new System.IO.BinaryReader(fStream))
                {
                    byte[] buffer = new byte[blockSize];
                    Byte[] lastBuffer = null;
                    long fileoffset = 0;
                    long totalBytesRead = 0;
                    int bytesRead;
                    bool first = true;
                    bool last = false;

                    // Read data from the stream in blocks.
                    while ((bytesRead = br.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        totalBytesRead = totalBytesRead + bytesRead;

                        // You've reached the end of the file.
                        if (totalBytesRead == fileSize)
                        {
                            last = true;
                            // Copy to a new buffer that has the correct size.
                            lastBuffer = new byte[bytesRead];
                            Array.Copy(buffer, 0, lastBuffer, 0, bytesRead);
                        }

                        if (first)
                        {
                            using (System.IO.MemoryStream contentStream = new System.IO.MemoryStream())
                            {
                                // Add an empty file.
                                FileCreationInformation fileInfo = new FileCreationInformation();
                                fileInfo.ContentStream = contentStream;
                                fileInfo.Url = fileSaveName;
                                fileInfo.Overwrite = true;
                                uploadFile = docs.RootFolder.Files.Add(fileInfo);

                                // Start the upload by uploading the first slice.
                                using (System.IO.MemoryStream s = new System.IO.MemoryStream(buffer))
                                {
                                    // Call the start upload method on the first slice.
                                    bytesUploaded = uploadFile.StartUpload(uploadId, s);
                                    ctx.ExecuteQuery();
                                    // fileoffset is the pointer where the next slice will be added.
                                    fileoffset = bytesUploaded.Value;
                                }

                                // You can only start the upload once.
                                first = false;
                            }
                        }
                        else
                        {
                            // Get a reference to your file.
                            uploadFile = ctx.Web.GetFileByServerRelativeUrl(docs.RootFolder.ServerRelativeUrl + System.IO.Path.AltDirectorySeparatorChar + fileSaveName);

                            if (last)
                            {
                                // This is the last slice of data.
                                using (System.IO.MemoryStream s = new System.IO.MemoryStream(lastBuffer))
                                {
                                    // End the sliced upload by calling FinishUpload.
                                    uploadFile = uploadFile.FinishUpload(uploadId, fileoffset, s);
                                    ctx.ExecuteQuery();

                                    var finalFile = ctx.Web.GetFileByServerRelativeUrl(docs.RootFolder.ServerRelativeUrl + "/" + fileSaveName);
                                    ctx.Load(finalFile);
                                    ctx.ExecuteQuery();

                                    // Return the file object for the uploaded file.
                                    return finalFile;
                                }
                            }
                            else
                            {
                                using (System.IO.MemoryStream s = new System.IO.MemoryStream(buffer))
                                {
                                    // Continue the sliced upload.
                                    bytesUploaded = uploadFile.ContinueUpload(uploadId, fileoffset, s);
                                    ctx.ExecuteQuery();
                                    // Update fileoffset for the next slice.
                                    fileoffset = bytesUploaded.Value;
                                }
                            }
                        }
                    } // while ((bytesRead = br.Read(buffer, 0, buffer.Length)) > 0)
                }
            }
            finally
            {
                if (fStream != null)
                {
                    fStream.Dispose();
                }
            }
        }
    }

    return null;
}
```
Important note: the preview connectors in Power Automate (Flow) for converting a document to PDF use the same API in the backend. But with Microsoft Graph we can get the drive id directly from the SharePoint document library and convert there, without first copying the file to OneDrive. See the OneDrive connectors for Power Automate (Flow) below.
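The difference is only in how the driveItem is addressed; the conversion itself is the same call. A sketch of both forms (the ids are placeholders): the first is the OneDrive-style request the connectors rely on, which requires the file to live in the user's OneDrive, while the second addresses the drive of a SharePoint document library directly, so no copy to OneDrive is needed.

```http
GET https://graph.microsoft.com/v1.0/me/drive/items/{item-id}/content?format=pdf

GET https://graph.microsoft.com/v1.0/drives/{drive-id}/items/{item-id}/content?format=pdf
```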