/////

Filter large lists in SharePoint Online

25 mins read

Microsoft.SharePoint.SPQueryThrottledException: The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator.

Every SharePoint dev faced this issue at least once or multiple times. It’s a well-known ‘issue/limitation’ when working with larger data sets in SharePoint lists. Some customers might be asking for that ‘contains search behaviour’ which is not supported in a SharePoint search query.

In this blog post I’m going to try and find a solution for most filtering issues / scenarios in large SharePoint lists. I’ll compare different methods to check which are limited in which cases. You will also find a few tips & tricks, for whenever you want to filter large lists, to avoid the threshold limitations.

In one of my next blog posts, we’ll also check how SharePoint is handling this limitation in its modern list UI and what will work / what will break whenever we hit the list threshold of 5000 items.

On a side note: this does not mean that my methods / tips & tricks are the best solution for you. These are just a few solutions that I thought of and used in the past to solve different large list query scenarios. If you have any suggestions or want to add some corrections to my blog, please leave them in the comments.

I will discuss 3 possible methods of filtering large lists in SharePoint Online.

Method 1 – Query all items and filter in memory

For both CSOM and REST there are options to query all items. Which is of course always an option if you are sure that there will be enough time to do this.

CSOM

By using the ‘RowLimit’ property in a CAML Query we can limit our results. This option makes use of a ‘paging’ principle which is adjusting the ListItemCollectionPosition to return the next set of 5000 items until we reach the ‘end’ of the list.

When querying a document library, make sure you use the Scope=’RecursiveAll’ attribute in the View element. This way you will query all folders, subfolders and items that are nested in your document library. This attribute has no impact on a normal list without folders.

Always make use of the ViewFields element to limit your return values. This will significantly improve query performance.

var list = ctx.Web.Lists.GetByTitle("listName");

CamlQuery camlQuery = new CamlQuery();
camlQuery.ViewXml = @"<View Scope='RecursiveAll'>
<ViewFields><FieldRef Name='ID'/><FieldRef Name='Title'/></ViewFields> 
<RowLimit>5000</RowLimit>
</View>";

List<ListItem> allListItems = new List<ListItem>();
do
{
    ListItemCollection listItemCollection = list.GetItems(camlQuery);
    ctx.Load(listItemCollection);
    ctx.ExecuteQuery();

    allListItems.AddRange(listItemCollection);

    camlQuery.ListItemCollectionPosition = listItemCollection.ListItemCollectionPosition;

} while (camlQuery.ListItemCollectionPosition != null);

//Filter your data with a standard C# linq query
var filteredItems = allListItems.Where(item => item.FieldValues["Title"].ToString().Contains("Test")).ToList(); 

REST

We can do the same with REST using the $top query parameter in our request URI. We’ll get all items first and then do the filtering in memory.

Once you add $top=5000&paged=true to your REST request. You’ll get the first 5000 items with a ‘link’ element that has an href attribute. This attribute contains the URL to get the next page of results.

There is a very simple way to implement this thanks to the PnPjs library. Click here to see a sample on how you could implement this option with just 1 line of code using the getAll() function.

Make sure you define your $select parameter when doing this… there is a huge performance improvement and is almost necessary for such large queries.

Looking at the browsers network inspector, we’ll see these kind of requests passing by when using the getAll() function of PnPjs.

After completing this getAll() function we can of course just filter in memory, just as we did in C#. No limitations to column types for filtering. But might be a slow solution again if you are talking about a list with for example 50000+ items.

Method 1 – Pros

  • Any filtering/ordering after processing can be easily done in memory, no matter which field types, values, …
  • You should never query the list more than once, because we already have all list data in memory

Method 1Cons

  • Could be too slow for your use case (query took 2 – 5 seconds / 5000 items, depending on the defined view fields)
  • Throttling might even kick in with very large lists. This will force you to build a throttling check, which will result in even more delay for your final query result.

Method 2 – Most used REST & CSOM filter method

I guess the methods below in CSOM and REST API are the most used methods, but this will point out that they are clearly not the ‘best’ methods in all scenarios. Especially not for big lists.

CSOM

We have a lot of different ways of filtering our data with the CAML Query. Using all the operators available in CAML, we can do a lot of stuff… Contains, In, Eq, Geq, Leq, …

Things might get pretty frustrating once we are using these on large lists. Once a list reaches 5000+ items it will always return the following error when doing any CAML query. I’ll just throw in an example here which is not working for me at the moment.

var list = ctx.Web.Lists.GetByTitle("Big data list");

var camlQuery = new CamlQuery();

camlQuery.ViewXml = @"<View>
    <ViewFields><FieldRef Name='ID'/><FieldRef Name='Title'/></ViewFields>
    <Query><Where><Contains><FieldRef Name='Title'/><Value Type='Text'>Testtttt</Value></Contains></Where></Query>
    <RowLimit>5000</RowLimit>
    </View>";

var items = list.GetItems(camlQuery);

ctx.Load(items);
ctx.ExecuteQuery();

Here we run into our favourite exception again… Microsoft.SharePoint.Client.ServerException: ‘The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator.’

So why does this still happen now, even if we have set the RowLimit to 5000? It did work in method 1 for the CSOM CAML Query.

After applying a filter (Where element) in our CAML query, it will look at this filter first before applying the RowLimit. Which is causing our error here, because our list does contain more than 5000 items. So how could we fix this?

Indexing

We could apply an index to the field(s) which we are trying to filter. In case you don’t know how to do this, check the 2 screenshots below where you can find the indexed fields of a list in the list settings page.

Now you’ve entered the page where all the indexed columns of your list are listed. Notice that there is a maximum of 20 indexes / list.

After applying an index to the ‘Title’ field which I was filtering on in my sample, I did get the result I was looking for. My query ran successfully now.

Note: Only the following column types can be indexed.

  • Single line of text
  • Date
  • Choice (only single value)
  • Lookup (only single value)
  • Taxonomy (only single value)
  • Person / Group (only single value)

Any other column types than the ones mentioned above (most of the time these are columns which can contain a lot of data) cannot be indexed at this time.

We are not finished yet with this method, because there might still raise an error, even when indexing is applied.

If there are still more than 5000 results after applying your first filter in your Where element. It will keep crashing on the ‘list view threshold‘ error. Even if our filtered column is properly indexed. So there is no way to filter if you are not a 100% sure that your first filter is not going to limit your result set to 5000 or less items. We’ll have a solution for this issue in method 3.

REST API

We can do the same as above in REST with or without the use of a CAML query.

Every SharePoint developer probably knows the {sitecollectionURL}/_api/web/lists/GetByTitle(‘listname’)/Items endpoint

We can easily leverage this one through PnPjs also. You can check some samples here on the official documentation, explaining how to filter in the REST API with PnPjs.

import { sp } from "@pnp/sp/presets/all";


export default class TestSPRestCall extends React.Component<ITestSPRestCallProps, ITestSPRestCallState> {
    
    constructor(props) {
        super(props);
        
        //Hook WebPartContext to PnPjs
        sp.setup({
          spfxContext: this.props.context
        });
    }
    
    private async getSPListItems(){
        
        let items = await sp.web.lists.getByTitle("Big data list").items.filter("substringof('TestValue', Title)").get();
        
    }
    
}

As for limitations to large list querying we are somewhat at the same level as in CSOM for this method. But this one has one more limiting factor, namely the fact that it can not filter by CAML query. This means that for example filtering on a managed metadata field etc. is not possible.

Although there is a getItemsByCAMLQuery (‘getItems’ API endpoint) function in PnPjs which also enables the exact same functionality for filtering in CAML as Option 2 in CSOM does. With the exact same limitations of your first filter parameter having to limit the result set to 5000 items or less, else it will throw the same list view threshold exceeded error.

import { sp } from "@pnp/sp/presets/all";


export default class TestSPRestCall extends React.Component<ITestSPRestCallProps, ITestSPRestCallState> {
    
    constructor(props) {
        super(props);
        
        //Hook WebPartContext to PnPjs
        sp.setup({
          spfxContext: this.props.context
        });
    }
    
    private async getSPListItems(){
        
        let items = await sp.web.lists.getByTitle("Big data list").getItemsByCAMLQuery({
          ViewXml: "<View><Query><Where><Contains><FieldRef Name='Title'/><Value Type='Text'>TestValue</Value></Contains></Where></Query></View>"
        });
        
    }
    
}

So indexing is again important if you are using this one in large lists. Check the part above, to get more info on field indexing.

Method 2 – Pros

  • Direct query to the SharePoint list, so minimal data transfer
  • Could return results pretty fast if the result set of the first Where parameter is limited to 5000 results or less

Method 2 – Cons

  • You need to be 100% sure of your first Where parameter in your CAML Query OR first condition in your $filter query parameter to return 5000 results or less, else your solution will break
  • No possibility to index specific types of columns. So these column types cannot be filtered properly.

Method 2 – Tip(s)

  • Most of the time when I’m able to use this method, I’m trying to find a way to limit my items by a date range. You could index a date field (maybe the standard ‘Modified’ or ‘Created’ field) and apply a filter where the date of your items should be in between a certain date range of let’s say the last 4 weeks. If you are sure that this scope will always return 5000 items or less, then you’ll be fine with this method.
  • As mentioned in the scenario above, you are able to use ‘<And><Geq></Geq><Leq></Leq></And>’ in your CAML query or ‘(condition1 and condition2)’ in your $filter query param as long as you target the same indexed field in both conditions. This will count as ‘1’ filter param and does apply both operators correctly in the And element if this first filter does limit your result set to 5000 items or less.

Method 3 – Filter with RenderListDataAsStream

In my opinion this is the most flexible way to query/filter your large SharePoint lists. But how is it different from the previous option that we just covered?

This API endpoint simply removes the requirement of your first filter parameter in your Where element or in your $filter parameter to limit the result set to 5000 items or less. For some reason this API is not limited to the view threshold. This means that after applying my first filter (if indexed) the result might contain more than 5000 results. It doesn’t matter for this endpoint, it will return your results just fine.

Let’s show some samples in SPFx and CSOM. I’ll include paging in the sample, so you know you could also page through the records, just as SharePoint does whenever you are scrolling down in modern list views.

REST

Below a sample in SPFx with the use of PnPjs.

import { sp } from "@pnp/sp/presets/all";


export default class TestSPRestCall extends React.Component<ITestSPRestCallProps, ITestSPRestCallState> {
    
    constructor(props) {
        super(props);
        
        //Hook WebPartContext to PnPjs
        sp.setup({
          spfxContext: this.props.context
        });
    }
    
    private async getSPListItems(){
        
        //Title field is indexed, filter / orderBy fields should still be indexed for this endpoint
        let firstPageResult = await sp.web.lists.getByTitle("Big data list").renderListDataAsStream({
          ViewXml: `<View><Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy>
          <Where><Contains><FieldRef Name='Title'/><Value Type='Text'>harga</Value></Contains></Where></Query>
          <ViewFields><FieldRef Name='ID'/><FieldRef Name='Title'/></ViewFields>
          <RowLimit Paged='TRUE'>5000</RowLimit></View>`
        });
        
        //With Paged set to true in the RowLimit, we can query the next page
        let nextPageResult = await sp.web.lists.getByTitle("Big data list").renderListDataAsStream({
          ViewXml: `<View><Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy>
          <Where><Contains><FieldRef Name='Title'/><Value Type='Text'>harga</Value></Contains></Where></Query>
          <ViewFields><FieldRef Name='ID'/><FieldRef Name='Title'/></ViewFields>
          <RowLimit Paged='TRUE'>5000</RowLimit></View>`,
          Paging: firstPageResult.NextHref.substring(1)
        });
        
    }
    
}

CSOM

public void GetSPItems(){
    using (var ctx = new ClientContext("https://tenantname.sharepoint.com/sites/testbench"))
    {
        ctx.Credentials = new SharePointOnlineCredentials("username@tenant.onmicrosoft.com", "pass".ToSecureString());
        var bigDataList = ctx.Web.Lists.GetByTitle("Big data list");
        
        //Return type of the RenderListDataAsStream in CSOM is ClientResult<Stream>
        //So we just get a JSON string back from the server, just as with the REST API
        var firstPageResult = bigDataList.RenderListDataAsStream(new RenderListDataParameters()
        {
            ViewXml = "<View>" +
            "<Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy>" + 
            "<Where><Contains><FieldRef Name='Title'/><Value Type='Text'>harga</Value></Contains></Where></Query>" +
            "<ViewFields><FieldRef Name='ID'/><FieldRef Name='Title'/></ViewFields>" +
            "<RowLimit Paged='TRUE'>4999</RowLimit>" +
            "</View>"
        }, new RenderListDataOverrideParameters() {});
        ctx.ExecuteQuery();

        var result = "";
        using (var reader = new StreamReader(firstPageResult.Value, Encoding.UTF8))
        {
            result = reader.ReadToEnd();
        }
        
        dynamic dynamicResult = JsonConvert.DeserializeObject(result);
        
        
        var nextPageResult = bigDataList.RenderListDataAsStream(new RenderListDataParameters()
        {
            ViewXml = "<View>" +
            "<Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy>" + 
            "<Where><Contains><FieldRef Name='Title'/><Value Type='Text'>harga</Value></Contains></Where></Query>" +
            "<ViewFields><FieldRef Name='ID'/><FieldRef Name='Title'/></ViewFields>" +
            "<RowLimit Paged='TRUE'>4999</RowLimit>" +
            "</View>",
            Paging = (dynamicResult["NextHref"].Value as string).Substring(1)
        }, new RenderListDataOverrideParameters() { });
        ctx.ExecuteQuery();
        
        //Do something with the results of both queries
    }
}




public static SecureString ToSecureString(this string plainString){
    if (plainString == null)
        return null;

    SecureString secureString = new SecureString();
    foreach (char c in plainString.ToCharArray())
    {
        secureString.AppendChar(c);
    }
    return secureString;
}

Make sure you are using an OrderBy element whenever you use this RenderListDataAsStream endpoint. As a default OrderBy value, I always use ID in Ascending order. This is also what SharePoint is doing, whenever using the RenderListDataAsStream endpoint if no other Order field is specified by the user. If you want to OrderBy any other column, this is possible, but the column should be indexed before the OrderBy will work on lists with more than 5000 items.

It’s important that we also understand the limitations of this API endpoint, because SharePoint is using this endpoint a lot in it’s own modern UI. Almost all list data shown in SharePoint itself is queried through this endpoint.

There is actually only one thing still missing here and that’s the fact that it still needs indexing to filter / order properly in large lists. So this means that we can not filter / order properly on multi person fields, multi lookup fields, multi taxonomy fields, … .

This might still be a blocker for your scenario. You could be creative with a Note field (multi line text field) and convert those multi user/taxonomy/lookup fields into JSON. After that start using the SharePoint Search API to query and filter these fields also.

But the conclusion is that it requires a pretty big workaround and involves some custom work to get those fields, that can’t be indexed, filtered properly.

Method 3 – Pros

  • The only endpoint that allows a full filtering experience (if the fields being filtered are indexed) no matter the size of the result set.
  • Has the best flexibility available now of filtering with any kind of operator using CAML Query in REST as well as CSOM.
  • Does also offer options to render more detailed information of your field values than other endpoints. Check this blog post to see how it can expose certain field data more easily.

Method 3 – Cons

  • Still doesn’t offer a solution to filter multi value taxonomy / user / lookup fields in large lists, because we are not able to index these fields.
  • Might be more complicated to use in CSOM, because it is returning a raw JSON string as a result.
  • Much less documentation & samples available.

Summary

I think we can conclude that the RenderListDataAsStream (method 3) endpoint is the most flexible one, not only when speaking about filtering capabilities. But also when it comes to returning more details for certain field types without the need of expanding.

You might need to consider this if you really need the flexibility of this endpoint. Because in CSOM it’s not always the most ‘handy’ endpoint to work with as it returns raw JSON. Instead of a typed C# object as you are used to with the normal list GetItems function.

In one of my next blogs I’ll go into detail on the standard modern list UI in SharePoint Online. This blog will go more in depth on 2 specific points:

  • How they are fetching their data using the RenderListDataAsStream endpoint and how you could benefit from it to easily extend the standard list features.
  • The scenarios where your standard filtering in your SharePoint list might break and how we can fix them.

For example creating a ListView Command Set which displays the total amount of list items after a user has applied some filters on a view.

13 Comments

  1. Thanks for the wonderful, useful and very informative article Ward. When can we expect the next installment?

  2. This was very helpful. Thank you!

    I had expected applying $top=100 to a lists/getbytitle REST call would ensure the list threshold limit was never exceeded, but I was still getting the threshold exceeded error. This post explains why.

    To check regarding method 2 for clarity, is it just the _first_ filter parameter (reading left to right in the URL) that must get the result set below the threshold on its own? I would have expected SharePoint to simply pass all the filter parameters to SQL Server in a SQL query and SQL Server will then apply all the WHERE conditions with no regard to textual ordering in the query statement.

    Also, is it essential that the field being filtered is indexed for the pre-threshold limit filtering to work? Of course we would want indexing for performance but must the filter field be indexed for some other special reason to work to get below the list threshold? A user could end up selecting a little used field that we rarely query on (and so isn’t worth indexing) as their only filter criteria and I would want that query to still return filter results successfully even if it runs slowly (due to SQL Server doing a table scan).

    Thanks!
    Tim

    • Hi Tim,

      Thanks for the feedback and the questions.

      For method 2 you actually rely on your first filter (left to right) in your caml query to limit your query under 5000 items. You could combine 2 filters to dive under the 5000 item limit with the AND operator, only if you are able to combine the 2 fields in 1 index on your list.

      That’s why I use Method 3 -> RenderListDataAsStream endpoint the most when working with large lists. This one does not have these limitations of your first filter needing to limit the result set to 5000 items or less. (This is actually the endpoint that SharePoint itself uses to show it’s list data in SharePoint Online)

      Kind Regards,
      Ward

  3. Thank you This was very helpful !!
    Also is there a logic to update data back to Share point using RenderListDataAsStream (method 3).

  4. Thank you very much. This is a very helpful article.. appreciate you taking time to share this with the rest of the developer community

Leave a Reply

Your email address will not be published.

Previous Story

Convert SharePoint files (Word, Excel, PowerPoint, ...) to PDF in .NET using Microsoft Graph

Next Story

Make scanned PDF's searchable in SharePoint & Teams