Finding the Best Patents – Forward Citation Analysis Still Wins

Source: Richardson Oliver Law Group LLP –

License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Originally posted at: IPWatchdog – Part 1: ; Part 2:

By Erik Oliver & Michael Costa & Kent Richardson
March 24, 2016

“We would like you to find the best patents in this pile of 50,000 candidates. Oh, and we need it done for $30K.” We hear requests like this so often that we’ve built processes and tools to help us address them. Our team has over 60 years of experience developing, evaluating, monetizing, litigating, and licensing patents; we’d like to share some of our experience and methodology with you.

Let’s return to that pile of 50,000 patents – how can we find the highest quality patents reliably and efficiently?

We’ve identified five primary factors for consideration in patent ranking (in order of weighting):

  1. Forward citations (45%)
  2. Age of patent from priority date (19%)
  3. Independent claim count (adjusted by number of means claims) (14%)
  4. Claim 1 word count (12%)
  5. Family size and international filings (10%)
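To make the weighting concrete, here is a minimal Python sketch of how the five factors might combine into a single score out of 100. It assumes that each factor has already been normalized to a 0-1 rank and that the weighted ranks are simply summed. Following the appendix, the claim-count rank is also scaled by the means-claim adjustment (ML_Rank) before it is weighted. The function name and the dictionary keys are our own shorthand.

```python
# Weights (points out of 100) taken from the factor list above.
WEIGHTS = {"FR": 45, "YP": 19, "NC": 14, "WL": 12, "FSFF": 10}


def total_score(fr: float, yp: float, nc: float, ml: float,
                wl: float, fsff: float) -> float:
    """Combine 0-1 factor ranks into a 0-100 score.

    Assumption: a plain weighted sum, with the independent-claim rank (nc)
    scaled by the means-claim adjustment (ml) before weighting.
    """
    return (WEIGHTS["FR"] * fr
            + WEIGHTS["YP"] * yp
            + WEIGHTS["NC"] * nc * ml
            + WEIGHTS["WL"] * wl
            + WEIGHTS["FSFF"] * fsff)


# Example: strong citations, ideal age, 3 independent claims with no means claims.
print(total_score(fr=1.0, yp=1.0, nc=0.321, ml=1.0, wl=1.0, fsff=0.5))
```

Because the weights sum to 100, a patent that maxes out every factor scores exactly 100. This keeps the totals easy to explain to clients.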

We were surprised to discover that forward citations dominate the analysis. We evaluated millions of patents – and consistently forward citations were the biggest predictor of a higher value patent. More on this below.

What Ranking Tools Can Do

When we start projects sorting through thousands of patents, we first meet with the client to define success. In other words, what does it mean for the client to be successful in a project like this? Finding the best patents? Eliminating the worst patents?

Ultimately, we want to find patents that meet the clients’ needs. This means quickly eliminating 95% of the patents that are less likely to fit those needs. For this, we built a tool and ranking system that helps us identify the patents that are both most and least likely to be useful. With this smaller pool, we can start human review looking for the patents the client seeks.

ROL Group’s 2016 Patent Ranking System

Where do you look for benchmarks of better patents? We believe that patents that are bought and sold better reflect the patents that are likely to meet business needs. We started with our database tracking over $7B of patents that companies are trying to sell, or have sold. We also looked at the characteristics of patents that had been litigated.

Our requirements for a ranking system:

  • Ranking system must be fully transparent – all aspects and formulas must be available to both ourselves and our clients for review, discussion, and per-project adjustment. Most existing commercial systems hide their ranking methods and were therefore ruled out.
  • Factors based in data and intuitively explained – each factor we use should be grounded in data but also intuitively explainable to clients.

In developing the new ranking system, we began with our previous heuristics-based system and tested the existing factors and others against our patent deals database as well as data sets of litigated patents. The new ranking factors were determined based on simulations comparing different potential weights.

Forward Citation


We found that forward citations (later patents that cite the subject patent) were the most significant factor in identifying patents that were likely to be purchased. In fact, the patents in a brokered package that were sold, or even just highlighted by brokers (i.e., the representative patent), had even more forward citations than litigated patents.

Why forward citations? Why not claim length or any number of other factors? We believe that forward citations are a proxy for industry-wide R&D investment in a technology area. With more investment, there are generally more products. With more products, there is a higher chance of infringement. Infringement drives value and is most likely what meets a client’s needs: a purchase either eliminates the client’s own infringement risk or provides a tool to use against someone else.

Our analysis focused on forward citation counts for four primary sets of patents: (i) a set of all issued patents from 2005-2014, (ii) a set of litigated patents from the same period, (iii) a set of patents from the brokered market that were sold from 2009-2014, and (iv) the representative patents from brokered patent packages. The results were striking: the sold patents (set iii) and the representative patents (set iv; i.e., the patent highlighted by brokers in a package) had exponentially more forward citations than the broad set of issued patents (set i).

Because the litigated patents (set ii) showed significantly more forward citations than the broad set of issued patents (set i), we decided to base our ranking metric on the difference in forward citation counts between litigated patents and issued patents.

Turning to the chart, the light green line shows the forward citation count by years from publication date for litigated patents (set ii). The dark green line shows the same measure for the broader set of issued US patents (set i). For the first three years there is minimal difference between the two data sets, but after that a clear gap emerges.

For patents more than three years from the publication date, we identified four regions for ranking adjustments:

  • Region A: The patent being ranked massively exceeds the number of expected citations for a litigated patent (rank = 1)
  • Region B: The patent being ranked has more citations than expected for a litigated patent; this region is defined to be the same size as Region C
  • Region C: The patent being ranked has more citations than expected for a typical patent, but not more than a litigated patent
  • Region D: The patent being ranked has fewer citations than expected for a typical patent
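The four regions correspond to the FR_Rank formula in the appendix. The Python sketch below is our transcription of that spreadsheet logic, not the authors' code. The two quadratic best-fit curves, one for litigated patents and one for all issued patents, are copied from the appendix formula. The curves are held fixed at 16 years, as the spreadsheet does. The function and parameter names are our own. Under this formula, a patent sitting exactly on the litigated curve scores 0.5, a patent on the all-patents curve scores 0, and all scores are clipped to the 0-1 range.

```python
from typing import Optional


def litigated_curve(years: float) -> float:
    """Best-fit expected forward citations for a litigated patent (appendix)."""
    return 0.102272727272727 * years**2 - 0.144696969696966 * years - 0.0636363636363875


def all_patents_curve(years: float) -> float:
    """Best-fit expected forward citations for a typical issued patent (appendix)."""
    return 0.053030303030303 * years**2 - 0.174242424242424 * years + 0.0727272727272652


def fr_rank(citations: float, years_since_pub: float) -> float:
    """Forward-citation rank in [0, 1]."""
    if years_since_pub < 3:
        # The curves have not yet separated: any citation earns full credit.
        return 1.0 if citations > 0 else 0.5
    years = min(years_since_pub, 16.0)  # the appendix freezes the curves at 16 years
    lit = litigated_curve(years)
    typical = all_patents_curve(years)
    score = ((citations - lit) / (lit - typical) + 1) / 2
    return max(0.0, min(1.0, score))


def fr_region(citations: float, years_since_pub: float) -> Optional[str]:
    """Label Regions A-D; None for patents under three years old."""
    if years_since_pub < 3:
        return None
    years = min(years_since_pub, 16.0)
    lit = litigated_curve(years)
    typical = all_patents_curve(years)
    if citations < typical:
        return "D"
    if citations <= lit:
        return "C"
    if citations <= 2 * lit - typical:  # Region B is as wide as Region C
        return "B"
    return "A"


print(fr_rank(12, 10), fr_region(12, 10))
```

Region B is defined by mirroring Region C above the litigated curve. As a result, the rank rises linearly from 0 at the typical-patent curve to 1 at the top of Region B.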


Age of Patent from Priority Date

We know that our clients are generally looking to purchase patents that are actively adopted and in use in industry, but they also want sufficient remaining life to get the benefit of their purchase. For example, if a client is buying for a potential dispute that has not yet materialized, at least five years of remaining life is generally desirable.

From our time at Rambus, we know that patents in the range of 8-12 years from priority had the highest probability of being valuable in licensing. There is also a wealth of academic research on the timing of litigation relative to the remaining life of patents. See, e.g., Brian Love, “An Empirical Study of Patent Litigation Timing,” 161 U. Pa. L. Rev. 1309 (2013); Mark Lemley, John Allison, and David Schwartz, “Understanding the Realities of Modern Patent Litigation,” 92 Tex. L. Rev. 1769 (2014). Additionally, as seen in our prior article on Intellectual Ventures’ (IV’s) patent portfolio and our forthcoming article (IAM Magazine Issue 77), IV’s purchase windows overlap heavily with the ranges we model.

We used the information from those papers, as well as our own experience, to model this factor. The resulting piecewise curve is given as YP_Rank in the appendix.
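The piecewise age model (YP_Rank in the appendix) translates directly into code. This Python sketch mirrors the spreadsheet's breakpoints. It also keeps the spreadsheet's approximation of a year as 365 days. The function names are our own.

```python
from datetime import date


def years_between(start: date, end: date) -> float:
    """Elapsed years, approximated as days / 365 like the spreadsheet formulas."""
    return (end - start).days / 365


def yp_rank(years_from_priority: float) -> float:
    """Age-from-priority rank in [0, 1]; peaks for patents 8-12 years old."""
    age = years_from_priority
    if age < 4:
        return 0.0
    if age < 8:
        return (age - 4) / 4                 # ramp up from 0 to 1
    if age <= 12:
        return 1.0                           # sweet spot
    if age <= 17:
        return 1 - (age - 12) * 0.4 / 5      # decline from 1 to 0.6
    if age <= 19:
        return 0.6 - (age - 17) * 0.6 / 2    # decline from 0.6 to 0
    return 0.0


print(yp_rank(years_between(date(2006, 3, 1), date(2016, 3, 24))))  # prints 1.0
```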


High Value Patents: Does family size matter when looking for better patents? (Part 2/2)

By Erik Oliver & Michael Costa & Kent Richardson
March 27, 2016

We evaluated millions of patents – and consistently forward citations were the biggest predictor of high value patents. In our last article we discussed why forward citations are relevant, and the importance of remaining patent term. Now we’d like to consider the remaining three factors we use to rank patents, and why they may be of use in helping to eliminate less useful patents quickly and efficiently.

Independent Claim Count (Adjusted by Means Claims)

We hypothesized that paying for additional independent claims (three are included in the basic filing fee) would be highly correlated with value. Our analysis focused on claim counts for four primary sets of patents: (i) a set of all issued patents from 2005-2014, (ii) a set of litigated patents from the same period, (iii) a set of patents from the brokered market that were sold from 2009-2014, and (iv) the representative patents from brokered patent packages.

As predicted, having more than three independent claims was highly correlated with the probability of the patent being litigated, sold, or listed as the representative patent for a sales package (i.e., the most important patent in the package).

As with forward citations, we modeled this ranking factor by comparing the prevalence of each independent claim count among litigated patents (set ii) with its prevalence in the larger set of US issued patents (set i).



However, the number of independent claims alone is not a sufficient measure if, for example, all of the independent claims are written as means-plus-function claims (35 U.S.C. § 112(f)). At least in the United States, under present case law, such claims generally have less value for our clients.

We analyzed the prevalence of means claims in our data sets (sets i-iv above) and then developed an adjustment factor for the claim-count rank based on the number of means claims. By comparing the data sets, we arrived at an adjustment in which a means claim generally has 1/10th the value of a non-means claim. We did, however, make an exception: if a patent has at least five independent non-means claims, no adjustment is made to the claims rank.
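Here is a sketch of the claim-count factor and its means-claim adjustment. It uses the NC_Rank step values and the five-claim exception from the appendix. One assumption to flag: the appendix formula's exact treatment of means claims is hard to read. The `means_weight` parameter (default 0.1) is therefore our reading of the 1/10th rule described above, applied as a scale factor on NC_Rank. The `count_means` helper mirrors the appendix's convention of storing means claims as a comma-separated list of claim numbers. All names are hypothetical.

```python
NC_STEPS = {2: 0.179, 3: 0.321, 4: 0.639}  # appendix NC_Rank values


def nc_rank(independent_claims: int) -> float:
    """Rank by independent-claim count: 0 for one claim, 1 for five or more."""
    if independent_claims <= 1:
        return 0.0
    if independent_claims >= 5:
        return 1.0
    return NC_STEPS[independent_claims]


def count_means(mc: str) -> int:
    """Count entries in a comma-separated list of means-claim numbers."""
    return len(mc.split(",")) if mc.strip() else 0


def ml_rank(independent_claims: int, means_claims: int,
            means_weight: float = 0.1) -> float:
    """Scale factor for NC_Rank; a means claim counts as a fraction of a claim."""
    non_means = independent_claims - means_claims
    if non_means >= 5:
        return 1.0  # exception: five or more non-means claims, no adjustment
    if independent_claims == 0:
        return 0.0
    return (non_means + means_weight * means_claims) / independent_claims


nc, mc = 4, count_means("1,9")
print(14 * nc_rank(nc) * ml_rank(nc, mc))  # weighted claim points
```

The adjustment can only lower the claim score, never raise it, which is consistent with the back-test described next.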

We then back-tested this ranking on approximately 5,000 randomly selected patents with issue dates from 2005-2014 and examined the distribution of the new ranking factor. Notably, this ranking factor only lowers the rank of about 12-13% of patents.

Claim 1 Word Count

Historically, our ranking heuristic treated claim 1 word count as one of the more significant ranking factors and put heavy emphasis on shorter claims. However, when we analyzed the data sets (sets i-iv above), there was no significant variation between the sets that serve as proxies for higher value (litigated, sold, and representative patents) and the baseline set of all patents.

Instead, we now see claim 1 word count as a way to remove patents with extreme word counts from consideration. We used the data from litigated patents (set ii) as a guide for identifying those extremes.

As a result, the new ranking factor heavily down-ranks patents whose claim 1 has fewer than 25 words or more than about 250 words. We identified a range of 63-163 words as the sweet spot for claim 1 length in litigated patents. (Note: in a future version of the ranking system we might evaluate the shortest independent claim.)
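The WL_Rank formula in the appendix fits a fifth-degree polynomial to the litigated-patent word-count data and clips the result to the 0-1 range. The Python sketch below evaluates the same coefficients using Horner's method. The names are ours. Note that Python's `round()` may differ from Excel's ROUND in rare exact half-way cases.

```python
# Coefficients of the appendix polynomial, highest power (WC^5) first.
WL_COEFFS = (-3.45316863451576e-12, 2.48654446634233e-09, -2.1228999102672e-07,
             -0.000178490429602357, 0.0357547835160767, -0.517153098240328)


def wl_poly(word_count: int) -> float:
    """Evaluate the fitted curve with Horner's method."""
    value = 0.0
    for coeff in WL_COEFFS:
        value = value * word_count + coeff
    return value


def wl_rank(word_count: int) -> float:
    """Claim 1 length rank: 0 for extremes, plateau at 1 for roughly 63-163 words."""
    value = wl_poly(word_count)
    if value < 0 or word_count > 400:
        return 0.0
    if value > 1:
        return 1.0
    return round(value, 4)


print(wl_rank(120))  # prints 1.0
```

The `word_count > 400` guard mirrors the spreadsheet. It prevents the polynomial's upturn at very large word counts from rewarding extremely long claims.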

Family Size and International Filings

Does family size matter when looking for the better patents? Intuitively, family size and diversity of international filings should be good indicators of value. We hypothesized that like independent claim count, the investment to produce a larger patent family and file international patents would correspond to greater value. However, we found the impact was less significant than even the word count of claim 1 – only a 10% contribution to the overall weighting.

Our new ranking system awards a maximum of 10 points for family size and international filings:

  • Up to 5 points for family size, scaled linearly over family sizes from 0 to 12 (a family with more than 12 INPADOC publications is treated as having 12)
  • Multiply the family size rank by the first of the following that applies:
    • 2 if there is an issued EP, JP, or CN patent
    • 1.5 if there is a published EP, JP, or CN application
    • 1.25 if there is a PCT publication and the patent is less than 2.75 years from priority
    • 1.25 if the patent is less than 1.75 years from priority (adjusting for the risk that no foreign filing data is available yet)
    • 1 otherwise
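Below is a sketch of the family-size factor as a 0-1 rank. Multiplying it by the 10-point weight gives the point values above. The boolean inputs are hypothetical flags that a caller would derive from INPADOC family data. How they are derived is outside this sketch. The multipliers follow the appendix, applied in the order listed.

```python
def fsff_rank(family_size: int, issued_ep_jp_cn: bool, published_ep_jp_cn: bool,
              has_pct: bool, years_from_priority: float) -> float:
    """Family size and international filing rank in [0, 1]."""
    base = min(family_size, 12) / 12 * 0.5   # 0 to 0.5, capped at 12 publications
    if issued_ep_jp_cn:
        multiplier = 2.0
    elif published_ep_jp_cn:
        multiplier = 1.5
    elif has_pct and years_from_priority < 2.75:
        multiplier = 1.25
    elif years_from_priority < 1.75:
        multiplier = 1.25   # young family: foreign filing data may not exist yet
    else:
        multiplier = 1.0
    return base * multiplier


# Points for an 8-member family with a published EP application, 6 years from priority.
print(10 * fsff_rank(8, False, True, False, 6.0))
```

The 2x multiplier on a capped base of 0.5 is what keeps the factor within 0-1, and therefore within 10 points.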


These metrics need to be combined, using weighting factors, into a balanced total score. Two considerations shaped how we did this. A properly weighted system should create a large ranking spread between interesting and uninteresting patents, but it should also draw on a mix of the metrics to give a more rounded perspective.

We limited the weighting factor for each metric to between 10% and 60%. We then repeatedly ranked sets of random patents and known valuable sets using more than 400 different weighting combinations. By comparing the combinations that produced the largest spread between the median patent ranks of the valuable and random sets, we were able to see trends. We averaged the top 10 weighting combinations to get our baseline factors, and then adjusted these slightly after a manual review.
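The weight search can be sketched as a simple grid search. Several assumptions should be flagged. The 5-point grid step is our own choice; it yields 1,001 candidate weightings rather than the 400-plus the authors tested. Each patent is represented as a tuple of five 0-1 factor ranks. The "spread" is defined as the median score of a known-valuable set minus the median score of a random set. The function names and the toy data are hypothetical.

```python
from itertools import product
from statistics import median


def weight_grid(n_factors=5, low=10, high=60, step=5, total=100):
    """Yield every weighting with each factor in [low, high] that sums to total."""
    for combo in product(range(low, high + 1, step), repeat=n_factors):
        if sum(combo) == total:
            yield combo


def score(ranks, weights):
    """Weighted sum of one patent's 0-1 factor ranks."""
    return sum(w * r for w, r in zip(weights, ranks))


def spread(weights, valuable, random_set):
    """Gap between the median scores of the valuable and random sets."""
    return (median(score(r, weights) for r in valuable)
            - median(score(r, weights) for r in random_set))


def search_weights(valuable, random_set, top_n=10):
    """Average the top_n weightings that best separate the two sets."""
    ranked = sorted(weight_grid(len(valuable[0])),
                    key=lambda w: spread(w, valuable, random_set),
                    reverse=True)
    top = ranked[:top_n]
    return [sum(w[i] for w in top) / len(top) for i in range(len(top[0]))]


# Made-up factor ranks, for illustration only.
valuable = [(0.9, 0.8, 0.6, 0.7, 0.5), (0.8, 0.9, 0.4, 0.6, 0.3)]
random_set = [(0.2, 0.5, 0.4, 0.6, 0.2), (0.1, 0.4, 0.3, 0.7, 0.4)]
print(search_weights(valuable, random_set))
```

Averaging several top candidates, rather than taking the single best one, smooths out weightings that only win by exploiting a quirk of one sample.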

We then tested the system against smaller sets of patents, which we had previously reviewed. The automated ranking system was able to consistently rank the focus patents of each set highly. This confirmed that the automated ranks would allow us to quickly identify the patents that are most likely to be useful and also eliminate a number of less interesting patents quickly as well.

We set out to use the USPTO data on issued US patents (formerly hosted on Google Books but now hosted directly by the USPTO) to refine our ranking system and provide a fully transparent, data-based ranking that can be intuitively explained to clients.

We successfully built a parser for the USPTO XML data set, using it to analyze the characteristics of US patents (issuing from 2005-2014) and compare different subsets of that data. This included leveraging our unique database of over $7B worth of brokered patents, allowing us to quickly highlight those of most interest to our buying clients.
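The article does not publish its parser. As a minimal sketch, we assume the element names used in USPTO's full-text grant XML: `claims`, `claim`, `claim-text`, and `claim-ref`, the last of which marks a dependent claim's reference to its parent. Under that assumption, the three claim-based inputs can be extracted with Python's standard library. Flagging means claims by the phrase "means for" is our simplification, not a legal test under § 112(f). The sample document is made up. The weekly bulk files concatenate many XML documents, so in practice each document would have to be split out before parsing.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a USPTO grant document.
SAMPLE = """<us-patent-grant>
  <claims id="claims">
    <claim id="CLM-00001" num="00001">
      <claim-text>1. A widget comprising: a frame; and a lever attached to the frame.</claim-text>
    </claim>
    <claim id="CLM-00002" num="00002">
      <claim-text>2. The widget of <claim-ref idref="CLM-00001">claim 1</claim-ref>, wherein the lever is steel.</claim-text>
    </claim>
    <claim id="CLM-00003" num="00003">
      <claim-text>3. An apparatus comprising means for rotating a shaft.</claim-text>
    </claim>
  </claims>
</us-patent-grant>"""

LEADING_NUMBER = re.compile(r"^\d+\s*\.\s*")
MEANS_PHRASE = re.compile(r"\bmeans for\b", re.IGNORECASE)  # naive heuristic


def claim_text(claim: ET.Element) -> str:
    """Flatten nested claim text, normalize whitespace, drop the leading number."""
    text = " ".join("".join(claim.itertext()).split())
    return LEADING_NUMBER.sub("", text)


def claim_stats(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    claims = root.findall(".//claims/claim")
    # A claim with no reference to another claim is treated as independent.
    independent = [c for c in claims if c.find(".//claim-ref") is None]
    means = [c for c in independent if MEANS_PHRASE.search(claim_text(c))]
    return {
        "independent_claims": len(independent),
        # Same comma-separated form as the appendix's MC column.
        "means_claims": ",".join(str(int(c.get("num"))) for c in means),
        "claim1_word_count": len(claim_text(claims[0]).split()) if claims else 0,
    }


print(claim_stats(SAMPLE))
# {'independent_claims': 2, 'means_claims': '3', 'claim1_word_count': 12}
```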


Appendix – Formulas

The following summarizes our ranking factors, with Excel-like formulas. For each factor we list what it tests, the proposed 0-1 ranking, the formula, and its weighting factor:

WL_Rank (weighting factor: 12)
  • Tests: length of claim 1
  • Proposed ranking: use the scaled litigated curve, with a plateau from 63 to 163 words, as the model; scale range values from 0 to max → 0 to 1; any negative values → 0
  • Formula: =IF(OR(-3.45316863451576E-12*WC^5+2.48654446634233E-09*WC^4-2.1228999102672E-07*WC^3-0.000178490429602357*WC^2+0.0357547835160767*WC-0.517153098240328<0, WC>400),0,IF(-3.45316863451576E-12*WC^5+2.48654446634233E-09*WC^4-2.1228999102672E-07*WC^3-0.000178490429602357*WC^2+0.0357547835160767*WC-0.517153098240328>1,1,ROUND(-3.45316863451576E-12*WC^5+2.48654446634233E-09*WC^4-2.1228999102672E-07*WC^3-0.000178490429602357*WC^2+0.0357547835160767*WC-0.517153098240328,4)))

FR_Rank (weighting factor: 45)
  • Tests: citing patents (forward references), adjusted by age
  • Proposed ranking: if less than 3 years past issue, 0.5 if there are no citing patents and 1 if FR > 0; otherwise, compare the deviation from the best-fit curve of the median number of citations per year since issue for litigated patents with the difference between the litigated-patent curve and the all-patents curve
  • Formula: =IF(((today()-PUB)/365)>=16,IF(FR>=36.7454545,1,IF(FR<=10.8606061,0,((FR-23.8030303)/(23.8030303-10.8606061)+1)/2)),IF(((today()-PUB)/365)<3,IF(FR>0,1,0.5),IF(((FR-(0.102272727272727*((today()-PUB)/365)^2-0.144696969696966*((today()-PUB)/365)-0.0636363636363875))/((0.102272727272727*((today()-PUB)/365)^2-0.144696969696966*((today()-PUB)/365)-0.0636363636363875)-(0.053030303030303*((today()-PUB)/365)^2-0.174242424242424*((today()-PUB)/365)+0.0727272727272652))+1)/2>1,1,IF(((FR-(0.102272727272727*((today()-PUB)/365)^2-0.144696969696966*((today()-PUB)/365)-0.0636363636363875))/((0.102272727272727*((today()-PUB)/365)^2-0.144696969696966*((today()-PUB)/365)-0.0636363636363875)-(0.053030303030303*((today()-PUB)/365)^2-0.174242424242424*((today()-PUB)/365)+0.0727272727272652))+1)/2<0,0,((FR-(0.102272727272727*((today()-PUB)/365)^2-0.144696969696966*((today()-PUB)/365)-0.0636363636363875))/((0.102272727272727*((today()-PUB)/365)^2-0.144696969696966*((today()-PUB)/365)-0.0636363636363875)-(0.053030303030303*((today()-PUB)/365)^2-0.174242424242424*((today()-PUB)/365)+0.0727272727272652))+1)/2))))

NC_Rank (weighting factor: 14 × ML_Rank)
  • Tests: number of independent claims
  • Proposed ranking: 1 claim: 0; 2 claims: 0.179; 3 claims: 0.321; 4 claims: 0.639; 5 or more claims: 1
  • Formula: =IF(NC<=1,0,IF(NC=2,0.179,IF(NC=3,0.321,IF(NC=4,0.639,1))))

ML_Rank (weighting factor: N/A; scales NC_Rank)
  • Tests: means claims among the independent claims (max 15 points for no means claims)
  • Proposed ranking: if there are 5 or more independent non-means claims, 1; otherwise, scale the independent claims count rank so that means claims count for only 10% of a claim
  • Formula: =IF((NC-IF(MC="",0,LEN(MC)-LEN(SUBSTITUTE(MC,",",""))+1))>=5,1,(L2-IF(MC="",0,LEN(MC)-LEN(SUBSTITUTE(MC,",",""))+1))/NC)

YP_Rank (weighting factor: 19)
  • Tests: age of patent from priority date
  • Proposed ranking:
    • 0 <= Age < 4: 0
    • 4 <= Age < 8: linear scale from 0 to 1
    • 8 <= Age <= 12: 1
    • 12 < Age <= 17: linear scale from 1 to 0.6
    • 17 < Age <= 19: linear scale from 0.6 to 0
    • 19 < Age: 0
  • Formula: =IF((today()-Pri)/365<4,0,IF(AND((today()-Pri)/365>=4,(today()-Pri)/365<8),((today()-Pri)/365-4)/4,IF(AND((today()-Pri)/365>=8,(today()-Pri)/365<=12),1,IF(AND((today()-Pri)/365>12,(today()-Pri)/365<=17),1-((today()-Pri)/365-12)*0.4/5,IF(AND((today()-Pri)/365>17,(today()-Pri)/365<=19),0.6-((today()-Pri)/365-17)*0.6/2,0)))))

FS&FF_Rank (weighting factor: 10)
  • Tests: family size and foreign filing
  • Proposed ranking: if FS <= 12, scale linearly from 0.0 to 0.5; otherwise 0.5. Multiply the FS rank by:
    • 2 if an issued EP, CN, or JP patent exists
    • Else if (EPA|CNA|JPA): 1.5
    • Else if YP < 2.75 and WOA exists: 1.25
    • Else if YP < 1.75: 1.25
    • Else: 1
  • Formula: INPADOC_temp = INPADOC with all numerals deleted