How to add a new CPD language

First of all, thanks for the contribution!

Fortunately, adding CPD support for a new language is now easier than ever!

Pro Tip: There are more than 50 languages you could easily add with just an Antlr grammar.

All you need to do is follow these few steps:

  1. Create a new module for your language. You can take Go as an example.
  2. Create a Tokenizer

    • For Antlr grammars, you can take the grammar from here and extend AntlrTokenizer, taking Go as an example:
       public class GoTokenizer extends AntlrTokenizer {
           @Override
           protected AntlrTokenManager getLexerForSource(SourceCode sourceCode) {
               // Wrap the source in a CharStream and hand it to the generated Antlr lexer
               CharStream charStream = AntlrTokenizer.getCharStreamFromSourceCode(sourceCode);
               return new AntlrTokenManager(new GolangLexer(charStream), sourceCode.getFileName());
           }
       }
  3. Create your Language class

     public class GoLanguage extends AbstractLanguage {
         public GoLanguage() {
             super("Go", "go", new GoTokenizer(), ".go");
         }
     }
    Pro Tip: Yes, keep looking at Go!

    You are almost there!

  4. Update the list of supported languages

    • Write the fully-qualified name of your Language class to the file src/main/resources/META-INF/services/net.sourceforge.pmd.cpd.Language

    • Update the test that asserts the list of supported languages by updating the SUPPORTED_LANGUAGES constant in BinaryDistributionIT

  5. Please don’t forget to add some tests. Once again, you can look at the Go implementation ;)

    If you have read this far, I’d guess you would also love to support some extra CPD configuration (ignoring imports or crazy things like that).
    If that’s the case, you came to the right place!

  6. You can add your custom properties using a Token filter
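
To make the idea behind step 6 more concrete, here is a minimal, self-contained sketch of what an "ignore imports" filter does conceptually. It does not use PMD's token filter classes: the class name `ImportSkippingFilterSketch`, the `filter` method, and the use of plain strings as tokens are all assumptions made for illustration only. It also assumes a Java-like token stream in which an import statement ends with `;`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these names are part of the PMD API.
public class ImportSkippingFilterSketch {

    /**
     * Returns the tokens that should reach duplicate detection.
     * When ignoreImports is true, every token from an "import" keyword
     * up to and including the next ";" is discarded.
     */
    public static List<String> filter(List<String> tokens, boolean ignoreImports) {
        List<String> kept = new ArrayList<>();
        boolean insideImport = false;
        for (String token : tokens) {
            if (ignoreImports && !insideImport && "import".equals(token)) {
                insideImport = true; // start discarding at the keyword itself
            }
            if (!insideImport) {
                kept.add(token);
            } else if (";".equals(token)) {
                insideImport = false; // the terminator is discarded as well
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "import", "java", ".", "util", ".", "List", ";",
            "class", "A", "{", "}");
        System.out.println(filter(tokens, true));  // import statement removed
        System.out.println(filter(tokens, false)); // token stream unchanged
    }
}
```

In a real language module, the same decision (keep or discard a token depending on a configuration property) would live in the token filter attached to your tokenizer. The Go module and the other Antlr-based modules are the place to look for the actual classes and signatures.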