CONTRIBUTING.md (linguist-7.23.0) | : | CONTRIBUTING.md (linguist-7.24.0) | ||
---|---|---|---|---|
skipping to change at line 25 | skipping to change at line 25 | |||
Linguist is a Ruby library so you will need a recent version of Ruby installed. | Linguist is a Ruby library so you will need a recent version of Ruby installed. | |||
There are known problems with the macOS/XCode supplied version of Ruby that causes problems installing some of the dependencies. | There are known problems with the macOS/XCode supplied version of Ruby that causes problems installing some of the dependencies. | |||
Accordingly, we highly recommend you install a version of Ruby using Homebrew, `rbenv`, `rvm`, `ruby-build`, `asdf` or other packaging system, before attempting to install Linguist and the dependencies. | Accordingly, we highly recommend you install a version of Ruby using Homebrew, `rbenv`, `rvm`, `ruby-build`, `asdf` or other packaging system, before attempting to install Linguist and the dependencies. | |||
Linguist uses the [`charlock_holmes`](https://github.com/brianmario/charlock_holmes) character encoding detection library which in turn uses [ICU](http://site.icu-project.org/), and the libgit2 bindings for Ruby provided by [`rugged`](https://github.com/libgit2/rugged). | Linguist uses the [`charlock_holmes`](https://github.com/brianmario/charlock_holmes) character encoding detection library which in turn uses [ICU](http://site.icu-project.org/), and the libgit2 bindings for Ruby provided by [`rugged`](https://github.com/libgit2/rugged). | |||
[Bundler](https://bundler.io/) v1.10.0 or newer is required for installing the Ruby gem dependencies. | [Bundler](https://bundler.io/) v1.10.0 or newer is required for installing the Ruby gem dependencies. | |||
[Docker](https://www.docker.com/) is also required when adding or updating grammars. | [Docker](https://www.docker.com/) is also required when adding or updating grammars. | |||
These components have their own dependencies - `icu4c`, and `cmake` and `pkg-config` respectively - which you may need to install before you can install Linguist. | These components have their own dependencies - `icu4c`, and `cmake` and `pkg-config` respectively - which you may need to install before you can install Linguist. | |||
For example, on macOS with [Homebrew](http://brew.sh/): | On macOS with [Homebrew](http://brew.sh/) the instructions below under Getting started will install these dependencies for you. | |||
```bash | ||||
brew install cmake pkg-config icu4c | ||||
brew install --cask docker | ||||
``` | ||||
On Ubuntu: | On Ubuntu: | |||
```bash | ```bash | |||
apt-get install cmake pkg-config libicu-dev docker.io ruby ruby-dev zlib1g-dev build-essential libssl-dev | apt-get install cmake pkg-config libicu-dev docker.io ruby ruby-dev zlib1g-dev build-essential libssl-dev | |||
``` | ``` | |||
The latest version of Bundler can be installed with `gem install bundler`. | The latest version of Bundler can be installed with `gem install bundler`. | |||
## Getting started | ## Getting started | |||
Before you can start contributing to Linguist, you'll need to set up your environment first. | Before you can start contributing to Linguist, you'll need to set up your environment first. | |||
Clone the repo and run `script/bootstrap` to install its dependencies. | Clone the repo and run `script/bootstrap` to install its dependencies. | |||
```bash | ```bash | |||
git clone https://github.com/github/linguist.git | git clone https://github.com/github/linguist.git | |||
cd linguist/ | cd linguist/ | |||
script/bootstrap | script/bootstrap | |||
``` | ``` | |||
To run Linguist from the cloned repository, you will need to generate the code samples first: | ||||
```bash | ||||
bundle exec rake samples | ||||
``` | ||||
Run this command each time a [sample][samples] has been modified. | ||||
To run Linguist from the cloned repository: | To run Linguist from the cloned repository: | |||
```bash | ```bash | |||
bundle exec bin/github-linguist --breakdown | bundle exec bin/github-linguist --breakdown | |||
``` | ``` | |||
## Adding an extension to a language | ## Adding an extension to a language | |||
We try only to add new extensions once they have some usage on GitHub. | We try only to add new extensions once they have some usage on GitHub. | |||
In most cases we prefer that each new file extension be in use in at least 200 unique `:user/:repo` repositories before supporting them in Linguist | In most cases we prefer that each new file extension be in use in at least 200 unique `:user/:repo` repositories before supporting them in Linguist | |||
(but see #5756 for a temporary change in the criteria). | (but see [#5756][] for a temporary change in the criteria). | |||
To add support for a new extension: | To add support for a new extension: | |||
1. Add your extension to the language entry in [`languages.yml`][languages]. | 1. Add your extension to the language entry in [`languages.yml`][languages]. | |||
Keep the extensions in alphabetical order, sorted case-sensitively (uppercase before lowercase). | Keep the extensions in alphabetical order, sorted case-sensitively (uppercase before lowercase). | |||
The exception is the primary extension: it should always be first. | The exception is the primary extension: it should always be first. | |||
1. Add at least one sample for your extension to the [samples directory][samples] in the correct subdirectory. | 1. Add at least one sample for your extension to the [samples directory][samples] in the correct subdirectory. | |||
We prefer examples of real-world code showing common usage. | We prefer examples of real-world code showing common usage. | |||
The more representative of the structure of the language, the better. | The more representative of the structure of the language, the better. | |||
1. Open a pull request, linking to a [GitHub search result][search-example] showing in-the-wild usage. | 1. Open a pull request, linking to a [GitHub search result][search-example] showing in-the-wild usage. | |||
If you are adding a sample, please state clearly the license covering the code. | If you are adding a sample, please state clearly the license covering the code. | |||
If possible, link to the original source of the sample. | If possible, link to the original source of the sample. | |||
If you wrote the sample specifically for the PR and are happy for it to be included under the MIT license that covers Linguist, you can state this instead. | ||||
Additionally, if this extension is already listed in [`languages.yml`][languages] and associated with another language, then sometimes a few more steps will need to be taken: | Additionally, if this extension is already listed in [`languages.yml`][languages] and associated with another language, then sometimes a few more steps will need to be taken: | |||
1. Make sure that example `.yourextension` files are present in the [samples directory][samples] for each language that uses `.yourextension`. | 1. Make sure that example `.yourextension` files are present in the [samples directory][samples] for each language that uses `.yourextension`. | |||
1. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.yourextension` files (ping **@lildude** to help with this). | 1. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.yourextension` files (ping **@lildude** to help with this). | |||
This ensures we're not misclassifying files. | This ensures we're not misclassifying files. | |||
1. If the Bayesian classifier does a bad job with the sample `.yourextension` files then a [heuristic][] may need to be written to help. | 1. If the Bayesian classifier does a bad job with the sample `.yourextension` files then a [heuristic][] may need to be written to help. | |||
See [My Linguist PR has been merged but GitHub doesn't reflect my changes][merged-pr] for details on when your changes will appear on GitHub after your PR has been merged. | ||||
## Adding a language | ## Adding a language | |||
We try only to add languages once they have some usage on GitHub. | We try only to add languages once they have some usage on GitHub. | |||
In most cases we prefer that each new file extension be in use in at least 200 unique `:user/:repo` repositories before supporting them in Linguist | In most cases we prefer that each new file extension be in use in at least 200 unique `:user/:repo` repositories before supporting them in Linguist | |||
(but see #5756 for a temporary change in the criteria). | (but see [#5756][] for a temporary change in the criteria). | |||
To add support for a new language: | To add support for a new language: | |||
1. Add an entry for your language to [`languages.yml`][languages]. | 1. Add an entry for your language to [`languages.yml`][languages]. | |||
Omit the `language_id` field for now. | Omit the `language_id` field for now. | |||
1. Add a syntax-highlighting grammar for your language using: | 1. Add a syntax-highlighting grammar for your language using: | |||
```bash | ```bash | |||
script/add-grammar https://github.com/JaneSmith/MyGrammar | script/add-grammar https://github.com/JaneSmith/MyGrammar | |||
``` | ``` | |||
This command will analyze the grammar and, if no problems are found, add it to the repository. | This command will analyze the grammar and, if no problems are found, add it to the repository. | |||
If problems are found, please report them to the grammar maintainer as you will otherwise be unable to add it. | If problems are found, please report them to the grammar maintainer as you will otherwise be unable to add it. | |||
**Please only add grammars that have [one of these licenses][licenses].** | **Please only add grammars that have [one of these licenses][licenses].** | |||
1. Add samples for your language to the [samples directory][samples] in the correct subdirectory. | 1. Add samples for your language to the [samples directory][samples] in the correct subdirectory. | |||
1. Generate a unique ID for your language by running `script/update-ids`. | 1. Generate a unique ID for your language by running `script/update-ids`. | |||
1. Open a pull request, linking to [GitHub search results][search-example] showing in-the-wild usage. | 1. Open a pull request, linking to [GitHub search results][search-example] showing in-the-wild usage. | |||
Please state clearly the license covering the code in the samples. | Please state clearly the license covering the code in the samples. | |||
Link directly to the original source if possible. | Link directly to the original source if possible. | |||
If you wrote the sample specifically for the PR and are happy for it to be included under the MIT license that covers Linguist, you can state this instead. | ||||
In addition, if your new language defines an extension that's already listed in [`languages.yml`][languages] (such as `.foo`) then sometimes a few more steps will need to be taken: | In addition, if your new language defines an extension that's already listed in [`languages.yml`][languages] (such as `.foo`) then sometimes a few more steps will need to be taken: | |||
1. Make sure that example `.foo` files are present in the [samples directory][samples] for each language that uses `.foo`. | 1. Make sure that example `.foo` files are present in the [samples directory][samples] for each language that uses `.foo`. | |||
1. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.foo` files (ping **@lildude** to help with this). | 1. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.foo` files (ping **@lildude** to help with this). | |||
This ensures we're not misclassifying files. | This ensures we're not misclassifying files. | |||
1. If the Bayesian classifier does a bad job with the sample `.foo` files, then a [heuristic][] may need to be written to help. | 1. If the Bayesian classifier does a bad job with the sample `.foo` files, then a [heuristic][] may need to be written to help. | |||
Remember, the goal here is to try and avoid false positives! | Remember, the goal here is to try and avoid false positives! | |||
Note: New languages will not appear in GitHub's search results for some time after the pull request has been merged and the new Linguist release deployed to GitHub.com. | See [My Linguist PR has been merged but GitHub doesn't reflect my changes][merged-pr] for details on when your changes will appear on GitHub after your PR has been merged. | |||
This is because GitHub's search uses [go-enry](https://github.com/go-enry/go-enry) for language detection but tends to lag behind Linguist by a few weeks to months. | ||||
This in turn requires an update to the underlying search code once go-enry is in line with Linguist. | ||||
## Fixing a misclassified language | ## Fixing a misclassified language | |||
Most languages are detected by their file extension defined in [`languages.yml`][languages]. | Most languages are detected by their file extension defined in [`languages.yml`][languages]. | |||
For disambiguating between files with common extensions, Linguist applies some [heuristics](/lib/linguist/heuristics.rb) and a [statistical classifier](lib/linguist/classifier.rb). | For disambiguating between files with common extensions, Linguist applies some [heuristics](/lib/linguist/heuristics.rb) and a [statistical classifier](lib/linguist/classifier.rb). | |||
This process can help differentiate between, for example, `.h` files which could be either C, C++, or Obj-C. | This process can help differentiate between, for example, `.h` files which could be either C, C++, or Obj-C. | |||
Misclassifications can often be solved by either adding a new filename or extension for the language or adding more [samples][] to make the classifier smarter. | Misclassifications can often be solved by either adding a new filename or extension for the language or adding more [samples][] to make the classifier smarter. | |||
## Fixing syntax highlighting | ## Fixing syntax highlighting | |||
skipping to change at line 152 | skipping to change at line 142 | |||
If you can, try to reproduce the highlighting problem in the text editor that the grammar is designed for (TextMate, Sublime Text, or Atom) and include that information in your bug report. | If you can, try to reproduce the highlighting problem in the text editor that the grammar is designed for (TextMate, Sublime Text, or Atom) and include that information in your bug report. | |||
You can also try to fix the bug yourself and submit a pull-request. | You can also try to fix the bug yourself and submit a pull-request. | |||
[TextMate's documentation](https://manual.macromates.com/en/language_grammars) offers a good introduction on how to work with TextMate-compatible grammars. | [TextMate's documentation](https://manual.macromates.com/en/language_grammars) offers a good introduction on how to work with TextMate-compatible grammars. | |||
Note that Linguist uses [PCRE](https://www.pcre.org/) regular expressions, while TextMate uses [Oniguruma](https://github.com/kkos/oniguruma). | Note that Linguist uses [PCRE](https://www.pcre.org/) regular expressions, while TextMate uses [Oniguruma](https://github.com/kkos/oniguruma). | |||
Although they are mostly compatible there might be some differences in syntax and semantics between the two. | Although they are mostly compatible there might be some differences in syntax and semantics between the two. | |||
Linguist's grammar compiler will highlight any problems when the grammar is updated. | Linguist's grammar compiler will highlight any problems when the grammar is updated. | |||
Once the bug has been fixed upstream, we'll pick it up for GitHub in the next release of Linguist. | Once the bug has been fixed upstream, we'll pick it up for GitHub in the next release of Linguist. | |||
See [My Linguist PR has been merged but GitHub doesn't reflect my changes][merged-pr] for details on when the upstream changes will appear on GitHub. | ||||
## Changing the source of a syntax highlighting grammar | ## Changing the source of a syntax highlighting grammar | |||
We'd like to ensure Linguist and GitHub.com are using the latest and greatest grammars that are consistent with the current usage but understand that sometimes a grammar can lag behind the evolution of a language or even stop being developed. | We'd like to ensure Linguist and GitHub.com are using the latest and greatest grammars that are consistent with the current usage but understand that sometimes a grammar can lag behind the evolution of a language or even stop being developed. | |||
This often results in someone grasping the opportunity to create a newer and better and more actively maintained grammar, and we'd love to use it and pass on its functionality to our users. | This often results in someone grasping the opportunity to create a newer and better and more actively maintained grammar, and we'd love to use it and pass on its functionality to our users. | |||
Switching the source of a grammar is really easy: | Switching the source of a grammar is really easy: | |||
```bash | ```bash | |||
script/add-grammar --replace MyGrammar https://github.com/PeterPan/MyGrammar | script/add-grammar --replace MyGrammar https://github.com/PeterPan/MyGrammar | |||
``` | ``` | |||
This command will analyze the grammar and, if no problems are found, add it to the repository. | This command will analyze the grammar and, if no problems are found, add it to the repository. | |||
If problems are found, please report these problems to the grammar maintainer as you will not be able to add the grammar if problems are found. | If problems are found, please report these problems to the grammar maintainer as you will not be able to add the grammar if problems are found. | |||
**Please only add grammars that have [one of these licenses][licenses].** | **Please only add grammars that have [one of these licenses][licenses].** | |||
Please then open a pull request for the updated grammar. | Please then open a pull request for the updated grammar. | |||
See [My Linguist PR has been merged but GitHub doesn't reflect my changes][merged-pr] for details on when your changes will appear on GitHub after your PR has been merged. | ||||
## Changing the color associated with a language | ## Changing the color associated with a language | |||
Many of the colors associated with the languages within Linguist have been in place for a very long time. | Many of the colors associated with the languages within Linguist have been in place for a very long time. | |||
The colors were often chosen based on the colors used by the language at the time and since then users will have become familiar with those colors as they appear on GitHub.com. | The colors were often chosen based on the colors used by the language at the time and since then users will have become familiar with those colors as they appear on GitHub.com. | |||
If you would like to change the color of a language, we ask that you propose your suggested color change to the wider community for your language to gain consensus before submitting a pull request. | If you would like to change the color of a language, we ask that you propose your suggested color change to the wider community for your language to gain consensus before submitting a pull request. | |||
Please do this in a community forum or repository used and known by the wider community of that language, not the Linguist repository. | Please do this in a community forum or repository used and known by the wider community of that language, not the Linguist repository. | |||
Once you've received consensus that the community is happy with your proposed color change, please feel free to open a PR making the change and link to the public discussion where this was agreed by the community. | Once you've received consensus that the community is happy with your proposed color change, please feel free to open a PR making the change and link to the public discussion where this was agreed by the community. | |||
If there are official branding guidelines to support the colour choice, please link to those too. | If there are official branding guidelines to support the colour choice, please link to those too. | |||
## Testing | ## Testing | |||
You can run the tests locally with: | You can run the tests locally with: | |||
```bash | ```bash | |||
bundle exec rake test | bundle exec rake test | |||
``` | ``` | |||
You can test the classifier locally with: | ||||
```bash | ||||
bundle exec script/cross-validation --test | ||||
``` | ||||
Sometimes getting the tests running can be too much work, especially if you don't have much Ruby experience. | Sometimes getting the tests running can be too much work, especially if you don't have much Ruby experience. | |||
It's okay: be lazy and let [GitHub Actions](https://github.com/features/actions) run the tests for you. | It's okay: be lazy and let [GitHub Actions](https://github.com/features/actions) run the tests for you. | |||
Just open a pull request and the bot will start cranking away. | Just open a pull request and the bot will start cranking away. | |||
Here's our current build status: [](https://github.com/github/linguist/actions) | Here's our current build status: [](https://github.com/github/linguist/actions) | |||
## Maintainers | ## Maintainers | |||
Linguist is maintained with :heart: by: | Linguist is maintained with :heart: by: | |||
skipping to change at line 216 | skipping to change at line 216 | |||
- Releases are performed by GitHub staff so we can ensure GitHub.com always stays up to date with the latest release of Linguist and there are no regressions in production. | - Releases are performed by GitHub staff so we can ensure GitHub.com always stays up to date with the latest release of Linguist and there are no regressions in production. | |||
[grammars]: /vendor/README.md | [grammars]: /vendor/README.md | |||
[heuristic]: https://github.com/github/linguist/blob/master/lib/linguist/heuristics.yml | [heuristic]: https://github.com/github/linguist/blob/master/lib/linguist/heuristics.yml | |||
[languages]: /lib/linguist/languages.yml | [languages]: /lib/linguist/languages.yml | |||
[licenses]: https://github.com/github/linguist/blob/9b1023ed5d308cb3363a882531dea1e272b59977/vendor/licenses/config.yml#L4-L15 | [licenses]: https://github.com/github/linguist/blob/9b1023ed5d308cb3363a882531dea1e272b59977/vendor/licenses/config.yml#L4-L15 | |||
[new-issue]: https://github.com/github/linguist/issues/new | [new-issue]: https://github.com/github/linguist/issues/new | |||
[samples]: /samples | [samples]: /samples | |||
[search-example]: https://github.com/search?utf8=%E2%9C%93&q=extension%3Aboot+NOT+nothack&type=Code&ref=searchresults | [search-example]: https://github.com/search?utf8=%E2%9C%93&q=extension%3Aboot+NOT+nothack&type=Code&ref=searchresults | |||
[gpr]: https://docs.github.com/packages/using-github-packages-with-your-projects-ecosystem/configuring-rubygems-for-use-with-github-packages | [gpr]: https://docs.github.com/packages/using-github-packages-with-your-projects-ecosystem/configuring-rubygems-for-use-with-github-packages | |||
[#5756]: https://github.com/github/linguist/issues/5756 | ||||
[merged-pr]: /docs/troubleshooting.md#my-linguist-pr-has-been-merged-but-gitHub-doesnt-reflect-my-changes | ||||
End of changes. 12 change blocks. | ||||
24 lines changed or deleted | 26 lines changed or added |
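
The extension-ordering rule quoted in the "Adding an extension to a language" section (alphabetical, case-sensitive with uppercase before lowercase, primary extension first) can be sketched in Ruby as below. This is only an illustration: `order_extensions` is a hypothetical helper, not part of Linguist, and it assumes plain ASCII extensions, where Ruby's default string comparison already places uppercase before lowercase.

```ruby
# Hypothetical helper (not part of Linguist): orders a language's extensions
# the way the guidelines describe, with the primary extension always first.
def order_extensions(primary, others)
  # String#<=> compares byte by byte, so "C" (67) sorts before "c" (99).
  rest = (others - [primary]).uniq.sort
  [primary] + rest
end

p order_extensions(".c", [".h", ".C", ".cats", ".c"])
# prints [".c", ".C", ".cats", ".h"]
```

Comparing the helper's output with the order already in `languages.yml` is one way to check an entry by hand before opening a pull request.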