Running locally
Quickstart
To use the tool for your own set of domains:
- Clone the repository
- Install NodeJS
- Run npm install
- Delete CNAME
- Update the User-Agent header in the headers object in scraper.js
- (Optional) Use custom domains or disable www validation
- Run node scraper
  - To only run one domain from the list, append the domain name to the end of the command (example: node scraper domain.gov)
  - Make sure to run the scraper with all domains first
- Run with or without the website
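For the User-Agent step above, the edit might look roughly like the following sketch. The shape of the headers object, the buildHeaders helper, and the contact string are all assumptions for illustration; check scraper.js for the fields it actually defines.

```javascript
// Hypothetical sketch: the real headers object in scraper.js may hold other fields.
// buildHeaders is an illustrative helper, not part of the repository.
function buildHeaders(contact) {
  if (!contact || !contact.trim()) {
    throw new Error("Provide a contact string so site owners can identify your scraper");
  }
  return {
    // Identify your own deployment instead of reusing the original project's value.
    "User-Agent": `my-site-scanner (${contact.trim()})`,
  };
}

const headers = buildHeaders("webmaster@example.org");
console.log(headers["User-Agent"]);
```

Including a contact address in the User-Agent is a common courtesy for crawlers, since it gives site owners a way to reach you if the scraper causes problems.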
Running with the website
- Install Jekyll
- Run bundle exec jekyll serve to start the website
  - The URL of the local site will be printed
- (Optional) Publish to GitHub
Single-level government
If your organization doesn’t have sub-organizations:
- Open /_includes/jumbotron.html and search for single-level
- Delete or comment out everything between the two comments
- Open /scripts/home.js
- Set MULTI_LEVEL to false
Running without the website
If you don’t want a graphical version and only want to use the data (located in /data), delete:
- Any folder starting with _
- /assets
- /blog
- /css
- /scripts
- _config.yml
- Gemfile and Gemfile.lock
- /.jekyll-cache if it exists
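The cleanup above could also be scripted. The sketch below is not part of the repository: removeWebsiteFiles is a hypothetical helper that collects the paths from the list and, by default, only reports what it would delete. It assumes Node 14.14 or later for fs.rmSync and that it is run from the repository root.

```javascript
const fs = require("fs");
const path = require("path");

// Fixed paths taken from the list above; folders starting with "_" are discovered at run time.
const FIXED_PATHS = [
  "assets", "blog", "css", "scripts",
  "_config.yml", "Gemfile", "Gemfile.lock", ".jekyll-cache",
];

// Hypothetical helper: returns the sorted list of website-only paths found in root.
// Nothing is deleted unless dryRun is explicitly set to false.
function removeWebsiteFiles(root, dryRun = true) {
  const underscoreDirs = fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && entry.name.startsWith("_"))
    .map((entry) => entry.name);

  // Keep only paths that exist, so optional ones like .jekyll-cache are skipped quietly.
  const targets = [...new Set([...underscoreDirs, ...FIXED_PATHS])]
    .filter((name) => fs.existsSync(path.join(root, name)));

  if (!dryRun) {
    for (const name of targets) {
      fs.rmSync(path.join(root, name), { recursive: true, force: true });
    }
  }
  return targets.sort();
}

// Dry run: prints what would be removed from the current directory.
console.log(removeWebsiteFiles(process.cwd()));
```

Review the dry-run output before calling removeWebsiteFiles(process.cwd(), false), since the deletion is recursive and cannot be undone outside of version control.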
Customization
Custom domains
To use your own list of domains:
- Open data/domains.csv
- Delete all lines except for the first
- Add in your domains in the format domain,agency (example below)
  - Domains can be any case
  - Domains shouldn’t start with http or www
domain1.gov,Department of X
domain2.gov,Department of Y
domain3.gov,Department of Z
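If your source list contains full URLs, a small helper can convert them to the expected format before you paste them into data/domains.csv. This is a minimal sketch, not part of the repository: toCsvLine is a hypothetical name, and it assumes agency names contain no commas, since how the scraper parses quoted CSV fields is not stated here.

```javascript
// Hypothetical helper, not part of the repository: turns a pasted URL or
// hostname into the bare "domain,agency" form described above.
function toCsvLine(rawDomain, agency) {
  const domain = rawDomain
    .trim()
    .toLowerCase()                // case does not matter, so normalize for consistency
    .replace(/^https?:\/\//, "")  // drop a leading http:// or https://
    .replace(/^www\./, "")        // drop a leading www.
    .replace(/\/.*$/, "");        // drop any path after the hostname
  return `${domain},${agency.trim()}`;
}

console.log(toCsvLine("https://www.Domain1.gov/about", "Department of X"));
// prints "domain1.gov,Department of X"
```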
To add a single domain:
- Run the scraper as node scraper new-domain.gov
- Fill in the organization name
Adding/removing metadata parameters
To add/remove parameters from the data:
- Open scrapers/metadata.js
- Edit the properties, variables, and csvVariables arrays
  - Make sure each index lines up (first element in properties matches first element in variables)
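Because the arrays are matched by position, an entry added to one but not the others shifts everything after it. A quick length check, sketched below, can catch that early. findLengthMismatches is a hypothetical helper and the array contents are made-up placeholders, not the real values in scrapers/metadata.js.

```javascript
// Hypothetical sanity check, not part of the repository.
// Takes an object of named arrays and reports any whose length differs from the first.
function findLengthMismatches(arrays) {
  const entries = Object.entries(arrays);
  const expected = entries[0][1].length;
  return entries
    .filter(([, values]) => values.length !== expected)
    .map(([name, values]) => `${name} has ${values.length} items, expected ${expected}`);
}

// Placeholder values for illustration only.
const properties = ["og:title", "og:description"];
const variables = ["ogTitle", "ogDescription"];
const csvVariables = ["ogTitle"]; // deliberately one item short

const problems = findLengthMismatches({ properties, variables, csvVariables });
console.log(problems);
// prints [ 'csvVariables has 1 items, expected 2' ]
```

This only confirms that the arrays have the same length; whether each position refers to the same parameter still needs a manual read-through.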
To add/remove parameters to the site:
- Make sure the parameter is in the data
- Open scripts/variables.js
- Add an item in the properties, names, variables, and descriptions arrays
  - Make sure the name in variables matches the one in the data
  - Make sure each index lines up
Removing WWW validation
If you don’t want to check for www canonicalization:
- Open scrapers/url.js
- Change CHECK_WWW to false
- Open scripts/util.js
- Change CHECK_WWW to false