USDA Food Data API database builder for self-hosted API access to the USDA Food Data database
npm install usda-food-data-api-builder
NOTE: THIS PACKAGE IS UNRELATED TO THE OFFICIAL USDA FOOD DATA API
builder portion, which downloads the USDA Food Data API JSON archives and uses
ts/downloads.ts file.
mongodb:// uri when prompted.
```shell
npx usda-food-data-api-builder --verbose
Starting usda-food-data-api-builder...
Enter your mongodb:// uri:
```
To skip the mongodb:// uri prompt, create the
./usda-food-data.json file yourself:
```shell
echo "{\"mongouri\":\"mongodb://localhost/usda-food-data\"}" > usda-food-data.json
```
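As a rough illustration of how a config file like this could be consumed, the sketch below reads the `mongouri` field and returns `undefined` when the file is missing or malformed, at which point a program would fall back to prompting. The helper name `readMongoUri` is hypothetical and is not taken from this package's source; only the file name and the `mongouri` key come from the example above.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical helper (not part of this package): returns the stored URI,
// or undefined if the file is missing, is not valid JSON, or has no
// string-valued "mongouri" field.
function readMongoUri(configPath: string): string | undefined {
  try {
    const parsed: unknown = JSON.parse(readFileSync(configPath, "utf8"));
    if (typeof parsed === "object" && parsed !== null) {
      const uri = (parsed as { mongouri?: unknown }).mongouri;
      return typeof uri === "string" ? uri : undefined;
    }
  } catch {
    // Missing file or invalid JSON: treat as "no saved URI".
  }
  return undefined;
}

// Demo in a throwaway directory so no real config file is touched.
const demoDir = mkdtempSync(join(tmpdir(), "usda-config-"));
const demoPath = join(demoDir, "usda-food-data.json");
writeFileSync(
  demoPath,
  JSON.stringify({ mongouri: "mongodb://localhost/usda-food-data" })
);

console.log(readMongoUri(demoPath));
console.log(readMongoUri(join(demoDir, "missing.json")));
```

Returning `undefined` rather than throwing keeps the "file absent" and "file present" paths symmetrical for the caller.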
For example, with mongodb://localhost/usda-food-data the program imports
the data into a database called usda-food-data on the MongoDB instance
hosted on localhost:
```shell
npx usda-food-data-api-builder --verbose
Starting usda-food-data-api-builder...
Enter your mongodb:// uri: mongodb://localhost/usda-food-data
```
The program saves your mongodb:// uri to a usda-food-data.json file in the
current directory. It then downloads and unzips the archives and imports the
documents from the JSON files into the MongoDB database. The number of
documents of each type is as follows:
- FoundationFoodItem has 159 entries
- BrandedFoodItem has 373,897 entries
- SRLegacyFoodItem has 7,793 entries
- SurveyFoodItem has 7,083 entries
```shell
node dist --verbose
Starting usda-food-data-api-builder...
Finished importing 159 documents into FoundationFoodItem.
Finished importing 7793 documents into SRLegacyFoodItem.
Finished importing 7083 documents into SurveyFoodItem.
Finished importing 373897 documents into BrandedFoodItem.
Process completed in 0h 41m 35.55s
```
By default, the program batches saves to MongoDB, which provides a minor
performance boost. If you need each document saved individually, pass the
--no-batch argument to the program.
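The general idea behind batching can be sketched as follows: documents are grouped so that each write carries many of them instead of one. This is a minimal illustration, not the package's actual code. The batch size of 50, the `toBatches` and `importAll` names, and the no-op `fakeSave` stand-in are all assumptions. With mongoose, a call such as `Model.insertMany(batch)` could play the role of `fakeSave`.

```typescript
// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Write every batch in order and report how many writes were issued.
async function importAll<T>(
  docs: readonly T[],
  batchSize: number,
  save: (batch: T[]) => Promise<void>
): Promise<number> {
  let writes = 0;
  for (const batch of toBatches(docs, batchSize)) {
    await save(batch);
    writes += 1;
  }
  return writes;
}

// Hypothetical stand-in for a database write; it does nothing.
const fakeSave = async (_batch: unknown[]): Promise<void> => {};

// 159 placeholder documents (the FoundationFoodItem count listed above).
const sampleDocs = Array.from({ length: 159 }, (_, i) => ({ n: i }));

importAll(sampleDocs, 50, fakeSave).then((n) =>
  console.log(`batched: ${n} writes`) // 4 writes
);
importAll(sampleDocs, 1, fakeSave).then((n) =>
  console.log(`one at a time: ${n} writes`) // 159 writes
);
```

A batch size of 1 reproduces the one-write-per-document behaviour that an option like --no-batch asks for.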
By default, the program attempts to remove duplicate copies of documents.
This speeds up the process, since most of the time is spent having mongoose
normalize documents for insertion. The trade-off is that the program needs
memory for the { [key: number]: mongoose.ObjectId } data structure. On
the Windows 64-bit machine used to develop this, that is about 1.5GB. See
the Releases section below if you just need the data and want to avoid the
memory requirement.
You can also pass the --no-link argument. This is unsupported at the moment,
but it skips the caching step. This results in a much larger database, as
every JSON object in the USDA Food Data JSON files is added as a Document.
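To make the caching idea concrete, here is a small sketch of a number-keyed lookup that stores each distinct record once and hands back the stored id on every later sighting. It is an illustration under assumptions, not the package's implementation: `LinkCache`, `SubDoc`, and the string ids standing in for `mongoose.ObjectId` are all hypothetical, and the records are placeholders rather than real USDA data.

```typescript
// Stand-in for mongoose.ObjectId so the sketch runs without a database.
type FakeObjectId = string;

// Hypothetical nested record that can appear under many parent documents.
interface SubDoc {
  id: number;
  name: string;
}

class LinkCache {
  // Same shape as the structure described above: numeric key -> stored id.
  private readonly ids: { [key: number]: FakeObjectId } = {};
  private inserted = 0;

  // Return the id of the stored copy, "inserting" only on first sight.
  link(doc: SubDoc): FakeObjectId {
    const cached = this.ids[doc.id];
    if (cached !== undefined) {
      return cached; // duplicate: reuse the existing document
    }
    this.inserted += 1;
    const objectId = `oid-${this.inserted}`;
    this.ids[doc.id] = objectId;
    return objectId;
  }

  get insertedCount(): number {
    return this.inserted;
  }
}

const cache = new LinkCache();
const incoming: SubDoc[] = [
  { id: 1, name: "first" },
  { id: 2, name: "second" },
  { id: 1, name: "first" }, // repeat of the first record
];

const refs = incoming.map((doc) => cache.link(doc));
console.log(refs); // [ 'oid-1', 'oid-2', 'oid-1' ]
console.log(cache.insertedCount); // 2
```

Without the cache, all three records would be stored, which is the larger-database outcome described for --no-link. The memory cost comes from keeping one cache entry per distinct key for the whole run.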
On an AMD FX8120 CPU, the process completes in an hour with default settings.
If you are getting errors while downloading and uncompressing the archives, or
parsing the JSON files, try removing the files in the data directory and
redownloading. Please be mindful when downloading archives.
If you are authenticating via the admin database, make sure you include
?authSource=admin in your mongodb:// uri, for example:
```
mongodb://user:very_secure_random_password@localhost/usda-food-data?authSource=admin
```
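One way to check a URI before running the import is to parse it and inspect its parts. The sketch below uses Node's built-in WHATWG `URL` class. The `describeMongoUri` helper is hypothetical, and it only handles single-host mongodb:// URIs (not comma-separated replica-set host lists or mongodb+srv://).

```typescript
import { URL } from "url";

interface MongoUriSummary {
  host: string;
  database: string;
  authSource: string | null;
  hasCredentials: boolean;
}

// Hypothetical helper: summarize the parts of a single-host mongodb:// URI
// that matter when authenticating.
function describeMongoUri(uri: string): MongoUriSummary {
  const url = new URL(uri);
  if (url.protocol !== "mongodb:") {
    throw new Error(`expected a mongodb:// uri, got ${url.protocol}//`);
  }
  return {
    host: url.host,
    // The path after the leading "/" is the database name.
    database: decodeURIComponent(url.pathname.replace(/^\//, "")),
    // null when the option is absent from the query string.
    authSource: url.searchParams.get("authSource"),
    hasCredentials: url.username !== "",
  };
}

const summary = describeMongoUri(
  "mongodb://user:very_secure_random_password@localhost/usda-food-data?authSource=admin"
);
console.log(summary.database, summary.authSource);

// Flag the case this section warns about: credentials but no authSource.
const bare = describeMongoUri("mongodb://user:pw@localhost/usda-food-data");
if (bare.hasCredentials && bare.authSource === null) {
  console.log("credentials given without ?authSource=...");
}
```

The warning condition is only a heuristic: a missing authSource is fine when the user is defined on the target database itself.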
This was written using Node v16.14.2 and TypeScript v4.6.2. Using older
versions may or may not work.
Releases
Since this process is intensive, releases are provided. Each release corresponds
to the data URLs in that version's ts/downloads.ts file. These releases are just
mongodump --gzip backups of the usda-food-data database. An example of using
mongorestore to restore the database to a MongoDB instance on localhost:
```shell
mongorestore --host=localhost --port=27017 --gzip \
  --archive=usda-food-data-api-linked-v1.0.3.tar.gz
```