Introducing CI Sentinel!

A comprehensive map of America's critical infrastructure.

Methods

I created this map by collecting data from several datasets, applying distributed weights to the facilities in each one, and then weighting each facility according to its own details.

Classification of Importance by Industry

One of the most challenging aspects of this project is classifying and ranking each facility by its importance to national security, that is, ranking how "critical" a facility is to the security of the United States. For example, NAICS code 336414 is "Guided Missile and Space Vehicle Manufacturing," so facilities with this classification are significantly more important and should be weighted much more heavily than, say, a facility with NAICS code 111150, "Corn Farming." However, as of now only 3 facility types are weighted; this will be fixed with time.
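A minimal sketch of how a NAICS-based weight lookup could work. The weight values, the fallback, and the names `NAICS_WEIGHTS`, `DEFAULT_WEIGHT`, and `naics_weight` are illustrative assumptions, not the values or code used in this project.

```python
# Hypothetical weights: the numbers below are illustrative assumptions,
# not the ones used in weights.ipynb.
NAICS_WEIGHTS = {
    "336414": 10.0,  # Guided Missile and Space Vehicle Manufacturing
    "111150": 1.0,   # Corn Farming
}
DEFAULT_WEIGHT = 0.0  # assumed fallback for industries that are not graded yet


def naics_weight(code):
    """Return the weight for a NAICS code, or the fallback if it is ungraded."""
    return NAICS_WEIGHTS.get(str(code).strip(), DEFAULT_WEIGHT)
```

A table like this keeps the manual grading in one place, so adding a newly graded industry is a one-line change.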

Industry classification aside, it was much easier to rank the criticality of non-commercial and governmental facilities. This was done mainly by measuring the raw metrics of each facility and giving more weight to facilities with higher metrics.
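One possible way to turn a raw metric into a weight is a linear (min-max) rescale, sketched below. This is an assumption for illustration; the exact formula used in the notebooks is not described here, and the function name `scale_metric` and the 1 to 10 range are hypothetical.

```python
def scale_metric(values, low=1.0, high=10.0):
    """Linearly rescale raw metric values into the range [low, high].

    Higher raw metrics map to higher weights. The range is an assumption.
    """
    if not values:
        return []
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Every facility has the same metric, so they all get the same weight.
        return [low for _ in values]
    span = vmax - vmin
    return [low + (v - vmin) / span * (high - low) for v in values]
```

For example, raw metrics of 0, 50, and 100 would map to weights of 1.0, 5.5, and 10.0 under these assumed bounds.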

Notebooks

Here is a list of my notebooks and the purposes each one served.

cleanGeoJSONs.ipynb

This notebook removes unnecessary fields from the GeoJSONs. Each dataset contains fields that are not needed to measure the importance of critical infrastructure, such as phone numbers and addresses. These fields took up a lot of storage space and memory, so removing them was necessary.
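The field removal described above can be sketched with plain dictionaries, as they would come out of `json.load`. The function name `drop_fields` and the field names `PHONE` and `ADDRESS` in the usage note are hypothetical; the actual notebook may work differently.

```python
def drop_fields(feature_collection, unwanted):
    """Return a copy of a GeoJSON FeatureCollection without the unwanted properties."""
    unwanted = set(unwanted)
    cleaned = []
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}
        # Keep only the properties that are not in the unwanted set.
        kept = {key: value for key, value in props.items() if key not in unwanted}
        cleaned.append({**feature, "properties": kept})
    return {**feature_collection, "features": cleaned}
```

Calling `drop_fields(data, ["PHONE", "ADDRESS"])` would leave geometry and all other properties untouched and would not modify the input.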

standardizeGeoJSONs.ipynb

This notebook converts any GeoJSON file that is not in EPSG:4326 to that coordinate reference system. GeoJSONs come in many different coordinate reference systems (CRSs), such as EPSG:3857, EPSG:7789, or EPSG:31983. Coordinates in different CRSs cannot be combined directly, so they need to be standardized into a single one. For this project I chose EPSG:4326 because it is easy to read.
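To illustrate what such a conversion does, here is a sketch for one case only: EPSG:3857 (Web Mercator, in meters) to EPSG:4326 (longitude and latitude in degrees). The function name is hypothetical. In practice a library such as GeoPandas (`GeoDataFrame.to_crs`) or pyproj handles arbitrary CRSs, and this sketch is not the notebook's code.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def web_mercator_to_wgs84(x, y):
    """Convert an EPSG:3857 point (meters) to EPSG:4326 (lon, lat in degrees)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    # Inverse of the spherical Mercator projection for latitude.
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat
```

The result is returned as (lon, lat), which matches the coordinate order GeoJSON uses.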

geoJSONtoGPKG.ipynb

This notebook was created to convert all of the GeoJSONs to a binary format (GeoPackage) in order to process them faster. The converted files did not speed up the process and I could not figure out why, so I would love contributions on this.

weights.ipynb

This is the one where the magic happens. This notebook takes in the cleaned datasets, calculates the weight of each facility (giving a weight to each object), and then deletes every field other than "weight" and "geometry." Deleting the other fields proved critical for this application; otherwise it would take an absurd amount of time to process these large files and plot everything onto the map.
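As a rough illustration of the step described above, the sketch below computes a weight per feature and keeps only "weight" and "geometry". The function name `to_weighted_features` and the `weight_fn` callback are assumptions; how each dataset's weight is actually computed is specific to the notebook.

```python
def to_weighted_features(feature_collection, weight_fn):
    """Reduce each GeoJSON feature to its geometry plus a single weight property.

    weight_fn is a caller-supplied (hypothetical) function that maps a
    feature's properties dict to a number.
    """
    slim = []
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}
        slim.append({
            "type": "Feature",
            "geometry": feature["geometry"],
            "properties": {"weight": weight_fn(props)},
        })
    return {"type": "FeatureCollection", "features": slim}
```

Because every output feature has the same two-part shape, the map notebook can load all datasets the same way regardless of their original fields.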

map_creator_best.ipynb

This notebook creates the map at the county, state, and national level. All of the weighted datasets are loaded into memory, and the notebook plots each county and state alphabetically. In cell 5, the whole nation is processed and the legend is created. In cell 6, all of the counties in America are processed alphabetically. In cell 7, all of the states are processed. For each county and state, the notebook creates a folder in which 4 objects are stored. The first is a map of all of the critical infrastructure in that county or state. The second is a text file listing the datasets used, the ones not used, and the ones that ran into errors. The third is a map of the top 20 most important facilities in each county by weight. The last is a map of the top 5 aggregate points for each county. This process is repeated for the states in cell 7. Note that I used Google Colab Pro with High-RAM in order to speed up the process.
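Selecting the highest-weighted facilities can be sketched with the standard library, assuming each feature stores its weight under `properties["weight"]`. The function name `top_facilities` is hypothetical and this is not necessarily how the notebook does it.

```python
import heapq


def top_facilities(features, n=20):
    """Return the n features with the largest weight, highest first."""
    return heapq.nlargest(n, features, key=lambda f: f["properties"]["weight"])
```

`heapq.nlargest` avoids sorting the full list, which may help when a county or state contains many features.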

List of sectors and datasets:

Communications

- Cell Towers

- Microwave Service Towers

Education

- Colleges and Universities

- Private Schools

Emergency Services

- Local Law Enforcement

- Fire and Emergency Services

- State Emergency Services

Energy

Electric

- Electric Power Grid and Transformers

Gas

- Above Ground LNG Storage
- Biodiesel Plants
- DOE Petroleum Reserves
- Ethanol Plants
- Hydrocarbon Gas Liquid Pipelines
- LNG Import and Export Terminals
- Natural Gas Compressor Stations
- Natural Gas Pipelines
- Natural Gas Processing Plants
- Peak Shaving Facilities
- Petroleum Terminals

Oil

- Oil and Natural Gas Wells
- Oil Refineries

Economy and Finance

- County Business Patterns

- Federal Reserve Banks

- FDIC Insured Banks

- Gold Bullion Depositories

Geographic

- Counties and State Lines GeoJSON

This dataset carries no weight; it is simply a GeoJSON outline of each county and state.

Food

- Public Refrigerated Warehouses

Government

- Courthouses

- Major State Government Buildings

- State Capitol Buildings

- US Army Corps of Engineers Offices

Healthcare

- Health Facilities

Industry

- Fortune 500 Headquarters

- Manufacturing Facilities

Military

- Military Installations

Mines

- Agricultural Mineral Operations

- Construction Mineral Operations

- Ferrous Metal Mines

- Ferrous Metal Processing Plants

- Mines and Mineral Resources

- Non-ferrous Metal Mines

- Non-ferrous Metal Processing Plants

- Refractory Abrasive and Other Industrial Mineral Operations

- Sand and Gravel Operations

- Uranium Deposits

Population

- Population by Tract

Transportation

- Airports

- Bridges

- Ports

- Railroads

- Roads

- Spaceports

Waste

- Solid Waste Landfill Facilities

Water

- Aquifers

- Dams

- USACE Owned and Operated Reservoirs

Contribution and Improvements

I am making this repository open source because an issue as important as critical infrastructure must be mapped out for the average person to see and measure. Contributions from anyone are more than welcome. Below is a current list of items that I was not able to find or measure; I would welcome any solution to these issues.

Other HIFLD Datasets:

There are many other datasets from HIFLD that I have not processed, over 400 to be exact. I would love help in expanding the map to cover all of these datasets.

All Manufacturing Facilities

The General Manufacturing Facilities dataset has a lot of variability, and all of the NAICS codes need to be graded manually one by one. There are over 65,000 of them as well.