
Water Data for the Nation Automated Retrievals

Obtaining USGS Water Data via Automated Methods


USGS Water Data for the Nation Notification Service

USGS Water Data for the Nation is a highly available system. Like any system, it can experience downtime due to scheduled hardware and software upgrades, as well as unplanned network, equipment and power failures. Enhancements to the system can also result in changes to the output-file format that can adversely affect automated retrievals. In addition, water data are collected at more than a million sites around the country that are maintained by different USGS Water Science Center offices. These science centers at times announce that data may be unavailable for certain periods. Although the USGS posts announcements on our public web site when significant issues arise, users performing automated retrievals often do not view the site with a browser and so do not receive these important messages.

If you depend on this system, it is in your best interest to join the USGS Water Data Notifications Service. We will send emails to subscribers of the list with information on any significant planned outages, unexpected system problems as well as any changes to the system that might affect the automated retrieval community. This email list is for announcements only, so you cannot send mail to it.

We provide a simple web-based interface to subscribe and unsubscribe from the Water Data Notifications Service. It is accessible from any web page on this site. Look for the link "Subscribe for system changes" near the bottom right corner of the page.


USGS Water Web Services and how it impacts you

The USGS has heard from many communities that its water data need to be more highly available and easier to acquire. Most data from this site are currently downloaded as tab-delimited (rdb) data files. While this approach works, it uses 20th century approaches rather than 21st century approaches. Extensible Markup Language (XML) is the 21st century's most common means for sharing data.

In addition, the USGS is being requested to provide its water data in friendlier formats, such as native Microsoft Excel spreadsheets, Javascript Object Notation (JSON) for Web 2.0 applications, Keyhole Markup Language (KML) to support integration with products such as Google Maps and Google Earth, and Geographic Information System (GIS) formats.

This transformation is underway but will take many years to complete. It involves creating multiple web services. USGS water web services will be at least as highly available as the data services available on this site. In addition, they will strive to be faster and more flexible, and to allow more data formats. Geography Markup Language (GML), an Open Geospatial Consortium standard XML schema, will be used to describe site data. To the extent practical, the USGS is standardizing on WaterML as a common XML data format for its time series water data. WaterML is a standard by the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI), with whom the USGS has a partnership agreement. Each USGS water web service will include other formats as well, including our legacy tab-delimited (RDB) format.

Over a period of years, you should expect a rich set of web services to replace services found on this site. When new production USGS water web services are announced, you are encouraged to convert any applications to use them instead. While existing data services on the waterdata.usgs.gov domain will continue for the present, it is possible that some years in the future you will need to use the equivalent water service instead.

The USGS has a water services web site with detailed information on these emerging services. Currently, four production services are available, including a popular instantaneous values web service that can be used to retrieve our real-time data, a daily values web service, and a new site service that returns information about USGS hydrologic sites.

Through our notifications service, we will keep you informed of relevant new or enhanced web services, as well as warnings should any of the current data services become deprecated.


Frequently Asked Questions regarding Automated Retrievals

Can I use FTP to get USGS water data?

No. The data on this site are NOT available on an FTP site.

Can I get USGS water data using a web service?

Yes. Instantaneous (real-time) data, daily values data, site information and water quality data are now available via web services. Most of these services can return data in an XML format. See the USGS Water Services web site for full details. All services will be enhanced in the future. Some interim services may not offer the same depth of selection and data attributes available through this site.

Users are encouraged to begin using these services to acquire data where possible as these services are designed to be highly available and in most cases faster than downloading tab-delimited files using this site.

Can I retrieve USGS Water data in XML format on this site?

The new Water Services site (waterservices.usgs.gov) allows instantaneous and daily values data to be downloaded in the XML format. This site (waterdata.usgs.gov) allows only "site-description" data to be retrieved in XML format. Unfortunately, at this time no other data can be retrieved as XML within this site. Water quality data can also be downloaded in XML from a separate site.

What machine-readable data format does this site support?

The principal machine-readable data format supported by this site is a variant of a tab-delimited ASCII file structure called rdb. The rdb file structure consists of a header section containing zero or more comment lines. The rdb header contains important information such as disclaimers, sites, parameters and location names. The header is followed by exactly one tab-delimited column-name row, which is followed by exactly one column-definition row, and a data section consisting of any number of rows of tab-delimited data fields. The header comment lines start with a sharp sign (#) followed by a space character followed by any text desired. Lines are delineated with Windows line endings (\r\n). The fields in the tab-delimited column-name row contain the names of each column. The fields in the tab-delimited column-definition row contain the data definitions and optional column documentation for each column. Data rows must have exactly the same number of tab-delimited columns as both the column-name and column-definition rows. Null data values are allowed.

Example rdb file:

# -------------------------------------------
# Documentation lines. These describe and
# identify the rdb file contents.
# -------------------------------------------
NAME    COUNT   TYP     AMT     OTHER   RIGHT
6s      5n      3s      5n      8s      8s
Bill    44      A       133     Another This
John    44              23      One     Is
Gary    77              77      Here    On
Mark    77      B       244     And     The
Greg    77      D       1111    So      Right

When reading (parsing) water data rdb files, it is important to first parse the column-name row (the first non-comment row in the file) to determine the column position of each data value, as different sites may return data columns in a different order. Information detailing the column-name syntax of water data files is contained in the header comments of the file for each data type. General information regarding the rdb file structure can be found here.
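The parsing order described above can be sketched in Python as follows. This is a minimal illustration, not an official USGS parser: the function name `parse_rdb` is hypothetical, the sample text is taken from the example rdb file above, and the sketch assumes the input holds a single header and table (responses covering several sites may repeat the header section and would need to be split first).

```python
def parse_rdb(text):
    """Parse one rdb table into (comments, column_names, column_defs, rows).

    Rows are returned as dictionaries keyed by column name, so callers never
    depend on column position. Empty fields (nulls) become None.
    """
    comments, table = [], []
    for line in text.splitlines():      # splitlines() accepts \n and \r\n
        if line.startswith("#"):
            comments.append(line)       # header comment line
        elif line != "":
            table.append(line.split("\t"))
    if len(table) < 2:
        raise ValueError("missing column-name or column-definition row")

    names, defs = table[0], table[1]    # first two non-comment rows
    rows = []
    for fields in table[2:]:
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
        rows.append({n: (v if v != "" else None) for n, v in zip(names, fields)})
    return comments, names, defs, rows


# Sample based on the example rdb file shown above (tabs between fields).
SAMPLE = (
    "# Documentation lines.\n"
    "NAME\tCOUNT\tTYP\tAMT\tOTHER\tRIGHT\n"
    "6s\t5n\t3s\t5n\t8s\t8s\n"
    "Bill\t44\tA\t133\tAnother\tThis\n"
    "John\t44\t\t23\tOne\tIs\n"
)

if __name__ == "__main__":
    _, names, _, rows = parse_rdb(SAMPLE)
    print(names)
    print(rows[1]["AMT"], rows[1]["TYP"])
```

Looking values up by column name rather than by index is what protects a script when a site returns its columns in a different order.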

I need very specific data. How do I get just the data I need?

All web queries of this site are done via the Hypertext Transfer Protocol (HTTP) GET method. This is significant because all information defining the query is contained in the various fields and arguments of the URL string. All data in this site can be retrieved with a URL by providing the correct URL-argument specifications. A number of examples are shown at the end of this document.

To begin the process, interactively navigate this site's pages to obtain the tab-delimited data you are interested in and then note the URL syntax. Once you have the URL that precisely describes the data you want, it can be bookmarked, or run in an automated fashion using various tools that may be available in your programming language or operating system.

This site supports a large number of URL arguments, which allow requests to be fine-tuned and generalized -- for example, by station number and specific data parameters of interest. It also supports desired time-periods of the data or the time period of last update. Note that any URL argument with a null value (ex. &variable=) can be ignored; arguments with a non-null value (ex. &variable=flow) are essential.
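As a rough illustration of the note above, a URL copied from the browser can be trimmed down to its essential arguments by dropping every argument whose value is empty. The function name `drop_null_arguments` is hypothetical, and `variable` is only the placeholder argument name used in the note, not a real argument of this site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_null_arguments(url):
    """Return `url` with every empty-valued query argument removed."""
    parts = urlsplit(url)
    # keep_blank_values=True so that "&variable=" is seen and can be filtered
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if value != ""]
    # safe="," leaves comma-separated site lists readable
    return urlunsplit(parts._replace(query=urlencode(kept, safe=",")))


if __name__ == "__main__":
    noisy = ("http://waterdata.usgs.gov/nwis/uv"
             "?format=rdb&variable=&period=7&site_no=08313000")
    print(drop_null_arguments(noisy))
```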

What techniques are used to automate the retrieval of data?

Most users will prefer to use USGS Water Web Services to acquire USGS water data if an appropriate service exists there. However, you can also use this site to retrieve water data. The techniques are the same regardless of which site is hosting the data.

Automated retrievals are made by developing a program or application to submit the appropriate URLs and then parse the results in whatever way is appropriate for the intended use. On this site, some users have developed programs that read the HTML formatted by this site's pages rather than the tab-delimited (rdb) formatted data and scan for the data values amongst the HTML tags. This "screen scraping" approach is generally imprecise and therefore "brittle", is processing intensive, and is not recommended. Even data downloaded as tab-delimited fields can introduce problems when the data format changes. When possible we suggest downloading the appropriate data as XML from the USGS Water Services Web Site rather than this site.

There are numerous ways to automate the downloading of data. Most operating systems come with the ability to automatically perform a task at a regular time. If your computer runs either Linux or some variant of Unix, the cron utility will be of interest. Windows XP has a task scheduler. Windows 7 has a similar scheduling utility. You can use the appropriate utility to run your program.

If your operating system is Linux or some Unix variant, curl and wget are popular utilities for retrieving files over the Internet. wget is also available for Windows. If you are using Windows, it is possible to put the commands in a batch (.bat) file and call it from the Windows task scheduler. In addition, most modern programming languages support functions to retrieve files over HTTP. Check your programming language documentation for more detail.
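For readers working in Python, the standard library's `urllib.request` module is one way to retrieve a file over HTTP. The sketch below is only an illustration: the function name `fetch_text`, the `opener` parameter (present so the function can be exercised without network access) and the User-Agent string are all hypothetical choices, not requirements of this site.

```python
import urllib.request


def fetch_text(url, timeout=30, opener=urllib.request.urlopen):
    """Download `url` and return the response body as text."""
    request = urllib.request.Request(
        url,
        # Hypothetical identifier; a descriptive value makes a script
        # easier to recognize in server logs.
        headers={"User-Agent": "example-retrieval-script/0.1"},
    )
    with opener(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Real-time data for one site, using a URL from the examples section.
    text = fetch_text(
        "http://waterdata.usgs.gov/nwis/uv?format=rdb&period=7&site_no=08313000"
    )
    print(text[:200])
```

A script like this can then be run on a schedule by cron or the Windows task scheduler, as described above.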

Regardless of the means, please take care to write your queries carefully and to run the queries only when necessary. Please follow the best practices outlined below.

How do I get help refining my URL query?

Please tell us what you want to do by sending an email to gs-w_waterdata_support@usgs.gov.

How do I automate the download of tab-delimited water data into Microsoft Excel?

Many users have found that when they retrieve the data in tab-delimited format (rdb) they can create an Excel macro to transform the information so it can be rendered easily in MS Excel. Sorry, we cannot program your macros for you!

More advanced users will find ways of using Visual Basic for Applications (VBA) available in Excel to do programmatic logic. In most cases, it is difficult to fully automate the process.

How do I convert a tab-delimited RDB file to a Microsoft Excel spreadsheet?

Instructions for Excel 2007 and Excel 2010:

Instructions for Excel 2003:

Is there a limit to the amount of data I can retrieve?

Yes, a single request will not return more than 100,000 records, a limitation intended to prevent any one data consumer from unduly affecting other users of the system.

Is there a good time of day to retrieve data?

Yes.  If possible, we prefer that you retrieve information during "off peak" hours. Midnight to 6 AM Eastern Time is ideal.
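A script can check for itself whether it is running in the preferred window. The sketch below assumes the window is midnight (inclusive) to 6 AM (exclusive) Eastern Time; the function names are hypothetical, and `eastern_now` relies on the `zoneinfo` module (Python 3.9 or later) with the system's time zone database.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

OFF_PEAK_START = time(0, 0)   # midnight Eastern Time
OFF_PEAK_END = time(6, 0)     # 6 AM Eastern Time


def is_off_peak(eastern_dt):
    """True if a datetime expressed in Eastern Time falls in the off-peak window."""
    return OFF_PEAK_START <= eastern_dt.time() < OFF_PEAK_END


def eastern_now():
    """Current wall-clock time in the US Eastern time zone."""
    return datetime.now(ZoneInfo("America/New_York"))


if __name__ == "__main__":
    if is_off_peak(eastern_now()):
        print("Off-peak: a good time for large retrievals.")
    else:
        print("Peak hours: consider deferring large retrievals.")
```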

Are there best practices for writing programs to retrieve data?

Absolutely. At times (less frequently these days) we have had to shut down users performing automated retrievals from accessing this site in order to keep our system available. We hate to do it, but the public depends on this system's availability. Here are some tips to help you get data efficiently and reduce the likelihood that you will impact other users of the system:

Thank you for your cooperation.

If I serve USGS data, do I need to give credit to the U.S. Geological Survey?

In general, there is no requirement to attribute USGS data, since the data are in the public domain. However, the USGS strongly encourages those who serve data from this site to credit the USGS following these guidelines.

Please be aware that the USGS logo is a trademarked symbol. As such the logo can only be used if appropriate policies are followed. The USGS maintains a visual identity web site with more information.

The USGS provides valuable timely and scientifically reliable water data to the nation. By crediting the USGS, and by linking your site to the USGS, you help spread the word about USGS water science data. We would appreciate it if you would take a few moments to let us know how you are using USGS water science data and the communities you are serving.

The link to the USGS web site is:

http://www.usgs.gov

The proper link for this site is:

http://waterdata.usgs.gov/nwis

How do I ensure I properly convey the correct meaning of the data?

Because water science can be complex, proper interpretation of the data on this system can be error prone. Much information is available in the Help System. We encourage users to contact us if they have any questions on how to correctly interpret the data.

I need a large amount of data from this system. Can the U.S. Geological Survey (USGS) retrieve the data and send it to me?

Unfortunately, the USGS does not provide National data retrievals for the public; however, this system offers many ways to retrieve and download the data that you need. For your convenience, these data are available 24 hours a day, 7 days a week. If you need assistance in the use of this system to retrieve the data you need, please ask us.

If you require USGS state or U.S. territory data that is not available in this system, in many cases the local USGS Water Science Center can provide it for you. An easy way to contact a local USGS Water Science Center is to navigate to any state page using this system. Select the data category and the geographic area from the upper right-hand corner and press "GO". Once you are on the desired state page, select the "Questions about sites/data?" link at the bottom of the page. By filling out the form, your request will go directly to the local USGS Water Science Center.

How do I get a copy of your Site file?

This system contains detailed information on all the sites (locations) where the USGS and affiliated agencies collect water information. This collection of information is sometimes referred to as the "Site file". Each site contains information including the site location, its period of record, and the type of data collected.

There are nearly 1.5 million sites defined in the USGS National Water Information System. Unfortunately, the USGS does not maintain a single National file for download that contains all this information; however, it is possible through multiple queries of this system to assemble all the information in the Site file.

It is recommended that you use the USGS Site Web Service to acquire site data. Site data can also be acquired on this site.

Either way of acquiring data has the same problem: there is too much data for all site data to be retrieved in a single call. However, data can be acquired slowly over time through repeated calls to different geographical areas. Unfortunately, since there are more than a million sites, you cannot specify a box that covers a large area like the continental United States. One way is to use the system to retrieve a list of stations by one degree of longitude at a time. On this site, it works with the limitation in the system that restricts any query to a maximum of 100,000 records.
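The one-degree approach can be sketched as a generator of bounding boxes, one per degree of longitude. This is only an outline under stated assumptions: the function name `longitude_slices` and the example coordinates are hypothetical, and the sketch deliberately stops short of building URLs, because the argument names for a bounding box should be taken from the URL you obtain by navigating the site or from the service documentation.

```python
def longitude_slices(west, east, south, north, step=1):
    """Yield (west, south, east, north) boxes, each `step` degrees of longitude wide.

    Longitudes are in whole degrees, negative for the western hemisphere.
    """
    lon = west
    while lon < east:
        yield (lon, south, min(lon + step, east), north)
        lon += step


if __name__ == "__main__":
    # Hypothetical region: three one-degree slices.
    for box in longitude_slices(-110, -107, 31, 37):
        # Issue one site query per box here, pausing between requests,
        # and check that each result stays under the record limit.
        print(box)
```

If a slice still exceeds the record limit, it can be split again by latitude in the same way.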

See the example below as a method of acquiring data using this site.

How do I download water quality sample and results data?

Note that there is a web service that allows you to download water quality samples and results. We encourage you to use this service.

Similar data are also available on this site. Click on the Water Quality button from either the national or state page of interest. Then select "Field/lab Samples" and follow the navigation interface. By default, the data will save to a tab-delimited (rdb) file.

The output formats for these data are documented here.

How do I embed a graph of your current streamflow conditions for a site onto my web page?

The URL which retrieves the graph must follow the following syntax:

http://waterdata.usgs.gov/nwisweb/graph?<args>

where <args> should be limited to:

Arguments may be specified in any order and must be separated by a &.

Example URL:

http://waterdata.usgs.gov/nwisweb/graph?agency_cd=USGS&site_no=06025500&parm_cd=00060&period=7

To embed a graph inside a web page, the <img> tag must be used with the src attribute set to the URL of the graph and placed in the desired spot of your HTML. All images are 576x400 pixels. To speed your page's rendering time, it is recommended that these values be specified in the width and height attributes of the <img> tag.

Here is an example:

<img src="http://waterdata.usgs.gov/nwisweb/graph?site_no=06025500&parm_cd=00060" width="576" height="400" alt="USGS Water-data graph for site 06025500" />
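If your pages are generated by a program, the tag can be built rather than typed by hand. The sketch below is one possible approach; the function name `graph_img_tag` is hypothetical. It escapes the URL for use inside an HTML attribute, so the `&` separators appear as `&amp;` in the page source, which browsers decode back to `&` when requesting the image.

```python
from html import escape
from urllib.parse import urlencode

GRAPH_ENDPOINT = "http://waterdata.usgs.gov/nwisweb/graph"


def graph_img_tag(site_no, parm_cd, period=None):
    """Return an <img> tag for a graph, sized 576x400 as stated above."""
    args = {"site_no": site_no, "parm_cd": parm_cd}
    if period is not None:
        args["period"] = period
    url = f"{GRAPH_ENDPOINT}?{urlencode(args)}"
    alt = f"USGS Water-data graph for site {site_no}"
    return (f'<img src="{escape(url)}" width="576" height="400" '
            f'alt="{escape(alt)}" />')


if __name__ == "__main__":
    print(graph_img_tag("06025500", "00060"))
```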

Most sites report measurements hourly. Consequently, there is no point in retrieving data more than hourly. We would appreciate it if you would be kind to our servers by marking up your HTML or passing a proper HTTP header so the browser does not attempt to refresh the page more than once an hour. Here is an example of markup that would do this when placed in the <head> section of your HTML:

<meta http-equiv="refresh" content="3600" />

Do you provide water "widgets" similar to sites like weather.com and wunderground.com?

In general, this site is oriented around providing USGS water data. Users are invited to create any widgets or similar services they wish using data on this site, since the data are in the public domain.

The USGS has a number of communities which help interpret the data on this site. More information is available on the USGS Water site. Some may provide services like widgets in the future.


Examples

Some data are now available as web services. Where a web service exists, each example shows the web service approach (New Way) followed by the legacy approach (Old Way).

When you implement your automated-retrieval script using the old methods, be sure to use host "waterdata.usgs.gov" to ensure the most reliable response. However, if interactively navigating this site to obtain the data you are interested in redirects you to the host "nwis.waterdata.usgs.gov", then use that hostname to access discrete water-quality data, peak streamflow data, and groundwater level data. Remember, any data on this site can be retrieved via automated methods.

Please refer to http://nwis.waterdata.usgs.gov/nwis/help?codes_help for domains.

Examples of URLs used for automated retrieval of real-time data

Retrievals by Period

New Way

You can now use the instantaneous values web service to retrieve real-time data. While you can return tab-delimited (rdb) data using the web service, you can also return data in XML (WaterML schema) or in a Javascript Object Notation (JSON) format. The three examples below use the web service to retrieve the last 7 days of real-time data for all available parameters for site number 08313000, one in each format. There is a test tool available for the service that helps you understand the various outputs and filters, and lets you create a workable query.

Note: One important difference with the web service is that when specifying a period, the service returns data for the x days before the current time, whereas the old data service includes all values back through midnight (site local time) x days ago.

XML (WaterML):

http://waterservices.usgs.gov/nwis/iv?sites=08313000&period=P7D&format=waterml

Tab-delimited (rdb):

http://waterservices.usgs.gov/nwis/iv?sites=08313000&period=P7D&format=rdb

JSON:

http://waterservices.usgs.gov/nwis/iv?sites=08313000&period=P7D&format=json

Old Way

This simple URL retrieves the last 7 days of real-time data for all available parameters for site number 08313000 in tab-delimited (rdb) format:

http://waterdata.usgs.gov/nwis/uv?format=rdb&period=7&site_no=08313000

Retrieving changed data only for a parameter

The following URLs represent the most efficient way to maintain a cache of all of today's real-time streamflow data for a list of sites on your local computer. Each URL shown below will retrieve all of today's streamflow data (parameter code 00060) in tab-delimited (rdb) format for any of the five sites shown that have received updated data in the previous 30 minutes. If only one site has received updated data in the previous 30 minutes, only data for that one site will be returned.

New Way

Up to 100 sites can be specified in each request. With tab-delimited (rdb) data, if no data exists for a site, the headers will appear for the site, but no data will follow. Note that WaterML and JSON formats are also supported with the web service.

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterservices.usgs.gov/nwis/iv
 ?format=rdb
 &sites=06006000,06012500,06016000,06017000,06018500
 &period=P1D
 &modifiedSince=PT30M
 &parameterCd=00060

Old Way

If none of the five sites have received updated data in the previous 30 minutes the URL will return the string "No sites/data available for the selection criteria specified". Up to 20 sites can be specified in each request. The URL shown is intended to be reissued every 30 minutes. To return a different parameter code, modify the &index_pmcode_* argument appropriately. To retrieve data for all parameters for the sites, omit the &index_pmcode_* argument entirely.

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/mt/nwis/uv
 ?multiple_site_no=06006000,06012500,06016000,06017000,06018500
 &result_md=1&result_md_minutes=30
 &index_pmcode_00060=1
 &period=1
 &format=rdb

The RDB output will match the example shown for the new service.
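For the web service, the one-line form of the "changed data only" request can be assembled programmatically. The sketch below is an illustration with a hypothetical function name, `changed_data_urls`; it uses only the arguments shown in the New Way example above and splits long site lists so that no request exceeds the stated limit of 100 sites.

```python
from urllib.parse import urlencode

IV_ENDPOINT = "http://waterservices.usgs.gov/nwis/iv"
MAX_SITES_PER_REQUEST = 100   # limit stated above for the web service


def changed_data_urls(sites, parameter_cd="00060", minutes=30):
    """Build one URL per batch of up to 100 sites, asking only for changed data."""
    urls = []
    for start in range(0, len(sites), MAX_SITES_PER_REQUEST):
        batch = sites[start:start + MAX_SITES_PER_REQUEST]
        query = urlencode(
            {
                "format": "rdb",
                "sites": ",".join(batch),
                "period": "P1D",
                "modifiedSince": f"PT{minutes}M",   # ISO 8601 duration
                "parameterCd": parameter_cd,
            },
            safe=",",   # keep the commas in the site list unescaped
        )
        urls.append(f"{IV_ENDPOINT}?{query}")
    return urls


if __name__ == "__main__":
    sites = ["06006000", "06012500", "06016000", "06017000", "06018500"]
    for url in changed_data_urls(sites):
        print(url)
```

Matching the polling interval to the `modifiedSince` window (every 30 minutes for `PT30M`) keeps the local cache current without requesting unchanged data.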

Retrieving changed data only for all available parameters

Similarly, the following URLs represent the most efficient way to maintain a cache of only the most recent real-time data value for each available parameter for a list of sites on your local computer. As above, each URL will only retrieve data for any of the listed sites that have received updated data in the previous 30 minutes. The URL is intended to be reissued every 30 minutes.

New Way

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterservices.usgs.gov/nwis/iv
 ?format=rdb
 &sites=06006000,06012500,06016000,06017000,06018500
 &period=P1D
 &modifiedSince=PT30M

Old Way

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/mt/nwis/current
 ?multiple_site_no=06006000,06012500,06016000,06017000,06018500
 &result_md=1&result_md_minutes=30
 &period=1
 &format=rdb

Retrieving all real-time sites for a state

The same as above, but for all parameters at all real-time sites in New Mexico that have received updated data in the previous 30 minutes:

New Way

http://waterservices.usgs.gov/nwis/iv?format=rdb&stateCd=NM&modifiedSince=PT30M

Old Way

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/nm/nwis/current
 ?result_md=1
 &result_md_minutes=30
 &format=rdb

Examples of URLs used for automated retrieval of daily value data:

The examples shown will retrieve all daily value streamflow data (parameter code 00060) for site number 06090800 from 2005-01-01 through the present. To obtain the entire period of record, use a start date of 1880-01-01, but be careful because you will receive a lot of data!

From a Start Date

New Way

Use the new daily values web service. Tab-delimited (rdb) output is also supported using format=rdb.

http://waterservices.usgs.gov/nwis/dv?format=waterml,1.1&sites=06090800&startDT=2005-01-01

Note that a test tool is available that helps you create a query, as there are many possible outputs and filters with the service.
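When the start date comes from user input or a configuration file, it is worth validating it before issuing the request. The sketch below is an illustration with a hypothetical function name, `daily_values_url`; it uses only the arguments shown in the example above.

```python
from datetime import date
from urllib.parse import urlencode

DV_ENDPOINT = "http://waterservices.usgs.gov/nwis/dv"


def daily_values_url(site_no, start, fmt="rdb"):
    """Build a daily values URL from a YYYY-MM-DD start date."""
    start_date = date.fromisoformat(start)   # raises ValueError if malformed
    if start_date > date.today():
        raise ValueError("start date is in the future")
    query = urlencode(
        {"format": fmt, "sites": site_no, "startDT": start_date.isoformat()},
        safe=",",   # keeps "waterml,1.1" readable
    )
    return f"{DV_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(daily_values_url("06090800", "2005-01-01", fmt="waterml,1.1"))
```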

Old Way

Data are retrieved in a tab-delimited (rdb) format only. Note that an argument of &end_date=YYYY-MM-DD is also supported.

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/nwis/dv
 ?site_no=06090800
 &cb_00060=on
 &begin_date=2005-01-01
 &format=rdb

For a Period

New Way

To obtain the most recent 60 days of daily value data for a single site in a WaterML XML format, use the following syntax. Tab-delimited (rdb) format is also supported using format=rdb.

http://waterservices.usgs.gov/nwis/dv?format=waterml,1.1&sites=06090800&period=P60D

Old Way

To obtain the most recent 60 days of daily value data for a single site in a tab-delimited (rdb) format, use the following syntax:

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/nwis/dv
 ?site_no=06090800
 &cb_00060=on
 &period=60
 &format=rdb

Examples of URLs used for automated retrieval of site-description information:

New Way

A site water web service is now available but does not yet support an XML format. However, the tab-delimited (rdb) format is supported and is currently the default. Google Earth and Google Maps formats are supported as well using different format parameter values. The following URL will return basic site-description information for site 06090800 in a tab-delimited (rdb) format:

http://waterservices.usgs.gov/nwis/site?format=rdb&sites=06090800

Note that a test tool is available that helps you create a query, as there are many possible outputs and filters with the service.

Old Way

This site also supports a site data service that returns data in well-formed XML. The site data are high level but contain most attributes considered useful.

The following URL will return the specified site-description information for site 06090800 in XML format:

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/nwis/inventory
 ?search_site_no=06090800
 &format=sitefile_output
 &sitefile_output_format=xml
 &column_name=agency_cd
 &column_name=site_no
 &column_name=station_nm
 &column_name=dec_lat_va
 &column_name=dec_long_va
 &column_name=alt_va
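The URL above repeats the `column_name` argument once per requested field. In Python, `urllib.parse.urlencode` with `doseq=True` produces this repeated form from a list, which makes it easy to keep the field list in one place. The function name `site_description_url` is hypothetical; the argument names and values are those shown in the URL above.

```python
from urllib.parse import urlencode

INVENTORY_ENDPOINT = "http://waterdata.usgs.gov/nwis/inventory"


def site_description_url(site_no, columns, output_format="xml"):
    """Build a site-description URL with one column_name argument per field."""
    query = urlencode(
        {
            "search_site_no": site_no,
            "format": "sitefile_output",
            "sitefile_output_format": output_format,
            "column_name": list(columns),   # expanded into repeated arguments
        },
        doseq=True,
    )
    return f"{INVENTORY_ENDPOINT}?{query}"


if __name__ == "__main__":
    fields = ["agency_cd", "site_no", "station_nm",
              "dec_lat_va", "dec_long_va", "alt_va"]
    print(site_description_url("06090800", fields))
```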

The selected site-description information can vary in the output. Navigate the site's interface before setting up the programs that retrieve data to see what site-description field specifications are available. This URL will return the specified site-description information for sites that have daily value streamflow data in the state of New Mexico in tab-delimited (rdb) format:

(URL shown in paragraph format for readability. This would normally appear all in one line.)

http://waterdata.usgs.gov/nm/nwis/inventory
 ?data_type=discharge
 &format=sitefile_output
 &sitefile_output_format=rdb
 &column_name=agency_cd
 &column_name=site_no
 &column_name=dec_lat_va
 &column_name=dec_long_va
 &column_name=state_cd
 &column_name=alt_va