Data Transfer

Last update: January 8, 2024

This guide describes methods of transferring data between TACC resources and your local machine. Transfer methods generally fall into two categories:

  1. Command-line (CLI) tools, e.g. scp, sftp, rsync
  2. Graphical User Interface (GUI) tools, e.g. Globus, Cyberduck

Attention

Globus Users: See Globus v5.4 Transition update

Command-Line Tools

A common way to transfer files between TACC resources and your local machine is through the command line. Whether you use Windows or a Mac, open the terminal application on your local machine to use one of the following Unix command-line (CLI) tools.

Three secure command-line tools can be used for data transfer: scp, sftp, and rsync. You can run them directly from the terminal if your local system runs Linux or macOS.
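To confirm that these tools are available before you start, a quick check along the following lines may help. This is a minimal sketch, assuming a POSIX-style shell on Linux or macOS; the function name check_tools is hypothetical and not part of any TACC tooling.

```shell
# Hypothetical helper: report whether each transfer tool is on your PATH.
check_tools() {
  for tool in scp sftp rsync; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools
```

Any tool reported as "not found" needs to be installed before the corresponding section of this guide will work.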

Note

It is possible to use these command-line tools if your local machine runs Windows, but you will need an SSH client (e.g. PuTTY).

To simplify the data transfer process, it is recommended that Windows users follow the How to Transfer Data with Cyberduck guide as detailed below.

Determining Paths

Before beginning data transfer with command-line tools, you will need to know:

  • the path to your data file(s) on your local system
  • the path to your transfer directory on the remote system

In order to transfer your project data, you will first need to know where the files are located on your local system.

To do so, navigate to the location of the files on your computer, using the Finder application on a Mac or the File Explorer application on Windows. Common locations for user data are the user's home directory, the Desktop, and My Documents.

Once you have identified the location of the files, you can right-click on them and select either Get Info (on Mac) or Properties (on Windows) to view the path location on your local system.

Figure 1. Use Get Info to determine "Where" the path of your data file(s) is

For example, a file located in a folder named portal-data under Documents would have the following path:

On Mac
/Users/username/Documents/portal-data/my_file.txt
On Windows
\Users\username\My Documents\portal-data\my_file.txt
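On Linux or macOS you can also read this path from the terminal instead of a file browser. The sketch below assumes a POSIX-style shell; abs_path is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: print the absolute path of a file.
abs_path() {
  # Resolve the containing directory, then re-attach the file name.
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Example, run from the folder that holds my_file.txt:
#   abs_path my_file.txt
```

The printed value is the local path to use on the source side of the transfer commands in the following sections.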

Transfer with scp

The scp command copies files between hosts on a network. To transfer a file (e.g. my_file.txt) to the remote secure system via scp, open a terminal on your local computer and navigate to the path where your data file is located.

On Mac
localhost$ cd ~/Documents/portal-data/
On Windows
localhost$ cd %HOMEPATH%\Documents\portal-data\

Assuming your TACC username is bjones and you are affiliated with UT Austin, an scp transfer that pushes my_file.txt from the current directory of your local computer to the remote secure system would look like this:

localhost$ scp ./my_file.txt bjones@host:/transfer/directory/path

This command copies your data file directly to your individualized transfer directory on the remote storage system.

If you have not done so already, enter this command in your terminal, replacing the file name, TACC username, and your individualized transfer directory path appropriately.

After entering the command, you will be prompted to log in to the remote secure system by entering the password associated with your TACC account as well as the token value generated by your TACC token app.

A successful data transfer will generate terminal output similar to this:

my_file.txt     100% ##  #.#          KB/s   ##:##

If you wish to learn more about scp, consult the online man page for scp or the file transfer section of the user guide for the appropriate TACC system.
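The same command shape also covers whole folders and downloads. The following is a sketch, assuming your local scp is the OpenSSH client, where -r copies directories recursively and -p preserves modification times. The names push_dir and pull_file are hypothetical helpers, and bjones, host, and /transfer/directory/path are the placeholders used above.

```shell
# Hypothetical helper: upload a directory and everything inside it.
push_dir() {
  # $1: local directory   $2: remote target (user@host:/remote/path)
  scp -r -p "$1" "$2"
}

# Hypothetical helper: download one remote file into the current directory.
pull_file() {
  # $1: remote source (user@host:/remote/path/file)
  scp -p "$1" .
}

# Usage with this guide's placeholders:
#   push_dir ./portal-data bjones@host:/transfer/directory/path
#   pull_file bjones@host:/transfer/directory/path/my_file.txt
```

Each call prompts for your password and token in the same way as the single-file command.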

Transfer with sftp

sftp is a file transfer program that allows you to interactively navigate between your local file system and the remote secure system. To transfer a file (e.g. my_file.txt) to the remote secure system via sftp, open a terminal on your local computer and navigate to the path where your data file is located.

On Mac
localhost$ cd ~/Documents/portal-data/
On Windows
localhost$ cd %HOMEPATH%\Documents\portal-data\

Assuming your TACC username is bjones and you are affiliated with UT Austin, an sftp transfer that pushes my_file.txt from the current directory of your local computer to the remote secure system would look like this:

localhost$ sftp bjones@host:/transfer/directory/path
Password:
TACC Token Code:
Connected to host.
Changing to:
  /transfer/directory/path
sftp>

If you have not done so already, enter this command in your terminal, replacing the TACC username and your individualized transfer directory path appropriately.

You are now logged into the remote secure system and have been redirected to your transfer directory. To confirm your location on the server, enter the following command:

sftp> pwd
Remote working directory:
/transfer/directory/path

To list the files currently in your transfer directory:

sftp> ls
utaustin_dir.txt

To list the files currently in your local directory:

sftp> lls
my_file.txt

Note

The leading l in the lls command denotes that you are listing the contents of your local working directory.

To transfer my_file.txt from your local computer to your transfer directory:

sftp> put my_file.txt
Uploading my_file.txt to /transfer/directory/path
my_file.txt     100% ##  #.#          KB/s   ##:#

To check that my_file.txt is now in your transfer directory:

sftp> ls
my_file.txt
utaustin_dir.txt

To exit out of sftp on the terminal:

sftp> bye
localhost$

If you wish to learn more about sftp, consult the online man page for sftp.

Transfer with rsync

rsync is a file copying tool that can reduce the amount of data transferred by sending only the differences between the source files on your local system and the existing files in your transfer directory. To transfer a file (e.g. my_file.txt) to the remote secure system via rsync, open a terminal on your local computer and navigate to the path where your data file is located.

On Mac
localhost$ cd ~/Documents/portal-data/
On Windows
localhost$ cd %HOMEPATH%\Documents\portal-data\

Assuming your TACC username is bjones and you are affiliated with UT Austin, an rsync transfer that pushes my_file.txt from the current directory of your local computer to the remote secure system would look like this:

localhost$ rsync ./my_file.txt bjones@host:/transfer/directory/path

If you have not done so already, enter this command in your terminal, replacing the TACC username and your individualized transfer directory path appropriately.

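By default rsync prints nothing when a transfer succeeds. If the command finishes with an exit status of 0 (run echo $? immediately afterward to display it), the data transfer was successful.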
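That check can be folded into a small wrapper so the result is reported in words. This is a minimal sketch, assuming a POSIX-style shell; checked_rsync is a hypothetical wrapper name, and the shell variable $? holds the exit status of the most recent command.

```shell
# Hypothetical wrapper: run rsync and report its exit status.
checked_rsync() {
  rsync "$@"
  rc=$?   # 0 means success; any other value signals an error
  if [ "$rc" -eq 0 ]; then
    echo "transfer succeeded"
  else
    echo "rsync failed with exit status $rc" >&2
  fi
  return "$rc"
}

# Usage with this guide's placeholders:
#   checked_rsync ./my_file.txt bjones@host:/transfer/directory/path
```

The wrapper passes its arguments to rsync unchanged, so any rsync options can be placed before the source path.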

If you wish to learn more about rsync and how to synchronize your file transfers, consult the online man page for rsync. For more detailed information on the scp and rsync utilities, see the "Transferring Files" section of the user guide for the TACC system you are using.
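For whole directories, rsync's difference-based copying is usually combined with a few common options. The sketch below assumes a standard rsync: -a (archive mode: recurse and preserve file attributes), -v (verbose), -z (compress in transit), -n (dry run), and --partial (keep partially transferred files) are real rsync options, while sync_dir and its PREVIEW variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: synchronize a local directory to the transfer directory.
sync_dir() {
  # $1: local directory   $2: remote target (user@host:/remote/path)
  # The trailing slash on the source copies the directory's contents
  # rather than the directory itself.
  if [ -n "$PREVIEW" ]; then
    rsync -avzn "${1%/}/" "$2"            # -n: list what would be sent, copy nothing
  else
    rsync -avz --partial "${1%/}/" "$2"   # copy, keeping partial files if interrupted
  fi
}

# Usage with this guide's placeholders:
#   PREVIEW=1 sync_dir ./portal-data bjones@host:/transfer/directory/path
#   sync_dir ./portal-data bjones@host:/transfer/directory/path
```

Running the preview first is a cheap way to check the source and destination paths before any data moves.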

GUI Tools

Cyberduck

Cyberduck is a free graphical user interface for data transfer and is an alternative to using the command line. With a drag-and-drop interface, it is easy to transfer a file from your local system to the remote secure system. You can use Cyberduck for Windows or macOS.

Download and install Cyberduck for your operating system on your local machine.

Windows

Once installed, click "Open Connection" in the top left corner of your Cyberduck window.

Figure 2. Windows Cyberduck and "Open Connection" setup screen

To set up a connection, type in the server name, host. Add your TACC username and password in the spaces provided. If the "More Options" area is not shown, click the small triangle button to expand the window; this allows you to enter the path to your transfer directory, /transfer/directory/path, so that when Cyberduck opens the connection you will immediately be in your individualized transfer directory on the system. Click the "Connect" button to open your connection.

Consult Figure 3 below to ensure the information you have provided is correct. If you have not done so already, replace the "Path" with the path to your individualized transfer directory.

Figure 3. Windows "Open Connection" setup screen

Note

You will be prompted to "allow unknown fingerprint…" upon connection. Select "allow" and enter your TACC token value.

Once connected, you can navigate through your remote file hierarchy using the graphical user interface. You may also drag-and-drop files from your local computer into the Cyberduck window to transfer files to the system.

Mac

Once installed, go to "Bookmark > New Bookmark" to set up a connection.

Note

On macOS, do not use "Open Connection" in the top left corner of your Cyberduck window, because its setup screen lacks the "More Options" button.

To set up a connection using "New Bookmark", type in the server name, host. Add your TACC username and password in the spaces provided. If the "More Options" area is not shown, click the small triangle button to expand the window; this allows you to enter the path to your transfer directory, /transfer/directory/path, so that when Cyberduck opens the connection you will immediately be in your individualized transfer directory on the system. As you fill out the information, Cyberduck will create the bookmark for you. Exit out of the setup screen and click on your newly created bookmark to launch the connection.

Figure 4. macOS "New Bookmark" setup screen

Consult Figure 4 above to ensure the information you have provided is correct. If you have not done so already, replace the "Path" with the path to your individualized transfer directory.

Note

You will be prompted to "allow unknown fingerprint…" upon connection. Select "allow" and enter your TACC token value.

Once connected, you can navigate through your remote file hierarchy using the graphical user interface. You may also drag-and-drop files from your local computer into the Cyberduck window to transfer files to the storage system.


Globus v5.4 Transition

January 8, 2024

Beginning Monday, January 8th, 2024, Globus will be transitioning to version 5.4. This transition affects all TACC researchers who use Globus and requires you to update your profile with your ePPN (eduPersonPrincipalName, an identifier issued by your institution; the acronym itself is nothing to worry about) to continue using the Globus service.

  1. Log in to CILogon and click on "User Attributes" to find your ePPN.

  2. Log in to your TACC user profile and click "ePPN" on the left menu.

  3. Enter the ePPN from Step 1 and save. Allow at least 15 minutes for your change to propagate through the system.

Once you've completed these steps, you will be able to use the Globus File Manager as usual. If you encounter any issues, please submit a support ticket.

Important

Select an endpoint that has "GCS v5.4" in the title.


Globus Data Transfer Guide

Globus provides high-speed, reliable, asynchronous transfers to the portal. It is fast for large volumes of data because it uses multiple network sockets simultaneously. It is reliable for large numbers of directories and files because it can automatically recover from failures by restarting interrupted transfers, and it notifies you only when the transfers have completed successfully.

To get these benefits, there are a few setup steps beyond the normal Data Depot transfer process. Most of these steps are required only once, when you set up Globus for the first time. Several steps must be repeated each time you set up a new computer to use Globus for the portal. Once you are set up, you can use Globus not only for transfers to and from the portal, but also to access other cyberinfrastructure resources at TACC and around the world.

To start using Globus, you need to do two things: generate a unique identifier for all Globus services, and enroll with Globus the machine you are transferring data to or from (this can be your personal laptop or desktop, or a server to which you have access). Follow this one-time process to set up the Globus file transfer capability.

PLEASE NOTE: You must use your institution’s credentials and not your personal Google account when setting up Globus. If you use a personal account, you will encounter an issue with the transfer endpoint (Frontera, Stampede2, Corral, Ranch, etc.).

Step 1: Retrieve and Associate a Distinguished Name (DN) with Your TACC Account

In order for Globus to know who you are when you move data in and out of the CEP portal from your computer, or between any other pair of systems, it needs a unique identifier for you, called a “Distinguished Name” (DN). You can generate a DN instantly and for free. To create a DN, you need to log in through an authoritative source that can verify your identity, typically your university or employer. If you already have a DN from another source, you can use that. If you do not, you can obtain one through many of the major universities in the world via the CILogon service.

To retrieve your DN, go to https://cilogon.org in your browser. Select an Identity Provider from the drop-down list, and click "Log On" which will take you to the login screen for the Identity Provider you selected. If your university or employer is not in the list, we recommend registering for an XSEDE account as XSEDE is a CILogon Identity Provider.

After successfully authenticating at your chosen Identity Provider, you are redirected back to CILogon, where you can find your Certificate Subject that you will need to copy and paste in the next step:

/DC=org/DC=cilogon/C=US/O=University of Texas at Austin/CN=Sample Person A00000

Log in to the TACC User Portal and select "Manage Account" under your login name in the top right corner.

Click "Manage DNs" on the Manage Account page.

You will be presented with a list of the DNs currently associated with your TACC account and a text field for associating a new DN with your account. Enter the Certificate Subject obtained from CILogon.org in the text field and click the "Link DN" button. This will associate the new DN with your account. Please allow up to 30 minutes for this change to propagate across all TACC systems.

Step 2: Activate Your Desktop/Laptop as a Globus Endpoint and Transfer Files

Now that you have associated the DN with your TACC account and given the DN time to propagate to the systems (up to thirty minutes), you can activate the Globus transfer endpoints and begin transferring files.

Go to https://globus.org and log in.

Upon successful login, you will be directed to the "File Manager" landing page.

Click on Endpoints.

Click “+ Create new endpoint” and follow the instructions to set up your desktop/laptop as an endpoint.

Enter a Display Name to identify your local endpoint (e.g. My Laptop, My Desktop at Home), click Generate Setup Key, and then click copy to copy the Personal Setup Key.

Download and Install the Globus Connect Personal client.

After installation, open the Globus Connect Personal application. A pop-up window asks for your setup key. Paste the setup key from the previous step to complete the setup.

Click “File Manager”, then click the Collection field. Choose "Your collections" and click "My Laptop" to select the endpoint you created for your computer.

You can now access the files on your desktop/laptop via Globus.

You can also click on Panels to look at two endpoints at the same time. In the other transfer endpoint, search for "TACC" and select the appropriate allocation storage system (Frontera, Stampede2, Corral, Ranch, etc.) for the desired data.

After successfully authenticating, you will be redirected back to Globus, where you can access your data on the allocation storage system (Frontera, Stampede2, Corral, Ranch).

Examples

To access "My Data", use the appropriate endpoint and set "Path" to the path of your $WORK location on your system.

  • To find that path, run the following commands in a terminal.

    localhost$ ssh username@host
    
    [authenticate with your password and TACC Token]
    
    login2(#)$ cd $WORK
    login2(#)$ pwd
    
  • The output of the pwd command is your path to your $WORK directory.

              Stampede                     Frontera                    Lonestar6
    Endpoint  Stampede                     Frontera                    Lonestar6
    Hostname  stampede2.tacc.utexas.edu    frontera.tacc.utexas.edu    ls6.tacc.utexas.edu

  • To access a project in "My Projects", use the appropriate endpoint and set Path to: /path/to/storage/PORTAL/projects/PORTAL-ProjectIDNumber

    /corral-repl/tacc/aci/FRONTERA/projects/FRONTERA-26

You will find the Project ID on your “My Projects” list in the second column.

If you are viewing a project, the Project ID will be appended to the URL in your browser as:

https://frontera-portal.tacc.utexas.edu/workbench/data/tapis/projects/frontera.project.FRONTERA-23
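The Project ID at the end of such a URL can be turned into the Globus Path with shell parameter expansion. This is a sketch under stated assumptions: project_path is a hypothetical helper, the URL is assumed to end in ".project." followed by the Project ID, and the default storage root /corral-repl/tacc/aci is simply taken from the example path above, so substitute the root that applies to your portal.

```shell
# Hypothetical helper: derive the Globus "Path" for a project from its portal URL.
project_path() {
  # $1: project URL ending in ".project.<PORTAL>-<number>"
  # $2: storage root (optional; defaults to the root in this guide's example)
  id=${1##*.project.}               # strip everything through ".project."
  portal=${id%%-*}                  # the part of the ID before the first hyphen
  root=${2:-/corral-repl/tacc/aci}
  printf '%s/%s/projects/%s\n' "$root" "$portal" "$id"
}

project_path "https://frontera-portal.tacc.utexas.edu/workbench/data/tapis/projects/frontera.project.FRONTERA-23"
# prints /corral-repl/tacc/aci/FRONTERA/projects/FRONTERA-23
```

Paste the printed value into the "Path" field of the appropriate endpoint in the Globus File Manager.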

To access "Community Data", use the appropriate endpoint and set Path to: /path/to/portal/data/PORTAL/community/

  • /corral-repl/tacc/aci/UTRC/community/
  • /corral-repl/tacc/aci/Frontera/community/

You can transfer files between the selected endpoints.

Once the transfer is initiated, Globus displays the task ID for the transfer.

Click Activity to check the status of all transfers you have initiated.

You will also receive an email to the registered email address once the transfer is completed.