FAQs for the Personal Sequence Database
The Personal Sequence Database is referred to below as 'PSD'.
- How do I register for a Personal Sequence Database account?
- How do I change my PSD password?
- What if I forgot my PSD password?
- How do I enter a single sequence into my PSD?
- How do I enter multiple sequences into my PSD?
- How do I delete sequences from my PSD?
- How long can my sequences be?
- How many sequences can my PSD contain?
- How do I enroll a sequence in BLASTAgent, the automatic BLAST search system?
- When are the automatic BLAST searches run?
- I get too many BLAST hits. How do I make my BLAST searches more specific?
- How do I change my BLAST parameters?
- I want to keep track of BLAST hits, but I don't want to enroll my sequence in BLASTAgent. How do I do it?
- What's the difference between the 'Initialize BLAST' and 'Search for New Hits' buttons?
- How do I use NCBI Entrez queries in my BLAST searches?
- How do I see the BLAST hits for a sequence?
- How do I see only new BLAST hits?
- How do I view the BLAST hit alignments?
- I get a list of BLAST hits, but when I try to view an alignment I get a "no significant similarity was found" message. What's up?
- How do I make multiple sequence alignments of the BLAST hits?
- How do I retrieve the sequences of BLAST hits?
- What's the PSD Job Queue?
- How do I identify protein domains?
- What are sequence notes?
- What are sequence groups?
- How do I make a sequence group?
- How do I add sequences to a group?
- How do I delete sequences from a group?
- How do I delete a sequence group?
- I’m an advanced user. Can I use SOAP to access the PSD?
- How do I see all my sequence groups?
- Is there a way to submit more than one PSD sequence to a BLAST search?
Answers:
-
To register for the PSD, you can use the PSD registration web page. You will gain immediate access to the PSD after completing the registration form.

The PSD User Menu
-
To change your PSD password, follow these steps. After logging into the PSD from the PSD Home page, follow the User Admin link in the PSD User menu along the left of the page. Towards the bottom of the page, enter the new password into both boxes of the Change Password section and submit the form.
-
If you forget your PSD password, we can reset it to a temporary one that you can then change to something more familiar. Use the Reset Password tab on the PSD Home page. To generate a new temporary password, you will need to know the email address you used when registering for the PSD. If you have multiple email addresses, enter each until one is recognized by the PSD. After receiving the email, log in to the PSD using your username and the temporary password. Once logged in, change your password as described above.
-
After you log in to the PSD, click on the Add A Sequence link. Fill in the boxes for sequence name and description (optional). Paste your sequence into the large sequence box. Make sure to designate your sequence as either DNA or protein. Press the "Add this sequence" button to store the sequence in your PSD. If desired, enter information about the type of BLAST search you want to run for this sequence. You can enter or change the BLAST search parameters at any time after the sequence has been entered into your PSD. Whenever you make a change, you must press the "Enter Changes" button for the changes to take effect.

My PSD view
-
To enter multiple sequences at once into the PSD, you will need to have your sequences in FASTA format. FASTA format is of the form:
>identifier description
MDDEGEFVLYLRSLTEMILKFGIERILSSPYPCPSPTISTPATSPSSISPTFASPNGTPN
IASSMYPAWVFSTRYSDRPSAGPRHRKSRKRESTGSSGSSEEEKRPRTAFTGDQLDRLKT
EFRESRYLTEKRRQELAHELGLNESQIKIWFQNKRAKLKKSTSSVPRDRCSSVTPNPHNH
PSIHGGYQLMAQLAKVQARAYMP
The first line for each sequence contains the sequence identifier and an optional sequence description. Every character after the > until the first space character defines the sequence identifier. The maximum length for sequence identifiers in the PSD is 20 characters. The sequence description can be up to 254 characters. The residues of the biological sequence begin on the very next line, with NO EMPTY LINE in between. If you have multiple sequences, each sequence entry should be formatted in this way and then all individual sequence entries can be concatenated into one file.
Once your sequences are formatted properly, use the PSD Bulk Load program. Either choose a file from your desktop computer containing the sequences or paste the sequences into the input box, optionally choose a sequence group, and submit the form. Once the sequences are added, you should see a list of all the sequence identifiers and descriptions (if present) from the sequences. You can find a link to "Add Many Sequences" in the PSD navigation menu. Note that you can add single sequences using the PSD Bulk Load program also.

The PSD Navigation Menu
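Before using Bulk Load, it can help to check a FASTA file against the limits quoted in this FAQ. The following is a minimal sketch in Python, not part of the PSD itself: the function names `parse_fasta` and `check_record` are made up for illustration, and the limits are simply the ones stated on this page (20-character identifiers, 254-character descriptions, 5,000 residues for general users).

```python
# Limits quoted in this FAQ; all names here are illustrative, not PSD code.
MAX_ID_LEN = 20
MAX_DESC_LEN = 254
MAX_SEQ_LEN = 5000


def parse_fasta(text):
    """Return a list of (identifier, description, residues) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Blank lines are skipped in this sketch; the PSD itself expects
            # none between a header line and its residues.
            continue
        if line.startswith(">"):
            # The identifier is everything after '>' up to the first space.
            identifier, _, description = line[1:].partition(" ")
            records.append([identifier, description.strip(), ""])
        elif records:
            records[-1][2] += line
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [tuple(record) for record in records]


def check_record(identifier, description, residues):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if not identifier:
        problems.append("missing identifier")
    if len(identifier) > MAX_ID_LEN:
        problems.append(f"identifier longer than {MAX_ID_LEN} characters")
    if len(description) > MAX_DESC_LEN:
        problems.append(f"description longer than {MAX_DESC_LEN} characters")
    if not residues:
        problems.append("no residues")
    if len(residues) > MAX_SEQ_LEN:
        problems.append(f"sequence longer than {MAX_SEQ_LEN} residues")
    return problems


if __name__ == "__main__":
    example = ">seq1 example protein\nMDDEGEFVLY\nLRSLTEMILK\n"
    for record in parse_fasta(example):
        print(record[0], check_record(*record) or "OK")
```

Records that come back with an empty problem list fit the limits described above; anything else is worth fixing before you submit the form.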
-

Section of Sequence Table View
To delete sequences from your PSD, choose the "View a table of all my sequences" link, or one of your sequence group links from the My PSD page. In the table view, the left column allows you to select any number of sequences from the table. At the bottom of the table, you can press the "Toggle Retrieve/Delete Menu" button to display a menu. One of the choices on this menu is "Delete". So, select the sequences you want to delete and select Delete on the menu and press the "Submit" button. This will delete the selected sequences from your PSD. Note that there will be no confirmation message – the sequences will be deleted immediately.
-
The maximum length for any single sequence in the PSD is 5,000 residues. OSU students, faculty and staff can store sequences up to 2MB.
-
Every PSD user can store up to 5,000 sequences. OSU students, faculty and staff can store up to 25,000 sequences.
-
To enroll a sequence in BLASTAgent, follow these steps. From a table view of sequence entries, click on the "Name" link for a PSD sequence, which will take you to the Edit page. Make sure the "Sequence Type" is correct. Select the type of BLAST search you want to run. Select the E-value filter value. Note that it is usually a good idea to restrict BLAST searches with a fairly low E-value, e.g. 1e-06. Select "Yes" from the "Enroll in BLASTAgent?" menu. After you've made your selections, press the "Enter Changes" button; the BLAST parameters are not saved until you do. Then press the "Initialize BLAST" button to register the current BLAST hits. After pressing the "Initialize BLAST" button, you will be directed to the PSD Job Queue. An option for BLAST searches is to use NCBI Entrez terms.
-
Automatic BLAST searches are currently run on a weekly basis. The searches are started around 8 PM (Pacific Time Zone, USA) on Tuesdays. Note that users can initiate searches for new hits manually at any time. On the PSD Edit page, select the "Search for New Hits" button. Pressing this button first sets all current BLAST hits to "old" and then runs a new BLAST search. Any BLAST hit identified that didn't already exist for the PSD sequence entry will be stored and tagged as "new".

PSD Edit View
-
The easiest way to make your BLAST searches more specific is to specify a lower E-value. A good initial E-value is 1e-06. This sets the maximum E-value to accept at 1e-06, so only hits with an E-value at or below that threshold are kept. An E-value of 1e-06 indicates that, in the database of sequences you searched, you would expect to find a hit with an alignment score at least that good by chance only about 1 in 1,000,000 times. See the NCBI Handbook for a discussion of BLAST scoring.
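To make the effect of the E-value filter concrete, here is a small Python sketch. It is only an illustration: the hit identifiers and E-values are invented, and `filter_hits` is a hypothetical helper, not something the PSD exposes.

```python
def filter_hits(hits, max_evalue=1e-06):
    """Keep only hits whose E-value is at or below the threshold."""
    return [hit for hit in hits if hit["evalue"] <= max_evalue]


# Invented example hits, for illustration only.
hits = [
    {"id": "hitA", "evalue": 3e-40},
    {"id": "hitB", "evalue": 2e-07},
    {"id": "hitC", "evalue": 0.004},
]

if __name__ == "__main__":
    for hit in filter_hits(hits):
        print(hit["id"], hit["evalue"])
```

Lowering `max_evalue` further (say to 1e-10) keeps fewer, stronger hits, which is the behavior you want when you are getting too many.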
-
You can change your BLAST search parameters at any time from the PSD Edit page. Note that after changing any of the BLAST search parameters, you must press the "Enter Changes" button.
-
To run BLAST searches without enrolling the sequence in the BLASTAgent, follow all the steps outlined above without selecting "Yes" in the "Enroll in BLASTAgent?" menu on the PSD Edit View page.
-
On the PSD Edit View page, you will see two related buttons: "Initialize BLAST" and "Search for New Hits". The "Search for New Hits" button has an effect only if a BLAST search has already been run for the specific PSD sequence entry. If a BLAST search has been run, then pressing the "Search for New Hits" button will initiate a new BLAST search. All current BLAST hits will be set to "old" and new hits will be stored. This is a way for users to manually execute BLAST searches for new hits without enrolling the sequence in the automatic BLAST tracking system. Pressing the "Initialize BLAST" button erases all current BLAST hits and runs a new BLAST search. Pressing either of these buttons will eventually direct you to the PSD Job Queue.
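The difference between the two buttons can be sketched as a toy model in Python. This is not the PSD's code: the class `HitTracker` and its methods are hypothetical, and the sketch assumes that hits stored by "Initialize BLAST" form a baseline that is not flagged as new.

```python
class HitTracker:
    """Toy model of the two buttons for one PSD sequence entry."""

    def __init__(self):
        self.hits = {}            # hit identifier -> "old" or "new"
        self.initialized = False  # has any BLAST search been run yet?

    def initialize_blast(self, search_results):
        # Erase all current hits, then store the fresh results.
        # Assumption: baseline hits are not flagged as "new".
        self.hits = {hit_id: "old" for hit_id in search_results}
        self.initialized = True

    def search_for_new_hits(self, search_results):
        if not self.initialized:
            return  # no effect until a BLAST search has been run
        # First mark every stored hit as old ...
        for hit_id in self.hits:
            self.hits[hit_id] = "old"
        # ... then store and tag any hit that was not already present.
        for hit_id in search_results:
            if hit_id not in self.hits:
                self.hits[hit_id] = "new"

    def new_hits(self):
        return sorted(h for h, status in self.hits.items() if status == "new")


if __name__ == "__main__":
    tracker = HitTracker()
    tracker.initialize_blast(["a", "b"])
    tracker.search_for_new_hits(["b", "c"])
    print(tracker.new_hits())
```

In this model, a second "Search for New Hits" turns the previous new hits into old ones, while "Initialize BLAST" discards everything and starts over.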
-
The PSD allows any valid Entrez query to be used with any of the BLAST searches. Note that the user is responsible for designing an Entrez query that is relevant for the particular BLAST search. If you are unfamiliar with NCBI Entrez, see the NCBI Handbook. Including an Entrez query with a BLAST search enables the user to focus the sequence similarity search on a specific group of NCBI database sequences. For example, if the user wants to search for similarities between a query sequence and all sequences from Drosophila melanogaster, the following Entrez query would accomplish the task: Drosophila melanogaster[orgn]. The "[orgn]" term is an Entrez Search Field Qualifier telling the BLAST program to communicate with NCBI and only search against sequences that originate from the indicated organism, in this case Drosophila. As stated earlier, the user can use any valid Entrez query. Valid Entrez queries can be pasted into the "Limit BLAST search by" text box on the PSD Edit view page.
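If you build Entrez queries often, a couple of small helpers can keep them consistent. The Python sketch below only assembles query strings for pasting into the "Limit BLAST search by" box; the function names are hypothetical, and the second organism is just an example. It relies on the `[orgn]` qualifier shown above and on Entrez's uppercase Boolean operator OR.

```python
def organism_query(organism):
    """Restrict a search to one organism using the [orgn] qualifier."""
    return f"{organism}[orgn]"


def any_organism(*organisms):
    """Match sequences from any of the listed organisms (joined with OR)."""
    return " OR ".join(organism_query(name) for name in organisms)


if __name__ == "__main__":
    print(organism_query("Drosophila melanogaster"))
    print(any_organism("Drosophila melanogaster", "Drosophila simulans"))
```

As noted above, it remains up to you to make sure the resulting query is relevant to the BLAST search you are running.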
-

BLAST hit links in PSD Table View
There are two main ways to see the list of BLAST hits for a PSD sequence, provided a BLAST search has been run previously. First, in a sequence table view, there are links to the list of BLAST hits in the far right column. The link will also include how many BLAST hits were identified in the last search; for example, "All Hits [72]" indicates that 72 BLAST hits were identified. If the sequence is enrolled in the automatic BLAST tracking system, there will be an —AUTO— label present in the "BLAST" column. Additionally, if new BLAST hits were detected in the last BLAST search, a "New Hits [2]" link will be present (with 2 new hits in this example). The second way to see BLAST hits is from the PSD Edit view page. Click on "See current BLAST hits".
-
There are several ways to see a list of only the new BLAST hits for a sequence enrolled in the BLAST tracking system. The easiest way is to follow the link to new BLAST hits from the "My PSD" webpage. Alternatively, in a PSD sequence table view there will be a link to the new BLAST hits in the far right column (see above). The link will also indicate how many new BLAST hits were identified. Finally, when viewing the table of BLAST hits, there is a link towards the top of the page to view only the new BLAST hits, if present.

BLAST summary at top of hit table
-
To view an alignment between the query and hit sequence from a BLAST search, navigate to the BLAST hit table. One of the columns in this table is labeled "E-value". Clicking on the E-value listed in this column displays a BLAST alignment between the query and hit sequence.
-
It is possible to get a "no significant similarity was found" message when attempting to view the alignment between a BLAST query and hit. This is a consequence of several factors: 1) the PSD doesn't store individual BLAST alignments, 2) the PSD generates query X hit alignments dynamically, 3) the program the PSD uses to generate the alignments (wblast2.cgi) assumes a database size similar to the current NCBI nr database, and 4) when the PSD searches for new BLAST hits, it searches only those NCBI sequences that have been submitted or have changed within the last month. Consider the Karlin–Altschul equation for calculating the E-value of a BLAST hit:

Karlin–Altschul equation
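In its standard form, the equation can be written as shown below. The second relation is not from the original page; it just restates that, for one and the same alignment (same K, m, λ and S), E scales with the ratio of the database sizes.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\Longrightarrow\qquad
\frac{E_{2}}{E_{1}} = \frac{n_{2}}{n_{1}}
```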
In this equation, m is the length of the query sequence, n is the length of the database sequence, K and λ are values based upon the search space size and scoring system, respectively, and S is the raw score of the alignment. As you can see, E is directly proportional to n, the size of the database being searched. When BLAST searches a database, n is set to the total length of all the sequences in the database (this isn't exactly true, but it's close enough for this discussion).
When a BLAST search is initialized for a PSD entry, n is very large because it's based upon a large database potentially containing millions of individual sequences and billions of residues. In contrast, when the PSD runs a BLAST search to find new hits, it searches a much smaller database, one containing only those sequences submitted to NCBI or modified within the last month, so n is much smaller. For a given λS, which represents the normalized score for the alignment between two sequences, E increases as the size of the search space, mn, increases.
So, let's say that when the PSD looks for new BLAST hits it identifies an alignment where E = 0.75. This hit passes the E-value threshold set by the PSD user and is added to the PSD as a new hit. The small database contains, for example, 48,548,552 residues. When the PSD user clicks on the link to see the alignment, wblast2.cgi assumes it should base its calculation of E on the NCBI nr database, which contains, for example, 1,071,028,609 residues. Because E scales with n, wblast2.cgi will calculate E ≈ 0.75 × 1,071,028,609 / 48,548,552 ≈ 16.5. By default, wblast2.cgi will not display alignments where E > 10, so for this alignment the user would see the "no significant similarity was found" message. In general, the "no significant similarity was found" message is displayed only when the user is attempting to view the alignment for a marginal BLAST hit.
-
Users can generate multiple sequence alignments of BLAST hits. The left column of the BLAST table contains check boxes that can be used to select specific BLAST hits. The check boxes in combination with the BLAST action menu at the bottom of the BLAST table can be used for several purposes:

BLAST Action Menu
- Retrieve the BLAST hits in FASTA format
- Retrieve the BLAST hits using the CGRB SeqTool
- Generate multiple sequence alignment of BLAST hits
- Generate multiple sequence alignment of BLAST hits and PSD sequence entry
Note that there may be more than one option for the multiple alignment algorithm, depending on the PSD sequence type and the specific type of BLAST search. Also, you will not be able to include the PSD sequence if its type is different from the sequence type of the BLAST hits. If including the PSD sequence is possible, the checkbox for doing so will be displayed below the BLAST hit action menu.

Checkbox to include PSD sequence
-
To retrieve BLAST hit sequences, use the checkboxes in the left column of the BLAST table and the BLAST hit action menu at the bottom of the BLAST table. Users can retrieve sequences in FASTA format or have them piped directly to the CGRB SeqTool (CGRB users only).
-
Many of the capabilities of the PSD rely on computationally-intensive software. To manage this load, the CGRB has developed a custom queuing system — the PSD Job Queue.

PSD Job Queue
The PSD Job Queue is designed to distribute PSD jobs to nodes of the CGRB "Genome" cluster. The PSD Job Queue also allows users to monitor the progress of jobs and to access the results from their last 10 jobs submitted to the Queue. Every job is assigned a unique identifier, which is listed in the "Job ID" column. The identifier for a specific job becomes a hyperlink to the results as soon as they are available, and it stays active for as long as the job is among the last 10. Short descriptions in the "Job Type" column allow users to see what types of jobs they've recently submitted to the queue. The "Submitted" column lists the approximate time the job was run (Pacific Time Zone, USA) and the "Status" column displays the status of the jobs. In this example, all the jobs have finished, but users can expect to see other status values like Submitted and Running. The PSD Job Queue is set to automatically reload approximately every 10 seconds, but the user can reload the page as often as desired.
-
Users can generate and store data pertaining to potential protein domains contained within PSD sequence entries. To analyze a PSD sequence for protein domains, navigate to the PSD Edit view and scroll to mid-page. In the "Protein Domains" section, press the "Find Domains" button.

PSD protein domains
After several seconds, a summary of the protein domains identified will appear. The summary includes a graphical representation of potential domains and their locations on the PSD sequence and text output summarizing the domain search. Users can view the entire report (in text) by clicking on the link provided below the report summary.
-
PSD users can attach formatted information to any sequence entry. These "Notes" come in three different formats. The simplest note is entered by hand as text. This type of note can be created by clicking on the "enter a new note" link at the bottom of the "Notes and Data Collected" table towards the bottom of the PSD Edit View.

PSD notes table
Another type of note is created whenever a user generates a multiple sequence alignment of BLAST hits. A short form towards the top of a multiple sequence alignment allows the user to give an alignment a title and to attach it to the sequence from which the BLAST hits were identified.

PSD Clustalw alignment
The last type of note contains a BLAST alignment. As outlined above, when a user clicks on an E-value link in a BLAST table, the BLAST alignment is dynamically generated and the user is presented with graphical and text representations of the alignment. As with a PSD ClustalW alignment, the user is provided a form with which they can give the alignment a name and attach it to the PSD sequence.

PSD BLAST alignment
A note remains attached to a PSD sequence until the user deletes it. Deleting notes is easy: in the Notes and Data Collected table use the checkboxes to select the notes and press the "Delete Selected Notes" button. To view a note, click on the link in the "Title" column of the table.
-
"Sequence groups" allow PSD users to organize sequence entries. A sequence group is a collection of PSD sequences that have some higher association with one another. The rules of the association are defined by the user. For example, a user might define a sequence group as "those sequences that contain a S/T kinase domain". The rules of the group can be as loose or restrictive as the user finds useful. Sequence groups can contain any mixture of DNA or protein sequences. A PSD sequence entry can be a member of only one sequence group.
-
To create a PSD sequence group, navigate to the "My PSD" page. The navigation menu towards the bottom contains a link to "Add – A Group". Click on this link and fill in the two form boxes: Group Name and Group Description. Press the "Create New Group" button to create the group.
-
To add a PSD sequence entry to an existing PSD sequence group, navigate to a table view of your PSD sequences. Use the checkboxes in the left column to select the sequences you want to add to a group. After selecting the sequences press the "Toggle Group Menu" button at the bottom of the table. Choose "Add", select the group and press the "Submit" button.

PSD Group Menu
-
To delete a PSD sequence entry from a sequence group, follow the steps outlined for adding a PSD sequence entry to an existing group, but choose "Remove" from the Group Menu instead of "Add". The PSD sequence entry will be removed from the specified group and transferred to group "All".
-
To delete a sequence group or change the name or description of a sequence group, navigate to the "PSD Sequence Groups" page. Your sequence groups will be listed in alphabetical order. Click the "SGA" link after the group description to go to the PSD Sequence Group Administration (SGA) page.

PSD Sequence Group Administration
Here you can change the name and description of a sequence group by modifying the appropriate text box in the form and pressing the "Change Group" button. To delete the sequence group, press the "Delete This Group" button. Note that the actual sequence entries will not be deleted. All PSD sequence entries from the deleted group will be transferred to group "All".
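The group rules described in this FAQ (one group per sequence, with removed sequences and sequences from deleted groups falling back to "All") can be summarized in a short Python sketch. The class and method names are hypothetical and this is only a model of the behavior, not the PSD's implementation. Refusing to delete the "All" group is an assumption of the sketch.

```python
DEFAULT_GROUP = "All"


class SequenceGroups:
    """Toy model: every sequence belongs to exactly one group."""

    def __init__(self, sequences):
        self.groups = {DEFAULT_GROUP}
        self.member_of = {seq: DEFAULT_GROUP for seq in sequences}

    def create_group(self, name):
        self.groups.add(name)

    def add_to_group(self, seq, group):
        if group not in self.groups:
            raise KeyError(f"no such group: {group}")
        # Overwrites any earlier membership, so a sequence is never in two groups.
        self.member_of[seq] = group

    def remove_from_group(self, seq):
        # Removing a sequence from its group returns it to "All".
        self.member_of[seq] = DEFAULT_GROUP

    def delete_group(self, group):
        if group == DEFAULT_GROUP:
            raise ValueError('the "All" group cannot be deleted in this sketch')
        self.groups.discard(group)
        # The sequences themselves are kept; they move to "All".
        for seq, current in list(self.member_of.items()):
            if current == group:
                self.member_of[seq] = DEFAULT_GROUP


if __name__ == "__main__":
    psd = SequenceGroups(["s1", "s2", "s3"])
    psd.create_group("kinases")
    psd.add_to_group("s1", "kinases")
    psd.delete_group("kinases")
    print(psd.member_of)
```

Note that deleting a group in this model never removes a sequence, which matches the behavior described above.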
-
Yes. To use SOAP to access the PSD, please refer to the PSD SOAP page.
-
To view all your sequence groups, navigate to the "PSD Sequence Groups" page: My → Groups. All of your sequence groups will be listed in alphabetical order. On this page you will see links to download, batch-BLAST or browse the sequences within the group. You will also find links to the Sequence Group Administration (SGA) page for each sequence group.
-
Yes, the PSD offers a batch-BLAST feature. Once sequences are organized into sequence groups, all the sequences within a sequence group can be subjected to the same type of BLAST search. Note that to obtain meaningful results all the sequences should be the same type: either DNA or protein. Batch-BLAST jobs are submitted to the job queue at a low priority, so they may take longer to complete than individual searches. You can follow the progress of jobs using the PSD job queue. To find the batch-BLAST feature, follow the "batch-BLAST" link on the PSD Sequence Groups page.



