Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

0
votes
0 answers
6 views

How to remove an invalid sequence from a Genbank file containing multiple genome sequences based on ID

I have a ~3 GB Genbank file containing complete Genbank annotations for ~20,000 bacterial genome sequences. My goal is to use BioPython to parse these sequences, and write individual fasta files for ...
-5
votes
0 answers
42 views

Python3 compare two neighboring string elements in a list and remove char of string [closed]

what I want: remove ")" symmetrically, preferably starting from the end of file. e.g.) if elem[-1] ends with ")" and elem[-2] starts with "(" remove these "(" and ")" I want to compare neighboring ...
0
votes
0 answers
32 views

How to create a loop to test multiple queries in PubMed using BioPython?

I'm currently trying to form a loop in which I can test different queries in PubMed. I used BioPython's Entrez and Medline to set up one query (see "TERM"). However, I'm not very proficient in ...
0
votes
0 answers
15 views

How to access a specific sequence in a blast_record

The following code prints all the sequences in the record. How do I access a specific one? from Bio.Blast import NCBIWWW from Bio.Blast import NCBIXML fly=0 grape=0 with open('input.txt',...
1
vote
0 answers
22 views

Biopython globalcc returns empty list

I am using Biopython's globalcc function to align a standardized version of a word and a dialectal version. Unfortunately, sometimes the alignment fails for no obvious reasons and an empty list is ...
0
votes
3 answers
83 views

How to write a string algorithm

given a FASTA text file (Rosalind_gc.txt), I am supposed to go through each DNA record and identify the percentage (%) of Guanine-Cytosine (GC) content. Example of this is : Sample Dataset: >...
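A minimal sketch of one way to approach the GC-content question above, using only the standard library. The helper names `parse_fasta` and `gc_percent` and the sample records are made up for illustration; they are not taken from the question or from the Rosalind dataset.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record, which has no following header to close it.
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(sequence):
    """Return the percentage of G and C characters in a sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


# Made-up records for illustration only (not the Rosalind sample data).
example = StringIO(">seq_a\nGGCC\nAATT\n>seq_b\nGCGC\n")
gc_by_id = {name: gc_percent(seq) for name, seq in parse_fasta(example)}
for name, value in gc_by_id.items():
    print(f"{name}\t{value:.2f}")
```

To read a real file, pass an open file object instead of the `StringIO` example. Emitting the final record after the loop matters, because forgetting it silently drops the last sequence.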
0
votes
1 answer
40 views

Extracting gene location from FASTA file

I am trying to extract the gene location from a fasta file using BioPython but the function .location is not working. I would like to avoid regex because this function will have to work with different ...
1
vote
4 answers
70 views

Populating a dictionary with multiple lines as one string

I have a file with multiple lines in FASTA format, which I want to break up in pieces and populate a dictionary with these pieces. >piece_1 Lorem ipsum dolor sit amet consectetur adipiscing elit. ...
-1
votes
2 answers
52 views

How To Convert CSV to GFF3?

I want to convert a .csv file to a GFF3 file. The .csv file contains annotation data. I know that I should parse the .csv file and then write the .gff file, but I don't know the complete code.
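A rough sketch of the parse-then-write approach described above, using only the standard library. The CSV column names (`seqid`, `source`, `type`, `start`, `end`, `score`, `strand`, `phase`, `name`) are assumptions, since the question does not show the file layout. The hypothetical `name` column is used for the GFF3 attributes field, and any missing value is written as `.`.

```python
import csv
from io import StringIO

# The first eight GFF3 columns; the ninth (attributes) is built separately.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")


def csv_to_gff3(csv_handle, gff_handle):
    """Write one GFF3 feature line per CSV row; missing fields become '.'."""
    gff_handle.write("##gff-version 3\n")
    for row in csv.DictReader(csv_handle):
        fields = [(row.get(column) or ".") for column in GFF3_COLUMNS]
        # Hypothetical 'name' column feeds the attributes field (column 9).
        name = row.get("name")
        fields.append(f"ID={name}" if name else ".")
        gff_handle.write("\t".join(fields) + "\n")


# Made-up annotation row for illustration only.
csv_text = "seqid,source,type,start,end,strand,name\nchr1,manual,gene,100,900,+,geneA\n"
output = StringIO()
csv_to_gff3(StringIO(csv_text), output)
gff_text = output.getvalue()
print(gff_text, end="")
```

This sketch does not escape special characters in the attributes column, which real annotation names may need.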
0
votes
0 answers
27 views

Running clustalw on google platform with error in generating .aln file in ubuntu

I was trying to run clustalw from Biopython library of python3 on Google Cloud Platform, then generate a phylogenetic tree from the .dnd file using the Phylo library. The code was running perfectly ...
-1
votes
1 answer
67 views

How to share Anaconda packages with the user of an HTTP server

I am an Ubuntu user and I run many scripts written in python3, which was installed through anaconda. All the modules I need, e.g. biopython, have already been installed there. However, I can't ...
0
votes
0 answers
34 views

How to deal with gaps during translation with biopython

I need to translate aligned DNA sequences with biopython: from Bio.Seq import Seq from Bio.Alphabet import generic_dna seq = Seq("tt-aaaatg") seq.translate() Running this script raises an error: Bio....
0
votes
1 answer
22 views

Searching for variable amino acid motif in fasta dataset

I need to find out which proteins in my dataset contain this amino acid motif: PoXGXXHyXHy. I'm using biopython and python 2.7, but I'm not exactly a bioinformatician and I got stuck. How do I make ...
0
votes
0 answers
37 views

How to install Biopython on Mac after all options in tutorial have failed?

PyCharm installed and running. Biopython package downloaded and opened in PyCharm. pip install biopython typed in to terminal to check, SyntaxError: invalid syntax returned. Running MacOS Mojave 10....
3
votes
1 answer
56 views

Python 3.x - How to efficiently split an array of objects into smaller batch files?

I'm fairly new to Python and I'm attempting to split a text file, in which each entry consists of two lines, into batches of max. 400 objects. The data I'm working with are thousands of sequences in FASTA ...
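One way the batching step could look, as a sketch only: a generator takes fixed-size chunks from any iterable of records, so the full list never has to be sliced by index. The function name `batches` and the sample records are illustrative. The example uses a batch size of 2 to keep it short, where the question would use 400.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Made-up two-line entries (header, sequence) for illustration only.
records = [(f">seq{i}", "ACGT") for i in range(5)]
batch_sizes = [len(batch) for batch in batches(records, 2)]
print(batch_sizes)
```

Each yielded batch can then be written to its own output file. Only the last batch can be smaller than the requested size.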
0
votes
0 answers
28 views

How to extract a substring from a sequence using positions which are already in a tsv file

I'm trying to write code that automatically extracts the start and end positions from a tsv file. I want to use these positions to extract the substring from a fasta sequence. I tried ...
0
votes
1 answer
51 views

How to get the sequence counts (in fasta) with conditions using python?

I have a fasta file (fasta is a file in which header line starts with > followed by a sequence line corresponding to that header). I want to get the counts for sequences matching TRINITY and total ...
1
vote
0 answers
44 views

How can I run a pubmed query in a Django app deployed on Elastic Beanstalk?

I wrote a Django app to query the Pubmed database using the Entrez tool provided by the Biopython package. Everything runs smoothly locally. After deploying on AWS Elastic Beanstalk I get a "Permission ...
1
vote
0 answers
59 views

SeqIO.parse Biopython - which file format should I specify?

I am trying to extract information from a multi-fasta file (e.g. C/G/A/T count, CG%) using biopython. I keep running into trouble when I try to iterate over the file for each fasta sequence - I can ...
1
vote
1 answer
58 views

How to use EPOST and then use ESEARCH in biopython?

I have a list of gene ids: id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"] How can I get just the nucleotide sequences of these genes using EPOST and ESEARCH in ...
1
vote
1 answer
33 views

Limiting the number of hits in a Biopython NCBIWWW Search

I'm working on trying to automate some BLAST searches. I need to pick up only the top three results from the BLAST results, however the parameter hitlist_size doesn't seem to be limiting my searches ...
2
votes
0 answers
42 views

Converting a PDB file to angles and then back to PDB format

I have a PDB file, which I convert into a file of dihedral angles. I have now modified some of the angles and need to convert the modified angles back into a new PDB file. Is there any library ...
1
vote
0 answers
16 views

How to initialize a PDB.DSSP object correctly

I am trying to get the DSSP of a PDB file, but Python is throwing File not found. I downloaded dssp in Ubuntu on Windows. Calling which dssp in the Ubuntu terminal gives '/usr/bin/dssp'. Calling dssp with ...
0
votes
0 answers
19 views

PDB file parser not being created Python

I am trying to declare a PDB parser object; however, the object does not get created. I'm showing the method where this is a problem. I've downloaded Biopython and scikit. The line that causes problems ...
0
votes
1 answer
133 views

How can you analyse fna.gz in python?

I want to return the nth basepair given my fna.gz genome input. Theoretically it would work like this: allele = genome[14325] print(allele) #: G This is the code I have now: from Bio import SeqIO ...
1
vote
0 answers
47 views

Trying to understand how importing modules (BioPython) in Sublime Text works

I am a beginner with coding in general. I have been enjoying using Sublime Text 3, but I've run into a problem that I can't figure out. I want to use the BioPython module in Sublime Text 3, but when I ...
0
votes
0 answers
38 views

Multiple server requests and file writing without waiting for an answer

I'm writing a protein prediction program based on genomic data, and at some point I need to send multiple requests to a server and write the results to a file. I have around 100 requests and file writing ...
0
votes
1 answer
70 views

Show only dna alignment score in biopython

I have DNA sequence data. For instance, X="ACGGGT" Y="ACGGT" I want to know the alignment score, thus I used biopython pairwise2 function. For example, from Bio import pairwise2 from Bio.pairwise2 ...
0
votes
1 answer
52 views

How do I blast a local query against a local database in python/biopython?

First of all, I want to come clean that I am a super beginner in programming. I have 2 zip files (containing one database each) and 4 fasta files (three containing a protein sequence each and one ...
0
votes
1 answer
40 views

Bio.Motifs throws KeyError 'd'

I'm using Biopython to process some NGS data, but I ran into a strange problem when using the motif module in Biopython. Here is the code. frame = pd.DataFrame({'Spacer': seqs1.values()}, index=seqs.keys()) ...
0
votes
1 answer
72 views

biopython in anaconda, not jupyter notebook

I am trying to install biopython in Jupyter Notebook, Anaconda, Ubuntu 16.04. I follow the procedure in biopython website and it runs on python. Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, ...
0
votes
0 answers
36 views

Transforming a DNA string into a Seq object in Biopython

Is there any way I can transform a user input string into a Biopython Seq object? I have tried a lot of things and searched Google, but found no answer. Thanks
1
vote
1 answer
41 views

How to download _full_ RefSeq record using Efetch?

I have a problem downloading a full record from Nucleotide db. I use: from Bio import Entrez from Bio import SeqIO with Entrez.efetch(db="nuccore", rettype="gb", retmode="full", id="NC_007384") as ...
0
votes
0 answers
31 views

Improve performance on counting of %letters in dic values over a loop (python)

I have three entities in the process: A dictionary (chimera) that contains one key (the sequence name) and a huge DNA sequence composed of X ACGT letters: >>> chimera {'Chimera_seq': ...
0
votes
1 answer
19 views

Convert e-utils command to equivalent in Bio.Entrez (BioPython)

I'm having some trouble connecting each command using BioPython. Can someone help me transform this command line into the equivalent using BioPython? esearch -db assembly -query "GCF_002514765.1" | ...
0
votes
1 answer
57 views

Get genome from NCBI with biopython

Python newbie here. I want to download the genome sequence for genome (NC_007779.1) using BioPython packages Entrez and SeqIO. So far, I have this code: from Bio import Entrez from Bio import SeqIO ...
1
vote
1 answer
27 views

Iterating through a series of GenBank genes and appending each gene's features to a list returns only the last gene

I'm having a problem with my code. I'm trying to iterate through the genbank file's list of genes using BioPython. Here's what it looks like: class genBank: gbProtId = str() gbStart = int() ...
0
votes
2 answers
37 views

Substring multifasta file using python

I am trying to extract sequences from a multifasta file from position 2 to 8 (seeds of microRNAs). To do this I have written a small python script. The script works but I couldn't write an output file....
1
vote
1 answer
69 views

How to print the first few records using SeqIO from Biopython

I have a fasta file that has several hundred records but I'm trying to return a table with just the first 20 records (record description, AA length, and name). My code is not working and I would ...
0
votes
0 answers
38 views

How do I import biopython from php

I'm developing a web application in a remote server and I cannot find a way to import Biopython from a php script. First I got this error: ImportError: No module named Bio Then, to solve this issue ...
3
votes
1 answer
70 views

Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In python this code, where I directly call the function SeqIO.parse() , runs fine: from Bio import SeqIO a = SeqIO.parse("a.fasta", "fasta") records = list(a) for asq in SeqIO.parse("a.fasta", "...
1
vote
1 answer
16 views

Biopython Genbank.Record : trying to understand source code

I am writing a csv reader to generate Genbank files to capture annotations with sequence. First I used a Bio.SeqRecord and got correctly formatted output but the SeqRecord class lacks fields that I ...
0
votes
1 answer
49 views

Retrieve data from GenBank with Bio.Entrez module

I am trying to solve one of the Rosalind challenges and I can't seem to find a way to retrieve data, within a specific time frame. http://rosalind.info/problems/gbk/ Do/How Do I modify Entrez....
2
votes
2 answers
57 views

Extracting gene sequences from FASTA File?

I have the following code that reads a FASTA file with 10 gene sequences and returns each sequence as a matrix. However, the code seems to miss the very last sequence, and I wonder why. file=...
0
votes
1 answer
96 views

problems with pairwise blast in biopython

I try to run a pairwise blast between two sequences within a python script and using the biopython blast tools. I have no problems running a blast against a local database by adding parameter db='...
0
votes
1 answer
51 views

Utilizing biopython NcbitblastnCommandline to extract Nonsynonymous substitutions

I'm trying to use NcbitblastnCommandline to blast a protein query against a nucleotide sequence, and then report the hit. The program ran without error. However, in the result, my query sequence ...
2
votes
1 answer
73 views

Replacing all of instances of a letter in a column of a FASTA alignment file

I am writing a script which can replace all of the instances of an amino acid residue in a column of a FASTA alignment file. Using AlignIO, I just can read an alignment file and extract information ...
0
votes
1 answer
49 views

Entrez (biopython): how to restrict the term search to a specific journal? (PubMed)

I want to obtain all the articles in a specific journal that are related to a specific term/topic. I am trying to do so through PubMed using the Entrez package contained in Biopython. The ...
0
votes
0 answers
36 views

How to not truncate my protein sequence output

Using biopython, I've parsed a fasta file to get a list of protein sequences. However, I can only get the truncated version, so when I write them in excel, I do not have the whole protein sequence (...
1
vote
1 answer
57 views

How to generate IUPAC code from nucleotides?

I want to find the IUPAC equivalent to 2 different nucleotides. Example: I have A and C and I want M. Or: I have R and T and I want D. Is there a method for doing that in Biopython? (It sounds easy ...
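A small standard-library sketch of the lookup asked about above. It is not a Biopython method; the function name `merge_iupac` and the hand-written table are illustrative. The idea is to expand each input code into the set of bases it stands for, take the union, and look that set up in a reversed table.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: a set of bases back to its single-letter code.
BASES_TO_IUPAC = {frozenset(bases): code for code, bases in IUPAC_TO_BASES.items()}


def merge_iupac(*codes):
    """Return the IUPAC code covering every base implied by the input codes."""
    bases = set()
    for code in codes:
        bases.update(IUPAC_TO_BASES[code.upper()])
    return BASES_TO_IUPAC[frozenset(bases)]


print(merge_iupac("A", "C"))  # M
print(merge_iupac("R", "T"))  # D
```

Biopython also ships a comparable table in `Bio.Data.IUPACData`, which could replace the hand-written dictionary here.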