How can I use Python to delete from file 2 every row whose ID is not found in file 1, and save what remains of file 2 in a new file (file 3)?
I also want to keep the rows deleted from file 2 in another new file (file 4).
file 1:
ID num seq
CD_000009.1 237 SIRTSAVPSPKGKYYTLNGSK
CD_000009.1 250 SIRTSAVPSPKGKYYTLNGSK
CD_000010.1 126 STPCTTINKVKASGMKAIMMA
CD_000010.1 196 DVYNKIHMGSKAENTAKKLNI
CD_000088.3 198 GGISCVLQDGKVFEKAGVSIS
CD_000170.1 615 TKTILKSSLSKSLQEGLIPGS
CD_000180.2 794 TKFLSQIESDKLALLQVRAIL
CD_000185.1 106 TVDFIRLKSYKNDQSTGDIKV
CD_000225.1 895 EQATTSAQVAKLYRKQSQIQN
CD_000312.2 853 EKFQKINQMVKNSDRVLKRSA
file 2:
ID num seq
CD_000009.1 603 HPTAQHEKMLKDTWCIEAAAR
CD_000009.1 607 QHEKMLCDTWKIEAAARIREG
CD_000009.1 215 LASGETVAAFKLTEPSSGSDA
CD_000009.1 433 SEAAWKVTDEKIQIMGGMGFM
CD_000009.1 477 ILRLFVALQGKMDKGKELSGL
CD_000010.1 119 LGAGLPISTPKTTINKVCASG
CD_006187.2 201 PVICAGGQDRKSDAAGYPHAT
CD_000088.3 13 LQLGRLSSGPKWLVARGGCGG
CD_000088.3 21 GPCWLVARGGKGGPRAWSQCG
CD_001073867.1 1551 SGVGLGTKGGKASVIVSLTTQ
CD_001073867.1 1562 ASVIVSLTTQKPQDLTPYSGK
CD_000180.2 799 TKFLSQIESDKLALLQVRAIL
CD_000009.1 477 ILRLFVALQGKMDKGKELSGL
CD_000088.3 48 WSQRSAAGRVKRPPGPAGTEQ
CD_065741.3 1026 QTTECLTPESKKQTTSNVASQ
Expected result in file 3 (file 2 after deleting the rows whose ID is not found in file 1):
CD_000009.1 603 HPTAQHEKMLKDTWCIEAAAR
CD_000009.1 607 QHEKMLCDTWKIEAAARIREG
CD_000009.1 215 LASGETVAAFKLTEPSSGSDA
CD_000009.1 433 SEAAWKVTDEKIQIMGGMGFM
CD_000009.1 477 ILRLFVALQGKMDKGKELSGL
CD_000010.1 119 LGAGLPISTPKTTINKVCASG
CD_000088.3 13 LQLGRLSSGPKWLVARGGCGG
CD_000088.3 21 GPCWLVARGGKGGPRAWSQCG
CD_000180.2 799 TKFLSQIESDKLALLQVRAIL
CD_000009.1 477 ILRLFVALQGKMDKGKELSGL
CD_000088.3 48 WSQRSAAGRVKRPPGPAGTEQ
Rows deleted from file 2, saved in file 4:
# IDs from file 2 not found in file 1:
CD_006187.2 201 PVICAGGQDRKSDAAGYPHAT
CD_001073867.1 1551 SGVGLGTKGGKASVIVSLTTQ
CD_001073867.1 1562 ASVIVSLTTQKPQDLTPYSGK
CD_065741.3 1026 QTTECLTPESKKQTTSNVASQ
I tried the following, but I did not get the result I expected:
My code:
# result3 contains the IDs from file 2 that are in file 1
# result4 contains the IDs from file 2 that are not in file 1
file1 = open('file1.txt').readlines()
with open('result_file3.txt', 'w') as result3:
    with open('result_file4.txt', 'w') as result4:
        for line in open('file2.txt'):
            if line in file1:
                result3.write(line)
            else:
                result4.write(line)
Expert Answer
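#!/usr/bin/python
file2 = open("file21.txt", 'r')
file3 = open("file31.txt", 'w')
file4 = open("file41.txt", 'w')
for line in file2:
    fields = line.split()
    # Skip blank lines (a line read from a file still holds its newline,
    # so len(line) is never 0) and the header row.
    if len(fields) == 0 or fields[0] == "ID":
        continue
    found = 0
    file1 = open("file11.txt", 'r')
    for line1 in file1:
        fields1 = line1.split()
        # Compare the ID columns exactly; a substring test could also
        # match a longer ID or text elsewhere in the row.
        if fields1 and fields[0] == fields1[0]:
            file3.write(line)
            found = 1
            break
    file1.close()
    if found == 0:
        file4.write(line)
file2.close()
file3.close()
file4.close()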
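A likely reason the attempt in the question does not give the expected files is that `line in file1` compares whole lines. In the sample data no data row of file 2 is identical to a row of file 1, because the num and seq columns differ even when the ID is the same. The short sketch below uses one row from each sample file to contrast the whole-line test with a test on the ID column only; the variable names are illustrative and not part of the original code.

```python
# Two rows taken from the sample data: same ID, different num and seq.
file1_lines = ["CD_000009.1 237 SIRTSAVPSPKGKYYTLNGSK\n"]
file2_line = "CD_000009.1 603 HPTAQHEKMLKDTWCIEAAAR\n"

# Whole-line test, as in the question: the full strings differ.
whole_line_match = file2_line in file1_lines

# ID test: compare only the first whitespace-separated column.
file1_ids = {row.split()[0] for row in file1_lines}
id_match = file2_line.split()[0] in file1_ids

print(whole_line_match, id_match)  # False True
```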
file21.txt:
ID num seq
CD_000009.1 603 HPTAQHEKMLKDTWCIEAAAR
CD_000009.1 607 QHEKMLCDTWKIEAAARIREG
CD_000009.1 215 LASGETVAAFKLTEPSSGSDA
CD_000009.1 433 SEAAWKVTDEKIQIMGGMGFM
CD_000009.1 477 ILRLFVALQGKMDKGKELSGL
CD_000010.1 119 LGAGLPISTPKTTINKVCASG
CD_006187.2 201 PVICAGGQDRKSDAAGYPHAT
CD_000088.3 13 LQLGRLSSGPKWLVARGGCGG
CD_000088.3 21 GPCWLVARGGKGGPRAWSQCG
CD_001073867.1 1551 SGVGLGTKGGKASVIVSLTTQ
CD_001073867.1 1562 ASVIVSLTTQKPQDLTPYSGK
CD_000180.2 799 TKFLSQIESDKLALLQVRAIL
CD_000009.1 477 ILRLFVALQGKMDKGKELSGL
CD_000088.3 48 WSQRSAAGRVKRPPGPAGTEQ
CD_065741.3 1026 QTTECLTPESKKQTTSNVASQ
file11.txt:
ID num seq
CD_000009.1 237 SIRTSAVPSPKGKYYTLNGSK
CD_000009.1 250 SIRTSAVPSPKGKYYTLNGSK
CD_000010.1 126 STPCTTINKVKASGMKAIMMA
CD_000010.1 196 DVYNKIHMGSKAENTAKKLNI
CD_000088.3 198 GGISCVLQDGKVFEKAGVSIS
CD_000170.1 615 TKTILKSSLSKSLQEGLIPGS
CD_000180.2 794 TKFLSQIESDKLALLQVRAIL
CD_000185.1 106 TVDFIRLKSYKNDQSTGDIKV
CD_000225.1 895 EQATTSAQVAKLYRKQSQIQN
CD_000312.2 853 EKFQKINQMVKNSDRVLKRSA
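The nested loop in the answer re-opens and re-reads file 1 for every row of file 2. An alternative sketch, assuming the ID is always the first whitespace-separated column and the header label is "ID", reads the IDs of file 1 into a set once and then splits file 2 in a single pass. The function names `read_ids` and `split_by_id` are hypothetical helpers introduced here for illustration.

```python
def read_ids(path):
    """Return the set of first-column values in a whitespace-separated file."""
    with open(path) as handle:
        ids = {line.split()[0] for line in handle if line.strip()}
    ids.discard("ID")  # drop the header label, if present
    return ids


def split_by_id(reference_path, input_path, kept_path, removed_path):
    """Copy each row of input_path to kept_path or removed_path by ID membership."""
    known_ids = read_ids(reference_path)
    with open(input_path) as source, \
            open(kept_path, "w") as kept, \
            open(removed_path, "w") as removed:
        for line in source:
            fields = line.split()
            if not fields or fields[0] == "ID":
                continue  # skip blank lines and the header row
            # Rows whose ID occurs in the reference file are kept.
            target = kept if fields[0] in known_ids else removed
            target.write(line)
```

With the file names used in the answer, the call would be split_by_id("file11.txt", "file21.txt", "file31.txt", "file41.txt"). Set membership is checked per row, so the order of rows and any repeated rows in file 2 are preserved in the output files.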