Friday, May 29, 2015

Reading multiple pieces of data from a text file

I am trying to read two pieces of data from a single text file. Here is how the file looks:

PaxHeader/data-science000755 777777 777777 00000000262 12525446741 015207 xustar00armourp000000 000000 18 gid=1050026054
17 uid=488147323
20 ctime=1431779590
20 atime=1431779720
38 LIBARCHIVE.creationtime=1431719347
23 SCHILY.dev=16777218
24 SCHILY.ino=110226037
18 SCHILY.nlink=4
data-science/000755 Äâ{Ä>ñ F00000000000 12525446741 013547 5ustar00armourp000000 000000 data-science/PaxHeader/merged-sensor-files.csv000644 777777 777777 00000000214 12525446724 021646 xustar00armourp000000 000000 18 gid=1050026054
17 uid=488147323
20 ctime=1431779590
20 atime=1431779720
23 SCHILY.dev=16777218
24 SCHILY.ino=110226038
18 SCHILY.nlink=1
data-science/merged-sensor-files.csv000644 Äâ{Ä>ñ F00016452751 12525446724 020164 0ustar00armourp000000 000000 MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
MTU1,05/11/2015 19:59:06,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.097,0.62,122.4
MTU1,05/11/2015 19:59:03,4.097,0.62,122.4
MTU1,05/11/2015 19:59:02,4.111,0.62,122.5
MTU1,05/11/2015 19:59:03,4.111,0.62,122.5
MTU1,05/11/2015 19:59:02,4.104,0.62,122.5
MTU1,05/11/2015 19:59:01,4.090,0.62,122.4
MTU1,05/11/2015 19:59:00,4.093,0.62,122.4
MTU1,05/11/2015 19:58:59,4.112,0.62,122.5
data-science/PaxHeader/weather.json000644 777777 777777 00000000214 12525446741 017610 xustar00armourp000000 000000 18 gid=1050026054
17 uid=488147323
20 ctime=1431779590
20 atime=1431779720
23 SCHILY.dev=16777218
24 SCHILY.ino=110226039
18 SCHILY.nlink=1
data-science/weather.json000644 Äâ{Ä>ñ F00000000766 12525446741 016112 0ustar00armourp000000 000000 {"1431388800":"75.4","1431392400":"73.2","1431396000":"72.1","1431399600":"71.0", "1431403200":"70.7","1431406800":"69.6","1431410400":"69.0","1431414000":"68.8","1431417600":"69.2","1431421200":"67.9","1431424800":"68.6","1431428400":"68.7","1431432000":"72.1","1431435600":"76.2","1431439200":"80.1","1431442800":"80.7","1431446400":"80.9","1431450000":"83.3","1431453600":"84.5","1431457200":"85.1","1431460800":"87.0","1431464400":"84.2","1431468000":"84.4","1431471600":"83.0","1431475200":"81.1"}

So basically, I want to get values like the ones below

MTU, Time, Power, Cost, Voltage
    MTU1,05/11/2015 19:59:06,4.102,0.62,122.4

as a separate pandas DataFrame, and then another DataFrame for the dictionary below.

{"1431388800":"75.4","1431392400":"73.2","1431396000":"72.1","1431399600":"71.0", "1431403200":"70.7","1431406800":"69.6","1431410400":"69.0","1431414000":"68.8","1431417600":"69.2","1431421200":"67.9","1431424800":"68.6","1431428400":"68.7","1431432000":"72.1","1431435600":"76.2","1431439200":"80.1","1431442800":"80.7","1431446400":"80.9","1431450000":"83.3","1431453600":"84.5","1431457200":"85.1","1431460800":"87.0","1431464400":"84.2","1431468000":"84.4","1431471600":"83.0","1431475200":"81.1"}
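To make the target concrete, here is a minimal sketch of parsing the two payloads once they have been separated. It uses only the standard library, and the sample strings are shortened copies of the data above. Treating the JSON keys as Unix epoch seconds in UTC is an assumption based on how they look, not something the file states.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Shortened copies of the two payloads shown above.
csv_text = (
    "MTU, Time, Power, Cost, Voltage\n"
    "MTU1,05/11/2015 19:59:06,4.102,0.62,122.4\n"
    "MTU1,05/11/2015 19:59:05,4.089,0.62,122.3\n"
)
json_text = '{"1431388800":"75.4","1431392400":"73.2"}'

# skipinitialspace drops the blank that follows each comma in the header row,
# so the column names come out as "Time" rather than " Time".
rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))

# Assumption: keys are Unix epoch seconds (UTC); values are numeric strings.
weather = {
    datetime.fromtimestamp(int(ts), tz=timezone.utc): float(temp)
    for ts, temp in json.loads(json_text).items()
}
```

If pandas is available, `pandas.read_csv(io.StringIO(csv_text), skipinitialspace=True)` and `pandas.Series(weather)` should give the two frames directly.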

I can manually cut and paste these two portions into separate files and read them in, but I want to automate it using regex. I think I know how to write the regex, but when I read the whole file as text, I see the following values.

So I did this:

# open in a with-block so the file handle is closed after reading
with open("file", 'r') as fh:
    f = fh.read()
print(f)

'PaxHeader/data-science\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00000755 \x00777777 \x00777777 \x0000000000262 12

These are the first few lines of the file. I am not sure why I see \x00 so often. Is it because of some whitespace or some unrecognised character?
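One possible explanation, offered as a hedged guess: the `ustar` and `PaxHeader` markers in the dump suggest that the file is a tar archive, not plain text. Tar headers are fixed-size blocks padded with NUL bytes, which would show up as \x00. If that assumption holds, Python's `tarfile` module can pull out each member without any regex. The sketch below builds a tiny archive in memory so that it can run on its own; `build_demo_archive` and `read_members` are hypothetical helper names, not part of any library.

```python
import io
import json
import tarfile


def build_demo_archive() -> bytes:
    """Build a tiny in-memory tar archive shaped like the one in the question."""
    files = {
        "data-science/merged-sensor-files.csv": (
            b"MTU, Time, Power, Cost, Voltage\n"
            b"MTU1,05/11/2015 19:59:06,4.102,0.62,122.4\n"
        ),
        "data-science/weather.json": b'{"1431388800":"75.4","1431392400":"73.2"}',
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_members(fileobj) -> dict:
    """Return {member name: raw bytes} for every regular file in the archive."""
    contents = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


raw = build_demo_archive()
nul_count = raw.count(b"\x00")  # header padding; appears as \x00 when read as text

members = read_members(io.BytesIO(raw))
csv_text = members["data-science/merged-sensor-files.csv"].decode("utf-8")
weather = json.loads(members["data-science/weather.json"])
```

For the real file, something like `with open("file", "rb") as fh: members = read_members(fh)` should work, assuming it really is a tar archive. The CSV text and the weather dictionary can then be handed to pandas.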

Any idea how to get the desired result?

Thanks
