[Tutor] string delimiters

Discussion:

richard kappler

2015-06-03 19:10:42 UTC

for formatting a string and adding descriptors:

test = 'datetimepart1part2part3the_rest'
newtest = ('date=' + test[0:4] + ' time=' + test[4:8] + ' part1=' +
           test[8:13] + ' part2=' + test[13:18] + ' part3=' + test[18:23] +
           ' the rest=' + test[23:])

and while this may be ugly, it does what I want it to do.

The question is: if instead of 'the_rest' I have ']the_rest', and sometimes
there is more than one ']', how do I handle that?

In other words, this script will iterate over numerous lines in a file, and
each one is identical up to the delimiter before the rest. Sometimes
there is only one delimiter, sometimes there are two, and the parts vary in length.

Can I stop using position numbers and start looking for specific characters
(the delimiter) and proceed to the end (which is always a constant string,
btw)?

regards, Richard

--
Windows assumes you are an idiot…Linux demands proof.
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.pyt

Alan Gauld

2015-06-03 19:53:14 UTC

Post by richard kappler
test = 'datetimepart1part2part3the_rest'

If this is really about parsing dates and times have
you looked at the datetime module and its parsing/formatting
functions (ie strptime/strftime)?

Post by richard kappler
Can I stop using position numbers and start looking for specific characters
(the delimiter) and proceed to the end (which is always a constant string
btw).

The general answer is probably to look at regular expressions.
But they get messy fast so usually I'd suggest trying regular
string searches/replaces and splits first.

But if your pattern is genuinely complex and variable then
regex may be the right solution.

But if it's dates, check the strptime() functions first.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


richard kappler

2015-06-03 20:13:03 UTC

I was trying to keep it simple, you'd think by now I'd know better. My
fault and my apology.

It's definitely not all dates and times; the data and character types
vary. This is the output from my log parser script, which you helped with the
other day. There are essentially two types of line:

Tue Jun 2 10:22:42 2015<usertag1
name="SE">SE201506012200310389PS01CT1407166S0011.40009.00007.6IN
000000000018.1LB000258]C10259612019466862270088094]L0223PDF</usertag1>
Tue Jun 2 10:22:43 2015<usertag1
name="SE">SE0389icdim01307755C0038.20033.20012.0IN1000000000
0032]C10259612804038813568089577</usertag1>

I have to do several things:
1. The first type can be of variable length. Everything after a ] is an
identifier that I have to separate out; some lines have one, some have more
than one, of variable length, always delimited by a ].
2. The second type (line 2) doesn't have the internal datetime stamp, so I
just need to add 14 x's to fill in the space where that datetime stamp
would be.
3. Finally, I have to break these apart and put a descriptor with each.

While I was waiting for a response to this, I put together a script to
start figuring things out (what could possibly go wrong?!?!?! :-) )

and I can't post the exact script but the following is the guts of it:

f1 = open('unformatted.log', 'r')
f2 = open('formatted.log', 'a')

for line in f1:
    for tag in ("icdm"):
        if tag in line:
            # + and so on to format the lines with icdm in them,
            # including adding 14 x's for the missing timestamp
            newline = 'log datestamp:' + line[0:24]
            f2.write(newline)  # write the formatted output to the new log
        else:
            # + and so on to format the non-icdm lines
            newline = 'log datestamp:' + line[0:24]
            f2.write(newline)

The problems are:
1. For some reason this iterates over the 24-line file 5 times, and it
writes the 14 x's to every line, so my non-icdm code (the else:) isn't
getting executed. I'm missing something basic and obvious but have no idea
what.
2. I still don't know how to handle the differences at the end of the
non-icdm lines (potentially more than one ]-delimited identifier, as
described above).

regards, Richard


richard kappler

2015-06-03 20:23:37 UTC

hold the phone!!!!

I have no idea why it worked, would love an explanation, but I changed my
previous test script by eliminating

for tag in ("icdm"):

and changing

if tag in line

to

if 'icdm' in line:

and it works perfectly! It only iterates over the file once, and the else
executes so both types of lines format correctly except for the multiple
identifiers in the non-icdm lines

I could still use some help with that bit, please.

regards, Richard


Alan Gauld

2015-06-03 20:31:30 UTC

Post by richard kappler
hold the phone!!!!
I have no idea why it worked, would love an explanation, but I changed
my previous test script by eliminating
for tag in ("icdm"):

This loops over the string, assigning the characters i, c, d and m to tag
in turn.

Post by richard kappler
if 'icdm' in line:

This checks whether the 4-character string 'icdm' is in the line.
Completely different.
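
The difference can be seen in a short sketch (an editorial illustration, assuming Python 3; the sample `line` value is made up). Parentheses alone do not create a tuple, so `("icdm")` is just the string `"icdm"`; it is the trailing comma in `("icdm",)` that makes a one-element tuple.

```python
# ("icdm") is only a parenthesised string, so iterating yields single characters.
chars = [tag for tag in ("icdm")]

# A trailing comma makes a one-element tuple, so the loop sees the whole string.
tags = [tag for tag in ("icdm",)]

line = "SE0389icdm01307755"  # made-up sample line, for illustration only

# Number of inner-loop passes (and therefore writes) per input line in each form.
writes_with_string = sum(1 for tag in ("icdm"))
writes_with_tuple = sum(1 for tag in ("icdm",))

print(chars)            # ['i', 'c', 'd', 'm']
print(tags)             # ['icdm']
print('icdm' in line)   # True
```

With the string form the loop body runs four times per input line, which matches the "4 lines out for every input line" behaviour discussed in this thread.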

richard kappler

2015-06-03 20:37:52 UTC

Figured that out from your last post, and thank you, now I understand how
that works. I thought I was looking for the entire string, not each
character. That bit all makes sense now.

A descriptor is, for example, for the following part of a string '0032.4'
the descriptor would be weight, so the formatted output would be
weight:0032.4, and so on. each bit of the strings in the post where I
provided the two examples has specific meaning, and I have to parse the
lines so that I add a descriptor (okay, bad word, what should I use?) to
each bit of data from the line.

At the moment I'm doing it by position, which is, I'm sure, a really bad
way to do it, but I need this quickly and don't know enough to know if
there is a better way. I have to parse and output the entire line, but
there are, as I said, two 'types' of string and some are variable in
length. I'm eager for direction. What other information would better help
explain?

regards, Richard
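
One way the positional approach described above could be made easier to maintain is to keep the offsets in a table rather than in one long expression. The following is only a sketch under stated assumptions: the `FIELDS` table and the `label_fields` helper are hypothetical names, and the offsets and the `label=value` output are the ones from the first post in this thread, not the real log layout.

```python
# Hypothetical layout table: (label, start, end) for each fixed-width field.
# The offsets are taken from the first post in this thread.
FIELDS = [
    ('date', 0, 4),
    ('time', 4, 8),
    ('part1', 8, 13),
    ('part2', 13, 18),
    ('part3', 18, 23),
]


def label_fields(text, fields=FIELDS, rest_label='the rest'):
    """Return 'label=value' pairs for each field, then whatever is left over."""
    parts = ['%s=%s' % (label, text[start:end]) for label, start, end in fields]
    rest_start = fields[-1][2]  # end offset of the last fixed field
    parts.append('%s=%s' % (rest_label, text[rest_start:]))
    return ' '.join(parts)


test = 'datetimepart1part2part3the_rest'
print(label_fields(test))
# date=date time=time part1=part1 part2=part2 part3=part3 the rest=the_rest
```

Changing a field then means editing one row of the table, and a second table could describe the other line type.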


Alan Gauld

2015-06-03 20:29:48 UTC

So why not just split by ']'?

identifiers = line.split(']')[1:] # lose the first one
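
A slightly fuller sketch of the split suggestion, assuming Python 3 and using the tail of the first sample line from this thread. Note that the constant closing tag stays attached to the last identifier unless it is removed first; the name `CLOSING_TAG` is introduced here for illustration only.

```python
# Tail of the first sample line from this thread, with its trailing newline.
line = '000000000018.1LB000258]C10259612019466862270088094]L0223PDF</usertag1>\n'

CLOSING_TAG = '</usertag1>'  # the constant string that ends every line

# Drop the newline and the constant closing tag before splitting.
body = line.rstrip('\n')
if body.endswith(CLOSING_TAG):
    body = body[:-len(CLOSING_TAG)]

# The first piece is the fixed-format data; every later piece is an identifier.
fixed_part, *identifiers = body.split(']')

print(fixed_part)    # 000000000018.1LB000258
print(identifiers)   # ['C10259612019466862270088094', 'L0223PDF']
```

Because `split` returns however many pieces there are, the same code handles lines with one identifier, several, or none.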

Post by richard kappler
and finally, I have to break these apart and put a descriptor with each.

Nope. I don't understand that.
Break what apart? and how do you 'put a descriptor with each'?
What is a descriptor for that matter?!

Post by richard kappler
While I was waiting for a response to this, I put together a script to
start figuring things out (what could possibly go wrong?!?!?! :-) )
f1 = open('unformatted.log', 'r')
f2 = open('formatted.log', 'a')
for line in f1:
    for tag in ("icdm"):
        if tag in line:
            # + and so on to format the lines with icdm in them,
            # including adding 14 x's for the missing timestamp
            newline = 'log datestamp:' + line[0:24]
            f2.write(newline)  # write the formatted output to the new log
        else:
            # + and so on to format the non-icdm lines
            newline = 'log datestamp:' + line[0:24]
            f2.write(newline)

So this checks each line for the 4 tags: i,c,d and m.
if the tag is in the line it does the if clause, including writing to f2
If the tag is not in the line it does the else which also writes to f2.
So you always write 4 lines to f2. Is that correct?

Post by richard kappler
1. for some reason this iterates over the 24 line file 5 times, and it
writes the 14 x's to every file, so my non-icdm code (the else:) isn't
getting executed. I'm missing something basic and obvious but have no
idea what.

That's not what I'd expect. I'd expect it to write 4 lines out for every
input line.
What gets written depends on how many of the 4 tags are found in
the line.

Since we only have partial code we don't know what the formatted lines
look like.

Post by richard kappler
2. I still don't know how to handle the differences in the end of the
non-icdm files (potentially more than identifier ] delimited as
described above).

I'm not clear on this yet either.
I suspect that once you clarify what you are trying to do you will know
how to do it...

Alex Kleider

2015-06-03 21:16:09 UTC

On 2015-06-03 12:53, Alan Gauld wrote:
...

Post by Alan Gauld
If this is really about parsing dates and times have
you looked at the datetime module and its parsing/formatting
functions (ie strptime/strftime)?

I assume strftime gets its name from 'string from time.'
What about strptime? How did that get its name?


richard kappler

2015-06-03 21:35:54 UTC

Perhaps the better way for me to have asked this question would have been:

How can I find the location within a string of every instance of a
character such as ']'?

regards, Richard


Peter Otten

2015-06-03 22:00:57 UTC

Post by richard kappler
How can I find the location within a string of every instance of a
character such as ']'?

>>> import re
>>> s = "alpha]beta]gamma]delta"
>>> [m.start() for m in re.finditer(r"]", s)]
[5, 10, 16]

But do you really need these locations? Why not just split() as in

>>> s.split("]")
['alpha', 'beta', 'gamma', 'delta']


Cameron Simpson

2015-06-04 00:54:27 UTC

Post by richard kappler
How can I find the location within a string of every instance of a
character such as ']'?

With the str.find method!

s = 'a]b]c'
pos = s.find(']')
while pos >= 0:
    print("pos =", pos)
    pos = s.find(']', pos + 1)

Obviously you could recast that as a generator function yielding positions for
general purpose use.
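
A minimal sketch of that generator recast, assuming Python 3; the name `find_all` is made up for illustration and is not a standard library function.

```python
def find_all(s, sub):
    """Yield the index of each occurrence of sub in s, left to right."""
    pos = s.find(sub)
    while pos >= 0:
        yield pos
        # Resume the search one character past the previous hit.
        pos = s.find(sub, pos + 1)


print(list(find_all('a]b]c', ']')))                    # [1, 3]
print(list(find_all('alpha]beta]gamma]delta', ']')))   # [5, 10, 16]
```

Since it is a generator, the positions are produced lazily, so a caller can stop after the first few without scanning the whole string.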

Cheers,
Cameron Simpson <***@zip.com.au>

Serious error.
All shortcuts have disappeared.
Screen. Mind. Both are blank.
- Haiku Error Messages http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html

Mark Lawrence

2015-06-03 22:13:24 UTC

Post by Alex Kleider
...

Post by Alan Gauld
If this is really about parsing dates and times have
you looked at the datetime module and its parsing/formatting
functions (ie strptime/strftime)?

I assume strftime gets its name from 'string from time.'
What about strptime? How did that get its name?

'f' for format, 'p' for parse, having originally come from plain old C.
More here
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

As Alan has hinted at earlier in this thread, if you're using dates
and/or times it's certainly far easier to use the built-in functions
rather than try to roll your own.
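
As a rough illustration of the parse/format pair, here is a sketch that parses the leading timestamp of the first sample log line in this thread. It assumes Python 3 and an English (C) locale for the day and month names, and it assumes that the 14 characters after "SE" in that line are a timestamp laid out as YYYYMMDDhhmmss; the thread does not confirm that layout.

```python
from datetime import datetime

# Leading timestamp of the first sample log line in this thread.
log_stamp = datetime.strptime('Tue Jun 2 10:22:42 2015', '%a %b %d %H:%M:%S %Y')

# The 14 characters after "SE" in that line, ASSUMED to be YYYYMMDDhhmmss.
inner_stamp = datetime.strptime('20150601220031', '%Y%m%d%H%M%S')

# strptime parses a string into a datetime; strftime formats one back to a string.
print(log_stamp.strftime('%Y-%m-%d %H:%M:%S'))   # 2015-06-02 10:22:42
print(inner_stamp.isoformat())                   # 2015-06-01T22:00:31
```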
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


Alex Kleider

2015-06-03 22:19:10 UTC

Post by Mark Lawrence
'f' for format, 'p' for parse, having originally come from plain old
C. More here
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

So I was wrong about the 'f' as well as having no clue about the 'p'!
Thank you very much for clearing that up for me.
cheers,
Alex

Alan Gauld

2015-06-03 23:57:01 UTC

Post by Alex Kleider
...

Post by Alan Gauld
If this is really about parsing dates and times have
you looked at the datetime module and its parsing/formatting
functions (ie strptime/strftime)?

I assume strftime gets its name from 'string from time.'
What about strptime? How did that get its name?

f = format - for creating date/time strings
p = parse - for extracting date/time fields from a string

Cameron Simpson

2015-06-04 00:45:21 UTC

Post by Alex Kleider
...

Post by Alan Gauld
If this is really about parsing dates and times have
you looked at the datetime module and its parsing/formatting
functions (ie strptime/strftime)?

I assume strftime gets its name from 'string from time.'
What about strptime? How did that get its name?

No, they both come from the standard C library functions of the same names,
being "(f)ormat a time as a string" and "(p)arse a time from a string". The
shape of the name is because they're "str"ing related functions, hence the
prefix.

See "man 3 strptime" and "man 3 strftime".

Cheers,
Cameron Simpson <***@zip.com.au>

A program in conformance will not tend to stay in conformance, because even if
it doesn't change, the standard will. - Norman Diamond <***@jit.dec.com>