fsj is a file splitter and joiner written in Python. I ported it from afsj, which is written in Java.
Java is a well-designed language, but implementing something in Java takes much more code than in Python. I love C/C++, Java, Python, Perl and Delphi. In my experience, Delphi generates well-optimized executable code that is only a little slower than C. FreePascal's code is slower than Delphi's, and Java's is slower than FreePascal's (just a little).
In general, Java code runs very fast, and Java seems to lead in performance for web development. If you write a lot of Python or Ruby, you will notice that Java is quite verbose.
Python code is much shorter, but it runs slower than Java.
You should read my related article to compare: Code a file splitter n joiner in Java.
1. Code a file extension generator
class ExtGenerator:
    def __init__(self, start):
        self.current = start

    def next(self):
        result = self.current
        self.current += 1
        return str(result).zfill(3)
The next() method returns the current file extension as a string and then advances the counter. For example, if self.current is 1, next() returns "001" and the following call returns "002".
In other words, if we create an ExtGenerator instance by passing 1, successive calls return "001", "002", "003" and so on. We need ExtGenerator because we want the file joiner to find each next part automatically when it is given only the first part (normally the first part's extension is ".001").
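As a side note, the same sequence of extensions can be sketched with a generator function. This is only an illustrative alternative, not the ExtGenerator class itself; ext_sequence is a hypothetical name introduced here.

```python
from itertools import count


def ext_sequence(start):
    """Yield zero-padded extensions forever (hypothetical helper, not part of fsj)."""
    for number in count(start):
        yield str(number).zfill(3)


exts = ext_sequence(1)
# Take the first three extensions of the sequence.
first_three = [next(exts) for _ in range(3)]
print(first_three)
```

Note that zfill(3) only pads: once the counter passes 999, the extension simply grows to four digits.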
2. Code SequenceFileExists
This is an important class: it gives our splitter and joiner one clean way to walk through the sequence of part files.
from os.path import isfile

class SequenceFileExists:
    '''
    fileStart should end with .00x such as .001
    '''
    def __init__(self, fileStart):
        dot = fileStart.rindex('.')
        self.extGen = ExtGenerator(int(fileStart[dot + 1:]))
        self.pathWithoutExt = fileStart[:dot]
        self.current = None

    def hasNext(self):
        '''
        Return True if next file exists
        '''
        self.current = self.pathWithoutExt + '.' + self.extGen.next()
        return isfile(self.current)

    def next(self):
        '''
        Use with hasNext() for checking next file exists
        '''
        if self.current is not None:
            return self.current
        return self.pathWithoutExt + '.' + self.extGen.next()
ExtGenerator only returns the next file extension (as a string); it does not tell us whether that file exists. We pass the first part (for example "file.001") to the constructor, and each call to hasNext() checks whether the next part in the sequence exists: file.001 on the first call, file.002 on the second, and so on. Use hasNext() together with next(): next() returns the path that hasNext() just checked, which is the next part for our joiner to join.
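The check-then-fetch protocol described above can also be sketched as a single generator that yields part paths for as long as they exist. This is a minimal illustration under assumptions, not the class from this article: iter_parts and demo are hypothetical names, and the demo creates two throwaway part files in a temporary directory.

```python
import os
import shutil
import tempfile


def iter_parts(first_part):
    """Yield consecutive part paths (.001, .002, ...) while they exist on disk."""
    base, ext = os.path.splitext(first_part)  # ext looks like '.001'
    number = int(ext[1:])
    while True:
        path = base + '.' + str(number).zfill(3)
        if not os.path.isfile(path):
            return  # the first missing part ends the sequence
        yield path
        number += 1


def demo():
    """Create file.001 and file.002 in a temp directory and list what is found."""
    tmp = tempfile.mkdtemp()
    try:
        for ext in ('001', '002'):
            with open(os.path.join(tmp, 'file.' + ext), 'wb') as f:
                f.write(b'x')
        first = os.path.join(tmp, 'file.001')
        return [os.path.basename(p) for p in iter_parts(first)]
    finally:
        shutil.rmtree(tmp)


found = demo()
print(found)
```

The generator stops at the first gap, so a missing file.002 would hide file.003 even if it exists. The hasNext() loop in this article behaves the same way.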
3. Code a file joiner
from os.path import isfile

def join(fileStart, fileOutput, joinMode, chunkSize=1024*4, autoFind=True):
    # Append only when asked to and the output already exists; otherwise start a new file.
    mode = 'ab' if joinMode == 'append' and isfile(fileOutput) else 'wb'
    sf = SequenceFileExists(fileStart)
    fw = open(fileOutput, mode)
    try:
        while sf.hasNext():
            # Open before the inner try so fr is always bound when finally runs.
            fr = open(sf.next(), 'rb')
            try:
                while True:
                    chunk = fr.read(chunkSize)
                    if not chunk:
                        break
                    fw.write(chunk)
            finally:
                fr.close()
            # If we use autoFind flag, system will find next file to append to output file;
            # without it, only the first part is written.
            if not autoFind:
                break
    finally:
        fw.close()
With these pieces in place, writing the file joiner is simple. The technique is the basic one used everywhere: read a chunk of bytes into a buffer, write that chunk to the destination (opened in binary mode), and repeat in a loop until the part is exhausted. The outer loop ends when hasNext() returns False, which happens when no next part is found.
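The read-and-write loop described above is also available in Python's standard library as shutil.copyfileobj. The sketch below is an illustration under assumptions, not the join() function itself: append_parts is a hypothetical helper, and in-memory io.BytesIO objects stand in for the part files and the output file.

```python
import io
import shutil


def append_parts(parts, destination, chunk_size=4096):
    """Copy each readable part into destination, chunk_size bytes at a time."""
    for part in parts:
        # copyfileobj runs the same read-a-chunk, write-a-chunk loop internally.
        shutil.copyfileobj(part, destination, chunk_size)


# In-memory stand-ins for file.001, file.002 and the output file.
parts = [io.BytesIO(b'hello '), io.BytesIO(b'world')]
out = io.BytesIO()
append_parts(parts, out, chunk_size=4)
joined = out.getvalue()
print(joined)
```

A small chunk_size is used here only to force several read/write rounds per part; for real files a few kilobytes or more is the usual choice.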
4. Code a file splitter
from os.path import getsize

def splitBySize(fileSource, fileOutputStart, partSize, chunkSize):
    sf = SequenceFileExists(fileOutputStart + '.001')
    fileSize = getsize(fileSource)
    if partSize > fileSize:
        partSize = fileSize
    if partSize < chunkSize:
        chunkSize = partSize
    splitted = 0  # total bytes copied from the source so far
    fr = open(fileSource, 'rb')
    try:
        while splitted < fileSize:
            fw = open(sf.next(), 'wb')
            try:
                splitted_inner = 0  # bytes written to the current part
                while splitted_inner < partSize:
                    # Never read past the end of the current part.
                    chunk = fr.read(min(chunkSize, partSize - splitted_inner))
                    if not chunk:
                        break
                    fw.write(chunk)
                    splitted += len(chunk)
                    splitted_inner += len(chunk)
            finally:
                fw.close()
    finally:
        fr.close()

def splitByParts(fileSource, fileOutputStart, numParts, chunkSize):
    size = getsize(fileSource)
    # Integer ceiling division keeps the number of parts at or below numParts.
    splitBySize(fileSource, fileOutputStart, (size + numParts - 1) // numParts, chunkSize)
We have two functions, splitBySize and splitByParts. splitBySize splits the source file into parts of partSize bytes each (the last part may be smaller). splitByParts wraps splitBySize, computing the part size from the requested number of parts.
The variable splitted tracks the total number of bytes read from the source and written to the output parts; when splitted equals fileSize, the loop ends. Each output part is created by open(sf.next(), 'wb'), which opens a new file for writing. Each part's path comes from sf.next(), where sf is an instance of SequenceFileExists.
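To make the arithmetic concrete, here is a small sketch of how the part sizes work out. The helpers part_size_for and part_sizes are hypothetical names used only for this illustration; they do no file I/O and assume a positive part size.

```python
def part_size_for(file_size, num_parts):
    """Part size needed to cover file_size in at most num_parts parts (integer ceiling)."""
    if num_parts <= 0:
        raise ValueError('num_parts must be positive')
    return (file_size + num_parts - 1) // num_parts


def part_sizes(file_size, part_size):
    """List the size of each part produced when splitting file_size bytes."""
    if part_size <= 0:
        raise ValueError('part_size must be positive')
    sizes = []
    remaining = file_size
    while remaining > 0:
        size = min(part_size, remaining)  # the last part may be smaller
        sizes.append(size)
        remaining -= size
    return sizes


# A 10-byte file split into 3 parts: each part holds up to 4 bytes.
sizes = part_sizes(10, part_size_for(10, 3))
print(sizes)
```

Rounding up matters: rounding the part size down to 3 bytes would leave one byte over and produce a fourth part.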