fsj is a file splitter and joiner written in Python. I ported it from afsj, which is written in Java.
Java is a well-designed language, but implementing something in Java takes much more code than in Python. I love C/C++, Java, Python, Perl and Delphi. In fact, Delphi generates quite optimized executable code, only a little slower than C. FreePascal's executable code is slower than Delphi's, and Java's is slightly slower than FreePascal's.
In general, Java code runs very fast, and Java seems to lead in performance for web development. If you write a lot of Python or Ruby, you will notice that Java is quite verbose.
Python code is much shorter, but it runs slower than Java.
For comparison, read my related article:
Code a file splitter n joiner in Java.
1. Code a file extension generator
```python
class ExtGenerator:
    def __init__(self, start):
        self.current = start

    def next(self):
        result = self.current
        self.current += 1
        return str(result).zfill(3)
```
Method next() returns the current file extension as a string and then advances the counter. For example, if self.current is 1, next() returns "001", and the following call returns "002".
In other words, if we create an ExtGenerator instance by passing 1, the first call to next() returns "001". We need ExtGenerator because we want our file joiner to find the next part to join automatically, given only the first part (normally the first part's extension is ".001").
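A quick check of the numbering, using the ExtGenerator class from above (repeated so the snippet runs on its own under Python 3). Starting from 1, as the joiner does for a ".001" file, the first call returns "001". The last line shows a detail of str.zfill: it only pads and never truncates.

```python
class ExtGenerator:
    def __init__(self, start):
        self.current = start

    def next(self):
        # Return the current number as a 3-digit string, then advance
        result = self.current
        self.current += 1
        return str(result).zfill(3)


gen = ExtGenerator(1)  # start numbering from the ".001" part
extensions = [gen.next() for _ in range(3)]
print(extensions)  # ['001', '002', '003']

# zfill only pads, so part number 1000 gets a 4-digit extension
print(ExtGenerator(1000).next())  # 1000
```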
2. Code SequenceFileExists
This is an important class: both the splitter and the joiner use it to walk through the numbered sequence of part files, which keeps their design simple and consistent.
```python
from os.path import isfile


class SequenceFileExists:
    '''fileStart should end with .00x, such as .001'''

    def __init__(self, fileStart):
        dot = fileStart.rindex('.')
        # Start numbering from the first part's extension, e.g. 1 for ".001"
        self.extGen = ExtGenerator(int(fileStart[dot + 1:]))
        self.pathWithoutExt = fileStart[:dot]
        self.current = None

    def hasNext(self):
        '''Return True if the next file exists.'''
        self.current = self.pathWithoutExt + '.' + self.extGen.next()
        return isfile(self.current)

    def next(self):
        '''After hasNext(), return the path it checked;
        otherwise generate the next path in the sequence.'''
        if self.current is not None:
            return self.current
        return self.pathWithoutExt + '.' + self.extGen.next()
```
ExtGenerator only returns the next file extension (as a string); it does not know whether that file exists. I pass the first part (for example "file.001") to the constructor, and each call to hasNext() builds the next part's path (first file.001 itself, then file.002, and so on) and checks whether that file exists. Use hasNext() together with next(): next() returns the path that hasNext() just checked, which is the next part for our joiner to join.
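The hasNext()/next() pair maps naturally onto a Python generator. The sketch below is an alternative written for this explanation, not the original fsj code; iter_parts is a hypothetical name, and it assumes, like SequenceFileExists, that the start path ends in a numeric extension such as ".001". It needs Python 3.2+ for tempfile.TemporaryDirectory.

```python
import os
import tempfile


def iter_parts(fileStart):
    """Yield fileStart and each following numbered part that exists on disk.

    Hypothetical generator-based alternative to SequenceFileExists.
    """
    dot = fileStart.rindex('.')
    base, number = fileStart[:dot], int(fileStart[dot + 1:])
    while True:
        path = base + '.' + str(number).zfill(3)
        if not os.path.isfile(path):
            return  # stop at the first missing part
        yield path
        number += 1


# Usage: create file.001 and file.002 in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for ext in ('001', '002'):
        with open(os.path.join(tmp, 'file.' + ext), 'wb') as f:
            f.write(b'x')
    found = [os.path.basename(p)
             for p in iter_parts(os.path.join(tmp, 'file.001'))]

print(found)  # ['file.001', 'file.002']
```

Because a for loop drives the generator, there is no shared `current` field that two separate method calls must keep in sync.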
3. Code a file joiner
```python
from os.path import isfile


def join(fileStart, fileOutput, joinMode, chunkSize=1024 * 4, autoFind=True):
    # 'append' adds to an existing output file; any other mode overwrites it
    mode = 'ab' if joinMode == 'append' and isfile(fileOutput) else 'wb'
    sf = SequenceFileExists(fileStart)
    with open(fileOutput, mode) as fw:
        while sf.hasNext():
            with open(sf.next(), 'rb') as fr:
                # Copy this part to the output one chunk at a time
                while True:
                    chunk = fr.read(chunkSize)
                    if not chunk:
                        break
                    fw.write(chunk)
            # With autoFind, keep looking for the next part;
            # without it, join only the first part
            if not autoFind:
                break
```
Now writing a file joiner could not be simpler. The technique is to read a chunk of bytes into a buffer and write that chunk from the buffer to the destination (binary writes, each chunk landing right after the previous one). This is a basic technique used everywhere. Put the read-write pair into an inner loop that runs until the current part is exhausted, and wrap it in an outer loop that ends when hasNext() returns False (that is, when no next part is found).
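The chunked read-write loop can be tried on in-memory buffers. This is a minimal sketch: copy_in_chunks is a hypothetical helper name, and io.BytesIO objects stand in for the part files and the output file. The standard library's shutil.copyfileobj(fsrc, fdst, length) implements the same loop, as the second half shows.

```python
import io
import shutil


def copy_in_chunks(fr, fw, chunkSize=1024 * 4):
    """Copy everything from fr to fw, chunkSize bytes at a time.

    Returns the number of bytes copied. Only one chunk is held in
    memory at a time, regardless of how large the part is.
    """
    copied = 0
    while True:
        chunk = fr.read(chunkSize)
        if not chunk:  # empty bytes means end of file
            break
        fw.write(chunk)
        copied += len(chunk)
    return copied


# Join three in-memory "parts" into one output buffer
parts = [io.BytesIO(b'hello '), io.BytesIO(b'big '), io.BytesIO(b'world')]
out = io.BytesIO()
total = sum(copy_in_chunks(p, out, chunkSize=4) for p in parts)
print(out.getvalue(), total)  # b'hello big world' 15

# shutil.copyfileobj performs the same chunked copy
out2 = io.BytesIO()
for p in parts:
    p.seek(0)  # rewind, the first loop consumed each part
    shutil.copyfileobj(p, out2, 4)
print(out2.getvalue() == out.getvalue())  # True
```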
4. Code a file splitter
```python
from math import ceil
from os.path import getsize


def splitBySize(fileSource, fileOutputStart, partSize, chunkSize):
    sf = SequenceFileExists(fileOutputStart + '.001')
    fileSize = getsize(fileSource)
    partSize = min(partSize, fileSize)
    chunkSize = min(chunkSize, partSize)
    splitted = 0  # total bytes copied from the source so far
    with open(fileSource, 'rb') as fr:
        while splitted < fileSize:
            # Each call to sf.next() yields a new part name: .001, .002, ...
            with open(sf.next(), 'wb') as fw:
                splitted_inner = 0  # bytes written to the current part
                while splitted_inner < partSize:
                    # Cap the read so a part never grows beyond partSize
                    chunk = fr.read(min(chunkSize, partSize - splitted_inner))
                    if not chunk:
                        break
                    fw.write(chunk)
                    splitted += len(chunk)
                    splitted_inner += len(chunk)


def splitByParts(fileSource, fileOutputStart, numParts, chunkSize):
    # Round up (float division also works on Python 2) so numParts parts
    # cover the whole file
    partSize = int(ceil(getsize(fileSource) / float(numParts)))
    splitBySize(fileSource, fileOutputStart, partSize, chunkSize)
```
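We have two functions, splitBySize and splitByParts. splitBySize splits the source file into parts of partSize bytes each (the last part may be smaller). splitByParts is a wrapper around splitBySize that derives the part size from the desired number of parts.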
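The part-size arithmetic is easy to check in isolation. Both helpers below are hypothetical, written only to illustrate the rounding: part_size_for does the same round-up as splitByParts, using integer ceiling division, and part_sizes lists the sizes splitBySize would produce for a given part size.

```python
import math


def part_size_for(fileSize, numParts):
    """Bytes per part when splitting fileSize bytes into numParts parts.

    Integer ceiling division: same result as ceil(fileSize / numParts)
    without floats or Python 2's truncating '/'.
    """
    return -(-fileSize // numParts)


def part_sizes(fileSize, partSize):
    """Sizes of the parts splitBySize produces; the last one may be short."""
    full, rest = divmod(fileSize, partSize)
    return [partSize] * full + ([rest] if rest else [])


size = part_size_for(10, 3)
print(size, part_sizes(10, size))  # 4 [4, 4, 2]
print(math.ceil(10 / 3))  # 4, same result with Python 3's true division

# Rounding up can yield fewer parts than requested:
# 10 bytes into 6 parts gives a part size of 2, hence only 5 parts
print(len(part_sizes(10, part_size_for(10, 6))))  # 5
```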
The variable splitted tracks the total number of bytes read from the source and written to the parts; when splitted reaches fileSize, the loop ends. Inside each part, splitted_inner counts the bytes written so far, and a new part starts once it reaches partSize. Each output part is created by open(sf.next(), 'wb'), which opens a new file for writing, and each part path comes from sf.next(), where sf is an instance of SequenceFileExists.
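The splitting loop can be exercised without touching the disk. This sketch uses a hypothetical split_stream function (not part of fsj) that mirrors the inner loop of splitBySize on an in-memory stream. Each read is capped at the bytes still missing from the current part, so with partSize=10 and chunkSize=4 every full part is exactly 10 bytes rather than 12.

```python
import io


def split_stream(fr, partSize, chunkSize):
    """Split a binary stream into a list of byte strings of partSize bytes.

    Hypothetical in-memory counterpart of splitBySize; the last part
    may be shorter than partSize.
    """
    parts = []
    while True:
        buf = io.BytesIO()
        written = 0
        while written < partSize:
            # Never read past the end of the current part
            chunk = fr.read(min(chunkSize, partSize - written))
            if not chunk:
                break
            buf.write(chunk)
            written += len(chunk)
        if not written:  # source exhausted, no new part
            return parts
        parts.append(buf.getvalue())


data = bytes(range(25))
parts = split_stream(io.BytesIO(data), partSize=10, chunkSize=4)
print([len(p) for p in parts])  # [10, 10, 5]
print(b''.join(parts) == data)  # True
```

Joining the parts back in order reproduces the original bytes, which is the round trip fsj relies on.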