Showing posts with label file joiner. Show all posts

Nov 1, 2016

How to code a file splitter and joiner in Python

fsj is a file splitter and joiner written in Python. It is a port of afsj, which I wrote in Java.

Java is a well-designed language, but an implementation in Java takes much more code than one in Python. I love C/C++, Java, Python, Perl and Delphi. In my experience, Delphi generates quite optimized executable code, only a little slower than C. FreePascal code runs slower than Delphi, and Java code runs a little slower than FreePascal.

In general, Java code runs very fast, and Java seems to lead in performance for web development. If you write a lot of Python or Ruby, you will notice that Java is quite verbose.

Python code is much shorter, but it runs slower than Java.

You should read my related article to compare: Code a file splitter and joiner in Java.

1. Code a file extension generator


class ExtGenerator:
 def __init__(self, start):
  self.current = start
 
 def next(self):
  result = self.current
  self.current += 1
  return str(result).zfill(3)
The next() method returns the current file extension as a string and then advances the counter. For example, if self.current is 1, next() returns "001" and the following call returns "002". We need ExtGenerator because we want the file joiner to find the following parts automatically when it is given only the first part (normally the first part's extension is ".001").
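
As a quick check of that behaviour, the sketch below repeats the class so that it runs on its own and calls next() a few times. The starting value 1 is only an illustrative choice.

```python
class ExtGenerator:
    """Same logic as the class above, repeated so the snippet runs on its own."""

    def __init__(self, start):
        self.current = start

    def next(self):
        result = self.current
        self.current += 1
        # zfill pads with leading zeros up to three characters
        return str(result).zfill(3)


gen = ExtGenerator(1)
extensions = [gen.next() for _ in range(3)]
print(extensions)  # ['001', '002', '003']
```

Numbers with more than three digits are returned unpadded, so part 1000 would get the extension "1000".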

2. Code SequenceFileExists

This is an important class: it keeps the design of our splitter and joiner clean.

from os.path import isfile

class SequenceFileExists:
 '''
 fileStart should end with .00x such as .001
 '''
 def __init__(self, fileStart):
  dot = fileStart.rindex('.')
  # Start counting from the number in the first part's extension
  self.extGen = ExtGenerator(int(fileStart[dot+1:]))
  self.pathWithoutExt = fileStart[:dot]
  self.current = None

 def hasNext(self):
  '''
  Advance to the next file and return True if it exists
  '''
  self.current = self.pathWithoutExt + '.' + self.extGen.next()
  return isfile(self.current)

 def next(self):
  '''
  After hasNext(), return the path it checked; otherwise generate the next path
  '''
  if self.current is not None:
   return self.current
  return self.pathWithoutExt + '.' + self.extGen.next()
ExtGenerator only returns the next file extension as a string; it does not tell us whether that file exists. We pass the first part (for example "file.001") to the constructor. Each call to hasNext() advances to the next part and checks whether it exists: the first call checks file.001, the second checks file.002, and so on. Call next() after hasNext() to get the path of the part that was just checked, which is the part our joiner should append.
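
The hasNext()/next() pair can also be written as a Python generator. The sketch below is an alternative formulation, not the code used in fsj: the function name iter_parts is made up for illustration, and it assumes extensions are zero-padded to three digits as above.

```python
import os
import tempfile


def iter_parts(first_part):
    """Yield first_part and each following numbered part that exists on disk."""
    base, ext = first_part.rsplit('.', 1)
    number = int(ext)
    while True:
        path = base + '.' + str(number).zfill(3)
        if not os.path.isfile(path):
            return  # stop at the first missing part
        yield path
        number += 1


# Demo with three empty part files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 3):
        open(os.path.join(tmp, 'file.%03d' % n), 'wb').close()
    first = os.path.join(tmp, 'file.001')
    found = [os.path.basename(p) for p in iter_parts(first)]

print(found)  # ['file.001', 'file.002', 'file.003']
```

A generator removes the need to call two methods in the right order, at the cost of no longer being usable for generating paths that do not exist yet, which is what the splitter needs.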

3. Code a file joiner


from os.path import isfile

def join(fileStart, fileOutput, joinMode, chunkSize = 1024*4, autoFind = True):
 # Append only when asked to and the output already exists; otherwise start a new file
 mode = 'ab' if joinMode == 'append' and isfile(fileOutput) else 'wb'
 sf = SequenceFileExists(fileStart)
 with open(fileOutput, mode) as fw:
  while sf.hasNext():
   with open(sf.next(), 'rb') as fr:
    # Copy the current part chunk by chunk
    while True:
     chunk = fr.read(chunkSize)
     if not chunk:
      break
     fw.write(chunk)
   # With autoFind, keep looking for the next part to append;
   # without it, stop after the first part
   if not autoFind:
    break
With these pieces in place, writing the file joiner is simple. The technique is to read a chunk of bytes into a buffer and write that chunk to the destination, which is open in binary mode so each write lands right after the previous one. This basic technique is used everywhere. The read-write step runs in a loop for each part, and the outer loop ends when hasNext() returns False, that is, when no next part is found.
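
The read-a-chunk, write-a-chunk loop can be tried without touching the disk. The following is a minimal sketch using in-memory streams; copy_chunks is a hypothetical helper name, not a function from fsj.

```python
import io


def copy_chunks(src, dst, chunk_size=4096):
    """Copy src to dst in chunk_size pieces; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Join two in-memory "parts" into one output stream
parts = [io.BytesIO(b'hello '), io.BytesIO(b'world')]
output = io.BytesIO()
copied = sum(copy_chunks(part, output, chunk_size=4) for part in parts)

print(copied, output.getvalue())  # 11 b'hello world'
```

The standard library's shutil.copyfileobj performs the same kind of loop, so it could replace the inner while loop if you do not need to count bytes yourself.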

4. Code a file splitter


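from os.path import getsize

def splitBySize(fileSource, fileOutputStart, partSize, chunkSize):
 fileSize = getsize(fileSource)
 if fileSize == 0:
  return  # nothing to split
 if partSize <= 0 or chunkSize <= 0:
  raise ValueError('partSize and chunkSize must be positive')
 if partSize > fileSize:
  partSize = fileSize

 sf = SequenceFileExists(fileOutputStart + '.001')
 splitted = 0  # total bytes copied so far

 with open(fileSource, 'rb') as fr:
  while splitted < fileSize:
   # sf.next() yields the .001, .002, ... path for each new part
   with open(sf.next(), 'wb') as fw:
    splitted_inner = 0  # bytes written to the current part
    while splitted_inner < partSize:
     # Never read past the end of the current part
     chunk = fr.read(min(chunkSize, partSize - splitted_inner))
     if not chunk:
      break
     fw.write(chunk)
     splitted += len(chunk)
     splitted_inner += len(chunk)

def splitByParts(fileSource, fileOutputStart, numParts, chunkSize):
 # Integer ceiling division, so the last part holds the remainder
 partSize = (getsize(fileSource) + numParts - 1) // numParts
 splitBySize(fileSource, fileOutputStart, partSize, chunkSize)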

We have two functions, splitBySize and splitByParts. splitBySize cuts the original file into parts of partSize bytes each (the last part may be smaller). splitByParts wraps splitBySize: it computes the part size from the requested number of parts.

The variable splitted tracks the total number of bytes read from the original file and written to the parts. When splitted reaches fileSize, the loop ends. Each output part is created by open(sf.next(), 'wb'), which opens a new file for writing. Each file path comes from sf.next(), where sf is an instance of SequenceFileExists.
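
To see how the part size and the number of parts relate, here is a small arithmetic sketch. It assumes every part is cut at exactly the part size; ceil_div and part_sizes are hypothetical helper names used only for this illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b without going through floats."""
    return -(-a // b)


def part_sizes(file_size, part_size):
    """Return the size of each part when a file is cut every part_size bytes."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        size = min(part_size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


# A 10-byte file split into 3 parts: the part size is ceil(10 / 3) = 4
size_per_part = ceil_div(10, 3)
print(size_per_part)                  # 4
print(part_sizes(10, size_per_part))  # [4, 4, 2]
```

Because the part size is rounded up, the number of files produced can be smaller than the number requested: a 10-byte file requested as 6 parts gives a part size of 2 and therefore only 5 files.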

Oct 31, 2016

How to code a file splitter and joiner in Java

A long time ago, I coded a file splitter and joiner called Afsj. The project is hosted at Sourceforge.

Today I will explain object-oriented programming concepts through afsj.

Afsj has clean source code with a nice class hierarchy. If you are a beginner, Afsj is a good resource for learning object-oriented programming in Java. The project is small to medium in size, so the source code is easy to read and understand.

1. Coding an extension generator. If the input is a file name such as "file.001", the generator produces the extension of the next part, so the joiner can look for "file.002". When using the afsj joiner, the user only provides the first file to join, and the joiner finds the next files automatically. In the common case the user provides the first file, for instance a.001, and our joiner finds a.002, a.003... in the same directory; otherwise the user provides the location of the next file.

We define an interface, an abstract class that implements it, and a concrete class that inherits from the abstract class: IExtensionGenerator -> AbstExtensionGenerator -> StandardExtensionGenerator.
src/main/java/khang/iwcjff/afsj/IExtensionGenerator.java:


public interface IExtensionGenerator {    
    boolean hasNext();
    String next();    
    void setPrefix(String prefx);
    String getPrefix();
}
src/main/java/khang/iwcjff/afsj/AbstExtensionGenerator.java:

public abstract class AbstExtensionGenerator implements IExtensionGenerator {

    protected int max;
    protected int current;

    @Override public boolean hasNext() {
        return current < max;
    }
}
src/main/java/khang/iwcjff/afsj/StandardExtensionGenerator.java (focus on next() method):

public class StandardExtensionGenerator extends AbstExtensionGenerator {

    private int totalDigits;
    private int prefixLength;

    public StandardExtensionGenerator(int nmin, int nmax) {
        if (nmin < 1 || nmin > nmax) {
            throw new IllegalArgumentException();
        }

        // i don't like this.min = min styles
        // set the current number
        max = nmax;
        current = nmin - 1;

        // set the total digits of file name extensions
        int length = String.valueOf(nmax).length();
        totalDigits = (length > 3) ? length : 3;
        prefixLength = totalDigits - String.valueOf(current).length();
    }

    @Override public String next() {
        prefixLength = totalDigits - String.valueOf(++current).length();
        String result = ".";
        for (int i = 0; i < prefixLength; ++i) result += "0";
        return (result + String.valueOf(current));
    }

    // for compatibility
    @Override public void setPrefix(String prefx) {
        throw new UnsupportedOperationException();
    }

    @Override public String getPrefix() { return ""; }
}
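
For comparison with the Python port, the padding rule used by next() can be sketched in a few lines of Python. This is an illustrative translation, not part of afsj, and the function name standard_extension is hypothetical.

```python
def standard_extension(number, nmax):
    """Return '.' plus number, zero-padded to max(3, number of digits in nmax)."""
    total_digits = max(3, len(str(nmax)))
    return '.' + str(number).zfill(total_digits)


print(standard_extension(1, 99))      # .001
print(standard_extension(42, 12345))  # .00042
```

The width depends on the largest part number, so all extensions in one set have the same length and sort correctly by name.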

2. Coding a file joiner. This is easy to implement with Java's built-in SequenceInputStream.

src/main/java/khang/iwcjff/afsj/AjikJoiner.java:

        SequenceInputStream f = new SequenceInputStream(Collections.enumeration(fisCollection));         
        // create new file output on disk
        fileDest = new FileOutputStream(fileOutput);
        byte[] inBytes = new byte[chunk];        
        int byteRead;
        while((byteRead = f.read(inBytes)) != -1) {
            bytesRead += byteRead;
            fileDest.write(inBytes, 0, byteRead);
        }      
        fileDest.flush();
        f.close();
Important technique: each time, we read a chunk of bytes into a buffer and write those bytes from the buffer to the destination file, so the parts end up one after another in the output.

3. Coding a file splitter. First we need to know how many parts the file will be split into:

src/main/java/khang/iwcjff/afsj/AjikSplitter.java:

        // get the total size of source
        totalSize = fileInput.length();
        partSize = psize;
        totalParts = (long)Math.ceil((double)totalSize / partSize);