Sometime back a question was asked to develop a well performant parser– there was no restriction defined in the question whatsoever as to what technology, logic, flow/etc should be applied. Just the input block, and expected output result format, and maximum time that the parser may take.
This may answer questions:
- How to query a .CSV and save the result in another CSV?
- Custom .CSV Parser?
- How to ETL .CSV into a .CSV
- High performance .CSV parser
Calculate the usage of different country and dial codes for a particular customer, and write result in a separate file.
- Read from CSV data source
- Query for the data (select, group by, sum, count, etc)
- Write into a separate .CSV file
Data source contains the ~1.2 million records, and the module is required to complete the whole procedure in less than 5 seconds.
We have a CustomerData.CSV file already exists, to which we will query.
CustomerData.CSV has the following schema:
|Field 1||This field contains the customer id (sorted in ascending order)|
|Field 2||Country code|
|Field 3||Dial Code|
|Field 4||Start time|
|Field 5||Call duration|
Result.CSV has the following schema:
|Field 1||Country code|
|Field 2||Dial Code|
|Field 3||Total duration (in minutes and seconds)|
SOLUTION 1: (Use Jet OLE DB Text Driver)
Easiest, quickest, fastest, and very well performant!
Step 1: Define the following schema.ini file in some folder that you like:
If you would like, then look into the Schema.ini File (Text File Driver). Following content goes in schema.ini:
STEP 2: Stub in the backend code in some .cs file
SOLUTION 2: (Write a custom class, create indexes, and apply bisection search)
For instance, following code performs following operations to achieve the same:
- Perform indexing on the selected column
- Select specific records from the .CSV file
- Perform aggregate function, call SUM() – Use LINQ
Btw, far more interesting code would have been the following implementation:
parser .Select(Col1, Col2, Col3) .Where(Col1,"AMANTEL") .Sum(Col3);
Let me know if you can help me with that (0: Note that, I have not applied bisection search, yet; but the method is there.
SOLUTION 3: (Use TextFieldParser class)
Checkout this solution, but its a VB turned into C# solution. Let me know if you enjoy?
SOLUTION 4: (Using LINQ to CSV)
Here is how.
SOLUTION 5: (Use XmlCSVReader, convert CSV to XML and use XPath to query the data)
How so? Here is the method.
SOLUTION 6: (Load .CSV in a database and use SELECT query to get result)
See Importing CSV Data and saving it in database; I hope you get the idea; ping me if you did not. Another, just as interesting A Fast CSV Reader is also there; this is interesting because it has benchmarks.
SOLUTION 7 (Bonus): (Use Text Driver with DSN)
Just to retrieve data, quick and easy.