05 December 2008

Merge two sorted integer arrays

My current programming task is to merge two sorted arrays of integers into one array while removing duplicate integers. There are no duplicates in the source arrays A and B. The source arrays can be altered. The integers in each array may tend to be in blocks which don't overlap, but this isn't guaranteed. The arrays may currently have up to 100,000 integers but will often have less than this. The output array should be truncated to 100,000 elements. I'm using C# .NET but that's irrelevant really - it's just the algorithm that's important.

I came up with several possible solutions, and am currently trying the last. What would you do?

1.
Walk through A and B simultaneously. If an integer in B exists in A then ignore it; if it doesn't exist then add it to the end of A.
At the end of this process, call Array.Sort on A (ie a QuickSort).
(Note that inserting a B element into A would surely take too long as it would involve moving all other A entries up.)
Quite efficient but I am concerned at speed of final sort.

2.
Create a new linked list with all the A entries.
Walk through the linked list and B simultaneously; for integers in B that need inserting, alter the linked list to add a new entry in the correct place.
At end, convert linked list back into an array.
Not very efficient creating a linked list and converting back from linked list.

3.
Walk through A and B simultaneously. Create a list structure where each element represents a block of integers in A or B, eg use integers 0-4 from A, integers 3-15 from B, integers 5-10 from A, etc.
At the end, create a new array from the list structure.
This is a more complicated version of my final simpler solution.

4.
Not quite sure if I thought this through properly.
Walk through A and B simultaneously. If the current integer in B is less than the current integer in A, then swap the two. This sounds simple but in fact the integer swapped into B will now possibly be out of order in B, so the element will need bubbling up until it finds its correct place.
At the end, simply add the remaining B elements to A.
Superficially attractive but more complicated than it looks.

5.
Walk through A and B simultaneously creating a new table C as you go.
Simple

Unless you can suggest otherwise...

3 comments:

Martin Ankerl said...

What do you want to improve? performance, memory requirement, or simplicity? In Ruby I would do it like this:

ary = (a+b).sort.uniq[0...100_000]

Thats as simple as possible and most likely fast enough. When you go for performance use a Merge sort:
http://en.wikipedia.org/wiki/Merge_sort

When memory is an issue, see the paper "Practical in place merge sort" at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8523

but it is complicated and slow.


What you w describing is Merge sort

http://en.wikipedia.org/wiki/Merge_sort#cite_note-1

Jason said...

this is what i did jasonseagul.blogspot.com

sathish said...

I think we can try another way too:
Take an array which has smallest length. Iterate through it one by one. Insert it to the other array. Since the other array is sorted, we can split it by 2 (till the end), then compare the last element.